
Monitor Stage

Direction

Vision

Using GitLab, you automatically get broad and deep insight into the health of your deployment.

Mission

We provide a robust monitoring solution that gives GitLab users insight into the performance and availability of their deployments and alerts them to problems as soon as they arise. We provide data that is easy to digest and easy to relate to other features in GitLab. With every piece of the DevOps lifecycle integrated into GitLab, we have a unique opportunity to closely tie our monitoring features to all of the other pieces of the DevOps flow.

We work collaboratively and transparently and we will contribute as much of our work as possible back to the open source community.

Responsibilities

The monitoring team is responsible for:

This stage consists of the following groups:

These groups map to the Monitor Stage product category.

How to be successful in this stage

Team members who are successful in this stage typically demonstrate stakeholder mentality. There are many ways to demonstrate this but examples include:

This stage is only successful when each team member collaborates to make one another successful.

Rhythms

Monthly Cadence

Since GitLab releases on a monthly basis, we have supporting activities that also follow a monthly rhythm. Because our releases take place on the 22nd of each month, a monthly cadence does not line up with a month of the Gregorian calendar.
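As a rough illustration of how a cadence straddles two calendar months, the sketch below finds the release date that closes the cadence containing a given day. Only the release day (the 22nd) comes from this page. The function names are hypothetical, and the window boundaries (starting the day after the previous release) are an assumption made for the example, not a documented rule.

```python
from datetime import date, timedelta
from typing import Tuple

RELEASE_DAY = 22  # from this page: releases take place on the 22nd of each month


def release_date_for(day: date) -> date:
    """Return the release date (a 22nd) on or after `day`."""
    if day.day <= RELEASE_DAY:
        return day.replace(day=RELEASE_DAY)
    # Past the 22nd: the next release falls on the 22nd of the following month.
    if day.month == 12:
        return date(day.year + 1, 1, RELEASE_DAY)
    return date(day.year, day.month + 1, RELEASE_DAY)


def cadence_window(day: date) -> Tuple[date, date]:
    """Return (start, end) of the cadence containing `day`.

    Assumption: a cadence starts the day after the previous release
    and ends on the release date itself.
    """
    end = release_date_for(day)
    if end.month == 1:
        previous_release = date(end.year - 1, 12, RELEASE_DAY)
    else:
        previous_release = date(end.year, end.month - 1, RELEASE_DAY)
    return previous_release + timedelta(days=1), end


if __name__ == "__main__":
    start, end = cadence_window(date(2020, 1, 5))
    print(start, end)
```

Under these assumptions, a day early in January belongs to a cadence that began in late December, which is why the cadences do not map onto calendar months.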

Meetings

Meetings are not required, but attending the important ones or reviewing their recordings will generally help team members succeed. The meetings below are listed in order of importance, and all of them are on the Monitor Stage Calendar (viewable to all GitLab team members).

  1. Group Weekly Meeting (Monitor:Health and Monitor:APM each have their own)
  2. Monitor Stage Demo Hour (bi-weekly)
  3. Monitor Social Hour (weekly)

Async Daily Standups

Groups in this stage also participate in async daily standups. The purpose is to give every team member insight into what others are working on, so that we can identify ways to collaborate and unblock one another, and to foster relationships within the team. We use the Geekbot Slack plugin to automate our async standup, following the guidelines outlined in the Geekbot commands guide.

Our questions change depending on the day of the week. Participation is optional but encouraged.

Monday

Question | Why we ask it
Do you need help from anyone to unblock you this week? | One of our main goals with our standups is to ensure that we are unblocking one another as a top priority. We ask this first because we think it's the question that other team members can take action on.
What do you plan on working on this week? | We want to understand how our daily actions drive us toward our weekly goals. This question provides broader context for our daily work, and also helps us hold ourselves accountable for keeping our tasks, issues, merge requests, etc. properly scoped. This answer may stay the same all week, which would mean things are progressing on schedule. Seeing this answer change throughout the week is also okay. Maybe we got sidetracked helping someone get unblocked. Maybe new blockers came up. The intention is not to have to justify our actions, but to keep a running record of how our work is progressing or evolving.
Any personal tidbits you'd like to share? | This question is intentionally open-ended. You might want to share how you feel, a personal anecdote, or a funny joke, or simply let the team know that you will have limited availability that afternoon. All of these answers are welcome.

Tuesday/Wednesday/Thursday

Question | Why we ask it
Are you facing any blockers requiring action from others? | Same reason as Monday's first question.
Are you on track with your plan for the week? | We want to understand how each team member is doing on achieving our weekly goal(s). It is meant to highlight progress while also identifying whether anything is getting in the way. This can also be used to update the plan for the week as things change.
What will be your primary focus for today? | This question is aimed at the most impactful task for the day. We aren't trying to account for the entire day's worth of work. Highlighting only a primary task keeps our answers concise and provides insight into each team member's most important priority. This doesn't necessarily mean sharing the task that will take the most time. We focus on results over input. Typically this will mean highlighting the task that is most impactful in closing the gap between today and our end-of-week goal(s).
Any personal tidbits you'd like to share? | Same reason as Monday's last question.

Friday

Question | Why we ask it
What went well this week? What did you enjoy? | The end of the week is a good time to reflect on our goals, and this question is meant to be a short retrospective of the week. This one focuses on things that went well during the week.
What didn’t go so well? What caused you to slow down? | Like the previous question, this question is a way to review our week. This one surfaces things that did not go so well or things that got in the way of meeting our weekly goal(s).
What have you learned? | This could be something work-related or personal. We hope that by sharing what we have learned, others can also learn from us.
Any plans for the weekend you'd like to share? | Like the "personal tidbit" question we ask on other days of the week, this one is very open-ended. You can share as much or as little as you want, and all answers are welcome.

Initiatives

SRE Shadow Program

With the support of GitLab's SRE team, we implemented the SRE shadow program as a means of improving the team's understanding of our ideal user personas so that we can build a better product.

In this program, engineers are expected to devote 1 entire week to shadowing SREs. There is no expectation for the engineer to complete their assigned issues during this time. Engineers are added to PagerDuty and will follow the existing SRE shadow format of interning (except scaled down to a shorter duration of 1 week). Although SREs are typically on call for multiple days at a time, shadows are only expected to shadow during their regular business hours. This can be set as a preference in PagerDuty.

Objectives

Outcomes

How to participate

Engineers interested in the program should notify their respective frontend/backend engineering managers. Managers should collaborate in the Slack channel #monitor-sre-shadow to determine an optimal schedule, and then create an access request for PagerDuty. Assign the access request to the SRE manager (this is a departure from established processes). We are currently limited to a maximum of 2 shadows per release so that we do not overload the SRE team. If you are shadowing during the same release as another engineer, coordinate to create a combined access request for the duration of the release.

Before starting your rotation, coordinate with the SRE(s) who will be on call to determine which areas it makes sense for you to shadow (incidents, other on-call tasks, SRE daily tasks, etc.). You can either check PagerDuty or coordinate with the SRE manager to find out who you'll be shadowing.

Alumni

Alumni of the program are encouraged to add themselves to this list and document/link to the observations/outcomes they were able to share with the wider team.

Name | Outcomes
Tristan Read | My week shadowing a GitLab Site Reliability Engineer
Sarah Yasonik | Created 4 issues for the team to consider adding to the product

Resources

Demo Environments

To make it more efficient to verify changes and demonstrate our product features to customers and other stakeholders, the engineers in this stage maintain a few demo environments.

Use Case | URL
Customer simulation environment | tanuki-inc
Verifying features in Staging | monitor-sandbox (Staging)
Verifying features in Production | monitor-sandbox (Production)

Video and Tutorials