Monitor Stage

Groups

The groups within this stage are:

Vision

Using GitLab, you automatically get broad and deep insight into the health of your deployment.

Mission

We provide a robust monitoring solution that gives GitLab users insight into the performance and availability of their deployments and alerts them to problems as soon as they arise. We provide data that is easy to digest and to relate to other features in GitLab. With every piece of the DevOps lifecycle integrated into GitLab, we have a unique opportunity to closely tie our monitoring features to all of the other pieces of the DevOps flow.

We work collaboratively and transparently, and we will contribute as much of our work as possible back to the open source community.

Responsibilities

The monitoring team is responsible for:

This team maps to Monitor Stage.

Async Daily Standups

We use the Geekbot Slack plugin to automate our async standup, following the guidelines outlined in the Geekbot commands guide. Answers are concise and focused on top-priority items. All question prompts are optional and are answered only when the information should be surfaced to the team:

Recurring Meetings

Every other week we have a Monitor Stage Demo Hour for engineering and design demos by members of the Monitor Stage group. Demos are voluntary and on a sign-up basis.

There is also an optional Monitor Social Hour meeting every week. This call has no agenda and alternates times every other week to be more inclusive of team members in different time zones.

The Health and APM groups have their own regular meetings as well.

Retrospective

We follow the same retrospective process as the rest of the engineering department, which can be found here.

To encourage a more iterative retrospective process, we create a new retrospective issue at the beginning of each milestone, using the Monitor retrospective template. We leave this issue open for the duration of the milestone so any team member can add feedback as it happens instead of waiting until the end of the milestone.

Monitor Stage PTO

Just like the rest of the company, we use PTO Ninja to track when team members are traveling, attending conferences, and taking time off. The easiest way to see who has upcoming PTO is to run the /ninja whosout command in the #g_monitor_standup Slack channel. This shows the upcoming PTO for everyone in that channel.

SRE shadow program

Not everyone in the Monitor stage has a background that resonates with our primary user personas:

In this program, engineers are expected to devote one entire week to shadowing SREs. There is no expectation for the engineer to complete their assigned issues during this time. Engineers are added to PagerDuty and follow the existing SRE shadow format of interning, scaled down to a duration of one week. Although SREs are typically on call for multiple days at a time, shadows are only expected to shadow during their regular business hours. This can be set as a preference in PagerDuty.

Objectives

Outcomes

How to participate

Engineers interested in the program should notify their respective frontend or backend engineering manager. Managers should collaborate in the Slack channel #monitor-sre-shadow to determine an optimal schedule, then create an access request for PagerDuty and assign it to the SRE manager. We are currently limited to a maximum of 2 shadows per release so that we do not overload the SRE team.

Alumni

Alumni of the program are encouraged to add themselves to this list and to document or link to the observations and outcomes they were able to share with the wider team.

Name | Outcomes
Tristan Read | My week shadowing a GitLab Site Reliability Engineer

Useful Resources