
On-Call


Reporting Issues on GitLab.com

If you're observing issues on GitLab.com, or you're on a team that works with customers or users who are observing issues, a member of the chatops project can run the command /chatops run oncall prod in the #production Slack channel. The GitLab ChatOps bot will return the names of the Engineer On Call (EOC) and the Incident Manager On Call (IMOC). Please @ mention the engineer in Slack and reference the GitLab issue that contains details of the problem, if one exists.

If you're not a member of the chatops project, ask someone who is to run that command for you and then add you to chatops. To be added, log in to ops.gitlab.net, change your username there to match your GitLab.com username, and then have the on-call add you with /chatops run member add USERNAME gitlab-com/chatops --ops.

Note that, per the communication handbook page, you aren't guaranteed an immediate response.

PagerDuty

If you need an immediate response from the EOC, type /pd <and add a description of the issue> and PagerDuty will alert the Engineer On Call. To alert the IMOC, type /pd-mgr <and add a description of the issue>.

We use PagerDuty to set the on-call schedules, and to route notifications to the appropriate individual(s). There are escalation policies in place for Production issues (i.e. GitLab.com downtime), Security concerns, and Customer emergencies.

Expectations for On-Call

Swapping On-Call Duty

To swap on-call duty with a fellow on-call hero:

Customer Emergency On-Call Rotation

Reliability Engineering Team On-Call Rotation

The Infrastructure department's Reliability Engineering teams provide 24/7 on-call coverage for the production environment. There are three primary job functions with their own PagerDuty schedules: Site Reliability Engineers (SRE), Database Reliability Engineers (DBRE), and Reliability Engineering Managers. Each individual has a unique set of responsibilities. (For details, please see incident-management.)

SRE

DBRE

Managers

Security Team On-Call Rotation

More information is available in the Security Incident Response Guide.

How to page current production on-call

From Slack, you can page using the /pd slash command, like so: /pd message for the on call

This triggers high-urgency notification rules and escalates as needed.

Development Team On-Call Rotation

Adding and removing people from the roster

In principle, it is straightforward to add or remove people from the on-call schedules, through the same "schedule editing" links provided above for setting overrides. However, when editing a schedule (adding, removing, changing time blocks, etc.), do not change the timezone setting, located in the upper left corner of the image below, unless you are certain you intend to. If you change the timezone setting, PagerDuty will not move the on-call time 'blocks' to the equivalent times; instead, it assumes that you meant to keep the selected time blocks (e.g. "11am to 7pm") in the new timezone. As a result, your new schedule may become disjointed from the old one (old = the schedule as set before the "change on this date" selection), and gaps may appear in the schedule.

changing pagerduty
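
The timezone pitfall can be illustrated with a small calculation. The sketch below is not PagerDuty code and does not call any PagerDuty API; it only models the behaviour described above, where the wall-clock block ("11am to 7pm") is kept and reinterpreted in the new timezone. The two UTC offsets, the date, and the helper name block_in_utc are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed UTC offsets, for illustration only (no daylight saving).
OLD_TZ = timezone(timedelta(hours=0))   # timezone the schedule was built in
NEW_TZ = timezone(timedelta(hours=-5))  # timezone selected by mistake while editing


def block_in_utc(day, start_hour, end_hour, tz):
    """Interpret a wall-clock block in tz and return its (start, end) in UTC."""
    start = datetime(day.year, day.month, day.day, start_hour, tzinfo=tz)
    end = datetime(day.year, day.month, day.day, end_hour, tzinfo=tz)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


day = datetime(2024, 1, 15)  # arbitrary example date

# The same "11am to 7pm" block, before and after the timezone setting changes.
old_start, old_end = block_in_utc(day, 11, 19, OLD_TZ)
new_start, new_end = block_in_utc(day, 11, 19, NEW_TZ)

# How far the block moved in absolute time. If the preceding shift still ends
# at old_start, this is also the length of the uncovered gap.
drift = new_start - old_start

print("old block (UTC):", old_start, "to", old_end)
print("new block (UTC):", new_start, "to", new_end)
print("drift / potential gap:", drift)
```

With these assumed offsets, the block that used to start at 11:00 UTC now starts at 16:00 UTC, so five hours that were previously covered are left without anyone on call unless the neighbouring shifts are adjusted as well.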