Reporting Issues on

If you're observing issues on, or are on a team that works with customers or users who are observing issues, a member of the chatops project can run the command /chatops run oncall prod in the #production Slack channel. The GitLab ChatOps bot will return the names of the Engineer On Call (EOC) and the Incident Manager On Call (IMOC). Please @-mention the engineer in Slack and reference the GitLab issue that contains the details, if one exists.

If you're not a member of the chatops project, you can ask someone who is a member to run that command for you and then add you to chatops. Log in to, change your username to be the same as on and then have the on-call add you with /chatops run member add USERNAME gitlab-com/chatops --ops.

Please keep in mind that communication through Slack is asynchronous, so you aren't guaranteed an immediate response.


If you need an immediate response from the engineer on-call (EOC), type /pd trigger in the #production Slack channel and choose the "gitlab-production" service. Include a brief title. To summon the incident manager on-call (IMOC), choose the "SRE Managers" service instead. If in doubt, page the engineer on-call. Use this only for production emergencies.

We use PagerDuty to set the on-call schedules, and to route notifications to the appropriate individual(s). There are escalation policies in place for Production issues (i.e. downtime), Security concerns, and Customer emergencies.

Expectations for On-Call

Swapping On-Call Duty

To swap on-call duty with a fellow on-call hero:

Customer Emergency On-Call Rotation

Reliability Engineering Team On-Call Rotation

The Infrastructure department's Reliability Engineering teams provide 24/7 on-call coverage for the production environment. There are three primary job functions, each with its own PagerDuty schedule: Site Reliability Engineers (SRE), Database Reliability Engineers (DBRE), and Reliability Engineering Managers. Each function has its own set of responsibilities. (For details, please see incident-management.)


Database Reliability Engineer (DBRE)

For database-related issues, the DBRE on-call should be paged. We have support from OnGres, a consultancy that specializes in PostgreSQL databases. Only the EOC or IMOC should page OnGres.


Security Team On-Call Rotation

More information is available in the Security Incident Response Guide.

How to page current production on-call

From Slack you can page by using the slash pd command, like so: /pd message for the on call

This triggers high-urgency notification rules and escalates as needed.

Development Team On-Call Rotation

Adding and removing people from the roster

In principle, it is straightforward to add or remove people from the on-call schedules through the same "schedule editing" links provided above for setting overrides. However, whenever you edit a schedule (adding, removing, or changing time blocks, etc.), keep the timezone setting (located in the upper left corner of the image below) unchanged unless you are certain you intend to change it. If you change the timezone setting, PagerDuty will not move the on-call time blocks; instead, it assumes that you meant to keep the selected times (e.g. "11am to 7pm") in the new timezone. As a result, the new schedule may no longer line up with the old one (that is, the schedule as set before the "change on this date" selection), and gaps may appear in coverage.

changing pagerduty
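
To make the timezone pitfall concrete, here is a minimal sketch in Python. It does not use PagerDuty's API; it only works through the arithmetic of the behaviour described above. The date, the fixed offsets (UTC+1 and UTC-5), the earlier 3am to 11am shift, and the helper name block_in_utc are all hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def block_in_utc(day, start_hour, end_hour, utc_offset_hours):
    """Convert a wall-clock block (e.g. 11 to 19) in a fixed-offset zone to UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    start = datetime(day.year, day.month, day.day, start_hour, tzinfo=tz)
    end = datetime(day.year, day.month, day.day, end_hour, tzinfo=tz)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


day = datetime(2024, 1, 15)  # hypothetical date

# Hypothetical earlier shift, 3am to 11am, left in the old timezone (UTC+1).
previous_end = block_in_utc(day, 3, 11, 1)[1]

# The "11am to 7pm" block as intended, still in the old timezone (UTC+1).
intended = block_in_utc(day, 11, 19, 1)

# The same wall-clock block after the timezone setting is switched to UTC-5:
# the hours stay "11am to 7pm", so the absolute time moves.
reinterpreted = block_in_utc(day, 11, 19, -5)

# Time with nobody on call between the earlier shift and the edited block.
gap = reinterpreted[0] - previous_end
print("intended start (UTC):     ", intended[0])
print("reinterpreted start (UTC):", reinterpreted[0])
print("coverage gap:             ", gap)
```

With these assumed offsets, the edited block starts six hours later in absolute time than intended, which is exactly the kind of gap the warning above is meant to prevent.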