On-Call

Reporting Issues on GitLab.com

If you're observing issues on GitLab.com, or you're on a team that works with customers or users who are observing issues, you can find out who is on call by running the command /chatops run oncall prod in the #production Slack channel or in a direct message (DM) with the ChatOps app. This requires membership in the chatops project; if you're not a member, you can request access.

The GitLab ChatOps bot will return the names of the Engineer On Call (EOC) and the Incident Manager On Call (IMOC). Please @ mention the engineer in Slack and link the GitLab issue that describes the problem, if one exists.

Please keep in mind that communication through Slack is asynchronous, so you aren't guaranteed an immediate response.

PagerDuty

If you need an immediate response from the Engineer On Call (EOC), type /incident declare in the #incident-management Slack channel.

We use PagerDuty to set the on-call schedules, and to route notifications to the appropriate individual(s). There are escalation policies in place for Production issues (i.e. GitLab.com downtime), Security concerns, and Customer emergencies.

Expectations for On-Call

Swapping On-Call Duty

To swap on-call duty with a fellow on-call hero:

Changing the rotation of the current schedule

Customer Emergency On-Call Rotation

Reliability Engineering Team On-Call Rotation

The Infrastructure department's Reliability Engineering teams provide 24x7 on-call coverage for the production environment. For details, please see incident-management.

Site Reliability Engineers (SREs)

Database Support

For database-related issues, we have support from OnGres, a consultancy that specializes in PostgreSQL databases. Only a responding EOC or IMOC should ever page OnGres.

Managers

How to page current production on-call

From Slack, you can page the current on-call by using the /pd trigger slash command. This triggers high-urgency notification rules and escalates as needed.

Security Team On-Call Rotation

Security Operations (SecOps)

More information is available in the Security Operations On-Call Guide and the Security Incident Response Guide.

Security Managers

Development Team On-Call Rotation

Quality Team On-Call Rotation

Adding and removing people from the roster

In principle, it is straightforward to add or remove people from the on-call schedules through the same "schedule editing" links provided above for setting overrides. However, do not change the timezone setting (located in the upper left corner of the image below) unless you are certain that you intend to. Whenever you edit a schedule (adding, removing, or changing time blocks, etc.), keep that timezone setting constant.

If you change the timezone setting, PagerDuty will not move the time 'blocks' for on-call duty. Instead, it will assume that you meant to keep the selected time blocks (e.g. "11am to 7pm") in the new timezone. As a result, your new schedule may become disjointed from the old one (old = the schedule as set before the "change on this date" selection), and gaps may appear in the schedule.

[Image: changing pagerduty]
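
The effect of a timezone change on a fixed time block can be illustrated with a small sketch. This is not PagerDuty code and does not call any PagerDuty API; it only uses the Python standard library to show what happens when the same "11am to 7pm" block is reinterpreted in a different timezone. The function names, the date, and the two timezones (America/Los_Angeles and Europe/Amsterdam) are hypothetical, chosen purely for illustration.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; some platforms also need the tzdata package


def block_in_utc(day, start_hour, end_hour, tz_name):
    """Return the (start, end) UTC instants of a wall-clock block on `day`."""
    tz = ZoneInfo(tz_name)
    start = datetime.combine(day, time(start_hour), tzinfo=tz)
    end = datetime.combine(day, time(end_hour), tzinfo=tz)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


def overlap_hours(a, b):
    """Hours during which two (start, end) windows both provide coverage."""
    latest_start = max(a[0], b[0])
    earliest_end = min(a[1], b[1])
    return max((earliest_end - latest_start).total_seconds(), 0) / 3600


# Hypothetical example: an arbitrary winter date, away from any DST transition.
day = date(2024, 1, 15)

# The same "11am to 7pm" block, before and after the timezone setting is changed.
before = block_in_utc(day, 11, 19, "America/Los_Angeles")
after = block_in_utc(day, 11, 19, "Europe/Amsterdam")

shift_hours = (before[0] - after[0]).total_seconds() / 3600

print("before:", before[0].isoformat(), "to", before[1].isoformat())
print("after: ", after[0].isoformat(), "to", after[1].isoformat())
print("block moved earlier by", shift_hours, "hours")
print("hours of the original window still covered:", overlap_hours(before, after))
```

In this example the block keeps its "11am to 7pm" label but lands nine hours earlier in absolute time, so the hours it originally covered are left without coverage. That is the kind of gap that can appear between the old and new schedules.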
