
How to Perform CMOC Duties


As the Communications Manager on Call (CMOC), it's your job to be the voice of GitLab to our users, customers, and stakeholders during an incident. You do this by communicating with them through our status page.

The CMOC rotation is one of the rotations that make up GitLab Support On-call.

The basics of creating, updating, and closing incidents in our status page tool are covered by the tool's own Incident Overview documentation. This document covers how we specifically use the tool to perform those tasks.

You may also be asked to contact a user on behalf of Infrastructure or Security, which may or may not be related to an Incident.

Things To Know

Before getting into the process of managing an incident, note the following.

How Are Incidents Declared?

Infrastructure uses Woodhouse to declare incidents through Slack. Doing so will:

  1. Automatically page the EOC, IMOC, and CMOC.
  2. Create an issue for the incident in the gl-infra/production issue tracker.
  3. Provide a link to the Zoom call for the incident.
  4. Create a dedicated Slack channel for the incident.

Woodhouse posts all of this information to the #incident-management Slack channel, and it looks similar to the following example.

Incident declared by Woodhouse

GitLab team members are encouraged to use this method of reporting incidents if they suspect an incident is imminent.

Status Page Updates

Note the following about making updates to the status page.

Frequency of Updates

The status page should be updated whenever we have new information about an active incident that our stakeholders should be aware of. Outside of that, update it at a consistent rate based on the severity of the incident, as outlined in the table below.

Once you join the incident Zoom call, note any updates that have already been made to the status page and when they were made. Set a timer to remind yourself, and stick to the intervals below unless your latest update told readers when to expect the next one. For example, while in the Monitoring state it may be appropriate to say that the next update will come in an hour.

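| Incident Status | severity::1 Update Frequency | severity::2 Update Frequency | severity::3/severity::4 Update Frequency |
|---|---|---|---|
| Investigating | 10m | 15m | 15m |
| Identified | 10m | 30m | 30m |
| Monitoring | 30m | 60m | 60m |
| Resolved | No further updates required | No further updates required | No further updates required |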
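If you want something more precise than a kitchen timer, the intervals in the table above can be encoded in a small lookup. This is a minimal sketch of a hypothetical helper, not an existing GitLab tool: `UPDATE_INTERVALS` and `next_update_due` are illustrative names, and the table remains the source of truth.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Minutes between status page updates, transcribed from the update frequency table.
# severity::3 and severity::4 share a column, so both map to the same values.
UPDATE_INTERVALS = {
    "Investigating": {1: 10, 2: 15, 3: 15, 4: 15},
    "Identified": {1: 10, 2: 30, 3: 30, 4: 30},
    "Monitoring": {1: 30, 2: 60, 3: 60, 4: 60},
}


def next_update_due(state: str, severity: int, last_update: datetime) -> Optional[datetime]:
    """Return when the next status page update is due, or None once Resolved.

    Hypothetical helper: if your last update promised a specific time for the
    next one, that promise takes precedence over this default.
    """
    if state == "Resolved":
        return None  # No further updates are required after resolution.
    return last_update + timedelta(minutes=UPDATE_INTERVALS[state][severity])


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(next_update_due("Identified", 2, last))  # 2024-01-01 12:30:00+00:00
```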

What If I Don't Know What to Say?

EOC vs. Incident Manager

Later sections of this workflow note that you should sometimes ask the incident's Incident Manager for permission to move the incident between certain states (updating, monitoring, resolving). On the rare occasion that an incident has no Incident Manager and the EOC has assumed those responsibilities, ask the EOC instead.

In some circumstances, the Incident Manager may ask you to find the number of support tickets an incident may have generated, to help evaluate its impact.

Because the default views only show unassigned tickets in your region, start by using this Zendesk search to find all recent tickets.

Alternatively, paste the following search string into the Zendesk search bar (useful if you are using the Zendesk Quicktab extension):

    created>4hours order_by:created_at sort:desc group:none group:"support" -form:billing -form:security

This shows tickets created in the previous 4 hours. Extend the range if the incident began earlier than that.
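When the incident began more than 4 hours ago, only the `created>4hours` part of the search string needs to change. A minimal sketch of building the string for another lookback window, assuming Zendesk accepts other hour values in the same `created>Nhours` form; `zendesk_incident_search` is a hypothetical helper name.

```python
def zendesk_incident_search(hours: int = 4) -> str:
    """Build the Zendesk search string for tickets created in the last `hours` hours.

    Everything except the lookback window (sort order, group filters, and the
    billing/security form exclusions) is copied verbatim from the string above.
    """
    if hours < 1:
        raise ValueError("hours must be a positive integer")
    return (
        f"created>{hours}hours order_by:created_at sort:desc "
        'group:none group:"support" -form:billing -form:security'
    )


if __name__ == "__main__":
    # For an incident that began about 7 hours ago, search a little further back.
    print(zendesk_incident_search(8))
```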

Tagging Tickets

If customers contact us about an incident, regardless of its severity, create an incident tag in Zendesk as soon as possible. To find customer tickets, use the search tips above, scan the FRT & Free ticket queue and validate the tickets, or ask the wider Support team. To create the tag, open a related ticket, find the tags field, and add a tag in the format com_incident_####, replacing #### with the number of the incident issue in the production issue tracker. Once you've added the tag, submit the ticket with an appropriate Incident First Response macro; the tag then becomes available to use on other tickets.

Tickets can be tagged throughout the incident, but the CMOC should check the queue for accurate tagging during the incident resolution stage. Tagging helps gauge support impact, makes related tickets easy to find during active troubleshooting, and makes them easy to find later for historical reference.

For details on tagging and tracking incidents, see the Tracking Incidents workflow.
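A mistyped tag splits related tickets across two tags, so it can help to generate or check the `com_incident_####` tag rather than type it by hand. A minimal sketch; `incident_tag` and `is_incident_tag` are hypothetical helpers, and it assumes `####` stands for the full issue number rather than a fixed, zero-padded four-digit width.

```python
import re

# Tag format from this workflow: "com_incident_" followed by the production
# incident issue number. Assumption: the number is used as-is, without padding.
TAG_PREFIX = "com_incident_"
TAG_PATTERN = re.compile(r"com_incident_[0-9]+")


def incident_tag(issue_number: int) -> str:
    """Return the Zendesk tag for a production incident issue number."""
    if issue_number < 1:
        raise ValueError("issue number must be positive")
    return f"{TAG_PREFIX}{issue_number}"


def is_incident_tag(tag: str) -> bool:
    """Check that an existing tag follows the com_incident_#### format exactly."""
    return TAG_PATTERN.fullmatch(tag) is not None


if __name__ == "__main__":
    print(incident_tag(1234))
```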

Reviewing Past Incidents

Keep in mind that you can always review past incidents if you need examples or inspiration for how to fill in the details for a current incident.

Contacting a User

Infrastructure or Security may ask you to reach out to one or more users if they detect unusual usage, whether or not it's related to an ongoing incident. Follow the internal requests workflow to log the request.

Setting Up Incidents

As the CMOC you'll guide the incident through the following three stages.

  1. Stage 1: Incident Creation - During this stage we create the incident on the status page, link up with Reliability on the incident Zoom call, and notify the GitLab Social team, if necessary.
  2. Stage 2: Incident Updates - During this stage we follow along with the work being performed by the EOC and any assisting engineers to resolve the incident, and update the status page along the way according to the Frequency of Updates schedule.
  3. Stage 3: Incident Resolution - During this stage we set the incident to Monitoring on the status page for a period of time to ensure that the issue does not recur before we close it out, eventually set the incident to Resolved, and add a link to the post-mortem issue on the status page.

The following sections outline how to perform each of the steps within these stages.

Stage 1: Incident Creation

The following steps should be taken immediately after receiving a PagerDuty page for an incident.

  1. Acknowledge the PagerDuty page.
  2. Join the incident Zoom call, provided by Woodhouse.
  3. Create the incident on the status page.
  4. Notify internal stakeholders, if necessary.
  5. Add the ~Incident-Comms::Status-Page label to the GitLab incident issue.
  6. Resolve the PagerDuty page.

PagerDuty Status

NB: "Resolved" in PagerDuty does not mean the underlying issue has been resolved.

1. Join The Incident Zoom Call

Before you create an incident on the status page, join the Zoom call that all GitLab team members involved in the incident will use. Woodhouse provides a link to the call in its incident declaration in the #incident-management channel.

Your role as CMOC in this call is to follow along while the incident is worked and update the status page when asked or when necessary. Discussion in the call is often lively, especially in the early stages of an incident while the source of the issue is being found. Use your best judgment about when to speak up so you don't interrupt at inopportune moments. You can always ping anyone on the call in Slack if you need to ask a non-urgent question about the situation.

When first joining

The first thing you should do is to verify that you can be heard by others in the room. To do this, say something like:

"Hi, I'm the CMOC on duty. I intend to send an update, please review this in the Slack thread."

"Hi, I'm the CMOC on duty, how can I help?"

Whatever you choose to say, make sure that you receive a verbal acknowledgement directed at you before you move on to focus on other aspects of the incident.

When CMOC is verbally mentioned or asked to do something

From time to time, you may be asked to perform some specific tasks in the room. Always verbally acknowledge any such asks by repeating your understanding of the ask back to the requestor. This helps everyone understand that the ask was heard, and also serves to verify that everyone has the same understanding of some action to be taken.

In some cases, the ask may be implicit, rather than explicit. If you're in doubt, always speak up and ask for confirmation. For example:

IM: CMOC is here, we need to roll out a first update.

A good response would be to ask for confirmation that an action was requested:

CMOC: IM, do you want me to send a first update to the status page?

A better response would be to assume that an action was requested, relay your intended course of action in response, and give the requestor the opportunity to provide input:

CMOC: IM, acknowledged. I will draft an update for the status page and ping you in Slack for input.

2. Create the Incident

After logging in to the status page tool, you should see the dashboard, which displays various statistics about our current status. Create a new incident by clicking New Incident in the top bar.

New incident

This takes you to the new incident screen, where you fill in the details of the incident. The following example shows what a new incident might look like if we were experiencing delays in job processing.

Incident details

Change the following values:

Title - Keep titles concise. The incident title should answer the question: in simple terms, what is the issue?

Current State - In nearly all cases, create an incident in the Investigating state. If you've been told that the cause of the incident is already known, it can be set to Identified from the start.

Details - In keeping with our value of transparency, we should go above and beyond for our audience and give them as much information as possible about the incident on its creation. This field should always include a link to the incident issue from the production issue tracker so that our audience can follow along.

Incident Status - When creating a new incident this will never be Operational. The status of an incident depends entirely on its scope and how much of the platform it's impacting.

Broadcast - Always check each box in this section.

Message Subject - Always leave this at its default value.

Affected Infrastructure - This should almost always be unchecked so that the value of the Incident Status field is only applied to the specific aspects of the platform that are affected by the incident. In the example above we're only experiencing an issue with job processing so only CI/CD is selected.

3. Notify Stakeholders

Once the severity of the incident has been set and it is on our status page, the CMOC should notify internal stakeholders using the Incident Notifier application in Slack. Internal stakeholders should be notified any time there is a public post on the status page, regardless of severity.

This application prompts you to fill out a form and then automatically posts its contents as a direct message to you and to the #community-relations and #customer-success channels, notifying them of the incident. To engage it:

  1. Click the lightning bolt in the message composition box within #support_gitlab-com and select Incident Notifier.

    Incident Notifier Application

  2. Fill in all of the details.
  3. Click Submit.
  4. Copy the contents of the form that are direct messaged to you by Slackbot and paste them in a message to the #e-group channel.
  5. Start a thread off of your initial message and provide updates to the incident after you make them to the status page.

Note: You are not required to post updates to the Incident Notifier posts made to Slack channels other than #e-group.

This process should be followed when all of the following are true:

4. Label the GitLab Incident Issue to reflect customer communications status

It's important to be able to distinguish incidents that included outbound status page and related notifications from incidents that were deemed less impactful to our customers. This is useful both for filtering active incidents that include outbound notifications and for after-incident reporting.

Whenever a GitLab service incident includes a status page incident, identify this on the GitLab incident issue. See this and other uses of this scoped label in the Incident Management section of the handbook.

  1. Add the ~Incident-Comms::Status-Page scoped label to the GitLab incident issue.

Stage 2: Incident Updates

When updating incidents, there are two actions to take:

  1. Update the incident.
  2. Update the E-Group Slack thread if the update is material in nature.

1. Update the incident

To update an active incident, click the incidents icon from the dashboard.

Active incident dashboard icon

Then click on the edit button next to the incident.

Incident edit button

Change the following values:

  1. Current State - Change this depending on the current state of the incident and whether or not we've identified the cause (Identified) or implemented a fix (Monitoring).
  2. Details - Be as descriptive as possible about the update and include a link to the production issue.
  3. Broadcast - Check all boxes.
  4. Current Status - If the incident has improved or worsened, update this value. If neither, leave it as it was when the incident was created.
  5. Set Status Level - Uncheck this and keep only the affected component selected, unless the incident has grown in scope and now affects other components of our infrastructure. IMPORTANT: These must be checked individually, as in the screenshot below.

A ready to be published update should look similar to the following.

Incident update

Verify the length of the update before publishing it. If it exceeds 280 characters, the update won't be published to Twitter, and the status page tool gives no failure notification.
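Because a too-long update fails silently on Twitter, a quick pre-publish count can catch it. A minimal sketch, assuming the update text is posted to Twitter unchanged; `characters_over_limit` is a hypothetical helper, and its plain character count is only an approximation of how Twitter/X measures length.

```python
# Twitter/X rejects posts over 280 characters, and the status page tool gives
# no failure notification in that case, so check before publishing.
TWEET_LIMIT = 280


def characters_over_limit(update: str, limit: int = TWEET_LIMIT) -> int:
    """Return how many characters must be cut for `update` to fit in a tweet.

    Approximation: counts Unicode code points with len(). Twitter/X weights some
    content differently (links count as a fixed length, and many CJK characters
    count double), so leave some margin.
    """
    return max(0, len(update) - limit)


if __name__ == "__main__":
    draft = "We're investigating delays in job processing."
    print(characters_over_limit(draft))
```

A result of 0 means the draft fits; anything higher is the number of characters to trim before publishing.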

After publishing the update, visit the live GitLab Status Page to verify the update went through and looks clear.

2. Update the E-Group

  1. When the update would help keep the e-group informed of progress, copy and paste it into the #e-group Slack thread that was created in Stage 1.

It is not always necessary to perform this step. The goal is to equip the e-group with information that allows them to know approximately where we are in the process of resolving the incident. For example, "no material update" type messages do NOT need to be shared on the e-group incident thread.

Stage 3: Incident Resolution

When it comes time to close an incident out as resolved, the following flow will generally be used.

  1. Switch to a monitoring state for a time.
  2. Resolve the incident.
  3. Notify the E-Group that the incident is resolved.
  4. Add a link to the production issue to the post-mortem section of the incident.

As noted in the specific sections below, some of these steps are situational and may not be used for every incident.

1. Begin Monitoring (Situational)

Once the component affected by the incident has returned to normal operation, we often switch the incident to a monitoring period to ensure that the problem does not recur. The monitoring period lasts 30 minutes by default, but the Incident Manager may request a different duration or ask that the monitoring period be skipped entirely.

If a monitoring period will be used, edit the incident and configure the update similar to the following.

Switch to monitoring

Take special note of the changes made to the following fields at this stage.

  1. Current State - Change to Monitoring.
  2. Details - Along with any information specific to the incident be sure to mention that all systems have returned to normal operation, that we're monitoring in order to ensure the issue doesn't recur, and provide an estimate for how long we'll be monitoring before we resolve the incident. For example:

    While all systems are online and fully operational, out of an abundance of caution we'll leave affected components marked as degraded as we monitor. If there are no recurrences in the next 30 minutes, we'll resolve this incident and mark all components as fully operational.

  3. Incident Status - At this point, the affected component should be back to normal operation. However, to be clear that we're still in the incident management process we will not flip this back to Operational until we leave the monitoring state.
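Because the monitoring duration is the part of this message most likely to change, the example text can be kept as a template with the duration as a parameter. A minimal sketch; `monitoring_details` is a hypothetical helper, and the wording is the example message above, not required copy.

```python
def monitoring_details(minutes: int = 30) -> str:
    """Return the example monitoring message with the monitoring window filled in.

    Add incident-specific information and the production issue link yourself.
    """
    if minutes < 1:
        raise ValueError("minutes must be a positive integer")
    return (
        "While all systems are online and fully operational, out of an abundance of "
        "caution we'll leave affected components marked as degraded as we monitor. "
        f"If there are no recurrences in the next {minutes} minutes, we'll resolve "
        "this incident and mark all components as fully operational."
    )


if __name__ == "__main__":
    # The Incident Manager asked for a 60-minute monitoring period.
    print(monitoring_details(60))
```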

2. Resolve Incident

Once we're confident that systems have returned to normal operation, the Incident Manager has given the all-clear, and we've completed a monitoring period (if one was used), mark the incident as resolved.

Once these conditions are met, make an update to the incident and change the following fields.

  1. Current State - Change to Resolved.
  2. Details - State that the issue has been resolved and that systems have returned to normal operation. Include a link to the incident issue even if you've already done so in previous updates, so that users who missed them know where to go for more information.
  3. Incident Status - Change to Operational. IMPORTANT: Make sure the "Apply status level to all affected infrastructure" box is checked.
  4. Double check the status page to make sure everything looks good.

Before resolving the incident your draft should look similar to the following:

Resolve incident
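Two rules in this workflow point the same way: a new incident is never Operational, and the status isn't flipped back to Operational until the incident leaves Monitoring and is Resolved. A hypothetical pre-publish check for that pairing is sketched below; the function name is illustrative, and any non-Operational status string (such as the "Degraded" placeholder in the tests) stands in for whatever status the tool offers.

```python
from typing import List

# Current State values used in this workflow, in the order an incident usually
# moves through them (Monitoring is sometimes skipped).
STATES = ("Investigating", "Identified", "Monitoring", "Resolved")


def check_update(state: str, component_status: str) -> List[str]:
    """Return problems with a draft update; an empty list means none were found.

    Encodes one rule from this workflow: the affected component's status is
    Operational exactly when the Current State is Resolved.
    """
    problems: List[str] = []
    if state not in STATES:
        problems.append(f"unknown state: {state!r}")
    elif state == "Resolved" and component_status != "Operational":
        problems.append("a Resolved incident should set the status to Operational")
    elif state != "Resolved" and component_status == "Operational":
        problems.append("keep a non-Operational status until the incident is Resolved")
    return problems


if __name__ == "__main__":
    print(check_update("Monitoring", "Operational"))
```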

3. Notify E-Group of Resolution

After the incident has been resolved on the status page, edit the Slack message you sent to #e-group and provide a final update that the incident has been resolved. If you're resolving an incident that was created by another CMOC, post this message in a thread instead and react to the post with the :white_check_mark: emoji.

4. Add Post-Mortem

Production engineering conducts a review for every incident that meets certain criteria. The status page tool lets us add a link to a post-mortem after an incident has been resolved, which is then viewable on our status page for that specific incident.

Do the following to add a post-mortem to a resolved incident:

  1. From the dashboard click the Incidents button.

    Active incident dashboard icon

  2. Scroll down and click on the title of the incident.

    Incident history list

  3. Click Add Post-Mortem and supply the link to the issue being used for the incident review. This is usually the same issue that was opened for the incident.

    Add post-mortem link

Setting Up Maintenance Events

Infrastructure occasionally plans scheduled maintenance events, some of which directly impact users. New maintenance events are announced as issues in the gl-infra/production issue tracker, created using the issue template and accompanied by the Scheduled Maintenance label.

If a maintenance will affect users, Infrastructure can request that it be visible on our status page and, if required, that the CMOC actively provide status updates during the maintenance window. In these cases, Infrastructure applies the CMOC Required label to the issue, which sends a notification mentioning the on-call CMOC to the #support_gitlab-com channel. Once this notification is received, the CMOC uses the details in the issue to create the maintenance event on the status page.

To create a new maintenance event, click New Maintenance from the dashboard.

New Maintenance

The contents of the maintenance should be filled out according to the details provided in the maintenance issue. Once complete, it might look something like the following.

Maintenance Details

Rescheduling a Maintenance Event

If you need to reschedule a maintenance window, go to the Maintenances tab in the status page tool.

Maintenance Tab

Select the maintenance you need to reschedule.

Maintenance selected

Click the Reschedule Maintenance button and enter the new schedule time, making sure the timezone details are correct. Then click Save.
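If the maintenance issue gives a time in one timezone and you want to double-check the equivalent UTC time before saving, a small conversion helps. A minimal sketch using Python's standard library; `to_utc` is a hypothetical helper, and it assumes you know which timezone the original time was written in.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def to_utc(local_time: datetime, tz: tzinfo) -> datetime:
    """Interpret a naive wall-clock time in `tz` and return the same instant in UTC."""
    if local_time.tzinfo is not None:
        raise ValueError("expected a naive datetime; it already has a timezone")
    return local_time.replace(tzinfo=tz).astimezone(timezone.utc)


if __name__ == "__main__":
    # A fixed UTC-4 offset, used for illustration only.
    print(to_utc(datetime(2024, 7, 1, 9, 0), timezone(timedelta(hours=-4))))
```

Fixed offsets like the one above don't adjust for daylight saving time. For a named zone, pass `zoneinfo.ZoneInfo("America/New_York")` instead (Python 3.9+, which needs the system or `tzdata` timezone database).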

Sending Updates About Maintenance Events

To send an update about a maintenance event, such as a reminder, go to the Maintenances tab in the status page tool and select the maintenance that needs an update. On the maintenance's information page, note whether automatic email reminders are set to go out. If they are, don't send an email broadcast with your update, so that subscribers don't receive duplicate reminders. When you're ready to update, click the Post Update Without Starting button.

Post Update Without Starting

Enter the update details provided by the Infrastructure team and have them confirm the appropriate broadcast channels before proceeding to send the update. If "Send Reminders" was enabled in the maintenance information page, be sure not to check "Notify email subscribers" in the broadcast settings.

Broadcast Maintenance Update

Once the GitLab Status Twitter account has posted about the maintenance schedule, send a link to the tweet to the #social_media_action Slack channel to let the social team know that you'd like it amplified on our GitLab brand Twitter account. Request this only once per scheduled maintenance, preferably mid-week before the maintenance takes place.

Handover Procedure

At the end of each on-call shift, it's necessary to inform the next CMOC of any relevant activity that occurred during the shift or is still ongoing. To perform a handover, create an issue in the CMOC Handover issue tracker using the Handover issue template. Create the handover issue even if nothing happened during your shift; signaling that everything is fine is also useful information. Remember that because we work in the open by default, the CMOC Handover issue tracker is public. Make a handover issue confidential if it must contain any sensitive information.

If handover occurs during an active incident and the quick summary in the handover issue isn't enough to prepare the incoming CMOC, start a quick Zoom call with them in the #support_gitlab-com Slack channel. Slash commands such as the following can speed up setting up the meeting.

/zoom meeting CMOC Handover Briefing

In case there are any uncertainties around the status of an incident, please contact the Incident Manager for clarification.

CMOC Shadow PagerDuty Schedule

The CMOC Shadow Schedule can be used by people who wish to shadow the CMOC as a learning process before acting as CMOC. A soon-to-be CMOC can adjust the schedule to match their working hours by clicking Edit this schedule > Add Another Layer, then adding their username and the days and hours they wish to shadow.

CMOC Training Videos

It is recommended to watch this video on how to perform CMOC duties effectively: CMOC training video
