GitLab Support On-Call Guide


For customers with Priority Support, the Support Engineering Team is on-call and available to assist with emergencies. What constitutes an emergency is defined in our definitions of support impact.

We take on-call seriously. There are escalation policies in place so that if a first responder does not respond fast enough, other team members are alerted. Such policies aren't expected to ever be triggered, but they cover extreme and unforeseeable circumstances.

Expectations for on-call

Be alert and available

When you are on call you are expected to be available and ready to respond to PagerDuty pings as soon as possible, but certainly within the emergency response time set by our Service Level Agreements.

If you have plans away from your workspace while on call, being available may require bringing a laptop and a reliable internet connection with you.

You should not be chained to your desk, but you should be equipped to acknowledge and act on PagerDuty (PD) alerts in a timely manner.

Be proactive in communicating your availability. Sometimes you can't be immediately available for every minute of your on-call shift. If you expect to be unavailable for a short period of time, send an FYI in Slack.

Communicate

When you get an alert, you should immediately start a Slack thread and take notes in it. If the customer has a technical account manager, tag them; something like "cc @user" is good enough. This creates visibility around the situation and opens the door for the team to join in.

Good notes in Slack will help people follow along and will help you with your follow-up email after the call.

Try to communicate complete ideas rather than snippets of thought. Something like "that's not good" as a response to something happening within the call isn't as helpful as "their Gitaly timings are really high".

Take and share screenshots of useful info the customer is showing you. Make sure you're not sharing anything sensitive. Let the customer know that you're doing that. "Could you pause there? I'm gonna screenshot this little bit and share it with my team".

Ask for help when needed

Rest assured that escalation is okay, and that other GitLabbers are happy to help. The care of the customers is a shared responsibility. Tag the support team if you haven't started getting help in your Slack thread. Tag the support managers if you need to escalate further.

If another support engineer joins your emergency call, feel free to assign them a role to divide up the labor.

For example: "So-and-so, would you please (take notes, reach out to this product team and ask for help, look up the code for this and see what you can find)?"

Take care of yourself

Make an effort to actively de-stress during your on-call shift. After being on-call you should consider taking time off, as noted in the main handbook. Being available for issues and outages will wear on you even if there were no pages. Resting is critical for proper functioning. Just let your team know.

When you're in a call, don't feel too much pressure to have immediate answers. You're allowed to pause for a few minutes to research, ask for help, etc. A five-minute reply is still much better than waiting for SLA email replies. Make sure to communicate and let the customer know what you're doing. Example: "I'm gonna take a few minutes to work through the code here and make sense of it".

How it works

Schedule

We do 7 days of 8-hour shifts in a follow-the-sun style, based on your location.

You can view the schedule and the escalation policy on PagerDuty. You can also opt to subscribe to your on-call schedule, which is updated daily.

Swapping On-Call Duty

To swap on-call duty with a fellow support engineer:

Starting on-call

Please double-check that your alerts are working. You can send a test page to make sure that you're being alerted appropriately.

When your on-call shift starts, you should receive one or more notifications that your shift is starting.

PagerDuty Alerts

  1. When an emergency is triggered, you'll receive an alert from PD. This will probably be a text, phone call, email, Slack message, and/or boot to the head (depending on your notification preferences).
  2. Acknowledge the alert in PagerDuty or Slack. This means that you are handling the emergency and are starting to look into it. After 10 minutes, if the alert has not been acknowledged, everyone on the customer on-call rotation is alerted. After a further 5 minutes, management is alerted.
  3. Compare the customer's problem statement with our definitions of support impact. If you don't think that this is an actual emergency, then ask for a second opinion and kindly let the customer know if that's the decision we make.
  4. If the situation is actually an emergency, then please reply to the customer and offer a Zoom call.
  5. Click "resolve" after you've replied to the customer. This means that you are actively handling the emergency now and will see it through.
  6. Start a thread in #support_self-managed with the ticket link. "Thread for emergency ticket LINK HERE".

  7. After 30 minutes, if the customer has not responded to our initial contact with them, let them know that the emergency ticket will be closed and that you are opening a normal priority ticket on their behalf. Also let them know that they are welcome to open a new emergency ticket if necessary.
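
The timings in the steps above can be summarized in a small sketch. This is only an illustration of the thresholds described on this page (10 minutes, a further 5 minutes, and 30 minutes), not PagerDuty configuration or any PagerDuty API. The function and constant names are hypothetical, and treating each threshold as inclusive ("at or after") is an assumption.

```python
from datetime import timedelta

# Thresholds taken from the steps above; names are hypothetical.
ROTATION_ALERTED_AFTER = timedelta(minutes=10)
# Management is alerted a further 5 minutes after the rotation.
MANAGEMENT_ALERTED_AFTER = ROTATION_ALERTED_AFTER + timedelta(minutes=5)
NO_RESPONSE_CLOSE_AFTER = timedelta(minutes=30)


def alerted_parties(unacknowledged_for):
    """Return who has been alerted for an alert left unacknowledged this long."""
    parties = ["on-call engineer"]
    if unacknowledged_for >= ROTATION_ALERTED_AFTER:
        parties.append("customer on-call rotation")
    if unacknowledged_for >= MANAGEMENT_ALERTED_AFTER:
        parties.append("management")
    return parties


def should_move_to_normal_priority(since_initial_contact, customer_replied):
    """True when the customer has been silent long enough to close the emergency ticket."""
    return (not customer_replied) and since_initial_contact >= NO_RESPONSE_CLOSE_AFTER
```

For instance, `alerted_parties(timedelta(minutes=12))` includes the customer on-call rotation but not yet management, since management is only alerted 15 minutes after the original page.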