The Support Manager On-call helps coordinate responses to urgent and important situations that arise within the scope of delivering a quality experience to GitLab customers.
The Support Manager On-call is one of the rotations that make up GitLab Support On-call.
Support Managers serve in this rotation. The Support Manager on-call is generally responsible for:
Monitor the Needs Org / FRT (first response time) view. If tickets are building up, ping your region in #support_team-chat to remind people to take new tickets. See Meeting our FRT SLA for details of how we meet our FRT SLA.
The Support Engineer on-call is the first responder for customer emergencies. Managers support this work as follows:
Support escalations are handled by the Support Manager on-call.
Your responsibilities are as follows:
Monitor new escalation requests in #support_escalations.
You can use Support Team Skills by Subject to find appropriate engineers to assign.
A very high percentage of support escalations involve licenses and renewals. For guidance in handling these escalations, please see the Plan/License Escalations Workflow page.
NOTE: GitLab team members may attempt to draw attention to tickets in regular support Slack channels (#support_self-managed, #support_gitlab-com, #spt_managers). Any such attempt constitutes an escalation. Redirect the team member by responding to their post with only the :escalate: emoji, which sends an automated and anonymous reply describing the correct process.
NOTE: There are two other distinct situations, not discussed on this page:
Some steps of escalation management are handled by bots and auto-responders. The text **BOT** is used below to show these steps.
1. New escalation requests are posted in #support_escalations, with an @mention of the current on-call Support Manager's name.
2. Add the :eyes: emoji to acknowledge that you are looking at the escalation.
3. Apply the Support::Managers::Escalated Ticket macro to crosslink the escalation issue and the discussion thread, and to tag the ticket as escalated.
4. If technical discussion is needed outside the ticket, start a thread in #support_gitlab-com, #support_self-managed or #support_licensing-subscription. Then return to the thread in #support_escalations and comment that all technical ticket-related discussion is happening in the ticket (or in the new thread). This keeps all technical discussion in one channel or thread.
Sometimes an escalation request does not meet the threshold for escalation. In that case, return to the thread in #support_escalations and notify the person who initiated the escalation.
An escalation is considered resolved when the correct next step is identified and underway; the Zendesk ticket does not need to be Solved or Closed.
When an escalation is resolved:
1. Add the :green-check-mark: emoji to the escalation notification in #support_escalations.
2. Apply the appropriate escalation label, for example:
   - ~Escalation::License-Issue: the core issue revolves around licensing or subscriptions.
   - ~Escalation::Response-Time: the purpose of the escalation is to expedite a response to an issue or case.
When GitLab experiences a security incident, the Support Manager on-call is responsible for triaging and responding to customer communications stemming from the incident. This may include involving the CMOC.
Upgrade assistance requests are currently triaged by engineers as part of the Working on Tickets workflow, but in some cases the triaging engineer(s) may need assistance from Support management.
If you will be unable to handle on-call for a few hours on a weekday because you are on a customer call or otherwise engaged, arrange for another manager to cover on-call responsibilities temporarily:
Ask in #spt_managers for any manager to volunteer to cover.
To swap your on-call duty with someone, follow the steps listed under Swapping on-call duty.
At times, you may receive an escalation where the customer is reporting a situation that qualifies for emergency support under our definitions of support impact. In such cases you may elect to trigger an emergency directly, rather than asking the customer to open a new emergency ticket.
You can trigger a PagerDuty notification by using the Support::Managers::Trigger manual emergency macro in Zendesk.
Alternatively, you can manually trigger a PagerDuty notification through PagerDuty itself.
Log in to gitlab.pagerduty.com and select + New Incident from the upper right corner. Then fill out the form as follows:
No other fields need to be filled out, so you can then click Create Incident.
Special handling notes are documented on the customer emergencies on-call workflow. As a Support Manager, you are empowered to handle these (and other) unique situations according to your judgment. If you need help or advice, don't hesitate to escalate to unblock.
We advise Support Engineers to contact a Support Manager before offering a call in the case of a compromised instance.
Support's role in these cases is to help the customer get to a good, known working state as quickly as possible. The fastest route is usually to restore to a previously known good state (most often by restoring from a backup). However, customers with a compromised instance will have other concerns as well, and are likely to be in a heightened emotional state:
If moving towards a call is the right thing to do, consider joining the call before (or instead of) the engineer to communicate the scope of what can be accomplished.
Example framework for a call we establish (or a bridge call the customer is leading):
Hi [customer]. Based on the ticket, it sounds likely that your instance is compromised. In cases like these, we've prepared a set of best practices (GitLab internal link) to help you get back up and running as quickly as possible. We're here to support and advise where GitLab is concerned. Unfortunately, GitLab cannot provide a one-size-fits-all solution or comprehensive checklist to completely secure a server that has been compromised. GitLab recommends following your organization's established incident response plan whenever possible.
The first step is to shut down the instance, create a new one at the same version, and restore your most recent backup. This ensures you are operating on a "clean" environment, where you have confidence that all the software installed is unmodified. Please get that process started; we are monitoring this ticket with HIGH priority. If you have any problems getting set up or restoring, please let us know in the ticket immediately.
After your new instance is set up, you need to upgrade to a more recent version of GitLab before you expose this server to the public Internet. If you have any trouble with the upgrade process, let us know in the ticket immediately.
Finally, as described in the recovery guide previously sent (this should have been shared in the ticket via the Compromised Instance Zendesk macro), you should audit the integrity of GitLab itself: check for any users, code, webhooks, runners, or other settings that you did not enable yourselves. If you have any questions, please let us know in the ticket.
I'm going to leave the call now, but rest assured that we're on standby and ready to help as you work through this.