The Support Manager On-call helps coordinate responses to urgent and important situations that arise within the scope of delivering a quality experience to GitLab customers.
The Support Manager On-call is one of the rotations that make up GitLab Support On-call. The Support Manager on-call is generally responsible for:
Note: You (or the CMOC/CEOC) may sometimes be required to contact GitLab users on behalf of another GitLab team (such as the SIRT team). Please follow the Sending Notices workflow to action these requests.
The Support Engineer on-call is the first responder for customer emergencies. Managers support this work as follows:
At times, an emergency page may come in for a situation that is not yet an emergency but may quickly become one. In this situation, we want to help the customer prevent it from becoming an emergency. If this arises during working hours, the Support Engineer on-call may reach out for assistance; the on-call manager should respond by finding additional staff to handle the request as a high-priority ticket that requires an immediate response. If this arises on a weekend, the Support Engineer on-call will reach out to the manager on-call if they are already handling another emergency. In that case, the manager on-call should assist or attempt to find additional staff to assist.
See more examples of situations that might be emergencies and situations that are not emergencies.
During FY23Q4-FY24Q1, the APAC region will be trialing a pool of backup engineers who are available to be contacted during weekend on-call hours in the event that a concurrent emergency occurs.
If you are the Support Manager on-call and a concurrent emergency occurs, the Support Engineer On-call will page you via PagerDuty. You are then responsible for assessing the current situation and determining whether the backup engineers need to be paged; if so, manually page them. Doing so pings all of the backup engineers, but only one needs to acknowledge the page and lend assistance, and there is no expectation that backup engineers will be available to respond to a page.
To page the backup pool, you can:

- use the `/pd trigger` command in any Slack channel to create a new incident to notify the current list of support engineers; or
- select **+ New Incident** directly in PagerDuty.

When prompted, update:
For further details, please refer to STM#4583.
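Beyond the Slack command and the PagerDuty form, a page could also be triggered programmatically. The sketch below uses PagerDuty's Events API v2; the routing key is a placeholder for the backup pool service's integration key, which is an assumption here.

```python
# Minimal sketch: trigger a PagerDuty alert for the backup pool via the
# Events API v2. The routing key below is a placeholder; the real key
# comes from the service's Events API v2 integration in PagerDuty.
import requests

EVENTS_API = "https://events.pagerduty.com/v2/enqueue"
ROUTING_KEY = "REPLACE_WITH_BACKUP_POOL_INTEGRATION_KEY"  # assumption


def page_backup_pool(summary: str) -> str:
    """Trigger an alert and return the dedup key PagerDuty assigns."""
    payload = {
        "routing_key": ROUTING_KEY,
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": "support-manager-oncall",
            "severity": "critical",
        },
    }
    response = requests.post(EVENTS_API, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()["dedup_key"]


if __name__ == "__main__":
    print(page_backup_pool("Concurrent customer emergency: backup engineer needed"))
```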
STARs (Support Ticket Attention Requests) are handled by the Support Manager on-call.
Your responsibilities are as follows:

- Monitor the #support_ticket-attention-requests Slack channel.
- You can use Support Team Skills by Subject to find appropriate engineers to assign.
A very high percentage of starred tickets involve licenses and renewals. For guidance in handling these, please see the Workflow for handling Plan/License Ticket Attention Requests.
NOTE: GitLab team members may attempt to draw attention to tickets in regular support Slack channels (#support_self-managed, #support_gitlab-com, #spt_managers). Redirect the team member by responding to their post with only the :escalate: emoji, which will send an automated and anonymous reply describing the correct process.
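For illustration, an auto-responder with this behavior could be built on Slack's reaction_added event, as in the sketch below. This is not the actual bot's source; the tokens and reply wording are assumptions.

```python
# Sketch of an :escalate: auto-responder using Slack Bolt for Python.
# Not the actual bot's implementation; tokens and reply text are assumptions.
import os

from slack_bolt import App

app = App(token=os.environ["SLACK_BOT_TOKEN"],
          signing_secret=os.environ["SLACK_SIGNING_SECRET"])

REPLY = ("Ticket attention requests are handled in "
         "#support_ticket-attention-requests. Please raise a STAR there "
         "so the on-call Support Manager sees it.")  # hypothetical wording


@app.event("reaction_added")
def redirect_on_escalate(event, client):
    # Only react to the :escalate: emoji.
    if event["reaction"] != "escalate":
        return
    item = event["item"]
    # Reply in-thread under the message that received the reaction.
    client.chat_postMessage(
        channel=item["channel"],
        thread_ts=item["ts"],
        text=REPLY,
    )


if __name__ == "__main__":
    app.start(port=3000)
```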
NOTE: There are two other distinct situations, not discussed on this page:
Some steps of STAR treatment are handled by bots and auto-responders. The text **BOT** is used below to show these steps.
1. **BOT** A notification is posted in #support_ticket-attention-requests, with an @mention of the current on-call Support Manager's name.
2. Add the :eyes: emoji to the Slack thread to acknowledge you are looking at the STAR.
3. Apply the Support::Managers::Escalated Ticket macro to crosslink the STAR Issue and discussion thread, and tag the ticket.
4. If technical discussion is needed, start a thread in #support_gitlab-com, #support_self-managed, or #support_licensing-subscription. Then return to the thread in #support_ticket-attention-requests and comment that all technical discussion is happening in the ticket (or in the new thread). This helps ensure all technical discussion stays in one channel/thread.
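For illustration only, the **BOT** notification in step 1 might be produced by something like the following sketch; it assumes `slack_sdk` and uses placeholder channel, user, and ticket values rather than the bot's actual implementation.

```python
# Sketch: post a STAR notification that @mentions the on-call Support
# Manager. Channel, user ID, and ticket URL are placeholders.
import os

from slack_sdk import WebClient

client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])

STAR_CHANNEL = "C0123456789"    # placeholder: #support_ticket-attention-requests
ONCALL_MANAGER = "U0123456789"  # placeholder: resolved from the on-call schedule


def post_star_notification(ticket_url: str) -> None:
    # Slack renders <@USER_ID> as an @mention of that user.
    client.chat_postMessage(
        channel=STAR_CHANNEL,
        text=f"<@{ONCALL_MANAGER}> new STAR raised: {ticket_url}",
    )


post_star_notification("https://example.zendesk.com/agent/tickets/123456")
```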
There are times when a STAR does not meet the threshold for additional attention; see the main STAR page for details. In such situations, return to the thread in #support_ticket-attention-requests and notify the initiator.
A STAR is considered resolved when the correct next step is identified and underway; it does not require the Zendesk ticket to be Solved or Closed.
When a STAR is resolved:

1. Add the :green-check-mark: emoji to the notification in #support_ticket-attention-requests.
2. Apply the appropriate escalation label to the STAR issue:
   - ~Escalation::License-Issue: identifies that the core issue at hand revolves around licensing / subscriptions
   - ~Escalation::Response-Time: useful when the purpose of the request is to expedite a response to an issue or case
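Purely as an illustration of these two close-out steps (the process itself is manual), a sketch along these lines could automate them, assuming `slack_sdk` and `python-gitlab` with placeholder identifiers:

```python
# Sketch: mark a STAR resolved by adding the :green-check-mark: reaction
# and labeling the STAR issue. All IDs below are placeholders.
import os

import gitlab
from slack_sdk import WebClient

slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
gl = gitlab.Gitlab("https://gitlab.com", private_token=os.environ["GITLAB_TOKEN"])


def resolve_star(channel_id: str, message_ts: str,
                 project_path: str, issue_iid: int) -> None:
    # Step 1: acknowledge resolution on the Slack notification.
    slack.reactions_add(channel=channel_id,
                        timestamp=message_ts,
                        name="green-check-mark")
    # Step 2: apply an escalation label to the STAR issue.
    issue = gl.projects.get(project_path).issues.get(issue_iid)
    issue.labels = list(issue.labels) + ["Escalation::Response-Time"]
    issue.save()
```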
Mid-ticket feedback link – each Public Comment from a GitLab Support Engineer or Manager has a link to a form where a customer can provide feedback or request contact from a manager while the ticket is open (introduced in issue 2913). This feedback form creates issues in the customer feedback project, with a subject format of Positive/Negative/Neutral feedback for ticket nnnnnn, and is automatically assigned to the SSAT reviewing manager. If the feedback is negative, there is an option to request manager contact (within 48 hours, Mon-Fri). If this option is chosen, a Slack notification is sent to the #support_ticket-attention-requests channel. The following action should be taken promptly:
When GitLab experiences a security incident, the Support Manager on-call is responsible for triaging and responding to customer communications stemming from the security incident. This may include involving the CMOC.
Upgrade assistance requests are currently triaged by engineers as part of the Working on Tickets workflow, but in some cases the triaging agent(s) may need assistance from Support management.
If you will be unable to handle on-call for a few hours on a weekday due to being engaged in a customer call or otherwise, arrange for another manager to handle on-call responsibilities temporarily:
- Ask in #spt_managers for any manager to volunteer to cover.

To swap your on-call duty with someone, follow the steps listed under Swapping on-call duty.
At times, you may receive an escalation where the customer is reporting a situation that qualifies for emergency support under our definitions of support impact. In such cases you may elect to trigger an emergency directly, rather than asking the customer to open a new emergency ticket.
You can trigger a PagerDuty notification by using the `Support::Managers::Trigger manual emergency` macro in Zendesk.
Alternatively, you can manually trigger a PagerDuty notification through PagerDuty itself.
Log in to gitlab.pagerduty.com and select **+ New Incident** in the upper right corner. Then fill out the form as follows:
No other fields need to be filled out, so you can then click **Create Incident**.
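If the web form is unavailable, the equivalent incident can in principle be created through PagerDuty's REST API. Below is a minimal sketch; the API token, `From` address, and service ID are placeholders.

```python
# Sketch: create a PagerDuty incident via the REST API, the programmatic
# equivalent of the "+ New Incident" form. Token, From address, and
# service ID are placeholders.
import os

import requests


def create_incident(title: str) -> dict:
    response = requests.post(
        "https://api.pagerduty.com/incidents",
        headers={
            "Authorization": f"Token token={os.environ['PAGERDUTY_API_TOKEN']}",
            "From": "manager@example.com",  # placeholder: must be a valid PagerDuty user
            "Content-Type": "application/json",
        },
        json={
            "incident": {
                "type": "incident",
                "title": title,
                "service": {"id": "PXXXXXX", "type": "service_reference"},
                "urgency": "high",
            }
        },
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["incident"]
```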
Special handling notes are documented on the customer emergencies on-call workflow. As a Support Manager, you are empowered to handle these (and other) unique situations according to your judgment. If you need help or advice, don't hesitate to escalate to unblock.
We advise Support Engineers to contact a Support Manager before offering a call in the case of a compromised instance.
Support's role in these cases is to help the customer get back to a known-good working state as quickly as possible. The fastest route is usually to restore to a previously known good state (most often by restoring from a backup). Customers with an instance in this state will have other concerns, though, and are likely to be in a heightened emotional state:
If moving towards a call is the right thing to do, consider joining the call before (or instead of) the engineer to communicate the scope of what can be accomplished.
Example framework for a call we establish (or a bridge call the customer is leading):
Hi *customer*. Based on the ticket it sounds likely that your instance is compromised. In cases like these we've prepared a set of best practices (GitLab internal link) to help you get back up and running as quickly as possible. We're here to support and advise where GitLab is concerned. Unfortunately, GitLab cannot provide a one-size-fits-all solution or comprehensive checklist to completely secure a server that has been compromised. GitLab recommends following your organization's established incident response plan whenever possible.
The first step is to shut down the instance, create a new one at the same version, and restore your most recent backup. This ensures you are operating on a "clean" environment, where you have confidence that all the software installed is unmodified. Please get that process started; we are monitoring this ticket with HIGH priority. If you have any problems getting set up or restoring, please let us know in the ticket immediately.
After your new instance is set up, you need to upgrade to a more recent version of GitLab before you expose this server to the public Internet. If you have any trouble with the upgrade process, let us know in the ticket immediately.
Finally, as described in the recovery guide previously sent (it should have been shared in the ticket via the Compromised Instance Zendesk macro), you should do an audit of the integrity of GitLab itself: checking for any users, code, webhooks, runners, or other settings that you did not enable yourselves. If you have any questions, please let us know in the ticket.
I'm going to leave the call now, but rest assured that we're on standby and ready to help as you work through this.
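As a starting point for the audit described in the script above, an administrator could enumerate users, runners, and webhooks with something like the following sketch. It assumes `python-gitlab` and an admin-scoped token, and is not a complete integrity check.

```python
# Sketch: enumerate objects worth reviewing after a compromise.
# Assumes python-gitlab and an admin-scoped token; this is a starting
# point, not a complete integrity audit.
import os

import gitlab

gl = gitlab.Gitlab("https://gitlab.example.com",  # placeholder instance URL
                   private_token=os.environ["GITLAB_ADMIN_TOKEN"])

# Users: look for accounts you did not create, especially admins.
for user in gl.users.list(iterator=True):
    print(f"user: {user.username} admin={user.is_admin} created={user.created_at}")

# Runners: look for runners you did not register.
for runner in gl.runners.all():
    print(f"runner: {runner['id']} {runner['description']}")

# Webhooks: check each project for hooks pointing at unknown hosts.
for project in gl.projects.list(iterator=True):
    for hook in project.hooks.list(get_all=True):
        print(f"hook: {project.path_with_namespace} -> {hook.url}")
```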