To manage the various elements of an Incident effectively (both internal and customer-specific), it is necessary to define the roles and responsibilities of the participants. A clear set of guidelines for managing the incident reduces confusion and improves the effectiveness of communication. This is the first iteration of a high-level Incident Management process for GitLab. It is loosely based on ITIL and ISO 20000, but will not adopt these methodologies in their entirety.
Incident Manager - responsible for ensuring the Incident Team adheres to the Incident Management framework and involves appropriate teams/team members in the Incident Analyst/Group for technical troubleshooting and Service Restoration.
Incident Coordinator - facilitates communication between the Incident Team and the business, and ensures SLAs are met at all stages of the Incident. Also provides updates to the affected customer(s) through the appropriate channel.
Incident Analyst/Group - responsible for technical troubleshooting and returning the service to normal operation as quickly as possible. Initial focus is on Service Restoration, not Root Cause Analysis or Service Improvement.
Executive Sponsor - where necessary, a member of leadership who will engage with customers’ leadership team/point of contact.
Severity 0 - multiple customers experiencing severe or total service degradation, unable to perform basic functionality with no workaround.
Severity 1 - single customer experiencing severe or total service degradation, unable to perform basic functionality with no workaround.
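The two severity definitions above differ only in how many customers are affected, so the decision can be sketched as a small function. This is an illustrative sketch, not part of the process itself: the names `Severity` and `classify_severity` and their parameters are hypothetical, and returning `None` for incidents that match neither definition is an assumption, since no lower severities are defined here.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    """Severity levels as defined in this process (names are illustrative)."""
    SEV0 = 0  # multiple customers, severe or total degradation, no workaround
    SEV1 = 1  # single customer, severe or total degradation, no workaround


def classify_severity(
    affected_customers: int,
    severe_degradation: bool,
    workaround_available: bool,
) -> Optional[Severity]:
    """Map incident facts to a severity level.

    Returns None when the incident matches neither definition
    (assumption: this document defines no lower severity levels).
    """
    if affected_customers < 1:
        return None
    if not severe_degradation or workaround_available:
        return None
    # The only distinction between the two levels is customer count.
    return Severity.SEV0 if affected_customers > 1 else Severity.SEV1
```

For example, `classify_severity(3, True, False)` yields `Severity.SEV0`, while the same conditions for a single customer yield `Severity.SEV1`.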
Internal Executive Summary (to be sent to GitLab Executive team)
Current Status of Incident:
Current Actions being performed:
Link to Slack channel:
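If the summary is generated by tooling rather than written by hand, the three fields above could be filled in as sketched below. This is a minimal sketch under stated assumptions: the function name `render_executive_summary`, its parameters, and the choice to join actions with semicolons are all hypothetical and not prescribed by this process.

```python
from typing import List


def render_executive_summary(
    status: str,
    actions: List[str],
    slack_channel_url: str,
) -> str:
    """Fill in the Internal Executive Summary template as plain text.

    The field labels come from the template above; everything else
    (argument names, the '; ' separator) is an illustrative assumption.
    """
    lines = [
        "Internal Executive Summary",
        f"Current Status of Incident: {status}",
        f"Current Actions being performed: {'; '.join(actions)}",
        f"Link to Slack channel: {slack_channel_url}",
    ]
    return "\n".join(lines)
```

A caller would pass the current status, a list of in-progress actions, and the channel link, then send the returned text to the GitLab Executive team through whatever channel is appropriate.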