The primary goals of writing an Incident Review are to ensure that the incident is documented, that all contributing root causes are well understood, and, especially, that effective preventive actions are put in place to reduce the likelihood and/or impact of recurrence.[1]
Incident reviews are conducted as close to the incident date as possible. Every Incident Review Issue must be assigned a DRI. The DRI will usually be the assignee of the associated incident, but it may be someone else, such as the service owner. Creating corrective actions and infradev issues is one of the goals of the review process. The DRI is responsible for ensuring that issues are created for these actions and linked to the original incident.
Both asynchronous and synchronous reviews can be requested by anyone by following the steps below. If you are uncertain about how to proceed, request help from the assignees of the incident issue.
A review may be conducted synchronously, by adding it to the agenda of the weekly incident review meeting, or asynchronously, through an Incident Review issue. As a general guideline, follow the incident review process for any of the following events:
A review can optionally be conducted for incidents that do not meet the above criteria, but keep in mind that synchronous meetings are demanding of our time and we do our best to embrace asynchronous communication.
For all reviews, it is not necessary to complete all sections. For the sake of expediency, you can complete areas of the review which highlight what you, as the review author, want to bring to the attention of the larger organization and which drive the generation of corrective actions related to the incident.
@here An incident review issue was created for this incident with USER assigned as the DRI. If you have any review feedback, please put it on ISSUE_LINK.
Add the ~review-requested label to the original Incident Issue to schedule a synchronous review. Following this, you should add the incident to the Incident Review agenda document and note the DRI. If you are unsure who the DRI should be, reach out to the assignees of the associated incident. It is important that the person requesting the review also add an explanation of why the review is being requested. This will help guide the DRI and set expectations.
The Incident::Review-Completed label can be added to the incident.
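The labeling step above is normally done in the GitLab UI, but it can also be scripted. The following is a minimal sketch, assuming the GitLab REST API issue endpoint (`PUT /projects/:id/issues/:issue_iid` with the `add_labels` parameter) and a personal access token with API scope. The function names, the `GITLAB_API` base URL, and the assumption that the label's plain name is `review-requested` (without the `~` reference prefix) are illustrative and not taken from this page.

```python
import json
import urllib.parse
import urllib.request

# Assumption: GitLab.com. A self-managed instance would use its own host.
GITLAB_API = "https://gitlab.com/api/v4"


def build_label_request(project_path, issue_iid, token, label="review-requested"):
    """Build (but do not send) a request that adds `label` to an issue."""
    # A project path such as "group/project" must be URL-encoded in the API path.
    project = urllib.parse.quote(project_path, safe="")
    url = f"{GITLAB_API}/projects/{project}/issues/{issue_iid}"
    # add_labels appends to the issue's existing labels instead of replacing them.
    body = json.dumps({"add_labels": label}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
    )


def request_sync_review(project_path, issue_iid, token):
    """Send the request and return the issue's labels as reported by the API."""
    req = build_label_request(project_path, issue_iid, token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["labels"]
```

Calling `request_sync_review("group/project", 42, token)` with a hypothetical project path and issue number would apply the label. Adding the incident to the agenda document and explaining why the review is requested remain manual steps.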
Incident review sessions are open on the GitLab Team Meetings calendar with the title "Incident Review Recurring Sessions" and occur at the following two times:
Incident reviews may require customer engagement through a point of contact such as a Technical Account Manager (TAM). If a customer requires a synchronous meeting to discuss a finding that comes out of the review, the TAM can engage with Infrastructure management to organize the discussion with the relevant stakeholders.
[1] Google SRE, Chapter 15 - Postmortem Culture: Learning from Failure