When you're working on an incident, every second counts. Team members and leadership are looking for updates. Any interruption can make you lose track of where you were. Finding the root cause or working on a code change to resolve the incident requires time and focus. After the incident is resolved, you'll need to summarize what happened at the post-incident review. How can you provide updates and keep track of important events while working on the incident?
Incident timelines with GitLab
GitLab recently launched incident timelines. Incident timelines are the single source of truth (SSoT) for key updates and events that happen during an incident. They typically include things like when the incident was declared, who is actively working on the incident, and other important events during the incident, for example, "Disabling Canary to test a hot fix."
Updating the timeline needs to be quick and efficient. Use GitLab quick actions to add multiple timeline items programmatically.
You can also add any comment from the incident to the timeline by clicking the clock icon. This helps avoid unnecessary shoulder tapping for updates so responders can focus on firefighting.
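As a rough illustration of the quick-action approach, the sketch below builds a comment body containing one `/timeline` line per event. It assumes the quick-action syntax `/timeline <comment> | <YYYY-MM-DD> <HH:MM>`; the helper name `format_timeline_actions` and the sample events are hypothetical, and you should check the syntax against the quick actions documentation for your GitLab version before relying on it.

```python
from datetime import datetime


def format_timeline_actions(events):
    """Return one `/timeline` quick-action line per (comment, when) pair.

    Assumed syntax: /timeline <comment> | <YYYY-MM-DD> <HH:MM>
    """
    lines = []
    for comment, when in events:
        # Format the timestamp as date and time, separated by a space.
        lines.append(f"/timeline {comment} | {when:%Y-%m-%d %H:%M}")
    return "\n".join(lines)


# Hypothetical sample events for illustration only.
events = [
    ("Incident declared", datetime(2022, 9, 7, 9, 30)),
    ("Disabling Canary to test a hot fix", datetime(2022, 9, 7, 9, 42)),
]

comment_body = format_timeline_actions(events)
print(comment_body)
```

The resulting text could then be pasted into a comment on the incident. Because each line carries its own date and time, the same approach may also work for adding events you missed earlier.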
When you're at the end of your on-call shift, you can share the timeline as you hand off the incident to summarize what's happened so far. If you've missed adding something important to the timeline, you can always add the event retroactively and backdate it to the correct time. When you wake up for your next shift, you can review what happened while you were away.
Keeping a record with incident timelines
Once an incident has been resolved, it can be hard to piece together what actually happened. Sometimes, post-incident reviews don't happen until days after you've worked on the incident. Did the incident originate from an alert or was it from a customer email? Did we meet our Service Level Agreement (SLA)? Since you've kept track along the way, incident timelines can be a quick way to refresh your memory on what happened during the incident.
Establishing incident timelines as an SSoT minimizes the time spent on incident "paperwork." This gives you time to focus on resolving the incident. Once the incident is resolved, you can review it with team members to minimize the chance of the same incident occurring again.
The GitLab Infrastructure Team has been dogfooding incident timelines. We'd love to hear how you are constructing and recording what happens during an incident. You can also take a look at Improving the Incident Timeline and help influence what we build next.
“What's happening? Keeping track of incident timelines with @gitlab's Incident Management.” – Alana Bellucci