
Category Vision - Incident Management

Introduction and how you can help

Thanks for visiting this category page on Incident Management in GitLab. This page belongs to the Health group of the Monitor stage, and is maintained by Sarah Waldner who can be contacted directly via email. This vision is a work in progress and everyone can contribute. Sharing your feedback directly on issues and epics at GitLab.com is the best way to contribute to our vision. If you’re a GitLab user and have direct knowledge of your need for incident management, we’d especially love to hear from you.

Overview

Downtime costs companies an average of $5,600/minute, according to Gartner. This number, though an estimate based on a wide range of companies, communicates that downtime is expensive for organizations. This is especially true for those who have not invested in cultivating process and culture around managing these outages and resolving them quickly. The larger an organization becomes, the more distributed its systems and teams tend to be. This distribution leads to longer response times and more money lost for the business. Investing in the right tools and fostering a culture of autonomy, feedback, quality, and automation leads to more time spent innovating and building software and less time spent reacting to outages and racing to restore services. The tools your DevOps teams use to respond during incidents critically affect MTTR (Mean Time To Resolve, also known as Mean Time To Repair) as well as the happiness and morale of team members responsible for the IT services your business depends on. A robust incident management platform consumes inputs from all sources, transforms those inputs into actionable incidents, routes them to the responsible party, and then empowers the response team to quickly understand and remediate the problem at hand.
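
To make the relationship between MTTR and downtime cost concrete, here is a minimal sketch in Python. It assumes MTTR is measured from detection to resolution and applies the Gartner average quoted above as a flat per-minute rate. The incident timestamps and the function names are hypothetical and for illustration only; a real organization's cost per minute will differ.

```python
from datetime import datetime, timedelta

# Gartner average quoted above; treat as a rough estimate, not your own figure.
COST_PER_MINUTE_USD = 5_600


def mean_time_to_resolve(incidents):
    """Average time between detection and resolution across incidents."""
    durations = [resolved - detected for detected, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


def estimated_downtime_cost(incidents, cost_per_minute=COST_PER_MINUTE_USD):
    """Total downtime in minutes multiplied by a flat per-minute cost."""
    total_seconds = sum(
        (resolved - detected).total_seconds() for detected, resolved in incidents
    )
    return total_seconds / 60 * cost_per_minute


# Hypothetical incidents as (detected_at, resolved_at) pairs.
incidents = [
    (datetime(2019, 9, 1, 10, 0), datetime(2019, 9, 1, 10, 30)),  # 30 minutes
    (datetime(2019, 9, 8, 14, 0), datetime(2019, 9, 8, 15, 30)),  # 90 minutes
]

mttr = mean_time_to_resolve(incidents)
cost = estimated_downtime_cost(incidents)
print(mttr)  # 1:00:00
print(cost)  # 672000.0
```

Under these assumptions, two incidents totaling 120 minutes of downtime cost roughly $672,000, which is why reducing MTTR is the central lever for an incident management tool.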

Mission

Our mission is to empower DevOps teams by automating the creation of rich, relevant incidents, enabling collaboration on multiple communication platforms, and supporting continuous improvement via Post Incident Reviews and system recommendations.

Challenges

As we invest R&D in building out Incident Management at GitLab, we are faced with the following challenges:

Opportunities

We are uniquely positioned to take advantage of the following opportunities:

High-level Design

Incidents in GitLab

We are leveraging GitLab's existing Issue features as a base for Incident Management. In its simplest form, an Incident should be the single source of truth (SSOT) for understanding:

Leveraging Existing Features

Incidents will be based on GitLab issues, as mentioned above. This allows us to take advantage of the following features, accelerating how quickly we can get software into the hands of customers for feedback:

Even though we are taking advantage of existing features to launch Incident Management, that does not mean we are not investing in new functionality. Read on to find out what we have planned for the future and what is up next.

Target Audience and Experience

While the Incident Management product category matures through minimal and viable, we are creating an intuitive and streamlined experience for the Operations engineer, DevOps engineer, and Developer. The features we've prioritized are oriented towards getting the right information to the right person so that they can restore the services they are responsible for as quickly as possible. As Incident Management progresses, we will turn our focus towards features that mobilize larger, distributed DevOps teams and eventually features that provide executive management and business stakeholders insight into holistic system health and status updates on current outages.

Strategy

Maturity Plan

We are currently working on maturing Incident Management from minimal to viable, and we are targeting the end of FY20 Q3. Definitions of these maturity levels can be found on GitLab's Maturity page. The following epics group the functionality we have planned to mature Incident Management.

What is Next & Why?

Collaboration with teammates and rich, relevant, and well-organized incidents accelerate the fire-fight by enabling efficient knowledge sharing, providing guidelines for resolution, and minimizing the number of tools you need to check before finding the problem. These are the table stakes of Incident Management and the functionality that will make this product category viable.

Focus Areas

We are immediately focused on the following functionality for maturing Incident Management to viable:

Use Cases

We recognize that the viable version of Incident Management will not work for everyone. We have been strategic in the functionality that we prioritized for this maturity level, targeting customers in our Ultimate tier who currently align with our Single Application for the DevOps Lifecycle vision. Incident Management will be most successful for customers who are:

Beyond Viable

Once we achieve viable for Incident Management, we will be pursuing the following:

What is not planned right now

These features are currently out of scope for Incident Management and are not planned for any maturity levels at this time. This does not exclude them from future considerations.

Competitive Landscape

Atlassian OpsGenie
Splunk VictorOps
PagerDuty
ServiceNow
xMatters

Analyst Landscape

Not yet, but accepting merge requests to this document.

Top Customer Success/Sales Issue(s)

Not yet, but accepting merge requests to this document.

Top Customer Issue(s)

Not yet, but accepting merge requests to this document.

Top Internal Customer Issue(s)

Not yet, but accepting merge requests to this document.

Top Vision Item(s)

Not yet, but accepting merge requests to this document.