
Product Direction - Monitor


This is the product direction for Monitor. If you'd like to discuss this direction directly with the product managers for Monitor, feel free to reach out to Kevin Chu, Group PM of Monitor (GitLab, Email, Zoom call).


Vision

Keeping applications available and performant is table stakes for every business.

Our vision is to make every GitLab project observable by default, with monitoring tools that are easy to operate without specialized expertise. Teams can connect the dots between every deployment, incident, and other noteworthy event by using and collaborating on telemetry data, which ultimately decreases the frequency and severity of production issues.

Market

The Monitor stage directly competes in several markets defined within our Ops Section, including Application Performance Monitoring (APM), Log Management, Infrastructure Monitoring, IT Service Management (ITSM), Digital Experience Management (DEM) and Product Analytics. The total addressable market for the Monitor stage is projected to be $2.7 billion by 2024.

These markets are competitive and innovative, with winning companies achieving spectacular growth as businesses continue to shift online.

Successful vendors, such as market leader Datadog, are leveraging a platform strategy to expand their markets (for example, Datadog's acquisition of Undefined Labs to expand beyond production applications and provide code insights during development, or its expansion into incident management in 2020). Competition among market leaders today is also geared toward making the whole stack observable for enterprises. New Relic's updated business model reflects the need for vendors to capture the growing footprint (and spend) of enterprises while enabling future growth by making a significant part of their offering free.

Focus

The Monitor stage currently consists of two teams. The Observability group is focused on bringing observability, including metrics, tracing, and logging, to market within the GitLab platform. The Response group (Name TBD) is focused on incident management and on contributing to deployment by adding Continuous Verification capabilities to the platform.

We will not actively work on the following categories/capabilities:

Build a complete DevOps platform with monitoring out-of-the-box.

  1. As development shifts to cloud-native and the community invests heavily in tools and patterns, the opportunity to build boring solutions on top of cloud-native solutions plays right to GitLab's strengths.
  2. Instrumentation is commoditized. GitLab does not need to invest in agents, since OpenTelemetry and most vendor agents are open source and designed to work with multiple backends.
  3. Out-of-the-box monitoring capabilities save time and money and lower the expertise required for enterprises and start-ups alike. The ease with which users can start monitoring their services with established vendors, such as Datadog or New Relic, and newer competitors like Honeycomb, is something we should strive to emulate, using open-source tools.
  4. Shift left. Monitoring has traditionally been reserved for production; there are opportunities to shift monitoring tools and techniques left so that developers can benefit from monitoring in development and staging environments.

Challenges

  1. Monitoring vendors offer generous free tiers (e.g. New Relic and Honeycomb) for smaller companies and complete solutions for enterprises.
  2. Market leaders are investing heavily and expanding the scope of their solutions, which makes them stickier with their customers.
  3. Monitoring must meet a high bar to be trusted in production. Running large scale monitoring systems is difficult and will be a big challenge for GitLab.

Newsworthy

On Dec 14, 2021, GitLab announced the acquisition of Opstrace. Opstrace is an observability distribution that will be integrated into GitLab and usable out of the box by all GitLab users. With Opstrace, users get a full observability platform, starting with metrics powered by Prometheus.

GitLab + Opstrace = On by Default Observability

Over the coming months, we will focus on integrating Opstrace into GitLab. Observability will be available by default for both GitLab SaaS and Self-Managed users, starting at the free tier. GitLab’s observability capability is built on a completely open-source platform, so you do not have to worry about vendor lock-in anywhere from instrumentation to alerting.

Organizations often have to choose between using an observability vendor and building their own observability platform. With the former, teams outsource the problem of operating a system that must be scaled and continually updated, but they are locked in to proprietary software. With the latter, teams have to manage the complexity of an observability platform and figure out how to make all the open-source components work together. We want to make that choice easy.

Our approach is different: we won’t reinvent the wheel with yet another observability storage backend. Instead, we will focus on removing the toil of operating open-source observability tools. For example, instead of worrying about how to operate Prometheus at scale, you simply add a couple of lines to your existing Prometheus configuration and we take care of the rest. Instead of staffing a team to keep tooling up to date, you can easily and confidently upgrade with the observability distribution. Furthermore, this observability tool is integrated with the rest of GitLab, so you have a single tool to build, test, collaborate on, deploy, and monitor your applications.

Now that Opstrace is part of GitLab, our first priority is to integrate it into the product. For details on the integration progress, follow this GitLab epic.

Over time, we’ll add more delightful experiences, such as:

  1. Add tracing to the Opstrace/GitLab stack. Tracing helps users understand the flow of requests and is particularly useful when you need to debug microservices-based applications.
  2. Add logging to the Opstrace/GitLab stack. Aggregating logs in a single, searchable interface helps operators find the relevant log message quickly.
  3. Cross-reference different observability data types. Making it easy to go from a metric to the related log messages, or from a long-running trace to the impacted metric, helps teams understand more holistically what is happening with their application.
  4. Enable more collaboration when teams triage and investigate issues using observability data. Until now, using observability tools has mostly been a siloed activity. We want teams to be able to tag each other and easily embed observability data in GitLab issues and MRs, so they can collaborate more effectively without relying on synchronous communication.

Partnerships with Observability vendors

Observability is a cornerstone of a complete DevOps platform. As such, GitLab will include an on-by-default observability solution. In addition, we plan to build a vendor-agnostic continuous capability, enabling and encouraging partners to add their own solutions, thereby expanding customer choice.

What’s next?

  1. The Opstrace Group will work to help all GitLab customers monitor their apps by providing a simple, on-by-default observability stack. We will gauge whether we are trending in the right direction by tracking Opstrace Monthly Active Users (internal link to be added); adoption should grow if we are providing value to our users.
  2. The Respond Group will continue to complete the incident management workflow. With incident management, teams can coordinate during an outage and capture the most important information for sharing, learning, and future improvements. The main product indicator is currently the number of unique users who interact with alerts and incidents.

Deprecation of previous capabilities

Previously, GitLab users could monitor their services and applications by using GitLab to install Prometheus on a GitLab-managed cluster. Similarly, users could install the ELK stack for log aggregation and management. Lastly, users could set up a Jaeger integration to trace their applications.

With the acquisition of Opstrace and the announced deprecation of certificate-based integrations, we are deprecating the features that currently exist in the Metrics, Logging, and Tracing categories, and we plan to schedule them for removal in GitLab 15.0.

Letters from the Editor

Respond Group

TL;DR - For the next several milestones, the Respond Group will focus on dogfooding alerts, finishing Escalating Manually Created Incidents, and completing Incident Timelines. We will not be working on on-call schedules or escalation policies for the next two quarters.

To the GitLab Community and customers,

It's been a busy and eventful year! We released On-Call Schedules, Escalation Policies, and our internal teams dogfooded and adopted GitLab Incidents. We received a lot of feedback from customers and many questions from the community. Thank you for all of your help over the last year!

Our Product Manager and two of our Front-end Engineers made internal transfers to other teams or received well-deserved promotions! We now have a new Product Manager, Alana Bellucci. She has been focusing on building more depth in our alerts and incidents to facilitate dogfooding and user adoption.

We have an opening on our team for a Sr. Frontend Engineer. This team has some interesting, challenging projects on our roadmap. Consider checking out the job posting and applying if you would like to join us!

In FY22, we saw many users accidentally creating Incidents. We originally thought we were seeing increased user adoption of Incidents. After hearing feedback from our users and recognizing that this was an issue, we changed our documentation and the permissions governing who can create Incidents. While we initially saw a decline in users earlier this year, alert and incident adoption has since reached an all-time high!

For FY23, we are working to move the Incident Management category from viable to complete and the On-Call Schedule Management category from minimal to viable. Outlined below are some of our current priorities and the features we would like to complete by the end of FY23. Some of these may change; quarterly priorities are tracked in epics on Monitor's Quarterly Direction Board.

Priorities for feature work:

  1. Incident Timelines
  2. Escalating Manually Created Incidents
  3. Dogfooding Alerts
  4. Related Alerts
  5. Links widget for types of issues

What's coming within the next year?

  1. Routing Rules for Alerts
  2. Scheduled Overrides

If you have any questions, please feel free to comment on any of the issues or epics above.

Thank you for reading!

Alana
