
Category Direction - Metrics

Stage: Monitor
Maturity: Viable

Introduction and how you can help

Thanks for visiting this category strategy page on Metrics in GitLab. This category belongs to and is maintained by the APM group of the Monitor stage.

This strategy is a work in progress, and everyone can contribute.

Background

Metrics help users understand how their applications are performing and whether they are healthy. Examples of common metrics include response metrics like latency and error rate, system metrics like CPU and memory consumption, and any other type of telemetry an application emits.

Actions and insights can then be derived from these metrics, such as setting Service Level Objectives and Error Budgets, or triggering alerts.
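For example, a 99.9% availability SLO over a 30-day window leaves an error budget of roughly 43 minutes of unavailability (0.1% of 43,200 minutes); alerts can fire as incidents consume that budget.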

Target audience and experience

Metrics are an important tool for users across the entire DevOps spectrum, from developers who need to understand the performance impact of the changes they make, to operators who are responsible for keeping production services online.

The target workflow includes a few important use cases:

  1. Configuring GitLab to monitor an application should be as easy as possible. To the degree possible given the environment, we should automate this activity for our users.
  2. Dashboards should automatically populate with the relevant metrics that were detected, while still offering the flexibility to be customized for a specific use case or application (see the sketch after this list). The dashboards themselves should offer the visualizations required to best represent the data.
  3. When troubleshooting, we should offer the ability to easily explore the data, to help users understand potential relationships and create/share one-off dashboards.
  4. Alerts should be easy to create and provide a variety of notification options, including issues and third-party services like Slack. It would also be great to provide automatic detection of outliers/anomalies, along with out-of-the-box alerts based on best practices.
  5. Users should be able to define Service Level Objectives, with a corresponding impact on Error Budgets when they are not met.
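As a concrete illustration of the customization use case in point 2, here is a minimal sketch of a custom metrics dashboard definition checked into a project repository. The file path, panel layout, and PromQL metric names (http_requests_total) are assumptions for a typical instrumented application, not GitLab's built-in queries.

```yaml
# .gitlab/dashboards/example.yml -- hypothetical custom dashboard;
# queries assume typical HTTP instrumentation, not GitLab's defaults.
dashboard: 'Application performance'
panel_groups:
  - group: 'Response metrics'
    panels:
      - title: 'Error rate'
        type: 'line-chart'
        y_label: 'Errors (%)'
        metrics:
          - id: error_rate
            query_range: '100 * sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))'
            label: '5xx errors'
            unit: '%'
      - title: 'Throughput'
        type: 'area-chart'
        y_label: 'Requests / second'
        metrics:
          - id: throughput
            query_range: 'sum(rate(http_requests_total[5m]))'
            label: 'requests'
            unit: 'req/s'
```

Keeping a dashboard definition like this in the repository makes it versioned and reviewable alongside the application code.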

Today, users can quickly deploy a Prometheus instance into a project's cluster. Once deployed, it automatically collects key metrics from the running application (error rate, latency, and throughput). Users who already run Prometheus can instead connect GitLab to their external instance and have its metrics presented on charts within the GitLab UI.
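For teams running their own Prometheus, the three key metrics above map to straightforward PromQL. The following is a minimal sketch of a Prometheus rules file, assuming typical HTTP instrumentation metric names (http_requests_total, http_request_duration_seconds_bucket) rather than GitLab's exact out-of-the-box queries; it also includes an alert of the kind described in the workflow above.

```yaml
# prometheus-rules.yml -- illustrative only; metric names are assumptions.
groups:
  - name: key-application-metrics
    rules:
      # Percentage of requests returning 5xx over the last 5 minutes
      - record: job:http_error_rate:percent
        expr: >
          100 * sum(rate(http_requests_total{status=~"5.."}[5m]))
          / sum(rate(http_requests_total[5m]))
      # 95th percentile request latency, from a histogram
      - record: job:http_latency:p95
        expr: >
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
      # Overall request throughput (requests per second)
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m]))
      # Example alert tied to the error-rate metric
      - alert: HighErrorRate
        expr: job:http_error_rate:percent > 5
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "5xx error rate above 5% for 10 minutes"
```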

What's Next & Why

The APM team's current focus is on dogfooding metrics. The team is leading an initiative to migrate the dashboards our infrastructure team uses to monitor GitLab.com from Grafana to GitLab metrics charts. So far, the team has migrated 20 dashboards, and in doing so has identified both critical and non-critical gaps. Closing those gaps will enable our infrastructure team to start using GitLab metrics charts instead of Grafana, initiating a feedback loop to improve our solution.

Maturity Plan

Competitive Landscape

Datadog and New Relic are the top two competitors in this space.