
Category Direction - Metrics

Metrics

Introduction and how you can help

Thanks for visiting this category strategy page on Metrics in GitLab. This category belongs to and is maintained by the APM group of the Monitor stage.

Please share feedback directly via email, Twitter, or on a video call. If you're a GitLab user and have direct knowledge of your Metrics usage, we'd especially love to hear your use case(s).

What are metrics

Metrics help users understand the health and status of their services and are essential for ensuring the reliability and stability of those services. Metrics are usually raw measurements of resource usage taken over time (for example, memory usage sampled every 10 seconds). Some metrics represent the status of an operating system (CPU or memory usage). Others are tied to the specific functionality of a component (requests per second, latency, or error rates). The most straightforward metrics to begin with are the ones your operating system or platform already exposes, because they are easier to collect (for example, Kubernetes metrics). For other components, especially your own applications, you may have to add code or interfaces to expose the metrics you care about. Exposing metrics is sometimes called instrumentation, and collecting metrics from an endpoint is called scraping.
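The following minimal Python sketch uses only the standard library to illustrate instrumentation and scraping. It assumes the Prometheus text exposition format: `# HELP` and `# TYPE` lines, then one `name{labels} value` line per series. The `Counter` class, the `scrape` function, and the metric names are hypothetical illustrations. They are not the API of a real client library.

```python
from __future__ import annotations

from collections import defaultdict


class Counter:
    """Hypothetical counter an application increments (instrumentation)."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        # One running total per label set, keyed by sorted (label, value) pairs.
        self.values: dict = defaultdict(float)

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        self.values[tuple(sorted(labels.items()))] += amount

    def expose(self) -> str:
        """Render all series in the Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for label_set, value in sorted(self.values.items()):
            labels = ",".join(f'{k}="{v}"' for k, v in label_set)
            lines.append(f"{self.name}{{{labels}}} {value}")
        return "\n".join(lines) + "\n"


def scrape(text: str) -> dict[str, float]:
    """Simplified scraper: parse exposition text into {series: value}.

    Skips comment lines and assumes no timestamps or escaped label values.
    """
    samples = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


if __name__ == "__main__":
    requests_total = Counter("http_requests_total", "Total HTTP requests.")
    requests_total.inc(method="GET", code="200")
    requests_total.inc(method="GET", code="200")
    requests_total.inc(method="POST", code="500")
    body = requests_total.expose()  # what an endpoint such as /metrics would serve
    print(body)
    print(scrape(body))  # what a scraper would record at this point in time
```

In practice an application would use an official Prometheus client library rather than hand-rolled rendering. The sketch only shows the split between exposing metrics (instrumentation) and collecting them (scraping).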

Our mission

Provide users with information about the health and performance of their infrastructure, applications, and systems, giving them insight into reliability, stability, and performance.

Target audience

Metrics are essential for all users across the DevOps spectrum, from developers who need to understand the performance impact of the changes they make to operators responsible for keeping production services online.

Our vision is that in 1-2 years, GitLab metrics will be the main day-to-day tool SMBs use to monitor cloud-native applications.

We are targeting Software Developers and DevOps Engineers working in SMBs for the following reasons:

Since the team's current focus is dogfooding metrics, our immediate target audience is the GitLab Infrastructure Team. We plan to build the minimal work needed for them to start dogfooding our metrics dashboard before shifting focus to consider the overall needs of the target audience mentioned above.

Current experience

Today, you can deploy a Prometheus instance into a project's cluster at the push of a button, and the instance runs as a GitLab-managed application. Once deployed, it automatically collects key metrics from the running application and displays them on an out-of-the-box dashboard. Our dashboards give you the flexibility to display any metric you want. You can set up alerts, configure variables on a generic dashboard, drill into the relevant logs to troubleshoot your service, and more. If Prometheus is already running in your cluster, connect it to GitLab and start using the GitLab metrics dashboard.
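Connecting an existing Prometheus instance means a dashboard can then query it. The Python sketch below shows roughly what such an interaction looks like. It does not describe how GitLab itself is implemented. It builds an instant query against the Prometheus HTTP API endpoint `/api/v1/query` and parses a response in the documented `vector` shape. The base URL, the `up{job="app"}` expression, and the canned response are hypothetical, so the sketch runs offline.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Instant-query URL for the Prometheus HTTP API endpoint /api/v1/query."""
    query = urlencode({"query": promql})  # percent-encodes braces, quotes, '='
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def parse_vector(body: str) -> dict[str, float]:
    """Map each series in an instant-vector response to its sample value."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    values = {}
    for series in payload["data"]["result"]:
        labels = ",".join(f"{k}={v}" for k, v in sorted(series["metric"].items()))
        _timestamp, value = series["value"]  # [unix_seconds, "value as string"]
        values[labels] = float(value)
    return values


# Hypothetical canned response in the documented shape, so the sketch runs offline.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "app", "instance": "10.0.0.1:8080"},
             "value": [1700000000.0, "1"]},
        ],
    },
})

if __name__ == "__main__":
    print(build_query_url("http://prometheus.example:9090/", 'up{job="app"}'))
    print(parse_vector(SAMPLE_RESPONSE))
```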

Targeted workflows

The targeted workflows listed below form our high-level roadmap. They are based on competitive analysis, user research, and customer interviews. The details of each workflow are listed in its epics and issues.

Getting data in

The first step in application performance management is collecting the right measurements, or telemetry data. Instrumenting critical areas and reporting metrics from your system are prerequisites for understanding the health and performance of your services and applications. Our metrics solution is powered by Prometheus and targets users of Kubernetes. We need to make sure our users can successfully:

  1. Enable Prometheus for a Kubernetes cluster - Whether you have a preinstalled Prometheus instance on your cluster that is linked to GitLab, or you'd like us to deploy Prometheus for you.
  2. Deploy exporters and instrument apps - We should help you as much as possible with instrumenting your applications and deploying exporters into a cluster, so that you can leverage all of the telemetry data from across your system.
  3. Enable metrics scraping - We need an easy way to enable metrics scraping. When possible, we should strive to do this automatically.
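Opt-in pod annotations are one common way to make scraping close to automatic. The sketch below assumes the widely used `prometheus.io/scrape`, `prometheus.io/port`, and `prometheus.io/path` annotations. These are a community convention implemented through Prometheus relabeling rules, not a Prometheus built-in. The pod dictionaries are simplified, hypothetical stand-ins for Kubernetes API objects. The fallback to the container port and to `/metrics` is also an assumption of this sketch.

```python
from __future__ import annotations

# Annotation keys from a common community convention (assumption, not a Prometheus built-in).
SCRAPE = "prometheus.io/scrape"
PORT = "prometheus.io/port"
PATH = "prometheus.io/path"


def scrape_targets(pods: list[dict]) -> list[str]:
    """Return scrape URLs for pods that opted in via annotations.

    Each pod is a simplified dict (hypothetical shape):
    {"ip": ..., "container_port": ..., "annotations": {...}}.
    """
    targets = []
    for pod in pods:
        annotations = pod.get("annotations", {})
        if annotations.get(SCRAPE) != "true":
            continue  # not opted in, so it is never scraped
        port = annotations.get(PORT, str(pod["container_port"]))
        path = annotations.get(PATH, "/metrics")
        targets.append(f"http://{pod['ip']}:{port}{path}")
    return targets


if __name__ == "__main__":
    pods = [
        {"ip": "10.0.0.1", "container_port": 8080, "annotations": {SCRAPE: "true"}},
        {"ip": "10.0.0.2", "container_port": 3000,
         "annotations": {SCRAPE: "true", PORT: "9102", PATH: "/stats"}},
        {"ip": "10.0.0.3", "container_port": 5432, "annotations": {}},
    ]
    print(scrape_targets(pods))
```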

Supported research and epic

See metric in a dashboard

Once you've collected a set of metrics, the next step is to see those metrics in a dashboard.

  1. Out-of-the-box (OOTB) dashboards - Everyone loves beautiful dashboards, but building one is a challenge. The more useful OOTB dashboards we can provide, the faster users can get started. Established vendors take pride in the number of OOTB dashboards they provide to their customers. We could do the same and leverage the GitLab community with a growing library of dashboards that anyone can contribute to.
  2. Workflow enablement (Dogfooding) - Connecting data across panels and dashboards can help operators detect and diagnose problems. Mature solutions include annotations, templating, drilldowns, and improved visualization in their toolkit. Most of our metrics dogfooding effort is focused on this area today.
  3. Customize dashboard and add dashboards - Adding a metric to a dashboard and adding new dashboards are basic functionalities a monitoring solution should have. Today, this can be quite confusing for a first-time GitLab Metrics user. We intend to make this step easy and discoverable, which will help us increase user adoption.
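Templating lets one generic dashboard serve many pods or environments. A panel stores a query template, and the selected variable value is substituted into it before the query runs. This is a minimal sketch under assumptions. The `{{name}}` placeholder syntax, the `render_query` helper, and the metric name are illustrative and may not match GitLab's exact dashboard syntax.

```python
from __future__ import annotations

import re

# Hypothetical placeholder syntax: {{name}}, optionally with inner spaces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_query(template: str, variables: dict[str, str]) -> str:
    """Substitute dashboard variables into a PromQL query template."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value selected for dashboard variable {name!r}")
        return variables[name]

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    template = 'sum(rate(http_requests_total{pod="{{pod}}"}[5m]))'
    # The same panel re-renders for whichever pod the user picks in a drop-down.
    for pod in ["web-1", "web-2"]:
        print(render_query(template, {"pod": pod}))
```

Failing loudly on an unset variable is a design choice of this sketch. It avoids silently running a query that matches nothing.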

Supported research and epic

Alerting

Users need to be alerted to any threshold violation, ideally before their end users notice it.

  1. Setting up an alert - This should be a straightforward action, supported for all metrics and most chart types.
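A bare threshold check tends to page on momentary spikes. Alerting systems therefore commonly require a breach to persist before firing. Prometheus alerting rules express this with a `for` duration and a pending state. The sketch below mimics that idea by counting consecutive breaching samples. `ThresholdAlert`, its parameters, and the error-rate values are hypothetical, and this is not how GitLab evaluates alerts.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ThresholdAlert:
    """Fires after `for_samples` consecutive samples exceed `threshold`."""

    threshold: float
    for_samples: int
    _breaches: int = field(default=0, init=False)

    def observe(self, value: float) -> str:
        if value > self.threshold:
            self._breaches += 1
            return "firing" if self._breaches >= self.for_samples else "pending"
        self._breaches = 0  # any healthy sample resets the pending window
        return "ok"


if __name__ == "__main__":
    error_rate_alert = ThresholdAlert(threshold=0.05, for_samples=3)
    for error_rate in [0.01, 0.07, 0.08, 0.09, 0.02]:
        print(error_rate, error_rate_alert.observe(error_rate))
```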

Kubernetes monitoring

Our vision is that in 1-2 years, GitLab metrics will be the primary day-to-day tool for monitoring cloud-native applications. To achieve that we would need to support:

  1. Cluster insight - Provide our users with insight into clusters, pods, and nodes.
  2. Key metrics and OOTB dashboard - Curated, scalable dashboards that provide out-of-the-box Kubernetes metrics and a bird's-eye view across environments.
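A bird's-eye view is mostly aggregation. Per-pod series are rolled up along a label such as node or namespace, which PromQL writes as `sum by (node) (...)`. This small sketch shows the same grouping in plain Python. The sample shape, label names, and CPU values are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def sum_by(samples: list[dict], label: str) -> dict[str, float]:
    """Aggregate per-series values by one label, like PromQL `sum by (label)`."""
    totals = defaultdict(float)
    for sample in samples:
        totals[sample["labels"][label]] += sample["value"]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical per-pod CPU usage (cores), as a scrape might return it.
    pod_cpu = [
        {"labels": {"pod": "web-1", "node": "node-a", "namespace": "prod"}, "value": 0.25},
        {"labels": {"pod": "web-2", "node": "node-a", "namespace": "prod"}, "value": 0.5},
        {"labels": {"pod": "db-0", "node": "node-b", "namespace": "data"}, "value": 0.125},
    ]
    print(sum_by(pod_cpu, "node"))       # per-node view
    print(sum_by(pod_cpu, "namespace"))  # per-namespace view
```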

Supported research and epic

Log aggregation

Given the distributed nature of cloud-native applications, it is critical to collect logs across multiple services and infrastructure and present them in an aggregated view. Users can then quickly search through logs that originate from multiple pods and containers. Metrics and logs are related, and we intend to correlate them for users so they can get to the answers they are looking for more quickly. You can review our logging direction page for more information.
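An aggregated log view can be sketched as a k-way merge of per-pod streams that are each already in time order. A filter over the time window of a metrics anomaly then narrows the merged view. This is an illustration under stated assumptions, not the design of GitLab's log aggregation. It assumes ISO 8601 UTC timestamps in one consistent format, so string order equals time order. The pod names and messages are hypothetical.

```python
from __future__ import annotations

import heapq


def aggregate(streams: dict[str, list[tuple[str, str]]]) -> list[tuple[str, str, str]]:
    """Merge per-pod log streams into one time-ordered view of (ts, pod, msg).

    Assumes each stream is already sorted by timestamp and that timestamps are
    ISO 8601 UTC strings in one format, so string order equals time order.
    """
    tagged = ([(ts, pod, msg) for ts, msg in lines] for pod, lines in streams.items())
    return list(heapq.merge(*tagged))


def in_window(entries: list[tuple[str, str, str]], start: str, end: str,
              needle: str = "") -> list[tuple[str, str, str]]:
    """Entries within [start, end] (e.g. around a metric spike) containing needle."""
    return [e for e in entries
            if start <= e[0] <= end and needle.lower() in e[2].lower()]


if __name__ == "__main__":
    streams = {
        "web-1": [("2024-01-01T10:00:00Z", "GET / 200"),
                  ("2024-01-01T10:00:05Z", "GET /api 500 error")],
        "web-2": [("2024-01-01T10:00:02Z", "GET / 200"),
                  ("2024-01-01T10:00:09Z", "timeout error")],
    }
    merged = aggregate(streams)
    for entry in in_window(merged, "2024-01-01T10:00:03Z", "2024-01-01T10:00:10Z", "error"):
        print(entry)
```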

What's Next & Why

Dogfooding metrics

We are actively dogfooding GitLab Metrics with the Infrastructure team, migrating the dashboards used to monitor GitLab.com from Grafana to GitLab Metrics. This gives us rapid feedback as we mature metrics. In terms of overall roles and responsibilities:

We will iterate based on the following process:

  1. Identify key Grafana dashboards to clone into GitLab metrics dashboards - DRI Infra team
  2. Clone Grafana dashboards into a testbed GitLab project as GitLab metrics dashboards - DRI APM team
  3. Use the GitLab metrics dashboards and provide feedback (during a simulation day or after triaging an incident) - DRI Infra team
  4. Prioritize and implement issues based on feedback or find workarounds - DRI APM team
  5. At a point in time agreed on by both teams, the Infra team becomes the DRI for those GitLab metrics dashboards and clones them into its own project
  6. Turn off the original Grafana dashboard - the Infra team is the only DRI for this decision, which is independent and not based solely on implementation issues.

| Grafana dashboard | Testbed | Handbook reference | Status |
| --- | --- | --- | --- |
| General SLAs | https://gitlab.com/gitlab-org/monitor/sandbox/test-metrics-dashboard/-/environments/2115574/metrics?dashboard=.gitlab%252Fdashboards%252Fgeneral-slas.yml | Tools for Engineers SLA Dashboard link | Blocked by #219726 |
| Public Dashboard Landing Page | https://gitlab.com/gitlab-org/monitor/sandbox/test-metrics-dashboard/-/environments/2115574/metrics?dashboard=.gitlab%252Fdashboards%252Fpublic-dashboard-splash-screen.yml | Infrastructure KPI 6 image link | Awaiting Feedback |
| Cloudflare Traffic Overview | https://gitlab.com/gitlab-org/monitor/sandbox/test-metrics-dashboard/-/environments/2115574/metrics?dashboard=.gitlab%252Fdashboards%252Fcloudflare-traffic-overview.yml | N/A | Awaiting Feedback |
| Logging | https://gitlab.com/gitlab-org/monitor/sandbox/test-metrics-dashboard/-/environments/2115574/metrics?dashboard=.gitlab%252Fdashboards%252Flogging.yml | N/A | Awaiting Feedback |

Customize dashboard and add dashboards

As mentioned above, adding a metric to a dashboard and adding new dashboards are basic functionalities a monitoring solution should have. Today, this can be quite confusing for a first-time GitLab Metrics user. We are actively working on improving these workflows to provide a better onboarding experience for our users. Detailed information is available in the following:
