
Category Direction - Metrics


UPDATE: As of 2020-09-01, the Metrics category at GitLab has been deprioritized and we are not actively progressing this category. The vision displayed on this page represents the direction we will pursue if this becomes a priority again in the future.

Introduction and how you can help

Thanks for visiting this category strategy page on Metrics in GitLab. This category belongs to and is maintained by the Health group of the Monitor stage.

This page is maintained by Kevin Chu, group product manager. You can connect with him via Zoom or Email. If you're a GitLab user and have direct knowledge of your Metrics usage, we'd especially love to hear your use case(s).

What are metrics

Metrics help service operators understand the health and status of the provided services; as such, metrics are essential for ensuring the reliability and stability of those services.

In practice, metrics are typically represented by a name, a type, and a measurement. They can tell you simple things like the current CPU usage rate, and they can also represent more complex concepts such as an Apdex score. Users typically gather metrics from their monitored systems into a monitoring system (such as Prometheus) to observe how those systems are behaving. Many metrics are exposed by your operating system or framework (e.g. Kubernetes metrics), and these metrics are typically easy to collect. For other components, such as the applications you develop, you may have to add code or other interfaces (such as an agent running on the same host as your application) to expose the metrics you care about. Exposing metrics is sometimes called instrumentation; the collection of metrics from an endpoint is called scraping.
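To make scraping concrete, here is a minimal sketch of a Prometheus scrape configuration; the job name and target address are hypothetical, not part of any GitLab default.

```yaml
# prometheus.yml -- minimal, hypothetical scrape configuration.
# Prometheus pulls ("scrapes") metrics from each target's metrics
# endpoint at the configured interval.
scrape_configs:
  - job_name: 'my-app'              # hypothetical job name
    scrape_interval: 15s            # how often to scrape each target
    metrics_path: /metrics          # path the instrumented app exposes
    static_configs:
      - targets: ['my-app.example.com:8080']  # hypothetical address
```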

Once you have started collecting metrics from your system, you have the basis to define service level indicators (SLIs) and service level objectives (SLOs). Having SLIs and SLOs is critical for efficient alerting and incident remediation. Defining SLIs and SLOs requires time to understand, correlate, and fine-tune the relationships between the different kinds of metrics you are capturing.
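As a sketch of what an SLI can look like in practice, assuming a standard `http_requests_total` counter with a `status` label, an error-ratio SLI could be precomputed with a Prometheus recording rule (the rule and metric names are hypothetical):

```yaml
# recording-rules.yml -- hypothetical Prometheus recording rule.
# The SLI is the fraction of requests returning a 5xx status over
# the trailing five minutes, computed per job.
groups:
  - name: sli_rules
    rules:
      - record: job:http_request_error_ratio:rate5m   # hypothetical name
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum by (job) (rate(http_requests_total[5m]))
```

An SLO of 99.9% availability then amounts to keeping this ratio below 0.001 over the SLO window.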

Our mission

Provide users with information about the health and performance of their infrastructure, applications, and systems for insights into reliability, stability, and performance.

Target audience

Metrics are essential for all users across the DevOps spectrum, from developers who need to understand the performance impact of the changes they are making to operators responsible for keeping production services online.

Our vision is that in 1-2 years, GitLab Metrics will be the main day-to-day tool for monitoring cloud-native applications for SMBs.

We are targeting Software Developers and DevOps Engineers working in SMBs for the following reasons:

Since the team's current focus is dogfooding metrics, our immediate target audience is the GitLab Infrastructure Team. We plan to build the minimum needed for them to start dogfooding our metrics dashboard before shifting focus to the overall needs of the target audience mentioned above.

Current experience

The experience today lets you deploy a Prometheus instance into a project cluster with the push of a button; the Prometheus instance runs as a GitLab-managed application. Once deployed, it automatically collects key metrics from the running application, which are displayed on an out-of-the-box dashboard. Our dashboards provide you with the flexibility to display any metric you desire: you can set up alerts, configure variables on a generic dashboard, drill into the relevant logs to troubleshoot your service, and more. If you already have Prometheus running in your cluster, simply connect it to GitLab and start using our metrics dashboards.
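For reference, GitLab metrics dashboards are defined in YAML. The sketch below shows a generic dashboard with a templating variable, assuming the dashboard YAML schema of this era; the metric ID, query, and variable are hypothetical, and field names may differ between GitLab versions.

```yaml
# .gitlab/dashboards/service-overview.yml -- hypothetical dashboard.
dashboard: 'Service overview'
templating:
  variables:
    environment: 'production'   # text variable, editable from the dashboard UI
panel_groups:
  - group: 'Traffic'
    panels:
      - type: area-chart
        title: 'Requests per second'
        y_label: 'req/s'
        metrics:
          - id: requests_per_second   # hypothetical metric id
            query_range: 'sum(rate(http_requests_total{env="{{environment}}"}[5m]))'
            label: 'RPS'
            unit: 'req/s'
```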

Targeted workflows

The targeted workflows, listed below, form our high-level roadmap. They are based on competitive analysis, user research, and customer interviews. The details of each workflow are listed in the epics and issues.

Getting data in

The first step in application performance management is collecting the proper measurements and telemetry. Instrumenting critical areas and reporting metrics of your system are prerequisites to understanding the health and performance of your services and applications. Our metrics solution is powered by Prometheus, targeting users of Kubernetes. We need to make sure our users can successfully:

  1. Enable Prometheus for a Kubernetes cluster - Whether you have a preinstalled Prometheus instance on your cluster that is linked to GitLab, or you'd like us to deploy Prometheus for you.
  2. Deploy exporters and instrument apps - We should help you as much as possible with instrumenting your applications and deploying exporters into a cluster, so that you can leverage all of the metrics data from across your system.
  3. Enable metrics scraping - We need an easy way to enable metrics scraping; when possible, we should strive to do this automatically (a sketch follows this list).
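As an illustration of automatic scraping, a common convention (honored by the Prometheus Helm chart's Kubernetes service discovery, which GitLab's managed Prometheus builds on) is to annotate pods so they are scraped without extra configuration; the deployment below is hypothetical:

```yaml
# deployment.yaml -- hypothetical Kubernetes Deployment excerpt.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                       # hypothetical app name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        prometheus.io/scrape: "true"    # opt this pod into scraping
        prometheus.io/port: "8080"      # port serving metrics
        prometheus.io/path: "/metrics"  # metrics endpoint path
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0.0  # hypothetical image
          ports:
            - containerPort: 8080
```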

Supported research and epic

See metric in a dashboard

Once you've collected a set of metrics, the next step is to see those metrics in a dashboard.

  1. Out of the box (OOTB) dashboards - Everyone loves beautiful dashboards, but building one is a challenge; the more useful OOTB dashboards we can provide, the better we can help users get started quickly. Established vendors take pride in the number of OOTB dashboards they can provide to their customers. We could do the same and leverage the GitLab community with a growing library of dashboards that anyone can contribute to.
  2. Workflow enablement (Dogfooding) - Connecting data from panels and dashboards together can help operators detect and diagnose problems. Solutions such as annotations, templating, drilldowns, and improved visualization are part of the toolkit in mature solutions. This area is where most of our dogfooding metrics effort is focused today.
  3. Customize dashboards and add dashboards - Adding a metric to a dashboard and adding new dashboards are basic functionalities a monitoring solution should have. Today, this can be quite confusing for a first-time GitLab Metrics user. We intend to make this step easy and discoverable for our users (see the sketch after this list). Doing so will help us increase user adoption.
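In this model, adding a new dashboard amounts to committing a YAML file under the project's `.gitlab/dashboards/` directory (the same location the testbed links below point at). A minimal, hypothetical example:

```yaml
# .gitlab/dashboards/error-rates.yml -- minimal, hypothetical custom
# dashboard; committing this file adds it to the project's dashboards.
dashboard: 'Error rates'
panel_groups:
  - group: 'Errors'
    panels:
      - type: line-chart
        title: '5xx responses'
        y_label: 'errors/s'
        metrics:
          - id: error_rate             # hypothetical metric id
            query_range: 'sum(rate(http_requests_total{status=~"5.."}[5m]))'
            label: '5xx'
            unit: 'errors/s'
```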

Supported research and epic

Alerting

Users need to be alerted on any threshold violation, ideally before their end-users are affected.

  1. Setting up an alert - Should be a straightforward action supported by all metrics and most of the chart types; a sketch of what such a rule expresses follows below.
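For illustration, here is a threshold alert expressed as a Prometheus alerting rule, building on the hypothetical error-ratio SLI recorded earlier:

```yaml
# alerting-rules.yml -- hypothetical Prometheus alerting rule.
# Fires when the error-ratio SLI stays above 1% for ten minutes,
# rather than alerting on a single noisy sample.
groups:
  - name: slo_alerts
    rules:
      - alert: HighErrorRatio          # hypothetical alert name
        expr: job:http_request_error_ratio:rate5m > 0.01
        for: 10m                       # require a sustained violation
        labels:
          severity: critical
        annotations:
          summary: 'Error ratio above 1% for job {{ $labels.job }}'
```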

Kubernetes monitoring

Our vision is that in 1-2 years, GitLab metrics will be the primary day-to-day tool for monitoring cloud-native applications. To achieve that, we need to support:

  1. Cluster insight - Provide our users with insight into clusters, pods, and nodes.
  2. Key metrics and OOTB dashboards - Curated and scalable dashboards that provide out-of-the-box k8s metrics and a bird's-eye view across environments (sketched below).
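As a sketch of the key metrics such a dashboard could surface, the panel group below queries the standard cAdvisor metrics exposed by the kubelet; the panel layout and IDs are hypothetical:

```yaml
# Hypothetical panel group for cluster insight, built on standard
# cAdvisor metrics (container_cpu_usage_seconds_total and
# container_memory_working_set_bytes).
panel_groups:
  - group: 'Cluster insight'
    panels:
      - type: area-chart
        title: 'CPU per pod'
        y_label: 'cores'
        metrics:
          - id: pod_cpu                # hypothetical metric id
            query_range: 'sum by (pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))'
            label: 'CPU'
            unit: 'cores'
      - type: area-chart
        title: 'Memory per pod'
        y_label: 'bytes'
        metrics:
          - id: pod_memory             # hypothetical metric id
            query_range: 'sum by (pod) (container_memory_working_set_bytes{container!=""})'
            label: 'memory'
            unit: 'bytes'
```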

Supported research and epic

Log aggregation

Given the distributed nature of cloud-native applications, it is critical to collect logs across multiple services and infrastructure and present them in an aggregated view, so users can quickly search through logs that originate from multiple pods and containers. Metrics and logs are related, and we intend to make the correlation for users so that they can get to the answers they are looking for more quickly. You can review our logging direction page for more information.

What's Next & Why

Dogfooding metrics

We are actively dogfooding GitLab Metrics with the Infrastructure team to migrate dashboards used for monitoring GitLab.com from Grafana to GitLab Metrics. This will help us receive rapid feedback as we mature the category. The DRI (directly responsible individual) for each step of this effort is noted below.

We will iterate based on the following process:

  1. Identify key Grafana dashboards to clone into GitLab metrics dashboards - DRI: Infra team
  2. Clone those Grafana dashboards into a testbed GitLab project as GitLab metrics dashboards - DRI: APM team
  3. Use the GitLab metrics dashboards and provide feedback (in a simulation day or after completing triage of an incident) - DRI: Infra team
  4. Prioritize and implement issues based on feedback, or find workarounds - DRI: APM team
  5. At a point in time agreed by both teams, the Infra team becomes the DRI for those GitLab metrics dashboards and clones them into their own project
  6. Turn off the original Grafana dashboard - the Infra team is the sole DRI for this decision, which is independent and not based only on the implementation issues

| Grafana | Testbed | Handbook Reference | Status |
| --- | --- | --- | --- |
| General SLAs | https://gitlab.com/gitlab-org/monitor/sandbox/test-metrics-dashboard/-/environments/2115574/metrics?dashboard=.gitlab%252Fdashboards%252Fgeneral-slas.yml | Tools for Engineers SLA Dashboard | Awaiting Feedback |
| Public Dashboard Landing Page | https://gitlab.com/gitlab-org/monitor/sandbox/test-metrics-dashboard/-/environments/2115574/metrics?dashboard=.gitlab%252Fdashboards%252Fpublic-dashboard-splash-screen.yml | Infrastructure KPI | Awaiting Feedback |
| Cloudflare Traffic Overview | https://gitlab.com/gitlab-org/monitor/sandbox/test-metrics-dashboard/-/environments/2115574/metrics?dashboard=.gitlab%252Fdashboards%252Fcloudflare-traffic-overview.yml | N/A | Awaiting Feedback |
| Logging | https://gitlab.com/gitlab-org/monitor/sandbox/test-metrics-dashboard/-/environments/2115574/metrics?dashboard=.gitlab%252Fdashboards%252Flogging.yml | N/A | Awaiting Feedback |

Customize dashboard and add dashboards

As mentioned above, adding a metric to a dashboard and adding new dashboards are basic functionalities a monitoring solution should have. Today, this can be quite confusing for a first-time GitLab Metrics user. We are actively working on improving these workflows and enabling a better onboarding experience for our users. Detailed information is available in the following:
