
Infrastructure Department Performance Indicators

Executive Summary

KPI Maturity Health Reason(s)
Infrastructure Hiring Actual vs Plan Level 3 of 3 Okay
  • on plan
Infrastructure Non-Headcount Plan vs Actuals Level 2 of 3 Unknown
  • Currently finance tells me when there is a problem; I’m not self-service.
  • Get the budget captured in a system
  • Chart budget vs. actual over time in periscope
Infrastructure Average Location Factor Level 3 of 3 Okay
  • We are close to our target of 0.58 overall, but trending upward.
Infrastructure Recruiting Average Top-of-Funnel Location Factor Level 2 of 3 Unknown
  • We need to get this as a chart in periscope.
  • We have set target location factors on our vacancies and are making sure recruiting targets the right areas of the globe on a per-role basis.
GitLab.com Availability Level 2 of 3 Attention
  • Today the apdex and error ratio thresholds that go into availability may change based on the number of alerts they create.
  • None of the data used for these metrics is in the data warehouse yet; therefore, we cannot embed Periscope charts in the handbook.
Infrastructure Hosting Cost per GitLab.com Monthly Active Users Level 3 of 3 Okay
  • Additional work is needed to make transparent what is driving this metric at a granular level.
GitLab.com Performance Level 1 of 3 Attention
  • We are experiencing occasional slowness of both the frontend and git operations.
Infrastructure Hosting Cost vs Plan Level 1 of 3 Okay
  • There was no locked plan for hosting services, so we are using the latest forecast until next plan lock

Key Performance Indicators

Infrastructure Hiring Actual vs Plan

Are we able to hire high-quality workers to build our product vision in a timely manner? Hiring information comes from BambooHR, where employees are in the division `Engineering` and the department `infrastructure`. This KPI is tracked and reported monthly, and year-to-date headcount vs. plan is also measured. The metric is the ratio of the actual number of hires in a given month to the planned number of hires for that month, measured against a target of 0.9; a value of 1.0 would mean we hired exactly the number of employees we planned to hire that month.

Target: 1

URL(s)

Chart (Sisense↗)

Health: Okay

Maturity: Level 3 of 3

Infrastructure Non-Headcount Plan vs Actuals

This is a subset of an existing KPI. Please see the definition for the parent KPI.

We need to spend our investors' money wisely. We also need to run a responsible business to be successful, and to one day go on the public market.

Target: Unknown until FY21 planning process

URL(s)

Health: Unknown

Maturity: Level 2 of 3

Infrastructure Average Location Factor

We remain efficient financially if we are hiring globally, working asynchronously, and hiring great people in low-cost regions where we pay market rates. We track an average location factor by function and department so managers can make tradeoffs and hire in an expensive region when they really need specific talent unavailable elsewhere, and offset it with great people who happen to be in low cost areas.

Target: 0.58

URL(s)

Chart (Sisense↗)

Health: Okay

Maturity: Level 3 of 3

Infrastructure Recruiting Average Top-of-Funnel Location Factor

We need to be proactive in measuring our location factor, starting with candidates who are at the top of the recruiting funnel.

Target: 0.58

URL(s)

Health: Unknown

Maturity: Level 2 of 3

GitLab.com Availability

Percentage of time during which GitLab.com is fully operational and providing service to users within SLO parameters. GitLab.com Availability is calculated as the weighted average availability of GitLab's customer-facing services. These services, with their respective weights in parentheses, are `git` (5), `web` (5), `api` (5), `ci-runners` (3), `registry` (2), and `sidekiq` (1). The availability of each service is measured with two metrics: Apdex for latency, and error rate for errors. A service is considered available when at least 50% of users are experiencing satisfactory latency _and_ at least 50% of requests are completing successfully. If either condition is not met, the service is considered to be experiencing an outage.
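The weighted average described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the sample format, and the mapping of "50% of users experiencing satisfactory latency" to an apdex value of 0.5 are not taken from the handbook, and the real calculation may differ.

```python
# Weights per service, as listed in the KPI definition.
SERVICE_WEIGHTS = {
    "git": 5, "web": 5, "api": 5,
    "ci-runners": 3, "registry": 2, "sidekiq": 1,
}

def is_available(apdex, error_ratio, threshold=0.5):
    """True when both conditions hold (assumed mapping to 0..1 ratios):
    apdex is at least `threshold`, and the share of successful
    requests (1 - error_ratio) is at least `threshold`."""
    return apdex >= threshold and (1.0 - error_ratio) >= threshold

def service_availability(samples):
    """Fraction of samples in which a service was available.
    `samples` is a hypothetical list of (apdex, error_ratio) pairs,
    one per measurement interval."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(is_available(a, e) for a, e in samples) / len(samples)

def weighted_availability(per_service):
    """Weighted average of per-service availability (values in 0..1).
    `per_service` must contain every service in SERVICE_WEIGHTS."""
    total_weight = sum(SERVICE_WEIGHTS.values())
    weighted_sum = sum(
        weight * per_service[name]
        for name, weight in SERVICE_WEIGHTS.items()
    )
    return weighted_sum / total_weight
```

For example, if every service were fully available except `git` at 0.9, the overall figure would be `(5 * 0.9 + 16) / 21`, or roughly 97.6%.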

Target: 99.95%

URL(s)

Health: Attention

Maturity: Level 2 of 3

Infrastructure Hosting Cost per GitLab.com Monthly Active Users

This metric reflects an estimate of the dollar cost necessary to support one user on GitLab.com. It is an important metric because it allows us to estimate infrastructure costs as our user base grows. Infrastructure Hosting Cost comes from Netsuite; it is a sum of actual amounts with the unique account name '5026 - Hosting Services COGS' or '6026 - Hosting Services'. This cost is divided by MAU (monthly active users).
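As a rough sketch of the arithmetic, assuming ledger rows are available as `(account_name, amount)` pairs (a hypothetical format, not the actual Netsuite export), the metric could be computed like this:

```python
# Account names quoted in the KPI definition.
HOSTING_ACCOUNTS = {
    "5026 - Hosting Services COGS",
    "6026 - Hosting Services",
}

def hosting_cost(ledger_rows):
    """Sum the actual amounts booked to the hosting accounts.
    `ledger_rows` is an assumed list of (account_name, amount) pairs."""
    return sum(
        amount for account, amount in ledger_rows
        if account in HOSTING_ACCOUNTS
    )

def hosting_cost_per_mau(ledger_rows, monthly_active_users):
    """Hosting cost divided by monthly active users (MAU)."""
    if monthly_active_users <= 0:
        raise ValueError("monthly_active_users must be positive")
    return hosting_cost(ledger_rows) / monthly_active_users
```

With made-up numbers, rows totalling 150 in the two hosting accounts and 100 monthly active users would give a value of 1.5.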

Target: 1.5

This KPI cannot be public.

URL(s)

Health: Okay

Maturity: Level 3 of 3

GitLab.com Performance

This metric needs to reflect the performance of GitLab as experienced by users. It should capture both frontend and backend performance. Even though the Infrastructure department will be responsible for this metric, it will need other departments such as Development, Quality, PM, and UX to effect positive change.

URL(s)

Health: Attention

Maturity: Level 1 of 3

Infrastructure Hosting Cost vs Plan

Tracks our actual infrastructure hosting costs against our planned infrastructure hosting costs for GitLab.com. We need this metric to manage our financial position.

URL(s)

Health: Okay

Maturity: Level 1 of 3

Regular Performance Indicators

Infrastructure Discretionary Bonus Rate

Discretionary bonuses offer a highly motivating way to reward individual GitLab team members who really shine as they live our values. Our goal is to award discretionary bonuses to 10% of GitLabbers in the Infrastructure department every month.

Target: 10%

URL(s)

Health: Unknown

Maturity: Level 2 of 3

Apdex and Error SLO per Service

Each service at GitLab has two general SLO metrics. "Apdex Score" is, simply put, a measure of the percentage of requests to that service that complete within a satisfactory amount of time. The thresholds are defined per service, so for some services they will be in microseconds, while for others they could be in seconds. "Error Ratio" is the percentage of requests to a service that end in error. For each service in the system we define an acceptable threshold for these values. For apdex we want the actual score to be above the threshold; for error ratio we want it to be below the threshold. We don’t expect the apdex to be a perfect 100%, and we don’t expect the error rate to be a perfect 0%, but we would like these values to be within their predefined thresholds 100% of the time. The actual amount of time that they adhere to their SLO thresholds is currently far below this.
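The two metrics and the threshold check can be illustrated with a small Python sketch. It follows the simplified definition given here (share of requests finishing within a satisfactory time); the function names, the input format, and treating a value exactly at the threshold as passing are all assumptions.

```python
def apdex_score(durations_s, satisfactory_s):
    """Share of requests (0..1) that completed within the
    service's satisfactory latency, both given in seconds."""
    if not durations_s:
        raise ValueError("at least one request is required")
    satisfied = sum(d <= satisfactory_s for d in durations_s)
    return satisfied / len(durations_s)

def error_ratio(error_count, request_count):
    """Share of requests (0..1) that ended in error."""
    if request_count <= 0:
        raise ValueError("request_count must be positive")
    return error_count / request_count

def meets_slo(apdex, errors, apdex_threshold, error_threshold):
    """Apdex should sit above its threshold and the error ratio below
    its threshold. Boundary values count as passing (an assumption)."""
    return apdex >= apdex_threshold and errors <= error_threshold

def slo_adherence(window_results):
    """Fraction of time windows in which the SLO was met.
    `window_results` is a list of booleans, one per window."""
    if not window_results:
        raise ValueError("at least one window is required")
    return sum(window_results) / len(window_results)
```

Note that the standard Apdex formula also counts "tolerating" requests at half weight; the sketch deliberately omits that to match the simpler description in this section.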

URL(s)

Health: Unknown

Maturity: Level 1 of 3

Mean Time To Detection (MTTD)

Measures the elapsed time between the onset of an anomalous condition and its actual detection, and serves as an indicator of our ability to monitor the environment and minimize the time to incident resolution.

URL(s)

Health: Unknown

Maturity: Level 1 of 3

Mean Time To Resolution (MTTR)

Measures the elapsed time in hours it takes us to recover when an incident occurs, and serves as an indicator of our ability to execute said recoveries. (Only includes S1 & S2 incidents)
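A minimal sketch of this calculation, assuming each incident is a dictionary with hypothetical `severity`, `started_at`, and `resolved_at` fields (the real incident data may be shaped differently):

```python
from datetime import datetime

COUNTED_SEVERITIES = ("S1", "S2")  # only S1 and S2 incidents are included

def mean_time_to_resolution_hours(incidents):
    """Mean hours from incident start to resolution for S1/S2 incidents.
    Returns None when no qualifying incident exists."""
    durations = [
        (i["resolved_at"] - i["started_at"]).total_seconds() / 3600
        for i in incidents
        if i["severity"] in COUNTED_SEVERITIES
    ]
    if not durations:
        return None
    return sum(durations) / len(durations)

# Made-up example data: the S3 incident is ignored.
incidents = [
    {"severity": "S1",
     "started_at": datetime(2020, 1, 1, 10, 0),
     "resolved_at": datetime(2020, 1, 1, 11, 0)},
    {"severity": "S2",
     "started_at": datetime(2020, 1, 2, 10, 0),
     "resolved_at": datetime(2020, 1, 2, 13, 0)},
    {"severity": "S3",
     "started_at": datetime(2020, 1, 3, 10, 0),
     "resolved_at": datetime(2020, 1, 3, 20, 0)},
]
mttr = mean_time_to_resolution_hours(incidents)  # 2.0 hours
```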

Target: 1

URL(s)

Chart (Sisense↗)

Health: Attention

Maturity: Level 3 of 3

Mean Time Between Failures (MTBF)

Measures the mean amount of time in days elapsed between incidents that affect GitLab.com’s availability. (Only includes S1 & S2 incidents)
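One way to compute this, sketched under the assumption that the gap is measured between the start times of consecutive S1/S2 incidents (the definition above does not say which timestamps are used):

```python
from datetime import datetime

def mean_time_between_failures_days(incident_starts):
    """Mean number of days between consecutive incident start times.
    `incident_starts` should already be limited to S1/S2 incidents.
    Returns None when fewer than two incidents are given."""
    starts = sorted(incident_starts)
    if len(starts) < 2:
        return None
    gaps = [
        (later - earlier).total_seconds() / 86400
        for earlier, later in zip(starts, starts[1:])
    ]
    return sum(gaps) / len(gaps)

# Made-up example: gaps of 7 and 14 days average to 10.5 days.
starts = [datetime(2020, 1, 1), datetime(2020, 1, 8), datetime(2020, 1, 22)]
mtbf = mean_time_between_failures_days(starts)
```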

Target: 7

URL(s)

Chart (Sisense↗)

Health: Attention

Maturity: Level 3 of 3

Mean Time To Production (MTTP)

Measures the elapsed time (in hours) from merging a change into the master branch of the gitlab-org/gitlab project to deploying that change to gitlab.com. It serves as an indicator of how quickly we can deploy application changes into production.

Target: 8

URL(s)

Chart (Sisense↗)

Health: Attention

Maturity: Level 3 of 3

Disaster Recovery (DR) Time-to-Recover

Tracks time to recover full operational status in case of a catastrophic incident in our primary production environment.

Target: 60m

Health: Unknown

Maturity: Level 2 of 3

Other PI Pages

Legends

Maturity

Level Meaning
Level 3 of 3 Has a description, target, and Sisense embed (if public) or URL (or not).
Level 2 of 3 Missing one of: description, target, or Sisense embed (if public) or URL (or not).
Level 1 of 3 Missing two of: description, target, or Sisense embed (if public) or URL (or not).
Level 0 of 3 Missing a description, a target, and Sisense embed (if public) or URL (or not).
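Read this way, the maturity level is simply the count of criteria that are present. The sketch below is one interpretation, not the page's actual logic: the function name is hypothetical, and because the legend is ambiguous about non-public KPIs, the caller decides whether the chart-or-URL criterion counts as met.

```python
def maturity_level(has_description, has_target, has_chart_or_url):
    """Number of the three legend criteria that are satisfied (0 to 3)."""
    criteria = (has_description, has_target, has_chart_or_url)
    return sum(bool(c) for c in criteria)

def maturity_label(level):
    """Format a level the way the page displays it."""
    return f"Level {level} of 3"
```

For instance, a KPI with a description and a target but no chart would come out as `maturity_label(maturity_level(True, True, False))`, which is "Level 2 of 3".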

Health

Level Meaning
Okay The KPI is at an acceptable level compared to the threshold
Attention This is a blip, or we’re going to watch it, or we just need to enact a proven intervention
Problem We'll prioritize our efforts here
Unknown Unknown

How to work with pages like this

Data

The heart of pages like this is a data file called /data/performance_indicators.yml which is in YAML format. Almost everything you need to do will involve edits to this file. Here are some tips:

Two flags to note:

Pages

Pages like /handbook/engineering/performance-indicators/ are rendered by an ERB template.

These ERB templates call the helper function performance_indicators() that is defined in /helpers/custom_helpers.rb. This helper function calls in several partial templates to do its work.

This function takes a required argument named org in string format that limits the scope of the page to a portion of the data file. Possible valid values for this org argument are listed in the org property of each element in the array in /data/performance_indicators.yml.
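The scoping behaviour can be pictured with a short sketch. This is written in Python purely for illustration and is not the Ruby helper's implementation; apart from the `org` property, the field names and values are invented.

```python
def indicators_for_org(indicators, org):
    """Keep only the entries whose `org` property equals `org`.
    `indicators` stands in for the array parsed from the data file."""
    return [pi for pi in indicators if pi.get("org") == org]

# Hypothetical entries; only the `org` key is confirmed by the text above.
data = [
    {"name": "Example PI A", "org": "Example Department"},
    {"name": "Example PI B", "org": "Another Department"},
    {"name": "Example PI C", "org": "Example Department"},
]
scoped = indicators_for_org(data, "Example Department")
```

An `org` value that matches no entry yields an empty list in this sketch, so a page would render with no indicators rather than fail.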