Service Maturity Model

Introduction

This page shows the output of our service maturity model for each service in our metrics catalog. The model itself is part of the metrics catalog, and uses information from the metrics catalog and the service catalog to score each service.

To achieve a particular level in the maturity model, a service must meet all the criteria for that level and all previous levels. Some criteria do not apply to all services (for instance, services like PgBouncer do not need development documentation).
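The rule above can be sketched in a few lines. This is a minimal illustration, not the metrics catalog's actual implementation: the `Result` enum and the `maturity_level` function are hypothetical names. It also assumes that skipped and not-yet-measured criteria do not block a level, and that a level counts only if at least one of its criteria actually passed. Both assumptions are inferred from the tables on this page rather than taken from the model's source.

```python
from enum import Enum


class Result(Enum):
    """Outcome of evaluating one criterion for one service (hypothetical names)."""
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"        # criterion does not apply to this service
    UNMEASURED = "unmeasured"  # criterion is not measured yet


def maturity_level(levels):
    """Return the highest level achieved, or 0 if Level 1 is not met.

    `levels` is a list of lists of Result, ordered from Level 1 upwards.
    A level is achieved only if every previous level was achieved too.
    """
    achieved = 0
    for number, results in enumerate(levels, start=1):
        # Assumption: skipped and unmeasured criteria never block a level.
        no_failures = all(r is not Result.FAILED for r in results)
        # Assumption: a level with nothing passed does not count as achieved.
        any_passed = any(r is Result.PASSED for r in results)
        if not (no_failures and any_passed):
            break
        achieved = number
    return achieved


P, F, S, U = Result.PASSED, Result.FAILED, Result.SKIPPED, Result.UNMEASURED

# A service with two skipped Level 1 criteria and one failing Level 2 criterion.
infra_service = [[P, S, S], [F, P, P], [P, S, F, U, S, P, U]]

# A service that passes everything measured up to Level 3.
app_service = [[P, P, P], [P, P, P], [P, U, P, U, P, P, U], [U, U, U], [U, U, U]]

print(maturity_level(infra_service))
print(maturity_level(app_service))
```

Under these assumptions the first service stops at Level 1, because one failed Level 2 criterion blocks every higher level, while the second reaches Level 3 and no further, because nothing in Levels 4 and 5 is measured yet.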

Maturity score by service

❌ indicates the service does not meet even the Level 1 criteria

Service Level
ai-assisted Level 3
ai-gateway Level 2
api Level 2
atlantis Level 1
camoproxy Level 2
ci-runners Level 2
cloud-sql Level 1
consul Level 1
customersdot Level 3
errortracking Level 2
ext-pvs Level 3
external-dns Level 1
frontend Level 1
git Level 2
gitaly Level 3
gitlab-static Level 1
glgo Level 2
google-cloud-storage Level 2
internal-api Level 3
jaeger Level 2
kas Level 3
kube Level 2
logging Level 1
mailgun Level 3
mailroom Level 1
memorystore Level 1
mimir Level 2
monitoring Level 2
nat Level 1
nginx Level 1
ops-gitlab-net Level 3
packagecloud Level 1
patroni Level 2
patroni-ci Level 2
patroni-embedding Level 1
patroni-registry Level 1
pgbouncer Level 1
pgbouncer-ci Level 1
pgbouncer-embedding Level 1
pgbouncer-registry Level 1
plantuml Level 1
postgres-archive Level 1
praefect Level 2
redis Level 3
redis-cluster-cache Level 3
redis-cluster-chat-cache Level 3
redis-cluster-feature-flag Level 3
redis-cluster-queues-meta Level 3
redis-cluster-ratelimiting Level 3
redis-cluster-repo-cache Level 3
redis-cluster-shared-state Level 3
redis-db-load-balancing Level 3
redis-pubsub Level 3
redis-registry-cache Level 2
redis-sessions Level 3
redis-sidekiq Level 3
redis-tracechunks Level 3
registry Level 2
runway Level 1
search Level 1
sentry Level 2
sidekiq Level 3
thanos Level 2
tracing Level 2
vault Level 2
waf Level 1
web Level 2
web-pages Level 2
websockets Level 2
woodhouse Level 3

Maturity detail by service

Key:

  • ✅ Service meets the criteria
  • ❌ Service does not meet the criteria
  • ➖ The criterion is skipped. Some maturity criteria make less sense for some services. For example, an infrastructure-facing service like Patroni is crucial to ops but unrelated to our Development department, so it does not require development guidelines.
  • ⚪ We don’t measure the criterion yet. See https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/560 for progress

ai-assisted detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5, 6
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

ai-gateway detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Runway structured logs are temporarily available in Stackdriver
Service exists in the dependency graph
Reason: Runway services are deployed outside of the monolith
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
SRE guides exist in runbooks
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

api detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5, 6, 7, 8, 9
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4, 5, 6, 7, 8
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

atlantis detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Atlantis is a work in progress, see https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24613
Service exists in the dependency graph
Reason: Atlantis is a work in progress, see https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24613
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: Atlantis is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

camoproxy detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph
Reason: Camoproxy does not interact directly with any declared services in our system
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

ci-runners detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

cloud-sql detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Cloud SQL is a managed service of GCP. The logs are available in Stackdriver.
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate
SLO monitoring: request rate
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: Cloud SQL is an infrastructure component, powered by GCP
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

consul detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4, 5, 6
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: Consul is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

customersdot detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: All logs are available in Stackdriver
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

errortracking detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

ext-pvs detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Runway structured logs are temporarily available in Stackdriver
Service exists in the dependency graph
Reason: Runway services are deployed outside of the monolith
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

external-dns detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Logs from external-dns are not ingested into Elasticsearch due to volume. The logs are also available in Stackdriver
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate
SLO monitoring: request rate
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: external-dns is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

frontend detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Logs from HAProxy are available in BigQuery and are not ingested into Elasticsearch due to volume.
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

git detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4, 5, 6, 7, 8
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

gitaly detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

gitlab-static detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Logs from Cloudflare Workers are available on demand but are not ingested due to volume
Service exists in the dependency graph
Reason: This service is hosted by Cloudflare and does not depend on any other service
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ⚪ Not Implemented
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

glgo detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Runway structured logs are temporarily available in Stackdriver
Service exists in the dependency graph
Reason: Runway services are deployed outside of the monolith
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
SRE guides exist in runbooks
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

google-cloud-storage detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Access logs of GCS are not enabled due to volume.
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

internal-api detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5, 6
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4, 5, 6, 7
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

jaeger detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Jaeger service is not deployed in production
Service exists in the dependency graph
Reason: Jaeger is an independent internal observability tool
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

kas detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4, 5, 6, 7, 8
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

kube detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1
Service exists in the dependency graph
Reason: This service is managed by GKE at the moment. It does not interact directly with any other services
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: Application logic does not interact with kube
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

logging detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
Service exists in the dependency graph
Reason: The logging platform consumes logs via fluentd, but does not interact directly with any other services
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

mailgun detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Mailgun is a vendor
Service exists in the dependency graph
Reason: Mailgun is a vendor
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ⚪ Not Implemented
SRE guides exist in runbooks ⚪ Not Implemented
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

mailroom detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4, 5
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

memorystore detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Memorystore is a managed service of GCP. The logs are available in Stackdriver.
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate
SLO monitoring: request rate
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: Memorystore is an infrastructure component, powered by GCP
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

mimir detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5, 6, 7, 8
Service exists in the dependency graph
Reason: Mimir is an independent internal observability tool. It fetches metrics from other services but does not functionally interact with them
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ⚪ Not Implemented
SRE guides exist in runbooks ⚪ Not Implemented
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

monitoring detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

nat detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: NAT is managed by GCP, thus the logs are available in Stackdriver.
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: NAT is an infrastructure component, developers do not interact with it
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

nginx detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Logs from nginx are not ingested into Elasticsearch due to volume. Workhorse logs usually cover the same ground, and the nginx logs are also available in Stackdriver.
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4, 5, 6
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: Application logic does not interact with nginx
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

ops-gitlab-net detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
Service exists in the dependency graph
Reason: ops.gitlab.net is a standalone GitLab deployment
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ⚪ Not Implemented
SRE guides exist in runbooks ⚪ Not Implemented
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

packagecloud detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ⚪ Not Implemented
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

patroni detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: patroni is an infrastructure component; developers do not interact with it
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

patroni-ci detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: patroni is an infrastructure component; developers do not interact with it
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

patroni-embedding detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: patroni is an infrastructure component; developers do not interact with it
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

patroni-registry detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: patroni is an infrastructure component; developers do not interact with it
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

pgbouncer detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: pgbouncer is an infrastructure component; developers do not interact with it
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

pgbouncer-ci detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: pgbouncer is an infrastructure component; developers do not interact with it
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

pgbouncer-embedding detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: pgbouncer is an infrastructure component; developers do not interact with it
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

pgbouncer-registry detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: pgbouncer is an infrastructure component; developers do not interact with it
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

plantuml detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: The logs are available in Stackdriver.
Service exists in the dependency graph
Reason: PlantUML is a stateless web application that generates UML diagrams on the fly. The rendered markdown points to the PlantUML server in the frontends. It does not interact with any declared services.
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

postgres-archive detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: postgres-archive is an infrastructure component; developers do not interact with it
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

praefect detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-cluster-cache detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-cluster-chat-cache detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-cluster-feature-flag detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-cluster-queues-meta detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-cluster-ratelimiting detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation 1
SRE guides exist in runbooks ⚪ Not Implemented
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-cluster-repo-cache detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-cluster-shared-state detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-db-load-balancing detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-pubsub detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-registry-cache detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation
SRE guides exist in runbooks
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-sessions detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-sidekiq detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

redis-tracechunks detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability
Reason: Metadata can't be injected in redis logs
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

registry detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

runway detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Runway is a platform. The logs are available in Stackdriver.
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate
SLO monitoring: request rate
Level 3 Service health dashboards 1, 2, 3
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

search detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

sentry detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: We are migrating our self-managed Sentry instance to the hosted one. For more information, see https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/13963. Sentry logs are also available in Stackdriver.
Service exists in the dependency graph
Reason: Sentry is an independent internal observability tool
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

sidekiq detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5, 6, 7
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

thanos detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
Service exists in the dependency graph
Reason: Thanos is an independent internal observability tool. It fetches metrics from other services but does not functionally interact with them.
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4, 5
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation ⚪ Not Implemented
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

tracing detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

vault detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Vault is currently a pending project with no traffic. We'll add logs and metrics in https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/739
Service exists in the dependency graph
Reason: Vault is currently a pending project with no traffic. Progress can be tracked at https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/739
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: Vault is an infrastructure component; developers do not interact with it
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

waf detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Logs from Cloudflare are pushed to a GCS bucket by Cloudflare and are not ingested into Elasticsearch due to volume. See https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/cloudflare/logging.md for alternatives
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation
Reason: WAF is an infrastructure component, powered by Cloudflare
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

web detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5, 6, 7, 8
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4, 5, 6, 7
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

web-pages detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4, 5, 6, 7
SLA calculations driven from SLO metrics ⚪ Not Implemented
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

websockets detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana 1, 2, 3, 4, 5, 6
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1, 2, 3, 4, 5, 6
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented

woodhouse detail

Level Criterion Passed
Level 1 Exists in the service catalog 1
Structured logs available in Kibana
Reason: Log volume is very low; tooling links to Stackdriver are provided, which is sufficient for this service's purposes
Service exists in the dependency graph 1
Level 2 SLO monitoring: apdex 1
SLO monitoring: error rate 1
SLO monitoring: request rate 1
Level 3 Service health dashboards 1
SLA calculations driven from SLO metrics
Reason: Service is not user facing
All components include an apdex 1
Logging includes metadata for measuring scalability ⚪ Not Implemented
Developer guides exist in developer documentation 1
SRE guides exist in runbooks 1
Metrics on downstream service usage ⚪ Not Implemented
Level 4 Prepared Kibana dashboards ⚪ Not Implemented
Dashboards linked from metrics catalogs ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
Level 5 Long-term forecasting utilization and usage ⚪ Not Implemented
70% of requests covered by at least one SLI ⚪ Not Implemented
Automatic alert routing ⚪ Not Implemented
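
As a rough illustration of how a detail table like the ones above could map to a level in the summary, the sketch below applies the rule from the introduction: a service reaches a level only if it meets all criteria for that level and all previous levels. It is not the metrics catalog's actual implementation. The names Status, Criterion, and maturity_level are hypothetical, and the reading of the Passed column is an assumption inferred from the tables: listed evidence links mean passed, a "Reason:" line means the criterion is skipped, an empty cell means failed, and "Not Implemented" criteria are ignored.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASSED = "passed"                    # evidence links are listed in the Passed column
    FAILED = "failed"                    # assumption: empty Passed column and no reason given
    SKIPPED = "skipped"                  # a "Reason: ..." line explains why it does not apply
    NOT_IMPLEMENTED = "not implemented"  # the model does not evaluate this criterion yet


@dataclass(frozen=True)
class Criterion:
    level: int
    name: str
    status: Status


def maturity_level(criteria: list[Criterion]) -> int:
    """Return the highest level whose criteria, and those of all lower levels, are met.

    Assumptions (hypothetical): skipped criteria count as met, not-implemented
    criteria are ignored, and a level with no evaluated criteria cannot be achieved.
    A result of 0 means the service does not meet even the Level 1 criteria.
    """
    achieved = 0
    for level in sorted({c.level for c in criteria}):
        evaluated = [
            c for c in criteria
            if c.level == level and c.status is not Status.NOT_IMPLEMENTED
        ]
        # Stop at the first level that has nothing evaluated or has a failure.
        if not evaluated or any(c.status is Status.FAILED for c in evaluated):
            break
        achieved = level
    return achieved


# Level 1 and Level 2 rows transcribed from the "waf detail" table.
waf = [
    Criterion(1, "Exists in the service catalog", Status.PASSED),
    Criterion(1, "Structured logs available in Kibana", Status.SKIPPED),
    Criterion(1, "Service exists in the dependency graph", Status.PASSED),
    Criterion(2, "SLO monitoring: apdex", Status.FAILED),
    Criterion(2, "SLO monitoring: error rate", Status.PASSED),
    Criterion(2, "SLO monitoring: request rate", Status.PASSED),
]

print(f"waf: Level {maturity_level(waf)}")
```

Under these assumptions the waf rows yield Level 1, because the empty apdex row stops the service at Level 2. This agrees with the summary table, but the real scoring logic lives in the metrics catalog and may differ.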

Last modified December 8, 2023: Fix broken refs and update shortcodes (a29b81e8)