Infrastructure

GitLab.com Status: Status Page
How to get help: How to get help
Incident Management: How we handle GitLab.com incidents
Change Management: How we manage changes to GitLab.com
Workflow: How may we be of service?
Issue Trackers: Infrastructure (Milestones, OnCall); Production (Incidents, Changes, Deltas); Delivery
Slack Channels: #infrastructure-lounge, #database, #alerts, #production, #g_delivery
Operations: Runbooks (please contribute!); On-call (Handover Document, Reports)

Other Pages

GitLab.com Architecture Environments Monitoring Performance
Production SRE Onboarding Readiness Guide Database Reliability On-call Handover

Mission

The Infrastructure Department is the primary responsible party for the availability, reliability, performance, and scalability of all user-facing services (most notably GitLab.com, the largest production GitLab Installation on the planet). Other departments and teams contribute greatly to these attributes of our service as well. In these cases it is the responsibility of the Infrastructure Department to close the feedback loop with monitoring and metrics to drive accountability.

Vision

We are a blend of operations gearheads and software crafters who apply sound engineering principles, operational discipline, and mature automation to make GitLab.com ready for mission-critical customer workloads. We strive for excellence every day by living and breathing GitLab's values as our guiding operating principles in every decision we make and every action we take.

Design

The Infrastructure Library contains documents that outline our thinking about the problems we are solving and represent the current state for any topic. They play a significant role in how we produce technical solutions to meet the challenges we face.

Blueprints scope out our initial thinking about specific problems and issues we are working on. Designs outline the specific architecture and implementation.

OKRs

GitLab uses Objectives and Key Results (OKRs) as quarterly goals to execute our strategy and to make sure those goals are clearly defined and aligned throughout the organization. We capture our OKRs through issues.

Meetings

GitLab is a widely distributed company, and we aim to work asynchronously most of the time. There are times, however, when we must get together to discuss topics in real time, and thus, we do have some meetings scheduled. Infrastructure has four primary meetings.

Teams

The Infrastructure Department is composed of four teams.

For details on the Department's structure, see the Infrastructure Teams Handbook section.

Additionally, Infrastructure's mStaff is the loose denomination for the group of people who report directly to the Director of Engineering, Infrastructure. The group is composed of both managers and individual contributors responsible for the overall direction of Infrastructure.

SRE Stable Counterparts and areas of ownership

Every SRE is aligned with an engineering team. Each SRE can help the team at each stage of the process: planning, discovery, implementation, and further iteration. The area an SRE is responsible for is part of their title, e.g. "SRE, Plan, Monitor." You can see which area of the product each SRE is aligned with in the team org chart.

Multiple SREs are aligned with areas of the product. This area will be listed on the team page under their title as an expertise or specialty, e.g. "Plan expert." This way there is a team of SREs available to provide help in the case that another is out of the office or busy with another incident or team.

There are two dimensions of ownership for the SRE teams: the first follows Product/Engineering team lines, and the second follows infrastructure needs.

GitLab Product/Service to Infrastructure Label mapping:

| Section | Stages | Label(s) | Team |
| --- | --- | --- | --- |
| Dev | Create, Plan, Manage | ~Product:create, ~Product:Plan, ~Product:Manage | Dev & Ops Infra team (Jose) |
| Ops | Monitor, Configure | ~Product:Monitor, ~Product:Configure | Secure & Defend Infra team (Anthony) |
| CI/CD | Verify, Package, Release | ~Product:Verify, ~Product:Release | CI/CD & Enablement Infra team (Dave) |
| Sec | Secure | ~Product:Secure | Secure & Defend Infra team (Anthony) |
| Defend | Defend | ~Product:Defend | Secure & Defend Infra team (Anthony) |
| Enablement & Growth | Many | ~Service:Singleton | CI/CD & Enablement Infra team (Dave) |

*** Security is an aspect of all three teams, so a relationship exists with each of them.
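
As an illustration only, the label mapping above can be read as a simple lookup from issue label to owning team. The sketch below copies the label and team names from the table; the dictionary name `LABEL_TO_TEAM` and the helper `teams_for_labels` are hypothetical and are not part of any GitLab tooling.

```python
# Hypothetical sketch: look up the owning Infra team(s) for an issue's labels.
# Label/team pairs are copied from the mapping table; every name defined here
# (LABEL_TO_TEAM, teams_for_labels) is an illustrative assumption.

LABEL_TO_TEAM = {
    "~Product:create": "Dev & Ops Infra team",
    "~Product:Plan": "Dev & Ops Infra team",
    "~Product:Manage": "Dev & Ops Infra team",
    "~Product:Monitor": "Secure & Defend Infra team",
    "~Product:Configure": "Secure & Defend Infra team",
    "~Product:Verify": "CI/CD & Enablement Infra team",
    "~Product:Release": "CI/CD & Enablement Infra team",
    "~Product:Secure": "Secure & Defend Infra team",
    "~Product:Defend": "Secure & Defend Infra team",
    "~Service:Singleton": "CI/CD & Enablement Infra team",
}


def teams_for_labels(labels):
    """Return the sorted, de-duplicated teams that own any of the given labels.

    Labels that do not appear in the mapping are ignored.
    """
    return sorted({LABEL_TO_TEAM[label] for label in labels if label in LABEL_TO_TEAM})


if __name__ == "__main__":
    # An issue carrying labels from two sections maps to two teams.
    print(teams_for_labels(["~Product:Plan", "~Product:Secure"]))
```

An issue with labels from more than one section resolves to more than one team, which matches the note that ownership can be shared.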

Operational/Infra axes of alignment:

  1. Observability: Metrics/Alerting (Prometheus, Grafana, PagerDuty, Pingdom) - Dev & Ops Infra team (Jose)
  2. Observability: Logging (ELK + log shipping design) - Secure & Defend Infra team (Anthony)
  3. Infrastructure tooling: Chef, Terraform - Secure & Defend Infra team (Anthony)
  4. Core Database: Postgres backups, tuning/configuration - Dev & Ops Infra team (Jose)
  5. Backup tooling, processes, and testing: Git (CI/CD & Enablement Infra team (Dave)) and Postgres data (Dev & Ops Infra team (Jose))