Site Availability Engineering

Workflow: How may we be of service?, GitLab.com Status
Issue Trackers: Infrastructure (Milestones, OnCall); Production (Incidents, Changes, Deltas); Delivery
Slack Channels: #sre-lounge, #database, #alerts, #production, #g_delivery
Operations: Runbooks (please contribute!); On-call (Handover Document, Reports)

Mission

Site Availability is the gatekeeper and primary caretaker of the operational environment, focusing on its uptime and state as it exists in the present.

Vision

Over the next 12 to 18 months, we will focus relentlessly on the availability of GitLab.com so that it becomes ingrained in everything we do. Thus, the team's priorities are driven, almost exclusively, by availability considerations, effecting the cultural shift necessary to achieve our uptime goals, primarily through operational discipline. This group has the greatest latitude in making changes to the environment that ensure uptime in the here and now, and is the final authority on changes to GitLab.com.

Site Availability is the primary owner (but not the only consumer) of the following operational processes and procedures:

Key Metrics

Key metrics related to this group include:

Team

Each member of the Site Availability Team is part of this vision:

Team Members

The following people are members of the Site Availability Team:

Person                 Role
Jose Cores Finotto     Engineering Manager, SAE
Ahmad Sherif           Site Reliability Engineer
Andreas Brandl         Senior Database Engineer
Amarbayar Amarsanaa    Senior Site Reliability Engineer
Henri Philipps         Senior Site Reliability Engineer
Peter Dam              Senior Site Reliability Engineer
Hendrik Meyer          Site Reliability Engineer