The CI (Verify) engineering team has had one person effectively on call for all CI/CD services for GitLab.com. As the scope and scale of GitLab.com have grown, this operational load has become too much for one person. This blueprint is the plan for the core SRE infrastructure team to take over on-call and operational work for the CI/CD services for GitLab.com.
At a high level, this blueprint has two main areas to consider:
To start preparing to take over on-call duties, the SRE team will need to:
Most of the analysis work will be done by breaking production readiness down into multiple issues. The output of this operational readiness work should end up in the runbooks repo and feed into the new service inventory work.
The SRE team that is a counterpart of the Release and Verify teams will also: