Disaster Recovery Working Group
This working group will determine what is needed to introduce a disaster recovery mechanism for GitLab.com, and what effort is necessary to leverage GitLab Geo as a mechanism for building reliable and predictable disaster recovery at the largest scale.
Scope and Definitions
In the context of this working group:
Recovery Point Objective (RPO): targeted duration of time in which data might be lost due to a major incident.
Recovery Time Objective (RTO): targeted duration of time and service level within which a business process must be restored after a disaster to avoid unacceptable consequences of a break in business continuity.
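As a rough illustration of how these two objectives are measured against an incident, the sketch below computes the data-loss window and the restoration time from three timestamps and compares each with a target. The function names, timestamps, and target values are hypothetical and chosen only for illustration; they are not GitLab.com's actual targets.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_replicated_at: datetime, incident_at: datetime) -> timedelta:
    """Data-loss window: time between the last replicated write and the incident."""
    return incident_at - last_replicated_at


def achieved_rto(incident_at: datetime, service_restored_at: datetime) -> timedelta:
    """Restoration time: time between the incident and the service being restored."""
    return service_restored_at - incident_at


def meets_target(achieved: timedelta, target: timedelta) -> bool:
    """An objective is met when the achieved duration does not exceed the target."""
    return achieved <= target


# Hypothetical failover test timeline (illustrative values only).
last_replicated_at = datetime(2021, 1, 13, 12, 0)
incident_at = datetime(2021, 1, 13, 12, 5)
service_restored_at = datetime(2021, 1, 13, 13, 35)

rpo = achieved_rpo(last_replicated_at, incident_at)
rto = achieved_rto(incident_at, service_restored_at)

# Hypothetical targets, not taken from the working group's proposal.
print("RPO met:", meets_target(rpo, timedelta(minutes=10)))
print("RTO met:", meets_target(rto, timedelta(hours=1)))
```

In this made-up timeline, five minutes of data fall inside the loss window and the service is restored after ninety minutes, so the RPO check passes while the RTO check fails.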
This working group is working towards the proposed targets for both RPO and RTO.
Sequence Order Of Deliverables
Set up a multi-node Geo site on staging for the next iterations of failover tests.
Define a roadmap containing identified gaps and what is needed to provide the necessary failover functionality for GitLab.com production scale.
Regularly plan and execute failover tests on the staging secondary Geo site.
Demonstrate ability to execute a successful full failover of Staging.
A design of how GitLab Geo would be used in production in the form of a blueprint and readiness review.
Ensure that the cost is kept in check with the proposed design.
Decide on go/no-go for production rollout based on the proposed design.
Create and update a single handbook page, and deprecate resources in other locations.
2020-11-30 Plan and execute a test of a staging failover leveraging GitLab Geo by 2020-11-30 with minimal disruption to the existing deployment and testing processes.
2021-01-13 Execute a follow-up test of a staging failover, automating the testing and tooling processes.
Generated a proposal and received approval for building out a staging secondary site.
Evaluated the cost impact and received approval for a secondary site for production starting September 2021.
Defined the DR flow on GitLab.com and the need to find a balanced solution to ensure a fully operational site after failover.
Roles and Responsibilities
Working Group Role
VP of Infrastructure
Director of Infrastructure, Reliability
Principal Product Manager, Enablement
Senior Product Manager, Geo
Director of Engineering, Enablement
Data Analyst, Infrastructure
Backend Engineering Manager, Geo
Senior Software Engineer in Test, Geo