This working group will determine what is needed to leverage GitLab Geo as a mechanism for building reliable and predictable disaster recovery into GitLab.com production.
Scope and Definitions
In the context of this working group:
Recovery Point Objective (RPO): the targeted maximum duration of time over which data might be lost due to a major incident.
Recovery Time Objective (RTO): the targeted duration of time and service level within which a business process must be restored after a disaster to avoid unacceptable consequences of a break in business continuity.
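To make the two objectives concrete, here is a minimal sketch that checks a single failover test against RPO and RTO targets. The target values, the function name `evaluate_failover`, and the example timestamps are hypothetical placeholders, not figures from this working group. The sketch assumes that any data written after the last replicated point is lost. It also measures only elapsed time and ignores the service-level part of the RTO definition.

```python
from datetime import datetime, timedelta

# Hypothetical targets for illustration only; not values set by this working group.
RPO_TARGET = timedelta(hours=1)
RTO_TARGET = timedelta(hours=8)


def evaluate_failover(last_replicated: datetime,
                      incident_start: datetime,
                      service_restored: datetime) -> dict:
    """Compare one failover test against the hypothetical RPO/RTO targets."""
    # Assumption: everything written after the last replicated point is lost.
    data_loss_window = incident_start - last_replicated
    # Downtime runs from the start of the incident until service is restored.
    downtime = service_restored - incident_start
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= RPO_TARGET,
        "rto_met": downtime <= RTO_TARGET,
    }


# Illustrative test run: replication lagged 10 minutes, restore took 90 minutes.
result = evaluate_failover(
    last_replicated=datetime(2024, 1, 1, 11, 50),
    incident_start=datetime(2024, 1, 1, 12, 0),
    service_restored=datetime(2024, 1, 1, 13, 30),
)
print(result["rpo_met"], result["rto_met"])  # True True
```

Keeping RPO and RTO as separate checks reflects that they fail independently. A failover can restore service quickly but with too much replication lag, or it can lose no data and still take too long.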
Sequence of Deliverables
Execute a third successful failover test for the current single-node Geo site, addressing issues from previous failover tests.
Set up a multi-node Geo site on staging for the next iterations of failover tests.
Iteratively plan and execute failover tests on the multi-node Geo site.
Demonstrate the ability to execute a successful full failover of staging.
From the failover test results, develop a gap analysis of what is needed to provide the necessary failover functionality for GitLab.com production.
Define a roadmap for how and when the identified gaps will be addressed.
Produce a design, in the form of a blueprint and readiness review, for how GitLab Geo would be used in production.