Disaster Recovery Working Group

Attributes

| Property | Value |
| --- | --- |
| Date Created | November 11, 2020 |
| End Date | TBD |
| Slack | #wg_disaster-recovery (only accessible from within the company) |
| Google Doc | Working Group Agenda (only accessible from within the company) |
| Issue Board | Working Group Issue Board |
| Epic | Link |

Charter

This working group will determine what is needed to leverage GitLab Geo as a mechanism for building reliable and predictable disaster recovery into GitLab.com production.

Scope and Definitions

In the context of this working group:

  1. Recovery Point Objective (RPO): the targeted maximum duration of time over which data might be lost due to a major incident.
  2. Recovery Time Objective (RTO): the targeted duration of time and service level within which a business process must be restored after a disaster to avoid unacceptable consequences of a break in business continuity.
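
To make the two definitions concrete, here is a minimal sketch of how the results of a failover test could be compared against an RPO and an RTO. All names (`FailoverTest`, `data_loss_window`, `recovery_duration`, `meets_objectives`) and every timestamp and target value are hypothetical illustrations, not part of the working group's tooling or agreed objectives.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FailoverTest:
    """Timestamps recorded during one hypothetical failover test."""
    last_replicated_at: datetime  # last write confirmed on the secondary site
    incident_at: datetime         # moment the primary became unavailable
    restored_at: datetime         # moment service was restored on the secondary


def data_loss_window(test: FailoverTest) -> timedelta:
    """Span of writes that could be lost; this is what the RPO bounds."""
    return test.incident_at - test.last_replicated_at


def recovery_duration(test: FailoverTest) -> timedelta:
    """Time from incident to restored service; this is what the RTO bounds."""
    return test.restored_at - test.incident_at


def meets_objectives(test: FailoverTest, rpo: timedelta, rto: timedelta) -> bool:
    """True only if both the data-loss window and the recovery time are within target."""
    return data_loss_window(test) <= rpo and recovery_duration(test) <= rto


# Illustrative values only; these are assumptions, not measured results.
example = FailoverTest(
    last_replicated_at=datetime(2021, 1, 13, 11, 58, 30),
    incident_at=datetime(2021, 1, 13, 12, 0, 0),
    restored_at=datetime(2021, 1, 13, 12, 40, 0),
)
example_rpo = timedelta(minutes=5)  # assumed target
example_rto = timedelta(hours=1)    # assumed target
example_ok = meets_objectives(example, example_rpo, example_rto)
```

In this sketch the two objectives are checked independently, so a test that restores service quickly but loses more data than the RPO allows would still be reported as failing.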

Sequence Order Of Deliverables

In Progress:

  1. Execute a third successful failover test for the current single-node Geo site, addressing issues from previous failover tests.
  2. Set up a multi-node Geo site on staging for the next iterations of failover tests.
  3. Iteratively plan and execute failover tests on the multi-node Geo site.
  4. Demonstrate the ability to execute a successful full failover of staging.
  5. From the failover test results, develop a gap analysis of what would be needed to provide the necessary failover functionality for GitLab.com production.
  6. Define a roadmap of how and when the identified gaps will be addressed.
  7. Produce a design, in the form of a blueprint and readiness review, of how GitLab Geo would be used in production.
  8. Create and update a single handbook page, and deprecate resources in other locations.

Completed:

  1. 2020-11-30: Plan and execute a test of a staging failover leveraging GitLab Geo by 2020-11-30, with minimal disruption to the existing deployment and testing processes.
  2. 2021-01-13: Execute a follow-up test of a staging failover, automating the testing and tooling processes.

Roles and Responsibilities

| Working Group Role | Person | Title |
| --- | --- | --- |
| Executive Stakeholder | Steve Loyd | VP of Infrastructure |
| Facilitator/DRI | Brent Newton | Director of Infrastructure, Reliability |
| Functional Lead | Andrew Thomas | Principal Product Manager, Enablement |
| Functional Lead | Fabian Zimmer | Senior Product Manager, Geo |
| Functional Lead | Marin Jankovski | Sr Engineering Manager, Infrastructure, Delivery & Scalability |
| Member | Chun Du | Director of Engineering, Enablement |
| Member | Davis Townsend | Data Analyst, Infrastructure |
| Member | Henri Philipps | Senior Site Reliability Engineer |
| Member | Jennie Louie | Software Engineer in Test, Geo |
| Member | Nick Nguyen | Backend Engineering Manager, Geo |
| Member | Nick Westbury | Senior Software Engineer in Test, Geo |