Disaster Recovery Working Group

Date Created: November 11, 2020
End Date: TBD
Slack: #wg_disaster-recovery (only accessible from within the company)
Google Doc: Working Group Agenda (only accessible from within the company)

This working group will determine what is needed to leverage GitLab Geo as a mechanism for building reliable and predictable disaster recovery into production.

Scope and Definitions

In the context of this working group,

  1. Recovery Point Objective (RPO): the targeted maximum period of time during which data might be lost due to a major incident.
  2. Recovery Time Objective (RTO): the targeted duration of time, and the service level, within which a business process must be restored after a disaster to avoid unacceptable consequences of a break in business continuity.

Deliverables in Sequence

  1. A planned and executed test failover of staging using GitLab Geo by 2020-11-30, with minimal disruption to the existing deployment and testing processes
  2. An evaluation of that failover in the form of a gap analysis identifying what is needed to provide the necessary failover functionality in production
  3. A roadmap of how and when the identified gaps will be addressed
  4. Successive Geo failovers in staging that result in a successful full failover of staging
  5. A design, in the form of a blueprint and readiness review, of how GitLab Geo would be used in production

Roles and Responsibilities

Working Group Role      Person            Title
Executive Stakeholder   Steve Loyd        VP of Infrastructure
Facilitator/DRI         Brent Newton      Director of Infrastructure, Reliability
Functional Lead         Andrew Thomas     Principal Product Manager, Enablement
Functional Lead         Fabian Zimmer     Senior Product Manager, Geo
Functional Lead         Marin Jankovski   Sr Engineering Manager, Infrastructure, Delivery & Scalability
Member                  Chun Du           Director of Engineering, Enablement
Member                  Henri Philipps    Senior Site Reliability Engineer
Member                  Nick Nguyen       Backend Engineering Manager, Geo
