🚨 Disaster recovery
Introduction and how you can help
GitLab installations hold business-critical information and data. The Disaster Recovery (DR) category helps our customers fulfill their business continuity plans by creating processes that allow the recovery of GitLab following a natural or human-created disaster. Disaster recovery complements GitLab's High Availability configuration and utilizes Geo nodes to enable a failover in a disaster situation. We want disaster recovery to be robust and easy to use for systems administrators - especially in a potentially stressful recovery situation.
⚠️ Currently, there are some limitations on what data is replicated. Please make sure to check the documentation!
Please reach out to Fabian Zimmer, Product Manager for the Geo group (Email), if you'd like to provide feedback or ask any questions related to this product category.
This strategy is a work in progress, and everyone can contribute:
- Please comment and contribute in the linked issues and epics on this page. Sharing your feedback directly on GitLab.com is the best way to contribute to our strategy and vision.
🔭 Where we are Headed
Setting up a disaster recovery solution for GitLab requires significant investment and is cumbersome in more complex setups, such as high availability configurations. Geo doesn't replicate all parts of GitLab yet, which means that users need to be aware of what they can recover in case of disaster.
In the future, our users should be able to use a GitLab Disaster Recovery solution that fits within their business continuity plan. Users should be able to choose which Recovery Time Objective (RTO) and Recovery Point Objective (RPO) are acceptable to them and GitLab's DR solutions should provide configurations that fit those requirements.
A systems administrator should be able to confidently set up a DR solution even when the setup is complex, as is the case for high availability. In case of an actual disaster, a systems administrator should be able to follow a simple and clear set of instructions that allows them to recover a working GitLab installation. In order to ensure that DR works, failovers should be tested frequently.
- Disaster Recovery should cover different scenarios based on acceptable Recovery Time Objective (RTO) and Recovery Point Objective (RPO). There is always a trade-off between the RTO/RPO requirements and the complexity of the system needed to meet them. GitLab's DR strategies should make this trade-off explicit to users.
- Disaster Recovery should clearly define which data is replicated and why it is relevant for customers.
- Disaster Recovery should by default allow the recovery of all customer relevant data that was available on the production instance. Users should not need to think about caveats or exclusions.
- Disaster Recovery procedures in case of an actual disaster should be as simple as possible. All instructions should fit on one laptop screen (fewer than 10 steps) and be linear and easy to follow.
- Setting up DR solution(s) should be simple and clearly explain the trade-offs for users.
- Disaster Recovery should allow for frequent failover testing that ensures DR is fully functional.
- Disaster recovery should integrate into a more holistic approach that includes High Availability and Geo-distributed configurations.
- Disaster recovery should be complemented by monitoring that can detect a potential disaster.
- The Disaster Recovery solution is actively used on GitLab.com to ensure that all best practices are followed and to ensure that we dogfood our own solutions.
- Disaster recovery solutions should scale from small installations with hundreds of users to extremely large installations with millions of users.
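To make the RTO/RPO trade-off above concrete, here is a small, hypothetical Python sketch (not part of GitLab, and purely illustrative) of how a failover test's measured downtime and data-loss window could be checked against a chosen business continuity target:

```python
from dataclasses import dataclass


@dataclass
class ContinuityTarget:
    """Hypothetical business-continuity objectives, in seconds."""
    rto: float  # Recovery Time Objective: max tolerated downtime
    rpo: float  # Recovery Point Objective: max tolerated data-loss window


def meets_target(target: ContinuityTarget,
                 measured_downtime: float,
                 measured_data_loss: float) -> bool:
    """Return True if a failover test stayed within both objectives."""
    return (measured_downtime <= target.rto
            and measured_data_loss <= target.rpo)


# Example: a target of 1 hour RTO and 5 minutes RPO.
target = ContinuityTarget(rto=3600, rpo=300)

# 30 minutes of downtime, 1 minute of lost data: within both objectives.
print(meets_target(target, measured_downtime=1800, measured_data_loss=60))   # True
# Same downtime, but 15 minutes of lost data exceeds the RPO.
print(meets_target(target, measured_downtime=1800, measured_data_loss=900))  # False
```

In practice, a DR configuration with a tighter RPO (less data loss) generally requires more complex, continuously replicating infrastructure; this sketch only shows how explicit targets make that trade-off testable.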
🎭 Target Audience and Experience
- 🙂 Minimal - Sidney can manually configure a disaster recovery solution using Geo nodes. More complex configurations, such as HA, are supported but are highly manual to set up. Some data may not be replicated. Failovers are manual.
- 😊 Viable - Sidney can configure a disaster recovery setup in which all data is replicated. HA configurations are fully supported.
- 😁 Complete - Sidney can choose between different configurations that clearly link back to suggested RTO and RPO requirements. Configuration is simple and all solutions are constantly monitored. A dashboard informs users of the current status. The recovery process takes fewer than 10 steps.
- 😍 Lovable - Automatic failovers are supported.
For more information on how we use personas and roles at GitLab, please click here.
🚀 What's Next & Why
- Use Geo to support the Disaster Recovery strategy on GitLab.com - we need to use our own solutions to be confident that they work properly.
- Simplify the disaster recovery process - the current process is highly manual and long - we need to make it as simple as possible for systems administrators to recover in an actual disaster.
- Create an overview of which data is replicated, verified and tested.
What is Not Planned Right Now
The GitLab DR category is not a replacement for off-site backups and we currently do not plan to include any additional backup methods into our disaster recovery category.
This category is currently at the minimal maturity level, and our next maturity target is viable (see our definitions of maturity levels).
In order to move this category from minimal to viable, one of the main initiatives is to create a simplified disaster recovery process. High Availability configurations must also be fully supported, with all data replicated.
🏅 Competitive landscape
We need to understand the current DR landscape better and are actively engaging in customer meetings to learn which features are required to move DR forward. We also need to interact more closely with analysts to understand the landscape better.
Top Customer Success/Sales issue(s)
🎢 Top user issues
🦊 Top internal customer issues/epics
Top Strategy Item(s)