The following page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features or functionality remain at the sole discretion of GitLab Inc.
Last updated: 2020-12-23
GitLab installations hold business critical information and data. The Disaster Recovery (DR) category helps our customers fulfill their business continuity plans by creating processes that allow the recovery of a GitLab instance following a natural or human-caused disaster. DR complements GitLab's Reference Architectures and utilizes Geo nodes to enable a failover in a disaster situation. We want DR to be robust, complete and easy to use for systems administrators - especially in a potentially stressful recovery situation.
Please reach out to Fabian Zimmer, Product Manager for the Geo group (Email) if you'd like to provide feedback or ask any questions related to this product category.
This strategy is a work in progress, and everyone can contribute. Please comment and contribute in the linked issues and epics on this page. Sharing your feedback directly on GitLab.com is the best way to contribute to our strategy and vision.
⚠️ Currently, there are some limitations on what data is replicated. Please make sure to check the documentation!
Setting up a Disaster Recovery solution for GitLab requires significant investment and is cumbersome in more complex setups. Geo replicates around 80% of GitLab's data, which means that systems administrators need to be aware of what is covered automatically and which parts need to be backed up separately, for example via rsync. Geo provides documentation for planned and unplanned failover processes.
The Geo group completed all work required to increase the category from minimal to viable. After evaluating user experience scores, we will decide if the maturity of Disaster Recovery is consistent with viable maturity.
In the future, our users should be able to use a GitLab Disaster Recovery solution that fits within their business continuity plan. Users should be able to choose which Recovery Time Objective (RTO) and Recovery Point Objective (RPO) are acceptable to them and GitLab's DR solution should provide configurations that fit those requirements.
A systems administrator should be able to confidently set up a DR solution even when the setup is complex, as is the case for reference architectures that support several thousand users. In the case of an actual disaster, a systems administrator should be able to follow a simple and clear set of instructions to recover a working GitLab installation - in most cases, a failover should be fully automatic and require minimal user intervention!
In order to ensure that DR works, failovers should be tested on a regular basis with minimal interruption to end-users.
We envision that GitLab's Disaster Recovery processes and solution should
For more information on how we use personas and roles at GitLab, please click here.
A complete overview of work required to reach complete maturity is available in the Disaster Recovery Complete maturity epic.
As part of a planned failover process, the primary and secondary Geo sites must be fully synchronized so that no data is lost. We are working on a maintenance mode that blocks write operations on the primary site, allowing both sites to get fully in sync. A maintenance period may also be useful in other situations, e.g. during upgrades or other infrastructure changes.
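As a hedged sketch of how such a toggle could be driven once it ships, assuming maintenance mode lands as an application setting exposed through the REST API (the host, token, and message below are placeholders, not a confirmed interface):

```shell
# Hypothetical sketch: enabling a maintenance mode via the application
# settings API before a planned failover. Host and token are placeholders.
GITLAB_URL="https://gitlab.example.com"
curl --request PUT \
  --header "PRIVATE-TOKEN: ${ADMIN_TOKEN}" \
  "${GITLAB_URL}/api/v4/application/settings?maintenance_mode=true&maintenance_mode_message=Planned+failover+in+progress"
```

Write operations on the primary would then be rejected while replication catches up, after which the failover can proceed without data loss.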
As of December 2020, Geo replicates 80% of all data; however, not all data is automatically verified. We've created a self-service framework that supports replication strategies for Git repositories and blobs (files). We are adding blob verification support to the framework, with package files being supported first.
The Geo Primary site supports high-availability configuration of PostgreSQL using Patroni; however, Geo secondary sites have only experimental support for a similar configuration. This is problematic because it means that a failover to a designated secondary site can't utilize a high-availability configuration immediately. Exactly mirroring the architecture on the primary and secondary site is not yet possible. We are working on making PostgreSQL clusters generally available.
Based on feedback from systems administrators, we are going to design and implement a new Geo overview page.
GitLab.com is by far the largest GitLab instance and is used by GitLab to dogfood GitLab itself. GitLab.com does not use GitLab Geo for DR purposes. This has many disadvantages and the Geo Team is working with Infrastructure to enable Geo on GitLab.com.
We currently don't plan to replace PostgreSQL with a different database, e.g. CockroachDB.
This category is currently at the viable maturity level, and our next maturity target is complete (see our definitions of maturity levels).
In order to move this category from minimal to viable, we finished all work in the viable maturity epic. We are going to evaluate UX scores and then update the maturity level.
We currently track the total number of replication events, which scales with the overall amount of data and our ability to replicate more data types.
GitHub Enterprise Server 2.22 supports a passive replica server that can be used for disaster recovery purposes.
We do need to interact more closely with analysts to understand the landscape better.