
Category Strategy - Disaster Recovery

🚨 Disaster Recovery

Last updated: 2021-06-22

Introduction and how you can help

GitLab installations hold business critical information and data. The Disaster Recovery (DR) category helps our customers fulfill their business continuity plans by creating processes that allow the recovery of a GitLab instance following a natural or human-caused disaster. DR complements GitLab's Reference Architectures and utilizes Geo sites to enable a failover in a disaster situation. We want DR to be robust, complete and easy to use for systems administrators - especially in a potentially stressful recovery situation.

Please reach out to Nick Nguyen, Acting Product Manager for the Geo group (Email), if you'd like to provide feedback or ask any questions related to this product category.

This strategy is a work in progress, and everyone can contribute. Please comment and contribute in the linked issues and epics on this page; sharing your feedback directly there is the best way to contribute to our strategy and vision.


⚠️ Currently, there are some limitations on what data is replicated. Please make sure to check the documentation!

Setting up a Disaster Recovery solution for GitLab requires significant investment and is cumbersome in more complex setups. Geo replicates roughly 86% of GitLab's data (as of June 2021), so systems administrators need to be aware of what is covered automatically and which parts must be backed up separately, for example via rsync. Geo provides documentation for both planned and unplanned failover processes.

Where we are headed

In the future, our users should be able to use a GitLab Disaster Recovery solution that fits within their business continuity plan. Users should be able to choose which Recovery Time Objective (RTO) and Recovery Point Objective (RPO) are acceptable to them and GitLab's DR solution should provide configurations that fit those requirements.
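To make the two objectives concrete: RPO bounds how much recent data may be lost (how far behind the secondary may be), while RTO bounds how long recovery may take. The sketch below is illustrative only; the function names and thresholds are assumptions, not part of any GitLab tooling.

```python
from datetime import datetime, timedelta

def meets_rpo(last_replicated_at: datetime, now: datetime,
              rpo: timedelta) -> bool:
    """True if replication lag is within the RPO target.

    If the secondary last caught up `lag` ago, failing over now
    loses at most `lag` worth of changes.
    """
    lag = now - last_replicated_at
    return lag <= rpo

def meets_rto(failover_duration: timedelta, rto: timedelta) -> bool:
    """True if the measured failover time is within the RTO target."""
    return failover_duration <= rto

now = datetime(2021, 6, 22, 12, 0, 0)
# 5 minutes of lag against a 10-minute RPO target: acceptable.
print(meets_rpo(now - timedelta(minutes=5), now, timedelta(minutes=10)))
# A 2-hour failover against a 1-hour RTO target: not acceptable.
print(meets_rto(timedelta(hours=2), timedelta(hours=1)))
```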

A systems administrator should be able to confidently set up a DR solution even when the setup is complex, as is the case for reference architectures that support several thousand users. In case of an actual disaster, a systems administrator should be able to follow a simple and clear set of instructions to recover a working GitLab installation; in most cases, a failover should be largely automatic and require minimal user intervention.

In order to ensure that DR works, failovers should be tested on a regular basis with minimal interruption to end-users.

We envision that GitLab's Disaster Recovery processes and solution should be robust, complete, and easy to use for systems administrators, and should fit within our customers' business continuity plans.

Target audience and experience

Sidney (Systems Administrator)

For more information on how we use personas and roles at GitLab, see our personas documentation.

What's Next & Why

A full overview of the work required to reach the Complete maturity level is available in the Disaster Recovery Complete maturity epic.

Improved data verification

As of June 2021, Geo replicates ~86% but verifies only ~48% of all data. We've created a self-service framework that supports replication strategies for Git repositories and blobs (files), and we are adding blob verification support to the framework, with package files supported first.
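At its core, verification means confirming that the copy on the secondary is byte-identical to the primary by comparing checksums computed independently on each side. The sketch below illustrates that idea with SHA-256; it is a simplification, not Geo's actual implementation.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Hex digest of the data, as a stand-in for a per-record checksum."""
    return hashlib.sha256(data).hexdigest()

def replica_verified(primary_blob: bytes, secondary_blob: bytes) -> bool:
    """A replica counts as verified when both sides compute the
    same checksum over their local copy of the data."""
    return checksum(primary_blob) == checksum(secondary_blob)

# An intact replica verifies; a corrupted or stale one does not.
print(replica_verified(b"package-file-v1", b"package-file-v1"))
print(replica_verified(b"package-file-v1", b"package-file-v1-corrupt"))
```

Replication alone only confirms that a transfer happened; verification is what catches silent corruption or incomplete writes after the fact.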

Simplifying Promotion of Secondary Sites

It is currently possible to promote a secondary site to a primary site, either during a planned failover or in a genuine disaster recovery situation. Geo supports promotion for a single-node installation and for an HA configuration. The current promotion process consists of a large number of manual preflight checks, followed by the actual promotion. Promotion is only possible on the command line; there is no UI flow, and for high-availability configurations, modifications to the gitlab.rb file are required on almost all nodes. Given the critical nature of this process, Geo should make it simple to promote a secondary site, especially in more complex high-availability configurations.
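As a rough sketch of that command-line flow for a single-node secondary, the documented steps around this time looked like the following; exact commands and gitlab.rb settings vary by version, so always follow the official disaster recovery documentation rather than this outline.

```shell
# Sketch only: run on the secondary node being promoted.
# Verify each step against the Geo disaster recovery docs
# for your GitLab version before running in production.

# 1. Run the manual preflight checks.
sudo gitlab-ctl promotion-preflight-checks

# 2. Promote the secondary node to primary.
sudo gitlab-ctl promote-to-primary-node

# 3. For high-availability configurations, the Geo role must also be
#    flipped in /etc/gitlab/gitlab.rb on the relevant nodes, e.g.:
#      geo_secondary_role['enable'] = false
#      geo_primary_role['enable']   = true
#    followed by a reconfigure on each node:
sudo gitlab-ctl reconfigure
```

It is the combination of manual checks, per-node configuration edits, and reconfigures that makes the HA promotion path error-prone under pressure, which is why simplifying it is a priority.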

Migrating existing data types to the self-service framework

Some of our existing data types, such as LFS objects, do not yet use the self-service framework. We are migrating these data types over to reduce technical debt and so that all data types can benefit from new features added to the framework.

In a year

Enable Geo on GitLab.com for Disaster Recovery

GitLab.com is by far the largest GitLab instance and is used by GitLab to dogfood GitLab itself. GitLab.com does not currently use GitLab Geo for DR purposes. This has many disadvantages, and the Geo team is working with Infrastructure to enable Geo on GitLab.com.

What is not planned right now

We currently don't plan to replace PostgreSQL with a different database, such as CockroachDB.

Maturity plan

This category is currently at the viable maturity level, and our next maturity target is complete (see our definitions of maturity levels).

In order to move this category from viable to complete, we are working on all items in the complete maturity epic.


We currently track the total number of replication events, which scales with the overall amount of data and our ability to replicate more data types.

Competitive landscape

GitHub Enterprise Server 2.22 supports a passive replica server that can be used for disaster recovery purposes.

Analyst landscape

We do need to interact more closely with analysts to understand the landscape better.

Top customer success/sales issue(s)

Top user issues

Top internal customer issues/epics

Top strategy item(s)

Git is a trademark of Software Freedom Conservancy and our use of 'GitLab' is under license