
Category Strategy - Backup and Restore

🗄 Backup and Restore

Last updated: 2020-05-05

Introduction and how you can help

Please reach out to Fabian Zimmer, Product Manager for the Geo group, if you'd like to provide feedback or ask any questions related to this product category.

This strategy is a work in progress, and everyone can contribute:

⚠️ The Geo group is currently focusing on improving GitLab's Disaster Recovery capabilities. We support backup and restore on a best-effort basis. This means we fix bugs in line with our SLOs, but major feature improvements will likely happen later, in Q3/Q4, depending on overall prioritization.

Overview

GitLab already supports backup and restore procedures that rely on standard Unix tools, such as rsync and tar. By default, backups cover most data but not GitLab’s configuration. For GitLab instances that contain several hundred gigabytes or even terabytes of data, the current solution does not scale well. Backing up or restoring such a GitLab instance can take many hours.

Target audience and experience

Backup and restore tools are primarily used by Sidney (Systems Administrator).

Backups should be complete, easy to create, automate, and restore. They also need to complete as fast as possible, for example, to support point in time recovery.

What's Next & Why

We are going to assess all currently open issues to define the priorities further; these items reflect our current thinking and still need to be validated.

GitLab backups should scale well

GitLab is used by customers with thousands of users and terabytes of data. Backups should be performant at any scale. This means that GitLab should support not only base backups but also incremental backups. Copying terabytes of data every time a backup is performed is not efficient at this scale and can take many hours. Backups should also be agnostic with regard to the backend: local storage, cloud storage, etc. should all be configurable.

Currently, GitLab uses rsync to create backups, and we should investigate alternatives, e.g. restic, to see whether they help address some of these concerns.
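To illustrate the difference between a base backup and an incremental backup, here is a minimal Python sketch. It is not how GitLab, rsync, or restic implement backups; the function names (`file_digest`, `build_manifest`, `incremental_backup`) and the manifest format are hypothetical and chosen only for illustration. The idea is that each run records a content digest per file, and the next run copies only files whose digest is new or different.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(source: Path) -> dict[str, str]:
    """Map each file path (relative to source) to its content digest."""
    return {
        p.relative_to(source).as_posix(): file_digest(p)
        for p in sorted(source.rglob("*"))
        if p.is_file()
    }


def incremental_backup(
    source: Path, dest: Path, previous: dict[str, str]
) -> dict[str, str]:
    """Copy only files that are new or changed since `previous`.

    Returns the manifest of the current state, to be passed as
    `previous` on the next run. An empty `previous` yields a base backup.
    Files deleted from `source` are simply absent from the new manifest.
    """
    current = build_manifest(source)
    for rel, digest in current.items():
        if previous.get(rel) != digest:
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source / rel, target)
    return current
```

Calling `incremental_backup(source, dest, {})` copies everything (a base backup); passing the returned manifest into the next call copies only what changed. This sketch still reads every file to compute its digest, so it reduces the amount of data written but not the amount read; production tools typically avoid that cost with additional techniques that are out of scope here.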

In three years

Backup plans should be part of all reference architectures and should be enabled by default. If an architecture follows a defined deployment model, e.g. via Terraform, a GitLab backup should be able to recreate the entire architecture, configuration, and data from scratch.

Systems administrators should be able to use a complete UI and CLI that gives them access to all backup related functions.

In one year

Backups should be incremental and work better at scale. We should be able to utilise a Geo secondary.

Backup and Restore via the UI

Currently, all backup tasks are performed via Rake tasks on the command line. This is not necessarily a problem, but it can result in low visibility and requires systems administrators to switch between different user interfaces. There is no place in the GitLab user interface that allows systems administrators to access backups.

GitLab should offer a dedicated Backup and Restore section that allows some of these functionalities:

Restoring specific data should be easy

Sometimes, a user may remove a single project by accident. In those cases, it may be desirable to restore only individual items from the backup. This should ideally be possible via the UI and can be performed by a systems administrator.

Backing up from a Geo instance

Sometimes a GitLab primary instance is under pressure from heavy usage, and backing up may add load that is not desirable. As Geo becomes more complete, a secondary will contain most if not all data from a primary. This means backups should be able to run on a secondary, thereby reducing the pressure on the primary. This is especially desirable for the PostgreSQL database.

What is not planned right now

We are just in the process of defining the product direction and are not in a position to answer this yet.

Maturity plan

This category is currently at the minimal maturity level, and our next maturity target is complete (see our definitions of maturity levels).

We are still investigating what is required to move the category from minimal to complete. You can track the work in the viable maturity epic.

User success metrics

Why is this important?

GitLab is a crucial tool for many customers, and if GitLab does not handle backups, customers will be forced to implement their own strategies. This is not efficient and leads to a very heterogeneous landscape that is difficult to maintain and support. GitLab should offer backup and restore capabilities for any scale and offer clear guidance for all reference architectures. This is important because it enables customers to fit GitLab into their business continuity plans.

Competitive landscape

All major competitors offer backup solutions for their products.

Analyst landscape

We do need to interact more closely with analysts to understand the landscape better.

Top customer success/sales issue(s)

Top user issues

Top internal customer issues/epics

Not yet available.

Top strategy item(s)
