
Category Strategy - Backup and Restore

🗄 Backup and Restore

Last updated: 2021-06-22

Introduction and how you can help

Please reach out to Nick Nguyen, Acting Product Manager for the Geo group, by email if you'd like to provide feedback or ask any questions related to this product category.

This strategy is a work in progress, and everyone can contribute. Please comment and contribute in the linked issues and epics on this page. Sharing your feedback directly on GitLab.com is the best way to contribute to our strategy and vision.

Backups are incomplete: some data is not yet included. Please review the list of excluded data and take manual steps to back it up.

⚠️ The Geo group is focusing on improving GitLab's Disaster Recovery capabilities. We support backup and restore on a best-effort basis: we fix bugs in line with our SLOs, but we do not have capacity for major feature improvements, such as incremental backups. We will re-evaluate the priority of this work in Q3 FY22.

Overview

GitLab supports backup and restore procedures that rely on standard Unix tools, such as rsync and tar. By default, backups cover most data but not GitLab’s configuration. For GitLab instances that hold several hundred gigabytes or even terabytes of data, the current solution does not scale well: backing up or restoring such an instance can take many hours.

Why is this important?

GitLab is a crucial tool for many customers, and backups are a must-have for self-managed users. Because we lack a scalable backup solution, our largest and most valuable customers have to spend time implementing their own. This is inefficient and leads to a very heterogeneous landscape that is difficult to maintain and support. GitLab should offer backup and restore capabilities at any scale and provide clear guidance for all reference architectures.

Target audience and experience

Backup and restore tools are primarily used by Sidney (Systems Administrator).

Backups should be complete, easy to create, automate, and restore. They also need to complete as fast as possible.

What's Next & Why

In Q3 FY22 we will re-evaluate the priority of the backup and restore category and assess all currently open issues to refine our priorities.

In order to reach viable maturity, the following issues need to be addressed.

Add support for more data types in backup/restore

Our backups are incomplete: we don’t back up important data such as package files and Terraform state files. This is unacceptable for a backup/restore solution. Newly added data types are not supported automatically; each one requires additional work. Dedicated backup/restore engineers should be available to provide guidance and potentially implement a framework similar to Geo’s self-service framework.

GitLab backups should scale well

GitLab is used by customers with thousands of users and terabytes of data. Backups should be fast at any scale, which means GitLab should support incremental backups in addition to base (full) backups. At this scale, copying terabytes of data every time a backup is performed is inefficient and can take many hours. Backups should also be agnostic to the storage backend: local storage, cloud storage, and so on should all be configurable.
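To illustrate the difference between a base backup and an incremental one, here is a minimal Python sketch. It is not how GitLab backs up data today (GitLab has no incremental backups); it assumes plain files under a single directory and a manifest of SHA-256 digests kept from the previous run, and the names `build_manifest` and `incremental_backup` are hypothetical. The first run, with an empty manifest, copies everything; later runs copy only new or changed files and report deletions.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

# Hypothetical illustration only: not GitLab's backup implementation.
# A "manifest" maps each file's relative path to its SHA-256 content digest.
Manifest = dict[str, str]


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> Manifest:
    """Hash every regular file under `root`, keyed by its POSIX relative path."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def incremental_backup(
    source: Path, dest: Path, previous: Manifest
) -> tuple[Manifest, list[str], list[str]]:
    """Copy into `dest` only the files that are new or changed since `previous`.

    With an empty `previous` manifest every file is copied (a base backup).
    Returns (new manifest, copied paths, paths deleted since the last backup).
    """
    current = build_manifest(source)
    copied = [rel for rel, digest in current.items() if previous.get(rel) != digest]
    deleted = sorted(set(previous) - set(current))
    for rel in copied:
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source / rel, target)
    return current, copied, deleted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "data")
        src.mkdir()
        (src / "a.txt").write_text("alpha")
        (src / "b.txt").write_text("beta")
        base, copied, _ = incremental_backup(src, Path(tmp, "base"), {})
        print("base backup copied:", copied)

        (src / "b.txt").write_text("beta v2")  # changed
        (src / "c.txt").write_text("gamma")    # new
        (src / "a.txt").unlink()               # deleted
        _, copied, deleted = incremental_backup(src, Path(tmp, "incr-1"), base)
        print("incremental copied:", copied, "deleted:", deleted)
```

Run as a script, the demo copies `a.txt` and `b.txt` in the base backup, then only `b.txt` and `c.txt` in the incremental run, reporting `a.txt` as deleted. Hashing every file still reads all data on each run, so at terabyte scale a real design would probably need cheaper change detection (for example, comparing size and modification time); the sketch only shows why copying just the delta saves time and storage.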

GitLab's backups also don't support backing up pool repositories, which makes it highly inefficient to back up instances with many forks.

Future Opportunities

Increase usability

Our current backup system is command-line only. A Backup/Restore admin interface would allow systems administrators to manage backups from within GitLab’s UI, increasing the visibility of backups and the overall usability of GitLab. We also lack any alerting or monitoring to discover backup problems. We could significantly improve management of cloud-based storage and integrate backups as part of GitLab Plus.

Additional capabilities

GitLab lacks support for many backup features, including incremental backups, selective restore, and default encryption. Adding these features would move the maturity of our Backup and Restore category to complete. We also need to invest in performance and scalability improvements to support our largest customers (10k+). One example is backing up Git data to object storage to drastically reduce the time it takes to create a new backup.

Currently, GitLab uses rsync to create backups; we should investigate alternatives, e.g. restic, to see whether they help address some of these concerns.

Utilize Geo sites

We could establish Geo as a backup site, which reduces load on the primary site and can offer advanced selective restore functionality. For example, a customer could restore a project from a Geo secondary site. This would establish Geo as a one-stop solution for Disaster Recovery, including a warm standby and backups.

Restoring specific data should be easy

Sometimes a user may remove a single project by accident. In those cases, it may be desirable to restore only individual items from the backup. Ideally, a systems administrator should be able to do this via the UI.
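Selective restore is not available today. As a rough idea of the underlying mechanism, the Python sketch below extracts a single project from a tar archive without unpacking the rest. It assumes, hypothetically, that each project's data lives under its own path prefix in the archive; that layout and the `restore_prefix` function are illustrative only, not GitLab's actual backup format.

```python
import tarfile
import tempfile
from pathlib import Path

# Hypothetical illustration only: assumes each project's data sits under its
# own path prefix inside a plain tar archive, which is NOT a documented
# GitLab backup layout.


def restore_prefix(archive: Path, prefix: str, dest: Path) -> list[str]:
    """Extract only the members of `archive` at or below `prefix` into `dest`.

    Returns the names of the extracted members.
    """
    prefix = prefix.rstrip("/")
    with tarfile.open(archive) as tar:
        members = [
            m for m in tar.getmembers()
            if m.name == prefix or m.name.startswith(prefix + "/")
        ]
        # Use the safer "data" extraction filter where this Python supports it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, members=members, **extra)
    return [m.name for m in members]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a tiny archive holding two hypothetical project directories.
        root = Path(tmp, "src")
        for name in ("project-a.git", "project-b.git"):
            repo = root / "repositories" / "group" / name
            repo.mkdir(parents=True)
            (repo / "HEAD").write_text("ref: refs/heads/main\n")
        archive = Path(tmp, "backup.tar")
        with tarfile.open(archive, "w") as tar:
            tar.add(root / "repositories", arcname="repositories")

        # Restore only project-a; project-b is left in the archive.
        restored = restore_prefix(
            archive, "repositories/group/project-a.git", Path(tmp, "restore")
        )
        print(restored)
```

Matching on `prefix + "/"` rather than a bare string prefix keeps `project-a.git` from also pulling in a sibling such as `project-a.git-old`.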

What is not planned right now

We are in the process of defining the product direction and are not in a position to answer this yet.

Maturity plan

This category is currently at the minimal maturity level, and our next maturity target is complete (see our definitions of maturity levels).

User success metrics

To measure these success metrics, we also need to enable GitLab's usage ping and gather data specific to the backup and restore process, such as the time a backup takes to complete.

Competitive landscape

All major competitors offer backup solutions for their products. GitHub, for example, offers a more robust and scalable backup/restore solution that allows for incremental backups done on a separate host.

Analyst landscape

We do need to interact more closely with analysts to understand the landscape better.

Top customer success/sales issue(s)

Top user issues

Top internal customer issues/epics

Top strategy item(s)
