The following page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features or functionality remain at the sole discretion of GitLab Inc.
Last updated: 2022-07-05
Please reach out to Sampath Ranasinghe, Senior Product Manager for the Geo group, by email if you'd like to provide feedback or ask any questions related to this product category.
This strategy is a work in progress, and everyone can contribute. Please comment and contribute in the linked issues and epics on this page. Sharing your feedback directly on GitLab.com is the best way to contribute to our strategy and vision.
⚠️ Backups are incomplete and do not yet contain some data. Please review the list of excluded data and take manual steps to back up this data.
⚠️ The Geo group is focusing on improving GitLab's Disaster Recovery capabilities. We do support backup and restore on a best-effort basis. This means that we fix bugs in line with our SLOs, but we do not have capacity to contribute major feature improvements, such as incremental backups. We will re-evaluate the priority of this work in Q3 FY23.
GitLab supports backup and restore procedures that rely on standard Unix tools, such as rsync and tar. By default, backups cover most data but not GitLab’s configuration. For GitLab instances that contain several hundred gigabytes or even terabytes of data, the current solution does not scale well: backing up or restoring such an instance can take many hours.
GitLab is a crucial tool for many customers, and backups are a must-have for self-managed users. Our lack of a scalable backup solution forces our largest and most valuable customers to spend time implementing their own solutions. This is inefficient and leads to a very heterogeneous landscape that is difficult to maintain and support. GitLab should offer backup and restore capabilities at any scale and provide clear guidance for all reference architectures.
Jobs To Be Done (JTBD) is a framework for viewing a product in terms of the jobs customers are trying to get done. Learn more about JTBD.
In Q3 FY23, we will re-evaluate the priority of the backup and restore category and assess all currently open issues to refine its priorities.
In order to reach viable maturity, the following issues need to be addressed.
GitLab is used by customers with thousands of users and terabytes of data. Backups should be fast at any scale, which means that GitLab should support not only base backups but also incremental backups. At this scale, copying terabytes of data every time a backup is performed is inefficient and can take many hours. Backups should also be agnostic with regard to the storage backend: local storage, cloud storage, and other targets should all be configurable.
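To illustrate why incremental backups scale better than repeated full copies, here is a minimal Python sketch. It is not GitLab's implementation and makes several assumptions: a backup is a plain directory tree, and the previous run left a manifest mapping each relative file path to a SHA-256 digest. Only files whose digest changed since that manifest are copied, and files that disappeared are reported as deletions. The function names incremental_backup and file_digest and the manifest format are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def incremental_backup(
    source: Path, dest: Path, manifest: dict[str, str]
) -> tuple[dict[str, str], list[str], list[str]]:
    """Hypothetical incremental backup: copy only files changed since `manifest`.

    Returns (new_manifest, copied_paths, deleted_paths). Passing an empty
    manifest produces a full (base) backup.
    """
    new_manifest: dict[str, str] = {}
    copied: list[str] = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue  # directories are recreated implicitly below
        rel = path.relative_to(source).as_posix()
        digest = file_digest(path)
        new_manifest[rel] = digest
        if manifest.get(rel) != digest:
            # New or modified file: write it into this run's backup directory.
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(rel)
    # Files recorded last time but no longer present were deleted.
    deleted = sorted(set(manifest) - set(new_manifest))
    return new_manifest, copied, deleted
```

The first run with an empty manifest behaves like a base backup; each later run writes only what changed. Hashing still reads every file, so a production tool would more likely use cheaper change detection, such as modification times, but the sketch shows the key property: the data written per run is proportional to what changed, not to the total size of the instance.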
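GitLab's backups also don't support backing up pool repositories, the repositories that hold the Git objects shared by a project and its forks. Without them, each fork is backed up in full, which makes backing up instances with many forks highly inefficient.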
Our current backup system is command-line only. By adding a Backup/Restore admin interface, we would allow systems administrators to manage backups from within GitLab’s UI, which would make backups more visible and improve the overall usability of GitLab. We also lack any alerting or monitoring to detect backup problems. We could also significantly improve the management of cloud-based storage and integrate backups as part of GitLab Plus.
GitLab lacks support for many backup features, including incremental backups, selective restore, and encryption by default. Adding these features would let us move the maturity of our Backup and Restore capability to complete. We also need to invest in performance and scalability improvements to support our largest customers (10k+ users). One example is backing up Git data to object storage to drastically reduce the time needed to create a new backup.
Currently, GitLab uses rsync to create backups. We should investigate alternatives, such as restic, to see whether they help address some of these concerns.
We could establish Geo as a backup site, which would reduce load on the primary site and could offer advanced selective restore functionality. For example, a customer could restore a project from a Geo secondary site. This would provide an avenue to establish Geo as a one-stop solution for Disaster Recovery, including a warm standby and backups.
Sometimes, a user may remove a single project by accident. In those cases, it may be desirable to restore only individual items from the backup. Ideally, a systems administrator should be able to perform such a restore through the UI.
We are in the process of defining the product direction and are not in a position to answer this yet.
This category is currently at the minimal maturity level, and our next maturity target is viable (see our definitions of maturity levels).
To measure these success metrics, we also need to enable GitLab's usage ping and gather data specific to the backup and restore process, such as the time it takes for a backup to complete.
All major competitors offer backup solutions for their products. GitHub, for example, offers a more robust and scalable backup/restore solution that allows for incremental backups done on a separate host. There are also standalone solutions such as GitProtect.
We do need to interact more closely with analysts to understand the landscape better.