The following page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features or functionality remain at the sole discretion of GitLab Inc.
Thanks for visiting this category strategy page for GitLab Geo Replication. This page belongs to the Geo group.
Geo replication improves the GitLab experience for geographically distributed teams.
Pulling a large Git repository, or interacting with the UI, can take a long time for users located far from the main GitLab instance.
Geo-replication provides an easily configurable replica Geo site that can be deployed in additional regions to accelerate the GitLab experience for nearby users. This is achieved by replicating the entire GitLab data set to the replica site in a coordinated, verifiable and coherent manner. Data can be accessed from any of the locations, while intelligent proxying ensures that users always have access to the most recent data. Users are directed to the closest site using a location-aware URL.
Today, configuring Geo-replication requires significant effort from systems administrators and is cumbersome in more complex setups.
Please reach out to Sampath Ranasinghe, Product Manager for the Geo group, by email if you'd like to provide feedback or ask any questions related to this product category.
This strategy is a work in progress, and everyone can contribute. Please comment and contribute in the linked issues and epics on this page. Sharing your feedback directly on GitLab.com is the best way to contribute to our strategy and vision.
With more and more companies shifting towards remote work, GitLab is becoming the central place for collaboration and the DevOps Platform for many customers. GitLab should offer the same great experience to users, regardless of their location.
Geo operates transparently in the background, accelerating read operations by serving data from the closest Geo site and proxying write requests back to the primary site, all through a single URL.
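The read/write split described above can be illustrated with a minimal Python sketch of the decision a secondary site makes for each request. This is a hypothetical model, not GitLab's implementation: the `Request` type, the `route` function, treating only `GET`/`HEAD` as reads, and tracking replicated data as a set of paths are all assumptions made for illustration.

```python
from dataclasses import dataclass

# HTTP methods treated as reads in this sketch (an assumption for illustration).
READ_METHODS = {"GET", "HEAD"}


@dataclass
class Request:
    """Hypothetical, simplified request arriving at a secondary Geo site."""
    method: str
    path: str


def route(request: Request, replicated_paths: set) -> str:
    """Return "secondary" if this site can serve the request, else "primary".

    Reads of data already replicated to this site are served locally.
    Writes, and reads of data not yet replicated, are proxied to the
    primary so that users always see the most recent data.
    """
    is_read = request.method.upper() in READ_METHODS
    if is_read and request.path in replicated_paths:
        return "secondary"
    return "primary"


if __name__ == "__main__":
    local = {"/group/project"}
    print(route(Request("GET", "/group/project"), local))   # secondary
    print(route(Request("POST", "/group/project"), local))  # primary
    print(route(Request("GET", "/group/other"), local))     # primary
```

The sketch deliberately ignores replication lag: in practice, a replicated object can still be stale, which is one reason the decision needs to be smarter than a simple membership check.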
Our vision for Geo-replication is that new Geo sites can be easily deployed to new locations as required and scaled to meet the needs of these remote teams. Geo will be easier to set up and operate, especially in multi-node architectures. It will be possible to deploy a new operational Geo site within minutes, whether it is a single-node or a multi-node site.
We will achieve 100% replication and verification of all core GitLab data types by leveraging the self-service framework. The self-service framework will allow other GitLab teams to easily add new data types to Geo, ensuring that Geo evolves with GitLab to meet the needs of our customers.
Systems administrators will have end-to-end observability of the replication process for each site via the UI, with capabilities to troubleshoot and remediate failures. Geo will make it easy for systems administrators to understand which Geo operations are upcoming, in progress, completed, or failed. For failed operations, Geo will provide information and capabilities in the UI that allow the systems administrator to correct the fault.
Geo will accelerate more data types, enabling more tasks to be accomplished with data stored at the secondary sites and thus helping to redistribute load away from the primary site. Geo will continue to intelligently proxy requests back to the primary when necessary to provide access to the most up-to-date data.
Customers will be able to choose which data types are replicated to each secondary Geo site using fine-grained controls, allowing them to tailor each site to their needs.
Geo will also accelerate CI runners by allowing CI jobs to use Geo secondary sites, helping to reduce the load on the primary site.
For more information on how we use personas and roles at GitLab, refer to the GitLab personas documentation.
Geo will replicate and verify all core GitLab data types. Coverage currently stands at 90% for replication and 77% for verification. To achieve 100% replication, we plan to add support for alert metric images, issue metric images, and the Dependency Proxy.
Geo performs complex operations by scheduling background tasks that replicate data to the secondary sites. Geo will improve the observability of these operations so that systems administrators can monitor tasks that are currently running, tasks scheduled to run in the future, and tasks that have failed. For failed tasks, Geo will surface information in the UI that helps troubleshoot the root cause of the failure. Improved observability will also help systems administrators who are setting up a secondary site for the first time, giving immediate feedback on whether the setup succeeded and replication is underway. Failed replication can hinder the availability and reliability of a secondary site. Furthermore, it can reduce the chances of a successful recovery in a disaster recovery scenario, resulting in the loss of data.
Today, Geo accelerates access to a number of data types, including projects, wikis, and LFS objects, by serving read requests from the secondary site closest to the user accessing the data. Geo also replicates many other data types that it does not yet accelerate. Accelerating more of them brings two key benefits: faster access for users near a secondary site, and less load on the primary site.
We will initially focus on the high-impact data types that deliver the most value to our customers, identifying data types that are large in size and most frequently accessed, such as container registries and CI job and pipeline artifacts. As part of this effort, Geo will collect statistics on which data types are proxied from the secondary sites to the primary. This will allow us to make more informed decisions about which data types to accelerate in the future, opening up more use cases for Geo replication and helping drive adoption.
Today, Geo accelerates users by serving read requests from the secondary site. In the future, Geo should also accelerate CI jobs by enabling CI runners to use specific Geo secondary sites. This will allow CI runners to fetch data directly from the closest secondary site, reducing load on the primary site. Writes will continue to be proxied back to the primary to maintain consistency.
Setting up Geo is highly manual and cumbersome, especially in high-availability configurations. Simplifying the installation and configuration of Geo for single and multi-node sites will remove a pain point for systems administrators and help drive adoption.
To save bandwidth and resources, a systems administrator may want to selectively enable and disable Geo replication for certain data types. Currently, this is only possible when a data type is released behind a feature flag, which is not the case for all data types. We want to give administrators an easy way to enable or disable replication by data type in the Geo Administrator UI.
Configuring and managing a multi-node Geo site requires logging in to multiple nodes and performing specific steps in the correct order.
It would make administrators' jobs much easier if operations could be orchestrated from a single point of entry into the site, with the appropriate operations then performed on each node. This will lower the technical barrier to adopting Geo replication.
It is currently possible for systems administrators to get a basic overview of the Geo status using the Geo Web UI. However, administrators would like easier access to more in-depth Geo metrics such as the time it takes to mirror a commit. We want to define and implement key metrics that allow administrators to better monitor their Geo installations and publish the metrics to a preconfigured Grafana dashboard.
For Geo-replication, often only a subset of data needs to be replicated, yet a Geo site requires spinning up the entire GitLab stack when less may be sufficient. Systems administrators can already select a subset of data via selective sync, but the subset they choose may not match what users actually need.
We are investigating an advanced caching mode with the following properties:
We are currently not planning to move away from PostgreSQL as a backend database in favour of, for example, CockroachDB or Google Spanner. This has implications for writable Geo sites, but for now we will continue to support PostgreSQL.
Geo secondary sites are read-only. Customer feedback has indicated a desire for active-active Git replication. With the availability of Gitaly Cluster, we may start investigating writable Geo sites at some point in FY23.
Geo will remain an asynchronous solution with loose time constraints for replication and verification.
This category is currently at the Complete maturity level, and our next maturity target is Lovable (see our definitions of maturity levels).
You can track the work that will move the category to Lovable.
The top competitors for Geo-replication are GitHub, Azure DevOps, and Bitbucket (Smart Mirroring):
| Feature | GitHub | Azure DevOps | Bitbucket Smart Mirroring | GitLab |
|---|---|---|---|---|
| Mirror Docker registries | ❌ | N/A | ❌ | ✅ |
| LFS and file upload support | ✅ | N/A | ✅ | ✅ |

✅ Fully available; ⚠️ Partially available; ❌ Not available; N/A No information available