
Category Strategy - Geo-replication

🌏 Geo-replication

Last updated: 2020-12-23

Introduction and how you can help

The Geo-replication category helps distributed developer teams be more productive. With a single GitLab instance, working with large repositories can take a long time for developers located in different geographies. Geo-replication provides an easily configurable, read-only mirror (we call it a Geo site) of a GitLab installation that is complete, accurate, verifiable and efficient. This is valuable because using Geo reduces the time needed to fetch and clone repositories, which increases developer productivity.

Please reach out to Fabian Zimmer, Product Manager for the Geo group (Email) if you'd like to provide feedback or ask any questions related to this product category.

This strategy is a work in progress, and everyone can contribute. Please comment and contribute in the linked issues and epics on this page. Sharing your feedback directly on GitLab.com is the best way to contribute to our strategy and vision.

Overview

Geo-replication requires a significant investment from systems administrators to configure, but it allows users in different locations to accelerate Git read operations. Write requests are transparently proxied to the primary site. Geo replicates around 80% of the data generated by GitLab, and replication of a number of data types can be disabled if required.

Where we are headed

Our goal for Geo-replication is to offer the same experience to users, regardless of their location. In the future, we want our users to be able to configure Geo within minutes - not hours. We envision Geo-replication to be fully transparent to users. This means that a developer should not need to actively decide to use Geo, or select the right Geo site - GitLab should be able to determine what Geo site should be used to provide the best user experience. For systems administrators, it should be simple to add, configure and remove new sites.

Target audience and experience

Sidney (Systems Administrator)

Sasha (Software Developer)

For more information on how we use personas and roles at GitLab, please click here.

What's next & why

Instrument usage ping

We need to understand better how software developers interact with secondary sites. As a next step, we are going to track Git operations performed on secondary sites. This data will help us understand usage patterns and how we can improve the overall user experience.

Geo automatically chooses the best GitLab site

Using a Geo site to overcome UX issues (e.g. latency) requires additional configuration for software developers, which is cumbersome. Using the secondary Web interface is a worse user experience than using the primary. A software developer needs to switch between a primary and secondary frequently, which can be highly confusing and frustrating.

We plan to automatically choose the best Geo site. This means that Geo will forward any requests from a secondary to a primary unless the user experience can be significantly improved by using the secondary. This will likely result in the deprecation of the read-only web interface because requests will be proxied from a secondary to a primary.
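The forwarding rule described above can be sketched roughly as follows. This is a minimal illustration, not GitLab's implementation: the Request type, the choose_site function, the set of read operations and the secondary_is_up_to_date flag are all hypothetical names introduced here, and the assumption that only Git reads benefit from a secondary is a simplification.

```python
from dataclasses import dataclass

# Hypothetical set of Git operations that only read data.
GIT_READ_OPERATIONS = {"clone", "fetch", "pull"}


@dataclass
class Request:
    kind: str       # "git" or "web" (hypothetical labels)
    operation: str  # e.g. "clone", "push", "view_issue"


def choose_site(request: Request, secondary_is_up_to_date: bool) -> str:
    """Return "secondary" only when serving locally is likely to help.

    Everything else (writes, web requests, reads against a stale
    secondary) is forwarded to the primary.
    """
    is_git_read = (
        request.kind == "git" and request.operation in GIT_READ_OPERATIONS
    )
    if is_git_read and secondary_is_up_to_date:
        return "secondary"
    return "primary"
```

In this sketch the primary is the default, so a developer never has to pick a site: a clone is served by the secondary when it is current, while a push or a web request is always forwarded.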

Package file verification

We are working on supporting more and more data types using a self-service framework and are adding verification capabilities to the framework to ensure that data was not corrupted during transfer or at rest. Following support for package file verification, LFS files will be supported next via the self-service framework.
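As an illustration of what verification means here, the following sketch compares a checksum computed on the primary with one recomputed from the replicated copy. It assumes SHA-256 as the hash and uses hypothetical function names (checksum, verify_replica); it is not the self-service framework's actual code.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def verify_replica(primary_checksum: str, replica_data: bytes) -> bool:
    """Check that the replicated bytes match the checksum from the primary.

    A mismatch suggests corruption during transfer or at rest, in which
    case the file would need to be synced again.
    """
    return checksum(replica_data) == primary_checksum
```

The primary would store checksum(...) for each package file, and each secondary would call verify_replica(...) on its own copy after replication.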

In a year

Geo should be easy to set up

Setting up Geo is highly manual and cumbersome, especially in high-availability configurations. At the beginning of 2021, we are going to start investigating how we can simplify Geo's setup - especially for single-node Geo sites.

Geo supports an advanced caching mode

For Geo-replication, only a subset of data may need to be replicated, yet a Geo site currently requires spinning up the entire GitLab stack even when less may be sufficient. Additionally, systems administrators can select a subset of data via selective sync, but the subset they choose may be wrong.

We are investigating an advanced caching mode with the following properties:

What is not planned right now

We are currently not planning on moving away from PostgreSQL as a backend database in favour of e.g. CockroachDB or Google Spanner. This has implications for multi-mode Geo, but for now we will continue to support PostgreSQL.

Writable Geo sites

Geo secondary sites are read-only. Customer feedback has indicated a desire for active-active Git replication. With the availability of Gitaly Cluster, we may start investigating writable Geo sites in FY22.

Maturity plan

This category is currently at the viable maturity level, and our next maturity target is complete (see our definitions of maturity levels).

You can track the work that will move the category to complete in this epic.

Metrics

Competitive landscape

The top competitors for Geo-replication are

Feature overview

Feature GitHub AzureDevOps Bitbucket Smart Mirroring GitLab
Mirror repositories
Active-active replication N/A
Selective sync N/A N/A ⚠️
UI configuration N/A ⚠️
Kubernetes support ⚠️
Mirror docker registries N/A
LFS and file upload support N/A
Automatic DNS ⚠️
GUI Dashboard N/A
Request proxying N/A N/A ⚠️

✅ Fully available ⚠️ Partially available ❌ Not available N/A No information available

Analyst landscape

We do need to engage with analysts more closely to understand the current landscape better.

Top customer success/sales issue(s)

Top user issues

Top internal customer issues/epics

Top strategy item(s)
