Category Strategy - Sharding

💎 Sharding

Last updated: 2021-10-14

Introduction and how you can help

Please reach out to Fabian Zimmer, Group Product Manager for Enablement, if you'd like to provide feedback or ask any questions related to this product category.

This strategy is a work in progress, and everyone can contribute. Please comment and contribute in the linked issues and epics on this page. Sharing your feedback directly on GitLab.com is the best way to contribute to our strategy and vision.

Overview

GitLab.com, our SaaS offering, is growing rapidly. This growth requires that the underlying infrastructure components are able to scale to accommodate additional users. The GitLab.com production architecture highlights the different components and the reference architectures provide an overview for self-managed customers.

Scaling GitLab requires different strategies for the individual components. For example, web application nodes are stateless and can be scaled relatively easily by creating more individual servers. Stateful components are much harder to scale. As a single solution for the entire DevOps lifecycle, GitLab depends on a single data store that serves as the single source of truth. For GitLab, this data store is mostly a single PostgreSQL database. Over time, GitLab has added additional databases for specific features, such as Gitaly Cluster, Geo and the Container Registry. Adding new data stores requires approval from the CEO and all engineering fellows to avoid unnecessary proliferation of data stores.

GitLab's database on GitLab.com is provisioned as a single logical database with a primary server and several physical read-only replicas. Given the continuing growth of GitLab.com, this PostgreSQL database needs to handle more and more transactions per second. Reading data can be accelerated by provisioning additional replicas. Writing new data, however, can't be scaled in the same way: there can only be one primary server, and all writes have to go through it. To address this problem, there are several possible solutions:

  1. Buy more capable hardware - Bigger servers can handle more transactions. This is generally referred to as vertical scaling.

  2. Define a horizontal scaling strategy
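
To illustrate why replicas help reads but not writes, here is a minimal Python sketch of read/write routing. The class name `ReadWriteRouter` and the server names are hypothetical and are not taken from GitLab's code base; the sketch only shows that reads can be spread across any number of replicas while every write still lands on the one primary.

```python
import itertools


class ReadWriteRouter:
    """Send reads to replicas in round-robin order and all writes to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql):
        # Naive classification: only statements starting with SELECT are reads.
        words = sql.split()
        is_read = bool(words) and words[0].upper() == "SELECT"
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
reads = [router.target_for("SELECT * FROM projects") for _ in range(4)]
writes = [router.target_for("INSERT INTO projects (name) VALUES ('demo')") for _ in range(4)]

print(reads)   # alternates between the two replicas
print(writes)  # every write goes to the primary
```

Adding a third replica to the list would lower the read load per replica, but the `writes` list would be unchanged. A real router would also have to handle cases this sketch ignores, such as locking reads and replication lag.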

GitLab.com is approaching a point where buying bigger servers is no longer easily possible. For this reason, the Database Scalability Working Group was founded to define and implement strategies to scale GitLab's database.

The Sharding Group is concerned with delivering application changes that allow GitLab and GitLab.com to scale to millions of users, and with implementing the strategies defined in the Database Scalability Working Group.

Where we are headed

In the future we expect that GitLab.com can

These outcomes are also defined as the exit criteria of the database scalability working group.

What's Next & Why

Following a number of proof of concept implementations, the Sharding group is focusing on decomposing GitLab's database. This approach relies on moving all the tables associated with a feature into a separate logical database. We chose this approach because it is iterative and can be implemented in a shorter amount of time than sharding. When decomposing a certain feature, the team can focus on a smaller subset of tables while still solving some problems that are relevant to later strategies. We also gain confidence in operating multiple databases.

The Sharding group will focus on CI tables first. The reason for choosing CI tables is that they account for ~36% of the overall DB size and roughly 50% of writes. Decomposing these CI tables would effectively allow us to reduce writes on the main database by 50% because this additional logical database can be moved to a physically different database cluster. Decomposition is also sometimes referred to as vertical sharding.
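
The effect of decomposition on write load can be sketched in a few lines of Python. The table names, the `database_for` helper and the toy workload below are assumptions made for illustration; the real CI feature spans roughly 50 tables, and the 50% share is simply built into the example workload.

```python
from collections import Counter

# Hypothetical subset of CI tables, for illustration only.
CI_TABLES = {"ci_builds", "ci_pipelines", "ci_stages"}


def database_for(table, decomposed=True):
    """Return the logical database that owns a table."""
    if decomposed and table in CI_TABLES:
        return "ci"
    return "main"


def writes_per_database(written_tables, decomposed=True):
    """Count writes per logical database for a sequence of written tables."""
    return Counter(database_for(t, decomposed) for t in written_tables)


# Toy workload of 100 writes in which half hit CI tables.
workload = ["ci_builds", "projects", "ci_pipelines", "users"] * 25

before = writes_per_database(workload, decomposed=False)
after = writes_per_database(workload, decomposed=True)

print(before["main"])             # all writes land on main before decomposition
print(after["main"], after["ci"]) # writes split between main and ci afterwards
```

Because the two logical databases can live on physically different clusters, the writes counted under `ci` no longer compete with those counted under `main`.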

Decomposing CI tables

The Sharding group is focusing on decomposing all tables (~50) that are connected to our CI feature. We've identified this feature because roughly 50% of writes can be attributed to CI. We are working to resolve all issues with the CI tables that are not easily fanned out to feature teams, mostly because the implementation is too technical and potentially dependent on other work in the group.

Support for many databases

To make decomposition a success, the application needs strong support for many databases. Today, GitLab uses only one database (main). Once CI tables are decomposed, the application needs to manage a second database (ci).

This means that the application needs to be able to run database-related tasks, such as migrations (normal, post, and background), on many databases, and we need to decide on approaches for these generic database features.
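
As a rough illustration of running migrations against more than one database, the following Python sketch tracks applied versions separately in each database. It uses in-memory SQLite rather than PostgreSQL, and the migration list, the `migrate` function and the `schema_migrations` layout are assumptions for the example, not a description of GitLab's migration tooling. It does not distinguish normal, post, and background migrations.

```python
import sqlite3

# Hypothetical migrations: (version, target database, SQL).
MIGRATIONS = [
    ("001", "main", "CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)"),
    ("002", "ci", "CREATE TABLE ci_builds (id INTEGER PRIMARY KEY, project_id INTEGER)"),
    ("003", "main", "ALTER TABLE projects ADD COLUMN archived INTEGER DEFAULT 0"),
]


def migrate(connections, migrations):
    """Apply pending migrations to each database and return the versions applied."""
    applied = {name: [] for name in connections}
    for name, conn in connections.items():
        # Each database keeps its own record of applied versions.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)"
        )
        done = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
        for version, target, sql in migrations:
            if target != name or version in done:
                continue
            with conn:  # commits on success, rolls back the open transaction on error
                conn.execute(sql)
                conn.execute(
                    "INSERT INTO schema_migrations (version) VALUES (?)", (version,)
                )
            applied[name].append(version)
    return applied


connections = {"main": sqlite3.connect(":memory:"), "ci": sqlite3.connect(":memory:")}
first_run = migrate(connections, MIGRATIONS)
second_run = migrate(connections, MIGRATIONS)

print(first_run)   # each database receives only its own migrations
print(second_run)  # nothing is pending on the second run
```

Keeping the version record inside each database means either database can be migrated, backed up or restored without consulting the other.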

With two databases, we also need to handle cross-database modifications. For example, foreign keys and cascading deletes won't work between tables located on different databases, so we need to find alternative solutions via loose foreign keys.
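
One way to approximate a cascading delete across two databases is to record deletions on the parent side and clean up child rows asynchronously. The Python sketch below shows that idea with two in-memory SQLite databases. The `deleted_records` table, the trigger name and the `cleanup_ci_builds` function are hypothetical, and the sketch should be read as an illustration of the general pattern rather than as GitLab's loose foreign keys implementation.

```python
import sqlite3

main = sqlite3.connect(":memory:")
ci = sqlite3.connect(":memory:")

main.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
-- Queue of deleted parent rows; table and trigger names are hypothetical.
CREATE TABLE deleted_records (id INTEGER PRIMARY KEY, table_name TEXT, record_id INTEGER);
CREATE TRIGGER projects_track_delete AFTER DELETE ON projects
BEGIN
    INSERT INTO deleted_records (table_name, record_id) VALUES ('projects', OLD.id);
END;
INSERT INTO projects (id, name) VALUES (1, 'alpha'), (2, 'beta');
""")

ci.executescript("""
CREATE TABLE ci_builds (id INTEGER PRIMARY KEY, project_id INTEGER);
INSERT INTO ci_builds (project_id) VALUES (1), (1), (2);
""")


def cleanup_ci_builds(main, ci):
    """Delete ci_builds rows whose parent project was deleted on main."""
    rows = main.execute(
        "SELECT id, record_id FROM deleted_records WHERE table_name = 'projects'"
    ).fetchall()
    removed = 0
    for queue_id, project_id in rows:
        with ci:
            removed += ci.execute(
                "DELETE FROM ci_builds WHERE project_id = ?", (project_id,)
            ).rowcount
        # Only drop the queue entry once the child rows are gone.
        with main:
            main.execute("DELETE FROM deleted_records WHERE id = ?", (queue_id,))
    return removed


with main:
    main.execute("DELETE FROM projects WHERE id = 1")

removed = cleanup_ci_builds(main, ci)
remaining = ci.execute("SELECT COUNT(*) FROM ci_builds").fetchone()[0]
print(removed, remaining)
```

Unlike a real foreign key, this cleanup is eventually consistent: between the delete on main and the cleanup run, the ci database still holds rows that point at a project that no longer exists, so application code has to tolerate that window.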

Fixing broken features

The CI feature is used and referenced in many different parts of GitLab. We are fanning out issues to individual feature teams in the Dev, Ops, Sec and Enablement sections to fix broken features.

Deploying Decomposition

In order to benefit from decomposition and realize the scalability improvements, we need to deploy it in Staging and Production. Given the scope of this change, we've defined an iterative rollout strategy for decomposition that allows us to deploy changes in stages.

In a year

In Q1FY23 we expect GitLab.com will run on multiple databases that are decomposed by feature. We expect that at least two independent databases exist: main and ci. This will provide significant headroom and will allow the Sharding group to transition towards validating proposals for scaling GitLab even further.

Decomposition is only a first step to unlocking further scalability for GitLab. In a year the Sharding Group will have evaluated and implemented other scalability options for GitLab. There are several possible options that will have to be evaluated, including but not limited to sharding GitLab's database using existing PostgreSQL tooling, adopting Vitess, moving towards a Pod-like architecture and utilizing different datastores that fit specific use cases.

All self-managed customers will have transitioned to running decomposed logical databases within a single database cluster. All migrations will have completed with minimal interruption and all self-managed features, such as backups and restore, will work seamlessly when running multiple databases.

What is not planned right now

We currently don't plan to implement any scalability solutions for GitLab.com that would negatively impact our self-managed customers. We want all customers to benefit from further scalability.

Metrics

Competitive landscape

This is a list of scaling solutions that others have implemented:

Top strategy item(s)
