|Date Created||February 11, 2020|
|End Date||June 22, 2020|
|Slack||#wg_database-sharding (only accessible from within the company)|
|Google Doc||Sharding Working Group Agenda (only accessible from within the company)|
|Recordings||Sharding Working Group Playlist|
We have decided to close this sharding-focused working group and will open a Scaling Working Group with a different focus. The initial focus of this Sharding working group was to increase the scalability of our database, with a long-term goal of 100x scalability. At the outset of this group, it was theorized that we would hit a database scalability wall within 6-12 months. Subsequent analysis and incremental scalability efforts have indicated that we have significantly more scaling headroom. Based on the analysis, we have a high degree of confidence that the current architecture is in good shape to handle our needs for the next 12 months: Database Capacity and Saturation Analysis (Iteration 1). This analysis will continue on a monthly basis. We have also identified areas of incremental database scalability that have been prioritized by the database team: Reduce total size and growth of GitLab.com's PostgreSQL database. Between the ongoing analysis and the incremental database improvements, we have greatly reduced the urgency of database scalability.
Additionally, we have come to the consensus that sharding is not the desired approach for our long-term scalability needs. This decision was informed by investigation, proofs of concept, research, interviews and various implementation proposals. Here's a brief list of items that helped to inform our decision to close this working group:
The core members of this working group will continue on with the Scaling Working Group to determine our long-term scaling strategy and implementation. The rest of this working group page will remain for reference purposes.
A scalability approach that will give us 100x headroom over what we have now on GitLab.com. Additionally, the ability to isolate customer data is an influencing factor on the design and implementation.
At the outset of this working group, anecdotal information indicated that we were going to "hit a wall" on scaling our database to support our projected customer growth. Early estimates indicated that we would hit a scaling wall anywhere from 6-12 months down the road. This estimate has since been revised to a rolling 12-month window based on the Database Capacity and Saturation Analysis (Iteration 1). Database sharding was proposed as a solution to improve our scalability while simultaneously improving performance. We have since expanded our discussions beyond database sharding alone. Any solution, even one using database sharding technology, will require significant application changes as well.
The goal for customer isolation serves multiple purposes. Isolation of customer data would likely include distributing data across multiple servers. This level of distribution would improve availability by removing the single point of failure of our single database architecture. Additionally, we are hearing more requests from customers to provide a solution that better separates customer data.
In support of our business goals of scalability and customer isolation we've identified the following areas of investigation.
Details can be found in the Postgres Sharding (&1854) epic. This area of investigation focuses on sharding at the top-level namespace. The initial investigations were database-centric, focusing on sharding the tables. Our investigations have indicated the following:
A proposal titled Tenant Sharding was recently introduced. Instead of sharding by the namespace, we introduce a higher-level entity, the tenant. By introducing the tenant entity, we turn GitLab.com into a multi-tenant SaaS platform, following the model of other multi-tenant SaaS applications. Well-known examples include Slack, PagerDuty, Datadog, etc. Each of these examples offers its users a scoped, isolated tenancy.
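To make the difference between the two sharding keys concrete, here is a minimal sketch. It is not taken from either proposal: the `Namespace` model, the hash-based `shard_for` routing, and the shard count are all assumptions chosen for illustration. It only shows that keying on the tenant keeps all of a tenant's namespaces on one shard, while keying on the top-level namespace may spread them out.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Namespace:
    """Hypothetical model: a top-level namespace that belongs to a tenant."""
    path: str
    tenant: str


def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key to a shard index with a stable hash.

    hashlib is used because the built-in hash() is salted per process,
    so it would not give the same shard across restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_by_namespace(ns: Namespace, num_shards: int) -> int:
    # Namespace sharding: the top-level namespace path is the key.
    return shard_for(ns.path, num_shards)


def shard_by_tenant(ns: Namespace, num_shards: int) -> int:
    # Tenant sharding: the higher-level tenant entity is the key.
    return shard_for(ns.tenant, num_shards)


if __name__ == "__main__":
    web = Namespace(path="acme-web", tenant="acme")
    mobile = Namespace(path="acme-mobile", tenant="acme")
    # Same tenant, so both namespaces always route to the same shard.
    print(shard_by_tenant(web, 8) == shard_by_tenant(mobile, 8))
    # By namespace, the two may or may not share a shard.
    print(shard_by_namespace(web, 8), shard_by_namespace(mobile, 8))
```

A real design would likely use a lookup table rather than a pure hash so that tenants can be moved between shards; the hash is used here only to keep the sketch self-contained.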
In parallel with the sharding investigation, the database team continues to look for areas of incremental database scalability improvements. Those efforts are being tracked under these issues/epics:
Partitioning is an important subject to cover separately from sharding. If we ultimately decide that database sharding is the chosen solution to achieve our business objectives, then database partitioning is the foundation upon which database sharding is built in PostgreSQL. Even if we don't use it for sharding, partitioning directly improves query performance and is therefore a great tool to use on its own. Our first iteration of database partitioning will be implemented on audit events. We expect that the implementation of partitioning will result in performance improvements and in tooling (e.g. migrations) that can be reused for subsequent partitioning and sharding implementations.
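As a rough illustration of why partitioning can help query performance, the sketch below models range partitioning by month and shows which partitions a date-bounded query would need to touch. The monthly granularity, the `created_at`-style date key, and the `audit_events_YYYYMM` naming are assumptions for this example, not details confirmed by this page. In PostgreSQL the planner performs this pruning itself; the code only mimics the idea.

```python
from datetime import date
from typing import List


def month_start(d: date) -> date:
    """First day of the month containing d."""
    return d.replace(day=1)


def next_month(d: date) -> date:
    """First day of the month after the one containing d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def partition_name(table: str, d: date) -> str:
    """Hypothetical naming scheme: <table>_<YYYYMM>."""
    return f"{table}_{d.year:04d}{d.month:02d}"


def partitions_for_range(table: str, start: date, end: date) -> List[str]:
    """Partitions that can hold rows with start <= created_at < end.

    Every other partition can be skipped, which is the effect of pruning.
    """
    if start >= end:
        return []
    names = []
    current = month_start(start)
    while current < end:
        names.append(partition_name(table, current))
        current = next_month(current)
    return names


if __name__ == "__main__":
    # A query over mid-May through the end of June touches two partitions.
    print(partitions_for_range("audit_events", date(2020, 5, 15), date(2020, 7, 1)))
    # prints ['audit_events_202005', 'audit_events_202006']
```

The same key-to-partition mapping is what a later sharding step would build on: once rows are grouped by a key, groups can in principle be placed on different servers.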
The different sharding approaches, Namespace vs. Tenant, are being evaluated. They are competing approaches, but each has the same aim of achieving our business goals. We are still working through the potential first iteration and implementation details of these approaches. In both cases we will need to identify and quantify the changes required at the database and application level.
While we continue to investigate Namespace vs. Tenant sharding, we can continue with the Incremental Scalability Improvements and Database Partitioning Implementation and realize immediate performance and scalability improvements.
|Working Group Role||Person||Title|
|Executive Stakeholder||Christopher Lefelhocz||VP of Development|
|Facilitator||Craig Gomes||Engineering Manager, Database|
|DRI for Sharding Working Group||Craig Gomes||Engineering Manager, Database|
|Functional Lead||Nailia Iskhakova||Software Engineer in Test, Database|
|Functional Lead||Josh Lambert||Group Manager, Product Management, Enablement|
|Functional Lead||Gerardo "Gerir" Lopez-Fernandez||Engineering Fellow, Infrastructure|
|Functional Lead||Stan Hu||Engineering Fellow, Development|
|Functional Lead||Andreas Brandl||Staff Backend Engineer, Database|
|Member||Chun Du||Director of Engineering, Enablement|
|Member||Pat Bair||Senior Backend Engineer, Database|
|Member||Tanya Pazitny||Quality Engineering Manager, Enablement|
|Member||Mek Stittri||Director of Quality Engineering|
The agenda doc can be found in our Google Drive by searching for "Sharding Working Group Agenda".