Production Architecture

Our core infrastructure is currently hosted on several cloud providers, each serving a different function. This document does not cover servers that are not integral to the public-facing operations of the service.

Current Architecture

Source, GitLab internal use only

Proposed Cloud Native Architecture

We are working toward running on Kubernetes by containerizing all the services and components that are necessary to run GitLab-EE at scale.

This is the proposed architecture for moving from what we are running in static VMs to a world managed by container orchestration.

Production Environment

Source, GitLab internal use only

High Level Components View

Source, GitLab internal use only

Pods Definition

Source, GitLab internal use only

Database Architecture

Source, GitLab internal use only

Monitoring Architecture

Source, GitLab internal use only

Logging Architecture

Source, GitLab internal use only

Infrastructure "Services" and Their SLx's

To reach our availability and latency goals, we started by setting a target internal SLA for the service as a whole from the user's perspective. From those targets, we can work backwards through the architecture to determine what the service level objectives should be for the infrastructure "services" that support it.

Since we rely on hardware that itself only offers an SLA of 99.9% availability, we face an "SLA inversion" (read more about this on blogspot or oreilly). For example, in the current situation, each time an NFS server goes down, the result is an outage of the service. Since we are only guaranteed 99.9% uptime per NFS server, the maximum SLA for the service as a whole will be <= (99.9%)^N, where N is the number of NFS servers. To overcome this, the service that is offered by the NFS servers either needs to be redesigned in some way (e.g. by using Gitaly), or the application that depends on it needs a way to stay up when the NFS service is unavailable (i.e. graceful degradation). Similar considerations apply to things such as the cache, background job processing, availability of the database, and so on.
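To make the inversion concrete, the bound described above can be evaluated numerically. The following is a minimal sketch, assuming that server failures are independent and that the service is down whenever any single server is down. The function name and the sample server counts are illustrative only and are not figures from our infrastructure.

```python
def max_composite_availability(per_server: float, n_servers: int) -> float:
    """Upper bound on availability when all n_servers must be up at once.

    Assumes independent failures, which is an illustrative simplification.
    """
    if not 0.0 <= per_server <= 1.0:
        raise ValueError("per_server must be a fraction between 0 and 1")
    if n_servers < 0:
        raise ValueError("n_servers must be non-negative")
    return per_server ** n_servers


if __name__ == "__main__":
    # Hypothetical server counts, chosen only to show the trend.
    for n in (1, 5, 10, 20):
        bound = max_composite_availability(0.999, n)
        print(f"N={n:2d}: at most {bound:.4%} availability")
```

The bound shrinks with every server added to the critical path, which is why redesigning the dependency or degrading gracefully matters more as N grows.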

To tackle this challenge, we consider the following elements of the infrastructure to be "services" that should be able to meet their own internal SLAs:

Host Naming Standards


A hostname shall be constructed from the service offered by that node, followed by a dash and a two-digit incrementing number.

e.g.: sidekiq-NN, git-NN, web-NN

A service-specific identifier, when it connotes a difference in build or function, will be added as -specific and shall precede the two-digit number.

e.g.: sidekiq-realtime-01

When a service has both an internal and an external facing function, -int- or -ext- shall be used.

e.g.: api-int-01, api-ext-01
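The service portion of the naming rules above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name build_hostname, the lowercase-alphanumeric restriction, and the accepted number range of 0 to 99 are hypothetical and are not defined by this document. The tier, environment, and location suffixes are deliberately left out.

```python
import re
from typing import Optional

# Assumed shape: lowercase service, optional qualifiers, two-digit number.
_HOSTNAME_RE = re.compile(r"[a-z][a-z0-9]*(?:-[a-z][a-z0-9]*)*-\d{2}")


def build_hostname(service: str, number: int, specific: Optional[str] = None) -> str:
    """Build '<service>[-<specific>]-NN', e.g. 'sidekiq-realtime-01'."""
    if not 0 <= number <= 99:
        raise ValueError("number must fit in two digits")
    parts = [service]
    if specific:
        # Qualifiers such as 'int', 'ext', or 'realtime' precede the number.
        parts.append(specific)
    parts.append(f"{number:02d}")
    name = "-".join(parts)
    if not _HOSTNAME_RE.fullmatch(name):
        raise ValueError(f"not a valid hostname under the assumed rules: {name!r}")
    return name


if __name__ == "__main__":
    print(build_hostname("web", 12))
    print(build_hostname("sidekiq", 1, "realtime"))
    print(build_hostname("api", 1, "ext"))
```

Validating the assembled name against one pattern keeps the rule in a single place, so a later change to the standard only needs to touch the regular expression.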

Service Tiers

Following the hostname shall be the service tier that the node belongs in:


Following the service tier shall be the environment:


Following the environment is the location where the host resides, for example:

TLD Zones

When it comes to DNS names, all services providing as a service shall be in the domain; ancillary services in support of GitLab (e.g. Chef, ChatOps, VPN, Logging, Monitoring) shall be in the domain.


Internal Networking Scheme

A visualization of the whole address space can be found by searching for " Status on Azure" on Google Drive.


Virtual Network Name: GitLabProd

Resource Group: GitLabProd

IP space:

Subnet Name | Subnet Range | Tier | Domain
ExternalLBProd | | Load balancers |
InternalLBProd | | Load balancers |
DBProd | | Databases |
RedisProd | | Databases |
ElasticSearchProd | | Databases |
ConsulProd | | Support Services |
VaultProd | | Support Services |
DeployProd | | Support Services |
DNSProd | | Support Services |
MonitoringProd | | Logging |
LogProd | | Logging |
LogStorageProd | | Logging |
APIProd | | Services |
GitProd | | Services |
SidekiqProd | | Services |
WebProd | | Services |
RegistryProd | | Services |
MailroomProd | | Services |
StorageProd | | Storage |


Virtual Network Name: GitLabCanary

Resource Group: GitLabCanary

IP space:

Subnet Name | Subnet Range | Tier | Domain
ExternalLBCanary | | Load balancers |
InternalLBCanary | | Load balancers |
APICanary | | Services |
GitCanary | | Services |
SidekiqCanary | | Services |
WebCanary | | Services |
RegistryCanary | | Services |


Virtual Network Name: GitLabStaging

Resource Group: GitLabStaging

IP space:

Subnet Name | Subnet Range | Tier | Domain
ExternalLBStaging | | Load balancers |
InternalLBStaging | | Load balancers |
DBStaging | | Databases |
RedisStaging | | Databases |
ElasticSearchStaging | | Databases |
ConsulStaging | | Support Services |
VaultStaging | | Support Services |
DeployStaging | | Support Services |
DNSStaging | | Support Services |
LogStaging | | Logging |
APIStaging | | Services |
GitStaging | | Services |
SidekiqStaging | | Services |
WebStaging | | Services |
RegistryStaging | | Services |
ApiInternalStaging | | Services |
MailroomStaging | | Services |
StorageStaging | | Storage |




Remote Access

Virtual Network Name: Mirrors Subnet Name

Resource Group: Mirrors Subnet Name

IP Space:

Subnet Name | Subnet Range | Tier | Domain
VPN-East-US-2 | | Remote Access |


The main portion of the service is hosted on Microsoft Azure. We have the following servers there.

Note that these numbers can fluctuate to adapt to the platform needs.

We also use availability sets to ensure that a minimum number of servers in each group is available at any given time. This ensures that Azure will not reboot all instances in the same availability set at the same time during planned maintenance.

All our servers run the latest Ubuntu LTS unless there is a specific need to do otherwise. Every server is configured with a fully fledged set of firewall rules for increased security.

Load Balancers

We use Azure load balancers in front of our HAProxy nodes. This lets us leverage the Azure infrastructure for high availability (HA) while also taking advantage of the power of HAProxy.

Additionally, we utilize an Azure load balancer to manage PostgreSQL failovers.

Service Nodes

Different services have different resource utilization patterns, so we use a variety of instance types across our service nodes, keeping the type consistent within each group. We have recently isolated traffic by type on dedicated pools of nodes. We hope you noticed the performance improvement.

Digital Ocean

Digital Ocean houses several servers that do not need to interact directly with our main infrastructure. There are many of these, doing a variety of things; not all of them are listed here.

The primary things on Digital Ocean at this time are:


We host our DNS with Route 53, and we have several EC2 instances for various purposes. The servers you will interact with most are listed below:

Google Cloud

We are currently investigating Google Cloud.


See how it's doing; for more information, visit the monitoring handbook.

Technology at GitLab

We use a lot of cool (but boring) technologies here at GitLab. Below is a non-exhaustive list of tech we use here.