
Production Architecture

This document only covers our shared and GitLab-shared runners, which are available to GitLab.com users and managed by the Infrastructure teams.


General Architecture

Our CI infrastructure is hosted on Google Compute Engine (GCE) with a backup environment on DigitalOcean (DO). In GCE we use the us-east1-c and us-east1-d zones. In the event of GCE downtime, we can unpause the DigitalOcean runners, which run in NYC1. All of them are configured via Chef. These machines are manually created and added to Chef and do NOT use Terraform at this time.

In each region we have a few types of machines:

Runner managers connect to the GitLab.com and dev.gitlab.org APIs in order to fetch jobs that need to be run. The autoscaled machines clone the relevant project over HTTPS.
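As a quick sanity check that a runner manager can still reach the APIs it polls, something like the following can be run on the manager. This is a hedged example rather than part of the documented process, and it assumes the standard gitlab-runner CLI is installed on the manager:

```bash
# Ask GitLab.com / dev.gitlab.org whether each registered runner token is still valid.
sudo gitlab-runner verify

# List the configured runners and the GitLab URLs they poll for jobs.
sudo gitlab-runner list
```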

The runners are connected as follows:

Detailed Architecture


Data Flow

Management Data Flow

GitLab Data Flow

Cloud Region Internal Communication

Cloud Region External Communication

Digital Ocean Specific Services

Deployment and Configuration Updates

The Runner and its configuration are handled with Chef and defined on chef.gitlab.com. The detailed upgrade process is described in the associated runbook.

In summary:

Why the difference?

When we update the Runner, its process needs to be stopped. If this happens while a job is executing, the job will be broken. That’s why we use the Runner’s graceful shutdown feature: by sending the SIGQUIT signal to the Runner, we cause it to stop requesting new jobs while still waiting for the existing ones to finish. If this were done from inside a chef-client run, it could fail in unexpected ways. With the /root/runner_upgrade.sh script we first stop the Runner gracefully (with a 7200-minute timeout) and then start chef-client to update the version.
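The exact contents of /root/runner_upgrade.sh live in Chef; as a rough sketch only, the logic it implements is along these lines (assuming the process is named gitlab-runner; the loop and timings are illustrative, not the production values):

```bash
#!/usr/bin/env bash
# Rough sketch of the graceful-upgrade flow, not the actual /root/runner_upgrade.sh.
set -euo pipefail

# SIGQUIT tells the Runner to stop requesting new jobs but let running jobs finish.
pkill -QUIT -x gitlab-runner || true

# Wait for the process to exit on its own, up to a long timeout.
remaining_minutes=7200
while pgrep -x gitlab-runner > /dev/null && [ "$remaining_minutes" -gt 0 ]; do
  sleep 60
  remaining_minutes=$((remaining_minutes - 1))
done

# With no jobs in flight, let chef-client upgrade the package and restart the Runner.
chef-client
```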

For a Runner configuration update there is no need to stop or restart the Runner’s process; since we’re not changing the Runner’s version, chef-client does not upgrade the package (which could trigger a Runner process stop). In that case we can simply run sudo chef-client. This updates the config.toml file, and the Runner automatically picks up most of the configuration.
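In practice that is a single command on the affected runner manager:

```bash
# Configuration-only change: no package upgrade, so no graceful stop is needed.
sudo chef-client
```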

Some of the general configuration parameters can’t be refreshed without restarting the process. In that case we need to use the same script as for the Runner Upgrade.
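For those restart-only parameters the flow falls back to the graceful-stop path described above, for example:

```bash
# Parameters that only apply after a process restart: reuse the upgrade script.
sudo /root/runner_upgrade.sh
```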

Additional Features

We also have a few processes that are configured on some of the runner-manager-X machines and are not included in the graphs above:

All of the above metrics can be tracked on our CI Dashboard.

The hanging-droplets-cleaner and droplet-zero-machines-cleaner processes are specific to the DigitalOcean integration. We discovered problems specific to this cloud provider, and these tools were developed to handle the cleanup automatically.

Monitoring Information