Distribution Team Infrastructure and Maintenance

Build Machines

The GitLab CI runner manager is responsible for creating the build machines used for package builds. The node configuration is managed by cookbook-gitlab-runner. Configuration values are stored in a vault with the same name as the node.

Currently, the version of GitLab CI runner is locked. We aim to stay close to the current runner version so that we get the fixes we need without picking up issues that could cause build failures. Such failures could prevent a release from going out, so avoid unnecessary changes on these nodes.

Runner manager machines

The Distribution team maintains two runner manager machines, each running a different type of pipeline. Both machines are in the GCP project omnibus-build-runners.

  1. build-runners.gitlab.org
  2. build-trigger-runner-manager.gitlab.org

build-runners.gitlab.org

This runner manager manages the machines used for building and publishing official GitLab CE and EE packages. It is locked to the omnibus-gitlab project in dev.gitlab.org.

It spins up three types of machines:

  1. Machines for building packages. These are n1-highcpu-32 machines with 60GB SSD disks, spawned inside GCP using the google docker-machine driver.

  2. package-promotion machines for uploading packages. Since they are only used to upload packages, they are scaled down to save costs. These are n1-standard-2 machines, spawned inside GCP using the google docker-machine driver.

  3. ARM machines for building Raspberry Pi packages. These are C1 machines, spawned inside Scaleway using the scaleway docker-machine driver.
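
As a rough illustration of what these specifications mean in docker-machine terms, the sketch below prints (without running) a docker-machine create command for the two GCP machine types. The function name, the profile names, and the machine names are hypothetical, and mapping "SSD disks" to the pd-ssd disk type is an assumption. In practice the runner manager passes such options to docker-machine itself, based on its own configuration. No disk flags are emitted for the package-promotion profile because no disk size is stated for it, and the Scaleway ARM machines are left out.

```shell
# Print (do not run) a docker-machine create command for a GCP machine profile.
# Usage: machine_create_cmd <profile> <machine-name>
# Profiles "build" and "promotion" are hypothetical labels for this sketch.
machine_create_cmd() {
  profile=$1
  name=$2
  disk_flags=""
  case "$profile" in
    build)
      machine_type=n1-highcpu-32
      # 60GB SSD disk; pd-ssd is assumed to be the intended disk type.
      disk_flags="--google-disk-size 60 --google-disk-type pd-ssd"
      ;;
    promotion)
      machine_type=n1-standard-2
      ;;
    *)
      echo "unknown profile: $profile" >&2
      return 1
      ;;
  esac
  # ${disk_flags:+ ...} adds the disk flags only when they are set.
  echo "docker-machine create --driver google --google-project omnibus-build-runners --google-machine-type ${machine_type}${disk_flags:+ $disk_flags} $name"
}
```

Calling machine_create_cmd build example-build-machine prints the command for a package build machine, which can be reviewed before anything is created.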

build-trigger-runner-manager.gitlab.org

This runner manager manages the machines used for building packages in the triggered pipelines that developers use to test their changes. It spins up n1-highcpu-32 machines with 50GB SSD disks inside GCP using the google docker-machine driver.

Maintenance tasks

Requirements:

Changing version of GitLab CI runner

When builds are pending on dev.gitlab.org

The most common reason for builds to be pending on dev.gitlab.org is a high number of failed machines. Failed machines prevent the runner manager from starting new machines, which can slow down or even block a release. To resolve this, clean up the failed machines as follows:

  1. Log in to the build machine node.

  2. Enter a root session: sudo su. This is required because the docker-machine command only lists machines for the currently active user.

  3. Run docker-machine ls. This prints the list of machines, each of which is in the Running or Error state or has an empty state.

  4. To list only machines in Error state, you can use
     /root/machines_operations.sh list-failing
    
  5. To safely clean the machines with Error state, run
     /root/machines_operations.sh remove-failing
    
  6. If a machine has an empty state, you can remove it manually. Running
     docker-machine ls | grep -v 'Running' | awk '{print $1}' | xargs docker-machine rm --force

     removes every machine that is not in the Running state, including machines in the Error state.
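
Because the one-liner in step 6 removes machines with --force, it can help to preview the affected names first. The following is a minimal sketch, not part of the existing tooling on the node: the function name non_running_machines is hypothetical, and it assumes that docker-machine ls is called with the --format option so that each line contains only a name and a state, with no header line.

```shell
# Print the names of machines whose state is not "Running".
# Input (stdin): one "<name> <state>" pair per line; the state may be empty.
non_running_machines() {
  # NF > 0 skips blank lines; an empty state makes $2 "", which is not "Running".
  awk 'NF > 0 && $2 != "Running" { print $1 }'
}
```

A preview could then look like docker-machine ls --format '{{.Name}} {{.State}}' | non_running_machines. Once the list has been reviewed, the same output can be piped into xargs docker-machine rm --force as in step 6.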