SRE Onboarding

Onboarding Template

SRE onboarding is mostly handled by an issue template that is assigned to the SRE when they start. The template guides them through different areas of the system, starting with some simple tasks, and helps both the SRE and the SRE manager work through various access issues.

Infrastructure Management

The SRE teams use Terraform and Chef for configuration management of infrastructure.


Terraform configuration is currently divided into three environments: staging, production and ops.

There is shared Terraform config for both staging and production to keep topology parity between these environments. Instance sizing, fleet sizes and other environment-specific configuration are set in variable files for staging, production and ops.
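As a rough illustration of the per-environment variable files described above, the sketch below builds a `terraform plan` command for one environment and prints it without running it. The `environments/<env>.tfvars` layout and the `tf_plan_cmd` helper are assumptions made for this example, not the actual repository structure; `-var-file` and `-out` are standard `terraform plan` options.

```shell
# Sketch: build a `terraform plan` command for one environment.
# Assumption: variable files live at environments/<env>.tfvars (hypothetical layout).
tf_plan_cmd() {
  env_name="$1"
  case "$env_name" in
    staging|production|ops) ;;
    *)
      echo "unknown environment: $env_name" >&2
      return 1
      ;;
  esac
  # Print the command rather than running it, so it can be reviewed first.
  echo "terraform plan -var-file=environments/${env_name}.tfvars -out=${env_name}.tfplan"
}

# Show the command for staging.
tf_plan_cmd staging
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; an unknown environment name makes the function return a non-zero status.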

The state for Terraform is maintained in object storage, and the master branch should always represent the current state of infrastructure. Changes should be applied after they are merged. There is ongoing work to improve Terraform config management, including automation using GitLab CI/CD. For more information about this see


Chef is a critical part of SRE infrastructure management. Currently it is used for OS patching, applying system-level configuration and installing the omnibus package for releases. Here are a few notable cookbooks that are a good starting point for new SREs:


Release candidates are deployed to until the official release on the 22nd. For information about how releases at read the release process documentation.

For information about deployments and patches see the following release docs:

Where to find things


The following repositories are used for infrastructure management. These repository locations are the remotes that the SRE team uses for pushes, issues and MRs. Mirrors are set up in case that is unavailable. Repositories that are necessary for assets, configuration, infrastructure, releases and patch management use as a remote.

  1. terraform: This is the repository that holds all terraform configuration for the staging, production and operations environments. There is a repository mirror on .

  2. chef cookbooks: These repositories are the cookbooks used for Runlists for the fleets are configured in roles. There are repository mirrors for these cookbooks on

  3. chef: This repository contains all role and node attributes for infrastructure. It also has the environment configurations for production, staging and ops for cookbook version pinning. There is a repository mirror on

  4. runbooks: This repository contains runbooks, howtos and alert definitions for Alerts defined in this repository are automatically applied to the monitoring infrastructure when merged to master. For more information see the alert manual. There is a repository mirror on


It is useful to have the following dashboards bookmarked and easily accessible

  1. Grafana
  2. Google Cloud
  3. System Logs
  4. Fastly CDN

Cloud Providers

  1. Google Cloud
  2. Amazon Web Services
  3. DigitalOcean
  4. Azure

Monitoring tools

  1. PagerDuty Alerting
  2. Grafana Performance Monitoring
  3. Alert Dashboard
  4. AlertManager Production
  5. AlertManager Staging

Issue Trackers

It is useful to have the following issue trackers bookmarked and easily accessible

  1. On Call Issues
  2. Production Incidents Issues
  3. Change Management Issues


SREs should be using a YubiKey and should not have keys on their laptop.

Follow the YubiKey runbook to set up
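One rough way to check the "no keys on the laptop" rule is to scan a directory for files that look like private keys. The sketch below is not part of the YubiKey runbook; the `find_private_keys` helper is hypothetical, and it assumes PEM or OpenSSH-style keys whose header line contains `PRIVATE KEY-----`.

```shell
# List files under a directory that look like private keys.
# Assumption: keys use a PEM/OpenSSH header such as
# "-----BEGIN OPENSSH PRIVATE KEY-----".
find_private_keys() {
  dir="$1"
  # Nothing to report if the directory does not exist.
  [ -d "$dir" ] || return 0
  # -r recurses, -l prints only matching file names, -s hides read errors,
  # -e marks the pattern so its leading dashes are not read as options.
  grep -rls -e '-----BEGIN .*PRIVATE KEY-----' "$dir" || true
}
```

For example, `find_private_keys "$HOME/.ssh"` prints nothing when the only key material lives on the hardware token.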


The following is intended to be a comprehensive list of credentials and access that need to be set up, which are not covered above or elsewhere in the handbook. The list may not be up to date. If something is missing, please add it.

  1. SSH Key - this is provided to you by the YubiKey setup
  2. account
  3. admin account
  4. account
  5. account
  6. Chef access
  7. Cloud Providers
    • Amazon Web Services
    • Azure
    • DigitalOcean
    • Google Cloud

Slack Channels

  1. #production (Say hello to @marvin to create an account)
  2. #infrastructure-lounge
  3. #alerts (There are several alerts channels)


Every SRE should register for a “Light Agent” account in Zendesk. Incidents are often generated from customer reports, and it’s useful to see the original submission and the back-and-forth with support. You can also leave internal notes for support engineers so that they can gather more information for troubleshooting. See 'Light Agent' Zendesk accounts available for all GitLab staff.

PTO Ninja

We use PTO Ninja to notify and delegate for planned time off. When setting up your integration with Slack, be sure to run the /ninja settings command and add the team's shared Google Calendar (ID: ) as an "Additional Calendar".

Suggested Software Tools

As production engineers we are allowed to use a Linux workstation. The list below is composed mostly of macOS tools, so you'll need to find the Linux equivalents for the distro of your choice.

In addition to the standard tools for interacting with the rest of GitLab, the following tools help when working on production issues.

Required tools

  1. Homebrew
  2. SSH, properly configured
  3. chef, knife, berkshelf
  4. kubectl (brew install kubernetes-cli)
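A quick way to confirm the required tools are installed is to look each one up on the PATH. This is a minimal sketch; the `missing_tools` helper is hypothetical, and it assumes `berks` is the Berkshelf command name and that `chef` and `knife` come from your Chef installation.

```shell
# Print each named command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the required tools; no output means everything was found.
# Assumption: `berks` is the Berkshelf CLI name on this machine.
missing_tools brew ssh chef knife berks kubectl
```

The check only reports presence on the PATH; it does not verify that SSH is properly configured.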

Nice to have

  1. iTerm (brew install --cask iterm2)
  2. A text editor such as Atom, Sublime, TextMate, MacVim, or Neovim
  3. watch (brew install watch)
  4. tmux/tmate (brew install tmux tmate)
  5. A markdown editor such as macdown (brew install --cask macdown)
  6. gsed (brew install gnu-sed)
  7. BitBar with GitLab Plugin

To replace macOS utilities with GNU core utilities, use the --with-default-names option.

Brew Files

There are sample brew files in the Infrastructure Project
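For orientation, a Brewfile is a plain list of `brew` and `cask` entries that `brew bundle` installs in one pass. The sketch below emits an illustrative subset built from the tools named on this page; it is not the sample file from the Infrastructure Project, and the `sample_brewfile` helper is hypothetical.

```shell
# Emit a Brewfile covering some of the tools listed on this page.
# Assumption: an illustrative subset, not the Infrastructure Project's file.
sample_brewfile() {
  cat <<'EOF'
brew "kubernetes-cli"
brew "watch"
brew "tmux"
brew "tmate"
brew "gnu-sed"
cask "iterm2"
cask "macdown"
EOF
}

# Typical use (not run here):
#   sample_brewfile > Brewfile
#   brew bundle --file=Brewfile
```

Keeping the tool list in one file makes it easier to rebuild a workstation or compare setups between team members.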

iOS apps

  1. Slack
  2. Zoom
  3. PagerDuty
  4. Working Copy (Optional)

Reference Material

List of relevant reference material that an engineer may need to brush up on

  1. Chef
  2. Terraform Docs or getting started guide