Published on: September 12, 2022

A visual guide to GitLab CI/CD caching

Learn cache types, as well as when and how to use them.


If you've ever worked with GitLab CI/CD, you have probably needed, at some point, to use a cache to share content between jobs. The decentralized nature of GitLab CI/CD is a strength, but it can confuse even the best of us when we try to wire everything together. For instance, we need to know the difference between artifacts and cache, and where and how to place the configuration.

This visual guide will help with both challenges.

Cache vs. artifacts

The concepts may seem to overlap because they are about sharing content between jobs, but they actually are fundamentally different:

  • If your job does not rely on the output of the previous one (i.e. it can produce the content by itself, but runs faster if the content already exists), then use cache.
  • If your job does rely on the output of the previous one (i.e. cannot produce it by itself), then use artifacts and dependencies.
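
As a rough illustration of the two rules above, here is a minimal sketch of a .gitlab-ci.yml that uses both. The job names, the node:lts image, the npm commands, and the .npm/ and dist/ paths are assumptions made for this example; they are not part of the setup described in this article.

```yaml
stages:
  - build
  - test

build:
  stage: build
  image: node:lts               # assumed image, any Node.js image would do
  script:
    # Works without the cache, only slower: packages are downloaded again.
    - npm ci --cache .npm --prefer-offline
    - npm run build             # assumed to produce dist/
  cache:
    # Optional speed-up: the job still succeeds if this cache is missing.
    paths:
      - .npm/
  artifacts:
    # Required output: the test job cannot produce dist/ by itself.
    paths:
      - dist/

test:
  stage: test
  dependencies:
    - build                     # fetch only the artifacts of the build job
  script:
    - ls dist/
```

The cache only saves download time in the build job, while the artifacts carry content that the test job would otherwise fail without.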

Here is a simple sentence to remember if you struggle to choose between cache and artifacts:

Cache is here to speed up your job but it may not exist, so don't rely on it.

This article will focus on cache.

Initial setup

We'll start with a simple representation of the GitLab CI/CD pipeline model and ignore (for now) that jobs can be executed on any runner and any host. This will help you grasp the basics.

Let's say you have:

  • 1 project with 3 branches
  • 1 host running 2 Docker runners

Figure: initial setup

Local cache: Docker volume

If you want a local cache between all your jobs running on the same runner, use the cache statement in your .gitlab-ci.yml:

default:
  cache:
    paths:
      - relative/path/to/folder/*.ext
      - relative/path/to/another_folder/
      - relative/path/to/file

local / container / all branches / all jobs

Using the predefined variable CI_COMMIT_REF_NAME as the cache key, you can ensure the cache is tied to a specific branch:

default:
  cache:
    key: $CI_COMMIT_REF_NAME
    paths:
      - relative/path/to/folder/*.ext
      - relative/path/to/another_folder/
      - relative/path/to/file

local / container / one branch / all jobs

Using the predefined variable CI_JOB_NAME as the cache key, you can ensure the cache is tied to a specific job:
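
A minimal sketch of that configuration, reusing the same placeholder paths as the previous snippets, could look like this:

```yaml
default:
  cache:
    # One cache per job name, shared by every branch that runs this job.
    key: $CI_JOB_NAME
    paths:
      - relative/path/to/folder/*.ext
      - relative/path/to/another_folder/
      - relative/path/to/file
```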

local / container / all branches / one job

Local cache: Bind mount

If you don't want to use a volume for caching (for easier debugging, simpler disk cleanup, etc.), you can configure a bind mount through the Docker volumes option while registering the runner. With this setup, you do not need the cache statement in your .gitlab-ci.yml:

#!/bin/bash

gitlab-runner register                             \
  --name="Bind-Mount Runner"                       \
  --docker-volumes="/host/path:/container/path:rw" \
...

local / one runner / one host / all branches / all jobs
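
On the pipeline side, a job then simply reads from and writes to the mounted path, with no cache keyword involved. The sketch below assumes the /container/path mount from the registration command above; the job name and the downloads subfolder are hypothetical.

```yaml
build:
  script:
    # /container/path is the container side of the bind mount declared
    # with --docker-volumes when the runner was registered.
    - mkdir -p /container/path/downloads
    # Anything left here is still on the host for the next job of this runner.
    - ls -la /container/path/downloads
```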

In fact, this setup even allows you to share a cache between jobs running on the same host without requiring you to set up a distributed cache (which we'll talk about later):

#!/bin/bash

gitlab-runner register                             \
  --name="Bind-Mount Runner X"                     \
  --docker-volumes="/host/path:/container/path:rw" \
...

gitlab-runner register                                 \
  --name="Bind-Mount Runner Y"                         \
  --docker-volumes="/host/path:/container/alt/path:rw" \
...

local / multiple runners / one host / all branches / all jobs

Distributed cache

If you want to have a shared cache between all your jobs running on multiple runners and hosts, use the [runners.cache] section in your config.toml:

[[runners]]
  name = "Distributed-Cache Runner"
...
  [runners.cache]
    Type = "s3"
    Path = "bucket/path/prefix"
    Shared = true
    [runners.cache.s3]
      ServerAddress = "s3.amazonaws.com"
      AccessKey = "<changeme>"
      SecretKey = "<changeme>"
      BucketName = "foobar"
      BucketLocation = "us-east-1"

remote / multiple runners / multiple hosts / all branches / all jobs

Using the predefined variable CI_COMMIT_REF_NAME as the cache key, you can ensure the cache is tied to a specific branch across multiple runners and hosts:
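
The .gitlab-ci.yml side is the same as for the local cache. A minimal sketch, assuming the runners picking up the jobs are all configured with the distributed cache shown above and reusing the earlier placeholder paths:

```yaml
default:
  cache:
    # One cache per branch, stored remotely so any runner on any host can fetch it.
    key: $CI_COMMIT_REF_NAME
    paths:
      - relative/path/to/folder/*.ext
      - relative/path/to/another_folder/
      - relative/path/to/file
```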

remote / multiple runners / multiple hosts / one branch / all jobs

Real-life setup

The simplified setup above should have helped you build an understanding of the concepts and possibilities.

In real life, you'll face more complex wiring and we hope this article will help you as a visual cheatsheet along with the reference documentation.

Just to give you a sneak peek, here is an exercise for you:

  • Set up a cache between all the jobs of a specific stage, running on any runner and any host, but only between pipelines of the same branch:

Real-life test assignment
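
If you want to check your answer, here is one possible sketch, not the only valid solution. It assumes that every runner uses a distributed cache as configured in the previous section; the hidden .stage-cache template, the job names, and the echo commands are hypothetical.

```yaml
.stage-cache:
  cache:
    # Same stage and same branch share one cache entry.
    key: "$CI_JOB_STAGE-$CI_COMMIT_REF_SLUG"
    paths:
      - relative/path/to/another_folder/

unit-tests:
  stage: test
  extends: .stage-cache
  script:
    - echo "run unit tests"

lint:
  stage: test
  extends: .stage-cache
  script:
    - echo "run linter"
```

Both jobs belong to the test stage and resolve to the same key on a given branch, so they share a cache, while jobs in other stages or on other branches do not.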

Happy caching, folks!

Cover image by Alina Grubnyak on Unsplash
