If you've ever worked with GitLab CI/CD, you have probably needed a cache to share content between jobs at some point. The decentralized nature of GitLab CI/CD is a strength, but it can confuse even the best of us when we try to wire everything together. For instance, we need to know the difference between artifacts and cache, and where and how to configure each of them.
This visual guide will help with both challenges.
Cache vs. artifacts
The concepts may seem to overlap because they are about sharing content between jobs, but they actually are fundamentally different:
- If your job does not rely on the output of the previous one (i.e. it can produce the content by itself, but runs faster if the content already exists), then use cache.
- If your job does rely on the output of the previous one (i.e. cannot produce it by itself), then use artifacts and dependencies.
Here is a simple sentence to remember if you struggle to choose between cache and artifacts:
Cache is here to speed up your job but it may not exist, so don't rely on it.
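To make the distinction concrete, here is a minimal sketch of a pipeline that uses both. The job names, script names, and paths (`build`, `test`, `fetch_dependencies.sh`, `compile.sh`, `run_tests.sh`, `vendor/`, `build/`) are hypothetical and only serve as an illustration.

```yaml
# Hypothetical pipeline: job names, scripts, and paths are illustrative only.
stages:
  - build
  - test

build:
  stage: build
  script:
    - ./fetch_dependencies.sh   # can refill vendor/ from scratch if the cache is missing
    - ./compile.sh              # writes build/, which the test job cannot produce itself
  cache:
    key: $CI_COMMIT_REF_NAME
    paths:
      - vendor/                 # nice to have: only speeds up the job
  artifacts:
    paths:
      - build/                  # required by the next job

test:
  stage: test
  dependencies:
    - build                     # download the artifacts of the build job
  script:
    - ./run_tests.sh build/
```

If the cache is empty, the `build` job is merely slower. If the artifacts were missing, the `test` job would have nothing to run against.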
This article will focus on cache.
Initial setup
We'll start with a simple representation of the GitLab CI/CD pipeline model and ignore (for now) that jobs can be executed on any runner and host. This makes the basics easier to grasp.
Let's say you have:
- 1 project with 3 branches
- 1 host running 2 docker runners
Local cache: Docker volume
If you want a local cache between all your jobs running on the same runner, use the cache statement in your .gitlab-ci.yml:

    default:
      cache:
        paths:
          - relative/path/to/folder/*.ext
          - relative/path/to/another_folder/
          - relative/path/to/file
Using the predefined variable CI_COMMIT_REF_NAME as the cache key, you can ensure the cache is tied to a specific branch:

    default:
      cache:
        key: $CI_COMMIT_REF_NAME
        paths:
          - relative/path/to/folder/*.ext
          - relative/path/to/another_folder/
          - relative/path/to/file
Using the predefined variable CI_JOB_NAME as the cache key, you can ensure the cache is tied to a specific job.
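As a minimal sketch, the job-scoped variant could look like the branch-scoped configuration with only the key changed. The paths are the same placeholders used earlier in this article.

```yaml
# Sketch: one cache per job name, shared across branches on the same runner.
default:
  cache:
    key: $CI_JOB_NAME
    paths:
      - relative/path/to/folder/*.ext
      - relative/path/to/another_folder/
      - relative/path/to/file
```

Variables can also be combined in a single key, for example `"$CI_JOB_NAME-$CI_COMMIT_REF_NAME"`, if you want one cache per job and per branch.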
Local cache: Bind mount
If you don't want to use a volume for caching (for example, to make debugging or disk cleanup easier), you can configure a bind mount for Docker volumes while registering the runner. With this setup, you do not need to set up the cache statement in your .gitlab-ci.yml:

    #!/bin/bash
    gitlab-runner register \
      --name="Bind-Mount Runner" \
      --docker-volumes="/host/path:/container/path:rw" \
      ...
In fact, this setup even allows you to share a cache between jobs running on the same host without requiring you to set up a distributed cache (which we'll talk about later):

    #!/bin/bash
    gitlab-runner register \
      --name="Bind-Mount Runner X" \
      --docker-volumes="/host/path:/container/path:rw" \
      ...
    gitlab-runner register \
      --name="Bind-Mount Runner Y" \
      --docker-volumes="/host/path:/container/alt/path:rw" \
      ...
Distributed cache
If you want to have a shared cache between all your jobs running on multiple runners and hosts, use the [runners.cache] section in your config.toml:

    [[runners]]
      name = "Distributed-Cache Runner"
      ...
      [runners.cache]
        Type = "s3"
        Path = "bucket/path/prefix"
        Shared = true
        [runners.cache.s3]
          ServerAddress = "s3.amazonaws.com"
          AccessKey = "<changeme>"
          SecretKey = "<changeme>"
          BucketName = "foobar"
          BucketLocation = "us-east-1"
Using the predefined variable CI_COMMIT_REF_NAME as the cache key, you can ensure the cache is tied to a specific branch across multiple runners and hosts.
Real-life setup
The simplified setup above should have helped you build your understanding of the concepts and possibilities.
In real life, you'll face more complex wiring and we hope this article will help you as a visual cheatsheet along with the reference documentation.
Just to give you a sneak peek, here is an exercise for you:
- Set up a cache between all the jobs of a specific stage, running on any runner and any host, but only between pipelines of the same branch.
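One possible answer, offered as a sketch rather than the definitive solution: combine the predefined variables CI_JOB_STAGE and CI_COMMIT_REF_NAME in the cache key. This assumes that every runner able to pick up the jobs is configured with the same distributed cache, as in the Distributed cache section. The path is a placeholder.

```yaml
# Sketch: one cache per stage and per branch.
# Assumes all runners share the same distributed cache (see config.toml above).
default:
  cache:
    key: "$CI_JOB_STAGE-$CI_COMMIT_REF_NAME"
    paths:
      - relative/path/to/another_folder/
```

Jobs in the same stage and on the same branch then resolve to the same key, whichever runner or host executes them, while other stages and branches get their own caches.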
Happy caching, folks!
Cover image by Alina Grubnyak on Unsplash