The current deployment tool, takeoff, was not originally designed with CI/CD in mind. It is not well suited for deploying to multiple classes of infrastructure in parallel, or for displaying output that is easy to read.
The goal of this project is to create new tooling that is better suited for orchestrated SSH. This tooling will provide an immediate benefit for deployments, and it will also provide common libraries and configuration for maintenance tasks driven from CI/CD.
The immediate benefits from this replacement will be the following:
The libraries and configuration for this project are also useful for general maintenance. One example is the registry restarter, which uses deploy-tooling to initiate a rolling drain and restart of the registry service to work around a memory leak.
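A rolling drain and restart takes hosts out of service one at a time, so the registry keeps serving traffic throughout. The following is a minimal sketch of that loop in Python. It is not the registry restarter's implementation: drain, restart, undrain and is_healthy are hypothetical callbacks standing in for whatever deploy-tooling actually invokes.

```python
# Hypothetical sketch of a rolling drain-and-restart, one host at a time.
# drain, restart, undrain and is_healthy are illustrative callbacks, not
# functions provided by deploy-tooling.
from typing import Callable, Iterable, List


def rolling_restart(
    hosts: Iterable[str],
    drain: Callable[[str], None],
    restart: Callable[[str], None],
    undrain: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Restart hosts one by one; raise on the first host that fails its health check."""
    restarted: List[str] = []
    for host in hosts:
        drain(host)    # stop sending new traffic to this host
        restart(host)  # restart the registry service to release leaked memory
        if not is_healthy(host):
            # Leave the host drained so the remaining hosts keep serving traffic.
            raise RuntimeError(f"{host} unhealthy after restart; aborting")
        undrain(host)  # return the host to service before moving on
        restarted.append(host)
    return restarted
```

Because each host is returned to service before the next one is drained, at most one host is out of rotation at any moment.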
Deployer uses the same set of scripts and configuration as the GitLab post-deployment patcher.
It does this by referencing the patcher repository as a submodule, which it uses to apply post-deployment patches during deployments. This ensures that new releases do not revert previously applied post-deployment patches.
graph TD subgraph gl-infra/patcher; a1[gitlab-ci.yml]; a2[deploy-tooling submodule]; end; subgraph gl-infra/deployer; b1[gitlab-ci.yml]; b2[deploy-tooling submodule]; b3[patcher submodule]; end;
A new deployment is initiated by starting a pipeline in the Deployer project with the following two environment variables set:
The pipeline is designed for continuous deployment and can deploy new omnibus packages to the entire fleet through multiple stages, all the way to production.
graph LR; subgraph GITLAB_ENV=gstg; a>staging] ==> b>staging QA]; end; subgraph GITLAB_ENV=gprd-cny; b ==> c>production canary]; c ==> d>canary QA]; end subgraph GITLAB_ENV=gprd; d ==> e>production]; end
By setting GITLAB_ENV to one of the values shown above (gstg, gprd-cny, or gprd), only the CI jobs targeting that stage will run.
This can be extended: if GITLAB_ENV is set to multiple stages (for example, gstg,gprd-cny), more than one stage is deployed in a single pipeline.
graph LR; subgraph GITLAB_ENV=gstg,gprd-cny; a>staging] ==> b>staging QA]; b ==> c>production canary]; c ==> d>canary QA]; end subgraph GITLAB_ENV=gprd; d ==> e>production]; end
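The stage selection from the two diagrams can be sketched as a small function that maps a comma-separated GITLAB_ENV value to the stages that run. The stage names come from the diagrams; the function, the mapping table and the rejection of unknown values are illustrative assumptions, not the Deployer CI configuration.

```python
# Sketch: which pipeline stages run for a given GITLAB_ENV value.
# Stage names mirror the diagrams; the function itself is hypothetical.
from typing import Dict, List

# Environment -> stages it enables, in pipeline order.
STAGES_BY_ENV: Dict[str, List[str]] = {
    "gstg": ["staging", "staging QA"],
    "gprd-cny": ["production canary", "canary QA"],
    "gprd": ["production"],
}
ENV_ORDER = ["gstg", "gprd-cny", "gprd"]


def stages_to_run(gitlab_env: str) -> List[str]:
    """Return the stages enabled by a comma-separated GITLAB_ENV value."""
    requested = {env.strip() for env in gitlab_env.split(",") if env.strip()}
    unknown = requested - set(STAGES_BY_ENV)
    if unknown:
        raise ValueError(f"unknown GITLAB_ENV value(s): {sorted(unknown)}")
    # Keep pipeline order no matter how the values were listed.
    return [
        stage
        for env in ENV_ORDER
        if env in requested
        for stage in STAGES_BY_ENV[env]
    ]
```

Sorting by ENV_ORDER means gprd-cny,gstg still deploys staging before production canary, matching the left-to-right flow in the diagrams.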
Within each environment, the following pipeline stages are executed to ensure that new versions of the omnibus package are deployed safely and with minimal impact on the community:
graph LR; a>prepare] ==> b>migrations]; b ==> c>gitaly deploy]; subgraph fleet; c -.- d>sidekiq]; c -.- d1>git]; c -.- d2>web]; c -.- d3>api]; c -.- d4>pages]; c -.- d5>registry]; c -.- d6>mailroom]; end d ==> e>postdeploy migrations]; e ==> f>cleanup]; f ==> g>gitlab-qa];
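The stage ordering in the diagram can be expressed as a dependency graph. The sketch below uses Python's graphlib.TopologicalSorter (Python 3.9+) to group stages into "waves", where everything in a wave may run concurrently. Making post-deployment migrations depend on every fleet group is an assumption drawn from the later description ("after the entire fleet has been upgraded"); the diagram only draws that edge from sidekiq.

```python
# Sketch: the per-environment pipeline as a dependency graph.
# Each key depends on the stages in its value set.
from graphlib import TopologicalSorter
from typing import Dict, List, Set

FLEET = ["sidekiq", "git", "web", "api", "pages", "registry", "mailroom"]

PIPELINE: Dict[str, Set[str]] = {
    "migrations": {"prepare"},
    "gitaly deploy": {"migrations"},
    # Every fleet group waits only for Gitaly, so they can run side by side.
    **{group: {"gitaly deploy"} for group in FLEET},
    "postdeploy migrations": set(FLEET),
    "cleanup": {"postdeploy migrations"},
    "gitlab-qa": {"cleanup"},
}


def parallel_waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group stages into waves; all stages in one wave may run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Only the fleet groups share a wave. Every other stage is strictly sequential, which is what gives post-deployment migrations their point-of-no-return position near the end.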
The prepare stage is used for any step that needs to happen before any change is made for the deployment. Nothing in this stage is destructive or would cause any user impact.
These stages all involve installing the GitLab EE omnibus package. This is done with an Ansible play that performs the following steps in sequence on a subset of the servers in a fleet, by default 10% at a time:
The migrations stage is for pre-deployment migrations. First, the target version of GitLab-EE is deployed to a single server; then the following command is run to initiate migrations:
SKIP_POST_DEPLOYMENT_MIGRATIONS=1 /usr/bin/gitlab-rake db:migrate
The output of the command is displayed in the job output.
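A minimal sketch of this step, assuming it is invoked from a script on the single upgraded server. The function name and the injectable runner parameter are hypothetical; the runner exists only so the sketch can be exercised without GitLab installed. Only the command and the SKIP_POST_DEPLOYMENT_MIGRATIONS variable come from the text above.

```python
# Sketch: run pre-deployment migrations with post-deployment migrations skipped.
# run_predeploy_migrations and its runner parameter are hypothetical.
import os
import subprocess
from typing import Callable

MIGRATE_CMD = ["/usr/bin/gitlab-rake", "db:migrate"]


def run_predeploy_migrations(
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    # Inherit the current environment and defer post-deployment migrations,
    # which run later, once the whole fleet has been upgraded.
    env = dict(os.environ, SKIP_POST_DEPLOYMENT_MIGRATIONS="1")
    # Output is not captured, so it streams straight into the CI job log.
    result = runner(MIGRATE_CMD, env=env, check=False)
    return result.returncode
```

Returning the exit code rather than raising lets the caller decide whether a failed migration should stop the pipeline.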
The Gitaly servers are deployed before the rest of the fleet. This ensures that no newer Rails code tries to use new Gitaly features before the Gitaly servers have been upgraded.
The remaining fleet is divided into the following groups:
Each of these groups is deployed in parallel by a dedicated CI job. Within each group, the GitLab-EE package is deployed 10% at a time.
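The combination of parallel groups and sequential 10% batches can be sketched as follows. This is an illustration, not the Ansible play. deploy_batch is a hypothetical callback standing in for upgrading one batch of hosts, and the rounding rule (round down, at least one host per batch) is an assumption. A thread per group stands in for a CI job per group.

```python
# Sketch: fleet groups deploy in parallel; hosts within a group deploy in
# sequential batches of roughly 10%. deploy_batch is hypothetical.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

DeployBatch = Callable[[List[str]], None]


def batches(hosts: List[str], percent: int = 10) -> List[List[str]]:
    """Split hosts into batches of `percent`% (rounded down, minimum 1)."""
    size = max(1, len(hosts) * percent // 100)
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]


def deploy_group(hosts: List[str], deploy_batch: DeployBatch) -> int:
    # Batches inside one group run strictly one after another.
    for batch in batches(hosts):
        deploy_batch(batch)
    return len(hosts)


def deploy_fleet(groups: Dict[str, List[str]], deploy_batch: DeployBatch) -> Dict[str, int]:
    # One worker per group, mirroring one CI job per group.
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        futures = {
            name: pool.submit(deploy_group, hosts, deploy_batch)
            for name, hosts in groups.items()
        }
        return {name: fut.result() for name, fut in futures.items()}
```

Batching inside a group limits how many servers of one role are mid-upgrade at once. Running groups in parallel keeps the total deploy time close to that of the slowest group.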
Installing the GitLab omnibus package and running gitlab-ctl reconfigure does not necessarily restart a service whose version has been upgraded. To handle this, the service versions recorded before installation are compared against the versions after installation, and any service whose version changed is restarted. The following services are checked for a version change:
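The version check described above reduces to comparing two mappings of service name to version. A minimal sketch follows. The function name and the example version strings are hypothetical, and in this sketch a service missing from the post-install mapping is ignored rather than restarted.

```python
# Sketch: pick services to restart by comparing pre- and post-install versions.
# Names and versions used with this function are illustrative only.
from typing import Dict, List


def services_to_restart(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return, sorted, the services whose installed version changed."""
    return sorted(
        name
        for name, old_version in before.items()
        if name in after and after[name] != old_version
    )
```

Comparing versions, rather than restarting everything, avoids needless restarts of services the new package did not change.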
After the entire fleet has been upgraded, the post-deployment migrations are run. This is the final upgrade step and is considered the point of no return: once post-deployment migrations are complete, it may be impossible to roll back.
The following command is issued on the deploy server:
In addition to this, the output of the following is displayed:
At the very end of the deployment the following tasks are run:
For staging and production canary deployments, a final step runs the GitLab QA smoke tests. For staging, it tests the https://staging.gitlab.com endpoint; for production canary, it sets the gitlab_canary cookie and runs against https://gitlab.com.
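The choice of smoke-test target per environment can be sketched as a lookup. The URLs and the gitlab_canary cookie name come from the text above. The cookie value "true", the QATarget type and the decision to return no target for full production are assumptions made for illustration.

```python
# Sketch: choose the QA smoke-test target for an environment.
# The cookie value "true" is an assumption; only the cookie name is documented.
from typing import Dict, NamedTuple, Optional


class QATarget(NamedTuple):
    url: str
    cookies: Dict[str, str]


def qa_target(env: str) -> Optional[QATarget]:
    if env == "gstg":
        return QATarget("https://staging.gitlab.com", {})
    if env == "gprd-cny":
        # The cookie routes requests on gitlab.com to the canary fleet.
        return QATarget("https://gitlab.com", {"gitlab_canary": "true"})
    # No smoke-test step is described for full production.
    return None
```

Using a cookie lets the canary tests run against the normal public endpoint while still exercising only the canary servers.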