|Slack Channels||#g_delivery /
|Delivery Handbook||Team training|
|Release Tools Project||Release tools|
|Release Manager Runbooks||release/docs/runbooks|
The Delivery Team enables GitLab Engineering to deliver features in a safe, scalable and efficient fashion to both GitLab.com and self-managed customers. The team ensures that GitLab's monthly, security, and patch releases are deployed to GitLab.com and publicly released in a timely fashion.
By its nature, the Delivery team is a backstage team that does not work on user-facing features, but its product and output have a direct impact on Infrastructure's primary goals of availability, reliability, performance, and scalability for all of GitLab's user-facing services as well as for self-managed customers. The team creates the workflows, frameworks, architecture, and automation that let Engineering teams see their work reach production effectively and efficiently.
The Delivery team is focused on our CI/CD blueprint by driving the necessary changes in our software development processes and workflows, as well as infrastructure changes, to embrace the benefits of CI/CD.
The Delivery team significantly contributes to the Infrastructure department direction for FY22 in the following ways:
The team regularly works on the following tasks, in the order of priority:
Each member of the Delivery team is part of this vision:
The following people are members of the Delivery Team:
|Amy Phillips||Engineering Manager, Delivery|
|Robert Speicher||Senior Backend Engineer, Delivery|
|Alessio Caiazza||Staff Backend Engineer, Delivery|
|Mayra Cabrera||Senior Backend Engineer, Delivery|
|John T Skarbek||Senior Site Reliability Engineer, Delivery|
|Reuben Pereira||Backend Engineer, Delivery|
|Henri Philipps||Senior Site Reliability Engineer, Delivery|
|Graeme Gillies||Senior Site Reliability Engineer, Delivery|
The following members of other functional teams are our stable counterparts:
|André Luís||Frontend Engineering Manager, Create:Source Code, Create:Code Review, Delivery & Scalability|
The Delivery team contributes to Engineering function performance indicators through Infrastructure department performance indicators. The team's main performance indicator is Mean Time To Production (MTTP), which shows how quickly a change introduced through a Merge Request reaches the production environment (GitLab.com). At the time of writing, the target for this PI is defined in this key result epic.
MTTP is further broken down into charts and tables at the Delivery Team Performance Indicators Sisense dashboard.
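As a rough illustration of the indicator, the sketch below computes MTTP as the average time between a Merge Request being merged and the same change being deployed to production. The function name, the input shape, and the sample timestamps are assumptions made for this example; the Sisense dashboard may calculate the figure differently.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_time_to_production(changes):
    """Return the mean merge-to-production time as a timedelta.

    `changes` is an iterable of (merged_at, deployed_at) datetime pairs.
    This input shape is hypothetical and chosen for illustration only.
    Changes that have not reached production yet (deployed_at is None)
    are skipped. Returns None if nothing has been deployed.
    """
    durations = [
        (deployed_at - merged_at).total_seconds()
        for merged_at, deployed_at in changes
        if deployed_at is not None
    ]
    if not durations:
        return None
    return timedelta(seconds=mean(durations))


# Made-up sample data: two deployed changes and one still waiting.
sample_changes = [
    (datetime(2021, 3, 1, 8, 0), datetime(2021, 3, 1, 18, 0)),  # 10 hours
    (datetime(2021, 3, 2, 9, 0), datetime(2021, 3, 2, 23, 0)),  # 14 hours
    (datetime(2021, 3, 3, 9, 0), None),                         # not deployed
]

mttp = mean_time_to_production(sample_changes)
```

Skipping undeployed changes keeps the average meaningful, at the cost of hiding changes that are stuck; a real report would likely track those separately.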
The Delivery team's work is tracked through a number of epics, issues, and issue boards.
Epics and issue boards are complementary to each other, and we always strive to have a 1-1 mapping between a working epic and an issue board. Epics describe the work and allow for general discussions, while the issue board is there to describe the order of progress in any given epic.
Two tracking epics related to the team mission are:
Any working epic that the team creates should be directly added as a child to one of these two top-level tracking epics.
A working epic should always have:
In cases where the work is tracked in a project in a different group outside of our canonical project location, we will create two epics for the same topic and state in the epic description which one is the working epic. The DRI is expected to update the Status of the Epic on a weekly basis.
Each working epic should be accompanied by an issue board. Issue boards should be tailored to the specific project needs, but at a minimum they should contain the workflow labels shown on the workflow diagram.
The canonical issue tracker for the Delivery team is at gl-infra/delivery. Issues that have no labels applied are automatically labeled using the triage ops project. The default labels are defined in the labeling library.
By default, an issue needs to have a:
The Delivery team leverages scoped workflow-infra labels to track different stages of work.
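Scoped labels in GitLab take the form key::value, and an issue can carry only one label per key, so applying a new workflow-infra label replaces the previous one. The sketch below models that behavior in plain Python. The function name is hypothetical and this is not how GitLab implements it; it only illustrates why an issue is always in exactly one workflow stage.

```python
def apply_label(labels, new_label):
    """Return a new label list with `new_label` applied.

    Illustrative model of scoped-label behavior: a label containing
    "::" is scoped, its key is everything before the last "::", and
    applying it removes any existing label that shares the same key.
    """
    if "::" not in new_label:
        # Unscoped labels are simply added if not already present.
        return labels if new_label in labels else labels + [new_label]

    key = new_label.rsplit("::", 1)[0]
    kept = [
        label
        for label in labels
        if not ("::" in label and label.rsplit("::", 1)[0] == key)
    ]
    return kept + [new_label]


# Moving an issue from Ready to In Progress swaps the workflow label
# and leaves the priority label untouched.
before = ["workflow-infra::Ready", "Delivery::P2"]
after = apply_label(before, "workflow-infra::In Progress")
```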
On the Planning board, problems are discussed and scoped so that we have enough information to prioritise and implement the solution.
The typical workflow is described below:
Not every issue will be prioritised for building as soon as it is ready. Instead, we manage a Build board with all workflow-infra::In Progress and workflow-infra::Ready issues focused on the team's current goals.
The standard progression of workflow is described below:
There are three other workflow labels of importance omitted from the diagram above:
workflow-infra::Done is applied to signify completion of work, but its sole purpose is to ensure that issues are closed when the work is completed, ensuring issue hygiene.
The Delivery team uses priority labels to indicate the order in which work should be picked up. The meaning attached to each priority is shown below:
|Delivery::P1||Issue is blocking other team-members, or blocking other work. Needs to be addressed immediately, even if it means postponing current work.|
|Delivery::P2||Issue has a large impact, contributes towards current OKRs or will create additional work. Work should start as soon as possible after completing ongoing task.|
|Delivery::P3||Issue should be completed once other urgent work is done.|
|Delivery::P4||Default priority. A nice-to-have improvement, non-blocking technical debt, or a discussion issue. Issue might be completed in future or work completely abandoned.|
The team uses priority labels differently to the general issue triage priority definition in order to avoid the ambiguity that comes with the difference in timelines between Stage teams and Infrastructure teams. We have different timelines (a release brings different expectations for Delivery), different DRIs (no PM for Delivery), and different importance (a blocked release means that no one can ship anything).
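To make the ordering concrete, here is a minimal sketch that sorts issues by their Delivery priority label. The issue dictionaries, their titles, and the function names are hypothetical. Treating an issue without a priority label as Delivery::P4 is an assumption based on P4 being described as the default priority above.

```python
# Lower rank means the issue should be picked up sooner.
PRIORITY_RANK = {
    "Delivery::P1": 1,
    "Delivery::P2": 2,
    "Delivery::P3": 3,
    "Delivery::P4": 4,
}
# Assumption: issues without a priority label are treated as P4.
DEFAULT_RANK = PRIORITY_RANK["Delivery::P4"]


def priority_rank(labels):
    """Return the rank of the issue's priority label, or the default."""
    ranks = [PRIORITY_RANK[label] for label in labels if label in PRIORITY_RANK]
    return min(ranks) if ranks else DEFAULT_RANK


def pick_order(issues):
    """Sort issues by priority; ties keep their original order."""
    return sorted(issues, key=lambda issue: priority_rank(issue["labels"]))


# Made-up issues for illustration.
issues = [
    {"title": "Tidy up old docs", "labels": ["Delivery::P4"]},
    {"title": "Release is blocked", "labels": ["Delivery::P1"]},
    {"title": "Unlabeled idea", "labels": ["Discussion"]},
    {"title": "OKR work", "labels": ["Delivery::P2", "workflow-infra::Ready"]},
]

ordered_titles = [issue["title"] for issue in pick_order(issues)]
```

Because Python's sort is stable, issues that share a priority stay in the order they were listed, which mirrors a board where position within a column still matters.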
Some of the labels related to the team management are defined as:
onboarding - issues related to granting access to team resources.
team-tasks - issues related to general team topics.
Discussion - meta issues that are likely to be promoted to a working epic or to generate separate implementation issues.
Announcements - issues used to announce important changes to a wider audience.
Project labels are defined as needed, but they are required unless the issue describes a team management task.
The Delivery team generally has working epics assigned to specific owners who are responsible for bringing tasks into the Planning board and Build board to keep the project on track. However, anyone is welcome to pick up any tasks from the Build board regardless of which project it belongs to. If you want to work on something outside of your current project, feel free to contribute to any of the mid to lower priority labeled issues.
As part of a project, we might decide to organize project demos. The decision to create a demo depends on the expected longevity of the project, and also on its complexity.
The purpose of the demo is to ensure that everyone who participates in the project has a way of sharing their findings, and any challenges they might be encountering, outside of the regular async workflow. Demos do not have presentations attached to them, and they require no prior preparation. The demoer shouldn't feel like they have to excuse themselves for being unprepared, nor should they expect their explanation to be without faults. In fact, if what is being demoed shows no weaknesses, we might not have cut scope in time.
It is encouraged to show and discuss:
Every Delivery team member is responsible for sharing skills either through creating a training session for the rest of the team, or through paired work. See the page on team training for details.
The Delivery team officially came into existence on 2018-10-23. This was the culmination of a larger alignment that was happening throughout that year, exposed by the need to streamline releases for self-managed users and to create a better experience for GitLab.com users.
Throughout GitLab's existence, Release Management had been a monthly rotating role served by developers. The idea behind it was to keep developers close to the whole lifecycle of the software they create, and to ensure that they automate their work. This worked well until the number of application changes and developer tasks grew too large for anyone to handle as a secondary task. The event that indicated the need for a change was a near miss near the end of 2017, when the first Release Candidate was deployed to GitLab.com just 2 days before the 22nd. That whole month was riddled with challenges, from release managers struggling to deliver both their day-to-day development tasks and their RM tasks, to multiple unsuccessful deployments to GitLab.com. Most importantly, this was a first indication that the company was growing and that the processes that had worked previously might need to change to accommodate the larger growth that was planned.
After some internal discussions, we entered 2018 with an attempt to work on process improvements, rather than changing everything in one go. We went from a monthly rotation to a two-month release manager rotation and started recording the time spent. Over the next several months we saw a general stabilization of the process, but it became apparent that spending four engineers' time in the Release Manager rotation was not getting us any closer to improving the deployment process for GitLab.com, and with each developer we hired the task list grew bigger.
The initial discussion on what is in front of us to achieve Continuous Delivery on GitLab.com exposed a clear need for a team focused on this specific task.
After a successful team onsite (aka Fast boot) where we executed on our tasks while being in the same room together for the first time, we announced the first step towards Continuous Delivery on GitLab.com. This was a very large change that moved the deployment frequency from deploying from the default branch once per month (for a total of 4-6 deploys to include bug fixes), to taking commits from the default branch once per week.
The team focus then shifted to getting deployment time measured in hours, and migration of GitLab.com to Kubernetes.
Prior to 2020, the team impact overview was created in Slack, and in the years that followed the overview was logged in issues: