The Core Infra team owns:
Need help? How to get something on our backlog
The people of Core Infra are:
|David Smith||Engineering Manager, Reliability|
|Alex Hanselka||Site Reliability Engineer|
|Devin Sylva||Senior Site Reliability Engineer|
|Craig Barrett||Senior Site Reliability Engineer|
|Cameron S McFarland||Senior Site Reliability Engineer|
|Hendrik Meyer||Site Reliability Engineer|
|Graeme Gillies||Senior Site Reliability Engineer|
Our guiding principles center around implementing boring solutions to infrastructure problems. We work to simplify interfaces and build robust workflows for other engineers within GitLab who utilize our platform to provide support for and deliver new features to the gitlab.com SaaS product. Over time this continues to expand to include additional/related applications, sites, and systems.
In practice, this means that we
We have 3 types of epics:
We use two types of milestones that can overlap:
When external asks come in:
We conduct our planning and retrospectives asynchronously, using issues in the sre-coreinfra/planning project to organize discussion.
Planning occurs on a two-week cadence, aligned with our unscheduled milestones.
During the first week of a milestone, the team will review and discuss the next upcoming milestone in an issue opened on the sre-coreinfra/planning backlog. This allows for time before the start of the next milestone to work on collecting additional data required to refine an issue, clarify details with requestors, or perform basic analyses of the relative size/scope of an issue before starting the actual work.
During the second week of a milestone, in addition to resolving any questions or discussions raised on the next milestone planning issue, the team begins the issue triage process, and populates the planning issue for discussion when the cycle repeats.
Issue triage starts with reviewing rollover (unplanned issues in the current unplanned milestone that were not finished), incoming, unscheduled, and unprioritized work. Since rollover work is the least predictable in the first week of the milestone, planning is conducted with the assumption that all work in a milestone will be completed, and that any rollover work will push a corresponding volume of work out to the following milestones according to priority.
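The effect of rollover on later milestones can be illustrated with a minimal sketch. It assumes each issue carries a numeric priority (lower is picked up first) and a size estimate, and that each milestone has a fixed capacity; `Issue`, `weight`, `capacity`, and `plan_milestones` are hypothetical names used only for illustration, not part of the team's tooling.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    title: str
    priority: int           # assumption: lower number is picked up earlier
    weight: int             # hypothetical size estimate
    rollover: bool = False  # True if carried over from the previous milestone


def plan_milestones(issues, capacity):
    """Greedily fill successive milestones: rollover first, then by priority."""
    ordered = sorted(issues, key=lambda i: (not i.rollover, i.priority))
    milestones = [[]]
    remaining = capacity
    for issue in ordered:
        # Open the next milestone when the current one cannot fit this issue.
        if issue.weight > remaining and milestones[-1]:
            milestones.append([])
            remaining = capacity
        milestones[-1].append(issue.title)
        remaining -= issue.weight
    return milestones


backlog = [
    Issue("a", priority=2, weight=3),
    Issue("b", priority=1, weight=3),
    Issue("c", priority=3, weight=2),
    Issue("r", priority=4, weight=2, rollover=True),
]
plan = plan_milestones(backlog, capacity=5)
```

In this sketch the rollover issue `r` takes a slot in the first milestone, which pushes issue `a` out to the second one even though `a` has a higher priority than `r`.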
Aside from rollover issues, the remaining backlog issues will be reviewed in the following order:
workflow-infra::triage will have an initial core-infra::pX priority added and move to
Once the list of proposed issues for the upcoming milestone is completed, the team will perform a final review and discuss any additional context, missing details, pre-requisites, conflicts, etc. and decide on a final relative priority for work to be pulled.
At the conclusion of each milestone, a retrospective issue will be created in the sre-coreinfra/planning project for discussion and review. The text of the retrospective can be customized to ensure that certain aspects of the previous project/milestone are discussed, but in general it will follow the format:
The Core-Infra team routinely uses the following set of labels:
The team::Core-Infra label is used to allow easier filtering of issues applicable to the team that have group-level labels applied.
The priority labels allow us to track the issues correctly and raise/lower priority of work based on both external and internal factors.
This means that the highest priority is given to working on issues that improve GitLab.com SLOs, either immediately and directly or by unblocking other issues that achieve the same.
The Core-Infra team leverages scoped workflow labels to track different stages of work. They show the progression of work for each issue and allow us to remove blockers or change focus more easily.
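Scoped labels in GitLab take the form `scope::value`, and an issue can carry only one label per scope, so applying a new workflow label replaces the previous one. The following is a minimal sketch of that behavior, not GitLab's implementation; `scope_of` and `apply_label` are hypothetical helper names, and `workflow-infra::next-stage` is a placeholder label used only for illustration.

```python
def scope_of(label):
    """Return the scope (text before the last '::'), or None if unscoped."""
    scope, sep, _ = label.rpartition("::")
    return scope if sep else None


def apply_label(labels, new_label):
    """Add new_label, dropping any existing label that shares its scope."""
    scope = scope_of(new_label)
    kept = [l for l in labels if scope is None or scope_of(l) != scope]
    if new_label not in kept:
        kept.append(new_label)
    return kept


issue_labels = ["team::Core-Infra", "workflow-infra::triage"]
# "workflow-infra::next-stage" is a placeholder, not a real label name.
updated = apply_label(issue_labels, "workflow-infra::next-stage")
```

Because the team label and the workflow label live in different scopes, moving an issue to a new workflow stage leaves `team::Core-Infra` untouched.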
The standard progression of workflow is from top to bottom in the table below:
||Problem is identified and effort is needed to determine the correct action or work required.|
||Proposal is created and put forward for discussion and review.
The SRE looks for clarification and writes up a rough high-level execution plan if required. The SRE highlights what they will check, along with the soak/review time, so that developers can confirm.
If there are no further questions or blockers, the issue can be moved into "Ready".|
||Proposal is complete and the issue is waiting to be picked up for work.|
||Issue is assigned and work has started.
While in progress, the issue should be updated to include steps for verification that will be followed at a later stage.|
||Issue has an MR in review.|
||MR was merged and we are waiting to see the impact of the change to confirm that the initial problem is resolved.|
||Once the issue is updated with the latest graphs and measurements, this label is applied and the issue can be closed.|
There are three other workflow labels of importance:
||Work in the issue is being abandoned due to external factors or a decision not to resolve the issue. After applying this label, the issue will be closed.|
||Work is not abandoned but other work has higher priority. After applying this label, the team's Engineering Manager is mentioned in the issue to either change the priority or find more help.|
||Work is blocked due to external dependencies or other external factors. Where possible, a blocking issue should also be set. After applying this label, the issue will be regularly triaged by the team until the label can be removed.|
The Core-Infra team uses priority labels to indicate the order in which work is picked up. Priorities are roughly defined as:
||Issue is blocking other team members or blocking other work. It should be picked up as soon as possible after completing the ongoing task, unless directly communicated otherwise.|
||Issue has a large impact, or will create additional work.|
||Issue should be completed once other urgent work is done.|
||Default priority. A nice-to-have improvement, non-blocking technical debt, or a discussion issue.|
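As a rough illustration of how priority labels can drive pickup order, the sketch below sorts issues by the number in a `core-infra::pX` label. It assumes that `X` is an integer, that a lower number means higher priority, and that unlabeled issues sort last; none of these are confirmed by this page, and `priority_rank` and `pickup_order` are hypothetical names.

```python
import re

# Assumption: priority labels look like "core-infra::p<N>" with N an integer.
PRIORITY_RE = re.compile(r"^core-infra::p(\d+)$")


def priority_rank(labels, default=float("inf")):
    """Return the numeric priority from the first matching label."""
    for label in labels:
        match = PRIORITY_RE.match(label)
        if match:
            return int(match.group(1))
    return default  # assumption: issues without a priority label sort last


def pickup_order(issues):
    """Sort issues so that lower priority numbers come first (stable sort)."""
    return sorted(issues, key=lambda issue: priority_rank(issue["labels"]))


issues = [
    {"title": "tidy docs", "labels": ["team::Core-Infra"]},
    {"title": "tech debt", "labels": ["core-infra::p3"]},
    {"title": "unblock deploy", "labels": ["core-infra::p1", "team::Core-Infra"]},
]
ordered_titles = [issue["title"] for issue in pickup_order(issues)]
```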