|Slack Channels||#g_scalability /
||#infrastructure-lounge (Infrastructure Group Channel), #incident-management (Incident Management), #alerts-general (SLO alerting), #mech_symp_alerts (Mechanical Sympathy Alerts)|
The Scalability team is responsible for GitLab and GitLab.com at scale, working on the highest priority scalability items in the application in close coordination with Reliability Engineering teams and providing feedback to other Engineering teams so they can become better at scalability as well.
As its name implies, the Scalability team enhances the availability, reliability, and performance of GitLab by observing the application's capability to operate at GitLab.com scale. The Scalability team analyzes application performance on GitLab.com, recognizes bottlenecks in service availability, proposes short-term improvements, and develops long-term plans that help drive the decisions of other Engineering teams.
Short term goals include:
All work tracked by the team is compiled in the Scaling GitLab.com epic.
When we need to work in the GitLab.org group, we create a corresponding epic there and link it in the above epic's description (as epics are tied to groups, and we use more than one top-level group).
The diagram below describes how work is prioritized in the Scalability team and added to the above-mentioned epic:
The process contains six cyclical stages:
The Scalability team routinely uses the following set of labels:
The team::Scalability label is used to allow easier filtering of issues applicable to the team that have group-level labels applied.
The priority labels allow us to track issues correctly and to raise or lower the priority of work based on both external and internal factors. Priorities are set based on the priority definitions, with the addition that the target SLOs apply to GitLab.com service SLOs.
This means that if resolving an issue will immediately improve GitLab.com SLOs, or will unblock an issue that immediately impacts them, the issue should have the highest priority.
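The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Issue` fields and the `priority::1` label name are hypothetical and are not taken from the team's actual tooling.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical name for the highest priority label; the real label set is
# governed by the priority definitions referenced in the text.
HIGHEST_PRIORITY = "priority::1"


@dataclass
class Issue:
    """Hypothetical, simplified view of an issue for this sketch."""
    title: str
    improves_slo_immediately: bool = False
    # The issue that this one unblocks, if any.
    unblocks: Optional["Issue"] = None
    priority: Optional[str] = None


def should_be_highest_priority(issue: Issue) -> bool:
    """True if resolving the issue immediately improves GitLab.com SLOs,
    or unblocks an issue that would immediately impact them."""
    if issue.improves_slo_immediately:
        return True
    blocked = issue.unblocks
    return blocked is not None and blocked.improves_slo_immediately


def apply_priority(issue: Issue) -> Issue:
    """Raise the issue to the highest priority when the rule applies;
    otherwise leave its current priority untouched."""
    if should_be_highest_priority(issue):
        issue.priority = HIGHEST_PRIORITY
    return issue
```

In this sketch an issue that merely unblocks SLO-relevant work is treated the same as one that improves the SLOs directly, which mirrors the wording of the rule.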
The Scalability team leverages scoped workflow labels to track different stages of work. They show the progression of work for each issue and allow us to remove blockers or change focus more easily.
The standard progression of workflow is described below:
There are three other workflow labels of importance omitted from the diagram above:
We have automated triage policies defined in the triage-ops project. These perform tasks such as automatically labelling issues, asking the author to add labels, and creating weekly triage issues.
We currently have two weekly triage issues:
Service::Unknown grooming - lists issues with Service::Unknown, with the goal of adding a defined service where possible.
We rotate the triage ownership each month, with the current triage owner responsible for picking the next one (a reminder is added to their last triage issue).
An issue is being implemented if:
An issue is resolved when:
The Scalability team issue boards track the progress of ongoing work. The purposes of some of the more important issue boards are described below:
We work from our main epic: Scaling GitLab on GitLab.com.
Most of our work happens on the current in-progress sub-epic, which is always prominently visible in the main epic's description. From there, work takes place on the board associated with the current in-progress epic.
Priority and workflow labels take precedence; we don't use issue ordering in boards or epics for priorities. Workflow labels to the right are higher priority than those to the left.
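One way to read the ordering rule above is as a sort key. The following Python sketch assumes that priority labels are compared first and workflow labels second; that combination, along with every label name used here, is a hypothetical choice for illustration, since the team's actual label set is not reproduced in this section.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical workflow labels, listed left to right as they might appear
# as columns on a board.
WORKFLOW_ORDER: List[str] = [
    "workflow::example-ready",
    "workflow::example-in-progress",
    "workflow::example-verify",
]
# Hypothetical priority labels, most urgent first.
PRIORITY_ORDER: List[str] = ["priority::1", "priority::2", "priority::3", "priority::4"]


def _position(labels: List[str], order: List[str]) -> Optional[int]:
    """Index in `order` of the first matching label, or None if none match."""
    for position, name in enumerate(order):
        if name in labels:
            return position
    return None


def sort_for_pickup(issues: List[Dict]) -> List[Dict]:
    """Sort by priority label, then by workflow label, with workflow labels
    further to the right sorting earlier. Board position is ignored."""
    def key(issue: Dict) -> Tuple[int, int]:
        labels = issue["labels"]
        priority = _position(labels, PRIORITY_ORDER)
        workflow = _position(labels, WORKFLOW_ORDER)
        # Issues without a priority label sort after all prioritized ones.
        priority_rank = len(PRIORITY_ORDER) if priority is None else priority
        # Negate so the rightmost workflow label comes first; issues with
        # no workflow label sort last within their priority.
        workflow_rank = 1 if workflow is None else -workflow
        return (priority_rank, workflow_rank)

    return sorted(issues, key=key)
```

The input order of `issues` plays no part in the key, which reflects the statement that issue ordering in boards or epics is not used for prioritization.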
The Scalability team works with engineering teams across all departments as a representative of GitLab.com, one of the largest GitLab installations, to ensure that GitLab continues to scale in a safe and sustainable way.
The Memory team is a natural counterpart to the Scalability team, but the two teams' missions complement each other rather than overlap:
The following people are members of the Scalability Team:
|New Vacancy - Marin Jankovski (Interim)||Engineering Manager, Scalability|
|Sean McGivern||Staff Backend Engineer, Scalability|
|Oswaldo Ferreira||Backend Engineer, Scalability|
|Bob Van Landuyt||Senior Backend Engineer, Scalability|
|C.M.||Site Reliability Engineer, Scalability|
workflow labels to the issue. The team will triage the issue and apply these.
We celebrate our wins! Whenever a change driven by the Scalability team shows a clear positive impact on the scalability of GitLab.com (through key metrics, saturation reduction, reduced Mean Time to Detection (MTTD), improved Mean Time Between Failures, etc.), we post a message as a comment on this snippet in our tracker: https://gitlab.com/gitlab-com/gl-infra/scalability/snippets/1900609.