The Infrastructure Department ensures that we operate an enterprise-grade SaaS platform. This enables GitLab (the company) to deliver a single DevOps application, and enables GitLab SaaS users to focus on generating value for their own businesses.
The Infrastructure Department does this by focusing on availability, reliability, performance, and scalability efforts. Cost efficiency is an additional driving force behind these responsibilities, reinforced by properly prioritized dogfooding efforts.
Many other teams also contribute to the success of the SaaS platform, because GitLab.com is not a role. However, it is the responsibility of the Infrastructure Department to drive the ongoing evolution of the SaaS platform, enabled by platform observability data.
The Infrastructure Department operates a fast, secure, and reliable SaaS platform to which (and with which) everyone can contribute.
An integral part of this vision is to:
In FY22 we will work towards accomplishing more of the department vision. This especially means supporting the continued successful monthly delivery of the product and ongoing reliability improvements for our SaaS customers. We must also work to meet both current and expected needs to scale our infrastructure. That scaling must happen not only vertically, with the existing GitLab.com service, but also horizontally, with multiple site implementations.
To progress towards the department vision, we are focusing on:
Ensure that the platform can respond to existing and new demands to support future growth, with a better user experience.
We pursue this direction through Objectives and Key Results (OKRs).
Initiatives driven within the Infrastructure Department, which often span multiple quarters, are represented on the Infrastructure Department epic. The epic description also includes a stack-ranked, centralized roadmap table that helps inform the relative prioritization of all material projects across the Infrastructure Department.
Other strategic initiatives to achieve this vision are driven by the needs of enterprise customers looking to adopt GitLab.com. The GitLab.com strategy catalogs the top customer requests for the SaaS offering. It also outlines the strategic initiatives, across both Infrastructure and Stage Groups, needed to address these gaps.
Unlike at typical companies, part of the mandate of our Security, Infrastructure, and Support Departments is to contribute to the development of the GitLab Product. This follows from the concepts below, many of which are also behaviors attached to our core values:
As such, everyone in the department should be familiar with, and be acting upon, the following statements:
The Infrastructure Library contains documents that outline our thinking about the problems we are solving and represent the current state of any topic. These documents play a significant role in how we produce technical solutions to meet the challenges we face.
Blueprints scope out our initial thinking about specific problems and issues we are working on. Designs outline the specific architecture and implementation.
The Infrastructure department uses GitLab and GitLab features extensively as the main tool for operating many environments, including GitLab.com.
We follow the same dogfooding process as the rest of the Engineering function, while keeping the department mission statement as the primary prioritization driver. Our prioritization process is aligned with the Engineering function's prioritization process. That process defines where dogfooding ranks relative to the other technical decisions the Infrastructure department makes.
When we consider building tools to help us operate GitLab.com, we follow the 5x rule to decide whether to build the tool as a feature in GitLab or outside of GitLab. To track Infrastructure's contributions back into the GitLab product, we tag those issues with the appropriate Dogfooding label.
At GitLab, we have a handbook first policy. It is how we communicate process changes, and how we build up a single source of truth for work that is being delivered every day.
The handbook usage guide lists a number of general tips. Those most frequently encountered in the Infrastructure department are:
Classification of the Infrastructure department projects is described on the infrastructure department projects page.
Adding a new service involves work from a number of Infrastructure teams to make sure the service is deployed and operated safely. To help new service deployments run smoothly, please open a request issue in the infrastructure issue tracker.
The Infrastructure Department is composed of three distinct groups:
Product Management duties for the Infrastructure Department are handled by the Infrastructure PM, who reports into the Enablement Stage.
For details on the Department's structure, see the Infrastructure Teams Handbook section.
GitLab is a widely distributed company, and we aim to work asynchronously most of the time. However, some topics deserve a real-time discussion. We should regularly re-evaluate such meetings to ensure they continue to add value. We follow all the guidance for all-remote meetings, including always starting and ending on time, or earlier.
Our team calendar is shared with the company.
Every scheduled team meeting must have a Google Doc agenda attached to the invite. The agendas should be long-running and organized by date. Each meeting's agenda is set and reviewed before the start of each meeting. Everyone invited to the meeting should have edit rights to add agenda items before the start of the meeting and to take notes during the meeting.
A meeting may cover multiple topics, but each topic should be covered in only one meeting. This helps prevent conflicting information and inefficient duplication. Some meetings are deliberately scheduled twice to better include all global participants. These are considered the same meeting, held at two different times.
| Topic | Meeting | Participants | Frequency |
|---|---|---|---|
| High priority escalations and project updates | GitLab.com Standup (internal only) | All Engineering | Daily |
| Incident Review and followup | Incident Review (internal only) | All Engineering | Tues |
| Prioritization of Engineering work | Engineering Allocation (internal only) | All Engineering | Tues |
| Infrastructure Performance Indicator Review | Infrastructure Key Meeting (internal only) | Eng VP Staff, Finance & Exec leadership | Monthly |
| What's Happening in Infrastructure | Infrastructure Group Conversation (internal only) | All Company | Monthly |
| Infrastructure Leadership Discussion | mStaff Weekly (internal only) | Infra VP Directs & all Managers | Tues |
| Tactical RE team coordination | Reliability Leader Team Sync (internal only) | Reliability Managers & Staff Eng | Mon & Thurs |
| Practical exercises to improve team capabilities | Firedrills (internal only) | All Infra | Weds |
| Discussions for Oncall Handover & Newsletter | Oncall Handover | Ending & Starting EOC | Tues |
This weekly meeting has been discontinued as of Aug 23, 2021. Topics in this meeting became redundant to other coordination in the Engineering Allocation, GitLab.com Standup, and Incident Review meetings.
Reliability discussions and firedrills is a purely technical meeting for Infrastructure ICs to discuss technical topics. The agenda is driven by:
Project status discussions are strictly out of bounds in this meeting. The only exception is the resolution of technical dependencies.
Open discussions are welcome, but we strongly recommend using blueprints and designs as the source of agenda items. This lets everyone gain the required context before the meeting starts, which makes for an engaging conversation.
During discussions, it is OK to point out shortcomings in a given design. This is one way in which we broaden our perspective and learn. In general, however, make a point of providing alternatives.
We hold weekly infrastructure oncall handover and staff meetings. Infrastructure Managers organize these meetings, which give SREs a weekly time to share oncall handover notes and other announcements for the team. We run these meetings from the Team Newsletter issues.
Infrastructure mStaff is an informal name for the group of people who report directly to the Vice President of Infrastructure. The group includes both managers and individual contributors. Its members are responsible for the overall direction of Infrastructure and the achievement of our goals:
The weekly mStaff meeting brings together Infrastructure's management team to prepare for the week and address issues that require attention. The meeting is organized by the VP of Infrastructure.
| General Issue Trackers | General Slack Channels | Team Slack Channels | Resources |
|---|---|---|---|
| Infrastructure issue queue | #production | #sre_observability | Production Architecture |
| Production incidents, and changes | #infrastructure-lounge | #sre_datastores | Operational Runbooks |