The Infrastructure Department enables GitLab (the company) to deliver a single DevOps application, and GitLab SaaS users to focus on generating value for their own businesses by ensuring that we operate an enterprise-grade SaaS platform.
The Infrastructure Department does this by focusing on availability, reliability, performance, and scalability efforts. These responsibilities have cost efficiency as an additional driving force, reinforced by the properly prioritized dogfooding efforts.
Many other teams also contribute to the success of the SaaS platform because GitLab.com is not a role. However, it is the responsibility of the Infrastructure Department to drive the ongoing evolution of the SaaS platform, enabled by platform observability data.
The Infrastructure Department operates a fast, secure, and reliable SaaS platform to which (and with which) everyone can contribute.
Integral part of this vision is to:
In FY22 we will work towards accomplishing more of the department vision, especially in support of continuing the successful monthly delivery of product and ongoing reliability improvements for our SaaS customers. We must also work to enable both currently proposed as well as expected needs to scale our infrastructure not only vertically with the existing GitLab.com service, but horizontally with multiple site implementations.
To progress towards the department vision, we are focusing on:
Ensure that the platform can respond to existing and new demands to support future growth, with a better user experience.
The direction is accomplished by using Objectives and Key Results (OKRs).
Initiatives driven within the Infrastructure Department, often spanning multiple quarters, are represented on the Infrastructure Department epic. This epic description also includes a stack-ranked centralized roadmap table meant to help inform relative prioritization for all material projects across Infrastructure Departments.
Other strategic initiatives to achieve this vision are driven by the needs of enterprise customers looking to adopt GitLab.com. The GitLab.com strategy catalogs top customer requests for the SaaS offering and outlines strategic initiatves across both Infrastructure and Stage Groups needed to address these gaps.
Unlike typical companies, part of the mandates of our Security, Infrastructure, and Support Departments is to contribute to the development of the GitLab Product. This follows from these concepts, many of which are also behaviors attached to our core values:
As such, everyone in the department should be familiar with, and be acting upon, the following statements:
The Infrastructure Library contains documents that outline our thinking about the problems we are solving and represents the current state for any topic, playing a significant role in how we produce technical solutions to meet the challenges we face.
Blueprints scope out our initial thinking about specific problems and issues we are working on. Designs outline the specific architecture and implementation.
The Infrastructure department uses GitLab and GitLab features extensively as the main tool for operating many environments, including GitLab.com.
We follow the same dogfooding process as part of the Engineering function, while keeping the department mission statement as the primary prioritization driver. The prioritization process is aligned to the Engineering function level prioritization process which defines where the priority of dogfooding lies with regards to other technical decisions the Infrastructure department makes.
When we consider building tools to help us operate GitLab.com, we follow the
5x rule to determine whether to build the tool as a feature in GitLab or outside of GitLab. To track Infrastructure's contributions back into the GitLab product, we tag those issues with the appropriate Dogfooding label.
At GitLab, we have a handbook first policy. It is how we communicate process changes, and how we build up a single source of truth for work that is being delivered every day.
The handbook usage page guide lists a number of general tips. Highlighting the ones that can be encountered most frequently in the Infrastructure department:
Classification of the Infrastructure department projects is described on the infrastructure department projects page.
Adding a new service involves work from a number of Infrastructure teams to make sure the service is deployed and operated safely. To help new service deployments run smoothly please open a request issue in the infrastructure issue tracker
The Infrastructure Department is comprised of three distinct groups:
Product Management duties for the Infrastructure Department are handled by the Infrastructure PM, who reports into the Enablement Stage.
For details on the Department's structure, see the Infrastructure Teams Handbook section.
GitLab is a widely distributed company, and we aim to work asynchronously most of the time. There are times, however, when we must get together to discuss topics in real time, and thus, we do have some meetings scheduled. Meetings start on time and end on time—or earlier.
Our team calendar is shared with the company.
Every scheduled team meeting must have a Google Doc agenda attached to the invite. The agendas should be long-running and organized by date. Each meeting's agenda is set and reviewed before the start of each meeting. Everyone invited to the meeting should have edit rights to add agenda items before the start of the meeting and to take notes during the meeting.
The GitLab SaaS Infrastructure meeting is one of many channels to share and distribute Infrastructure-related information relevant to the entire company. The meeting is organized by the Infrastructure leadership team and the VP of Infrastructure. This meeting aims to bring together many different related aspects which all influence the GitLab SaaS infrastructure reliability, scalability, performance, and efficiency.
This meeting is always to be recorded and made available in support of those in varying timezones. It is meant to be a learning environment for what is coming, what we've done well, and what we can do to improve. All GitLab team members are welcome to attend this meeting.
While the agenda continually evolves, the main structure is outlined below. Everyone should feel welcome to contribute to the agenda, but Infrastructure mStaff will work to curate the agenda into the best use of 50 minutes each week.
The recording of this meeting is automatically uploaded to Infrastructure Google Drive.
Reliability discussions and firedrills is a purely technical meeting for Infrastructure ICs to discuss technical topics. The agenda is driven by:
Project status discussions are strictly out of bounds in this (the only exception being the resolution of technical dependencies).
While open discussions are welcome, it is strongly recommended that blueprints and designs are used as the source of agenda items. This allows everyone gain the required context–before the meeting starts–for an engaging conversation.
During discussions, it is ok to point shortcomings for a given design. This is one way in which we expand our angle of vision and learn. In general, however, make it a point to provide alternatives.
We have weekly infrastructure oncall handover and staff meetings. These meetings are organized by Infrastructure Managers that occur as a time for SREs to have weekly handover notes for Oncall and other announcements for the team. We run these meetings from the Team Newsletter issues.
Infrastructure mStaff is a loose denomination for the group of people who report directly to the Vice President of Infrastructure. This is a group composed of both managers and individual contributors, and they are responsible for the overall direction of Infrastructure and the achievement of our goals:
|Steve Loyd||VP of Infrastructure|
|Marin Jankovski||Director of Infrastructure, Platform|
|Gerardo "Gerir" Lopez-Fernandez||Engineering Fellow, Infrastructure|
|Brent Newton||Director of Infrastructure, Reliability|
The weekly mStaff brings together Infrastructure's management team for a weekly sync to prepare for the week and address issues that require attention. The meeting is organized by the VP of Infrastructure.
|General Issue Trackers||General Slack Channels||Team Slack Channels||Resources|
|Infrastructure issue queue||#production||#sre_observability||Production Architecture|
|Production incidents, and changes||#infrastructure-lounge||#sre_datastores||Operational Runbooks|