The Infrastructure Department enables GitLab (the company) to deliver a single DevOps application. It also enables GitLab SaaS users to focus on generating value for their own businesses. It does both by ensuring that we operate an enterprise-grade SaaS platform.
The Infrastructure Department does this by focusing on availability, reliability, performance, and scalability efforts. Cost efficiency is an additional driving force behind these responsibilities, reinforced by properly prioritized dogfooding efforts.
Many other teams also contribute to the success of the SaaS platform because GitLab.com is not a role. However, it is the responsibility of the Infrastructure Department to drive the ongoing evolution of the SaaS platform, enabled by platform observability data.
The Infrastructure Department operates a fast, secure, and reliable SaaS platform to which (and with which) everyone can contribute.
An integral part of this vision is to:
In FY23, we will work toward added stability and resiliency to support the ongoing growth of GitLab SaaS offerings, and continue to improve the efficiency of GitLab development.
In support of improving these results, we are focusing on:
Continuing the collaborative efforts from the previous year throughout FY23 to gain additional reliability.
Contributing new platform management capabilities to expand our ability to meet our customers' SaaS needs. GitLab Dedicated has been, and will continue to be, the focus of this effort. Much of our investment in Dedicated will also fuel future platform efforts. Those efforts will let us keep offering customers our services where they want to consume them, and scale our SaaS platform to accommodate growth many times over.
We pursue this direction through Objectives and Key Results (OKRs).
Initiatives driven within the Infrastructure Department, which often span multiple quarters, are represented on the Infrastructure Department epic. The epic description also includes a stack-ranked, centralized roadmap table that helps inform the relative prioritization of all material projects across the Infrastructure Department.
Other strategic initiatives to achieve this vision are driven by the needs of enterprise customers looking to adopt GitLab.com. The GitLab.com strategy catalogs top customer requests for the SaaS offering and outlines the strategic initiatives across both Infrastructure and Stage Groups that are needed to address these gaps.
Unlike at typical companies, part of the mandate of our Security, Infrastructure, and Support Departments is to contribute to the development of the GitLab Product. This follows from these concepts, many of which are also behaviors attached to our core values:
As such, everyone in the department should be familiar with, and be acting upon, the following statements:
The Infrastructure Library contains documents that outline our thinking about the problems we are solving and represent the current state of each topic. The Library plays a significant role in how we produce technical solutions to the challenges we face.
The Infrastructure department uses GitLab and GitLab features extensively as the main tool for operating many environments, including GitLab.com.
We follow the same dogfooding process as the rest of the Engineering function, while keeping the department mission statement as the primary prioritization driver. Our prioritization is aligned with the Engineering function-level prioritization process, which defines where dogfooding ranks relative to the other technical decisions the Infrastructure department makes.
When we consider building tools to help us operate GitLab.com, we follow the 5x rule to determine whether to build the tool as a feature in GitLab or outside of GitLab. To track Infrastructure's contributions back into the GitLab product, we tag those issues with the appropriate Dogfooding label.
At GitLab, we have a handbook first policy. It is how we communicate process changes, and how we build up a single source of truth for work that is being delivered every day.
The handbook usage page lists a number of general tips. Those most frequently encountered in the Infrastructure department are:
Classification of the Infrastructure department projects is described on the infrastructure department projects page.
The infrastructure issue tracker is the backlog and catch-all project for the infrastructure teams. It tracks the work our teams are doing that is unrelated to an ongoing change or incident.
In addition to the backlog, Infrastructure Department projects are captured in our Infrastructure Department Epic as well as in our Quarterly Objectives & Key Results.
The Infrastructure Department consists of three distinct groups:
To create an issue for the related teams you can use the links below:
Product Management duties for the Infrastructure Department are handled by the Infrastructure PM, who reports into the Enablement Stage.
For details on the Department's structure, see the Infrastructure Teams Handbook section.
Infrastructure Reliability Engineering Squads may be aligned with stage groups as stable counterparts.
Stable Counterparts are used as a framework for managing reliable services at GitLab. The framework provides guidelines for collaboration between Stage Groups and Infrastructure Teams.
GitLab is a widely distributed company, and we aim to work asynchronously most of the time. However, some topics deserve a real-time discussion. We should regularly re-evaluate such meetings to ensure they continue to add value. We follow all the guidance for all-remote meetings, including always starting and ending on time, or earlier.
Our team calendar is shared with the company.
Every scheduled team meeting must have a Google Doc agenda attached to the invite. The agendas should be long-running and organized by date. Each meeting's agenda is set and reviewed before the start of each meeting. Everyone invited to the meeting should have edit rights to add agenda items before the start of the meeting and to take notes during the meeting.
A meeting may cover multiple topics, but each topic should be covered by only one meeting. This prevents conflicting information and inefficient duplication. Some meetings are deliberately scheduled twice to better include participants across time zones; these count as the same meeting, held at two times.
Topics | Meeting | Participants | Cadence |
---|---|---|---|
Incident Review and followup | Incident Review (internal only) | All Engineering | Tues |
Prioritization of Engineering work | Engineering Allocation (internal only) | All Engineering | Tues |
Infrastructure Performance Indicator Review | Infrastructure Key Meeting (internal only) | Eng VP Staff, Finance & Exec leadership | Monthly |
What's Happening in Infrastructure | Infrastructure Group Conversation (internal only) | All Company | Monthly |
Infrastructure Leadership Discussion | mStaff Weekly (internal only) | Infra VP Directs & all Managers | Tues |
Tactical RE team coordination | Reliability Leader Team Sync (internal only) | Reliability Managers & Staff Eng | Mon & Thurs |
Practical exercises to improve team capabilities | Firedrills (internal only) | All Infra | Weds |
Discussions for Oncall Handover & Newsletter | Oncall Handover | Ending & Starting EOC | Tues |
Key Review meetings provide the Infrastructure leadership the opportunity to inform the executive team of our performance indicator progress, results on OKRs, and updates on any Cross-functional Key Initiatives which we are leading.
Infrastructure Key Review meetings are facilitated and led by VP of Infrastructure, Director of Infrastructure Platform, and Director of Reliability Engineering.
Group Conversation meetings take the information from the Key Review (plus any additional topics) and present it to all of GitLab. For Infrastructure, this meeting is a public livestream.
Coordination of Infrastructure Group Conversation materials and facilitation of the discussion is a rotating role among the managers within Infrastructure.
FY23 Infrastructure Group Conversation DRI Schedule (internal only)
Quick checklist for the host (time order):
GC Date | DRI |
---|---|
April 6 | Amy P |
May 18 | Dave S |
July 5 | Kenn W |
August 11 | Anna Liisa M |
September 22 | Rachel N |
November 8 | Liam M |
December 20 | Michele B |
TBD | Anthony F |
This weekly meeting was discontinued on Aug 23, 2021. Its topics had become redundant with coordination in the Engineering Allocation, GitLab.com Standup, and Incident Review meetings.
This daily meeting was discontinued on Oct 17, 2022. The improved maturity of other processes, together with overall improvements in reliability and security results, reduced the need for this session. Synchronous follow-up discussions may still be needed for high-severity incidents; these should be scheduled as part of the overall incident management process.
Reliability discussions and firedrills is a purely technical meeting for Infrastructure ICs to discuss technical topics. The agenda is driven by:
Project status discussions are strictly out of bounds in this meeting, with the sole exception of resolving technical dependencies.
While open discussions are welcome, we strongly recommend using written designs as the source of agenda items. This allows everyone to gain the required context before the meeting starts, which makes for a more engaging conversation.
During discussions, it is OK to point out shortcomings in a given design. This is one way in which we broaden our perspective and learn. In general, however, make it a point to provide alternatives.
We have weekly infrastructure on-call handover and staff meetings. These meetings, organized by Infrastructure Managers, give SREs a weekly slot for on-call handover notes and other team announcements. We run these meetings from the Team Newsletter issues.
Infrastructure mStaff is an informal name for the group of people who report directly to the Vice President of Infrastructure. The group includes both managers and individual contributors, who are responsible for the overall direction of Infrastructure and the achievement of our goals:
The weekly mStaff brings together Infrastructure's management team for a weekly sync to prepare for the week and address issues that require attention. The meeting is organized by the VP of Infrastructure.
General Issue Trackers | General Slack Channels | Team Slack Channels | Resources |
---|---|---|---|
Infrastructure issue queue | #production | #g_delivery | Production Architecture |
Production incidents and changes | #infrastructure-lounge | #g_scalability | Operational Runbooks |
Delivery | #incident-management | | Environments |
Scalability | #announcements | | Monitoring |
| | #feed_alerts-general | | Readiness Reviews |
| | | | Infrastructure Standards |