The Infrastructure Department is responsible for the availability, reliability, performance, and scalability of all user-facing services (most notably GitLab.com, the largest production GitLab installation on the planet). Other departments and teams contribute greatly to these attributes of our service as well. In these cases, it is the responsibility of the Infrastructure Department to close the feedback loop with monitoring and metrics to drive accountability.
We are a blend of operations gearheads and software crafters whose highest priority is the protection of the integrity and reliability of the operational environments that support GitLab.com. We apply sound engineering principles, a healthy dose of operational discipline, and the selection of technologies to build mature automation that safeguards the operational environment while driving changes through it. We aim to make GitLab.com ready for mission-critical customer workloads, and we strive for excellence every day by living and breathing GitLab's values as our guiding operating principles in every decision we make and every action we take.
The Infrastructure Library contains documents that outline our thinking about the problems we are solving and represent the current state for any topic. They play a significant role in how we produce technical solutions to meet the challenges we face.
Blueprints scope out our initial thinking about specific problems and issues we are working on. Designs outline the specific architecture and implementation.
GitLab uses Objectives and Key Results (OKRs) as quarterly goals to execute our strategy and to make sure those goals are clearly defined and aligned throughout the organization. We capture the objectives in epics.
Classification of the Infrastructure Department projects is described on the Infrastructure Department projects page.
GitLab is a widely distributed company, and we aim to work asynchronously most of the time. There are times, however, when we must get together to discuss topics in real time, and thus, we do have some meetings scheduled. Meetings start on time and end on time—or earlier.
Our team calendar is shared with the company.
Every scheduled team meeting must have a Google Doc agenda attached to the invite. The agendas should be long-running and organized by date. Each meeting's agenda is set and reviewed before the start of each meeting. Everyone invited to the meeting should have edit rights to add agenda items before the start of the meeting and to take notes during the meeting.
The GitLab SaaS Infrastructure meeting is one of many channels to share and distribute Infrastructure-related information relevant to the entire company. The meeting is organized by the Infrastructure leadership team and the VP of Infrastructure. This meeting aims to bring together many different related aspects which all influence the GitLab SaaS infrastructure reliability, scalability, performance, and efficiency.
This meeting is always to be recorded and made available in support of those in varying timezones. It is meant to be a learning environment for what is coming, what we've done well, and what we can do to improve. All GitLab team members are welcome to attend this meeting.
While the agenda will continually evolve, the main structure is outlined below. Everyone should feel welcome to contribute to the agenda, but Infrastructure mStaff will work to curate the agenda into the best use of 50 minutes each week.
The recording of this meeting will be made available on GitLab Unfiltered.
Design and Automation (DNA) is a purely technical meeting for Infrastructure ICs to discuss technical topics. The agenda is driven by design documents from the library, although discussion on other technically relevant topics is welcome. Project status discussions are strictly out of bounds in this meeting (the only exception being the resolution of technical dependencies).
While open discussions are welcome, it is strongly recommended that blueprints and designs are used as the source of agenda items. This allows everyone to gain the required context before the meeting starts, which makes for an engaging conversation.
During discussions, it is OK to point out shortcomings in a given design. This is one way in which we expand our angle of vision and learn. In general, however, make it a point to provide alternatives.
Each team in Infrastructure has a weekly Staff meeting, where relevant team issues are discussed. These meetings are organized by Infrastructure Managers for their respective teams.
Infrastructure mStaff is a loose denomination for the group of people who report directly to the VP of Engineering, Infrastructure. This is a group composed of both managers and individual contributors, and they are responsible for the overall direction of Infrastructure and the achievement of our goals:
The Infrastructure mStaff Board collects issues for all managers (including the VP) and mStaff-level individual contributors (e.g., the Infrastructure Operations Analyst and Distinguished Engineer).
The weekly mStaff meeting brings together Infrastructure's management team for a sync to prepare for the week and address issues that require attention. The meeting is organized by the VP of Infrastructure.
The Infrastructure Department is composed of three teams.
For details on the Department's structure, see the Infrastructure Teams Handbook section.
Additionally, Infrastructure's mStaff is the loose denomination for the group of people who report directly to the VP of Engineering, Infrastructure, a group composed of both managers and individual contributors responsible for the overall direction of Infrastructure. The Infrastructure mStaff Board collects issues for all managers (including the VP) and mStaff-level individual contributors (e.g., the Infrastructure Operations Analyst and Distinguished Engineer).
| GitLab.com Status | Status Page |
| How to get help | How to get help |
| Incident Management | How we handle GitLab.com incidents |
| Change Management | How we manage changes to GitLab.com |
| Workflow | How may we be of service? |

| Issue Trackers | Infrastructure: Milestones, OnCall | Production: Incidents, Changes, Deltas | Delivery |
| Slack Channels | #infrastructure-lounge, #database | #alerts, #production | #g_delivery |
| Operations | Runbooks (please contribute!) | On-call: Handover Document, Reports |
| Production | SRE Onboarding | Readiness Guide | Database Reliability | On-call Handover |