The Infrastructure Department is responsible for the availability, reliability, performance, and scalability of GitLab.com and other supporting services. Cost efficiency is an additional driving force behind these responsibilities, reinforced by properly prioritized dogfooding efforts. Many other departments and teams also contribute to the success of our service. It is the responsibility of the Infrastructure Department to drive ongoing accountability by closing the feedback loop with monitoring and metrics.
We are a blend of operations gearheads and software crafters whose highest priority is protecting the integrity and reliability of the operational environments that support GitLab.com. We apply sound engineering principles, a healthy dose of operational discipline, and careful technology choices to build mature automation that safeguards the operational environment while driving changes through it. We aim to make GitLab.com ready for mission-critical customer workloads, and we strive for excellence every day by living and breathing GitLab's values as our guiding operating principles in every decision we make and every action we take.
The Infrastructure Department works toward the vision of an enterprise-grade GitLab SaaS platform in several ways: writing up designs, setting quarterly OKRs, and driving projects that span multiple quarters.
GitLab uses Objectives and Key Results (OKRs) as quarterly goals to execute our strategy and to make sure those goals are clearly defined and aligned throughout the organization. We capture the objectives in issues.
Initiatives driven within the Infrastructure Department, often spanning multiple quarters, are represented on the Infrastructure Department epic.
One of the strategic initiatives driving us toward this vision, in line with the company's direction strategy, is running GitLab.com on the Kubernetes platform.
Other strategic initiatives toward this vision are driven by the needs of enterprise customers looking to adopt GitLab.com. The GitLab.com strategy catalogs the top customer requests for the SaaS offering and outlines the strategic initiatives, across both Infrastructure and Stage Groups, needed to close these gaps.
The strategy section will be expanded with more detail in the future.
The Infrastructure Library contains documents that outline our thinking about the problems we are solving and represent the current state of each topic. The library plays a significant role in how we produce technical solutions to meet the challenges we face.
Blueprints scope out our initial thinking about specific problems and issues we are working on. Designs outline the specific architecture and implementation.
The Infrastructure department uses GitLab and GitLab features extensively as the main tool for operating many environments, including GitLab.com.
We follow the same dogfooding process as the rest of the Engineering function, while keeping the department mission statement as the primary prioritization driver. Our prioritization process is aligned with the Engineering function's prioritization process, which defines where dogfooding ranks relative to the other technical decisions the Infrastructure Department makes.
When we consider building tools to help us operate GitLab.com, we follow the 5x rule to determine whether to build the tool as a feature in GitLab or outside of GitLab. To track Infrastructure's contributions back into the GitLab product, we tag those issues with the appropriate Dogfooding label.
At GitLab, we have a handbook first policy. It is how we communicate process changes, and how we build up a single source of truth for work that is being delivered every day.
The handbook usage page lists a number of general tips. The ones most frequently encountered in the Infrastructure Department are:
The classification of Infrastructure Department projects is described on the infrastructure department projects page.
The infrastructure issue tracker is the backlog and catch-all project for the infrastructure teams. It tracks the work our teams are doing that is unrelated to an ongoing change or incident.
The Infrastructure Department comprises three distinct groups:
Product Management duties for the Infrastructure Department are handled by the Infrastructure PM, who reports into the Enablement Stage.
For details on the Department's structure, see the Infrastructure Teams Handbook section.
GitLab is a widely distributed company, and we aim to work asynchronously most of the time. There are times, however, when we must get together to discuss topics in real time, so we do have some scheduled meetings. Meetings start on time and end on time, or earlier.
Our team calendar is shared with the company.
Every scheduled team meeting must have a Google Doc agenda attached to the invite. The agendas should be long-running and organized by date. Each meeting's agenda is set and reviewed before the start of each meeting. Everyone invited to the meeting should have edit rights to add agenda items before the start of the meeting and to take notes during the meeting.
The GitLab SaaS Infrastructure meeting is one of many channels for sharing Infrastructure-related information that is relevant to the entire company. The meeting is organized by the Infrastructure leadership team and the VP of Infrastructure. It brings together the many related aspects that influence the reliability, scalability, performance, and efficiency of the GitLab SaaS infrastructure.
This meeting is always recorded and made available to support team members in other time zones. It is meant to be a learning environment for what is coming, what we've done well, and what we can do to improve. All GitLab team members are welcome to attend.
While the agenda will continually evolve, the main structure is outlined below. Everyone should feel welcome to contribute to the agenda, but Infrastructure mStaff will work to curate the agenda into the best use of 50 minutes each week.
The recording of this meeting will be made available on GitLab Unfiltered.
Design and Automation (DNA) is a purely technical meeting for Infrastructure ICs to discuss technical topics. The agenda is driven by design documents from the library, although discussion of other technically relevant topics is welcome. Project status discussions are strictly out of bounds in this meeting, with the sole exception of resolving technical dependencies.
While open discussions are welcome, it is strongly recommended that blueprints and designs are used as the source of agenda items. This allows everyone to gain the required context before the meeting starts, which makes for a more engaging conversation.
During discussions, it is OK to point out shortcomings in a given design. This is one way in which we broaden our perspective and learn. In general, however, make a point of providing alternatives.
Each team in Infrastructure has a weekly Staff meeting, where relevant team issues are discussed. These meetings are organized by Infrastructure Managers for their respective teams.
Infrastructure mStaff is an informal name for the group of people who report directly to the Vice President of Infrastructure. This group is composed of both managers and individual contributors, who are responsible for the overall direction of Infrastructure and the achievement of our goals:
Person | Role |
---|---|
Steve Loyd | VP of Infrastructure |
Marin Jankovski | Senior Engineering Manager, Infrastructure, Delivery & Scalability |
Gerardo "Gerir" Lopez-Fernandez | Engineering Fellow, Infrastructure |
Davis Townsend | Data Analyst, Infrastructure |
Brent Newton | Director of Infrastructure, Reliability |
The weekly mStaff meeting brings together Infrastructure's management team to prepare for the week and address issues that require attention. The meeting is organized by the VP of Infrastructure.
| General Issue Trackers | General Slack Channels | Team Slack Channels | Resources |
|---|---|---|---|
| Infrastructure issue queue | #production | #sre_observability | Production Architecture |
| Production incidents and changes | #infrastructure-lounge | #sre_datastores | Operational Runbooks |
| Delivery | #incident-management | #sre_coreinfra | Environments |
| Scalability | #announcements | #g_delivery | Monitoring |
| | #feed_alerts-general | #g_scalability | Readiness Reviews |
| | | | Infrastructure Standards |