The mission of the Reliability:Foundations team at GitLab is to Build, Run and Own the entire lifecycle of the core infrastructure for GitLab.com.
The team is focused on owning the reliability, scalability, and security of the existing core infrastructure. We seek to reduce the effort required to provide our core infrastructure services, and to enable other teams to self-serve core infrastructure so that they can run their services for GitLab.com more efficiently and effectively.
In order to enable Infrastructure, Development & Product teams to build their services for GitLab.com and fulfill their respective missions, we work to make the consumption of our services as simple as possible.
We seek to build our services on top of GitLab features and, where they are the right choice, to use cloud-vendor managed products to reduce complexity, improve efficiency, and deliver new capabilities more quickly.
While the team does not explicitly have any product responsibilities, we endeavor to contribute the lessons we learn from running at-scale production systems back to the product teams, and we advocate for GitLab features that would allow us to dogfood the product.
The Foundations team supports the rest of Infrastructure and Development by providing the resources that other teams build upon. We do so by working collaboratively and iteratively, and by striking the right balance between delivering results quickly and delivering them safely.
The services that the Foundations team is responsible for fall into two general categories: Core and Edge.
### Core

| Service | Description | Co-Ownership? |
|---|---|---|
| K8s | Kubernetes workload deployments, cluster add-ons | Autodeploy remains with Delivery, and anything Delivery-related is co-owned with Delivery |
| Config | Terraform, Chef, image builds | The core Terraform repos are owned by Foundations, while specific modules may be maintained by the teams that use them |
| Service discovery | Consul | |
| Secrets Management | Vault | Vault is offered as a service to enable teams to manage their own secrets |
| Ops | Ops.gitlab.net, Ops Runners | |
### Edge

| Service | Description | Co-Ownership? |
|---|---|---|
| CDN | Cloudflare | |
| DNS | AWS Route 53, Cloudflare | |
| Load Balancing | HAProxy, Ingress | |
| Networking | Cloud VPCs, Cloudflare | |
| Rate Limiting | Rate limiting | Shared ownership with development teams for specific endpoints, and with abuse |
| RBAC/IAM | Teleport, GCP IAM permissions and project creation | |
Given the nature of this team's scope, several services the Foundations team works with are either co-owned by other teams or directly impact their work, as noted above.
Our primary customers are other teams in the Infrastructure department. Our services particularly overlap with, and have an impact on, the Delivery teams and Reliability::General. Outside of Infrastructure, we collaborate regularly with Support and with various teams in the Security organization.
We've adopted a version of the SPACE framework for Performance Indicators.
For more context, see the related discussion issue.
The Foundations team must maintain a broad and diverse set of technical skills and be able to switch contexts frequently. These skills include:

- Cloud-native Engineering - Proficiency in Kubernetes and the associated ecosystem for running cloud-native services.
- Infrastructure as Code - Proficiency in Chef and Terraform.
- Network Systems - Understanding of network concepts and experience with our Edge stack (see Edge services above).

### Values
In addition to striving to embrace GitLab's values, the Foundations team seeks to embody the following: