GitLab has a Three-Year Strategy, and we're excited to see every member of the Engineering division contribute to achieving it. Whether you're creating something new or improving something that already exists, we want you to feel empowered to bring your best ideas for influencing the product direction through improved scalability, usability, resilience, and system architectures. And when you feel like you need to expand your knowledge in a particular area, know that you're supported in having the resources to learn and improve your skills.
Our focus in FY24 is to make sure that GitLab is enterprise grade across all of its capabilities and to support the AI efforts required to successfully launch Code Suggestions and GitLab Duo to General Availability.
Making sure that GitLab is enterprise grade involves several teams collaborating on improving our disaster recovery and support offerings through ongoing work with GitLab Dedicated and Cells infrastructure. Our goal here is improved availability and service recovery.
Engineering culture at GitLab encompasses the processes, workflows, principles, and priorities that all stem from our GitLab Values. All of these continuously strengthen our engineering craftsmanship and allow engineers to achieve engineering excellence, while growing and having a significant, positive impact on the product, the people, and the company as a whole. Our engineering culture is primarily carried, and evolves, through knowledge sharing and collaboration. Everyone can be part of this process because at GitLab everyone can contribute.
Engineering excellence can be defined as an intrinsic motivation to improve engineering efficiency and software quality, and to deliver better results while building software products. Engineering excellence is fueled by a strong engineering culture combined with a mission: to build better software that allows everyone to contribute.
Engineering is the primary advocate for the performance, availability, and security of the GitLab project. Product Management prioritizes 80% of engineering time, so everyone in the engineering function should participate in the Product Management prioritization process to ensure that our project stays ahead in these areas. Engineering prioritizes 20% of time on initiatives that improve the underlying platform and foundational technologies we use.
Support Team Contributions
Support team contributions are tracked with a dedicated label, which you can use to filter on open MRs. We have a 3-year goal of reaching 1,000 monthly contributors as a way to mature new stages, add customer-desired features that aren't on our roadmap, and even translate our product into multiple languages.
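Filtering open MRs by label can also be done programmatically. The sketch below builds a GitLab REST API query URL for that purpose; the label name used here is an illustrative assumption, not the actual label referenced above.

```python
# Minimal sketch: build a GitLab REST API URL that lists open merge
# requests carrying a given label. The label name is an assumption.
from urllib.parse import urlencode

def open_mrs_by_label_url(base_url, label, state="opened"):
    """Build a GitLab API URL listing merge requests filtered by label."""
    query = urlencode({"labels": label, "state": state, "scope": "all"})
    return f"{base_url}/api/v4/merge_requests?{query}"

url = open_mrs_by_label_url("https://gitlab.com", "Community contribution")
print(url)
```

The same query parameters work in the browser UI's search as well as against the `/merge_requests` API endpoint.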
Diverse teams perform better. They provide a sense of belonging that leads to higher levels of trust, better decision making, and a larger talent pool. They also focus more on facts, process facts more carefully, and are more innovative. By hiring globally and increasing the numbers of women and ethnic minorities in the Engineering division, we’re helping everyone bring their best selves to work.
Hiring is still a top priority in FY24, and we're excited to continue hiring people who are passionate about our product and have the skills to make it the best DevSecOps tool in the market. Our current focus areas include reducing the amount of time between offer and start dates and hiring a diverse team (see above). We're also implementing industry-standard approaches like structured, behavioral, and situational interviewing to help ensure a consistent interview process that helps to identify the best candidate for every role. We’re excited to have a recruiting org to partner with as we balance the time that managers spend recruiting against the time they spend investing in their current team members.
As expected, a large part of our focus in FY24 is on improving our product.
For Enterprise customers, we’re refining our product to meet the levels of security and reliability that customers rightfully demand from SaaS platforms (SaaS Reliability). We’re also providing more robust utilization metrics to help them discover features relevant to their own DevOps transformations (Usage Reporting) and offering the ability to purchase and manage licenses without spending time contacting Sales or Support (E-Commerce and Cloud Licensing). Lastly, in response to Enterprise customer requests, we're adding features to support Suggested Reviewers, better portfolio management through Work Items, and Audit Events that provide additional visibility into passive user actions.
For Free Users, we’re becoming more efficient with our open core offering, so that we can continue to support and give back to students, startups, educational institutions, open source projects, GitLab contributors, and nonprofits.
For Federal Agencies, we’re obtaining FedRAMP certification to strengthen confidence in the security standards required on our SaaS offering. This is a mandated prerequisite for United States federal agencies to use our product.
For Hosted Customers, we're supporting feature parity between Self-Managed and GitLab Hosted environments through the Workspace initiative. We're also launching GitLab Dedicated for customers who want the flexibility of cloud with the security and performance of a single-tenant environment.
For customers using CI/CD, we're expanding the available types of Runners to include macOS, Linux/Docker, and Windows, and we're autoscaling build agents.
There are four departments within the Engineering Division:
See the Cross-Functional Prioritization page for more information.
To maintain high availability, Engineering runs a weekly SaaS Availability standup to:
Each week the Infrastructure team reports on incidents and key metrics. Updating these items at the top of the Engineering Allocation Meeting Agenda is the responsibility of the Engineering Manager for the General Squad in Reliability.
For the core and expansion development departments, updates on the current status of:
Groups under Feature Change Locks should update progress synchronously or asynchronously in the weekly agenda. The intention of this meeting is to communicate progress and to evaluate and prioritize escalations from infrastructure.
Feature Change Locks progress reports should appear in the following format in the weekly agenda:
FCL xxxx - [team name]
- <issue link>
- <issue link>
- <issue link>
- <link to Timeline tab of the Incident issue>
A Feature Change Lock (FCL) is a process to improve the reliability and availability of GitLab.com. We will enact an FCL anytime there is an S1 or public-facing (status page) S2 incident on GitLab.com (including the License App, CustomersDot, and Versions) determined to be caused by an engineering department change. The team involved should be determined by the author, their line manager, and that manager's other direct reports.
If the incident meets the above criteria, then the manager of the team is responsible for:
If the team believes there does not need to be an FCL, approval must be obtained from either the VP of Infrastructure or VP of Development.
Direct reports involved in an active borrow should be included if they were involved in the authorship or review of the change.
The purpose is to foster a sense of ownership and accountability amongst our teams, but this should not challenge our no-blame culture.
Rough guidance on timeline is provided here to set expectations and urgency for an FCL. We want to balance moving urgently with doing thoughtful important work to improve reliability. Note that as times shift we can adjust accordingly. The DRI of an FCL should pull in the timeline where possible.
The following bulleted list provides a suggested timeline starting from incident to completion of the FCL. "Business day x" in this case refers to the x business day after the incident.
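The "business day x" convention above can be made concrete with a small helper. This is a sketch under the assumption that business days are Monday through Friday; it does not account for holidays.

```python
# Minimal sketch: compute "business day x" after an incident date,
# where business days skip Saturdays and Sundays (holidays ignored).
from datetime import date, timedelta

def business_day_after(incident: date, x: int) -> date:
    """Return the x-th business day after the incident date."""
    day = incident
    remaining = x
    while remaining > 0:
        day += timedelta(days=1)
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            remaining -= 1
    return day

# An incident on Friday 2024-03-01: business day 1 is the following Monday.
print(business_day_after(date(2024, 3, 1), 1))  # 2024-03-04
```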
During the FCL, the team's (or teams') exclusive focus is reliability work, and any in-flight feature work must be paused or reassigned. Maintainer duties can still be performed during this period and should keep other teams moving forward. Explicitly higher-priority work, such as security and data-loss prevention, should continue as well. The team(s) must:
- Create a Slack channel named #fcl-incident-[number], with members from the involved team(s).
- Create a tracking issue titled [Group Name] FCL for Incident ####.
- Hold a closing ceremony upon completing the FCL to review the retrospectives and celebrate the learnings.
After the Incident Review is completed, the team(s) focus on preventing similar problems from recurring and on improving detection. This should include, but is not limited to:
Examples of this work include, but are not limited to:
Any work kicked off for the specific team during this period must be completed, even if it takes longer than the duration of the FCL. Likewise, any work directly related to the incident should be kicked off and completed even after the FCL is over. Work paused due to the FCL should be the first priority to resume once the FCL ends. Items created for other teams or at a global level don't affect the end of the FCL.
A stable counterpart from Infrastructure will be available to review and consult on the work plan for Development Department FCLs. Infrastructure FCLs will be evaluated by an Infrastructure Director.
The Quality Department is the DRI for Engineering Performance Indicators. Work regarding KPI / RPI is tracked on the engineering metrics board and task process.
We manually verify that our code works as expected. Automated test coverage is essential, but manual verification provides a higher level of confidence that features behave as intended and bugs are fixed.
We manually verify issues when they are in the workflow::verification state.
Generally, after you have manually verified something, you can close the associated issue.
See the Product Development Flow to learn more about this issue state.
We manually verify in the staging environment whenever possible. In certain cases we may need to manually verify in the production environment.
If you need to test features that are built for GitLab Ultimate, then you can get added to the issue-reproduce group on production and staging environments by asking in the #development Slack channel. These groups are on an Ultimate plan.
We follow the process below when an existing critical customer escalation requires immediate scheduling of bug fixes or development effort.
~"priority::1"
regardless of severity~"critical-customer-escalation"
is applied to the issueThe DRI can use the customer critical merge requests process to expedite code review & merge.
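Labels like these can also be applied through the GitLab issues API (`PUT /projects/:id/issues/:issue_iid` with the `add_labels` parameter). The sketch below only constructs the request; the project ID and issue IID are illustrative assumptions.

```python
# Minimal sketch: build the request for adding labels to an issue via
# the GitLab issues API. Project ID and issue IID are assumptions.
def add_labels_request(base_url, project_id, issue_iid, labels):
    """Return the (method, url, params) tuple for adding labels to an issue."""
    url = f"{base_url}/api/v4/projects/{project_id}/issues/{issue_iid}"
    return ("PUT", url, {"add_labels": ",".join(labels)})

method, url, params = add_labels_request(
    "https://gitlab.com", 278964, 1234,
    ["priority::1", "critical-customer-escalation"],
)
print(method, url, params)
```

In practice the same effect is usually achieved with the /label quick action in an issue comment.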
In most cases, a single engineer and a maintainer review are adequate to handle a priority::1/severity::1 issue. However, some issues are highly difficult or complicated. Engineers should treat these issues with a high sense of urgency. For a complicated priority::1/severity::1 issue, multiple engineers should be assigned based on the level of complexity. The issue description should include each team member and their responsibilities.
| Team Member | Responsibility |
|---|---|
| Team Member 1 | Reproduce the problem |
| Team Member 2 | Audit the code base for other places where this may occur |
If a case requires three, five, or more people, Engineering Managers should feel the freedom to execute on a plan quickly.
Following this procedure will:
Information on canary testing has been moved to a dedicated page covering the canary stage and how to use it.
There are some engineering handbook topics that we cannot be publicly transparent about. These topics can be viewed by GitLab team members in the engineering section of the private handbook.
If you experience a page not found (404) error when attempting to access the internal handbook, you may need to register to use it by first browsing to the internal handbook authorization page.