GitLab has a Three-Year Strategy, and we're excited to see every member of the Engineering division contribute to achieving it. Whether you're creating something new or improving something that already exists, we want you to feel empowered to bring your best ideas for influencing the product direction through improved scalability, usability, resilience, and system architectures. And when you feel like you need to expand your knowledge in a particular area, know that you're supported in having the resources to learn and improve your skills.
Having a product that people enjoy using is critical to our company’s success. Many users rely on GitLab every single workday to create their own great products. When we offer experiences that make their jobs faster and easier, we support them in building success in their own careers—a responsibility that we take seriously. And when our users are successful, we’re more successful, too, as evidenced by a McKinsey study showing 32% higher revenue growth in a 5-year period for companies that focus on great design.
That’s why part of our FY23 focus is to raise our System Usability Scale score through efforts like burning down significant (Sev 1 and Sev 2) usability issues, addressing Usability Benchmark insights, and migrating our UI to single-source-of-truth Pajamas components for better consistency, accessibility, and general usability.
We have a 3-year goal of reaching 1,000 monthly contributors as a way to mature new stages, add customer-desired features that aren’t on our roadmap, and even translate our product into multiple languages. To build towards that goal, in FY23, we’re enhancing our Contributor Education to help support high-quality contributions that align with our approach to product development. We’re also improving our response time for contributions and providing timely feedback for a better contributor experience. Lastly, we’re focused on a step-change increase of contributors by creating a Leading Organizations program tailored for full-time contributors.
Diverse teams perform better. They provide a sense of belonging that leads to higher levels of trust, better decision making, and a larger talent pool. They also focus more on facts, process facts more carefully, and are more innovative. By hiring globally and increasing the numbers of women and ethnic minorities in the Engineering division, we’re helping everyone bring their best selves to work.
Hiring is still a top priority in FY23, and we're excited to continue hiring people who are passionate about our product and have the skills to make it the best DevSecOps tool in the market. Our current focus areas include reducing the amount of time between offer and start dates and hiring a diverse team (see above). We're also implementing industry-standard approaches like structured, behavioral, and situational interviewing to help ensure a consistent interview process that helps to identify the best candidate for every role. We’re excited to have a recruiting org to partner with as we balance the time that managers spend recruiting against the time they spend investing in their current team members.
As expected, a large part of our focus in FY23 is on improving our product. Following are some large projects to note.
For Enterprise customers, we’re refining our product to meet the levels of security and reliability that customers rightfully demand from SaaS platforms (SaaS Reliability). We’re also providing more robust utilization metrics to help them discover features relevant to their own DevOps transformations (Usage Reporting) and offering the ability to purchase and manage licenses without spending time contacting Sales or Support (E-Commerce and Cloud Licensing). Lastly, in response to Enterprise customer requests, we're adding features to support Suggested Reviewers, better portfolio management through Work Items, and Audit Events that provide additional visibility into user passive actions.
For Free Users, we’re becoming more efficient with our open core offering, so that we can continue to support and give back to students, startups, educational institutions, open source projects, GitLab contributors, and nonprofits (SaaS Free User Efficiency).
For Federal Agencies, we’re obtaining FedRAMP certification to strengthen confidence in the security standards required on our SaaS offering. This is a mandated prerequisite for United States federal agencies to use our product.
For Hosted Customers, we're supporting feature parity between Self-Managed and GitLab Hosted environments through the Workspace initiative. We're also launching GitLab Dedicated for customers who want the flexibility of cloud with the security and performance of a single-tenant environment.
For customers using CI/CD, we're expanding the available types of Runners to include macOS, Linux/Docker, and Windows, and we're autoscaling build agents.
In alphabetical order:
GitLab Engineering values clear, concise, transparent, asynchronous, and frequent communication. Here are our most important modes of communication:
As part of a fully-distributed organization such as GitLab, it is important to stay informed about engineering-led initiatives. We employ multimodal communication, which describes the minimum set of communication channels we'll broadcast to.
For the Engineering department, any important initiative will be announced in:
If you frequently check any of these channels, you can consider yourself informed. It is up to the person sharing to ensure that the same message is shared across all channels. Ideally, this message should be a one sentence summary with a link to an issue to allow for a single source of truth for any feedback.
The #engineering-fyi is used for large scale announcements and to drive views of the engineering week in review document. Everyone can contribute an announcement as long as the following criteria are met:
The posting model here is one of trusting judgement of the individual making the announcement. You do not need to ask for permission to post.
There are seven departments within the Engineering Division:
While we have moved to the cross-functional prioritization process to empower teams to determine the optimal balance of all types of issues, we will keep Engineering Allocations as a way to allow teams to quickly shift to a critical priority, designating the EM as the DRI to drive the effort.
Engineering is the DRI for mid/long term team efficiency, performance, security (incident response and anti-abuse capabilities), availability, and scalability. The expertise to proactively identify and iterate on these is squarely in the Engineering team. Whereas Product can support in performance issues as identified from customers. In some ways these efforts can be viewed as risk-mitigation or revenue protection. They also have the characteristic of being larger than one group at the stage level. Development would like to conduct an experiment to focus on initiatives that should help the organization scale appropriately in the long term. We are treating these as a percent investment of time associated with a stage or category. The percent of investment time can be viewed as a prioritization budget outside normal Product/Development assignments.
Engineering Allocation is also used in short-term situations in conjunction and in support of maintaining acceptable Error Budgets for GitLab.com and our GitLab-hosted first theme.
Unless it is listed in this table, the Engineering Allocation for a stage/group is 0% and we are following normal prioritization. Refer to this page for Engineering Allocation charting efforts. Some stage/groups may be allocated at a high percentage or 100%, typically indicating a situation where all available effort is to be focused on Reliability related (top 5 priorities from prioritization table) work.
During an Engineering Allocation, the EM is responsible for recognizing the problem, creating a satisfactory goal with clear success criteria, developing a plan, executing on a plan and reporting status. It is recommended that the EM collaborate with PMs in all phases of this effort as we want PMs to feel ownership for these challenges. This could include considering adding more/less allocation, setting the goals to be more aspirational, reviewing metrics/results, etc. We welcome strong partnerships in this area because we are one team even when allocations are need to resolving issues critical to our business.
During periods of Engineering Allocation, the PM remains the interface between the group and the fields teams & customers. This is important because:
|Group/Stage||Description of Goal||Justification||Maximum % of headcount budget||People||Supporting information||EMs / DRI||PMs|
|Manage:Workspace (BE)||Scalability of GitLab hierarchy functionality (Workspace)||Reduce duplication of code and increase performance for Groups/Projects||33%||2||Consolidate Groups and Projects||@mksionek||@lohrc|
|Govern:Threat Insights||Bring error budget back to green||Work on backlog of reliability and security issues||40%||5||&5629, &4239, &2791||@thiagocsf||@matt_wilson|
|Antiabuse||Improve PVS||Avoid production outages||33% as of Aug 19, 2022||1 fixed||PVS standup||@jayswain||@jstava|
Each allocation has a direction page maintained by the Engineering Manager. The Engineering Manager will provide regular updates to the direction page. Steps to add a direction page are:
index.html.mdin the newly created directory
To see an example for an Engineering Allocation Direction page, see Continuous Integration Scaling. Once the Engineering Allocation is complete, delete the direction page.
Groups allocating effort to an engineering allocation should update progress synchronously or asynchronously in the weekly, cross-functional infradev and engineering allocation meeting [agenda (internal)]. The intention of this meeting is to communicate progress on engineering allocations and to evaluate and prioritise escalations from infrastructure.
Engineering Allocation progress reports should appear in the following format:
One of the most frequent questions we get as part of this experiment is "How does a problem get put on the Engineering Allocation list?". The short answer is someone makes a suggestion and we add it. Much like everyone can contribute, we would like the feedback loop for improvement and long terms goals to be robust. So everyone should feel the empowerment to suggest an item at any time.
To help with getting items that on the list for consideration, we will be performing a survey periodically. The survey will consist of the following questions:
We will keep the list of questions short to solicit the most input. The survey will go out to members of the Development, Quality, Security. After we get the results, we will consider items for potential adding as an Engineering Allocation.
Once the item's success criteria are achieved, the Engineering Manager should consult with counterparts to review whether the improvements are sustainable. Where appropriate, we should consider adding monitoring and alerting to any areas of concern that will allow us to make proactive prioritizations in future should the need arise. The Engineering Manager should close all related epics/issues, reset the allocation in the above table to the floor level, and inform the Product Manager when the allocated capacity will be available to return their focus to product prioritizations.
When reseting a groups Engineering Allocation in the table above, the goal should be set as
floor %, the goal should be
empower every SWEs from raising reliability and security issues, percentage of headcount allocated should be
N/A in place of a link to the Epic.
All engineering allocation closures should be reviewed and approved by the VP of Development.
See the Cross-Functional Prioritization page for more information.
We will enact a localized feature change lock (FCL) anytime there is an S1 or public-facing (status page) S2 incident on GitLab.com (including the License App, CustomersDot, and Versions) determined to be caused by a change from the development department. The team involved should be determined by the author, their line manager, and that manager's other direct reports.
Direct reports involved in an active borrow should be included if they were involved in the authorship or review of the change.
An FCL assignment and creation must be approved by either the VP of Infrastructure or VP of Development. The purpose is to foster a sense of ownership and accountability amongst our teams, but this should not challenge our no-blame culture.
Rough guidance on timeline is provided here to set expectations and urgency for an FCL. We want to balance moving urgently with doing thoughtful important work to improve reliaiblity. Note that as times shift we can adjust accordingly. The DRI of an FCL should pull in the timeline where possible.
The following bulleted list provides a suggested timeline starting from incident to completion of the FCL. "Business day x" in this case refers to the x business day after the incident.
The approver of the FCL will add the item to the daily reliability standup and will tag the development director so they can assign as appropriate.
During the FCL, the team(s) exclusive focus is around reliability work, and any feature type of work in-flight has to be paused or re-assigned. Maintainer duties can still be done during this period and should keep other teams moving forward. Explicitly higher priority work such as security and data loss prevention should continue as well. The team(s) must:
#fcl-incident-[number], with members
closing ceremonyupon completing the FCL to review the retrospectives and celebrate the learnings.
After the Incident Review is completed, the team(s) focus is on preventing similar problems from recurring and improving detection. This should include, but is not limited to:
Examples of this work include, but are not limited to:
Any work for the specific team kicked off during this period must be completed, even if it takes longer than the duration of the FCL. Any work directly related to the incident should be kicked off and completed even if the FCL is over. Work paused due to the FCL should be the priority to resume after the FCL is over. Items created for other teams or on a global level don't affect the end of the FCL.
The stable counterpart from Infrastructure will be available to review and consult on the work plan.
Team members are welcome to run Folding@home on their company provided computers. Folding@home is a distributed computing network that is searching for therapies for the COVID-19 respiratory illness among other diseases. We recommend running it at night if you have high daily compute workloads. Also keep your computer plugged in. We considered potential security and hardware implications in this issue.
If you would like to join a team with other GitLab team members, there is a
GitLab Team Members team for Folding@home. When setting up or changing your Folding@home identity, you can add team
245256. This is not a competition, but simply to track how much our team members have contributed overall. You can view our statistics on our team page. You can discuss with other GitLab team members in the #folding-at-home slack channel.
Please reference our internal hiring repository for internal best practices and guidelines.
Whenever a team member departs from GitLab or they transfer to a different role, the below process should be followed to open a backfill. This process ensures alignment between the Department Heads, Finance business partner and Talent Acquisition. For departures, a backfill can only be opened once a departure or resignation is official where we've received written confirmation of the departure including the last working day and the People Business Partner has submitted the Offboarding Form to the People Ops team. For transfers, a backfill can only be opened once a transfer is official where an offer letter stating the transfer date has been completed.
As the Department Head is ready to request GHPIDs for budgeted new headcount, the below process should be followed to open a new headcount.
If two Department Heads agree to move a headcount between their teams, the following process should be followed.
The VP of Engineering and their direct reports track our highest priorities in the Engineering Management Issue Board, rather than to do lists, Google Doc action items, or other places. The reasons for this are:
Here are the mechanics of making this work:
Engineering Managementlabel to get it on the board, and the department label to get it in progress (e.g.
CEO Interestlabel, please post it to #ceo
In GitLab Engineering we are serious about concepts like servant leadership, over-communication, and furthering our company value of transparency. You may have joined GitLab from another organization that did not share the same values or techniques. Perhaps you're accustomed to more corporate politics? You may need to go through a period of "unlearning" to be able to take advantage of our results-focused, people-friendly environment. It takes time to develop trust in a new culture.
Less common, but even more important, is to make certain you don't unintentionally bring any mal-adaptive behaviors to GitLab from these other environments.
We encourage you to read the engineering section of the handbook as part of your onboarding, ask questions of your peers and managers, and reflect on how you can help us better live our culture:
We manually verify that our code works as expected. Automated test coverage is essential, but manual verification provides a higher level of confidence that features behave as intended and bugs are fixed.
We manually verify issues when they are in the
Generally, afer you have manually verified something, you can close the associated issue.
See the Product Development Flow to learn more about this issue state.
We manually verify in the staging environment whenever possible. In certain cases we may need to manually verify in the production environment.
If you need to test features that are built for GitLab Ultimate then you can get added to the issue-reproduce group on production and staging environments by asking in the #development Slack channel. These groups are are on an Ultimate plan.
Before the beginning of each fiscal year, and at various check points throughout the year, we plan the size and shape of the Engineering and Product Management functions together to maintain symmetry.
The process should take place in a single artifact (usually a spreadsheet, current spreadsheet), and follow these steps:
Note: Support is part of the engineering function but is budgeted as 'cost of sales' instead of research and development. Headcount planning is done separately according to a different model.
The non support related departments within Engineering (Development, Infrastructure, Quality, Security, and UX) have an expense target of 20% as a percentage of revenue.
The Support target is 10% as a percentage of revenue.
The PlatoHQ Program has a total of 10 Engineering Managers/Senior IC's participating. The program exists of both self-learning via an online portal and 1-1 sessions with a mentor.
The 7CTOs Program is run with 4 Senior leaders in Engineering. The program exists of peer mentoring sessions (forums) and effective network building.
The CTO is an executive sponsor for selected customers.
A shadow program is available to everyone in engineering (especially senior leaders) in order to have an opportunity to observe and participate in one of the executive sponsor meetings. Doing so can be a great way to hear directly from customers about what they like about GitLab and about what we can improve. (This program is similar in some ways to the CEO Shadow Program.
If you choose to be a shadow, your responsibilities will be:
To request to be a shadow: Post a message in the #cto Slack channel, indicate your timezone, and CC the CTO's EBA Kristie Thomas.
There is a program to find a mentor or to become a mentor at GitLab described on this handbook page.
You can find more information on this experimental program in this handbook page.
In most cases, a single engineer and maintainer review are adequate to handle a priority::1/severity::1 issue. However, some issues are highly difficult or complicated. Engineers should treat these issues with a high sense of urgency. For a complicated priority::1/severity::1 issue, multiple engineers should be assigned based on the level of complexity. The issue description should include the team member and their responsibilities.
If we have cases where three or five or X people are needed, Engineering Managers should feel the freedom to execute on a plan quickly.
Following this procedure will:
Engineering is the primary advocate for the performance, availability, and security of the GitLab project. Product Management prioritizes all initiatives, so everyone in the engineering function should participate in the Product Management prioritization process to ensure that our project stays ahead in these areas. The following list should provide some guidelines around the initiatives that each engineering team should advocate for during their release planning:
Support Team Contributionslabel. You can filter on open MRs.
GitLab makes use of a 'Canary' stage. Production Canary is a series of servers running GitLab code in a production environment. The Canary stage contains code functional elements like web, container registry and git servers while sharing data elements such as sidekiq, database, and file storage with production. This allows UX code and most application logic code to be consumed by a smaller subset of users under real world scenarios before being made available to all users on GitLab.com.
Information on canary testing has been moved to dedicated page covering the canary stage and how to use it
There are primarily two Slack channels which developers may be called upon to assist the production team when something appears to be amiss with GitLab.com:
#backend: For backend-related issues (e.g. error 500s, high database load, etc.)
Treat questions or requests from production team for immediate urgency with high priority.
There are some engineering handbook topics that we cannot be publicly transparent about. These topics can be viewed by GitLab team members in the engineering section of the private handbook.
If you experience a page not found (404) error when attempting to access the internal handbook, you may need to register to use it via first browsing to the internal handbook authorization page.