Hiring will again be a top priority in FY23. We need to remember that not every hiring manager has experienced our previous levels of growth, and we have a new recruiting org to partner with. We also need to remember that, despite the pressure to hire, it's always quality first. Additionally, we want to balance the time that managers spend recruiting against the time they spend investing in their current team members: giving them opportunities and helping them grow their careers.
We need to live our Diversity, Inclusion, and Belonging core value with respect to geography. We need to work with other divisions such as Finance, Legal, and People to open more countries for hiring and actively source candidates in those locations. Let's also be mindful of collaborating across cultures and time zones and deliberately design new asynchronous ways to work together that don't rely on location and working hours.
For the first time, we're going to have Engineering division team members 100% dedicated to growing our community of contributors. They will collaborate with Community Relations in the Marketing division and everyone else in Engineering who engages with our community, serving as Merge Request Coaches or in other ways. We're going to be experimental and try a large number of programs to see what's most effective and efficient. This will help drive one of our flywheels.
We're continuing to increase our focus on customer results. This means balancing the short-term needs of customers with our long-term vision of the DevSecOps platform. We have a video training edited together by our UX team that shows users navigating our UI. And we have a variety of other opportunities for non-customer facing team members, such as shadowing a support ticket or an infrastructure incident.
We've resolved our reliability backlog and should finish our security backlog some time in Q1. Once done, we'll be in our steady state and will use a new set of backlog metrics and error budgets to better govern the proportions of work done on a release-by-release basis going forward. Similarly, we've aligned with Product Management to improve the usability of GitLab: we'll be building out our UI component library and burning down our new backlog of SUS-impacting issues.
Lastly, Engineering will have a giant role to play in most of the company's top twelve cross-functional projects. Three are being led by Engineering DRIs: FedRAMP, Reliability, and Project Horse. Five others hinge upon Engineering's ability to deliver high-quality work on ambitious timelines: Cloud Licensing, Free SaaS User Efficiency, Usage Data, Project Matterhorn, and E-commerce.
Department-specific FY23 Direction Statements (in alphabetical order):
GitLab Engineering values clear, concise, transparent, asynchronous, and frequent communication. Here are our most important modes of communication:
As part of a fully-distributed organization such as GitLab, it is important to stay informed about engineering-led initiatives. We employ multimodal communication, which describes the minimum set of communication channels we'll broadcast to.
For the Engineering department, any important initiative will be announced in:
#eng-week-in-review
#development
#development-guidelines
#production
#security
#support_team-chat
#quality
#ux
#whats-happening-at-gitlab
#company-fyi
If you frequently check any of these channels, you can consider yourself informed. It is up to the person sharing to ensure that the same message is shared across all channels. Ideally, this message should be a one-sentence summary with a link to an issue, so that there is a single source of truth for any feedback.
There are seven departments within the Engineering Division:
As of June 2022, Engineering Allocations are being phased out in favor of the Cross-Functional Prioritization Process, which is expected to be fully adopted by the end of July 2022.
Engineering is the DRI for mid/long-term team efficiency, performance, security (incident response and anti-abuse capabilities), availability, and scalability. The expertise to proactively identify and iterate on these is squarely in the Engineering team, whereas Product can support on performance issues identified by customers. In some ways these efforts can be viewed as risk mitigation or revenue protection. They also have the characteristic of being larger than one group at the stage level. Development would like to conduct an experiment to focus on initiatives that should help the organization scale appropriately in the long term. We are treating these as a percent investment of time associated with a stage or category. The percent of investment time can be viewed as a prioritization budget outside normal Product/Development assignments.
Engineering Allocation is also used in short-term situations in conjunction with, and in support of, maintaining acceptable Error Budgets for GitLab.com and our GitLab-hosted first theme.
Unless it is listed in this table, Engineering Allocation for a stage/group is 0% and we are following normal prioritization. Refer to this page for Engineering Allocation charting efforts. Some stage/groups may be allocated at a high percentage or 100%, typically indicating a situation where all available effort is to be focused on Reliability related (top 5 priorities from prioritization table) work.
Mid/long term initiatives are engineering-led. The EM is responsible for recognizing the problem, creating a satisfactory goal with clear success criteria, developing a plan, executing on a plan and reporting status. It is recommended that the EM collaborate with PMs in all phases of this effort as we want PMs to feel ownership for these challenges. This could include considering adding more/less allocation, setting the goals to be more aspirational, reviewing metrics/results, etc. We welcome strong partnerships in this area because we are one team even when allocations are needed for long-range activities.
During periods of Engineering Allocation, the PM remains the interface between the group and the field teams and customers. This is important because:
Group/Stage | Description of Goal | Justification | Maximum % of headcount budget | People | Supporting information | EMs / DRI | PMs |
---|---|---|---|---|---|---|---|
Manage:Authentication and Authorization (BE) | floor % | empower every SWEs from raising reliability and security issues | 33% | 1 | N/A | @m_gill | @hsutor |
Manage:Workspace (BE) | Scalability of GitLab hierarchy functionality (Workspace) | Reduce duplication of code and increase performance for Groups/Projects | 50% | 1 | Consolidate Groups and Projects | @mksionek | @mushakov |
Manage:Workspace (BE) | Linear Namespace Queries | Replace recursive CTE queries, which are complex and unpredictable | 50% | 1 | Linear Namespace Queries | @mksionek | @ogolowinski |
Manage:Import (BE) | floor % | empower every SWEs from raising reliability and security issues | 33% | 1 | N/A | @wortschi | @ogolowinski |
Manage:Compliance (BE) | floor % | empower every SWEs from raising reliability and security issues | 33% | 1 | N/A | @dennis | @stkerr |
Manage:Optimize (BE) | floor % | empower every SWEs from raising reliability and security issues | 33% | 1 | N/A | @m_gill | @hsnir1 |
Plan:Project management | 3 month headcount reset to help Manage:Workspace | 3 month headcount reset to help Manage:Workspace | 25% | 4 | 3 month headcount reset to help Manage:Workspace | @johnhope | @gweaver |
Plan:Product Planning | 3 month headcount reset to help Manage:Workspace | 3 month headcount reset to help Manage:Workspace | 20% | 5 | 3 month headcount reset to Manage:Workspace | @johnhope | @cdybenko |
Create:Source Code (BE) | floor % | empower every SWEs from raising reliability and security issues | 10% | 1 | N/A | @sean_carroll | @sarahwaldner |
Create:Code Review (BE) | floor % | empower every SWEs from raising reliability and security issues | 10% | 1 | N/A | @mnohr | @phikai |
Create:Gitaly | Infra-Dev Issues, P1/S1 issues, security issues and Customer Escalations (engineering approved) | Improve reliability of Gitaly | 20% | 6 | Infradev Issues | @timzallmann | @mjwood |
Verify:Pipeline Execution | floor % | empower every SWEs from raising reliability and security issues | 10% | 1 | N/A | @marknuzzo | @jheimbuck_gl |
Verify:Pipeline Authoring | floor % | empower every SWEs from raising reliability and security issues | 10% | 1 | N/A | @marknuzzo | @dhershkovitch |
Verify:Runner | floor % | empower every SWEs from raising reliability and security issues | 10% | 1 | N/A | @erushton | @DarrenEastman |
Verify:Pipeline Insights | floor % | empower every SWEs from raising reliability and security issues | 10% | 1 | N/A | @shampton | @jreporter |
Package:Package | floor % | empower every SWEs from raising reliability and security issues | 10% | 5 | N/A | @dcroft | @trizzi |
Release:Release | 3 milestones manage:Import Headcount Reset | Unlocks a new CEO initiative | 17% | 1 | https://gitlab.com/gitlab-com/Product/-/issues/3062 | @nicolewilliams | @cbalane |
Configure:Configure | 3 milestones manage:Import Headcount Reset | Unlocks a new CEO initiative | 20% | 1 | https://gitlab.com/gitlab-com/Product/-/issues/3062 | @nicholasklick | @nagyv-gitlab |
Secure:Static Analysis | floor % | empower every SWEs from raising reliability and security issues | 10% | 5 | N/A | @twoodham | @connorgilbert |
Secure:Dynamic Analysis | floor % | empower every SWEs from raising reliability and security issues | 10% | 5 | N/A | @sethgitlab | @derekferguson |
Secure:Composition Analysis | Proposed 3 month headcount reset to help manage:Import | Proposed 3 month headcount reset to help manage:Import | 25% | 4 | Proposed 3 month headcount reset to help manage:Import | @gonzoyumo | @NicoleSchwartz |
Secure:Threat Insights | Bring error budget back to green | Work on backlog of reliability and security issues | 25% | 4 | List of issues | @thiagocsf | @matt_wilson |
Protect:Container Security | floor % | empower every SWEs from raising reliability and security issues | 10% | 5 | N/A | @thiagocsf | @sam.white |
AntiAbuse:AntiAbuse | floor % | empower every SWEs from raising reliability and security issues | 10% | 2 | N/A | @jayswain | @jstava |
Growth:Adoption | floor % | empower every SWEs from raising reliability and security issues | 10% | 3 | N/A | @jayswain | @jstava |
Growth:Conversion | floor % | empower every SWEs from raising reliability and security issues | 10% | 4 | N/A | @kniechajewicz | @s_awezec |
Growth:Expansion | floor % | empower every SWEs from raising reliability and security issues | 10% | 3 | N/A | @kniechajewicz | @gdoud |
Analytics:Product Intelligence | floor % | empower every SWEs from raising reliability and security issues | 10% | 6 | N/A | @alinamihaila | @amandarueda |
Fulfillment | floor % | address overdue security issues | 10% | 1 | N/A | @jeromezng | @justinfarris |
Enablement:Distribution | floor % | empower every SWEs from raising reliability and security issues | 10% | 9 | N/A | @mendeni | @dorrino |
Enablement:Geo | 3 months headcount reset to new staging environment | 3 months headcount reset to new staging environment | 13% | 7 | 3 months headcount reset to new staging environment | @nhxnguyen | @nhxnguyen |
Enablement:Database | floor % | empower every SWEs from raising reliability and security issues | 10% | 6 | N/A | @alexives | @iroussos |
Enablement:Sharding | floor % | empower every SWEs from raising reliability and security issues | 10% | 3 | N/A | @nhxnguyen | @fzimmer |
Quality:Ops QE | Improve Staging environment | Improving reliability & availability is 3rd priority in Prioritizing technical decisions | 10% | 1 | New staging epic | @vincywilson | TBD |
Quality:Enablement QE | Improve Staging environment | Improving reliability & availability is 3rd priority in Prioritizing technical decisions | 10% | 1 | New staging epic | @vincywilson | TBD |
Infrastructure:Delivery | Improve Staging environment | Improving reliability & availability is 3rd priority in Prioritizing technical decisions | 80% | 2 | New staging epic | @amyphillips | TBD |
Infrastructure:Reliability | Improve Staging environment | Improving reliability & availability is 3rd priority in Prioritizing technical decisions | 80% | 3 | New staging epic | @amyphillips | TBD |
Enablement:Geo | Improve Staging environment | Improving reliability & availability is 3rd priority in Prioritizing technical decisions | 10% | 1 | New staging epic | @nhxnguyen | TBD |
Each allocation has a direction page maintained by the Engineering Manager. The Engineering Manager will provide regular updates to the direction page. Steps to add a direction page are:
`index.html.md` in the newly created directory

To see an example of an Engineering Allocation Direction page, see Continuous Integration Scaling. Once the Engineering Allocation is complete, delete the direction page.
Groups allocating effort to an engineering allocation should update progress synchronously or asynchronously in the weekly, cross-functional infradev and engineering allocation meeting [agenda (internal)]. The intention of this meeting is to communicate progress on engineering allocations and to evaluate and prioritise escalations from infrastructure.
Engineering Allocation progress reports should appear in the following format:
One of the most frequent questions we get as part of this experiment is "How does a problem get put on the Engineering Allocation list?". The short answer is that someone makes a suggestion and we add it. Much like everyone can contribute, we would like the feedback loop for improvement and long-term goals to be robust. So everyone should feel empowered to suggest an item at any time.
To help with getting items on the list for consideration, we will be performing a survey periodically. The survey will consist of the following questions:
We will keep the list of questions short to solicit the most input. The survey will go out to members of the Development, Quality, and Security departments. After we get the results, we will consider items for potential addition as an Engineering Allocation.
Once the item's success criteria are achieved, the Engineering Manager should consult with counterparts to review whether the improvements are sustainable. Where appropriate, we should consider adding monitoring and alerting to any areas of concern that will allow us to make proactive prioritizations in future should the need arise. The Engineering Manager should close all related epics/issues, reset the allocation in the above table to the floor level, and inform the Product Manager when the allocated capacity will be available to return their focus to product prioritizations.
When resetting a group's Engineering Allocation in the table above, set the goal to `floor %`, the justification to `empower every SWEs from raising reliability and security issues`, and the percentage of headcount allocated to `10%`, and put `N/A` in place of a link to the Epic.
All engineering allocation closures should be reviewed and approved by the VP of Development.
To support GitLab's long-term product health and stability, teams are asked to plan their milestones with an appropriate mix of `type::feature`, `type::maintenance`, and `type::bug` work. Ratios may differ between teams as well as within the same team over time. Factors that influence what ratio is appropriate for a given team include the age of the team, the area of the product they are working in, and the evolving needs of GitLab the business and GitLab the product. If your team does not have enough historical data to know its ratios or you are unsure what an appropriate ratio might be, use a guideline of 60% feature, 30% maintenance, and 10% bugs.
For more details on these three work types, please see the section on work type classification.
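As a rough illustration of checking a milestone plan against the guideline ratios, the sketch below counts planned issues per work type label. The input shape (one set of labels per issue), the function name `work_type_ratios`, and the sample milestone are all hypothetical and are not taken from any GitLab API.

```python
from collections import Counter

# Guideline ratios from the text above, for teams without historical data.
GUIDELINE = {"type::feature": 0.60, "type::maintenance": 0.30, "type::bug": 0.10}


def work_type_ratios(issue_labels):
    """Return the share of issues per work type label.

    `issue_labels` is a list with one set of labels per planned issue
    (a hypothetical input shape). Issues that carry none of the three
    work type labels are ignored.
    """
    counts = Counter()
    for labels in issue_labels:
        for work_type in GUIDELINE:
            if work_type in labels:
                counts[work_type] += 1
                break  # count each issue only once
    total = sum(counts.values())
    if total == 0:
        return {work_type: 0.0 for work_type in GUIDELINE}
    return {work_type: counts[work_type] / total for work_type in GUIDELINE}


# Hypothetical milestone plan: 6 features, 3 maintenance items, 1 bug.
milestone = (
    [{"type::feature", "group::example"}] * 6
    + [{"type::maintenance"}] * 3
    + [{"type::bug"}] * 1
)
ratios = work_type_ratios(milestone)
for work_type, share in ratios.items():
    print(f"{work_type}: {share:.0%} (guideline {GUIDELINE[work_type]:.0%})")
```

A team with its own historical ratios would swap them in for `GUIDELINE`; the 60/30/10 split is only the fallback described above.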
Our backlog should be prioritized on an ongoing basis. Prioritization will be done via quad planning (collaboration between product, development, quality, UX) with a DRI to be responsible for the decisions based on each work type:
`type::feature` issues
`type::maintenance` issues
`type::bug` issues

The DRIs of these three core areas will work collaboratively to ensure the overall prioritization of the backlog is in alignment with section direction or any other necessary product and business needs. If a team is not assigned a Product Designer, then no UX counterpart is needed for prioritization purposes.
It is recommended that teams use a Cross-functional Prioritization Board like this example, which provides columns for `type::feature`, `type::maintenance`, and `type::bug` issues. Issues may be reordered by drag and drop.
Note: Each team is encouraged to create their own board as the example board above belongs to the Threat Insights team. Please do not modify this board unless you are a member of the Threat Insights team.
Drag and drop reordering is also supported in the issues list by sorting by `Manual` (example). You may find this view to be more effective when focusing on a specific type, or when working against a large backlog. When you adjust the order of issues in the Manual list view, it's automatically reflected in the board view, so the order is consistent between both views.
Notes:
`UX` that aren't relevant to implementation issues.

The Product Manager is responsible for planning each milestone. Product Managers are also responsible for ensuring that their team's target ratios are maintained over time.
It is recommended to use your team's same Cross-functional Prioritization board for milestone planning.
Add the milestone (example) to review the milestone plan. The board will show the number of issues and cumulative issue weights for `type::feature`, `type::maintenance`, and `type::bug` issues.
The primary goals of this review exercise are for teams to:
`undefined` MRs under 5% of all MRs merged for a given calendar month, where `undefined` MRs refers to any MRs without a `type::` label

These reviews will use cross-functional dashboards embedded on each team's handbook page that serve as the SSOT when reviewing `type::` labels of merged MRs.
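To make the 5% goal for `undefined` MRs concrete, here is a minimal sketch of the arithmetic. It assumes a plain list of label names per merged MR; the function name `undefined_mr_share` and the sample data are hypothetical, and real figures would come from the team dashboards rather than from this code.

```python
def undefined_mr_share(merged_mr_labels):
    """Return the fraction of merged MRs that carry no `type::` label.

    `merged_mr_labels` holds one list of label names per merged MR
    (a hypothetical input shape used only for illustration).
    """
    if not merged_mr_labels:
        return 0.0
    undefined = sum(
        1
        for labels in merged_mr_labels
        if not any(label.startswith("type::") for label in labels)
    )
    return undefined / len(merged_mr_labels)


# Hypothetical month: 40 merged MRs, 3 of them without a type:: label.
month = (
    [["type::feature"]] * 25
    + [["type::bug", "severity::2"]] * 12
    + [["backend"]] * 2
    + [[]]
)
share = undefined_mr_share(month)
THRESHOLD = 0.05  # the 5% goal stated above
print(f"undefined MRs: {share:.1%}; under 5% goal: {share < THRESHOLD}")
```

In this made-up month the share is 3 of 40 merged MRs, so the team would be above the goal and would need to label the remaining MRs.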
The cadence and attendees for reviews vary at each level.
Note that the review collaboration can be done in a way that's most effective for the team, either synchronously (e.g. scheduled recurring call) or asynchronously (e.g. issues), as long as the previous reviews are well documented (with historical tracking).
Who participates?
Questions to answer
`type` label, or leveraging the `/copy_metadata` command on merged MRs.)

Who participates?
Questions to answer
Who participates?
Questions to answer
We will enact a localized feature change lock (FCL) anytime there is an S1 or public-facing (status page) S2 incident on GitLab.com (including the License App, CustomersDot, and Versions) determined to be caused by a change from the development department. The team involved should be determined by the author, their line manager, and that manager's other direct reports.
Direct reports involved in an active borrow should be included if they were involved in the authorship or review of the change.
An FCL assignment and creation must be approved by either the VP of Infrastructure or VP of Development. The purpose is to foster a sense of ownership and accountability amongst our teams, but this should not challenge our no-blame culture.
Rough guidance on timeline is provided here to set expectations and urgency for an FCL. We want to balance moving urgently with doing thoughtful, important work to improve reliability. Note that as times shift we can adjust accordingly. The DRI of an FCL should pull in the timeline where possible.
The following bulleted list provides a suggested timeline starting from incident to completion of the FCL. "Business day x" in this case refers to the xth business day after the incident.
The approver of the FCL will add the item to the daily reliability standup and will tag the development director so they can assign as appropriate.
During the FCL, the exclusive focus of the team(s) is reliability work, and any in-flight feature work has to be paused or re-assigned. Maintainer duties can still be done during this period and should keep other teams moving forward. Explicitly higher priority work such as security and data loss prevention should continue as well. The team(s) must:
`#fcl-incident-[number]`, with members
`closing ceremony` upon completing the FCL to review the retrospectives and celebrate the learnings.
After the Incident Review is completed, the team(s) focus is on preventing similar problems from recurring and improving detection. This should include, but is not limited to:
Examples of this work include, but are not limited to:
Any work for the specific team kicked off during this period must be completed, even if it takes longer than the duration of the FCL. Any work directly related to the incident should be kicked off and completed even if the FCL is over. Work paused due to the FCL should be the priority to resume after the FCL is over. Items created for other teams or on a global level don't affect the end of the FCL.
The stable counterpart from Infrastructure will be available to review and consult on the work plan.
Team members are welcome to run Folding@home on their company provided computers. Folding@home is a distributed computing network that is searching for therapies for the COVID-19 respiratory illness among other diseases. We recommend running it at night if you have high daily compute workloads. Also keep your computer plugged in. We considered potential security and hardware implications in this issue.
If you would like to join a team with other GitLab team members, there is a `GitLab Team Members` team for Folding@home. When setting up or changing your Folding@home identity, you can add team `245256`. This is not a competition, but simply a way to track how much our team members have contributed overall. You can view our statistics on our team page. You can discuss with other GitLab team members in the #folding-at-home Slack channel.
Please reference our internal hiring repository for internal best practices and guidelines.
Whenever a team member departs from GitLab or they transfer to a different role, the below process should be followed to open a backfill. This process ensures alignment between the Department Heads, Finance business partner and Talent Acquisition. For departures, a backfill can only be opened once a departure or resignation is official where we've received written confirmation of the departure including the last working day and the People Business Partner has submitted the Offboarding Form to the People Ops team. For transfers, a backfill can only be opened once a transfer is official where an offer letter stating the transfer date has been completed.
As the Department Head is ready to request GHPIDs for budgeted new headcount, the below process should be followed to open a new headcount.
If two Department Heads agree to move a headcount between their teams, the following process should be followed.
The VP of Engineering and their direct reports track our highest priorities in the Engineering Management Issue Board, rather than in to-do lists, Google Doc action items, or other places. The reasons for this are:
Here are the mechanics of making this work:
`Engineering Management` label to get it on the board, and the department label to get it in progress (e.g. `Development Department`)
`CEO Interest` label, please post it to #ceo

The Quality Department is the DRI for Engineering Performance Indicators. Work regarding KPI / RPI is tracked on the engineering metrics board and task process.
In GitLab Engineering we are serious about concepts like servant leadership, over-communication, and furthering our company value of transparency. You may have joined GitLab from another organization that did not share the same values or techniques. Perhaps you're accustomed to more corporate politics? You may need to go through a period of "unlearning" to be able to take advantage of our results-focused, people-friendly environment. It takes time to develop trust in a new culture.
Less common, but even more important, is to make certain you don't unintentionally bring any mal-adaptive behaviors to GitLab from these other environments.
We encourage you to read the engineering section of the handbook as part of your onboarding, ask questions of your peers and managers, and reflect on how you can help us better live our culture:
Because GitLab has team members across the globe in many different time zones that may or may not have time changes, it is most efficient to include UTC for communicating dates and times, in addition to PST which is the company-wide expectation.
Using UTC may require getting used to, so it may be easiest to add UTC to your Google calendar. You can enable this by doing the following:
Additionally, you can add multiple local times from other team members' time zones to your calendar sidebar. You can enable this by doing the following:
You'll now see something like this in your calendar:
You can also use sites like TimeAndDate to convert times to/from UTC.
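When writing a time into an issue or message, the conversion can also be scripted. The sketch below assumes Python 3.9+ with a time zone database available to the standard `zoneinfo` module; the helper name `with_utc` and the meeting time are hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def with_utc(local_time, tz_name):
    """Render a naive local time as 'local time (zone) / UTC time' text."""
    # Attach the named zone to the naive time, then convert to UTC.
    local = local_time.replace(tzinfo=ZoneInfo(tz_name))
    utc = local.astimezone(timezone.utc)
    return f"{local:%Y-%m-%d %H:%M} {local.tzname()} / {utc:%Y-%m-%d %H:%M} UTC"


# Hypothetical meeting time, given in the Pacific time zone.
meeting = datetime(2022, 1, 12, 9, 0)
print(with_utc(meeting, "America/Los_Angeles"))
# prints "2022-01-12 09:00 PST / 2022-01-12 17:00 UTC"
```

Using a region name such as `America/Los_Angeles` rather than a fixed offset means the same call handles daylight saving time, when the zone label becomes PDT and the UTC offset changes.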
We manually verify that our code works as expected. Automated test coverage is essential, but manual verification provides a higher level of confidence that features behave as intended and bugs are fixed.
We manually verify issues when they are in the `workflow::verification` state.
Generally, after you have manually verified something, you can close the associated issue.
See the Product Development Flow to learn more about this issue state.
We manually verify in the staging environment whenever possible. In certain cases we may need to manually verify in the production environment.
If you need to test features that are built for GitLab Ultimate, you can get added to the issue-reproduce group on the production and staging environments by asking in the #development Slack channel. These groups are on an Ultimate plan.
Before the beginning of each fiscal year, and at various check points throughout the year, we plan the size and shape of the Engineering and Product Management functions together to maintain symmetry.
The process should take place in a single artifact (usually a spreadsheet, current spreadsheet), and follow these steps:
Note: Support is part of the engineering function but is budgeted as 'cost of sales' instead of research and development. Headcount planning is done separately according to a different model.
The non-Support departments within Engineering (Development, Infrastructure, Quality, Security, and UX) have an expense target of 20% of revenue.
The Support target is 10% of revenue.
The PlatoHQ Program has a total of 10 Engineering Managers/Senior ICs participating. The program consists of both self-learning via an online portal and 1-1 sessions with a mentor.
The 7CTOs Program is run with 4 senior leaders in Engineering. The program consists of peer mentoring sessions (forums) and effective network building.
The CTO is an executive sponsor for selected customers.
A shadow program is available to everyone in engineering (especially senior leaders) in order to have an opportunity to observe and participate in one of the executive sponsor meetings. Doing so can be a great way to hear directly from customers about what they like about GitLab and about what we can improve. (This program is similar in some ways to the CEO Shadow Program.)
If you choose to be a shadow, your responsibilities will be:
To request to be a shadow: Post a message in the #cto Slack channel, indicate your timezone, and CC the CTO's EBA Kristie Thomas.
There is a program to find a mentor or to become a mentor at GitLab described on this handbook page.
You can find more information on this experimental program in this handbook page.
In most cases, a single engineer and maintainer review are adequate to handle a priority::1/severity::1 issue. However, some issues are highly difficult or complicated. Engineers should treat these issues with a high sense of urgency. For a complicated priority::1/severity::1 issue, multiple engineers should be assigned based on the level of complexity. The issue description should include the team member and their responsibilities.
Team Member | Responsibility |
---|---|
Team Member 1 | Reproduce the Problem |
Team Member 2 | Audit Code Base for other places where this may occur |
If we have cases where three or five or X people are needed, Engineering Managers should feel the freedom to execute on a plan quickly.
Following this procedure will:
Engineering is the primary advocate for the performance, availability, and security of the GitLab project. Product Management prioritizes all initiatives, so everyone in the engineering function should participate in the Product Management prioritization process to ensure that our project stays ahead in these areas. The following list should provide some guidelines around the initiatives that each engineering team should advocate for during their release planning:
`Support Team Contributions` label. You can filter on open MRs.

In order to maintain our focus on customer results, we conduct at least one recorded interview per quarter with a GitLab user. The interview should help team members who do not frequently interact with users and customers understand how their work positively helps others. The interview may also help team members understand problems faced by users that they may not face themselves.
There are two types of interviews:
The Chief of Staff to the CTO will identify, record, and share one DITLO interview over the course of their quarter of service.
While not all videos can be public, we should aim to find users who are able to share their experiences in public.
The DITLO videos cannot all be public, so the playlist is GitLab-internal.
Previous videos:
This is a starter agenda. The specifics will depend on the interviewee.
As this is a 1:1 interview, when recording, prefer Zoom's Speaker Mode over Gallery Mode.
GitLab makes use of a 'Canary' stage. Production Canary is a series of servers running GitLab code in a production environment. The Canary stage contains code functional elements like web, container registry and git servers while sharing data elements such as sidekiq, database, and file storage with production. This allows UX code and most application logic code to be consumed by a smaller subset of users under real world scenarios before being made available to all users on GitLab.com.
Information on canary testing has been moved to a dedicated page covering the canary stage and how to use it.
There are primarily two Slack channels in which developers may be called upon to assist the production team when something appears to be amiss with GitLab.com:
`#backend`: For backend-related issues (e.g. error 500s, high database load, etc.)
`#frontend`: For frontend-related issues (e.g. JavaScript errors, buttons not working, etc.)

Treat urgent questions or requests from the production team with high priority.
There are some engineering handbook topics that we cannot be publicly transparent about. These topics can be viewed by GitLab team members in the engineering section of the private handbook.
If you experience a page not found (404) error when attempting to access the internal handbook, you may need to register to use it via first browsing to the internal handbook authorization page.