This fiscal year we'd like to optimize various areas of the Engineering division, as well as build new capabilities demanded by where we are as a company in an extremely fast moving industry. The purpose of stating this transparently in our handbook is to establish a year-long arc of meaning to which we can all ascribe our daily work. Larger efforts, such as quarterly OKRs, should represent substantial progress along this path. We can be flexible throughout the year on this direction, and everyone can contribute so we welcome merge requests.
Firstly, we must recognize that hiring is going to pick up this year compared to FY21. Though we won't grow 100+% as we did in FY19-20, we need to remember the lessons we learned together and resurrect some processes to be successful. It's the responsibility of hiring managers to make timely, high-quality hires – talent acquisition is a service to them, not the other way around. Hiring managers should include their reports in hiring future team members, including source-a-thons. And as much pressure as we put on hitting target dates, it's always quality-first.
While we completed a tremendous number of dogfooding projects as part of OKRs, we still need to strengthen our culture of dogfooding. It's our ambition to use our entire application, and we're willing to spend significantly more energy to build needed functionality into the GitLab application than it takes to script it outside. We also need to document these decisions in issues according to this process.
We need to live our Diversity, Inclusion & Belonging core value, and we need to rise to the cultural moment. As individuals you can take the new GitLab-specific training created by our DIB team. You can also volunteer to coach members of under-represented groups looking to get their first job in the technology industry, which may lead them into our Engineering Internship Program and eventually full-time employment with us.
We want to engage with the GitLab community and be a great steward of our open source project. We can continue to propose we move paid features into the free version. We should continue to make it easy to contribute to our project. Individuals can become Merge Request Coaches. And we are also starting to track and drive large enterprise customer contributions in the form of MRARR.
Everyone should also be aware of Product Management's top investment themes for FY22:
We also want to act on the information our team gave us in the last CultureAmp Engagement Survey so we can improve our working lives:
Below are some Department-specific highlights:
See more in Development's FY22 Direction.
Read more about this new department in their handbook section.
See more in Infrastructure's FY22 Direction.
See more in Quality's FY22 Direction.
See more in Security's FY22 Direction.
See more in Support's FY22 Direction.
See more in UX's FY22 Direction.
GitLab Engineering values clear, concise, transparent, asynchronous, and frequent communication. Here are our most important modes of communication:
As part of a fully-distributed organization such as GitLab, it is important to stay informed about engineering-led initiatives. We employ multimodal communication, which describes the minimum set of communication channels we'll broadcast to.
For the Engineering department, any important initiative will be announced in:
If you frequently check any of these channels, you can consider yourself informed. It is up to the person sharing to ensure that the same message is shared across all channels. Ideally, this message should be a one-sentence summary with a link to an issue, so that there is a single source of truth for any feedback.
There are seven departments within the Engineering Division:
Please see the Product Management section, which governs how they prioritize work and should also guide our technical decision making.
|3*||Availability, Infradev, Incident Corrective Actions, Sharding Blockers||
|4||Fixing regressions (things that worked before)||
|6||Committed Priority to Customers||
|7||Instrumentation improvements, particularly for xMAU||
|8||Usability Improvements and User Experience to drive xMAU||
|10||Identified for Dogfooding||
|11||Velocity of new features, technical debt, community contributions, and all other improvements||
|12||Behaviors that yield higher predictability (because this inevitably slows us down)||
*indicates forced prioritization items with SLAs/SLOs
Engineering is the DRI for mid/long term team efficiency, performance, security (incident response and anti-abuse capabilities), availability, and scalability. The expertise to proactively identify and iterate on these is squarely in the Engineering team, whereas Product can support with performance issues identified by customers. In some ways these efforts can be viewed as risk mitigation or revenue protection. They also have the characteristic of being larger than one group at the stage level. Development would like to conduct an experiment to focus on initiatives that should help the organization scale appropriately in the long term. We are treating these as a percent investment of time associated with a stage or category. The percent of investment time can be viewed as a prioritization budget outside normal Product/Development assignments.
Engineering Allocation is also used in short-term situations in conjunction with, and in support of, maintaining acceptable Error Budgets for GitLab.com and our GitLab-hosted first theme.
Unless it is listed in this table, Engineering Allocation for a stage/group is 0% and we are following normal prioritization. Refer to this page for Engineering Allocation charting efforts. Some stage/groups may be allocated at a high percentage or 100%, typically indicating a situation where all available effort is to be focused on reliability-related work (the top 5 priorities from the prioritization table).
Mid/long term initiatives are engineering-led. The EM is responsible for recognizing the problem, creating a satisfactory goal with clear success criteria, developing a plan, executing on a plan and reporting status. It is recommended that the EM collaborate with PMs in all phases of this effort as we want PMs to feel ownership for these challenges. This could include considering adding more/less allocation, setting the goals to be more aspirational, reviewing metrics/results, etc. We welcome strong partnerships in this area because we are one team even when allocations are needed for long-range activities.
During periods of Engineering Allocation, the PM remains the interface between the group and the field teams & customers. This is important because:
|Group/Stage||Description of Goal||Justification||Maximum % of headcount budget||People||Supporting information||EMs / DRI||PMs|
|Manage:Access (BE)||Infradev burn-down||Improve .com reliability and resolve infradev issues||17%||1||List of issues||@lmcandrew||@ogolowinski|
|Manage:Access (BE)||Dev Security burn-down||More security issues exist in the backlog and are incoming than the team is able to prioritize||50%||3||Overview||@dennis||@ogolowinski|
|Manage:Access (BE)||Scalability of AuthN/AuthZ functionality (Workspace)||Reduce duplication of code and increase performance for Groups/Projects||17%||1||Consolidate Groups and Projects||@m_gill||@ogolowinski|
|Manage:Access (BE)||Linear Namespace Queries||Replace recursive CTE queries, which are complex and unpredictable||17%||1||Linear Namespace Queries||@mksionek||@ogolowinski|
|Manage:Workspace (BE)||floor %||empower every SWE to raise reliability and security issues||10%||4||N/A||@mksionek||@ogolowinski|
|Manage:Compliance (BE)||floor %||empower every SWE to raise reliability and security issues||10%||1||N/A||@djensen||@stkerr|
|Manage:Import (BE)||Improving Error Budget, Infra-Dev Issues, and Security||Backlog of security issues and production incidents tied to infradev||100%||2||Infradev & Security Issues||@lmcandrew||@hdelalic|
|Manage:Optimize (BE)||Dev Security burn-down||Burn down Dev security backlogs||67%||3||Overview||@djensen||@ljlane|
|Plan:Project management||3 month headcount reset to help Manage:Access||3 month headcount reset to help Manage:Access||50%||4||3 month headcount reset to help Manage:Access||@jlear||@gweaver|
|Plan:Product Planning||3 month headcount reset to help Manage:Access||3 month headcount reset to help Manage:Access||40%||5||3 month headcount reset to help Manage:Access||@johnhope||@cdybenko|
|Create:Source Code (BE)||Improving Error Budget and working Infra-Dev Issues||Improving Reliability of Source Code Features (infradev, improve error budgets, P1/S1 issues, security issues)||100%||4||Epic||@sean_carroll||@sarahwaldner|
|Create:Code Review (BE)||Dev Security burn-down||Burn down Dev security backlogs||100%||5||Overview||@mnohr||@phikai|
|Create:Editor (BE)||Infradev, Linear Queries, Dev Security burn-down||Close all infradev issues, work through linear queries epic, burn down Dev security backlogs||100%||2||Overview||@oregand||@ericschurter|
|Create:Gitaly||Infra-Dev Issues, P1/S1 issues, security issues and Customer Escalations (engineering approved)||Improve reliability of Gitaly||20%||6||Infradev Issues||@timzallmann||@mjwood|
|Ecosystems:Integrations||Dev Security burn-down||Burn down Dev security backlogs||100%||4||Overview||@arturoherrero||@mushakov|
|Verify:Pipeline Execution||Improving Error Budget and working Infra-Dev Issues||Large backlog of Infra-Dev issues||100%||5||CI Category Directionemail@example.com||@jreporter|
|Verify: Pipeline Authoring||floor %||empower every SWE to raise reliability and security issues||10%||3||N/A||Mark Nuzzo||@dhershkovitch|
|Verify:Runner||Improve Runner deployment to GitLab.com Shared Runners. Target reduction in time to deploy of 80%.||Increase operational responsiveness and team efficiency/standardization||30%||6||CD for Runner||@erushton||@DarrenEastman|
|Verify:Testing||Improving Error Budget and working Infra-Dev Issues||Allocation to Verify:Pipeline Execution Infradev Issues||50%||2||Testing Investigation Issue||@shampton||@jheimbuck_gl|
|Package:Package||floor %||empower every SWE to raise reliability and security issues||10%||5||N/A||@dcroft||@trizzi|
|Release:Release||3 milestones manage:Import Headcount Reset||Unlocks a new CEO initiative||17%||1||https://gitlab.com/gitlab-com/Product/-/issues/3062||@nicolewilliams||@cbalane|
|Configure:Configure||3 milestones manage:Import Headcount Reset||Unlocks a new CEO initiative||20%||1||https://gitlab.com/gitlab-com/Product/-/issues/3062||@nicholasklick||@nagyv-gitlab|
|Configure:Configure||Headcount reset to help Ops:Pipeline Execution for milestones 14.3 and 14.4||Assist with Infradev Issues||20%||1||Headcount Reset: Infradev Burndown in Verify and Reliability||@nicholasklick||@nagyv-gitlab|
|Monitor:Monitor||Headcount reset to help Ops:Pipeline Execution for milestones 14.3 and 14.4||Assist with Infradev Issues||40%||2||Headcount Reset: Infradev Burndown in Verify and Reliability||@crystalpoole||@abellucci|
|Verify||Scale GitLab.com to 20M builds a day||Give us 2 years of runway||10%||1||CI Scaling Target||@grzesiek||@jreporter|
|Secure:Static Analysis||floor %||empower every SWE to raise reliability and security issues||10%||5||N/A||@twoodham||@tmccaslin|
|Secure:Dynamic Analysis||floor %||empower every SWE to raise reliability and security issues||10%||5||N/A||@sethgitlab||@derekferguson|
|Secure:Composition Analysis||Proposed 3 month headcount reset to help manage:Import||Proposed 3 month headcount reset to help manage:Import||25%||4||Proposed 3 month headcount reset to help manage:Import||@gonzoyumo||@NicoleSchwartz|
|Secure:Threat Insights||floor %||empower every SWE to raise reliability and security issues||10%||4||N/A||@thiagocsf||@matt_wilson|
|Protect:Container Security||3 months headcount reset to new staging environment||3 months headcount reset to new staging environment||25%||4||3 months headcount reset to new staging environment||@firstname.lastname@example.org|
|Growth:Activation||floor %||empower every SWE to raise reliability and security issues||10%||1||N/A||@pcalder||@jstava|
|Growth:Conversion||floor %||empower every SWE to raise reliability and security issues||10%||2||N/A||@pcalder||@s_awezec|
|Growth:Expansion||3 month headcount reset to help manage: Import||3 month headcount reset to help manage: Import||50%||2||3 month headcount reset to help manage: Import||@pcalder||@gdoud|
|Growth:Adoption||floor %||empower every SWE to raise reliability and security issues||10%||2||N/A||@pcalder||@mkarampalas|
|Growth:Product Intelligence||floor %||empower every SWE to raise reliability and security issues||10%||6||N/A||@nicolasdular||@amandarueda|
|Fulfillment||Improve availability of CustomersDot by migrating from Azure to GCP||Improve availability of CustomersDot due to several recent outages||20% (4 Fulfillment Engineers + 0.67 Infrastructure Engineers)||20||Epic||@jeromezng||@justinfarris|
|Fulfillment||Help Trust and Safety mitigate crypto-abuse by storing non-identifying credit card meta data||Provide Trust and Safety team with the ability to identify which abuse accounts were opened using the same credit card||5%||20||Issue||@jeromezng||@justinfarris|
|Enablement:Distribution||floor %||empower every SWE to raise reliability and security issues||10%||9||N/A||@mendeni||@dorrino|
|Enablement: Geo||3 months headcount reset to new staging environment||3 months headcount reset to new staging environment||13%||7||3 months headcount reset to new staging environment||@nhxnguyen||@nhxnguyen|
|Enablement:Database||Primary Key overflow, Retention Strategy, Schema Validation, migration improvements, testing||Database has been under heavy operational load and needs improvement||100%||5||Automated Migration testing, Automated migrations for primary key conversions, Remove PK overflow, Schema Validation, Reduce Total size of DB||@craig-gomes||TBD|
|Enablement:Sharding||floor %||empower every SWE to raise reliability and security issues||10%||4||N/A||@craig-gomes||@fzimmer|
|Enablement:Memory||Scalability-2 Swarm on Redis||Redis is one of our top scaling bottlenecks||25%||1||Move Rack::Attack store to its own Redis instance||@changzhengliu||@iroussos|
|Enablement:Memory||Improve Redis Scalability||Redis is one of our top scaling bottlenecks||75%||3||Functionally partition Redis||@changzhengliu||@iroussos|
|Enablement:Global Search||Enhance security in GitLab application||Security is the top priority in Prioritizing technical decisions||10%||1||Security related issue||@changzhengliu||@JohnMcGuire|
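The rule that any stage/group not listed in the table has a 0% Engineering Allocation can be sketched as a simple lookup. This is a minimal illustration, not an official tool: the rows are a small sample copied from the table above, `Some:Unlisted` is a hypothetical group name, and summing multiple rows for the same group is an assumption the handbook does not state.

```python
from collections import defaultdict

# A few (group, maximum % of headcount budget) rows copied from the table.
ALLOCATION_ROWS = [
    ("Create:Gitaly", 20),
    ("Enablement:Memory", 25),
    ("Enablement:Memory", 75),
    ("Package:Package", 10),
]


def allocation_by_group(rows):
    """Sum the listed percentages per group (summing is an assumption)."""
    totals = defaultdict(int)
    for group, percent in rows:
        totals[group] += percent
    return dict(totals)


def engineering_allocation(group, rows=ALLOCATION_ROWS):
    """Return the allocation for a group; unlisted groups default to 0%."""
    return allocation_by_group(rows).get(group, 0)


def follows_normal_prioritization(group, rows=ALLOCATION_ROWS):
    """A group with a 0% allocation follows normal prioritization."""
    return engineering_allocation(group, rows) == 0


if __name__ == "__main__":
    for name in ("Create:Gitaly", "Enablement:Memory", "Some:Unlisted"):
        print(name, engineering_allocation(name), follows_normal_prioritization(name))
```

Under these assumptions, a group that appears in several rows (such as Enablement:Memory) is treated as fully allocated once its rows add up to 100%.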
Each allocation has a direction page maintained by the Engineering Manager. The Engineering Manager will provide regular updates to the direction page. Steps to add a direction page are:
index.html.md in the newly created directory
To see an example for an Engineering Allocation Direction page, see Continuous Integration Scaling. Once the Engineering Allocation is complete, delete the direction page.
One of the most frequent questions we get as part of this experiment is "How does a problem get put on the Engineering Allocation list?". The short answer is someone makes a suggestion and we add it. Much like everyone can contribute, we would like the feedback loop for improvement and long-term goals to be robust. So everyone should feel empowered to suggest an item at any time.
To help get items on the list for consideration, we will be performing a survey periodically. The survey will consist of the following questions:
We will keep the list of questions short to solicit the most input. The survey will go out to members of the Development, Quality, and Security departments. After we get the results, we will consider items for potential addition as Engineering Allocations.
Once the item's success criteria are achieved, the Engineering Manager should consult with counterparts to review whether the improvements are sustainable. Where appropriate, we should consider adding monitoring and alerting to any areas of concern that will allow us to make proactive prioritizations in future should the need arise. The Engineering Manager should close all related epics/issues, remove the allocation from the above table, and inform the Product Manager when the allocated capacity will be available to return their focus to product prioritizations.
All engineering allocation removals should be reviewed and approved by the VP of Development.
We will enact a localized feature change lock (FCL) anytime there is an S1 or public-facing (tweeted) S2 incident on GitLab.com (including the License App, CustomersDot, and Versions) determined to be caused by a change from the development department. The team involved should be determined by the author, their line manager, and that manager's other direct reports. Attribution would fall to the VP of Infrastructure (if not obvious). The intent is to create a sense of ownership and accountability amongst our teams, but this should not challenge our no-blame culture.
The FCL will last 5 business days. During this time, the exclusive focus of the team(s) is reliability work, and any other type of in-flight work has to be paused. The team(s) must:
#fcl-incident-[number], with members
After the RCA is completed, the team(s) focus on:
Examples of this work include, but are not limited to:
Any work for the specific team kicked off during this period must be completed, even if it takes longer than the duration of the FCL. Any work directly related to the incident should be kicked off and completed even if the FCL is over. Work paused due to the FCL should be the priority to resume after the FCL is over. Items created for other teams or on a global level don't affect the end of the FCL.
The stable counterpart from Infrastructure will be available to review and consult on the work plan.
This is in effect as of September 2, 2021 (retroactive) for six months. R&D leadership will evaluate the process on March 1, 2022 (already scheduled) and decide whether to continue, modify, or cancel it.
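The FCL trigger and duration described above can be sketched as follows. This is a hedged illustration rather than tooling GitLab uses: the function names are hypothetical, the start date is counted as business day one, and the calendar assumes Monday-to-Friday weeks, a weekday start, and no holidays.

```python
from datetime import date, timedelta

FCL_BUSINESS_DAYS = 5  # from the text: "The FCL will last 5 business days."


def triggers_fcl(severity, public_facing, caused_by_development_change):
    """Return True for an S1, or a public-facing S2, caused by a development change."""
    if not caused_by_development_change:
        return False
    return severity == "S1" or (severity == "S2" and public_facing)


def fcl_last_day(start, business_days=FCL_BUSINESS_DAYS):
    """Return the last business day of the FCL, counting `start` as day one.

    Assumes `start` is a weekday and ignores public holidays.
    """
    day = start
    remaining = business_days - 1
    while remaining > 0:
        day += timedelta(days=1)
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return day


if __name__ == "__main__":
    print(triggers_fcl("S2", public_facing=True, caused_by_development_change=True))
    print(fcl_last_day(date(2021, 9, 2)))
```

Note that work kicked off during the FCL must still be completed even if it runs past this date, so the computed day marks the end of the lock, not the end of the related work.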
Despite the high priority of velocity to our project and our company, there is one set of things we must prioritize over it: GitLab availability & security. Neither we, nor our customers, can run an Enterprise-grade service if we are willing to risk users' productivity and data.
Our hundreds of Engineers collectively make thousands of independent decisions each day that can impact GitLab.com and our users and customers there. They all need to keep availability and security in mind as we endeavor to be the most productive engineering organization in the world. We can only move as fast as GitLab.com is available and secured. Availability of self-managed GitLab instances is also extremely important to our success, and this needs to happen in partnership with our customers' admins (whereas we are the admins for GitLab.com).
For security, we prioritize it more highly by having strict SLAs around priority labels on security issues. This shows a security-first mindset, as these issues take precedence in a given timeframe.
Availability/Reliability, Quality, Security, and Performance are the pillars for building reliable software. Reliability is our contract with our customers that says you can count on us to deliver an available and dependable product. Everyone in the organization has a role to play.
Engineers, Product Managers, and Designers have the most direct influence over the reliability of the code through either planning, implementation, monitoring (e.g. Kibana, Sentry, Grafana and other GitLab.com monitoring tools), or prioritization of the work. Product and Engineering management monitors (e.g. Error Budgets) and measures the reliability of features and makes recommendations if necessary. Our focus on learning and development will also ensure that teams have the tools and training required to build reliable software. The Infrastructure, Application Security, Database and Quality teams are the Subject Matter Experts supporting product development teams.
Our velocity should be incremental in nature. It's derived from our MVC-based approach, which encourages "delivering the smallest possible solution that offers value to our users". This could be a small new feature, but also includes code improvements, bug fixes, etc.
To measure this, we track the Development Department Narrow MR Rate, where the target is also defined. This is a goal for managers and not ICs. Historically, we have seen this rate as high as 11.5.
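To make an MR rate concrete, the sketch below converts a monthly rate into business days per MR. It assumes the rate counts merged MRs per engineer per month and that a month has about 21 business days; neither figure is stated here, and the function names are hypothetical.

```python
BUSINESS_DAYS_PER_MONTH = 21  # assumption: a typical month, ignoring holidays


def business_days_per_mr(mr_rate, business_days=BUSINESS_DAYS_PER_MONTH):
    """Average business days available per MR if every day went to MRs."""
    return business_days / mr_rate


def overhead_days(mr_rate, days_per_mr, business_days=BUSINESS_DAYS_PER_MONTH):
    """Business days left for overhead at a given pace of days per MR."""
    return business_days - mr_rate * days_per_mr


if __name__ == "__main__":
    print(round(business_days_per_mr(11), 2))
    print(overhead_days(11, 1.5))
```

Under these assumptions, a rate of 11 leaves just under two business days per MR, so a pace of one MR every 1.5 business days keeps several days per month free for overhead.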
For example, an MR rate of 11 translates to roughly one MR every 1½ business days with time for overhead. To attain this, Product Development Engineers are encouraged to:
We optimize for shipping a high volume of user/customer value with each release. We do want to ship multiple major features in every monthly release of GitLab. However, we do not strive for predictability over velocity. As such, we eschew heavyweight processes like detailed story point estimation by the whole team in favor of lightweight measurements of throughput like the number of merge requests that were included or rough estimates by single team members.
There is variance in how much time an issue will take versus what you estimated. This variance causes unpredictability. If you want close to 100% predictability you have to take two measures:
Both measures reduce the overall velocity of shipping features. The way to prevent this is to accept that we don't want perfect predictability. Just like with our OKRs, which are so ambitious that we expect to reach about 70% of the goal, this is also fine for shipping planned features.
Note: This does not mean we place zero value on predictability. We just optimize for velocity first.
All team members are expected to follow documented processes. We develop and document processes (for example: Feature flag usage, Code Review Guidelines) through constant iteration and refinement. We find opportunities for improvement through analyzing metrics to identify trends, hosting retrospectives (e.g. Group Retrospectives, Iteration Retrospectives), performing Root Cause Analyses, and receiving feedback from team members. Team members are encouraged to identify opportunities to improve our processes and propose solutions, for example through an MR or an issue describing these opportunities.
Following established processes ensures that we learn from our mistakes and efficiently deliver high-quality, highly performant, and secure software. We prefer to fail fast and learn quickly. Team members who are not software developers benefit from working more efficiently to deliver their results as well. Regardless of your discipline, processes are the guard rails that ensure we produce desirable and predictable results.
Everyone can contribute by proposing new processes and improving upon existing processes.
When changing an outdated part of our code (e.g. HAML views, jQuery modules), use discretion on whether to refactor or not. For long term maintainability, we are very interested in migrating old code to the consistent and preferred approach (e.g. Vue, GraphQL), but we're also interested in continuously shipping features that our users will love.
Aim to implement new modules or features with the preferred approach, but changing preexisting non-conforming parts is a gray area.
If the weight of refactoring and other constraints (such as time) risk threatening the availability of a feature, then strongly consider refactoring at another time. On the other hand, if the code in question has hurt availability or poses a threat to it, then strongly consider prioritizing refactoring. This is a balancing act, and if you're not sure where your change should go (or whether you should do some refactoring beforehand), reach out to another Engineer or Maintainer.
If it makes sense to refactor before implementing a new feature or a change, then please:
If it is decided not to refactor at this moment, then please:
Team members are welcome to run Folding@home on their company provided computers. Folding@home is a distributed computing network that is searching for therapies for the COVID-19 respiratory illness among other diseases. We recommend running it at night if you have high daily compute workloads. Also keep your computer plugged in. We considered potential security and hardware implications in this issue.
If you would like to join a team with other GitLab team members, there is a GitLab Team Members team for Folding@home. When setting up or changing your Folding@home identity, you can add team 245256. This is not a competition, but simply a way to track how much our team members have contributed overall. You can view our statistics on our team page. You can discuss with other GitLab team members in the #folding-at-home Slack channel.
Hiring for GitLab Engineering has picked up again in FY22 after slower growth in FY21. We can use the expertise and bandwidth we've built in past years to raise our bar even higher and to make timely, high-quality hires. We rely primarily on the judgment of our hiring managers to do this while including direct reports in hiring future team members. But we also try to systematize as much as possible so our hiring practices are fair, transparent, and repeatable.
We do not run a single-veto hiring process because this impedes our ability to uplevel our teams. High-performers are more likely to have been the product of a controversial hiring process because they challenge the status quo. But that does not mean every controversial hiring process yields a high performer. An important part of a hiring manager's performance is making these determinations.
Whenever a team member departs from GitLab or they transfer to a different role, the below process should be followed to open a backfill. This process ensures alignment between the Department Heads, Finance business partner and Talent Acquisition. For departures, a backfill can only be opened once a departure or resignation is official where we've received written confirmation of the departure including the last working day and the People Business Partner has submitted the Offboarding Form to the People Ops team. For transfers, a backfill can only be opened once a transfer is official where an offer letter stating the transfer date has been completed.
backfills-r-and-d private Slack channel.
The VP of Engineering and their direct reports track our highest priorities in the Engineering Management Issue Board, rather than in to-do lists, Google Doc action items, or other places. The reasons for this are:
Here are the mechanics of making this work:
Engineering Management label to get it on the board, and the department label to get it in progress (e.g.
CEO Interest label, please post it to #ceo
Here is the standard, company-wide process for OKRs. Engineering has some small deviations from (and extensions to) this process.
Beginning in FY22-Q2 (2021-05-01 through 2021-07-31), the Product and Engineering Divisions are using a third-party vendor, Ally.io, for OKRs.
Ally has provided a feature that allows you to embed OKR views into our Handbook. This is done via a sharing option that produces an embeddable iFrame link. This option is available for any "OKR View." For example, you might embed OKRs from the "Active period" view (current quarter) or the "Previous period" (last quarter), or you might embed granular views of Sub-Team OKRs. The embedded OKRs use a dynamic link that automatically updates each quarter without additional effort.
<iframe src="INSERT_URL_HERE" class="dashboard-embed" height="1500" width="100%" style="border:none;"> </iframe>
1500 is optimal for Engineering's structure (3 OKRs x 3 KRs).
We are actively tracking the following important feature requests to improve our workflow efficiency.
We will use the following guidelines to maintain a clear standard and consistency.
“CultureAmp Survey Action Item”,
“ReverseAMA Action Item”
This process should begin no later than two weeks before the end of the preceding quarter. And kickoff should happen on or before the first day of the new quarter.
* Raise first reply-time SLA for premium from 92% to 95%
This process should begin on the first day of the subsequent quarter, and complete no later than two weeks after.
The Chief Technology Officer and the leaders of each sub-department meet synchronously 2 weeks after each quarter ends to discuss the OKRs from the previous quarter. This is an opportunity to collaborate on cross-functional initiatives, with the focus being the retrospective. Leaders will voice-over the good, bad and try items from the past quarter. The meeting will not cover the status and scores of the OKRs.
Occasionally, it may be useful to set up a demo on a regular cadence to ensure cross-functional iterative alignment. This is helpful for high-impact deliverables that require integration across multiple functional teams. This is in line with the seventh principle of the Agile Manifesto: "Working software is the primary measure of progress".
This process is required to be used by Single-Engineer Groups to maintain transparency and minimal alignment with the rest of GitLab.
For multi-person groups or critical projects, we use a heavier weight grading process:
The demo master grades each step during the demo meeting. To make it less subjective, we use a scale that is widely understood and communicated. Our scoring definitions are as follows:
|5||Signed off from all Stakeholders|
|4||Demo'd according to the "definition of done"|
|3||Demo'd but incomplete and/or has bugs|
|2||Demo'd in some rudimentary state, work has started|
|1||Not yet started or cannot be demo'd|
In GitLab Engineering we are serious about concepts like servant leadership, over-communication, and furthering our company value of transparency. You may have joined GitLab from another organization that did not share the same values or techniques. Perhaps you're accustomed to more corporate politics? You may need to go through a period of "unlearning" to be able to take advantage of our results-focused, people-friendly environment. It takes time to develop trust in a new culture.
Less common, but even more important, is to make certain you don't unintentionally bring any maladaptive behaviors to GitLab from these other environments.
We encourage you to read the engineering section of the handbook as part of your onboarding, ask questions of your peers and managers, and reflect on how you can help us better live our culture:
We always push ourselves to be iterative and make the minimal viable change. The image below provides an example of how we should iterate:
Image Credit: Henrik Kniberg from Crisp
When iterating, our goal is to build something quick and functional. In the example above, we should aim to build something that can transport us from A to B even though it may not have all of the nice-to-have features like an engine, seats, or air conditioning.
The example's upper sequence shows how we should not iterate. The problem with going from a wheel to a wheel base to a frame to a car is that the first few iterations are not functional.
The example's lower sequence shows how we should iterate. Building a skateboard is low complexity and can be assembled in a day while building a car is high complexity and takes thousands of parts and a much longer assembly time. A skateboard is not as fast as a car yet it can still transport a person. Each subsequent iteration provides the person with more speed, more control, and a better aesthetic.
One common misconception about iteration is that there is no waste. Using the example above, the parts of a skateboard can be reused in a scooter; however, they likely cannot be reused in a car. Iteration often requires us to throw away product or code to make way for a better product.
We dogfood everything. Based on our product principles, it is the Engineering division's responsibility to dogfood features or do the required discovery work to provide feedback to Product. It is Product's responsibility to prioritize improvements or rebuild functionality in GitLab.
An easy antipattern to fall into is to resolve your problem outside of what the product offers. Dogfooding is not:
Follow the dogfooding process described in the Product Handbook when considering building a tool outside of GitLab.
GitLab consists of many subprojects. A curated list of GitLab projects can be found at the GitLab Engineering projects page.
When creating a new project, please follow these steps:
gitlab-org/NEW_PROJECT). Doing so creates context and permission inheritance complications. Ensure that the project is under a subgroup of:
main as the name of the default branch.
gitlab-org/gitlab MIT License, but contact legal before using it.
CONTRIBUTING.md in the repository. It is easiest to simply copy-paste the gitlab-org/gitaly DCO + License section verbatim.
Add any further relevant details to the Contribution Guide. See Contribution Example.
CONTRIBUTING.md from the project's
Users can request access setting disabled to discourage granting accidental external access.
When changing the settings in an existing repository, it's important to keep communication in mind. In addition to discussing the change in an issue and announcing it in relevant chat channels (e.g., #development), consider announcing the change during the Company Call. This is particularly important for changes to the GitLab repository.
The following is the default .gitlab-ci.yml config that all projects under the gitlab-com groups should use:
    include:
      - template: 'Workflows/MergeRequest-Pipelines.gitlab-ci.yml'

    # Or if the project needs to support stable/security branches, use the following instead
    workflow:
      rules:
        # For merge requests, create a pipeline.
        - if: '$CI_MERGE_REQUEST_IID'
        # For `master` branch, create a pipeline (this includes on schedules, pushes, merges, etc.).
        - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
        # For tags, create a pipeline.
        - if: '$CI_COMMIT_TAG'
        # For stable, and security branches, create a pipeline.
        - if: '$CI_COMMIT_BRANCH =~ /^[\d-]+-stable(-ee)?$/'
        - if: '$CI_COMMIT_BRANCH =~ /^security\//'

    default:
      tags:
        - gitlab-org
workflow to create pipelines for MR, master, and tags only.
gitlab-org tag to be used by default which corresponds to cost-optimised runners, with no Docker support. Jobs that need Docker support would use the
If a job requires the usage of Docker, it needs to be defined only in the context of the specific job with the
    sast:
      tags:
        - gitlab-org-docker
If a job requires the usage of Windows (not yet supported), it needs to be defined only in the context of the specific job with the
    windows_job:
      tags:
        - gitlab-org-windows
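As an illustration of the branch rules in the workflow configuration above, the following Python sketch checks which branch names would match. It is not part of GitLab CI: the function name `creates_pipeline` and the sample branch names are hypothetical, and Python's `re` module only approximates the regex engine GitLab uses. The merge request and tag rules are not modelled here.

```python
import re

# Patterns copied from the workflow rules. GitLab evaluates them with its own
# regex engine, so Python's `re` is only an approximation for illustration.
STABLE_BRANCH = re.compile(r"^[\d-]+-stable(-ee)?$")
SECURITY_BRANCH = re.compile(r"^security/")


def creates_pipeline(branch: str, default_branch: str = "master") -> bool:
    """Return True if `branch` matches one of the branch-based workflow rules."""
    return (
        branch == default_branch
        or STABLE_BRANCH.search(branch) is not None
        or SECURITY_BRANCH.search(branch) is not None
    )


# Hypothetical branch names, chosen only to exercise each rule.
examples = ["master", "13-9-stable", "13-9-stable-ee", "security/fix-xss", "my-feature"]
results = {name: creates_pipeline(name) for name in examples}
```

Under these assumptions, only `my-feature` falls outside the branch rules; it would still get a pipeline once a merge request is opened for it, because of the `$CI_MERGE_REQUEST_IID` rule.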
When publishing a project to a package repository, please follow these steps:
GitLab consists of many different types of applications and resources.
When you require escalated permissions or privileges to a resource to conduct task(s), or support for creating resource(s) with specific endpoints, please submit an issue to the Access Requests Issue Tracker using the template provided.
Below is a short list of supported technologies:
Before the beginning of each fiscal year, and at various check points throughout the year, we plan the size and shape of the Engineering and Product Management functions together to maintain symmetry.
The process should take place in a single artifact (usually a spreadsheet, current spreadsheet), and follow these steps:
Note: Support is part of the engineering function but is budgeted as 'cost of sales' instead of research and development. Headcount planning is done separately according to a different model.
The non-Support departments within Engineering (Development, Infrastructure, Quality, Security, and UX) have an expense target of 20% of revenue.
The Support target is 10% of revenue.
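To make the two targets concrete, here is a minimal arithmetic sketch. The revenue figure is purely hypothetical and the function name `expense_budgets` is an assumption for illustration; only the 20% and 10% targets come from the text.

```python
# Expense targets from the text, as whole percentages of revenue.
NON_SUPPORT_TARGET_PCT = 20  # Development, Infrastructure, Quality, Security, UX
SUPPORT_TARGET_PCT = 10      # Support, budgeted as cost of sales


def expense_budgets(revenue: float) -> dict:
    """Return the target expense for each budget line given annual revenue."""
    return {
        "non_support": revenue * NON_SUPPORT_TARGET_PCT / 100,
        "support": revenue * SUPPORT_TARGET_PCT / 100,
    }


# Hypothetical revenue of 1,000,000 (currency units), for illustration only.
budgets = expense_budgets(1_000_000)
```

With that hypothetical revenue, the non-Support departments together would target 200,000 in expenses and Support would target 100,000.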
A dedicated team needs certain skills and a minimum size to be successful. But that doesn't block us from taking on new work. This is how we iterate our team size and structure as a feature set grows:
Generally, engineering teams at GitLab are functional: they are made up of Frontend, Backend, or Fullstack individual contributors and an engineering manager with the same functional background. This is intended to provide scalable hiring capabilities, technical credibility, and career development support for all team members. When hiring at scale, these functionally focused teams are better able to hire and onboard people as well as supply them with ongoing, clear support. An alternative team construction that could be considered for some circumstances is the Fullstack Team.
Circumstances would include cases where a team:
The goal should still be to move to a functional construction.
A Fullstack Team has Frontend, Backend, and/or Fullstack engineers led by a Fullstack Engineering Manager. For example, a product category group might have a Frontend and a Backend team. If either of those teams is significantly smaller than the other, and the engineering manager has experience working in both Frontend and Backend, a Fullstack Team could be considered as a measure of efficiency. It's not the intention of this type of team to remove productive team members for the sake of efficiency. In the scenario above, if there were two managers, care would need to be taken to find the other manager a role within GitLab.
Examples of Fullstack Teams:
    ## Vision

    ...

    ## Mission

    ...

    ## Team Members

    The following people are permanent members of the [Blank] Team:

    <table>
    <thead>
    <tr>
    <th>Person</th>
    <th>Role</th>
    </tr>
    </thead>
    <tbody>
    </tbody>
    </table>

    ## Stable Counterparts

    The following members of other functional teams are our stable counterparts:

    <table>
    <thead>
    <tr>
    <th>Person</th>
    <th>Role</th>
    </tr>
    </thead>
    <tbody>
    </tbody>
    </table>

    ## Hiring

    Check out our [jobs page](/jobs/) for current openings.

    ## Common Links

    * Issue Tracker
    * Slack Channel
    * ...

    ## How to work with us

    ...
New teams may benefit from holding a Fast Boot event to help jump-start the team. During a Fast Boot, the entire team gets together in a physical location to bond and work alongside each other.
The pilot for PlatoHQ has a total of 10 Engineering Managers/Senior ICs participating. The program consists of both self-learning via an online portal and 1-1 sessions with a mentor. The goals for the pilot are:
The pilot with 7CTOs is run with 4 senior leaders in Engineering. The program consists of peer mentoring sessions (forums) and effective network building. The goals of the pilot are:
The pilot programs' success will be evaluated on November 30, 2020. After the evaluation there will be a decision whether to continue into FY22 with this program.
The CTO is an executive sponsor for selected customers.
A shadow program is available to everyone in engineering (especially senior leaders) in order to have an opportunity to observe and participate in one of the executive sponsor meetings. Doing so can be a great way to hear directly from customers about what they like about GitLab and about what we can improve. (This program is similar in some ways to the CEO Shadow Program).
If you choose to be a shadow, your responsibilities will be:
To request to be a shadow: Post a message in the #cto Slack channel, indicate your timezone, and CC the CTO's EBA Kristie Thomas.
There is a program to find a mentor or to become a mentor at GitLab described on this handbook page.
You can find more information on this experimental program in this handbook page.
To maintain our rapid cadence of shipping a new release on the 22nd of every month, we must keep the barrier low to getting things done. Since our team is distributed around the world and therefore working at different times, we need to work in parallel and asynchronously as much as possible.
That also means that if you are implementing a new feature, you should feel empowered to work on the entire stack if it is most efficient for you to do so.
Nevertheless, there are features whose implementation requires knowledge that is outside the expertise of the developer or even the group/stage group. For these situations, we'll require the help of an expert in the feature's domain.
In order to figure out how to articulate this help, it is necessary to evaluate first the amount of work the feature will require from the expert.
If the feature only requires the expert's help at an early stage, for example designing and architecting the future solution, the approach will be slightly different. In this case, we would require the help of at least two experts in order to reach consensus about the solution. They should also be informed about the development status before the solution is completed. This way, any discrepancy or architectural issue related to the current solution will be brought up early.
We have specific guidelines to ensure consistency for Engineering automation using approved secure patterns aligned with least privileged access principle.
We need to maintain code quality and standards. It's very important that you are familiar with the Development Guides in general, and the ones that relate to your group in particular:
Please remember that the only way to make code flexible is to make it as simple as possible:
A lot of programmers make the mistake of thinking the way you make code flexible is by predicting as many future uses as possible, but this paradoxically leads to *less* flexible code. - Nearby Cats (@BaseCase), January 16, 2019
The only way to achieve flexibility is to make things as simple and easy to change as you can.
It is important to remember that quality is everyone's responsibility. Everything you merge to master should be production ready. Familiarize yourself with the definition of done.
Our releases page describes our two main release channels:
As the first of these is a monthly release, it's tempting to try to rush to get something into a monthly self-managed release. However, this is an anti-pattern. Most issues don't have strict due dates. Those that do are exceptions, and should be treated as such.
Due date pressure logically leads to a few outcomes:
Only the last two outcomes are acceptable as a general rule. Missing a 'due date' in the form of an assigned milestone is often OK as we put velocity above predictability, and missing the monthly self-managed release does not prevent code from reaching GitLab.com.
For these reasons, and others, we intentionally do not define a specific date for code to be merged in order to reach a self-managed monthly release. The earlier it is merged, the better. This also means that:
If it is essential that a merge request make it into a particular release, this must be communicated well in advance to the engineer and any reviewers, to ensure they're able to make that commitment. If a severe bug needs to be fixed on short notice, it is better to revert the change that introduced it, or even to delay the release until the fix is ready, than to rush.
In general, there is no need to change any behavior close to the self-managed release.
In most cases, a single engineer and maintainer review are adequate to handle a priority::1/severity::1 issue. However, some issues are highly difficult or complicated. Engineers should treat these issues with a high sense of urgency. For a complicated priority::1/severity::1 issue, multiple engineers should be assigned based on the level of complexity. The issue description should include the team member and their responsibilities.
If we have cases where three or five or X people are needed, Engineering Managers should feel the freedom to execute on a plan quickly.
Following this procedure will:
The error budgets process is described on the error budgets page.
Engineering is the primary advocate for the performance, availability, and security of the GitLab project. Product Management prioritizes all initiatives, so everyone in the engineering function should participate in the Product Management prioritization process to ensure that our project stays ahead in these areas. The following list should provide some guidelines around the initiatives that each engineering team should advocate for during their release planning:
Support Team Contributions label. You can filter on open MRs.
Part of our engineering culture is to keep shipping so users and customers see significant new value added to GitLab.com or their self-managed instance. To support rapid development, we pragmatically choose the right technology. Rails page views are our default first choice for basic pages because server-side page loads are better suited for iteration.
As a feature matures, if complex interactive or async UX is needed we choose VueJS as a single page app backed by our API (GraphQL preferred). This is in order to maintain the best qualitative experience and quantitative performance. Also VueJS is the right tool to create dynamic web applications like the Web IDE or when we need to be able to scale to the data (for example file trees that would take 20s to load can already show the first data in 2 seconds due to the asynchronous loading).
Moved to a dedicated page.
GitLab makes use of a 'Canary' stage. Production Canary is a series of servers running GitLab code in a production environment. The Canary stage contains code functional elements like web, container registry and git servers while sharing data elements such as sidekiq, database, and file storage with production. This allows UX code and most application logic code to be consumed by a smaller subset of users under real world scenarios before being made available to all users on GitLab.com.
The production Canary stage is forcibly enabled for all users visiting GitLab Inc. operated groups:
The Infrastructure department teams can globally disable use of production Canary when necessary. Individuals can also opt out of using production Canary environments. However, opting out does not apply to the aforementioned groups.
To opt in/out, go to GitLab Version and move the toggle appropriately.
To verify that Canary is enabled, look in the header for a 'Next' icon next to the GitLab logo, or use the performance bar (typing pb) in GitLab and watch out for the Canary icon next to the web server name.
When using any of the resources listed below, some rules apply:
Every team member has access to a common project on Google Cloud Platform. Please see the secure note with the name "Google Cloud Platform" in the shared vault in 1password for the credentials or further details on how to gain access.
Once in the console, you can spin up VM instances, Kubernetes clusters, etc. Where possible, please prefix the resource name with your name for easy identification (e.g.
Please remove any resources that you are not using, since the company is billed monthly. If you are unable to create a resource due to quota limits, file an issue on the Infrastructure issue tracker.
If you encounter the following error when creating a new GKE cluster, this indicates that we cannot create more clusters within that network. Please ask in #kubernetes for team members to delete unused clusters, or alternatively create your cluster in a different network.
The network "default" does not have available private IP space in 10.0.0.0/8
Every team member has access to the dev-resources project which allows everyone to create and delete machines on demand.
In general, most team members do not have access to AWS accounts. In case you need an AWS resource, file an issue on the Access Requests issue tracker. Please supply the details on what type of access you need.
There are primarily two Slack channels in which developers may be called upon to assist the production team when something appears to be amiss with GitLab.com:
#backend: For backend-related issues (e.g. error 500s, high database load, etc.)
Treat urgent questions or requests from the production team with high priority.
There are some engineering handbook topics that we cannot be publicly transparent about. These topics can be viewed by GitLab team members in the engineering section of the private handbook.
If you experience a page not found (404) error when attempting to access the internal handbook, you may need to register to use it by first browsing to the internal handbook authorization page.