The Plan:Product Planning & Plan:Certify backend team works on the backend part of GitLab's Product Planning and Certify categories in the Plan stage.
For more details about the vision for this area of the product, see the Plan stage page.
Person | Role |
---|---|
John Hope | Backend Engineering Manager, Plan:Product Planning |
Charlie Ablett | Senior Backend Engineer, Plan:Product Planning |
Eugenia Grieff | Backend Engineer, Plan:Product Planning |
Felipe Artur | Senior Backend Engineer, Plan:Product Planning |
Florie Guibert | Senior Frontend Engineer, Plan:Product Planning |
Jan Provaznik | Staff Backend Engineer, Plan:Product Planning |
Kushal Pandya | Senior Frontend Engineer, Plan:Product Planning |
Rajat Jain | Frontend Engineer, Plan:Product Planning |
Person | Role |
---|---|
Alexis Ginsberg | Senior Product Designer, Plan:Product Planning |
Marcin Sędłak-Jakubowski | Technical Writer, Plan |
Costel Maxim | Senior Security Engineer, Application Security, Plan (Project Management, Product Planning, Certify), Create:Source Code |
Melissa Ushakov | Group Manager, Product Management, Plan and Ecosystem |
Check out our jobs page for current openings.
We have a metrics dashboard intended to track against some of the Development Department KPIs, particularly those around merge request creation and acceptance. From that dashboard, the following chart shows MR Rate.
The following chart shows the MR Rate of the Dev section as a whole, for the identification of trends:
We have useful dashboards tracking the performance of parts of the application we're responsible for:
We're tracking a number of issues that we believe could cause scalability problems in the future.
Type | Description | Estimated Timeline for Failure | Resolution Due Date | 12 Month Target | Issue | Status |
---|---|---|---|---|---|---|
Int4 Primary Key Overflow | Primary key overflow in the notes table. | April 2023 | April 2022 | Creation of 1m Notes per Day | | Attention |
Redis Primary CPU | Unexpected load on the Shared State Redis instance caused by SUBSCRIBE, UNSUBSCRIBE and PUBLISH commands. | Unknown | April 2022 | 150k Concurrent WebSocket Connections at peak | | Okay |
Redis Memory | Retention of Action Cable messages in Redis Shared State memory due to high numbers of and/or stalled/hung clients. | Unknown | April 2022 | 150k Concurrent WebSocket Connections at peak | #326364 | Okay |
Various | Scaling a combined 'Work Items' table consisting of all current issues, epics, requirements and test cases. | Unknown | April 2022 | 100k Work Items created per day | | Okay |
Note: Work is ongoing on migration helpers to mitigate Int4 Primary Key Overflows. These will provide a standard way to resolve these issues.
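As a rough illustration of how an estimated timeline for an Int4 overflow could be derived, the sketch below assumes a constant insert rate and a known current maximum id. The function names and the numbers in the usage example are hypothetical and are not actual figures for the notes table; the only fixed fact is that a signed 4-byte integer tops out at 2,147,483,647.

```python
from datetime import date, timedelta

INT4_MAX = 2**31 - 1  # largest value of a signed 4-byte integer: 2,147,483,647


def days_until_overflow(current_max_id: int, rows_per_day: int) -> int:
    """Whole days of inserts left before ids would exceed INT4_MAX.

    Assumes ids are handed out sequentially at a constant daily rate,
    which is a simplification of real traffic.
    """
    if rows_per_day <= 0:
        raise ValueError("rows_per_day must be positive")
    remaining = max(INT4_MAX - current_max_id, 0)
    return remaining // rows_per_day


def estimated_overflow_date(current_max_id: int, rows_per_day: int, today: date) -> date:
    """Project the overflow date forward from `today`."""
    return today + timedelta(days=days_until_overflow(current_max_id, rows_per_day))


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(estimated_overflow_date(1_147_483_647, 1_000_000, date(2022, 1, 1)))
```

Because growth is rarely constant, an estimate like this is best treated as an upper bound on the time available.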
Type | Description | Estimated Timeline for Failure | Resolution Due Date | 12 Month Target | Issue | Status |
---|---|---|---|---|---|---|
Redis Primary CPU | EmailReceiverWorker creates expensive lpush and hset operations on redis-sidekiq primary | September 2021 | September 2021 | 1200 Service Desk issues per day | gitlab-com/gl-infra&469 | Resolved |
See the Plan stage page and the Plan:Project Management backend team page.
We use a lightweight system of issue weighting to help with capacity planning, with the knowledge that things take longer than you think. The main focus is on making sure the overall sum of the weights is reasonable.
It's OK if an issue takes longer than the weight indicates. The weights are intended to be used in aggregate, and what takes one person a day might take another person a week, depending on their level of background knowledge about the issue. That's explicitly OK and expected.
The weights we use are:
Weight | Meaning |
---|---|
1 | Trivial, does not need any testing |
2 | Small, needs some testing but nothing involved |
3 | Medium, will take some time and collaboration |
4 | Substantial, will take significant time and collaboration to finish |
5 | Large, will take a major portion of the milestone to finish |
Anything larger than 5 should be broken down if possible.
We look at recent releases and upcoming availability to determine the weight available for a release.
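The weighting rules above can be sketched as a simple plan check. This is a minimal illustration, not a tool the team uses: the `Issue` class, the `review_plan` function and the `available_weight` input are all hypothetical names. It sums the weights of a proposed plan, compares the total with the weight available, and lists any issue weighted above 5 as a candidate for breaking down.

```python
from dataclasses import dataclass

MAX_WEIGHT = 5  # anything larger should be broken down if possible


@dataclass
class Issue:
    title: str
    weight: int


def review_plan(issues, available_weight):
    """Summarise a proposed plan against the weight available for a release."""
    total = sum(issue.weight for issue in issues)
    oversized = [issue.title for issue in issues if issue.weight > MAX_WEIGHT]
    return {
        "total_weight": total,
        "within_capacity": total <= available_weight,
        "needs_breakdown": oversized,
    }


if __name__ == "__main__":
    # Hypothetical issues and capacity for illustration only.
    plan = [Issue("Add filter", 2), Issue("Rework hierarchy", 8), Issue("Fix export", 3)]
    print(review_plan(plan, available_weight=12))
```

The check deliberately looks only at the aggregate, since individual weights are not expected to predict how long any single issue takes.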
Estimating bugs is inherently difficult. The majority of the effort in fixing bugs is finding the cause, and only then can a bug be accurately estimated. Additionally, velocity is used to measure the amount of new product output, and bug fixes are typically fixes on a feature that has been tracked and had a weight attached to it previously.
Because of this, we do not weigh bugs during ~"workflow::planning breakdown". If an engineer picks up a bug and determines that there will be a significant level of effort in fixing it (for example, a large migration is needed, or we need to switch state management to Vuex on the frontend), we then will want to prioritize it against feature deliverables. Ping the product manager with this information so they can determine when the work should be scheduled.
To assign weights to issues in a future milestone, we ask two team members to take the lead each month. They can still ask questions - of each other, of the rest of the team, of the stable counterparts, or anyone else - but they are the initial owners of the weighting. This should start on the 4th of the month, per the Product Development Timeline.
To weight issues, they should:
The rotation for upcoming releases is:
Release | Weights due | Engineer | Engineer |
---|---|---|---|
14.10 | 2022-03-13 | Eugenia Grieff | Felipe Artur |
15.0 | 2022-04-13 | Eugenia Grieff | Jan Provaznik |
15.1 | 2022-05-13 | Jan Provaznik | Charlie Ablett |
15.2 | 2022-06-13 | Charlie Ablett | Felipe Artur |
15.3 | 2022-07-13 | Felipe Artur | Eugenia Grieff |
15.4 | 2022-08-13 | Eugenia Grieff | Jan Provaznik |
Work that arrives in ~"workflow::ready for development" that is out of scope or ill-defined should be returned to ~"workflow::planning breakdown" for further refinement. To avoid the disruption this introduces we try to reduce the number of times it happens by planning more carefully. While it's not always possible, we aim to identify complexity before the build phase. For this reason, we assign a backend DRI to help with each upcoming deliverable during design and validation phases.
However, sometimes complexity can't be accurately estimated until development work starts. If you anticipate this during planning, consider creating a spike to produce a design document. Notify the participants in the issue, especially the PM, that a spike is required, create a separate issue and follow these steps:
The deliverable is a design document that answers the questions set out in the issue description. This can simply be the issue itself, containing a summary of the discussion in the description, answers to the questions and links to any PoC MRs produced.
Points of weight delivered by the team in previous milestones, including a 3-month rolling average, are available in this chart. This allows for more accurate estimation of what we can deliver in future milestones.
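For readers unfamiliar with the calculation, a 3-month rolling average can be sketched as follows. The function name and the sample figures are hypothetical, and the chart itself may be computed differently (for example, it may omit the first months where a full window is not yet available).

```python
def rolling_average(values, window=3):
    """Trailing average for each position, using up to `window` most recent values.

    The first entries use a shorter window because fewer values exist yet.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


if __name__ == "__main__":
    # Hypothetical weights delivered in four consecutive milestones.
    print(rolling_average([30, 36, 42, 24]))
```

Averaging over several milestones smooths out one-off dips, such as a month with reduced availability, which is why it is a steadier basis for estimation than the most recent milestone alone.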
As a team we often work on features that require close collaboration. We've identified a list of techniques and characteristics that help projects like this proceed at a pace that is sustainable, predictable, and challenging, yet rewarding. An example of such a feature was Epic Linking.
Most issues, especially features, involve working with other disciplines. A single issue will often be shared between frontend and backend and it can be difficult to know which workflow label should be applied, especially when progress is at different stages.
To ensure visibility for other team-members, for issues with a frontend and backend component:
We value velocity over predictability so use your own judgement on whether you should wait for a frontend engineer to get involved before proceeding with development.
The ~"backend complete" label is added to issues with multiple specializations (usually backend and frontend) to indicate that the backend component is complete. Add this label when the backend work is functionally complete, merged and verified but frontend, or other, work is ongoing.
Documentation should accompany code for any new or changed functionality as per our definition of done. This can become tricky when collaborating on a feature that is behind a feature flag.
Since all feature flags start as disabled by default, we should aim to document the feature, using the feature flag template, as soon as it's safe for users to test. Don't wait until a feature is performant and stable to document it; instead, do so once it's secure and won't leave data in a corrupt, interim state.
Try to include docs with the first MR to introduce usable functionality. If this is an API addition with no UI, document that and allow the FE engineers to update it as work proceeds. As the feature flag rollout proceeds, the documentation should be updated.
This avoids the rush to provide documentation that often accompanies the release cutoff.
We also track our backlog of issues, including past due security and infradev issues, and total open SUS-impacting issues and bugs.
MR Type labels help us report what we're working on to industry analysts in a way that's consistent across the engineering department. The dashboard below shows the trend of MR Types over time and a list of merged MRs.
More detail is available on our metrics page.
Product Planning is part of a test of new MR sub-type labels which are designed to make it easier to understand which top-level type should be applied. You can read more about them in the Work Type Classification section of the metrics page.
Note: MR Type may differ from issue type. For example, an MR labelled ~"maintenance::dependency" may support an issue for a new ~"feature::enhancement".
The team Build Board always shows work in the current release, with workflow columns relevant to implementation. Filtering it by ~backend shows issues for backend engineers to work on.
It's OK to not take the top item if you are not confident you can solve it, but please post in #s_plan if that's the case, as this probably means the issue should be better specified.
When an issue comes through that is both ~"severity::1" and ~"priority::1", our SLO requires that it be looked at right away. Other items being worked on should be postponed in favor of any investigations or work for the high severity/priority issue. When postponing an issue, engineers should leave a comment on the issue with a link to the high severity item that is being prioritized instead. Leaving a comment will help with communication with the cross-functional team and for historical tracking. The exception to this is if another ~"severity::1"/~"priority::1" issue is currently being worked on by an engineer. If this is the case, the engineer should make others on the team aware of the new issue on Slack but then keep working on the initial issue.
Everyone at GitLab has the freedom to manage their work as they see fit, because we measure results, not hours. Part of this is the opportunity to work on items that aren't scheduled as part of the regular monthly release. This is mostly a reiteration of items elsewhere in the handbook, and it is here to make those explicit:
When you pick something to work on, please: