We provide confidence in software by increasing visibility into code change impacts on software in GitLab.
The Verify:Pipeline Security Group provides visibility into the results of code changes on software applications. We aim to make it easy for teams to integrate layers of testing and verification into their GitLab CI workflow, including:
See all current and planned category maturity in the Maturity page.
We want software teams to feel confident that the changes they introduce into their code are safe and conformant.
We measure the value we contribute by using a Product Performance Indicator. Our current PI for the Pipeline Security group is the GMAU (internal handbook). This is a rolling count of unique users who have triggered a pipeline that uploads a test or coverage report. This is not currently instrumented and we are tracking progress of instrumentation in gitlab&4528.
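As a rough illustration of what a rolling count of unique users looks like, here is a minimal Python sketch. The event shape, the function name `rolling_unique_users`, and the 28-day window are assumptions made for the example; the actual instrumentation is not defined yet and may differ.

```python
from datetime import date, timedelta


def rolling_unique_users(events, as_of, window_days=28):
    """Count distinct users with a qualifying pipeline in the trailing window.

    events: iterable of (user_id, day) pairs, one per pipeline that uploaded
    a test or coverage report (hypothetical shape).
    window_days: assumed window length; the handbook does not specify one.
    """
    window_start = as_of - timedelta(days=window_days - 1)
    return len({user for user, day in events if window_start <= day <= as_of})


events = [
    ("alice", date(2021, 3, 1)),
    ("alice", date(2021, 3, 20)),  # repeat user, counted once
    ("bob", date(2021, 3, 15)),
    ("carol", date(2021, 1, 10)),  # falls outside a 28-day window
]
gmau = rolling_unique_users(events, as_of=date(2021, 3, 28))
```

Using a set means a user who triggers many qualifying pipelines in the window still counts once.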
This funnel represents the customer journey and the various means a product manager may apply a Performance Indicator metric to drive a desired behavior in the funnel. This framework can be applied to any of the categories being worked on by Verify:Pipeline Security. The current priority is to increase Activation within the Code Testing and Coverage Category.
As part of the FY22 SaaS First investment theme and our commitment to building a reliable service, we use error budgets to track availability and to help guide how much we invest in reliability from milestone to milestone. More information is shared on the Ops Section Performance Indicators (internal handbook).
The following people are permanent members of the Verify:Pipeline Security group:
To find our stable counterparts look at the Pipeline Security product category listing.
You can view and contribute to our current list of JTBD and job statements here.
Like most GitLab backend teams we spend a lot of time working in Rails on the main GitLab app. Familiarity with Docker and Kubernetes is also useful on our team.
Our active feature flags can be found on the feature flag dashboard. Make sure to apply the team_group testing filter.
We also track our backlog of issues, including past due security and infradev issues, and total open System Usability Scale (SUS) impacting issues and bugs.
MR Type labels help us report what we're working on to industry analysts in a way that's consistent across the engineering department. The dashboard below shows the trend of MR Types over time and a list of merged MRs.
Flaky tests are problematic for many reasons.
Slow tests are impacting the GitLab pipeline duration.
Our team does not use Quad-Planning because we do not have an assigned Software Engineer in Test. Instead we use a "tripod" approach where PM, EM, and UX work together to plan at least three milestones' worth of work at a high level. During planning, the PM sets the general direction for the team and works with the EM to ensure the team's plans are ambitious, yet achievable based on the team's capacity. This provides checks and balances during the planning discussions to minimize carryover and ensure engineering priorities are addressed along with feature priorities in the roadmap. UX will work with PM to ensure research and design issues are ready in time for implementation. Planning is mostly asynchronous via the release planning issue (see below), with additional discussion during weekly 1:1 sessions and team meetings as needed. This tripod approach allows us to address dependencies early on and ensure Design and Engineering are aligned on our product roadmap.
We use a release planning issue to plan our release-level priorities over each milestone. This issue is used to highlight deliverables, capacity, team member holidays, and more. This allows team members and managers to see at a high level what we plan to accomplish for each release, and serves as a central location for collecting information. It is important to note that this planning issue is static once the milestone starts and will not be updated to reflect changes to the milestone (e.g. a higher priority item is added to the milestone post-start and a planned issue is removed). This allows us to compare our baseline plan to the final release as part of the retrospective issue.
Before issues can be moved from the workflow::planning breakdown status into the workflow::ready for development status, they must have a weight greater than 0 applied to the issue.
The Product Manager (PM) and/or Engineering Manager (EM) will tag the appropriate team member to provide input for issues that need to be weighted in advance of the target milestone.
We would like to give engineers one full milestone to refine and weight the issues.
An issue weight is determined based on the complexity in the following criteria:
Based on these criteria, an issue can have one of the following weights:
Weight | Description |
---|---|
1: Trivial | Issues that are very well understood. The exact solution is already known and straightforward. The scope of change is very isolated. Examples are documentation updates, simple regressions, and bugs or technical debt that can be fixed with a few lines of code. |
2: Small | Issues that are well understood and have an outlined solution. No surprises are expected. No coordination with other teams or people is required. Examples are simple features, like a new API endpoint to expose existing data or functionality, or regular bugs or performance issues. |
3: Medium | Issues that are well understood and have an outlined solution. These issues require external team involvement or coordination to be released, such as a feature flag. Examples are regular features, potentially with a backend and frontend component, or most bugs or performance issues. |
5: Large | Issues that are known to be complex. A solution has been outlined. There are many major edge cases that need to be catered for. Surprises are expected. Extensive coordination with other teams is required. A careful release process needs to be considered. These issues may have potential for adverse performance impact or catastrophic failures. They may also involve more than one component (backend, frontend, gitaly, workhorse, runner, etc) or change the interaction between the components. Examples are issues that change API contracts requiring backward compatibility, or that require multiple feature flags to be safely released. They could also be issues where the team does not have any existing expertise or knowledge, or that require changes in components that the team does not usually work on. |
8: Unknown | An issue that is weight 8 will not be scheduled and instead should be investigated further in order to be broken down into smaller issues. Examples are bugs that are not well understood or easily replicated, and bugs or features that do not have a suggested solution. |
If the weight of an issue cannot be determined within a day, create a separate investigation issue for an in-depth investigation.
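The weighting rules can be summarized in a small Python sketch. It only illustrates the rules described in this section; the function name `refinement_outcome`, the outcome strings, and the choice to reject weights outside the table are assumptions, not part of any GitLab tooling.

```python
# Weights and names taken from the table in this section.
ALLOWED_WEIGHTS = {1: "Trivial", 2: "Small", 3: "Medium", 5: "Large", 8: "Unknown"}


def refinement_outcome(weight):
    """Return what happens to an issue in planning breakdown, given its weight."""
    if weight is None or weight <= 0:
        # A weight greater than 0 is required to leave planning breakdown.
        return "stays in planning breakdown: needs a weight"
    if weight not in ALLOWED_WEIGHTS:
        # Assumption: only the weights listed in the table are valid.
        raise ValueError(f"unsupported weight: {weight}")
    if weight == 8:
        # Weight 8 issues are not scheduled; they are investigated and split.
        return "not scheduled: investigate and break down"
    return f"can move forward as {ALLOWED_WEIGHTS[weight]}"


outcomes = [refinement_outcome(w) for w in (None, 2, 8)]
```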
When issues require a design proposal, we follow the Product Development Flow. Design and development should work together from the start to ensure the issue follows our MVC guidelines, while still providing value and a usable experience.
To maintain a SSOT, the same issue should be used for design and development. This creates less duplicated work for both teams. Product designers should use the UX Definition of Done template to clearly state where the issue stands in the product development flow. An example of this in practice is https://gitlab.com/gitlab-org/gitlab/-/issues/33418/.
Once the design is complete, and appropriate workflow labels are applied, design, quality, and development (including FE, BE, and EM) should work together to break down the issue further for implementation, if necessary.
In the process of refinement, we may discover that a new feature will require a blueprint, or the team may feel that input from maintainers will help scope down the problem, ensure the feature is performant, and/or reduce future technical debt. When this happens, the team will create a Technical Investigation issue for the investigation. This issue will be assigned to one team member. That team member should spend the minimum amount of time needed to create documentation, a PoC, or some other artifact that clarifies the approach to the problem, ideally in less than 5 working days. This will help us to gather information, validate the solution with others, and propose a plan to execute. They will answer specific questions outlined in the Technical Investigation issue before work on the feature is started. This process is analogous to the concept of a Spike.
When possible, the assigned team member is encouraged to schedule synchronous time with another developer to pair on the investigation and publishing of the results (Example Technical Investigation issue gitlab#336617). By default Technical Investigation issues are weighted at a 2 and we timebox them to 3 business days from start to presentation of data. Team members may change this weight and/or time frame at their discretion.
We limit the number of technical investigation issues assigned in each milestone to two so that we can maintain overall velocity and MR Rate.
If an issue has several components (e.g. ~frontend, ~backend, or ~documentation) we should split it up into separate implementation issues. When these issues are created, the issues should be titled Frontend: [Issue title] or Backend: [Issue title], and marked as blocked by in case one is blocking the other. The original issue should hold all the discussion around the feature, with the implementation issues being used to track the work done. By splitting issues, there are several benefits:
Error Budgets for stage groups have been established in order to help groups identify and prioritize issues that are impacting customers and infrastructure performance.
The Pipeline Security group Error Budget dashboard is used to identify issues that are contributing to the Pipeline Security group's error budget spend.
The engineering manager will review the error budget dashboard weekly to determine whether we're exceeding our budget, determine what (if anything) is contributing to our error budget spend, and create issues addressing root cause for product manager prioritization. Issues created to address error budget spend should be created using appropriate labels and prioritized according to our technical debt process. Other issues may be created as a result of the Ops Section SaaS Reviews and also prioritized using the technical debt process.
We follow the company guidance in how we prioritize technical debt and UX debt. In order to manage this effectively we have decided to keep track of technical and UX debt and feature maintenance issues that are ready to be worked on in the "~workflow::scheduling" column, prioritized by Product, but informed by the impact these issues will have on our future velocity. We will try to dedicate a certain percentage (~30%) of our capacity by weight in each milestone to paying down technical and UX debt. We do not consider bugs to be debt, and they are prioritized as part of the remaining capacity by weight.
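To make the capacity split concrete, here is a minimal Python sketch of reserving roughly 30% of a milestone's capacity by weight for debt. The function name `plan_debt`, the sample issues, rounding the budget down, and skipping an issue that does not fit in favor of the next one are all assumptions for illustration; in practice the selection is a prioritization decision made by Product.

```python
def plan_debt(capacity_weight, debt_issues, debt_percent=30):
    """Pick debt issues, in priority order, that fit the debt budget.

    capacity_weight: total weight the team expects to deliver in the milestone.
    debt_issues: list of (title, weight) already ordered by priority.
    Returns (budget, selected titles, weight left for everything else).
    """
    budget = capacity_weight * debt_percent // 100  # rounded down (assumption)
    selected, used = [], 0
    for title, weight in debt_issues:
        if used + weight <= budget:
            selected.append(title)
            used += weight
    return budget, selected, capacity_weight - used


# Hypothetical milestone with a capacity of 40 weight points.
budget, selected, remaining = plan_debt(
    40,
    [
        ("Refactor report parser", 5),
        ("Remove legacy flag", 3),
        ("Rewrite artifact upload", 5),  # would exceed the budget, skipped
        ("Fix slow spec", 2),
    ],
)
```

Bugs are not counted as debt here; they would be planned out of the remaining weight.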
Engineering managers apply the Deliverable label on issues that meet the following criteria:
Before the team will accept an issue into a milestone for work it must meet these criteria:
Issues that depend on another issue to be completed before they can be validated on Canary are considered blocked, and should have the ~workflow::blocked label applied. The issues should also be marked as blocked within the related issues section of the issue.
Issues that are follow-ups to other issues, and that are worked on in the same milestone in which they are created, have a ~follow-up label added to them. Examples of items the team may create and label as ~follow-up include, but are not limited to, feature scope-creep, non-blocking requests from code review, additional UI polish, non-blocking refactoring that can be done, Low Priority (P2 or lower) bug fixes, etc.
During each milestone, we create a Release Post Checklist issue that is used by team members to help track the progress of the team release posts. We use the checklist issue to associate release post merge requests with links to the implementation issue, links to updated documentation, and list which engineers are working on each issue. The checklist issue provides a single place to see all this information.
Unless specifically mentioned below, the Verify:Pipeline Security group follows the standard engineering, product, and UX workflows.
Verify:Pipeline Security team members are encouraged to look for work from right to left on our workflow board. This is also known as "Pulling from the right". If there is an issue that a team member can help along on the board, they should do so instead of starting new work. This includes conducting code review on issues that the team member may not be assigned to if they feel that they can add value and help move the issue along the board.
Specifically this means, in order:

1. workflow::in review
1. workflow::in development column
1. workflow::ready for development column OR an item the team member investigated to apply the estimated weight if unfamiliar with the top item.
1. workflow::planning breakdown column so that we can move them to the workflow::scheduling column

The goal with this process is to reduce WIP. Reducing WIP forces us to "Start less, finish more", and it also reduces cycle time. Engineers should keep in mind that the DRI for a merge request is the author(s). Just because we are putting emphasis on the importance of teamwork does not mean we should dilute the fact that having a DRI is encouraged by our values.
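The "pull from the right" order can be sketched in a few lines of Python. The board is modeled as a plain dictionary from column label to issue titles, which is an assumption for the example; `next_issue` is a hypothetical helper, not an existing tool.

```python
# Columns from right to left, in the order work should be picked up.
PULL_ORDER = [
    "workflow::in review",
    "workflow::in development",
    "workflow::ready for development",
    "workflow::planning breakdown",
]


def next_issue(board):
    """Return (column, issue) for the right-most column that has work.

    board: dict mapping a column label to its issues in board order.
    Returns None when every column is empty.
    """
    for column in PULL_ORDER:
        issues = board.get(column, [])
        if issues:
            return column, issues[0]
    return None


# Hypothetical board state.
board = {
    "workflow::planning breakdown": ["Refine coverage parsing"],
    "workflow::ready for development": ["Add report widget", "Fix tooltip"],
    "workflow::in development": [],
    "workflow::in review": [],
}
pick = next_issue(board)
```

With nothing in review or in development, the sketch falls through to the top item in the ready for development column.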
We use a series of labels to indicate the highest priority issues in the milestone.

1. Deliverable issues. The issues will be stack ranked on the workflow::ready for development column in top-to-bottom priority order.
1. If there is no Deliverable issue that a team member can "pull from the right", the team member can take the next stack ranked item in the workflow::ready for development column.

Issues in "Planning Breakdown" and "Ready for Development" are in top-to-bottom priority order on the planning board. Issues further to the right on the issue board are not in vertical priority order. Rather, the further to the right an issue is on the board, the higher its priority, which follows our "Pull from the right" philosophy of working.
After issues are refined and weighted in the workflow::planning breakdown column, they are moved to the workflow::scheduling column. The issues in the workflow::scheduling column are assessed at least weekly in the PM/EM sync meeting and moved to their stack ranked positions in the workflow::ready for development column.
If a team member believes a specific issue should be considered a Deliverable or a higher priority, they are encouraged to ping the product and engineering managers on the issue, where we can discuss and decide. Note that issues need to be refined with a weight for them to be considered for the Deliverable label.
Code reviews follow the standard process of using the reviewer roulette to choose a reviewer and a maintainer. The roulette is optional, so if a merge request contains changes that someone outside our group may not fully understand in depth, it is encouraged that a member of the Verify:Pipeline Security team be chosen for the preliminary review to focus on correctly solving the problem. The intent is to leave this choice to the discretion of the engineer but raise the idea that fellow Verify:Pipeline Security team members will sometimes be best able to understand the implications of the features we are implementing. The maintainer review will then be more focused on quality and code standards.
We also recommend that team members take some time to review each other's merge requests even if they are not assigned to do so, as described in the GitLab code review process. It is not necessary to assign anyone except the initial domain reviewer to your Merge Request. This process augmentation is intended to encourage team members to review Merge Requests that they are not assigned to. As a new team, reviewing each other's merge requests allows us to build familiarity with our product area, helps reduce the amount of investigation that needs to be done when implementing features and fixes, and increases our lottery factor. The more review we can do ourselves, the less work the maintainer will have to do to get the merge request into good shape.
This tactic also creates an environment to ask for early review on a WIP merge request where the solution might be better refined through collaboration and also allows us to share knowledge across the team.
As part of the code review process, when an MR makes a user-facing change (no matter how small), it should be reviewed by a Product Designer. When in doubt, if the MR is related to an issue with the ~UX label, involve a Product Designer. Include screenshots or screen recordings of the changes in the MR description whenever possible to make the review process easier for everyone.
For iterations on features behind feature flags, even when the changes won't be user-facing right away, involve a Product Designer. If they feel that it would be more efficient to do so, the Product Designer may choose to defer their review of the feature-flagged feature until it is closer to being complete. This option gives them the flexibility to prioritize their workload as they see fit and avoid some of the noise generated by the MR review process.
On Track within the week.
On Track status.
On Track today.
On Track status in your message.

When an engineer is actively working on an issue (workflow of ~workflow::"In dev" or further right on the current milestone), they will periodically leave status updates as top-level comments in the issue. The status comment should include the updated health status, any blockers, notes on what was done, whether review has started, and anything else the engineer feels is beneficial. If multiple people are working on the issue, also include whether this is a frontend or backend update. The comment should include an update for each MR associated with the issue. Engineers should also update the health status of the issue at this time.
This update need not adhere to a particular format. Some ideas for formats:
Health status: (On track|Needs attention|At risk)
Notes: (Share what needs to be shared, especially when the issue needs attention or is at risk)
Health status: (On track|Needs attention|At risk)
What's left to be done:
What's blocking: (probably empty when on track)
## Update <date>
Health status: (On track|Needs attention|At risk)
What's left to be done:
#### MRs
1. !MyMR1
1. !MyMR2
1. !MyMR3
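For anyone who prefers to generate the last format rather than type it, here is a minimal Python sketch that renders it as a comment body. The function name `status_update` and its parameters are hypothetical; only the field labels and health status values come from the formats above.

```python
# Health status values used in the formats above.
HEALTH_STATUSES = ("On track", "Needs attention", "At risk")


def status_update(day, health, remaining, merge_requests):
    """Render a status comment in the '## Update <date>' format."""
    if health not in HEALTH_STATUSES:
        raise ValueError(f"unknown health status: {health}")
    lines = [
        f"## Update {day}",
        f"Health status: {health}",
        f"What's left to be done: {remaining}",
        "#### MRs",
    ]
    # Every item uses "1." so Markdown numbers the list automatically.
    lines += [f"1. {mr}" for mr in merge_requests]
    return "\n".join(lines)


comment = status_update(
    "2021-03-15", "On track", "Maintainer review", ["!MyMR1", "!MyMR2"]
)
```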
There are several benefits to this approach:
Some notes/suggestions:
In addition to the steps documented for developing with feature flags at GitLab, Verify:Pipeline Security engineers monitor their changes' impact on infrastructure using dashboards and logs where possible. Because feature flags give engineers complete control over their code in production, they also enable engineers to take ownership of monitoring the impact their changes have on production infrastructure. To monitor our changes in production, we use this helpful selection of dashboards, specifically the Rails controller dashboard (Internal Only). Metrics we evaluate include latency, throughput, CPU usage, memory usage, and database calls, depending on what our change's expected impact will be and any considerations called out in the issue.
The goal of this process is to keep the window during which a change could negatively impact production infrastructure as small as possible. A side benefit of this process is that engineers become more familiar with our monitoring tools and develop more experience with predicting the outcomes of changes as they relate to infrastructure metrics.
After an engineer has ensured that the Definition of Done is met for an issue, they are the ones responsible for closing it. The engineer responsible for verifying an issue is done is the engineer who is the DRI for that issue, or the DRI for the final merge request that completes the work on an issue.
We want to create a welcoming environment for everyone who is interested in contributing to GitLab in any way, be it code, documentation, UX, or otherwise.
If an issue is being resolved in a merge request that is made by a community contributor, the engineering manager will assign the associated issue to themselves to keep better track of it. The engineering manager will help to coach the community contributor through the merge request process, or be responsible for delegating that coaching to a team member engineer.
The Pipeline Security group supports the product marketing categories described below:
Label | | | | |
---|---|---|---|---|
Category:Secrets Management | Issues | MRs | Direction | Documentation |
Category:Build Artifacts | Issues | MRs | Direction | Documentation |
Label | | | Description |
---|---|---|---|
CI variables | Issues | MRs | Relates to functionality surrounding pre-defined and user-defined variables available in the Build environment. Formerly ~ci variables |
CI job token | Issues | MRs | Relates to functionality surrounding CI_JOB_TOKEN available in the Build environment. |
secrets storage | Issues | MRs | Relates to functionality surrounding the usage of secrets managers, including integration with secrets storage providers, in the Build environment. |
external authentication | Issues | MRs | Relates to functionality surrounding tokens for external authentication available in the Build environment. |
This synchronous meeting is to discuss anything that is blocking, or notable from the past week. This meeting acts as a team touchpoint. We have two sessions of this meeting each week - one for APAC/EMEA timezones and another for AMER timezones. We record each meeting so that everyone can benefit from the discussions asynchronously.
We use geekbot integrated with Slack for our daily async standup. The purpose of the daily standup meeting is to keep the team informed about what everyone is working on, and to surface blockers so we can eliminate them. The standup bot will run at 10am in the team member's local time and ask 2 questions:
We use a GitLab issue in this project for our monthly retrospective. The issue is created automatically towards the end of the current milestone. The purpose of the monthly retrospective issue is to reflect on the milestone and talk about what went well, what didn't go so well, and what we can do better. Instead of waiting until the end of the milestone to add items to the retrospective issue, we encourage team members to add comments throughout the month. We have a Slack reminder on our #g_pipeline-security channel to remind us to add items to the issue each Friday.
We have a monthly synchronous 30-minute think big meeting, followed the next week by a monthly 30-minute think small meeting on the same topic as the previous think big meeting. This pair of meetings is modeled after the GitLab Product Manager deep dive interview. The purpose of these meetings is to discuss the vision, product roadmap, user research, design, and delivery around the Pipeline Security features. The goal is to align the team on our medium to long-term goals and ensure that our short-term goals are leading us in that direction. These meetings are useful for aligning the team with its stable counterparts and ensuring that engineers understand the big picture and know how their work fits into the long-term goals of the team.
Issues worked on by the Pipeline Security group have a group label of ~"group::pipeline security". Issues that contribute to the Verify stage of the DevOps toolchain have the ~"devops::verify" label.
Refer to the Developer Onboarding in Verify section.