| | |
| --- | --- |
| GitLab Team Handle | |
| Team Boards | Team Board & Priority Board |
Engineering Productivity holds office hours on the 3rd Wednesday of even months (e.g. February, April) at 3:00 UTC (20:00 PST), open for anyone to add topics or questions to the agenda. Office hours can be found in the GitLab Team Meetings calendar.
The Engineering Productivity team increases productivity of GitLab team members and contributors by shortening feedback loops and improving workflow efficiency for GitLab projects. The team uses a quantified approach to identify improvements and measure results of changes.
Increase pipeline stability, shorten the feedback loop, reduce pipeline cost, and reduce pipeline duration for GitLab projects, focusing on the projects with the largest reach and leveraging GitLab features where possible.
Collaborate with the Contributor Success team to enable a frequent and positive contribution experience for the Wider GitLab Community.
The Engineering Productivity team focuses on the following workstreams and their associated Epics, each with a workstream-specific vision and objectives.
|~"ep::pipeline"||GitLab Project Pipeline Improvement|
|~"ep::review-apps"||Improve Review Apps reliability & efficiency|
|~"ep::workflow"||Reviewer Roulette Improvements|
The Engineering Productivity team resides in the Quality Department, operating as a team of full-stack engineers led by an Engineering Manager who reports to the Quality Department leader.
| Person | Role |
| --- | --- |
| Jennifer Li | Acting Manager, Senior Backend Engineer, Engineering Productivity |
| Alina Mihaila | Senior Backend Engineer, Engineering Productivity |
| Ash McKenzie | Staff Backend Engineer, Engineering Productivity |
| David Dieulivol | Senior Backend Engineer, Engineering Productivity |
| Jen-Shin Lin | Senior Backend Engineer, Engineering Productivity |
| Nao Hashizume | Backend Engineer, Engineering Productivity |
| Rémy Coutable | Principal Engineer, Quality |
| Greg Alfaro | GDK Project Stable Counterpart, Application Security |
Engineering Productivity has an alternating weekly team meeting schedule so that all team members can collaborate at times that work for them.
Showcases are held every two months and are voted on by the team asynchronously in an issue. Team members add :thumbsup: reactions to the ideas they'd like to hear about.
The Engineering Productivity team uses modified prioritization and planning guidelines for targeting work within a Milestone.
The Engineering Productivity team creates metrics in the following sources to aid in operational reporting.
Exception Ratio: 2 Staff+ Engineers per Engineering Team
Justification: Engineering Productivity has a wide focus to enable efficiency for GitLab code workflows. The team is implementing productivity improvements and globally optimizing for all workflows (GitLab, JiHu and Contributors). Staff+ team members focus on digging deep into feedback loop bottlenecks (Solver) and ensuring that the approach and implementation is scalable (Tech Lead) by working with counterparts in Development and Infrastructure departments.
Future Growth or Anticipated Change: It is expected that Engineering Productivity will maintain the current ratio during FY22 and re-evaluate in FY23.
The Engineering Productivity team will make changes which can create notification spikes or new behavior for GitLab contributors. The team will follow these guidelines in the spirit of GitLab's Internal Communication Guidelines.
Pipeline changes that have the potential to have an impact on the GitLab.com infrastructure should follow the Change Management process.
Pipeline changes that meet the following criteria must follow the Criticality 3 process:
These kinds of changes have led to production issues in the past.
The team will communicate significant pipeline changes to
#development in Slack and the Engineering Week in Review.
Pipeline changes that meet the following criteria will be communicated:
Other pipeline changes will be communicated based on the team's discretion.
Be sure to give a heads-up to the relevant Slack channels (e.g. #ux) and the Engineering Week in Review when an automation is expected to triage more than 50 notifications or to change policies that a large stakeholder group uses (e.g. the team-triage report).
As the owner of the pipeline configuration for the GitLab project, the Engineering Productivity team has adopted several test intelligence strategies aimed at improving pipeline efficiency. These strategies include:
Tests that provide coverage for the code changes in each merge request are the most likely to fail. As a result, merge request pipelines for the GitLab project run only the predictive set of tests by default.
Test mapping is done via the test_file_finder gem. It is worth noting that the gem is supplemented with a static mapping file to account for known gaps, as the automated mapping is not always perfect.
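For illustration, a static mapping pairs source files (or patterns) with the spec files that cover them. The sketch below is a hedged approximation: the field names and pattern syntax mimic a test_file_finder mapping file, and the paths are examples rather than entries from the real GitLab mapping.

```yaml
# Illustrative sketch of a static source-to-test mapping. The schema is an
# approximation of the test_file_finder mapping file format; the paths below
# are examples, not entries from the GitLab project's actual mapping.
mapping:
  # A change to the User model should run its model spec.
  - source: 'app/models/user\.rb'
    test: 'spec/models/user_spec.rb'
  # A change anywhere under lib/gitlab/ci/ should run the mirrored spec path.
  - source: 'lib/gitlab/ci/(.+)\.rb'
    test: 'spec/lib/gitlab/ci/%s_spec.rb'
```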
The "Fail-fast" job we are experimenting with is a variation of this strategy. It is to run all of the RSpec tests that are most likely to fail in a single job created in an early stage of the pipeline. See the Fail-fast job section for details.
There is a fail-fast job in each merge request pipeline that runs all the RSpec tests providing coverage for the code changes, hence the tests most likely to fail. It uses the same test_file_finder gem for test mapping. The job provides faster feedback by running early, and it stops the rest of the pipeline right away if any of its tests fail. Take a look at this YouTube video for details on how GitLab implements the fail-fast job with test_file_finder. Note that the current design only works with low-impact merge requests that map to a small set of tests: if a large number of tests are likely to fail for a merge request, putting them all in a single job is not feasible and could result in a long-running bottleneck, which defeats the purpose of the job.
Premium GitLab customers who wish to incorporate the Fail-Fast job into their Ruby projects can set it up with our Verify/Failfast template.
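Including the template in a project's .gitlab-ci.yml looks roughly like the snippet below; the template filename shown is an assumption based on GitLab's CI template library naming, so verify it against the template list for your GitLab version.

```yaml
# Include GitLab's Fail-Fast template for Ruby projects (Premium feature).
# The exact template name is assumed here; check your instance's CI template
# library before relying on it.
include:
  - template: Verify/FailFast.gitlab-ci.yml
```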
Tests that previously failed in a merge request are likely to fail again, so they provide the most urgent feedback in the next run. To grant these tests the highest priority, the GitLab pipeline prioritizes previously failed tests by re-running them early in a dedicated job, so it will be one of the first jobs to fail if attention is needed.
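A minimal sketch of such a dedicated job is shown below, assuming a hypothetical helper script (scripts/previously-failed-specs) that prints the spec files which failed in the merge request's previous pipeline; the actual GitLab implementation differs, so treat the job and script names as placeholders.

```yaml
# Hedged sketch: re-run previously failed specs in an early, dedicated job.
# `scripts/previously-failed-specs` is a hypothetical helper that prints the
# spec files which failed in this merge request's previous pipeline.
rspec-previously-failed:
  stage: test  # scheduled early so a repeat failure surfaces quickly
  rules:
    - if: '$CI_MERGE_REQUEST_IID'
  script:
    - FAILED_SPECS=$(./scripts/previously-failed-specs)
    - |
      if [ -n "$FAILED_SPECS" ]; then
        bundle exec rspec $FAILED_SPECS
      else
        echo "No previously failed specs to re-run."
      fi
```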
The GitLab pipeline consists of hundreds of jobs, but not all of them are necessary for each merge request. For example, a merge request with only changes to documentation files does not need to run any backend tests, so we can exclude all backend test jobs from the pipeline. See specify-when-jobs-run-with-rules for how to include/exclude CI jobs based on file changes. Most of the pipeline rules for the GitLab project can be found in https://gitlab.com/gitlab-org/gitlab/-/blob/master/.gitlab/ci/rules.gitlab-ci.yml.
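As a simplified example of this mechanism (the job name and path patterns below are illustrative, not copied from rules.gitlab-ci.yml), a backend test job can declare rules:changes so that it is only added to merge request pipelines when backend code actually changed; a docs-only merge request matches none of the patterns and the job is skipped.

```yaml
# Illustrative only: run backend specs when backend code changes.
# A docs-only merge request matches none of these patterns, so the job
# is excluded from its pipeline entirely.
rspec-backend:
  stage: test
  script: bundle exec rspec
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      changes:
        - "app/**/*"
        - "lib/**/*"
        - "spec/**/*"
        - "Gemfile.lock"
```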
Developers can add labels to run jobs in addition to the ones selected by the pipeline rules. Those labels start with
pipeline: and multiple can be applied. A few examples that people commonly use:
See docs for when to use these pipeline labels.
This is a list of Engineering Productivity experiments, where we identify an opportunity, form a hypothesis, and run an experiment to test it.
| Experiment | Status | Hypothesis | Feedback Issue or Findings |
| --- | --- | --- | --- |
| Always run predictive jobs for fork pipelines | Complete | The goal is to reduce the CI minutes consumed by fork pipelines. The "full" jobs only run for canonical pipelines (i.e. pipelines started by a member of the project) once the MR is approved. | |
| Retry failed specs in a new process after the initial run | Complete | Given that a lot of flaky tests are unreliable due to previous tests affecting the global state, retrying only the failing specs in a new RSpec process should result in a better overall success rate. | https://gitlab.com/gitlab-org/quality/team-tasks/-/issues/1148#note_914106156 |
| Experiment with automatically skipping identified flaky tests | Complete | Skipping flaky tests should reduce the number of false broken master incidents. | We found that this doesn't seem to have a negative impact on master stability. |
| Experiment with running previously failed tests early | Complete | | We have not noticed a significant improvement in feedback time, due to other factors impacting our Time to First Failure metric. |
| Store/retrieve tests metadata in/from pages instead of artifacts | In Progress | We're only interested in the latest state of these files, so using Pages makes sense here. Also, this would simplify the logic to retrieve the reports and reduce the load on GitLab.com's infrastructure. | There are some transient problems where a Cloudflare page is returned instead of the expected JSON file. |
| Reduce pipeline cost by reducing number of rspec tests before MR approval | In Progress | Reduce the CI cost for GitLab pipelines by running the most applicable rspec tests for changes prior to approval. | Improvements needed to identify and resolve selective test gaps, as this impacted pipeline stability. |
| Enabling developers to run failed specs locally | In Progress | Enabling developers to run failed specs locally will lead to fewer pipelines per merge request and improved productivity from being able to fix regressions more quickly. | https://gitlab.com/gitlab-org/gitlab/-/issues/327660 |
| Use dynamic analysis to streamline test execution | Complete | Dynamic analysis can reduce the number of specs that are needed for MR pipelines without causing significant disruption to master stability. | A miss rate of 10% would cause a large impact to master stability. Look to leverage dynamic mapping with local developer tooling. Added documentation from the experiment. |
| Using timezone for Reviewer Roulette suggestions | Complete - Reverted | Using timezone in Reviewer Roulette suggestions will lead to a reduction in the mean time to merge. | Reviewer burden was inconsistently applied and specific reviewers were getting too many reviews compared to others. More details in the experiment issue and feedback issue. |