CI Queue Time Stabilization Working Group

Attributes

| Property | Value |
|---|---|
| Date Created | November 1, 2019 |
| Date Ended | TBD |
| Slack | #wg_ci_queue_stability (only accessible from within the company) |
| Google Doc | CI Queue Stability Working Group (only accessible from within the company) |
| Issue Label | wg_CIQueueStability (gitlab-com/-org) |

Business Goal

Increase the stability and predictability of the CI job queue times on GitLab.com.

The intent is to:

  1. Analyze and remediate situations where our CI job queue times for shared runners exceed reasonable expectations
  2. Define metrics and tune alerting that more precisely correspond to the expectations of the CI job queues
  3. Develop troubleshooting and investigation guides to use in cases of excessive CI job queue times
  4. Perform predictive analysis on system health and growth and create issues to remediate anticipated future bottlenecks

Exit Criteria

  1. Creation and tuning of metrics and alerts that trigger when system behavior no longer matches expectations
  2. One week of running with the tuned alerts from item 1 without any of them firing
  3. Published or updated runbook documentation on how to diagnose and respond to abnormal behavior and restore normal operation
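
As a rough illustration of the first two exit criteria, the sketch below shows one way a queue-time alert and the one-week quiet period could be checked. It is not the working group's actual alerting setup: the 95th-percentile metric, the 60-second threshold, and all function names are assumptions made for this example, since this page does not define what a reasonable queue time is.

```python
from datetime import datetime, timedelta
from statistics import quantiles

# Hypothetical threshold: this page does not state an expected queue time.
QUEUE_TIME_THRESHOLD_SECONDS = 60.0


def p95(durations):
    """Return the 95th percentile of job queue durations, in seconds."""
    # quantiles(..., n=20) returns 19 cut points; the last is the 95th percentile.
    # Requires at least two data points.
    return quantiles(durations, n=20)[-1]


def alert_fires(durations, threshold=QUEUE_TIME_THRESHOLD_SECONDS):
    """True when the 95th-percentile queue time exceeds the threshold."""
    return p95(durations) > threshold


def quiet_for_one_week(alert_times, now):
    """True when no alert fired during the seven days ending at `now`."""
    week_ago = now - timedelta(days=7)
    return not any(week_ago <= t <= now for t in alert_times)


if __name__ == "__main__":
    sample = [5.0, 8.0, 12.0, 7.0, 90.0, 6.0, 9.0, 11.0, 10.0, 4.0]
    print("alert fires:", alert_fires(sample))
    print("quiet week:", quiet_for_one_week([], datetime(2019, 12, 1)))
```

A percentile is used here rather than a mean so that a small number of very slow jobs is visible without a single outlier dominating the signal. Whether that matches the expectations the group settles on is an open question.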

Roles and Responsibilities

| Working Group Role | Person | Title |
|---|---|---|
| Facilitator | Elliot Rushton | Engineering Manager, Runner |
| Exec Sponsor | Christopher Lefelhocz | Senior Director of Development |
| Engineering Lead | Tomasz Maczukin | Backend Engineer |
| Infrastructure Lead | Andrew Newdigate | Distinguished Engineer, Infrastructure |
| Member | Darby Frey | Director of Engineering, CI/CD |
| Member | Steve Azzopardi | Backend Engineer |
| Member | Darren Eastman | Senior Product Manager, Gitlab-Runner |
| Member | Kamil Trzciński | Distinguished Engineer |