Cells

This is the handbook page for the Cells project. Cells is one of the top priorities for FY2025, with the goal of providing additional scalability for GitLab.com. This handbook page contains the project information such as the project plan, roadmap, workstreams, DRIs, stakeholders, and communication channels. It also has links to important documentation such as the Cells design blueprints.

Intro

Cells is a new architecture for our software as a service platform. This architecture is horizontally scalable, resilient, and provides a more consistent user experience. It may also provide additional features in the future, such as data residency control (regions) and federated features.

For more information about the goals of Cells, see goals.

Requirements and Architecture

See the Cells overall architecture blueprint.

Roadmap, Workstreams, and DRIs

Roadmap

Cells 1.0

  • For internal customers only
  • Organizations are private
  • Users cannot interact with other Organizations (including GitLab Org)
  • Groups and projects are private in the Organization
  • For more details, see Organizations on Cells 1.0

Cells 1.5

  • For existing/new customers of GitLab.com
  • Organizations are private
  • Existing users can interact with private Organizations on Secondary Cells
  • Groups and projects are private in the Organization
  • For more details, see Organizations on Cells 1.5

Cells 2.0

  • Organizations are public or private
  • Users can interact with other Organizations
  • Groups and projects are private or public in the Organization
  • For more details, see Organizations on Cells 2.0
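
The visibility rules per roadmap iteration can be written down as data. The following is only an illustrative sketch: the names `Iteration`, `ROADMAP`, and `allows_public_organizations` are hypothetical and are not part of GitLab or of the Cells blueprints. It encodes only the visibility options listed above, not the user-interaction rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Iteration:
    """Visibility options for one roadmap iteration (hypothetical model)."""
    organization_visibility: frozenset
    group_project_visibility: frozenset


# Values copied from the roadmap bullets; the structure itself is an assumption.
ROADMAP = {
    "1.0": Iteration(frozenset({"private"}), frozenset({"private"})),
    "1.5": Iteration(frozenset({"private"}), frozenset({"private"})),
    "2.0": Iteration(frozenset({"private", "public"}),
                     frozenset({"private", "public"})),
}


def allows_public_organizations(version: str) -> bool:
    """Return True if the given iteration permits public Organizations."""
    return "public" in ROADMAP[version].organization_visibility
```

With this model, `allows_public_organizations("2.0")` is true, while the same call for "1.0" or "1.5" is false.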

DRIs and Stakeholders

Sabrina Farmer (Executive Sponsor)

Marin Jankovski (Senior Director of Engineering)

Ethan Guo (Director, Infra Technical Program Management)
  1. Develop project plan and drive schedule
  2. Inter-team connection, collaboration and communication
  3. Project management

Chun Du (Director of Engineering)
  1. Liaison between project team and cross-functional engineering leaders
  2. Coordinating temporary staffing arrangements within the Data Stores stage

Nick Nguyen (Senior Engineering Manager)
  1. Coordinating staffing and unblocking groups in Data Stores
  2. Drive cross-functional efforts in engineering
  3. Report on Data Stores progress and mitigate risks

Arturo Herrero (Tenant Scale Engineering Manager)
  1. Status updates of Tenant Scale workstreams
  2. Mitigate risks
  3. Collaborate with Tenant Scale Product Manager on Organizations and Cells projects

Joshua Lambert (Director of Product Management)
  1. Investment and staffing of Core Platform teams
  2. Liaison between project team and cross-functional product managers and product leaders
  3. Escalation of product priorities competing with Cells
  4. Decision maker for supported and unsupported features for each iteration of Cells

Christina Lohr (Tenant Scale Product Manager)
  1. Product definition, requirements, roadmap for Organization workstream within Tenant Scale
  2. Product definition, requirements, roadmap for Cells workstreams within Tenant Scale
  3. Point of contact to collaborate with product managers from other teams
  4. Investment and staffing of Tenant Scale

Workstreams

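  • Application’s Cell readiness
    • Engineering DRI: Kamil Trzciński
    • PM DRI: Josh Lambert
    • TPM DRI: Ethan Guo
  • Organization for Cells
    • Engineering DRI: Alex Pooley
    • PM DRI: Christina Lohr
    • TPM DRI: Ethan Guo
  • Architecture
    • Engineering DRI: Kamil Trzciński
    • PM DRI: Josh Lambert
    • TPM DRI: Ethan Guo
  • Cells Services (includes Router and Topology services)
    • Engineering DRI: Thong Kuah
    • PM DRI: Christina Lohr
    • TPM DRI: Ethan Guo
  • Cell lifecycle automation and management
    • Engineering DRI: Steve Xuereb
    • PM DRI: Christina Lohr
    • TPM DRI: Ethan Guo
  • Observability
    • Engineering DRI: Rachel Nienaber
    • PM DRI: Christina Lohr
    • TPM DRI: Ethan Guo
  • Application Deployment
    • Engineering DRI: Dave Smith
    • PM DRI: Sam Wiskow
    • TPM DRI: Ethan Guo
  • Production readiness
    • Engineering DRI: Chun Du
    • PM DRI: Josh Lambert
    • TPM DRI: Ethan Guo
  • Operations
    • Engineering DRI: Rick Mar
    • PM DRI: Josh Lambert
    • TPM DRI: Ethan Guo
  • Performance validation of Cells
    • Engineering DRI: Andy Hohenner
    • PM DRI: Christina Lohr
    • TPM DRI: Ethan Guo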

Program Planning and Tracking

All Cells 1.0 work is tracked under the Cells 1.0 Epic. We also have a planning spreadsheet that provides a high-level program structure and timelines (for planning purposes only).

Cells 1.0 Milestones

  1. First Production Cell - Experiment
    • label: cells-1.0-milestone::Experiment
    • Production system with no customer data. We have an environment that covers the testing needs of the Test Platform and Development teams.
    • Entering criteria: A Cell is brought up so that the Development and Infra teams have an environment to test their changes, and the Test Platform team has a place to run different kinds of tests, including E2E and automation tests.
    • Exit criteria: All the application feature gaps are filled, a Cell is provisioned using the Cells lifecycle automation tools, and we run our existing E2E tests on Cells as part of our deployment pipeline.
  2. First Production Cell - Beta
    • label: cells-1.0-milestone::Beta
    • We have a production instance on which an internal or external customer can run functional and performance tests
    • Entering criteria: Exit criteria of the Experiment milestone
    • Exit criteria: Customer-discovered issues are addressed and we meet our GA requirements
  3. First Production Cell - General Availability
    • label: cells-1.0-milestone::GA
    • We have a production instance that is ready for production use by internal or external customers
    • Entering criteria: Exit criteria of the Beta milestone
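
The milestone chain above, where each milestone is entered once the previous one has met its exit criteria, can be sketched as follows. This is an illustration only: `MILESTONES`, `milestone_label`, and `can_enter` are hypothetical names, and the Experiment milestone's own entering criteria (a Cell being brought up) are not modeled.

```python
# Milestones in order, as listed above.
MILESTONES = ("Experiment", "Beta", "GA")


def milestone_label(milestone: str) -> str:
    """Build the scoped label used for a milestone, e.g. cells-1.0-milestone::Beta."""
    if milestone not in MILESTONES:
        raise ValueError(f"unknown milestone: {milestone}")
    return f"cells-1.0-milestone::{milestone}"


def can_enter(milestone: str, exited: set) -> bool:
    """Return True if the preceding milestone has met its exit criteria.

    `exited` is the set of milestones whose exit criteria are already met.
    Experiment has no predecessor, so this sketch always allows it
    (an assumption; its real entering criteria are described in prose above).
    """
    index = MILESTONES.index(milestone)
    if index == 0:
        return True
    return MILESTONES[index - 1] in exited
```

For example, `can_enter("Beta", set())` is false until "Experiment" is added to the set of exited milestones.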

Cells 1.0 Timeline

  • 2024-11-30: Start of Beta
  • 2025-01-31: GA

Cells 1.0 Development Phases

Unless stated otherwise, the listed phases are applied first to Staging and then, at a later stage, to Production. We use the cells-1.0-milestone::Phase x labels to categorize issues by phase.

  1. Phase 1: Deploy router as a pass-through proxy for GitLab.com
  2. Phase 2: Deploy router as a pass-through proxy for registry.GitLab.com
    • Registry behind the WAF
    • Pass-through proxy to Cell 1
  3. Phase 3: Routing via classification
    • Topology Service with classification, deployed with Runway
    • mTLS between the router and topology service
    • Works with GDK and Cell 2 (QA) to unblock development/testing of certain workflows.
  4. Phase 4: Complete Cells Services
    1. Phase 4a: Add Claim Service
    2. Phase 4b: Enable Claim Service on Cell 1
    3. Phase 4c: Backfill of Claims
  5. Phase 5: Register existing GitLab.com as a Cell with Topology Service
    1. Phase 5a: Legacy infrastructure becomes a cell
    2. Phase 5b: Database Sequencing Service - Sequence claiming is enabled on Cell 1 (legacy GitLab.com)
  6. Phase 6: Cell 2 Ready (QA cell, no external customers)
    1. Phase 6a: Application Readiness
      • Basic functionality across Cells such as sign-up, project creation, running pipelines.
      • Enable organizations FF on Cell 2
      • Hook up Fulfillment/License
    2. Phase 6b: Continuous Deployment to Cell 2 (QA cell, no external customers)
      • Dedicated on GCP pre-GA
      • Able to run QA E2E tests across cells
      • Hook up data replication to Snowplow/Tableau
    3. Limitations
      • No automation
      • No internal and external customers
  7. Phase 7: Reconfigure GitLab Shell to use Topology Service
  8. Phase 8: Production readiness
  9. Phase 9: Cell 3
    • Internal customers only
  10. Phase 10: Create an organization for a GitLab internal customer, for example Finance
    • Enable organization FF on Cell 3
    • Move the internal customer to Cell 3 with Direct Transfer

Work Estimation

We use t-shirt sizing to estimate the time and effort needed to deliver issues/epics. Sizes are not meant to be viewed as precise estimations or timeline commitments. Rather, these sizes help us identify risk areas and opportunities for cutting scope. Sizes map to the following definitions:

Size Time
Tiny 1-2 weeks
Small 1 month
Medium 3 months
Large 6 months
XXL > 6 months
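
As a rough illustration of how the table could be applied, the sketch below maps an estimate in months to a size. It assumes each listed duration is an upper bound for its size and that "1-2 weeks" is about half a month; both are assumptions of this example, and `tshirt_size` is a hypothetical helper rather than an existing tool.

```python
# (size, assumed upper bound in months), in ascending order.
SIZE_UPPER_BOUND_MONTHS = [
    ("Tiny", 0.5),   # 1-2 weeks, approximated as half a month
    ("Small", 1),
    ("Medium", 3),
    ("Large", 6),
]


def tshirt_size(estimated_months: float) -> str:
    """Return the smallest size whose assumed upper bound covers the estimate."""
    if estimated_months <= 0:
        raise ValueError("estimate must be positive")
    for size, upper_bound in SIZE_UPPER_BOUND_MONTHS:
        if estimated_months <= upper_bound:
            return size
    return "XXL"  # anything longer than 6 months
```

As the text above notes, a size produced this way is a signal for risk and scope discussions, not a timeline commitment.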

Communication

Slack Channels

Meetings

Status updates

Additional Information

Cells Fast Boot 2024

We held a Cells Fast Boot in Dublin, Ireland, between 2024-04-23 and 2024-04-24. Below are the artifacts from the event.

Agenda, Slides, and Videos

Please use the Unfiltered Google account to watch video recordings.

  1. Main agenda (internal only)
  2. Introductions, overview, and logistics: Agenda (internal only)
  3. Cells Services - Global Service: Agenda (internal only), Slides (internal only), Video (internal only)
  4. Cells Services - Routing: Agenda (internal only), Slides (internal only), Video (internal only)
  5. Application Readiness - Organizations and Users: Agenda (internal only)
  6. Application Readiness - Dependencies and OKR alignments: Agenda (internal only)
  7. Deployment: Agenda (internal only), Slides (internal only), Video (internal only)
  8. Provisioning: Agenda (internal only)
  9. Observability and Runners: Agenda (internal only)
  10. Security: Agenda (internal only), Slides (internal only), Video (internal only)
  11. Disaster Recovery: Agenda (internal only), Slides (internal only), Video (internal only)
  12. Cells Mover and Isolation: Agenda (internal only)
  13. Scalability Headroom and Timeline: Agenda (internal only)

Decisions

  1. No external customers on Cells 1.0, internal dogfooding only. Cells 1.x is the target to onboard new or existing external customers.

Artifacts

  1. Day 1 recording: Part 1 (internal only), Part 2 (internal only)
  2. Day 2 recording (internal only)
  3. Database breakout recording (internal only)
  4. Organizations breakout recording (internal only)

Test Platform in Cells

Cells is a project that spans the entirety of GitLab. Instead of recreating feature testing done by the other teams, we will reuse and leverage what exists currently and supplement to fill in gaps. This approach has the following requirements:

  • It must feed back useful information to the engineering teams in an efficient, non-burdensome way
  • It must provide good coverage so we have confidence to release
  • It must be easy to add/enhance/change tests
  • It works with our current process

Strategy

The testing strategy for Cells follows our practice of testing at the correct level.