Cells
Intro
Cells is a new architecture for our software-as-a-service (SaaS) platform. This architecture is horizontally scalable and resilient, and it provides a more consistent user experience. It may also provide additional features in the future, such as data residency control (regions) and federated features.
For more information about the goals of Cells, see goals.
Requirements and Architecture
Cells overall architecture blueprint.
Roadmap, Workstreams, and DRIs
Roadmap
DRIs and Stakeholders
Role | Responsibility |
---|---|
Executive Sponsor | |
Senior Director of Engineering | |
Director, Infra Technical Program Management | |
Director of Engineering | |
Senior Engineering Manager | |
Tenant Scale Engineering Manager | |
Director of Product Management | |
Tenant Scale Product Manager | |
Workstreams
Workstream | Engineering DRI | PM DRI | TPM DRI |
---|---|---|---|
Application’s Cell readiness | | | |
Organization for Cells | | | |
Architecture | | | |
Cells Services (includes Router and Topology services) | | | |
Cell lifecycle automation and management | | | |
Observability | | | |
Application Deployment | | | |
Production readiness | | | |
Operations | | | |
Performance validation of Cells | | | |
Program Planning and Tracking
All Cells 1.0 work is tracked under the Cells 1.0 Epic. We also have a planning spreadsheet that provides a high-level program structure and timelines (for planning purposes only).
Reference links
- Cells 1.0 Epic
- Cells 1.0 Status Dashboard
- Sharding Key migration work for Cells
- Database schema migration
Cells 1.0 Milestones
- First Production Cell - Experiment
  - label: cells-1.0-milestone::Experiment
  - A production system with no customer data. The environment covers the testing needs of the Test Platform and Development teams.
  - Entry criteria: A Cell is brought up so that Development and Infra teams have an environment to test their changes, and the Test Platform team has a place to run different kinds of tests, including E2E and automation tests.
  - Exit criteria: All application feature gaps are filled, a Cell is provisioned using the Cells lifecycle automation tools, and our existing E2E tests run on Cells as part of our deployment pipeline.
- First Production Cell - Beta
  - label: cells-1.0-milestone::Beta
  - A production instance on which an internal or external customer can run functional and performance tests.
  - Entry criteria: The exit criteria of the Experiment milestone are met.
  - Exit criteria: Customer-discovered issues are addressed and our GA requirements are met.
- First Production Cell - General Availability
  - label: cells-1.0-milestone::GA
  - A production instance that is ready for production use by internal or external customers.
  - Entry criteria: The exit criteria of the Beta milestone are met.
Cells 1.0 Timeline
- 2024-11-30: Start of Beta
- 2025-01-31: GA
Cells 1.0 Development Phases
Unless stated otherwise, each phase is applied to Staging first and to Production at a later stage. We use the cells-1.0-milestone::Phase x labels to categorize issues by phase.
- Phase 1: Deploy router as a pass-through proxy for GitLab.com
- Phase 2: Deploy router as a pass-through proxy for registry.GitLab.com
  - Registry behind the WAF
  - Pass-through proxy to Cell 1
- Phase 3: Routing via classification
  - Topology Service, with classification, deployed with Runway
  - mTLS between the router and the Topology Service
  - Works with GDK and Cell 2 (QA) to unblock development/testing of certain workflows
- Phase 4: Complete Cells Services
  - Phase 4a: Add Claim Service
  - Phase 4b: Enable Claim Service on Cell 1
  - Phase 4c: Backfill of Claims
- Phase 5: Register existing GitLab.com as a Cell with Topology Service
  - Phase 5a: Legacy infrastructure becomes a Cell
  - Phase 5b: Database Sequencing Service - Sequence claiming is enabled on Cell 1 (legacy GitLab.com)
- Phase 6: Cell 2 Ready (QA cell, no external customers)
  - Phase 6a: Application Readiness
    - Basic functionality across Cells, such as sign-up, project creation, and running pipelines
    - Enable organizations FF on Cell 2
    - Hook up Fulfillment/License
  - Phase 6b: Continuous Deployment to Cell 2 (QA cell, no external customers)
    - Dedicated on GCP pre-GA
    - Able to run QA E2E tests across Cells
    - Hook up data replication to Snowplow/Tableau
    - Limitations
      - No automation
      - No internal or external customers
- Phase 7: Reconfigure GitLab Shell to use Topology Service
- Phase 8: Production readiness
- Phase 9: Cell 3
  - Internal customers only
- Phase 10: Create an organization for a GitLab internal customer, for example Finance
  - Enable organization FF on Cell 3
  - Move the internal customer to Cell 3 with Direct Transfer
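The milestone and phase labels mentioned above use the scoped-label form (scope::value). As a rough illustration of how issues could be grouped by these labels, here is a minimal Python sketch. The function names and sample issue titles are hypothetical, and the exact text that replaces "x" in the phase label (for example "Phase 3") is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Label scope taken from this page; everything after "::" is the value.
SCOPE = "cells-1.0-milestone"


def milestone_value(label: str) -> Optional[str]:
    """Return the value of a cells-1.0-milestone scoped label, or None."""
    scope, sep, value = label.rpartition("::")
    if sep and scope == SCOPE and value:
        return value
    return None


def group_by_milestone(
    issues: Iterable[Tuple[str, List[str]]]
) -> Dict[str, List[str]]:
    """Group issue titles by the value of their cells-1.0-milestone label."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for title, labels in issues:
        for label in labels:
            value = milestone_value(label)
            if value is not None:
                groups[value].append(title)
    return dict(groups)


# Hypothetical sample data, for illustration only.
sample_issues = [
    ("Example issue A", ["cells-1.0-milestone::Phase 3", "bug"]),
    ("Example issue B", ["cells-1.0-milestone::Phase 3"]),
    ("Example issue C", ["cells-1.0-milestone::Beta"]),
    ("Example issue D", ["documentation"]),
]
grouped = group_by_milestone(sample_issues)
```

Issues without a cells-1.0-milestone label, such as the last sample entry, are left out of the grouping.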
Work Estimation
We use t-shirt sizing to estimate the time and effort needed to deliver issues/epics. Sizes are not meant to be viewed as precise estimations or timeline commitments. Rather, these sizes help us identify risk areas and opportunities for cutting scope. Sizes map to the following definitions:
Size | Time |
---|---|
Tiny | 1-2 weeks |
Small | 1 month |
Medium | 3 months |
Large | 6 months |
XXL | > 6 months |
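To illustrate how the sizes can help surface risk areas, the table can be treated as an ordered scale. The Python sketch below is only an example: the function name, the sample item names, and the choice of "Large" as the review threshold are all assumptions, not part of our process.

```python
# Sizes in ascending order, with the durations from the table above.
SIZES = [
    ("Tiny", "1-2 weeks"),
    ("Small", "1 month"),
    ("Medium", "3 months"),
    ("Large", "6 months"),
    ("XXL", "> 6 months"),
]

# Position of each size on the scale (0 is the smallest).
RANK = {name: index for index, (name, _) in enumerate(SIZES)}


def scope_review_candidates(estimates, threshold="Large"):
    """Return item names sized at or above `threshold`, sorted by name.

    The default threshold is an assumption made for this example.
    """
    return sorted(
        item for item, size in estimates.items()
        if RANK[size] >= RANK[threshold]
    )


# Hypothetical estimates, for illustration only.
example_estimates = {
    "example-epic-a": "Small",
    "example-epic-b": "Large",
    "example-epic-c": "XXL",
}
candidates = scope_review_candidates(example_estimates)
```

Items returned by the function are the ones worth a closer look for risk or for scope that could be cut.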
Communication
Slack Channels
- #f_cells_and_organizations (internal only): Regular communication
- #cto (internal only): Weekly program status update
Meetings
- Fortnightly Cells X-Functional Sync (Meeting notes (internal only))
- Monthly Infrastructure Cells program review (Meeting notes (internal only))
- Quarterly Cells program review (TBD)
Status updates
- Weekly “Cells & Organizations Status Update - [yyyy-mm-dd]” issues in this project
- Weekly status updates in the Slack #cto channel (internal only)
Additional Information
Cells Fast Boot 2024
We held a Cells Fast Boot in Dublin, Ireland, between 2024-04-23 and 2024-04-24. Below are the artifacts from the event.
Agenda, Slides, and Videos
Please use the Unfiltered Google account to watch video recordings.
- Main agenda (internal only)
- Introductions, overview, and logistics: Agenda (internal only)
- Cells Services - Global Service: Agenda (internal only), Slides (internal only), Video (internal only)
- Cells Services - Routing: Agenda (internal only), Slides (internal only), Video (internal only)
- Application Readiness - Organizations and Users: Agenda (internal only)
- Application Readiness - Dependencies and OKR alignments: Agenda (internal only)
- Deployment: Agenda (internal only), Slides (internal only), Video (internal only)
- Provisioning: Agenda (internal only)
- Observability and Runners: Agenda (internal only)
- Security: Agenda (internal only), Slides (internal only), Video (internal only)
- Disaster Recovery: Agenda (internal only), Slides (internal only), Video (internal only)
- Cells Mover and Isolation: Agenda (internal only)
- Scalability Headroom and Timeline: Agenda (internal only)
Decisions
- No external customers on Cells 1.0, internal dogfooding only. Cells 1.x is the target to onboard new or existing external customers.
Artifacts
- Day 1 recording: Part 1 (internal only), Part 2 (internal only)
- Day 2 recording (internal only)
- Database breakout recording (internal only)
- Organizations breakout recording (internal only)