PURPOSE: This page is focused on the operations of GitLab's internal Data Science Team. For information about GitLab's Product Data Science Capabilities, please visit GitLab ModelOps
The mission of the Data Science Team is to enable the business to make better decisions, faster, using predictive analytics.
At GitLab we are Handbook First and promote this concept by ensuring the data science team page remains updated with the most accurate information regarding data science objectives, processes, and projects. We also strive to keep the handbook updated with useful resources and our data science toolset.
Of the Data Team's Responsibilities, the Data Science Team is directly responsible for:
Additionally, the Data Science Team supports the following responsibilities:
As a Center of Excellence, the data science team is focused on working collaboratively with other teams in the organization. This means our stakeholders and executive sponsors are usually in other parts of the business (e.g. Sales, Marketing). Working closely with these other teams, we craft a project plan that aligns with their business needs, objectives, and priorities. This usually involves working closely with functional analysts within those teams to understand the data, the insights from prior analyses, and implementation hurdles.
The Data Science flywheel is focused on improving business efficiency and KPIs by creating accurate and reliable predictions. This is done in collaboration with the Functional Analytics Center of Excellence to ensure the most relevant data sources are utilized, business objectives are met, and results can be quantifiably measured. As business needs change, and as the user base grows, this flywheel approach will allow the data science team to quickly adapt, iterate, and improve machine learning models.
| Name | Maturity | Description | Stakeholder | Updates | Resources |
| --- | --- | --- | --- | --- | --- |
| Propensity to Expand (PtE) | Optimized | Determine which paid accounts are likely to expand their ARR by > 10% in the next 3 months; identify uptier opportunities | Sales | Last update: FY23-Q3; Next update: FY23-Q4 | Exec Summary, PtE Inspector, PtE Results Dashboard |
| Propensity to Contract (PtC) | Optimized | Determine which paid accounts are likely to reduce their ARR by > 10% or leave GitLab permanently in the next 6 months | Customer Success | Last update: FY23-Q2; Next update: FY23-Q4 | PtC slide deck, PtC Inspector, PtC Results Dashboard |
| Namespace Segmentation | Viable | v1: Define groups for paid and free SaaS namespaces based on their product usage and impact on conversions; v2: Defined groups for paid SaaS only | Growth | Last update: FY23-Q3 | Namespace Segmentation slide deck, Namespace Segmentation Dashboard, Namespace Segmentation v2 slide deck, Namespace Segmentation v2 Dashboard |
| Propensity to Purchase* (PtP) | In Progress | Identify which free and trial accounts are likely to become paid accounts | Growth | Last update: FY23-Q2; Next update: FY23-Q3 | Tracking Epic, SaaS Trials Model Readout, SaaS Trials Results Dashboard |
| Adoption Index | In Progress | Define a way to measure adoption and the customer journey | TBD | Next update: FY23-Q4 | |
| Product Usage Event | Planned | - | TBD | | |
| Prospect/Lead Scoring | Planned | Identify leads most likely to convert to closed-won opportunities | Marketing | FY23-Q4 / FY24-Q1 | |
| Golden Journey | Planned | Identify optimal paths to increasing platform usage and adoption | Growth | TBD | |
| Expansion Predicted ARR | Planned | Predict expansion ARR dollar amount | Sales | TBD | |
| Stage Adoption MRI | Planned | - | TBD | | |
| Community Sentiment Analysis | Unplanned | - | Product | | |
| Feature $ARR Uplift Prediction | Unplanned | Attribute incremental ARR lift based on feature adoption | Product | | |
| GitLab MLOps Product Development | Unplanned | - | Product | | |
*Propensity to Purchase is currently implemented for: SaaS trials. Propensity to Purchase is currently in the process of being created for: SaaS free accounts, self-managed trials, and self-managed free accounts.
For implementation details and where to find model predictions/scores, please see the Propensity Models Internal Handbook Page.
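As a toy illustration of what these propensity scores are (not the production models, whose features and weights are documented in the internal handbook), a propensity score is simply a predicted probability that an account takes an action, e.g. from a logistic model over account features. All feature names and weights below are hypothetical:

```python
import math

# Hypothetical feature weights -- illustrative only, not the real PtP model.
WEIGHTS = {"active_users_30d": 0.04, "stages_adopted": 0.30, "trial_days_left": -0.02}
BIAS = -2.0

def propensity_score(features: dict) -> float:
    """Return P(action) for one account via a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

account = {"active_users_30d": 25, "stages_adopted": 4, "trial_days_left": 10}
print(round(propensity_score(account), 3))  # prints 0.5
```

Accounts are then ranked by this score so that, for example, Sales can prioritize the highest-propensity expansion candidates.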
Maturity of data science projects is similar to the GitLab product maturity model:
| Name | Current Sources | Additional Planned Sources |
| --- | --- | --- |
| PtE | Product usage: SaaS & Self-Managed, paid tiers; Product stage usage: SaaS & Self-Managed, paid tiers; Salesforce (account, opportunities, events, tasks); Zuora (billing); Bizible (marketing); Firmographics; ZenDesk (help tickets) | Prior expansion type (product change, seat licenses), amount, and time lapse; buyer personas attached to opportunities |
| PtC | Product usage: SaaS & Self-Managed, paid tiers; Product stage usage: SaaS & Self-Managed, paid tiers; Salesforce (account, opportunities, events, tasks); Zuora (billing); Bizible (marketing); ZenDesk (help tickets); Firmographics | # of answered emails, ratio of sent/answered emails; account health fields; security score |
| Namespace Segmentation | Product usage: SaaS & Self-Managed, free and paid tiers; Product stage usage: SaaS & Self-Managed, free and paid tiers; Salesforce (account); Zuora (billing); Bizible (marketing) | # of consecutive days of product/stage usage |
| PtP | Product usage: SaaS only, free tiers; Product stage usage and adoption: SaaS only, free tiers; Registration; Namespace metadata; User-level | Self-managed product usage data; buyer personas |
The Data Science Team follows the Cross-Industry Standard Process for Data Mining (CRISP-DM), which consists of six iterative phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.
The Data Science Team's approach to model development is centered around GitLab's value of iteration and the CRISP-DM standard. Our process expands on some of the six phases outlined in CRISP-DM in order to best address the needs of our specific business objectives and data infrastructure.
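The iterative character of CRISP-DM can be sketched as a loop in which the evaluation phase decides whether to deploy or to return to business understanding. This is a schematic of the process, not actual team tooling:

```python
# Schematic of the six CRISP-DM phases as one iteration loop.
# The phase names are standard CRISP-DM; the control flow is illustrative.
PHASES = [
    "business_understanding",
    "data_understanding",
    "data_preparation",
    "modeling",
    "evaluation",
    "deployment",
]

def run_crisp_dm_iteration(evaluation_passed) -> list:
    """Run phases in order; the evaluation phase gates deployment."""
    executed = []
    for phase in PHASES:
        executed.append(phase)
        if phase == "evaluation" and not evaluation_passed():
            # Model not good enough yet: stop before deployment and
            # start the next iteration from business understanding.
            break
    return executed

print(run_crisp_dm_iteration(lambda: True))
```

When evaluation fails, the loop ends before `deployment`, mirroring how a model that misses its business objective goes back for another iteration rather than shipping.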
Our current platform consists of:
Over time we plan to dogfood as many components of the GitLab MLOps Stage as possible, leading to fully automated, productionized pipelines. However, the MLOps Stage is currently incubating and is not yet ready for our use. Our immediate next step is to automate the Current State Data Flows using a combination of Python and Airflow.
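As a sketch of what that automation could look like: the flow is a dependency graph of extract, feature-building, scoring, and publishing steps. The task names below are hypothetical, and the standard-library `graphlib.TopologicalSorter` stands in for Airflow's DAG scheduling to show the dependency-ordered execution:

```python
from graphlib import TopologicalSorter

# Hypothetical task graph for a scoring flow: task -> its upstream deps.
# In production each key would become an Airflow task, with ordering
# enforced by the DAG rather than a manual topological sort.
TASKS = {
    "extract_product_usage": set(),
    "extract_salesforce": set(),
    "build_features": {"extract_product_usage", "extract_salesforce"},
    "score_accounts": {"build_features"},
    "publish_scores": {"score_accounts"},
}

order = list(TopologicalSorter(TASKS).static_order())
print(order)
```

Both extract tasks can run in parallel; `build_features` waits for both, and scores are published only after scoring completes.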
The gitlabds python package can be installed from PyPI (`pip install gitlabds`), or used as part of the above JupyterLab image.