This page contains forward-looking content and may not accurately reflect current-state or planned feature sets or capabilities.
How did we get here? The Data Development Timeline page provides coverage of the Data Team's accomplishments and the path we have taken to create today's team, technology platform, and programs.
In November 2021 we held several cross-team sessions to align on the GitLab data strategy for FY23. Participants included Growth, Finance, Marketing, Sales Strategy, and Customer Success.
As an important step towards achieving our mission, meeting our responsibilities, and helping GitLab become a successful public company, we are creating an Enterprise Data Platform (EDP), a single unified data and analytics stack, along with a broad suite of Data Programs such as Self-Serve Data and Data Quality. The EDP will power GitLab's KPIs, cross-functional reporting and analysis, and, in general, allow all team members to make better decisions with trusted data. Over time, the EDP will further accelerate GitLab's analytics capabilities with features such as data publishing and data products: enriched and aggregated data integrated into business systems or into the GitLab product for use by our customers. This acceleration happens through the development of "Data Flywheels", much like GitLab's Open Core and Development Spend flywheels.
The Customer & Product Intelligence Flywheel is focused on improving the Customer Experience and encompasses the data and analytics involved in user-product interactions, customer use cases, product development, product adoption, and most aspects of the Customer Journey.
The Corporate Intelligence Flywheel is focused on improving (internal) Business Efficiency and this is accomplished by instrumenting, monitoring, and improving business workflows. Common outputs of Corporate Intelligence teams include performance dashboards, balanced scorecards, KPIs, MBOs, and related data-enabled frameworks.
Measured in years, our long-term direction is to extend the EDP with features found in a mature Enterprise Data Platform, such as master data management, a data lake, and advanced analytics. Additionally, once we have reached Level 2, we:
We will measure progress towards our short-term direction in the following ways:
We have not yet defined criteria for measuring long-term progress.
| Cadence | Metric |
| --- | --- |
| All-Time | Number of Self-Service Data Customers Enabled |
| Monthly | Number of active Self-Service Dashboard Developers |
| Monthly | Number of active Self-Service SQL Developers |
| Monthly | % of Dashboard Traffic From User Generated Content |
The following table lists capabilities of a mature Enterprise Data Platform, which can address the wide range of data and analytics needs of a large business. Not all capabilities listed are required to meet GitLab's short-term needs or known long-term needs. The decision to implement a given capability will be driven by a clear business need, and the final result may differ significantly from the reference example.
| Data Architecture | Data Security | Data Quality |
| --- | --- | --- |
| Operational Data Store | Data Warehouse | Data Lake |
| Data Model Standards | Enterprise Dimensional Model | Data Marts |
| Reference Data Management | Data Enrichment | Master Data Management |
| Data Pipeline | Data Transformation | Real-Time Data |
| Data Exports | Data Publishing | Data Products |
| Data Taxonomy | Data Catalog | Data Portal |
The following sections describe the Data Platform FY23 initiatives.
Data is landed from the different source systems in the `raw` data layer and processed/transformed into `prod` before it becomes available to business users via Sisense, data pumps, direct queries in Snowflake, and other channels. All transformations are performed by dbt. Data in `raw` changes over time, because data changes in the source systems, and therefore it also needs to be processed downstream towards the `prod` layer.
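In the platform itself these transformations are dbt models; as a rough sketch of the `raw`-to-`prod` idea, here is a minimal Python illustration (all names here, such as `clean_order` and `RAW_ORDERS`, are hypothetical and not actual GitLab models):

```python
# Hypothetical illustration of the raw -> prod flow: a raw, string-typed
# record is cleaned and reshaped before business users query it.

RAW_ORDERS = [
    {"id": "1", "amount": "100.50", "status": " open "},
    {"id": "2", "amount": "25.00", "status": "CLOSED"},
]

def clean_order(raw: dict) -> dict:
    """Transform one raw record into its prod shape: typed and normalized."""
    return {
        "order_id": int(raw["id"]),
        "amount_usd": float(raw["amount"]),
        "status": raw["status"].strip().lower(),
    }

PROD_ORDERS = [clean_order(r) for r in RAW_ORDERS]
print(PROD_ORDERS)
```

Because the source records keep changing, the same transformation runs repeatedly so that `prod` stays in sync with `raw`.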
Currently, about 30 source systems are extracted:
Monitoring is currently in place to check for failures across the process, from extraction through to availability at the different endpoints. This is done via our Trusted Data Framework, with tests defined in dbt and monitored in our triage process.
Data observability is a methodology for actively monitoring the health of the data sets inside a data platform. When the data is healthy, it is trusted and can be used in decision making, without the risk of making a decision based on wrong information.
The Data Team is currently looking beyond our current technologies to see whether tooling is available that can help us in this process. In FY23, the Data Team will continue exploring new technologies that support the team in observing the data and surfacing anomalies to business users.
We want to help all GitLab teams move up the Data Value Pyramid (left to right in the diagram below) and turn basic metrics and counts into wisdom that helps them create better products for our customers, run our business more efficiently, and add new capabilities to our business model. Relative to the Data Value Pyramid, we are currently working primarily within the Data and Information stages.
The Data Capability Model is used to identify target state requirements to support GitLab's Strategy.
To help GitLab become a public company, we need our lead-to-cash and public-facing metrics to reach Level 2 of the capability model.
| Level | Description | Outputs |
| --- | --- | --- |
| (5) Prescriptive | Real-time complex analysis embedded in products shapes actions and perceptions; data analytics is a strategic differentiator | New Data Products, Improved Decision ROI |
| (4) Predictive | Data science: insight into what is likely to happen; widespread and effortless analytics production; enterprise data quality and governance | Reliable Customer Lifetime Value, Expansion & Churn Prediction, Product Embedded Analytics |
| (3) Strategic | Widespread & effortless drillable analysis; drillable cross-functional scorecards and dashboards; Enterprise Data Warehouse | Customer 360 & Health Score, Predictable & Trusted Data Reporting, Robust Self-Service & Data @ Scale |
| (2) Advanced: Reference Solution | Operational automated reports and dashboards; reliable and validated data with automated tests; mixture of manual and automated integration; core integrated data with some data silos | Trusted Data, Self-Service Data, Key Performance Indicators, Stable platform for expansion |
| (1) Reactive | Static lists and reports; highly focused on history/lagging (last 30/90/365 days); unpredictable velocity; minimal cross-functional analysis; data silos | Historical Tabular Reports, Data Visualization |
| (0) None | Inconsistent report generation; results not widely trusted; no stable analytics infrastructure | |