At GitLab, we collect product usage data for the purpose of helping us build a better product. Data helps GitLab understand which parts of the product need improvement and which features we should build next. Product usage data also helps our team better understand the reasons why people use GitLab. With this knowledge we are able to make better product decisions.
There are several steps involved to go from collecting data to making it useful for our internal teams and customers.
We work closely with other internal teams in each step in the Product Analytics Process. The teams involved at each step are:
| Theme | Step | Description | Team Responsible | Teams Involved |
|---|---|---|---|---|
| Data Collection | Collection Framework | Outlines our available data collection tools. | Product Analytics | |
| Data Collection | Event Dictionary | The single source of truth defining all product metrics and events. | Product Analytics | Product Managers |
| Data Collection | Instrumentation | Instrumentation of feature tracking done by each product and engineering team. | Product Managers | Product Analytics |
| Data Collection | Release Cycle | The release of GitLab code. We have daily releases for SaaS and monthly releases for self-managed. | Product Managers | Product Analytics |
| Data Collection | Product Usage | Product usage of GitLab generating tracking events. | Product Managers | Product Analytics |
| Data Collection | Usage Ping Generation | A weekly job that aggregates and sends product usage data to GitLab. | Product Analytics | |
| Processing Pipeline | Snowplow Collector | The collection of Snowplow data. | Product Analytics | Data, Infrastructure |
| Processing Pipeline | Snowplow Enricher | The enrichment of Snowplow data. | Product Analytics | Data, Infrastructure |
| Processing Pipeline | Usage Ping Collector | The collection of Usage Ping data. | Product Analytics | Data |
| Processing Pipeline | Usage Ping Processor | The processing of Usage Ping data. | Product Analytics | Data |
| Processing Pipeline | Extractors | The extraction of Usage Ping and Snowplow data sources. | Data | Product Analytics, Infrastructure |
| Processing Pipeline | Loaders | The loading of Usage Ping and Snowplow data into the data warehouse. | Data | Product Analytics, Infrastructure |
| Processing Pipeline | Snowflake Enterprise Data Warehouse | Our enterprise data warehouse, where our organization's data is kept. | Data | Product Analytics, Infrastructure |
| Processing Pipeline | dbt Base Data Models | General ETL, models, and visualizations. | Data | Product Analytics |
| Processing Pipeline | Product Data Models | Product-specific ETL, models, and visualizations. | Product Analytics | Data |
| Processing Pipeline | Engineering Data Models | Engineering-specific ETL, models, and visualizations. | Data | Engineering |
| Processing Pipeline | Sales Data Models | Sales-specific ETL, models, and visualizations. | Data | Sales |
| Processing Pipeline | Customer Success Data Models | Customer Success-specific ETL, models, and visualizations. | Data | Customer Success |
| Processing Pipeline | Marketing Data Models | Marketing-specific ETL, models, and visualizations. | Data | Marketing |
| Processing Pipeline | People Data Models | People-specific ETL, models, and visualizations. | Data | People |
| Processing Pipeline | Finance Data Models | Finance-specific ETL, models, and visualizations. | Data | Finance |
| Processing Pipeline | Enterprise Dimensional Models | The single source of truth for GitLab data, spanning corporate performance and customer journey analytics. | Data | Product Analytics |
| Enable Product | Data Triage | The triaging of inbound product analytics requests. | Product Analysts | Product Managers, Data |
| Enable Product | Sisense Dashboards | Sisense dashboards for product managers. | Product Managers | Product Analytics, Data |
| Enable Product | Certified Sisense Dashboards | Sisense dashboards containing the SSOT for product performance, supported by the Enterprise Dimensional Model. | Product Analysts, Data | Product Managers, Product Analytics |
| Enable Product | Product KPI Dashboards | Sisense dashboards containing established KPIs for Product. | Product Analysts | Product Managers, Data |
| Enable Product | Product Performance Indicators | The product metrics GitLab's product team pays attention to. | Product Managers | Product Analytics, Data |
| Enable Product | Metrics Reviews | Monthly reviews of GitLab's product metrics. | Product Managers | Product Analytics, Data |
| Enable Product | Product Improvements | Product improvements based on insights from product usage data. | Product Managers | |
| Enable Sales / CS | Snowflake EDW to Salesforce Data Pump | Data feed of product usage data into Salesforce from the EDW. | Data | Sales, Customer Success, Product Analytics |
| Enable Sales / CS | Salesforce Dashboards | Dashboard embedded into each Salesforce customer, showing product usage. | Sales | Product Analytics, Data |
| Enable Sales / CS | Salesforce to Gainsight Data Feed | Data feed from Salesforce to Gainsight. | Customer Success | |
| Enable Sales / CS | Gainsight Dashboards | Dashboard embedded into each Gainsight customer, showing product usage. | Customer Success | Product Analytics, Data |
| Enable Sales / CS | Customer Conversations | Customer conversations using product usage data. | Sales, Customer Success | |
| Full Funnel Analytics | All Data Feeds to Snowflake EDW | ETL from source into the EDW/Snowflake in a secure and compliant RAW landing zone. | Data | Multiple |
| Full Funnel Analytics | Product Data | Product data sources such as GitLab.com Postgres, Usage Ping data, and Snowplow data. | Product Analytics | Data, Enterprise Applications |
| Full Funnel Analytics | License and Subscription Data | License and subscription data sources such as the Customers app, the License app, and Zuora data. | Fulfillment | Product Analytics, Data, Enterprise Applications |
| Full Funnel Analytics | Sales Data | Sales data sources such as Salesforce. | Sales | Product Analytics, Data, Enterprise Applications |
| Full Funnel Analytics | Customer Success Data | Customer Success data sources such as Gainsight. | Customer Success | Product Analytics, Data, Enterprise Applications |
| Full Funnel Analytics | Marketing Data | Marketing data sources such as Google Analytics and Marketo. | Marketing | Product Analytics, Data, Enterprise Applications |
| Full Funnel Analytics | Full Funnel Integration | The integration of all Product, License and Subscription, Sales, Customer Success, and Marketing data. | Enterprise Applications | Marketing, Fulfillment, Sales, Customer Success, Product Analytics, Data |
| Full Funnel Analytics | Full Funnel Analysis | The analysis of all Product, License and Subscription, Sales, Customer Success, and Marketing data. | Data | Marketing, Fulfillment, Sales, Customer Success, Product Analytics |
| Full Funnel Analytics | Corporate Performance Analysis | Cross-functional dashboards, balanced scorecards, and related metrics for measuring corporate lead-to-cash performance. | Data | Marketing, Fulfillment, Sales, Customer Success, Product Analytics |
| Full Funnel Analytics | Customer Journey Analytics | Cross-functional dashboards, balanced scorecards, and related metrics for measuring the customer product experience. | Data | Marketing, Fulfillment, Sales, Customer Success, Product Analytics |
| Enhance Value Stream Management | Value Stream Management Data Feed | A data feed for passing benchmark product usage data back to customers. | Product Analytics | Value Stream Management |
| Enhance Value Stream Management | Value Stream Management Features | Features to show customers how they're using GitLab. | Value Stream Management | Product Analytics |
The systems overview is a simplified diagram showing the interactions between GitLab Inc and self-managed instances.
For Product Analytics purposes, GitLab Inc has three major components:
For Product Analytics purposes, self-managed instances have two major components:
As shown by the orange lines, on GitLab.com, Snowplow JS, Snowplow Ruby, Usage Ping, and PostgreSQL database imports all flow into GitLab Inc's data infrastructure. On self-managed, however, only Usage Ping flows into GitLab Inc's data infrastructure.
As shown by the green lines, on GitLab.com, system logs flow into GitLab Inc's monitoring infrastructure. On self-managed, no logs are sent to GitLab Inc's monitoring infrastructure.
Note (1): Snowplow JS and Snowplow Ruby are available on self-managed; however, the Snowplow Collector endpoint is set to a self-managed Snowplow Collector that GitLab Inc does not have access to.
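The data flows in the systems overview can be summarized as a small lookup table. This is a minimal sketch for illustration only: `DATA_FLOWS`, `MONITORING_FLOWS`, and `reaches_gitlab_inc` are hypothetical names, not part of any GitLab codebase.

```python
# Hypothetical summary of the systems overview: which sources reach
# GitLab Inc's infrastructure for each deployment type.

# Orange lines: sources that flow into GitLab Inc's data infrastructure.
DATA_FLOWS = {
    "GitLab.com": {"Snowplow JS", "Snowplow Ruby", "Usage Ping", "PostgreSQL database imports"},
    "self-managed": {"Usage Ping"},
}

# Green lines: system logs reach GitLab Inc's monitoring only from GitLab.com.
MONITORING_FLOWS = {
    "GitLab.com": {"system logs"},
    "self-managed": set(),
}


def reaches_gitlab_inc(deployment: str, source: str) -> bool:
    """Return True if `source` on `deployment` flows into GitLab Inc's data infrastructure."""
    return source in DATA_FLOWS.get(deployment, set())


# Snowplow runs on self-managed too, but its events go to a self-managed collector (Note 1).
print(reaches_gitlab_inc("self-managed", "Snowplow JS"))  # False
print(reaches_gitlab_inc("GitLab.com", "Snowplow JS"))    # True
```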
✅ Available, 🔄 In Progress, 📅 Planned, ✖️ Not Planned
We use three methods to gather product usage data:
Snowplow is an enterprise-grade marketing and product analytics platform which helps track the way users engage with our website and application.
Snowplow consists of two components:
For more details, read the Snowplow guide.
Usage Ping is a method for GitLab Inc to collect usage data on a GitLab instance. Usage Ping is primarily composed of row counts for different tables in the instance's database. By comparing these counts month over month (or week over week), we can get a rough sense of how an instance is using the different features within the product. This high-level data is used to help our product, support, and sales teams.
For more details, read the Usage Ping guide.
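Comparing row counts between two pings amounts to computing a relative change per counter. The sketch below assumes a simplified payload, a flat dictionary of counter name to row count, with hypothetical counter names; the real Usage Ping payload is nested and richer, so treat this only as an illustration of the comparison.

```python
from __future__ import annotations

# Assumption: each ping is simplified to a flat dict of counter name -> row count.


def count_changes(previous: dict[str, int], current: dict[str, int]) -> dict[str, float | None]:
    """Return the relative change per counter between two pings.

    None means the change is undefined (counter missing before, or a zero baseline).
    """
    changes: dict[str, float | None] = {}
    for name, value in current.items():
        before = previous.get(name)
        if not before:  # missing counter or zero baseline
            changes[name] = None
        else:
            changes[name] = (value - before) / before
    return changes


last_month = {"issues": 1200, "merge_requests": 400, "pipelines": 0}
this_month = {"issues": 1500, "merge_requests": 380, "pipelines": 50}
print(count_changes(last_month, this_month))
# {'issues': 0.25, 'merge_requests': -0.05, 'pipelines': None}
```

Returning `None` instead of an infinite or zero growth rate keeps newly adopted features visible as "no baseline yet" rather than distorting averages.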
Database imports are full imports of data into GitLab's data warehouse. For GitLab.com, the PostgreSQL database is loaded into the Snowflake data warehouse every 6 hours. For more details, see the data team handbook.
UI events are any interface-driven actions from the browser including click data.
These are backend events, including the creation, reading, updating, and deletion of records, as well as other events that might be triggered from layers other than those available in the interface.
These are raw database records which can be explored using business intelligence tools like Sisense. The full list of available tables can be found in structure.sql.
These are settings of your instance, such as the instance's Git version and whether certain features are enabled, for example:
These are integrations your GitLab instance interacts with such as an external storage provider or an external container registry. These services must be able to send data back into a GitLab instance for data to be tracked.
Whether we can report at an aggregate or an individual level varies by segment. For example, for self-managed users, we can report at an aggregate user level using Usage Ping, but not at an individual user level.
The reporting time periods available to us also vary by segment. For example, for self-managed users, we can report all-time counts and 28-day counts in Usage Ping.
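The two time periods can be illustrated with a small aggregation that returns only counts, never user identifiers, mirroring aggregate-level reporting. This is a minimal sketch: the `(user_id, day)` event format is hypothetical, and we assume the 28-day window includes the reporting day itself; Usage Ping's actual windowing may differ.

```python
from __future__ import annotations

from datetime import date, timedelta

# Assumption: events are (user_id, day) tuples; only aggregate counts are returned.


def user_counts(events: list[tuple[int, date]], as_of: date) -> dict[str, int]:
    """Distinct users all-time and in the 28 days ending on `as_of` (inclusive)."""
    window_start = as_of - timedelta(days=27)  # 28 days including `as_of`
    all_time = {user for user, day in events if day <= as_of}
    last_28 = {user for user, day in events if window_start <= day <= as_of}
    return {"all_time_users": len(all_time), "users_28d": len(last_28)}


events = [
    (1, date(2020, 7, 1)),
    (2, date(2020, 8, 10)),
    (2, date(2020, 8, 20)),
    (3, date(2020, 8, 25)),
]
print(user_counts(events, as_of=date(2020, 8, 31)))
# {'all_time_users': 3, 'users_28d': 2}
```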
Note: We've temporarily moved the Event Dictionary to a Google Sheet. The previous Markdown table exceeded 500 rows, making it difficult to manage. In the future, we intend to move it back into our docs using a YAML file.
The event dictionary is a single source of truth for the metrics and events we collect for product usage data. The Event Dictionary lists all the metrics and events we track, why we're tracking them, and where they are tracked.
This is a living document that is updated any time a new event is planned or implemented. It includes the following information.
We're currently focusing our Event Dictionary on Usage Ping. In the future, we will also include Snowplow. We currently have an initiative across the entire product organization to complete the Event Dictionary for Usage Ping.
For future metrics and events you plan to track, please add them to the Event Dictionary and note the status as In Progress or Implemented. Once you have confirmed that the metric has been implemented and that the metric data is in our data warehouse, change the status to Data Available.
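The status lifecycle above can be modeled as a small forward-only state machine. This is a hypothetical sketch, not the Event Dictionary's actual format: it models only the three statuses named above, and the field names `metric`, `reason`, and `tracked_in` are illustrative stand-ins for "what, why, and where" an event is tracked.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "In Progress"
    IMPLEMENTED = "Implemented"
    DATA_AVAILABLE = "Data Available"


# Statuses only move forward, one step at a time.
NEXT = {Status.IN_PROGRESS: Status.IMPLEMENTED, Status.IMPLEMENTED: Status.DATA_AVAILABLE}


@dataclass
class EventDictionaryEntry:
    metric: str      # what we track (hypothetical field)
    reason: str      # why we track it
    tracked_in: str  # where it is tracked, e.g. "Usage Ping"
    status: Status = Status.IN_PROGRESS

    def advance(self, data_in_warehouse: bool = False) -> None:
        """Move to the next status; Data Available requires data in the warehouse."""
        nxt = NEXT.get(self.status)
        if nxt is None:
            raise ValueError(f"{self.metric} is already {self.status.value}")
        if nxt is Status.DATA_AVAILABLE and not data_in_warehouse:
            raise ValueError("confirm the metric data is in the data warehouse first")
        self.status = nxt


entry = EventDictionaryEntry("issues_created", "Measure adoption of issues", "Usage Ping")
entry.advance()                        # In Progress -> Implemented
entry.advance(data_in_warehouse=True)  # Implemented -> Data Available
print(entry.status.value)              # Data Available
```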
We've recently had a large push across the product organization to become more data driven. Part of this push includes getting product metrics in place for each product section, stage, and group. For FY21-Q3, we set up a couple of OKRs to help us accomplish this:
To accomplish these OKRs, we set up a seven-step process to implement product metrics. This process was originally presented in the Weekly Product Meeting on August 11, 2020 (slide deck and video presentation) and has been refined over time.
| Implementation Status | Description | Responsibility | Exit Criteria |
|---|---|---|---|
| Definition | The definition step outlines the process for deciding which product metrics to track. | PM Responsibility, Product Analytics Support | Metric is defined in the Event Dictionary and in the Performance Indicator file with the future |
| Instrumentation | The instrumentation step outlines how each product team implements data collection. | PM Responsibility, Product Analytics Support | Instrumentation is completed and feature flags are turned off so that data can be collected. |
| Data Availability | The data availability step outlines the time between a product release and receiving product usage data in the data warehouse. | PM Responsibility, Product Analytics Support | PM confirms that the |
| Dashboard | The dashboarding step outlines how Sisense dashboards are built. | PM Responsibility, Product Analytics Support | There is a chart in Sisense. |
| Handbook | The handbook PI page describes how product performance indicators are added for each product section, stage, and group. | PM Responsibility, Product Analytics Support | The chart is embedded into the handbook. A target has been assigned based on the data. |
| Target | The target definition step outlines how targets are defined for each performance indicator. | PM Responsibility, Product Analytics Support | The target value is in both the chart and the Performance Indicator (PI) file. |
| Complete | All of the prior steps have been completed. | PM Responsibility, Product Analytics Support | :tada: |
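Because the seven implementation statuses form a strict sequence gated by exit criteria, they map naturally onto an ordered enum. This is a hypothetical sketch for tracking progress; `PIStatus` and `next_step` are illustrative names, and how exit criteria are verified is left to the table above.

```python
from enum import IntEnum


class PIStatus(IntEnum):
    """The seven implementation statuses, in order (hypothetical model)."""
    DEFINITION = 1
    INSTRUMENTATION = 2
    DATA_AVAILABILITY = 3
    DASHBOARD = 4
    HANDBOOK = 5
    TARGET = 6
    COMPLETE = 7


def next_step(current: PIStatus, exit_criteria_met: bool) -> PIStatus:
    """Advance one step only once the current step's exit criteria are met."""
    if current is PIStatus.COMPLETE or not exit_criteria_met:
        return current
    return PIStatus(current + 1)


print(next_step(PIStatus.DASHBOARD, exit_criteria_met=True).name)   # HANDBOOK
print(next_step(PIStatus.DASHBOARD, exit_criteria_met=False).name)  # DASHBOARD
```

Using `IntEnum` keeps the steps comparable, so a check such as `status >= PIStatus.DASHBOARD` reads naturally when filtering PIs that already have a chart.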
Determine what metrics are important for your specific section, stage, or group.
Work with your engineering team to instrument tracking for your XMAU. Focus on Usage Ping, because metrics instrumented with it are available on both SaaS and self-managed.
The Usage Ping Guide outlines the steps required for instrumentation. It includes:
Plan instrumentation with sufficient lead time for data availability. Ensure your metrics make it into the self-managed release as early as possible.
Dashboard the metric. This is done by creating a Sisense dashboard. Avoid cumulative views and instead focus on month-over-month growth. Instructions for creating dashboards are here.
We need PMs to self-serve their own dashboards as data team capacity is limited. The data team will be focused on enabling self-service, advising PMs, and working on the more challenging XMAU dashboards.
To learn how to create your own dashboard, see Data For Product Managers: Creating Charts.
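To see why cumulative views are discouraged, consider a counter reported as an all-time total: it rises every month even while growth slows. The sketch below, using made-up numbers, converts such a series into per-month values and then into month-over-month growth. It illustrates the calculation only and is not a Sisense feature.

```python
from __future__ import annotations


def monthly_from_cumulative(cumulative: list[int]) -> list[int]:
    """Convert a cumulative (all-time) series into per-month values."""
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def month_over_month_growth(monthly: list[int]) -> list[float | None]:
    """Growth of each month versus the previous one; None for the first month or a zero base."""
    growth: list[float | None] = [None]
    for prev, cur in zip(monthly, monthly[1:]):
        growth.append((cur - prev) / prev if prev else None)
    return growth


cumulative = [100, 220, 350, 470]  # always rises, hiding the slowdown
monthly = monthly_from_cumulative(cumulative)
print(monthly)  # [100, 120, 130, 120]
print([None if g is None else round(g, 3) for g in month_over_month_growth(monthly)])
# [None, 0.2, 0.083, -0.077]
```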
For GMAU and SMAU data issues:
Add the XMAU label to the data issues.
For non-GMAU and non-SMAU data issues:
We need all PMs to ensure their PIs are showing on the performance indicator pages, based on What we're aiming for.
To do so, we need a clear way to communicate to PMs exactly which PIs remain. We will add placeholder PIs for each section to the performance indicator file so that all required PIs show in the handbook. Once a PI is implemented, the actual PI replaces the placeholder PI. For more information about how PIs and XMAUs relate to one another, see PI Structure.
As a product organization, we need to get into the habit of understanding our baselines and setting targets for each stage & group. For the PI Target step, you will work with your Section or Group Leader to define targets for each of your XMAUs.
Set a growth target, and embed in the tracking dashboard. Growth targets should be ambitious but achievable.
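One way to turn a baseline into a concrete target is to compound a monthly growth rate over the target period. This is only an illustrative assumption: the handbook does not prescribe a formula, and `growth_target` and the example numbers are hypothetical. Agree on the actual method with your Section or Group Leader.

```python
def growth_target(baseline: float, monthly_growth_rate: float, months: int) -> float:
    """Target after `months` of compound month-over-month growth from `baseline` (assumed model)."""
    if months < 0:
        raise ValueError("months must be non-negative")
    return baseline * (1 + monthly_growth_rate) ** months


# e.g. a baseline of 10,000 monthly active users and 5% month-over-month growth for one quarter
print(round(growth_target(10_000, 0.05, 3)))  # 11576
```

Compounding makes the ambition explicit: 5% per month over a quarter is roughly 15.8% total, which is easier to judge as "ambitious but achievable" than a single end number.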
All of the prior steps have been completed and a PI is successfully implemented.
| Resource | Description |
|---|---|
| Product Analytics Guide | A guide to Product Analytics |
| Usage Ping Guide | An implementation guide for Usage Ping |
| Snowplow Guide | An implementation guide for Snowplow |
| Event Dictionary | An SSoT for all collected metrics and events |
| Implementing Product Performance Indicators | The workflow for putting product performance indicators in place |
| Product Analytics Direction | The roadmap for Product Analytics at GitLab |
| Product Analytics Development Process | The development process for the Product Analytics groups |