At GitLab, we collect product usage data for the purpose of helping us build a better product. Data helps GitLab understand which parts of the product need improvement and which features we should build next. Product usage data also helps our team better understand the reasons why people use GitLab. With this knowledge we are able to make better product decisions.
There are several steps involved in going from collecting data to making it useful for our internal teams and customers.
We work closely with other internal teams at each step of the Product Intelligence Process. The teams involved at each step are:
Theme | Step | Description | Team Responsible | Teams Involved |
---|---|---|---|---|
Privacy Policy and Settings | Privacy Policy | Outlines what product usage data we collect from our users. | Legal | Product Intelligence, Data |
Privacy Policy and Settings | Data Classification Policy | Outlines our data classification levels based on sensitivity. | Security | Legal, Product Intelligence, Data |
Privacy Policy and Settings | Privacy Settings | In app settings for users to control what data they share with GitLab. | Product Intelligence | Legal, Data |
Data Collection | Collection Framework | Outlines our available data collection tools. | Product Intelligence | |
Data Collection | Event Dictionary | The single source of truth defining all product metrics and events. | Product Intelligence | Product Managers |
Data Collection | Instrumentation | Instrumentation of feature tracking done by each product and engineering team. | Product Managers | Product Intelligence |
Data Collection | Release Cycle | The release of GitLab code. We have daily releases for SaaS and monthly releases for self-managed. | Product Managers | Product Intelligence |
Data Collection | Product Usage | Product usage of GitLab generating tracking events. | Product Managers | Product Intelligence |
Data Collection | Usage Ping Generation | A weekly job that aggregates and sends product usage data to GitLab. | Product Intelligence | |
Processing Pipeline | Snowplow Collector | The collection of Snowplow data. | Product Intelligence | Data, Infrastructure |
Processing Pipeline | Snowplow Enricher | The enrichment of Snowplow data. | Product Intelligence | Data, Infrastructure |
Processing Pipeline | Usage Ping Collector | The collection of Usage Ping data. | Product Intelligence | Data |
Processing Pipeline | Usage Ping Processor | The processing of Usage Ping data. | Product Intelligence | Data |
Processing Pipeline | Extractors | The extraction of Usage Ping and Snowplow data sources. | Data | Product Intelligence, Infrastructure |
Processing Pipeline | Loaders | The loading of Usage Ping and Snowplow data into the data warehouse. | Data | Product Intelligence, Infrastructure |
Processing Pipeline | Snowflake Enterprise Data Warehouse | Our enterprise data warehouse where our organization's data is kept. | Data | Product Intelligence, Infrastructure |
Processing Pipeline | dbt Base Data Models | General ETL, models, and visualizations. | Data | Product Intelligence |
Processing Pipeline | Product Data Models | Product-specific ETL, models, and visualizations. | Product Intelligence | Data |
Processing Pipeline | Engineering Data Models | Engineering-specific ETL, models, and visualizations. | Data | Engineering |
Processing Pipeline | Sales Data Models | Sales-specific ETL, models, and visualizations. | Data | Sales |
Processing Pipeline | Customer Success Data Models | Customer Success-specific ETL, models, and visualizations. | Data | Customer Success |
Processing Pipeline | Marketing Data Models | Marketing-specific ETL, models, and visualizations. | Data | Marketing |
Processing Pipeline | People Data Models | People-specific ETL, models, and visualizations. | Data | People |
Processing Pipeline | Finance Data Models | Finance-specific ETL, models, and visualizations. | Data | Finance |
Processing Pipeline | Enterprise Dimensional Models | The single source of truth for GitLab data, spanning corporate performance and customer journey analytics. | Data | Product Intelligence |
Enable Product | Data Triage | The triaging of inbound Product Intelligence requests. | Product Analysts | Product Managers, Data |
Enable Product | Sisense Dashboards | Sisense dashboards for product managers. | Product Managers | Product Intelligence, Data |
Enable Product | Certified Sisense Dashboards | Sisense dashboards containing the SSOT for product performance, supported by the Enterprise Dimensional Model. | Product Analysts, Data | Product Managers, Product Intelligence |
Enable Product | Product KPI Dashboards | Sisense dashboards containing established KPIs for Product. | Product Analysts | Product Managers, Data |
Enable Product | Product Performance Indicators | The product metrics GitLab's product team pays attention to. | Product Managers | Product Intelligence, Data |
Enable Product | Metrics Reviews | Monthly reviews of GitLab's product metrics. | Product Managers | Product Intelligence, Data |
Enable Product | Product Improvements | Product improvements based on insights from product usage data. | Product Managers | |
Enable Sales / CS | Snowflake EDW to Salesforce Data Pump | Data feed of product usage data into Salesforce from EDW. | Data | Sales, Customer Success, Product Intelligence |
Enable Sales / CS | Salesforce Dashboards | Dashboard embedded into each Salesforce Customer showing product usage. | Sales | Product Intelligence, Data |
Enable Sales / CS | Salesforce to Gainsight Data Feed | Data feed from Salesforce to Gainsight. | Customer Success | |
Enable Sales / CS | Gainsight Dashboards | Dashboard embedded into each Gainsight Customer showing product usage. | Customer Success | Product Intelligence, Data |
Enable Sales / CS | Customer Conversations | Customer conversations using product usage data. | Sales, Customer Success | |
Full Funnel Analytics | All Data Feeds to Snowflake EDW | ETL from source into EDW/Snowflake in a Secure and Compliant RAW landing zone. | Data | Multiple |
Full Funnel Analytics | Product Data | Product data sources such as GitLab.com Postgres, Usage Ping data, and Snowplow data. | Product Intelligence | Data, Enterprise Applications |
Full Funnel Analytics | License and Subscription Data | License and subscription data sources such as Customers app, License app, and Zuora data. | Fulfillment | Product Intelligence, Data, Enterprise Applications |
Full Funnel Analytics | Sales Data | Sales data sources such as Salesforce. | Sales | Product Intelligence, Data, Enterprise Applications |
Full Funnel Analytics | Customer Success Data | Customer Success data sources such as Gainsight. | Customer Success | Product Intelligence, Data, Enterprise Applications |
Full Funnel Analytics | Marketing Data | Marketing data sources such as Google Analytics and Marketo. | Marketing | Product Intelligence, Data, Enterprise Applications |
Full Funnel Analytics | Full Funnel Integration | The integration of all Product, License and Subscription, Sales, Customer Success, and Marketing Data. | Enterprise Applications | Marketing, Fulfillment, Sales, Customer Success, Product Intelligence, Data |
Full Funnel Analytics | Full Funnel Analysis | The analysis of all Product, License and Subscription, Sales, Customer Success, and Marketing Data. | Data | Marketing, Fulfillment, Sales, Customer Success, Product Intelligence |
Full Funnel Analytics | Corporate Performance Analysis | Cross-functional dashboards, balanced scorecards, and related metrics for measuring corporate lead-to-cash performance. | Data | Marketing, Fulfillment, Sales, Customer Success, Product Intelligence |
Full Funnel Analytics | Customer Journey Analytics | Cross-functional dashboards, balanced scorecards, and related metrics for measuring the customer product experience. | Data | Marketing, Fulfillment, Sales, Customer Success, Product Intelligence |
Enhance Value Stream Management | Value Stream Management Data Feed | A data feed for passing benchmark product usage data back to customers. | Product Intelligence | Value Stream Management |
Enhance Value Stream Management | Value Stream Management Features | Features to show customers how they're using GitLab. | Value Stream Management | Product Intelligence |
The systems overview is a simplified diagram showing the interactions between GitLab Inc and self-managed instances.
For Product Intelligence purposes, GitLab Inc has three major components:
For Product Intelligence purposes, self-managed instances have two major components:
As shown by the orange lines, on GitLab.com, Snowplow JS, Snowplow Ruby, Usage Ping, and PostgreSQL database imports all flow into GitLab Inc's data infrastructure. However, on self-managed instances, only Usage Ping flows into GitLab Inc's data infrastructure.
As shown by the green lines, on GitLab.com, system logs flow into GitLab Inc's monitoring infrastructure. On self-managed instances, no logs are sent to GitLab Inc's monitoring infrastructure.
Note (1): Snowplow JS and Snowplow Ruby are available on self-managed instances; however, the Snowplow Collector endpoint is set to a self-managed Snowplow Collector, which GitLab Inc does not have access to.
✅ Available, 🔄 In Progress, 📅 Planned, ✖️ Not Planned
We use three methods to gather product usage data:
Snowplow is an enterprise-grade marketing and product analytics platform which helps track the way users engage with our website and application.
Snowplow consists of two components:
For more details, read the Snowplow guide.
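To make this concrete, the short Python sketch below shows the general shape of a Snowplow structured event (category, action, label, value) being sent to a collector over HTTP. The collector host, event fields, and values here are illustrative assumptions and do not reflect GitLab's actual tracking code.

```python
import requests  # any HTTP client works; requests is assumed for brevity

# Hypothetical collector host -- a real deployment would use the
# instance's configured Snowplow Collector endpoint.
COLLECTOR = "https://snowplow-collector.example.com"

# A Snowplow "structured event" carries a handful of descriptive fields.
# The field names below follow Snowplow's structured-event model; the
# values are made up for illustration.
event = {
    "e": "se",                   # event type: structured event
    "se_ca": "projects:issues",  # category (illustrative)
    "se_ac": "click_button",     # action (illustrative)
    "se_la": "create_issue",     # label (illustrative)
    "se_va": 1,                  # value (illustrative)
}

# The collector exposes a lightweight HTTP interface; this sketch uses a
# pixel-style GET request with the event encoded as query parameters.
response = requests.get(f"{COLLECTOR}/i", params=event, timeout=5)
response.raise_for_status()
```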
Usage Ping is a method for GitLab Inc to collect usage data on a GitLab instance. Usage Ping is primarily composed of row counts for different tables in the instance’s database. By comparing these counts month over month (or week over week), we can get a rough sense for how an instance is using the different features within the product. This high-level data is used to help our product, support, and sales teams.
For more details, read the Usage Ping guide.
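As a rough illustration of what the weekly job does, the Python sketch below counts rows in a few tables and posts the resulting payload. The table names, payload shape, endpoint, and connection string are assumptions for illustration only, not the actual GitLab implementation.

```python
from datetime import datetime, timezone

import psycopg2  # assumed PostgreSQL driver
import requests

# Hypothetical endpoint for receiving pings (illustrative only).
USAGE_PING_ENDPOINT = "https://version.example.com/usage_data"

# An illustrative subset of tables whose row counts would be reported.
TABLES_TO_COUNT = ["projects", "issues", "merge_requests"]

def build_payload(conn):
    """Build a minimal Usage Ping payload of row counts per table."""
    counts = {}
    with conn.cursor() as cur:
        for table in TABLES_TO_COUNT:
            cur.execute(f"SELECT COUNT(*) FROM {table}")
            counts[table] = cur.fetchone()[0]
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "counts": counts,
    }

def send_ping(payload):
    """POST the payload as JSON; the weekly job would run this on a schedule."""
    resp = requests.post(USAGE_PING_ENDPOINT, json=payload, timeout=30)
    resp.raise_for_status()

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=gitlabhq_production")  # illustrative DSN
    send_ping(build_payload(conn))
```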
Database imports are full imports of data into GitLab's data warehouse. For GitLab.com, the PostgreSQL database is loaded into Snowflake data warehouse every 6 hours. For more details, see the data team handbook.
UI events are any interface-driven actions from the browser including click data.
These are backend events that include the creation, reading, updating, and deletion of records, and other events that might be triggered from layers other than those available in the interface.
These are raw database records which can be explored using business intelligence tools like Sisense. The full list of available tables can be found in structure.sql.
These are settings of your instance, such as the instance's Git version and whether certain features (for example, `container_registry_enabled`) are enabled.
These are integrations your GitLab instance interacts with such as an external storage provider or an external container registry. These services must be able to send data back into a GitLab instance for data to be tracked.
Our reporting level (aggregate or individual) varies by segment. For example, for Self-Managed Users, we can report at an aggregate user level using Usage Ping but not at an individual user level.
Our reporting time periods also vary by segment. For example, for Self-Managed Users, we can report all-time counts and 28-day counts in Usage Ping.
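As an illustration of the difference between all-time and 28-day counts, here is a small Python sketch that computes both from a list of event timestamps; the data and helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def all_time_count(event_times):
    """All-time count: every recorded event, regardless of age."""
    return len(event_times)

def last_28_day_count(event_times, now=None):
    """28-day count: only events recorded in the trailing 28-day window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=28)
    return sum(1 for t in event_times if t >= cutoff)

# Hypothetical event timestamps for a single metric.
events = [
    datetime(2020, 9, 1, tzinfo=timezone.utc),
    datetime(2020, 11, 20, tzinfo=timezone.utc),
    datetime(2020, 11, 25, tzinfo=timezone.utc),
]

print(all_time_count(events))  # 3
print(last_28_day_count(events, now=datetime(2020, 12, 1, tzinfo=timezone.utc)))  # 2
```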
Note: We've temporarily moved the Event Dictionary to a Google Sheet. The previous Markdown table exceeded 500 rows, making it difficult to manage. In the future, our intention is to move this back into our docs using a YAML file.
The event dictionary is a single source of truth for the metrics and events we collect for product usage data. The Event Dictionary lists all the metrics and events we track, why we're tracking them, and where they are tracked.
This is a living document that is updated any time a new event is planned or implemented. It includes the following information.
We're currently focusing our Event Dictionary on Usage Ping. In the future, we will also include Snowplow. We currently have an initiative across the entire product organization to complete the Event Dictionary for Usage Ping.
For future metrics and events you plan to track, please add them to the Event Dictionary and note the status as Planned, In Progress, or Implemented. Once you have confirmed that the metric has been implemented and that the metric data is in our data warehouse, change the status to Data Available.
We've recently had a large push across the product organization to become more data driven. Part of this push includes getting product metrics in place for each product section, stage, and group. In FY21-Q3, we set up a couple of OKRs to help us accomplish this:
To accomplish these OKRs, we set up a seven-step process to implement product metrics. This process was originally presented in the Weekly Product Meeting on August 11, 2020 (slide deck and video presentation) and has been refined over time.
Implementation Status | Description | Responsibility | Exit Criteria |
---|---|---|---|
Definition | The definition step outlines the process for deciding which product metrics to track. | PM Responsibility, Product Intelligence Support | Metric is defined in the Event Dictionary and in the Performance Indicator file with the future `metric_name`. Issue for instrumentation is completed and scheduled for the current release. |
Instrumentation | The instrumentation step outlines how each product team implements data collection. | PM Responsibility, Product Intelligence Support | Instrumentation is completed and feature flags are turned off so that data can be collected. |
Data Availability | The data availability step outlines the timing from a product release to receiving product usage data in the data warehouse. | PM Responsibility, Product Intelligence Support | PM confirms that the `metric_name` is present in the data set and updates the PI file, cc'ing @gitlab-org/growth/product_analytics/engineers and @gitlab-data to inform them that the metric is available. For example, for a self-managed Usage Ping implementation, check the #g_product_analytics Slack channel for the latest SaaS Usage Ping payload. |
Dashboard | The dashboarding step outlines how Sisense dashboards are built. | PM Responsibility, Product Intelligence Support | There is a chart in Sisense. |
Handbook | The Product PI handbook page describes how product performance indicators are added for each product section, stage, and group. | PM Responsibility, Product Intelligence Support | Chart is embedded into the handbook. Target has been assigned based off the data. |
Target | The target definition step outlines how targets are defined for each performance indicator. | PM Responsibility, Product Intelligence Support | The target value is in both the chart and in the Performance Indicator (PI) file. |
Complete | All of the prior steps have been completed. | PM Responsibility, Product Intelligence Support | :tada: |
Determine what metrics are important for your specific section, stage, or group.
Instructions:
metric_name
Note: We now enable you to deduplicate aggregated metrics implemented via Redis HLL in order to get distinct counts (for example, a distinct user count across multiple actions in a stage); a minimal sketch of the underlying HyperLogLog technique follows the table below. Please read our docs on Aggregated Metrics for more information. We are working towards the ability to deduplicate across multiple Database HLL metrics via #288848, and then deduplication across multiple Redis HLL and Database HLL metrics via #421
Term | Definition | Example |
---|---|---|
Aggregated | Metric contains rolled-up values due to an aggregate function (COUNT, SUM, etc) | Total Page Views (TPV) - the sum of all events when a page was viewed. |
Deduplicated | Metric counts each unit of measurement once. | Unique Monthly Active Users (UMAU) - each user_id is counted once |
Deduplicated Aggregated | Metric contains a rolled-up value where each unit is counted once. | UMAU is a deduplicated aggregated metric but TPV is not. |
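To illustrate the idea behind Redis HLL deduplication, here is a minimal Python sketch using Redis HyperLogLogs (PFADD, PFMERGE, PFCOUNT), the general technique such counters are built on. The key names, events, and user IDs are hypothetical; this is not GitLab's actual counter implementation.

```python
import redis  # redis-py, assumed available

r = redis.Redis()  # assumes a local Redis instance

# Track each user against a per-event HyperLogLog key (hypothetical keys).
r.pfadd("hll:create_issue:2020-11", "user_1", "user_2", "user_3")
r.pfadd("hll:create_merge_request:2020-11", "user_2", "user_4")

# Per-event approximate distinct counts.
print(r.pfcount("hll:create_issue:2020-11"))          # ~3
print(r.pfcount("hll:create_merge_request:2020-11"))  # ~2

# Deduplicated aggregate across both actions: merge the HLLs first so a
# user who performed both actions is only counted once (~4, not 5).
r.pfmerge(
    "hll:stage_monthly_active_users:2020-11",
    "hll:create_issue:2020-11",
    "hll:create_merge_request:2020-11",
)
print(r.pfcount("hll:stage_monthly_active_users:2020-11"))  # ~4
```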
Work with your engineering team to instrument tracking for your XMAU. Focus on using Usage Ping as your metrics will be available on both SaaS and self-managed.
The Usage Ping Guide outlines the steps required for instrumentation. It includes:
Also see the Product Intelligence Guide and Snowplow Guide.
Instructions:
Plan instrumentation with sufficient lead time for data availability. Ensure your metrics make it into the self-managed release as early as possible.
Timeline:
In total, plan for cycle times of up to 51 days (Examples 1, 2). Cycle times are slow with monthly releases and weekly pings, so implement your metrics early.
Instructions:
Dashboard the metric. This is done by creating a Sisense dashboard. Avoid cumulative views and instead focus on month-over-month growth. Instructions for creating dashboards are here.
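For example, a month-over-month growth chart plots the percentage change between consecutive monthly counts rather than a running cumulative total. A small Python sketch with hypothetical numbers:

```python
# Hypothetical monthly active user counts for one metric.
monthly_counts = {"2020-08": 1200, "2020-09": 1350, "2020-10": 1420}

def month_over_month_growth(counts):
    """Percentage change between each month and the previous month."""
    months = sorted(counts)
    growth = {}
    for prev, curr in zip(months, months[1:]):
        growth[curr] = (counts[curr] - counts[prev]) / counts[prev] * 100
    return growth

print(month_over_month_growth(monthly_counts))
# {'2020-09': 12.5, '2020-10': 5.18...}
```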
We need PMs to self-serve their own dashboards as data team capacity is limited. The data team will be focused on enabling self-service, advising PMs, and working on the more challenging XMAU dashboards.
To learn how to create your own dashboard, see Data For Product Managers: Creating Charts
To update the SMAU Summary Dashboards: GitLab.com Postgres SMAU Dashboard (SaaS) and Usage Ping SMAU Dashboard (SaaS + Self-Managed), please open a data team issue.
Dashboard Prioritization
For GMAU and SMAU data issues: add the `XMAU` label to the data issues.
For non-GMAU and non-SMAU data issues:
Instructions:
There are five Product PI pages: the Product Team page and section pages for Dev, Ops, Sec, and Enablement.
We need all PMs to ensure their PIs are showing on the performance indicator pages, based off What we're aiming for.
To do so, we need a clear way to communicate to PMs exactly which PIs are remaining. We will be adding placeholder PIs for each section into the performance indicator file so that all required PIs show in the handbook. Once a PI is implemented, the actual PI will replace the placeholder PI. For more information about how PIs and XMAUs are related to one another, see PI Structure.
Instructions:
As a product organization, we need to get into the habit of understanding our baselines and setting targets for each stage & group. For the PI Target step, you will work with your Section or Group Leader to define targets for each of your XMAUs.
Set a growth target and embed it in the tracking dashboard. Growth targets should be ambitious but achievable.
Instructions:
All of the prior steps have been completed and a PI is successfully implemented.
Instructions:
Resource | Description |
---|---|
Product Intelligence Guide | A guide to Product Intelligence |
Usage Ping Guide | An implementation guide for Usage Ping |
Snowplow Guide | An implementation guide for Snowplow |
Event Dictionary | The SSOT for all collected metrics and events |
Privacy Policy | Our privacy policy outlining what data we collect and how we handle it |
Implementing Product Performance Indicators | The workflow for putting product performance indicators in place |
Product Intelligence Direction | The roadmap for Product Intelligence at GitLab |
Product Intelligence Development Process | The development process for the Product Intelligence groups |