Product Vision - Telemetry

Vision for telemetry

Telemetry manages a variety of technologies that are important to GitLab's understanding of how our users use our products. These technologies include, but are not limited to, Snowplow Analytics, Pendo.io, and an in-house tool called Usage Ping, which is hosted on version.gitlab.com and also includes a separate service called Version Check.

If you'd like to discuss this vision directly with the Interim Product Manager for Telemetry, feel free to reach out to Tim Hey via e-mail.

The overall vision for the Telemetry group is to ensure that we have a robust, consistent, and modern telemetry data framework in place to best serve our internal Product, Finance, Sales, and Customer Success teams. The group also ensures that GitLab has the best visualization and analysis tools in place to provide the best possible insights into the data gathered through our various collection tools.

| Category | Description |
| --- | --- |
| 🧺 Collection | The structure(s) and platform(s) of how we collect Telemetry data |
| 🔍 Analysis | Manages GitLab's internal needs for an analysis tool that serves the Product department |

SMAU

Stage Monthly Active Users is a KPI that is required for all product stages. SMAU is defined as the number of unique users who take a specified action within a stage.
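To make the SMAU definition concrete, here is a minimal Python sketch, assuming tracked events are available as simple records with a user ID, stage, action, and date. The `Event` type, the `smau` function, and the action names in `SMAU_ACTIONS` are hypothetical; each stage's real qualifying action is defined in that stage's SMAU definition issue.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One tracked user action (hypothetical record shape)."""
    user_id: int
    stage: str
    action: str
    day: date


# Hypothetical: the action(s) that count toward each stage's SMAU.
SMAU_ACTIONS = {
    "Create": {"push_code"},
    "Verify": {"run_pipeline"},
}


def smau(events, stage, year, month, actions=SMAU_ACTIONS):
    """Number of unique users who took a qualifying action in `stage` during the month."""
    qualifying = actions.get(stage, set())
    users = {
        e.user_id
        for e in events
        if e.stage == stage
        and e.action in qualifying
        and (e.day.year, e.day.month) == (year, month)
    }
    return len(users)


events = [
    Event(1, "Create", "push_code", date(2019, 9, 3)),
    Event(1, "Create", "push_code", date(2019, 9, 10)),  # repeat user, counted once
    Event(2, "Create", "push_code", date(2019, 9, 12)),
    Event(3, "Create", "view_file", date(2019, 9, 12)),  # not a qualifying action
    Event(4, "Create", "push_code", date(2019, 8, 30)),  # outside the month
]
print(smau(events, "Create", 2019, 9))  # 2
```

Using a set of user IDs is what makes the metric count unique users rather than events: user 1 acts twice in September but contributes only once.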

SMAU Definition & Dashboard Tracker

| Stage | SMAU (Title) | SMAU (Definition) | Dashboard Link (Periscope) |
| --- | --- | --- | --- |
| Configure | Configure MAU | https://gitlab.com/gitlab-org/telemetry/issues/53 | https://app.periscopedata.com/app/gitlab/462967/Configure-Metrics |
| Create | Create MAU | https://gitlab.com/gitlab-org/telemetry/issues/49 | |
| Defend | Defend MAU | https://gitlab.com/gitlab-org/telemetry/issues/56 | Dashboard is unavailable at this time. Defend is a new stage with 0 users to date. |
| Enablement | Enablement MAU | https://gitlab.com/gitlab-org/telemetry/issues/107 | |
| Manage | Manage MAU | https://gitlab.com/gitlab-org/telemetry/issues/47 | https://app.periscopedata.com/app/gitlab/473113/Manage-Stage-Dashboard |
| Monitor | Monitor MAU | https://gitlab.com/gitlab-org/telemetry/issues/54 | |
| Package | Package MAU | https://gitlab.com/gitlab-org/telemetry/issues/51 | |
| Plan | Plan MAU | https://gitlab.com/gitlab-org/telemetry/issues/48 | |
| Release | Release MAU | https://gitlab.com/gitlab-org/telemetry/issues/52 | https://app.periscopedata.com/app/gitlab/451468/Release-Stage-Dashboard |
| Secure | Secure MAU | https://gitlab.com/gitlab-org/telemetry/issues/55 | https://app.periscopedata.com/app/gitlab/410654/Secure-Metrics |
| Verify | Verify MAU | https://gitlab.com/gitlab-org/telemetry/issues/50 | |

Dashboards

Dashboards at GitLab have been created to give the team better insight into how various parts of the company are performing. Today in Periscope, we have dashboards ranging from GitLab's overarching company KPIs, financial performance, Customer Success, and Sales forecasts to Product-focused SMAU dashboards.

How to: Build Your Own Dashboard in Periscope

Please reference the “Self-Serve Analysis in Periscope” handbook page for a thorough step-by-step guide.

Quick steps to get started:

How to: Request a Dashboard

Data Fields Available for Tracking Today

A variety of data is available for reporting today.

What do I do if the event I need to track is not available today?

If you do not see an event in the table above, we have outlined a best-practices guide for implementing tracking here. In it, we cover:

Tracking and Instrumentation Overview

Up until now, the GitLab codebase has been optimized for the application. Now, we need to optimize the codebase for analytics.

Today we use two systems to track users and usage in our product: Snowplow and Usage Ping. Below, we break down the best way to gain insights into both GitLab.com and self-hosted versions of GitLab.

What do we have in place today and where are we headed?

| Driver | GitLab.com | Self-Managed |
| --- | --- | --- |
| % of Revenue | 10% | 90% |
| MAU | ~750K | Millions (paid and CE ~5.5M) |
| Ease of Data Collection | Easy | Complicated |
| Data Sources Today | GitLab.com db, Snowplow (basic) | Usage Ping |
| Data Sources in Future | GitLab.com db, Snowplow (enhanced), Pendo | Usage Ping, (Snowplow?), Pendo |
| Opt-out? | TBD | Yes |

GitLab.com Instrumentation

The primary system for extracting insights from GitLab.com is Snowplow. It allows us to track user-level events, including frontend, backend, and custom events. Snowplow also gives us some flexibility in how we manage historical data.
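As an illustration of a custom event, the sketch below builds the request a tracker might send to a Snowplow collector for a structured event. It assumes the Snowplow tracker protocol's structured-event parameters (`e=se`, `se_ca`, `se_ac`, `se_la`, `se_pr`, `se_va`) and its GET pixel path (`/i`). The collector host, app ID, and the category and action strings are hypothetical; in practice a Snowplow tracker library builds and sends these requests for you.

```python
from urllib.parse import urlencode

# Hypothetical collector host; "/i" is the tracker protocol's GET (pixel) path.
COLLECTOR_URL = "https://collector.example.com/i"


def structured_event_url(category, action, label=None, prop=None, value=None,
                         app_id="gitlab", platform="web"):
    """Build a GET URL for a Snowplow structured event."""
    params = {
        "e": "se",          # event type: structured event
        "aid": app_id,      # application ID (hypothetical value)
        "p": platform,      # platform the event came from
        "se_ca": category,  # event category
        "se_ac": action,    # event action
    }
    # Optional structured-event fields are only included when provided.
    optional = {"se_la": label, "se_pr": prop, "se_va": value}
    params.update({k: str(v) for k, v in optional.items() if v is not None})
    return f"{COLLECTOR_URL}?{urlencode(params)}"


url = structured_event_url("projects:issues:show", "click_button", label="close_issue")
print(url)
# https://collector.example.com/i?e=se&aid=gitlab&p=web&se_ca=projects%3Aissues%3Ashow&se_ac=click_button&se_la=close_issue
```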

Self-hosted Instrumentation

The primary system for extracting insights from GitLab's self-hosted offering is Usage Ping. This system collects high-level data to help our product, support, and sales teams. It does not send any project names, usernames, or any other specific or personal data. The information from Usage Ping is not anonymous; it is linked to the hostname of the instance. Sending Usage Ping data is optional.

Telemetry Technologies & Services

In this section, we explain the technologies and services we use to provide data insights and visualizations that help tell the story of a product's usage and answer questions pertinent to building world-class products. We break down Usage Ping, Snowplow, Snowflake, and Pendo.

Usage Ping

Status: In production, ready for use

Impacts Self-hosted and GitLab.com

GitLab sends a weekly payload containing usage data to GitLab Inc. Usage Ping uses high-level data to help our product, support, and sales teams. It does not send any project names, usernames, or any other specific data. The information from Usage Ping is not anonymous; it is linked to the hostname of the instance. Sending Usage Ping data is optional, and any instance can disable analytics.

The usage data is primarily composed of row counts for different tables in the instance’s database. By comparing these counts month over month (or week over week), we can get a rough sense for how an instance is using the different features within the product.

In addition to row counts, there are many boolean flags indicating which features the instance has enabled. The payload also tells us what version of GitLab the instance is currently running and how many users are active on the instance.
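A minimal sketch of the month-over-month comparison described above, assuming a simplified payload with `counts`, `features`, `version`, and `active_user_count` keys. These key names and all values are illustrative assumptions, not the exact Usage Ping schema.

```python
# Simplified, hypothetical Usage Ping payloads from one instance, a month apart.
# Key names are illustrative assumptions, not the exact production schema.
previous = {
    "version": "12.2.0",
    "active_user_count": 40,
    "counts": {"issues": 1200, "merge_requests": 300, "ci_pipelines": 0},
    "features": {"ldap_enabled": True, "container_registry_enabled": False},
}
current = {
    "version": "12.3.0",
    "active_user_count": 42,
    "counts": {"issues": 1350, "merge_requests": 310, "ci_pipelines": 25},
    "features": {"ldap_enabled": True, "container_registry_enabled": True},
}


def count_changes(prev, curr):
    """Change in each table's row count between two payloads (missing tables count as 0)."""
    prev_counts, curr_counts = prev.get("counts", {}), curr.get("counts", {})
    return {
        table: curr_counts.get(table, 0) - prev_counts.get(table, 0)
        for table in sorted(prev_counts.keys() | curr_counts.keys())
    }


def newly_used_tables(prev, curr):
    """Tables that went from zero rows to some rows: a rough signal of first use of a feature."""
    prev_counts = prev.get("counts", {})
    return sorted(
        table
        for table, rows in curr.get("counts", {}).items()
        if rows > 0 and prev_counts.get(table, 0) == 0
    )


def enabled_features(payload):
    """Names of the boolean feature flags that are switched on."""
    return sorted(name for name, on in payload.get("features", {}).items() if on)


print(current["version"], current["active_user_count"])  # 12.3.0 42
print(count_changes(previous, current))  # {'ci_pipelines': 25, 'issues': 150, 'merge_requests': 10}
print(newly_used_tables(previous, current))  # ['ci_pipelines']
print(enabled_features(current))  # ['container_registry_enabled', 'ldap_enabled']
```

Because the payload carries aggregate counts rather than per-user events, this kind of comparison only gives the rough, instance-level sense of feature usage described above.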

Related Usage Ping Links:

Usage Ping limitations

Snowplow

Status: In production, ready for use

Impacts GitLab.com only

Snowplow is an enterprise-strength marketing and product analytics platform. It does three things:

  1. Identifies your users and tracks the way they engage with your website or application
  2. Stores your users' behavioral data in a scalable "event data warehouse" you control: in Amazon S3 and (optionally) Amazon Redshift or Postgres (we use Postgres)
  3. Lets you leverage a wide range of tools to analyze that behavioral data, including big data tools (e.g., Spark) via EMR or more traditional tools such as Looker, Mode, Superset, and Re:dash (we use Periscope)
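To illustrate what querying that event warehouse can look like, here is a sketch that uses Python's built-in `sqlite3` as a stand-in for Postgres. The `events` table mirrors a small subset of the columns in Snowplow's `atomic.events` table (`event`, `se_category`, `se_action`, `user_id`, `collector_tstamp`); the rows are made up, and the SQL may need small dialect adjustments to run against Postgres.

```python
import sqlite3

# In-memory SQLite stands in for the Postgres event warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        event TEXT, se_category TEXT, se_action TEXT,
        user_id TEXT, collector_tstamp TEXT
    )
    """
)
rows = [
    ("struct", "projects:issues:show", "click_button", "u1", "2019-09-03 10:00:00"),
    ("struct", "projects:issues:show", "click_button", "u2", "2019-09-03 11:00:00"),
    ("struct", "projects:issues:show", "click_button", "u1", "2019-09-04 09:30:00"),
    ("page_view", None, None, "u3", "2019-09-04 09:31:00"),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)

# Daily unique users per structured-event action; page views are excluded.
query = """
    SELECT date(collector_tstamp) AS day, se_action, COUNT(DISTINCT user_id) AS users
    FROM events
    WHERE event = 'struct'
    GROUP BY day, se_action
    ORDER BY day, se_action
"""
daily = conn.execute(query).fetchall()
conn.close()
print(daily)  # [('2019-09-03', 'click_button', 2), ('2019-09-04', 'click_button', 1)]
```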

Snowplow technology 101

The repository structure follows the conceptual architecture of Snowplow, which consists of six loosely coupled subsystems connected by five standardized data protocols/formats:

To briefly explain these six subsystems:


Snowflake

Status: In production, ready for use

Houses data from both GitLab.com and self-hosted instances

Snowflake is an analytic data warehouse provided as Software-as-a-Service (SaaS). Snowflake describes its data warehouse as faster, easier to use, and far more flexible than traditional data warehouse offerings.

Snowflake’s data warehouse is not built on an existing database or “big data” software platform such as Hadoop. The Snowflake data warehouse uses a new SQL database engine with a unique architecture designed for the cloud. To the user, Snowflake has many similarities to other enterprise data warehouses, but also has additional functionality and unique capabilities.

Pendo

Status: Not yet in production, awaiting approval, no ETA

Impacts GitLab.com only

Pendo will provide us with near-immediate value in three areas: a customizable messaging solution, in-app user guides, and self-service product analytics. Each area is discussed below.

Customizable messaging solution

While no stage currently plans to prioritize a customizable messaging framework, there has been some discussion of banners/messaging in this issue. If prioritized, this feature could most likely be delivered within a few releases, with an MVC in 2-3 releases. It would not have all of Pendo's bells and whistles, but it could be useful for immediate communication with customers.

In-app user guides

Building a fully customizable guide solution, where users (marketers, UX, and growth team members) could dynamically create new guides via an admin interface, would take longer than six months to complete, even if prioritized today. Individual guides for engineering to implement can be added in each release, but they will be difficult to experiment on.

Self-service product analytics

Pendo's product data analysis features are simple to use, and it is relatively easy to glean insights from Pendo. Similar product insights are difficult to draw out of our existing data architecture and require advanced SQL queries and/or help from data analysts. We are unlikely to replicate Pendo's easy analysis experience with our existing data tools in the next six months; doing so would likely require significant development against the existing solution, additional data analysts to support the product team, or different data visualization technologies.

Regarding SMAU, a usable dashboard for a single stage is one release away and depends only on additional improvements to Usage Ping and on the Data team creating a dashboard. Making SMAU more accurate will be an ongoing project. SMAU could also be calculated using page view data in Pendo.
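As a rough sketch of calculating SMAU from page views, the example below attributes each page view to a stage by URL path and counts unique visitors per stage for one month. The `STAGE_PATTERNS` mapping, the `(visitor_id, path, day)` tuple format, and the sample data are hypothetical; this does not use Pendo's actual export format or API.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical URL-path patterns that attribute a page view to a stage.
STAGE_PATTERNS = {
    "Plan": re.compile(r"/-/(issues|boards|milestones)\b"),
    "Create": re.compile(r"/-/(merge_requests|tree|blob)\b"),
    "Verify": re.compile(r"/-/(pipelines|jobs)\b"),
}


def smau_from_page_views(page_views, year, month):
    """Unique visitors per stage for one month, from (visitor_id, path, day) tuples."""
    visitors = defaultdict(set)
    for visitor_id, path, day in page_views:
        if (day.year, day.month) != (year, month):
            continue  # outside the reporting month
        for stage, pattern in STAGE_PATTERNS.items():
            if pattern.search(path):
                visitors[stage].add(visitor_id)
    return {stage: len(ids) for stage, ids in visitors.items()}


page_views = [
    ("v1", "/gitlab-org/gitlab/-/issues/123", date(2019, 9, 2)),
    ("v2", "/gitlab-org/gitlab/-/issues", date(2019, 9, 5)),
    ("v1", "/gitlab-org/gitlab/-/merge_requests/7", date(2019, 9, 6)),
    ("v3", "/gitlab-org/gitlab/-/pipelines", date(2019, 8, 31)),
]
result = smau_from_page_views(page_views, 2019, 9)
print(result)  # {'Plan': 2, 'Create': 1}
```

Page views measure visiting a stage's pages rather than taking a specified action, so this approach would approximate, not replace, the action-based SMAU definition.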

Priorities

Collection Priorities

Current focus

| Priority | Focus | Why? |
| --- | --- | --- |
| 1️⃣ | Define and measure monthly active users, overall and per stage | After this epic is closed, we should have an internally consistent view of MAU and SMAU across GitLab.com and self-managed. We should be able to measure active use in a Periscope dashboard, enabling us to improve MAU and SMAU. We can then tackle improving this further with SMAU/MAU v2.0. |

Next up

| Priority | Focus | Why? |
| --- | --- | --- |
| 2️⃣ | SMAU/MAU v2.0 | As our organization grows, we require better data to inform our product, marketing, and sales teams as they make decisions to grow the business and realize our strategic goals. This epic will serve as the aggregation of issues required to improve our monthly active user metrics, so that we can have a world-class data platform at GitLab. |
| 3️⃣ | Improve telemetry data collection from self-managed instances | Currently, we have little to no visibility into how many of our largest and most valuable customers are using GitLab. We need to understand how we can collect data more consistently from our self-managed users in order to better serve them. |
| 4️⃣ | Telemetry Documentation | As we roll out new changes and develop processes and workflows, it is important that we document everything clearly and transparently, in a way that is easily discoverable and digestible by both GitLab team members and users/customers. |

Analysis Priorities

Current focus

| Priority | Focus | Why? |
| --- | --- | --- |
| 1️⃣ | Pendo Implementation for GitLab-hosted services | Pendo.io is a popular, industry-standard data collection and analysis tool that many organizations use to gain insights into how their users are using their products. GitLab has decided to subscribe to and implement Pendo while we improve our in-house data collection and analysis options and develop the overall vision for the Telemetry group. |

Next up

| Priority | Focus | Why? |
| --- | --- | --- |
| 2️⃣ | Product Team Dashboards | In parallel with improving SMAU/MAU v2.0, it's important that we roll out the process defined by the Telemetry working group to all other stages so that each Product Manager has visibility into how users are using their stage and its categories. |

How we prioritize

We follow the same prioritization guidelines as the product team at large. Issues tend to flow from having no milestone, to being added to the backlog, to a directional milestone (e.g., Next 3-4 releases), and finally to a specific milestone.

Our entire public backlog for Telemetry can be viewed here, and can be filtered by labels or milestones. If you find something you are interested in, you're encouraged to jump into the conversation and participate. At GitLab, everyone can contribute!