Telemetry manages a variety of technologies that are important for GitLab's understanding of how our users use our products. These technologies include, but are not limited to, Snowplow Analytics, Pendo.io, and an in-house tool called Usage Ping, which is hosted on version.gitlab.com and also includes a separate service called Version Check.
If you'd like to discuss this vision directly with the Interim Product Manager for Telemetry, feel free to reach out to Tim Hey via e-mail.
The overall vision for the Telemetry group is to ensure that we have a robust, consistent, and modern telemetry data framework in place to best serve our internal Product, Finance, Sales, and Customer Success teams. The group also ensures that GitLab has the best visualization and analysis tools in place, so that we get the best possible insights into the data provided through the various collection tools we use.
|🧺 Collection||The structure(s) and platform(s) of how we collect Telemetry data|
|🔍 Analysis||Manages GitLab's internal needs for an analysis tool that serves the Product department|
Dashboards at GitLab have been created to give the team better insight into how various parts of the company are performing. Today in Periscope we have dashboards that range from GitLab's overarching company KPIs, financial performance, Customer Success, and Sales forecasts to Product-focused SMAU dashboards.
Please reference the “Self-Serve Analysis in Periscope” handbook page for a thorough step-by-step guide.
Quick steps to get started:
We have a variety of data available for reporting today.
If you do not see an event in the table above, we have outlined a best practices guide for implementing tracking here. In this guide we cover:
Up until now, the GitLab codebase has been optimized for the application. Now, we need to optimize the codebase for analytics.
Today we are using a few different systems to track users and usage in our product. Those systems are Snowplow and Usage Ping. Below we’ve broken down the best way to gain insights on both GitLab.com and the Self-hosted versions of GitLab.
|% of Revenue||10%||90%|
|MAU||~750K||Millions (paid and CE ~5.5M)|
|Ease of Data Collection||Easy||Complicated|
|Data Sources Today||GitLab.com db
|Data Sources in Future||GitLab.com db
The primary system to extract insights from GitLab.com is Snowplow. This system allows us to track user level events which includes frontend, backend and custom events. Snowplow also provides a level of flexibility for us to manage historical data.
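To make "user level events" a little more concrete, here is a minimal sketch of the five fields that Snowplow's structured events carry: category, action, label, property, and value. The `StructuredEvent` class, the `to_payload` helper, and the example values are hypothetical and only illustrate the shape of such an event. They are not GitLab's tracking code and not the Snowplow tracker API.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class StructuredEvent:
    """Hypothetical container for the fields of a Snowplow structured event."""

    category: str                    # grouping for the event, e.g. a feature area
    action: str                      # what the user or system did
    label: Optional[str] = None      # optional: the object acted on
    property_: Optional[str] = None  # optional: extra detail (underscore avoids the builtin name)
    value: Optional[float] = None    # optional: numeric value tied to the event

    def to_payload(self) -> dict:
        """Drop unset fields and strip the trailing underscore from field names."""
        return {
            name.rstrip("_"): field_value
            for name, field_value in asdict(self).items()
            if field_value is not None
        }


# Illustrative values only.
event = StructuredEvent(category="issues", action="click_button", label="new_issue")
payload = event.to_payload()
```

Only the fields that were set end up in `payload`, which keeps frontend and backend events comparable even when the optional fields differ.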
The primary system used to extract insights from GitLab's Self-hosted offering is Usage Ping. This system uses high-level data to help our product, support, and sales teams. It does not send any project names, usernames, or any other specific or personal data. The information from Usage Ping is not anonymous; it is linked to the hostname of the instance. Sending Usage Ping is optional.
In this section we explain the technologies and services we use to provide data insights and visualizations that help tell a story about a product's usage and answer questions pertinent to building world-class products. We will break down Usage Ping, Snowplow, Snowflake, and Pendo.
Status: in production, ready for use
Impacts Self-hosted and GitLab.com
GitLab sends a weekly payload containing usage data to GitLab Inc. The usage ping uses high-level data to help our product, support, and sales teams. It does not send any project names, usernames, or any other specific data. The information from the usage ping is not anonymous; it is linked to the hostname of the instance. Sending usage ping is optional, and any instance can disable analytics.
The usage data is primarily composed of row counts for different tables in the instance’s database. By comparing these counts month over month (or week over week), we can get a rough sense for how an instance is using the different features within the product.
In addition to row counts, there are many boolean flags indicating which features the instance has enabled. The payload also tells us what version of GitLab the instance is currently running and how many users are active on the instance.
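The comparison described above can be sketched in a few lines. The payload layout and every field name below (`counts`, `flags`, `active_user_count`, and so on) are assumptions made up for illustration; the real Usage Ping payload may be structured differently. The sketch only shows the idea of diffing row counts between two payloads and spotting feature flags that were switched on.

```python
from typing import Dict, List


def count_deltas(previous: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Change in each row count between two payloads (a missing key counts as 0)."""
    keys = sorted(set(previous) | set(current))
    return {key: current.get(key, 0) - previous.get(key, 0) for key in keys}


def newly_enabled(previous: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Names of flags that are on now but were off or absent before."""
    return sorted(name for name, on in current.items() if on and not previous.get(name, False))


# Hypothetical payloads from one instance, one week apart.
last_week = {
    "hostname": "gitlab.example.com",
    "version": "12.3.0",
    "active_user_count": 40,
    "counts": {"issues": 120, "merge_requests": 45, "ci_pipelines": 300},
    "flags": {"container_registry_enabled": False},
}
this_week = {
    "hostname": "gitlab.example.com",
    "version": "12.4.0",
    "active_user_count": 42,
    "counts": {"issues": 150, "merge_requests": 45, "ci_pipelines": 420, "epics": 3},
    "flags": {"container_registry_enabled": True},
}

deltas = count_deltas(last_week["counts"], this_week["counts"])
enabled = newly_enabled(last_week["flags"], this_week["flags"])
```

Because the payload contains only aggregate counts, a delta shows that a feature is being used more, but not who is using it or how.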
Related Usage Ping Links:
Status: in production, ready for use
Impacts GitLab.com only
Snowplow is an enterprise-strength marketing and product analytics platform. It does three things:
Snowplow technology 101
The repository structure follows the conceptual architecture of Snowplow, which consists of six loosely-coupled sub-systems connected by five standardized data protocols/formats:
To briefly explain these six sub-systems:
Status: in production, ready for use
Houses both data from GitLab.com and Self-Hosted
Snowflake is an analytic data warehouse provided as Software-as-a-Service (SaaS). Snowflake provides a data warehouse that is faster, easier to use, and far more flexible than traditional data warehouse offerings.
Snowflake’s data warehouse is not built on an existing database or “big data” software platform such as Hadoop. The Snowflake data warehouse uses a new SQL database engine with a unique architecture designed for the cloud. To the user, Snowflake has many similarities to other enterprise data warehouses, but also has additional functionality and unique capabilities.
Status: Not yet in production, awaiting approval, no ETA
Impacts GitLab.com only
Pendo will provide us with near immediate value in the following areas:
These three areas are highlighted below:
There are no current plans in any stage to prioritize a customizable messaging framework, but there has been some discussion of banners/messaging in this issue. If prioritized, this feature could most likely be delivered in a few releases, with an MVC in 2-3 releases. It would not have all of the bells and whistles of Pendo, but could be useful for immediate communication with customers.
Building a fully customizable guide solution, where users (marketers, UX, growth team members) could dynamically create new guides via an admin interface, will take longer than six months to complete even if prioritized today. Individual guides for engineering to implement can be added to each release, but will be difficult to experiment on.
Product data analysis is simple in Pendo, and it is relatively easy to glean insights from it. Similar product insights are difficult to draw out of our existing data architecture and require advanced SQL queries and/or help from data analysts. We are unlikely to replicate Pendo's easy analysis experience with our existing data tools in the next six months. Doing so would likely require significant development against the existing solution, additional data analysts to support the product team, or different data visualization technologies.
Regarding SMAU, a usable dashboard for a single stage is one release away and is only dependent on additional improvements to usage ping and the Data team creating a dashboard. Making SMAU more accurate will be an ongoing project. SMAU could also be calculated using the page view data in Pendo.
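As a rough illustration of what measuring active users "overall and per-stage" involves, the sketch below counts distinct users per calendar month from a list of events. The event shape `(user_id, stage, day)`, the function name, and the sample data are all hypothetical, and the sketch assumes SMAU means monthly active users counted per stage over a calendar month. It is not how the Data team or Pendo computes the metric.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Set, Tuple

Event = Tuple[str, str, date]  # (user_id, stage, day); a hypothetical event shape


def monthly_active_users(events: Iterable[Event], year: int, month: int) -> Tuple[int, Dict[str, int]]:
    """Return (MAU, per-stage MAU) for one calendar month, counting each user once."""
    overall: Set[str] = set()
    per_stage: Dict[str, Set[str]] = defaultdict(set)
    for user_id, stage, day in events:
        if day.year == year and day.month == month:
            overall.add(user_id)
            per_stage[stage].add(user_id)
    return len(overall), {stage: len(users) for stage, users in per_stage.items()}


# Illustrative events only.
events = [
    ("u1", "create", date(2019, 10, 1)),
    ("u1", "create", date(2019, 10, 5)),  # repeat activity, same user and stage
    ("u2", "verify", date(2019, 10, 7)),
    ("u1", "verify", date(2019, 10, 9)),
    ("u3", "plan", date(2019, 9, 30)),    # outside the month being measured
]

mau, smau = monthly_active_users(events, 2019, 10)
```

A user who is active in two stages counts once toward overall MAU but once in each stage, so the per-stage numbers can add up to more than the overall figure.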
|1️⃣||Define and measure monthly active users, overall and per-stage||After this epic is closed, we should have an internally consistent view of MAU and SMAU across GitLab.com and self-managed. We should be able to measure active use in a Periscope dashboard, enabling us to improve MAU and SMAU. We can then tackle improving this further with SMAU/MAU v2.0.|
|2️⃣||SMAU/MAU v2.0.||As our organization grows, we require better data to inform our product, marketing, and sales team as they make decisions to grow the business and realize our strategic goals. This epic will serve as the aggregation of issues required to improve our monthly active user metrics, so we can have a world-class data platform at GitLab.|
|3️⃣||Improve telemetry data collection from self-managed instances||Currently we have little to no visibility into how many of our largest and most valuable customers are using GitLab. We need to understand how we can collect data more consistently from our Self-Managed users in order to better serve them.|
|4️⃣||Telemetry Documentation||It is important that as we roll out new changes and develop processes and workflows, we clearly and transparently document everything in a way that is easily discoverable and digestible by both GitLab team members and users/customers.|
|1️⃣||Pendo Implementation for GitLab hosted services||Pendo.io is a popular, industry-standard data collection and analysis tool used by many organisations to gain insights into how their users are using their products. GitLab has decided to subscribe to and implement Pendo while we improve our in-house data collection and analysis options and develop the overall vision for the Telemetry group.|
|2️⃣||Product Team Dashboards||In parallel with improving SMAU/MAU v2.0, it's important that we roll out the process defined by the Telemetry working group to all of the other stages so that each Product Manager has visibility into how users are using their stage and stage categories.|
We follow the same prioritization guidelines as the product team at large. Issues tend to flow from having no milestone, to being added to the backlog, to a directional milestone (e.g. Next 3-4 releases), and are finally assigned a specific milestone.
Our entire public backlog for Telemetry can be viewed here, and can be filtered by labels or milestones. If you find something you are interested in, you're encouraged to jump into the conversation and participate. At GitLab, everyone can contribute!