This page is intended to help Product Managers at GitLab understand what data is available to them and how they can use it to understand how their product is used. This page primarily covers two topics: how to consume data, and what data is available.
The user-facing end of GitLab's data stack consists of our BI tool, Sisense (formerly known as Periscope), which is connected to our Snowflake data warehouse. The Sisense page of the data team handbook has general information about Sisense aimed at a wider GitLab audience.
Here are some useful links that we recommend you bookmark:
Projects > gitlab_snowflake > models will give a list of all of the models (think of these as tables) that exist in the data warehouse. Models are organized in directories according to their data source.
legacy.gitlab_dotcom_groups
You will need to locate the file you wish to update or create in the gitlab-data analytics project. Please be sure to read and follow the SQL style guide when making your changes. If you only want to update the descriptions or other information about tables, look for a schema.yml file. If you want to change the structure of a table, edit its *.sql file.
Next, create a branch and submit an MR to the gitlab-data analytics project using the dbt Model Changes template. When creating your branch and MR, please follow the data team workflow and use the appropriate data team labels.
The first question the data team usually asks product managers is "Are you interested in knowing this for self-managed or GitLab.com?" Our approach to answering your question differs greatly between the two: although our self-managed offering has many more active customers, our GitLab.com offering has much more data available to analyze.
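In common_mart.mart_ping_instance, the ping_delivery_type field distinguishes GitLab.com (SaaS) pings from self-managed ones.
Query example filtering out GitLab.com: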
SELECT *
FROM common_mart.mart_ping_instance
WHERE ping_delivery_type != 'SaaS'
LIMIT 100
;
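To check what that filter keeps and drops, the sketch below runs the same WHERE clause, plus a count per delivery type, against a tiny in-memory SQLite stand-in for common_mart.mart_ping_instance. Only ping_delivery_type and the 'SaaS' value come from the query above. The sample rows and the 'Self-Managed' value are hypothetical. One detail worth knowing: ping_delivery_type != 'SaaS' also drops rows where the field is NULL, because a comparison with NULL is never true.

```python
import sqlite3

# In-memory stand-in for the warehouse. ATTACH creates a "common_mart" schema
# so the table name matches the Snowflake query.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS common_mart")
conn.execute("CREATE TABLE common_mart.mart_ping_instance (ping_delivery_type TEXT)")

# Hypothetical sample rows: two SaaS pings, three self-managed, one with no type.
conn.executemany(
    "INSERT INTO common_mart.mart_ping_instance VALUES (?)",
    [("SaaS",), ("SaaS",), ("Self-Managed",), ("Self-Managed",),
     ("Self-Managed",), (None,)],
)

# Same filter as the example query: everything that is not GitLab.com.
non_saas = conn.execute(
    "SELECT COUNT(*) FROM common_mart.mart_ping_instance"
    " WHERE ping_delivery_type != 'SaaS'"
).fetchone()[0]

# One row per delivery type; NULL forms its own group here.
by_type = dict(conn.execute(
    "SELECT ping_delivery_type, COUNT(*) FROM common_mart.mart_ping_instance"
    " GROUP BY ping_delivery_type"
).fetchall())

print(non_saas)  # 3: the NULL row is excluded by != 'SaaS'
print(by_type)
```

If untyped rows matter for your question, add OR ping_delivery_type IS NULL to the filter.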
Snippets are a great way to let Sisense users build charts without writing any SQL. Anyone with editor access can write their own snippets, and the data team has created several snippets that carry the official badge. To find the list of available snippets, click the scissors icon in the left menu.
We created several snippets that let you quickly chart feature usage from the Service Ping data source without writing any SQL. You can find details about those snippets on the Product Manager Toolkit handbook page.
Because Service Ping reports data at the instance level, it is not very useful for GitLab.com, where we often want information at the namespace level. For example, knowing that 40K people used your stage on GitLab.com is somewhat useful, but you'll probably want more context: Are they free or paid? What plan are they on? Do I have any power users, or is usage evenly distributed?
SELECT COUNT(*) FROM x statements, making it trivial to replicate.
We do not capture a user_id on any of the Snowplow events, which makes all events functionally anonymous. This severely limits the utility of these events.
Examples
A lot!
Because Snowplow doesn't rely on Service Ping and is used mainly for GitLab SaaS, its data is available much sooner (as soon as the feature is deployed) and is faster to visualize.
As mentioned, anonymization is a major limitation of Snowplow events, but their fast feedback loop still makes them an effective source of data for measuring feature adoption and usage.
We recommend that Product Managers and their teams use Snowplow custom structured events, which are Snowplow's canonical events. We have built Tracking and Gitlab::Tracking, two wrappers for the Snowplow JavaScript and Ruby trackers, respectively.
To get started, use the Snowplow event tracking template when creating a new Snowplow tracking issue.
Please read our Snowplow Guide for more information about the recommended taxonomy.
Once your Snowplow events have been instrumented, each new event should be tested as part of the validation process to ensure it works properly. While you as the PM probably won't do the validation yourself every time, it is useful to know how it works. The content under this heading should help you get started.
Testing Snowplow events can be tricky because Snowplow doesn't have a proper testing interface. However, several tools can help you debug, test, and validate your event implementation:
The data you have instrumented is only useful if it can be visualized in a chart. Follow these steps to create a chart in a Sisense dashboard.
legacy.snowplow_structured_events_all: contains ALL structured events
legacy.snowplow_page_views_all: contains ALL page views
legacy.snowplow_unstructured_events_all: contains ALL unstructured events (including click events, form submissions, etc.)
PRO TIP: Optimizing queries
To make your query faster, use a date filter in your WHERE clause.
Example query:
SELECT
event_action,
COUNT(*) AS event_count
FROM legacy.snowplow_structured_events_all
WHERE derived_tstamp > CURRENT_DATE-30
GROUP BY 1
ORDER BY 2 DESC
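To see what that query returns before pointing it at the warehouse, the sketch below runs the same shape against a small in-memory SQLite stand-in for legacy.snowplow_structured_events_all. Two substitutions are needed for SQLite, and both are assumptions of this sketch rather than anything used in Sisense. First, Snowflake's CURRENT_DATE-30 becomes date(:as_of, '-30 days'), with a fixed as_of date so the result is reproducible. Second, GROUP BY and ORDER BY use column names instead of positions. The action names are reused from the snippet example further down this page. The timestamps are hypothetical sample data. The speedup from the date filter only shows up on the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS legacy")
conn.execute(
    "CREATE TABLE legacy.snowplow_structured_events_all ("
    " event_action TEXT, derived_tstamp TEXT)"
)

# Hypothetical events. ISO-8601 timestamp strings compare correctly as text.
conn.executemany(
    "INSERT INTO legacy.snowplow_structured_events_all VALUES (?, ?)",
    [
        ("view_alerts_list", "2024-06-20 10:00:00"),
        ("view_alerts_list", "2024-06-21 11:30:00"),
        ("view_alert_details", "2024-06-21 12:00:00"),
        ("update_alert_status", "2024-04-01 09:00:00"),  # older than 30 days
    ],
)

# Same shape as the example query. The fixed as_of date stands in for
# CURRENT_DATE so the output does not depend on when the script runs.
query = """
    SELECT event_action, COUNT(*) AS event_count
    FROM legacy.snowplow_structured_events_all
    WHERE derived_tstamp > date(:as_of, '-30 days')
    GROUP BY event_action
    ORDER BY event_count DESC
"""
rows = conn.execute(query, {"as_of": "2024-06-30"}).fetchall()
print(rows)  # [('view_alerts_list', 2), ('view_alert_details', 1)]
```

The April event falls outside the 30-day window, so update_alert_status does not appear in the result.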
Use these SQL snippets to help you get started visualizing your Snowplow events.
Snippet | Intended Use | Example |
---|---|---|
Chart Snowplow Actions(actions,categories) | Plot the Snowplow events by action or category | [Chart Snowplow Actions("('view_alerts_list', 'update_alert_status', 'view_alert_details')","('Alert Management')")] |
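The snippet's own SQL is not shown on this page. The sketch below is a hypothetical approximation of what a chart like Chart Snowplow Actions plots: daily event counts per action, limited to the given actions and categories. The function name chart_snowplow_actions and the event_category column are assumptions. The sample rows, including the "Incident Management" category, are hypothetical. It runs against an in-memory SQLite stand-in so the filtering logic can be checked locally.

```python
import sqlite3


def chart_snowplow_actions(conn, actions, categories):
    """Hypothetical approximation of the Chart Snowplow Actions snippet:
    daily event counts per action, filtered to the given actions and categories."""
    # Build one "?" placeholder per value. The values themselves are passed as
    # parameters, never pasted into the SQL text.
    action_marks = ", ".join("?" for _ in actions)
    category_marks = ", ".join("?" for _ in categories)
    query = f"""
        SELECT date(derived_tstamp) AS event_date, event_action, COUNT(*) AS event_count
        FROM legacy.snowplow_structured_events_all
        WHERE event_action IN ({action_marks})
          AND event_category IN ({category_marks})
        GROUP BY date(derived_tstamp), event_action
        ORDER BY event_date, event_action
    """
    return conn.execute(query, [*actions, *categories]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS legacy")
conn.execute(
    "CREATE TABLE legacy.snowplow_structured_events_all ("
    " event_category TEXT, event_action TEXT, derived_tstamp TEXT)"
)
conn.executemany(
    "INSERT INTO legacy.snowplow_structured_events_all VALUES (?, ?, ?)",
    [
        ("Alert Management", "view_alerts_list", "2024-06-20 10:00:00"),
        ("Alert Management", "view_alerts_list", "2024-06-20 15:00:00"),
        ("Alert Management", "update_alert_status", "2024-06-21 09:00:00"),
        ("Incident Management", "view_alerts_list", "2024-06-21 09:30:00"),  # other category
    ],
)

# Same arguments as the table's example.
rows = chart_snowplow_actions(
    conn,
    ["view_alerts_list", "update_alert_status", "view_alert_details"],
    ["Alert Management"],
)
for row in rows:
    print(row)
```

Actions with no matching events, such as view_alert_details here, simply produce no rows. A charting tool would show them as absent rather than as zero.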