There are two main tools that we use for tracking user data: Service Ping and Snowplow. Here are the main differences between these two tools:
| | Snowplow | Service Ping |
|---|---|---|
| Type of data | Snowplow collects events, which are interactions with the application, such as the date and time of a visit and the feature or functionality that has been clicked on or used. | Service Ping reports only cumulative counts of things (for example, `count.epics` and `analytics_unique_visits.p_analytics_repo`) and settings/instance information (for example, `database.version` and `container_registry_enabled`). |
| Cadence | Snowplow events are collected and sent to the data warehouse as they occur. | Self-managed: Service Ping is collected from individual self-managed instances weekly via an automated process, according to a random distribution schedule. SaaS: Service Ping is collected from GitLab.com weekly in a single payload via a manual process. |
| Scope | Snowplow event collection is currently only available in SaaS. Snowplow can collect events for both frontend and backend activities. | Service Ping is available for both self-managed and SaaS. Service Ping only collects backend data. |
| Availability | When new Snowplow events are instrumented, data begins flowing to the data warehouse immediately once the code is deployed to GitLab. | When new Service Ping metrics are instrumented, only self-managed instances which have upgraded to the GitLab version in which the metric instrumentation is available will start reporting applicable data. |
| Analysis | Snowplow events can be parsed downstream by `namespace_id`, `project_id`, and/or pseudonymized `user_id`. Google Analytics ID and Snowplow ID are mapped, which allows downstream analysis of (pseudonymized) user journeys inside and outside the product (issue reference). Snowplow data is enriched with browser-specific metadata, for example the name of the browser used, the user's timezone, and the page URL. Snowplow automatically records all `page view` events on GitLab. | Self-managed Service Ping product usage data can be tied to customers, but SaaS Service Ping data cannot currently be tied to the customer/namespace level. |
| Example use cases | (1) Track how many users entered the Issue board page (Snowplow already records all page views, so there's no need to add an additional Service Ping metric for that). (2) Track how many times users have clicked the "new issue" button (clicking a button is a frontend-only event, so it cannot be tracked with Service Ping). (3) Track how many users entered the Pricing page after checking the Partners page (Snowplow is able to track user journeys). (4) Track how many users are viewing the handbook in Firefox (Snowplow events include browser metadata; in Service Ping, we don't have access to that data). | (1) Track how many different labels exist on a given GitLab instance (this cannot be tracked with Snowplow, because it is not an event; with Snowplow, we would be able to track, for example, how many times the label creation event has happened). (2) Track whether a GitLab instance has the Gravatar feature enabled (this cannot be tracked with Snowplow, because it is not an event; it's a metric specific to a given instance). |
Service Ping consists of two kinds of data:

- Cumulative counts of things (for example, `count.epics`).
- Settings/instance information (for example, `database.version` and `container_registry_enabled`).
The next step after deciding on the new metrics' data type is choosing the metric counter type.
The metric counter type depends on the type of data to be tracked:
After determining the metric counter type, it's time to implement the counter incrementing!
This step can be skipped for Database metrics and Generic metrics. These metrics do not need separate counter incrementation logic because they grab the data straight from GitLab's database/config.
To track Redis counter events, we offer a frontend API described in the Service Ping guide, and a JavaScript/Vue helper for using this API.
To add a RedisHLL counter, you must add the event definition and then implement the counter incrementation.
To define a new event, you must add a `.yml` file to the `known_events` folder. See Add new events for detailed information about defining events.
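For illustration, a `known_events` entry is a small YAML record naming the event and its aggregation window. The exact schema depends on your GitLab version, and the event name and category below are made up:

```yaml
# Illustrative known_events entry; the event name and category are hypothetical.
- name: i_analytics_example_event
  category: analytics
  aggregation: weekly
```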
For RedisHLL counters, you can implement the counter incrementation logic in multiple ways.
In the backend, you can implement the counter incrementation logic in:

- the `RedisTracking` module,
- the `increment_unique_values` method,
- the `track_usage_event` method.

In the frontend, you can implement the counter incrementation logic in the `POST /usage_data/increment_unique_users` API endpoint and its JavaScript/Vue helper.

The tracking methods' implementation is described in detail in point 2 of the Service Ping guide.
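As a mental model for what these tracking methods do (this is not GitLab's actual implementation, and all names here are illustrative), a RedisHLL counter deduplicates user IDs per event and per time window, so repeated actions by the same user count once. The sketch below simulates that with a plain Ruby `Set` standing in for the Redis HyperLogLog structure:

```ruby
require 'set'
require 'date'

# Illustrative stand-in for a RedisHLL counter: each (event, ISO week) bucket
# stores a set of user IDs, so repeat events by the same user count once.
class UniqueEventCounter
  def initialize
    @buckets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Record that `user_id` triggered `event_name` at `time`.
  def track_event(event_name, user_id, time: Time.now)
    @buckets[[event_name, time.to_date.cweek]] << user_id
  end

  # Number of distinct users for the event in the given ISO week.
  def unique_count(event_name, week)
    @buckets[[event_name, week]].size
  end
end

counter = UniqueEventCounter.new
timestamp = Time.new(2023, 5, 10)
counter.track_event('i_example_event', 1, time: timestamp)
counter.track_event('i_example_event', 1, time: timestamp) # same user, counted once
counter.track_event('i_example_event', 2, time: timestamp)
counter.unique_count('i_example_event', timestamp.to_date.cweek) # => 2
```

The real implementation uses Redis HyperLogLog, which trades exactness for constant memory; the deduplication behavior is the part worth remembering.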
Now that the events are being recorded, the next step is adding an instrumentation class that will define how they are counted. The instrumentation class implementation will depend on the metrics counter type.
See Database metrics for a definition of the database metrics instrumentation class.
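To give a flavor of the pattern without reproducing GitLab's actual `DatabaseMetric` base class, the self-contained sketch below shows the general shape of such a class: a subclass declares an operation and a relation, and `value` applies one to the other. All names and the in-memory `ISSUES` stand-in are illustrative:

```ruby
# Self-contained analogue of a database-metric instrumentation class
# (not GitLab's real DatabaseMetric): subclasses declare an operation
# and a relation, and #value applies the operation to the relation.
class DatabaseMetric
  def self.operation(op); @operation = op; end
  def self.relation(&block); @relation = block; end
  def self.config; [@operation, @relation]; end

  def value
    op, rel = self.class.config
    rel.call.public_send(op)
  end
end

# In-memory stand-in for a database table.
ISSUES = [{ id: 1 }, { id: 2 }, { id: 3 }]

# Illustrative metric counting rows of the stand-in table.
class CountIssuesMetric < DatabaseMetric
  operation :count
  relation { ISSUES }
end

CountIssuesMetric.new.value # => 3
```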
Redis metrics typically don't require adding an instrumentation class; instead, they reuse the already defined `RedisMetric` class. A new instrumentation class only needs to be added if we need to define the metric's availability.

RedisHLL metrics typically don't require adding an instrumentation class; instead, they reuse the already defined `RedisHLLMetric` class. A new instrumentation class only needs to be added if we need to define the metric's availability.
See Generic metrics for a definition of the generic metrics instrumentation class.
Now that the events have an instrumentation class defined, the next step is adding them to the Service Ping data payload and to the Metrics Dictionary. Both of these goals can be achieved by adding a single YML event configuration file.
See Metrics Definition and validation for instructions on adding the YAML event configuration file. The instrumentation class defined in the previous step of this guide should be used as the value of the `instrumentation_class` attribute in the newly created configuration file.
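A metric configuration file is a YAML document describing the metric and pointing at its instrumentation class. The sketch below shows roughly what such a file might look like; the key path, event name, group, and attribute values are hypothetical, and the exact set of required attributes depends on your GitLab version:

```yaml
# Illustrative metric definition; all values here are made up.
key_path: redis_hll_counters.analytics.i_analytics_example_event_weekly
description: Count of unique users who triggered the example event
product_group: analytics
value_type: number
status: active
time_frame: 7d
data_source: redis_hll
instrumentation_class: RedisHLLMetric
options:
  events:
    - i_analytics_example_event
```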
There are multiple ways of implementing Snowplow tracking, depending on the framework used. However, regardless of the framework, the events need to have at least two main attributes defined: their `action` and `category`. The values that these (and other) attributes should take are explained with examples in the event taxonomy guide. It's also possible to see the structure of existing events in the Metrics Dictionary.

The way in which those properties are passed to Snowplow depends on the framework used.
The framework options with their respective guides are:
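Whatever the framework, every event boils down to a structure carrying at least `category` and `action`, plus optional attributes such as `label`, `property`, and `value`. The self-contained sketch below illustrates that shape and the two required attributes; it is not GitLab's tracking API, and all names and values are illustrative:

```ruby
# Illustrative sketch of the minimal Snowplow event structure. Real tracking
# goes through the framework-specific helpers linked above.
def build_event(category:, action:, label: nil, property: nil, value: nil)
  raise ArgumentError, 'category is required' if category.to_s.empty?
  raise ArgumentError, 'action is required' if action.to_s.empty?

  # Drop optional attributes that were not provided.
  { category: category, action: action,
    label: label, property: property, value: value }.compact
end

event = build_event(category: 'merge_requests',
                    action: 'click_new_issue_button',
                    label: 'new_issue')
# event => { category: "merge_requests", action: "click_new_issue_button", label: "new_issue" }
```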
To verify the added changes using your local environment, check local setup instructions.
To make sure the MR is ready for review, check out the review guidelines. They also include information about what the Analytics Instrumentation reviewer will later check in the MR.
If you want to verify the event attributes passed to Snowplow, check out the event taxonomy guide. You can also check the Metrics dictionary for examples of already existing implementations.
To verify the added changes using your local environment, check the Snowplow testing guide.
To make sure the MR is ready for review, check out the review guidelines. They also include information about what the Analytics Instrumentation reviewer will later check in the MR.
Create a Data team issue when a database value you want to use is not available in Sisense (handbook documentation, example use case, issue template).
You should create a Product Analysis team issue when you encounter one of these situations:
After the new code has been pushed to production, it's time to verify the tracked data. Depending on the tool used, the data's cadence and availability will vary.
New Service Ping data will only be collected from instances running the newly added code. This means that, most likely, the first piece of data you will be able to verify will come from the GitLab SaaS instance. For the first Service Ping containing the deployed changes to appear in Sisense, you will need to wait the following amount of time:
New Snowplow data will start getting collected right after the new changes are deployed to production. However, since the data needs to get processed by database pipelines, you may need to wait 24 hours for it to make its way to Sisense.
To access the data in Sisense, you will need to create a dashboard with a chart containing the new metric data.
We have this process (for the Service Ping case) illustrated in a video. This is what needs to be done:
1. Click the **New chart** button.
2. For Snowplow metrics, filter the events using the `event_action`/`event_label` parameters.
3. Enter the query in the `SQL` field of the new chart and click "Run SQL".

Please follow the process outlined in the guidance for removing and changing metrics, which can be found in the Service Ping lifecycle documentation.
| Resource | Description |
|---|---|
| Sisense handbook page | A guide to getting started with Sisense |
| Metrics dictionary | An SSoT for all metrics collected by Service Ping |
| dbt data tool | A tool for viewing relations between databases |
| FAQ | Analytics Instrumentation FAQ |