Tracking feature engagement after we've shipped is an important part of product management and software development. To accomplish this in the GitLab product, we use different approaches for self-managed instances and GitLab.com. To visualize the data we extract from both, we use our data visualization tool.
Self-managed instances are not required to maintain a connection with GitLab, Inc. Instead, we rely on a usage ping that is on by default and that any instance can disable. Instances attempt to send this ping weekly.
To visualize and explore our existing dataset, please see our data visualization tool.
Individual teams are free to prioritize and add additional attributes to the usage ping (see usage_data.rb for the current schema).
To add additional instrumentation to the usage ping:
We use the open-source Snowplow for identifying users and events on GitLab.com. These events are extracted to a Snowflake data warehouse and, like usage ping data, visualized in our data visualization tool.
Since GitLab.com is a GitLab EE instance, we're also able to query the PostgreSQL tables detailed in schema.rb for analytical purposes. As of November 2018, query access requires being granted SSH access to a dedicated read-only replica.
Since GitLab.com is maintained by GitLab, Inc. and Snowplow allows for much richer event tracking, we capture better data more frequently. A comparison:
|Compared dimension|Self-managed (usage ping)|GitLab.com|
|---|---|---|
|Frequency|Infrequent; sent weekly from instances|Frequent; new events are visible within hours|
|Fidelity|Low; anonymized counts|High; page views, clicks, time spent on page, anything in schema.rb|
|User penetration|Low; most instances opt out|High; users can opt out of event tracking in the browser, but Snowplow/PostgreSQL captures almost everything|
|Customer representation|Fair; self-managed is more representative of EE use, but opt-out remains high|Poor; high proportion of free individual users|
The last item is the most salient; analyses should consider differences in population and behavior (and ideally control for them, such as filtering individuals/non-groups out of analysis), as usage across GitLab.com will likely differ significantly from usage on a large self-managed instance.
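As a minimal sketch of the kind of control described above, the following filters individual (non-group) activity out of an event set before counting usage per feature. The `UsageEvent` shape, the `namespaceType` field, and the feature names are hypothetical and only illustrate the idea; they are not taken from schema.rb or from the Snowplow event format.

```typescript
// Hypothetical event shape: which kind of namespace produced the event.
type NamespaceType = "group" | "user";

interface UsageEvent {
  namespaceType: NamespaceType;
  feature: string;
}

// Keep only events generated inside group namespaces,
// dropping activity from individual (personal) namespaces.
function groupEventsOnly(events: UsageEvent[]): UsageEvent[] {
  return events.filter((event) => event.namespaceType === "group");
}

// Count how many events were recorded per feature.
function countByFeature(events: UsageEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const event of events) {
    counts[event.feature] = (counts[event.feature] || 0) + 1;
  }
  return counts;
}

// Illustrative data only.
const sampleEvents: UsageEvent[] = [
  { namespaceType: "user", feature: "issue_boards" },
  { namespaceType: "group", feature: "issue_boards" },
  { namespaceType: "group", feature: "merge_requests" },
  { namespaceType: "user", feature: "merge_requests" },
  { namespaceType: "group", feature: "merge_requests" },
];

const groupCounts = countByFeature(groupEventsOnly(sampleEvents));
```

Comparing `groupCounts` with the unfiltered counts shows how much of the apparent GitLab.com usage comes from individual users rather than groups.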
Like usage ping, individual teams are free to prioritize and capture events using the existing Snowplow implementation.
Page views and page pings (time spent on page). Captured automatically. Snowplow was introduced as a configurable setting that, when enabled, sits in the header of every page on GitLab.com. As a result, we get a destination and referral URL for each page (using the trackPageView method) along with page pings every 30 seconds (using the
Custom click events. Capturing specific click events on individual pages requires specification of what we'd like to track. Tracking click events is accomplished by calling a generic JS function wherever desired. When clicked, the client sends a defined click event to our Snowplow endpoint. This function is based on Snowplow's built-in
To send additional click events to Snowplow:
When adding new click events, we should add them in a way that's internally consistent. If we don't, it'll be very painful to perform analysis across features since each feature will be capturing events differently.
The current method provides 5 attributes that are sent on each click event. Please try to follow these guidelines when specifying events to capture:
|Attribute|Description|
|---|---|
|category (required)|Describes the page that you're capturing click events on. Unless infeasible, please use the Rails page attribute by default.|
|action (required)|Describes the action the user is taking. The first word should always describe the action: clicks should be|
|label (optional)|Describes the specific element that was interacted with. This is either the label of the element (e.g. a tab labeled 'Create from template' may be|
|property (optional)|Describes an additional property of the element that was interacted with.|
|value (optional)|Describes a value or something directly related to the action taken. This could be the value of an input (e.g.|
Example action name: click_tab (general action taken -> what the action was taken on).
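To make the guidelines above concrete, here is a minimal sketch of a helper that assembles the five attributes and rejects events that don't follow the naming convention. `ClickEvent`, `ACTION_PATTERN`, and `buildClickEvent` are hypothetical names, not the function used in the GitLab codebase, and the example category and label values are placeholders. The pattern encodes one assumed reading of the convention: a lowercase verb first, then what the action was taken on, joined by underscores.

```typescript
// Hypothetical shape mirroring the five attributes sent on each click event.
interface ClickEvent {
  category: string; // required: the page the event was captured on
  action: string; // required: verb first, e.g. "click_tab"
  label?: string; // optional: the specific element interacted with
  property?: string; // optional: an additional property of that element
  value?: string | number; // optional: a value related to the action
}

// Assumed convention: lowercase verb, then one or more underscore-joined words.
const ACTION_PATTERN = /^[a-z]+(_[a-z0-9]+)+$/;

// Validate the required attributes and return a copy ready to be sent.
function buildClickEvent(event: ClickEvent): ClickEvent {
  if (event.category.trim() === "") {
    throw new Error("category is required");
  }
  if (!ACTION_PATTERN.test(event.action)) {
    throw new Error(
      `action "${event.action}" should look like verb_target, e.g. click_tab`
    );
  }
  return { ...event };
}

// Placeholder values for illustration only.
const tabClick = buildClickEvent({
  category: "projects:show",
  action: "click_tab",
  label: "activity",
});
```

Running every new event through one check like this is a cheap way to keep features from drifting into their own naming schemes, which is what makes cross-feature analysis painful.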