Feature Instrumentation

Instrumentation at GitLab

Tracking feature engagement after we've shipped is an important part of product management and software development. To accomplish this in the GitLab product, we use different approaches for self-managed and GitLab.com. To visualize the data we extract from both, we use Looker.

Instrumentation for self-managed instances

Self-managed instances are not required to maintain a connection with GitLab, Inc. Instead, we rely on a usage ping that is on by default and that any instance can disable. Instances attempt to send it weekly.

To visualize and explore our existing dataset, please see the Usage Data Explore in Looker.

Adding to usage ping

Individual teams are free to prioritize and add attributes to the usage ping (see usage_data.rb for the current schema).

To add instrumentation to the usage ping:

  1. Detail any desired additions in an issue. You can see examples of similar issues by searching for closed "usage ping" issues.
  2. Work with the relevant team to schedule the work. In general, the team that owns a feature should also be responsible for any instrumentation.
  3. Work with the Analytics team to model the new attributes in Looker. After your addition ships and instances begin sending us the updated usage ping, create an issue in the Analytics project to model and visualize the new attributes in Looker. Generally speaking, this involves:
    • Updating any relevant ETL jobs to include the addition. Simple additions (example) to existing columns may not require this, whereas adding new columns (example) will likely require an extractor update.
    • Updating the model in the Looker project. Before we're able to see the new dimension in Looker Explores like Usage Data, we need to define what to visualize.

Instrumentation for GitLab.com

We use the open-source Snowplow for identifying users and events on GitLab.com. These events are extracted to a Snowflake data warehouse and, like usage ping data, visualized in Looker.

Since GitLab.com is a GitLab EE instance, we're also able to query tables in PostgreSQL that are detailed in schema.rb for analytical purposes. As of November 2018, query access depends on pending work in Looker and on allowing SSH access to a dedicated read-only replica.

Differences with usage ping

Since GitLab.com is maintained by GitLab, Inc. and Snowplow allows for much richer event tracking, we capture better data more frequently. A comparison:

| Compared dimension | Self-managed (usage ping) | GitLab.com |
| --- | --- | --- |
| Frequency | Infrequent; sent weekly from instances | Frequent; hours to see new events |
| Fidelity | Low; anonymized counts | High; page views, clicks, time spent on page, anything in schema.rb |
| User penetration | Low; most instances opt out | High; users can opt out of event tracking in browser, but Snowplow/PSQL captures almost everything |
| Customer representation | Fair; self-managed is more representative of EE use, but opt-out remains high | Poor; high proportion of free individual users |

The last item is the most salient; analyses should consider differences in population and behavior (and ideally control for them, such as filtering individuals/non-groups out of analysis), as usage across GitLab.com will likely differ significantly from usage on a large self-managed instance.

Adding events to Snowplow

Like usage ping, individual teams are free to prioritize and capture events using the existing Snowplow implementation.

Page views and page pings (time spent on page). Captured automatically. Snowplow was introduced as a configurable setting that, when enabled, sits in the header of every page on GitLab.com. As a result, we get a destination and referrer URL for each page (using the trackPageView method) along with page pings every 30 seconds (using the enableActivityTracking method).
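As a rough illustration of the two methods named above, the sketch below shows how a page might enable page pings and then record a page view. It assumes the positional-argument form used by version 2 of the Snowplow JavaScript tracker; the function name `initPageTracking`, the tracker namespace, the collector host, and the app ID are all hypothetical and are not GitLab's actual configuration.

```javascript
// Sketch only: `initPageTracking`, the namespace 'gl', and the option names
// are hypothetical; they are not GitLab's real implementation or values.
// `snowplow` is the queue function created by the Snowplow loader snippet.
function initPageTracking(snowplow, { collectorHost, appId }) {
  // Register a tracker instance that points at the collector endpoint.
  snowplow('newTracker', 'gl', collectorHost, { appId });

  // Page pings: wait 30 seconds before the first ping, then send one every
  // 30 seconds while the user remains active on the page.
  // Activity tracking has to be enabled before the page view is tracked.
  snowplow('enableActivityTracking', 30, 30);

  // Page view: records the page URL together with the referrer URL.
  snowplow('trackPageView');
}
```

In a browser this would be called once per page load, for example `initPageTracking(window.snowplow, { collectorHost: 'collector.example.com', appId: 'example' })`. Passing the queue function in as an argument keeps the sketch easy to exercise without a real tracker.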

Custom click events. Capturing specific click events on individual pages requires specification of what we'd like to track. Tracking click events is accomplished by calling a generic JS function wherever desired. When clicked, the client sends a defined click event to our Snowplow endpoint. This function is based on Snowplow's built-in trackStructEvent method.
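A generic click-tracking function of the kind described above could look roughly like the following. This is a minimal sketch, not the helper used in the GitLab codebase: the name `trackClickEvent` is hypothetical, and the argument order assumes the positional `trackStructEvent` form of version 2 of the Snowplow JavaScript tracker (category, action, label, property, value).

```javascript
// Sketch only: `trackClickEvent` is a hypothetical name.
// `snowplow` is the queue function created by the Snowplow loader snippet.
function trackClickEvent(snowplow, category, action, { label, property, value } = {}) {
  // category and action are the two required attributes of a structured event.
  if (!category || !action) {
    throw new Error('category and action are required');
  }
  // Optional attributes that were not supplied are passed as undefined.
  snowplow('trackStructEvent', category, action, label, property, value);
}
```

A click handler might then call `trackClickEvent(window.snowplow, 'projects:new', 'click_tab', { label: 'create_from_template' })`, where the category value is only an illustrative page name.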

To send additional click events to Snowplow:

  1. Detail any desired additions in an issue. See examples (1, 2) of other issues. Issues should be as specific as possible.
    • Note that our current approach doesn't require tracking for actions that send the user to another URL (e.g. clicking a "Submit" CTA). Capturing such an event would mean delaying navigation to the new page while the event is sent, so we rely instead on the referrer URL we obtain from page views.
    • See the section below on taxonomy.
  2. Work with the relevant team to schedule the work. As mentioned above, the team that owns a feature should also be responsible for any instrumentation.

Taxonomy

When adding new click events, we should add them in a way that's internally consistent. If we don't, it'll be very painful to perform analysis across features since each feature will be capturing events differently.

The current method provides 5 attributes that are sent on each click event. Please try to follow these guidelines when specifying events to capture:

| Attribute | Guidance |
| --- | --- |
| category (required) | Describes the page that you're capturing click events on. Unless infeasible, please use the Rails page attribute by default. |
| action (required) | Describes the action the user is taking. The first word should always describe the action: clicks should be click, activations should be activate. Use underscores to describe what was acted on; for example, activating a form field would be activate_form_input. Clicking on a dropdown is click_dropdown. |
| label (optional) | Describes a specific element that was interacted on. This is either the label of the element (e.g. a tab labeled 'Create from template' may be create_from_template) or a unique identifier if no text is available (e.g. closing the Groups dropdown in the top navbar might be groups_dropdown_close). |
| property (optional) | Describes an additional property of the element interacted on. |
| value (optional) | Describes a value or something directly related to the action taken. This could be the value of an input (e.g. 10 when clicking internal visibility). |

Other guidelines:

  1. Use underscores, not camelcase.
  2. Descriptive strings should always begin with a general term and become more specific. For example, use click_tab rather than tab_click: the general action comes first, followed by what the action was taken on.
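To make these guidelines easier to apply when specifying events, they can be expressed as a small check. The sketch below is illustrative only: `checkEvent` and its messages are hypothetical, the verb list contains just the two verbs named in the guidance (click and activate) and would need extending, and applying the underscore rule to label and property is an assumption about how broadly that rule is meant.

```javascript
// Sketch only: `checkEvent`, KNOWN_VERBS, and the messages are hypothetical.
// Lowercase words or digits joined by single underscores, e.g. click_tab.
const SNAKE_CASE = /^[a-z0-9]+(_[a-z0-9]+)*$/;
// Only the verbs the guidance mentions; extend this list as needed.
const KNOWN_VERBS = ['click', 'activate'];

// Returns a list of problems; an empty list means the event follows the guidance.
function checkEvent({ category, action, label, property } = {}) {
  const problems = [];
  if (!category) problems.push('category is required');
  if (!action) {
    problems.push('action is required');
  } else {
    if (!SNAKE_CASE.test(action)) {
      problems.push('action must use underscores, not camelcase');
    }
    // The first word names the action; what was acted on follows it.
    if (!KNOWN_VERBS.includes(action.split('_')[0])) {
      problems.push('action must start with a verb such as click or activate');
    }
  }
  for (const [name, text] of Object.entries({ label, property })) {
    if (text !== undefined && !SNAKE_CASE.test(text)) {
      problems.push(`${name} must use underscores, not camelcase`);
    }
  }
  return problems;
}
```

For instance, an event with action `click_tab` and label `create_from_template` passes, while `clickTab` or `tab_click` as the action is reported. The category is deliberately not pattern-checked here, since the format of the Rails page attribute is not specified in this section.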