While there are examples of data collection used with malicious intent, data collection and analysis have also allowed companies to improve their products and services, benefiting their end users and consumers. It is in this vein that GitLab collects usage data about its products. We collect individual usage data in a pseudonymized manner at the namespace level and then use this information to power our product decisions and improve GitLab for you. We may also aggregate all this information to understand broadly how the GitLab product is used.
On our Product Performance Indicators page, under each graph there is a Lessons Learned callout which summarizes insights and opportunities based on the usage data collected. The improvements made to the product through this process are largely attributable to usage data collection.
As an example, the Package team at GitLab watches the usage metrics related to the count of users who published a package to the Package Registry over time. To be clear, the data they analyze is in aggregate form; no user-identifiable information is analyzed. After digging into the trends, they identified this insight:
From a funnel perspective, we saw significant growth in the packages pulled using a deploy token or by a Guest. Both are signs that the Package Registry is being integrated into our customers' production workflows.
Based on this analysis, the team prioritized two bug issues related to deploy tokens. This is an efficient and effective use of usage data that never jeopardized a person's identity or GitLab's credibility.
This is the data space in which we operate and will continue to operate.
Over the past few years GitLab has made commitments to our community around the collection, processing, and use of service usage data. This page summarizes those commitments and provides guidance to team members working on projects that involve the collection of product analytics data from our customers.
"Analytics data" can be too generic a term. The list below gives the specific types of product data that are in scope:
Like many other organizations offering digital products, GitLab strives to better understand the utility of its offering. In order to build the best DevOps platform for everyone, we try to understand which areas are used most, which get overlooked, which are in need of improvement, and which we can be proud of.
To reach that understanding, we look to the service usage data we collect. As part of collecting usage data, we aim to provide robust privacy protection and assurance that this data will not be misused. With that obligation in mind, we are working to prepare a privacy protection mechanism that will include technical tools and various policies.
While we will be pseudonymizing personal information for individual users, there are cases where a project or namespace could be identified. There are a few primary examples:
project_ID: it can be used to identify the project name via our APIs, but this is only true for projects set to Public visibility where you are a member.
namespace_ID: it can be used to identify the namespace name (which may be a personal name) via our API. You can only return namespace information for namespaces you are a member of.
Our pseudonymization process, which relies on one-way hashing to de-identify personally identifiable data, was released in milestone 14.4 (October 2021).
A key part of our process is pseudonymizing data at the collection layer, which allows GitLab to resolve any issues without depending on you to upgrade versions.
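To make the idea of one-way hashing at the collection layer concrete, here is a minimal sketch. It is not GitLab's actual implementation: the choice of HMAC-SHA256, the secret key, the event shape, the decision about which fields get hashed, and the function names `pseudonymize` and `pseudonymize_event` are all assumptions made for illustration.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a keyed one-way hash (HMAC-SHA256) of an identifier.

    The same input and key always give the same pseudonym, so events can
    still be grouped, but the original value cannot be read back from it.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_event(event: dict, secret_key: bytes, fields=("user_id",)) -> dict:
    """Copy an event, replacing the listed fields with their pseudonyms."""
    cleaned = dict(event)
    for field in fields:
        if cleaned.get(field) is not None:
            cleaned[field] = pseudonymize(str(cleaned[field]), secret_key)
    return cleaned


if __name__ == "__main__":
    # Hypothetical key and event, for illustration only.
    key = b"example-secret-held-by-the-collection-layer"
    raw_event = {"user_id": 42, "namespace_id": 7, "action": "publish_package"}
    print(pseudonymize_event(raw_event, key))
```

Because the hashing in this sketch happens where events are received rather than in the client, a fix to the hashing logic would apply to every event without anyone upgrading their installation.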
Now that we have the ability to protect user privacy with the pseudonymization service in place, we have started collecting Namespace_ID and pseudonymized User_ID. Collecting these identifiers makes the aggregated metrics we collect much more revealing. Now, instead of knowing only that there were 1000 clicks of some button, we can know things like: "Unidentified User X clicked a button, performed an action, then hit an error." This rich user journey will greatly improve GitLab's ability to improve our product for you, our end user.
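The difference between an aggregate click count and a user journey can be sketched in a few lines. The event fields (`pseudo_user_id`, `timestamp`, `action`) and the function names below are hypothetical and do not reflect GitLab's actual event schema; the sketch only shows how a pseudonymized identifier lets events be ordered per user without revealing who the user is.

```python
from collections import Counter, defaultdict


def count_actions(events):
    """Aggregate view: how many times each action occurred overall."""
    return Counter(event["action"] for event in events)


def build_journeys(events):
    """Journey view: ordered actions per pseudonymized user ID."""
    journeys = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        journeys[event["pseudo_user_id"]].append(event["action"])
    return dict(journeys)


if __name__ == "__main__":
    # Hypothetical events; the IDs stand in for one-way hashes.
    events = [
        {"pseudo_user_id": "a1f3", "timestamp": 3, "action": "hit_error"},
        {"pseudo_user_id": "a1f3", "timestamp": 1, "action": "click_button"},
        {"pseudo_user_id": "9c2e", "timestamp": 2, "action": "click_button"},
        {"pseudo_user_id": "a1f3", "timestamp": 2, "action": "perform_action"},
    ]
    print(count_actions(events))
    print(build_journeys(events))
```

The aggregate view reports only that `click_button` happened twice, while the journey view shows that one unidentified user clicked, performed an action, and then hit an error.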
Next up on our roadmap are modeling user journeys to better understand the features our users value most and implementing event tracking in self-managed instances.