The Analytics Instrumentation Group is part of the Analytics section. Our group focuses on providing GitLab's team with data-driven product insights to build a better GitLab. To do this, we build data collection and analytics tools within the GitLab product in a privacy-focused manner. Insights generated from Analytics Instrumentation enable us to identify the best places to invest people and resources, what product categories mature faster, where our user experience can be improved, and how product changes impact the business. You can learn more about what we're building next on the Analytics Instrumentation Direction page.
How we work:
If you have any questions, start by @-mentioning the product manager for the Analytics Instrumentation Group or by creating an issue on our issue board.
Every week, the Analytics Instrumentation team holds open office hours on Zoom for any questions that might arise. They typically take place on Wednesdays at 7:30 UTC and last half an hour. You can find the event in the GitLab Team Meetings calendar. The historical and upcoming meeting agendas can be accessed in our agenda document.
We define incidents as a deviation from the intended process that significantly disrupts the reporting of metrics to the point that immediate action is required. The process below outlines the different stages of the incident resolution process and the steps to be taken by the corresponding Directly Responsible Individuals (DRIs). Please reach out to the Analytics Instrumentation Group EM/PM to recommend changes to the process.
(DRI: The team/individual detecting the issue)
- ~"Analytics Instrumentation::Incident-High Severity" for impending loss of data for many metrics or moderate to severe loss in business critical metrics that have a performance_indicator_type value.
- ~"Analytics Instrumentation::Incident-Medium Severity" for data delay.

(DRI: The PM of the Analytics Instrumentation group)
(DRI: To be identified by the EM of the Analytics Instrumentation group)
We are responsible for delivering a reliable Service Ping that runs every week on SaaS and Self-Managed instances. Our responsibility is the tooling and automation for metric collection that sets the company up for success in delivering Service Ping data to our data warehouse. Due to the number of metrics, we can't maintain the health of all metrics or provide insights into the business logic of metrics.
performance_indicator_type) we also inform the responsible team but will treat it as a Severity 1/Priority 1 issue and try to provide a fix.

For an overview of the capabilities of the analytics tooling the team develops, you can watch the video Analytics Instrumentation 101, or look through the slides (internal).
The following people are permanent members of the Analytics Instrumentation Group:
Person | Role |
---|---|
Tanuja Jayarama Raju | Product Manager, Analytics:Analytics Instrumentation |
Lorena Ciutacu | Technical Writer - Analytics:Product Analytics, Analytics:Analytics Instrumentation, Data Stores:Tenant Scale and Plan:Optimize |
Ankit Panchal | Senior Frontend Engineer, Analytics:Analytics Instrumentation |
Sebastian Rehm | Manager, Fullstack Engineering, Analytics:Analytics Instrumentation |
Greg Myers | Security Engineer, Application Security, Package (Package Registry, Container Registry), US Public Sector Services, Gitaly Cluster, Analytics (Analytics Instrumentation, Product Analytics), AI Working Group |
Jonas Larsen | Senior Backend Engineer, Analytics:Analytics Instrumentation |
Michał Wielich | Backend Engineer, Analytics:Analytics Instrumentation |
Niko Belokolodov | Senior Backend Engineer, Analytics:Analytics Instrumentation |
Piotr Skorupa | Backend Engineer, Analytics:Analytics Instrumentation |
Sarah Yasonik | Senior Backend Engineer, Analytics:Analytics Instrumentation |
Our team uses a hybrid form of Scrum for project management. This process follows GitLab's monthly milestone release cycle.
We do a weekly automated check-in within a separate Slack channel. We feel that a weekly cadence is enough to keep everyone up to date about the most important developments within the team. A bot automatically asks every team member the following questions at the beginning of the week and posts the answers to the channel.
Our team uses the following workflow stages defined in the Product Development Flow:
Label | Usage |
---|---|
~"workflow::validation backlog" | Applied by the Product Manager for incoming issues that have not been refined or prioritized. |
~"workflow::problem validation" | Applied by the Product Manager for issues where the PM is developing a thorough understanding of the problem. |
~"workflow::design" | Applied by the Product Manager or Designer (or Analytics Instrumentation Engineer) to ideate and propose solutions. The proposed solutions should be reviewed by engineering to ensure technical feasibility. |
~"workflow::solution validation" | Applied by the Product Manager or Designer (or Analytics Instrumentation Engineer) to validate a proposed solution through user interviews or usability testing. |
Label | Usage |
---|---|
~"workflow::planning breakdown" | Applied by the Product Manager for Engineers to begin breaking down issues and adding estimates. |
~"workflow::ready for development" | Applied by either Engineering or Product Manager after an issue has been broken down and scheduled for development. |
~"workflow::in dev" | Applied by the Engineer after work (including documentation) has begun on the issue. An MR is typically linked to the issue at some point throughout this stage. |
~"workflow::in review" | Applied by the Engineer indicating that all MRs required to close an issue are in review. |
~"workflow::verification" | Applied by the Engineer after the MRs in the issue have been merged, signaling that the issue needs to be verified in staging or production. |
~"workflow::complete" | Applied by the Engineer after all MRs have merged and the issue has been verified. At this step, the issue should also be closed. |
~"workflow::blocked" | Applied by any team member if at any time during development the issue is blocked. For example: technical issue, open question to PM or PD, cross-group dependency. |
We use an epic roadmap to track epic progress on a quarterly basis. The epic roadmap is a live view of the Analytics Instrumentation Direction page.
To keep things simple, we primarily use the gitlab.com/gitlab-org group for our roadmap. If epics are created on the gitlab.com/gitlab-com and gitlab.com/gitlab-services groups, we create placeholders of them on gitlab.com/gitlab-org so that all epics show up in a single roadmap view.
gitlab-org | gitlab-com | gitlab-services | all groups |
---|---|---|---|
gitlab-org Epic Roadmap | - | - | - |
We use issue boards to track issue progress on a daily basis. Issue boards are our single source of truth for the status of our work. Issue boards should be viewed at the highest group level for visibility into all nested projects in a group.
We prioritize our product roadmap in the Issue Board by Milestone. Issues appear on each list in order of priority and prioritization of our product roadmap is determined by our product managers.
Engineers can find and open the board for the current milestone. Engineers should start at the top of the "workflow::ready for development" column and pick the first available, non-assigned issue. When picking an issue, the engineer should assign themselves as a signal that they are taking ownership of the issue and move it to "workflow::in dev" to signal the start of development.
If the next available issue is not a viable candidate (due to amount of capacity vs. issue weight, complexity, knowledge domain, etc.) the engineer may choose to skip an issue and pick the next issue in order of priority.
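The selection rule above can be sketched as a small function. This is only an illustration: the `Issue` fields, the issue titles, and the idea of judging viability purely by remaining capacity are assumptions, and in practice the board is worked by hand in GitLab.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Issue:
    # Hypothetical, simplified view of a board card; not a GitLab API object.
    title: str
    weight: int
    assignee: Optional[str] = None


def pick_next_issue(ready_column: List[Issue], engineer: str, capacity: int) -> Optional[Issue]:
    """Walk the "ready for development" column top-down (priority order) and
    take the first unassigned issue that fits the engineer's remaining capacity."""
    for issue in ready_column:
        if issue.assignee is not None:
            continue  # someone already owns this issue
        if issue.weight > capacity:
            continue  # not a viable candidate; move on to the next priority
        issue.assignee = engineer  # assigning yourself signals ownership
        return issue
    return None


# Hypothetical column, ordered by priority.
column = [
    Issue("Fix metric definition", weight=1, assignee="another.engineer"),
    Issue("Build monitoring framework", weight=13),
    Issue("Add instrumentation class", weight=2),
]
picked = pick_next_issue(column, engineer="me", capacity=5)
print(picked.title)  # Add instrumentation class
```

Capacity is only one of the reasons listed for skipping an issue; complexity and knowledge domain are judgment calls that a sketch like this does not capture.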
The following table will be used as a guideline for scheduling work within the milestone:
Type | % of Milestone | Description |
---|---|---|
Deliverable | 70% | business priorities (compliance, IACV, efficiency initiatives) |
Tech debt | 10% | nominated by engineers prior to milestone start in Milestone Planning Issue |
Other | 20% | engineer picks, critical security/data/availability/regression, urgent business priorities |
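As a rough illustration of the guideline, the percentages in the table can be applied to a milestone's total weight. The capacity of 50 used here is a hypothetical number, and the function name is ours.

```python
from typing import Dict

# Guideline shares taken from the table above.
MILESTONE_SPLIT = {"Deliverable": 0.70, "Tech debt": 0.10, "Other": 0.20}


def split_capacity(total_weight: float) -> Dict[str, float]:
    """Distribute a milestone's total weight across the work types."""
    return {kind: round(total_weight * share, 1) for kind, share in MILESTONE_SPLIT.items()}


allocation = split_capacity(50)  # 50 is a hypothetical milestone capacity
print(allocation)  # {'Deliverable': 35.0, 'Tech debt': 5.0, 'Other': 10.0}
```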
If all work within a milestone is picked, engineers are free to choose what to work on. Acceptable options include:
We follow the iteration process outlined by the Engineering function.
We estimate issues async and aim to provide an initial estimate (weight) for all issues scheduled for an upcoming milestone.
We require a minimum of two estimations for weighing an issue. We consider reacting with a ➕ emoji to the estimation as agreeing with it (and thus contributing to the minimum count of estimations). If both estimations agree, the engineer who did the second estimation should add the agreed-upon weight to the issue. If there is disagreement, the second engineer should @-mention the first one to resolve the conflict.
In planning and estimation, we value velocity over predictability. The main goal of our planning and estimation is to focus on the MVC, uncover blind spots, and help us achieve a baseline level of predictability without over-optimizing. We aim for 70% predictability instead of 90%.
We default spike issues to a weight of 8.
If an issue has many unknowns where it's unclear if it's a 1 or a 5, we will be cautious and estimate high (5).
If an initial estimate needs to be adjusted, we revise the estimate immediately and inform the Product Manager. The Product Manager and team will decide if a milestone commitment needs to be changed.
Issue estimation examples
Weight | Definition | Example (Engineering) |
---|---|---|
1 | The simplest possible change. We are confident there will be no side effects. | Add missing metric definition for "counts_monthly.promoted_issues", Add instrumentation classes for license standard metrics, Update Registration Features text |
2 | A simple change (minimal code changes), where we understand all of the requirements. | VersionApp: Add indexed on other tables that are exported, Set values for StandardContext in Frontend |
3 | A simple change, but the code footprint is bigger (e.g. lots of different files, or tests affected). The requirements are clear. | Update Registration Features CTA for repository size limit, More paid features available to free users |
5 | A more complex change that will impact multiple areas of the codebase; there may also be some refactoring involved. Requirements are understood but you feel there are likely to be some gaps along the way. | Spike Service Ping health dashboard, Remove deprecated metric status |
8 | A complex change, that will involve much of the codebase or will require lots of input from others to determine the requirements. | Dispatch Snowplow events from their event definitions, Add metrics yml files for usage data metrics definition |
13 | A significant change that may have dependencies (other teams or third-parties) and we likely still don't understand all of the requirements. It's unlikely we would commit to this in a milestone, and the preference would be to further clarify requirements and/or break it into smaller issues. | Create Snowplow monitoring framework, Enable batch counting for some individual queries |
? | For issues where we don't know how to estimate | |
The following is a guiding mental framework for engineers to consider when contributing to estimates on issues.
### Refinement / Weighing
**Ready for Development**: Yes/No
<!--
Yes/No
Is this issue sufficiently small enough, or could it be broken into smaller issues? If so, recommend how the issue could be broken up.
Is the issue clear and easy to understand?
-->
**Weight**: X
**Reasoning**:
<!--
Add some initial thoughts on how you might break down this issue. A bulleted list is fine.
This will likely require the code changes similar to the following:
- replace the hexdriver with a sonic screwdriver
- rewrite backups to magnetic tape
- send up semaphore flags to warn others
Links to previous example. Discussions on prior art. Notice examples of the simplicity/complexity in the proposed designs.
-->
**Iteration MR/Issues Count**: Y
<!--
Are there any opportunities to split the issue into smaller issues?
- 1 MR to update the driver worker
- 1 MR to update docs regarding mag tape backups
Let me draw your attention to potential caveats.
-->
**Documentation required**: Y/N
<!--
- Do we need to add or change documentation for the issue?
-->
To properly set expectations for product managers and other stakeholders, our team may decide to add a due date onto an issue. Due dates are not meant to pressure our team but are instead used to communicate an expected delivery date.
We may also use due dates as a way to timebox our iterations. Instead of spending a month on shipping a feature, we may set a due date of a week to force ourselves to come up with a smaller iteration.
Our team mostly follows the Product Development Timeline as our group is dependent on the GitLab self-managed release cycle.
The specific application of this timeline to the Analytics Instrumentation Milestone planning process is summarized below.
Phase | Time |
---|---|
Planning & Breakdown Phase | 4th - 17th of month N |
Development Phase | 18th of month N - 17th of month N+1 |
Timeline: 4th - 17th of month N
Tasks:
Timeline: 18th of month N – 17th of month N+1.
Tasks:
Our milestone capacity tells us how many issue weights we can expect to complete in a given milestone. To estimate this, we calculate the average weight completed per engineer per day across the previous two milestones. This is multiplied by the actual working days available to us in a given milestone.
Previous Two Milestones:
Next Milestone:
In this example, the next milestone’s capacity is 64 weights for the whole team. Keep in mind that neither estimations nor this calculation is an exact science. The capacity planning is supposed to help the EM and PM set realistic expectations around deliverables inside and outside the team. We do not expect to hit the exact amount of predicted weights.
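The calculation can be sketched as follows. The input numbers are hypothetical and chosen only so that the result matches the 64 weights mentioned above; they are not the team's actual figures. Pooling both milestones before dividing is one reasonable reading of "average across the previous two milestones".

```python
def milestone_capacity(weights_completed, engineer_days_worked, engineer_days_available):
    """Estimate the total weight the team can complete in the next milestone.

    weights_completed: total issue weight closed in each previous milestone
    engineer_days_worked: engineer working days in each of those milestones
    engineer_days_available: engineer working days expected in the next milestone
    """
    # Average weight completed per engineer per day, pooled over past milestones.
    avg_weight_per_engineer_day = sum(weights_completed) / sum(engineer_days_worked)
    return avg_weight_per_engineer_day * engineer_days_available


# Hypothetical inputs for the previous two milestones and the next one.
capacity = milestone_capacity(
    weights_completed=[60, 68],
    engineer_days_worked=[100, 100],
    engineer_days_available=100,
)
print(round(capacity))  # 64
```

Using engineer working days rather than calendar days lets time off and public holidays lower the expected capacity.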
A milestone commitment is a list of issues our team aims to complete in the milestone. The product team follows our GitLab principle of planning ambitiously and therefore expects that we won't always be able to deliver everything that we wanted in every milestone. After issues are broken down, estimated, and prioritized, the product manager will apply the ~Deliverable label to applicable issues. Issues marked with the ~Deliverable label represent the commitment we intend to ship in that milestone.
Per the Next Prioritization initiative, we will review our team's performance in applying appropriate type labels to MRs. At the close of the milestone, on the Planning Issue, the EM or PM will post a link to this dashboard along with a summary of shipped work by type label (include null) to ensure we are observing the recommended work split of 60% feature, 30% maintenance, 10% bugs, and <=5% undefined.
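A summary by type label could be computed along these lines. This is a minimal sketch: the input is a plain list of type names (with `None` for MRs without a type label) rather than data pulled from the dashboard, and the function name is hypothetical.

```python
from collections import Counter

# Recommended split from the text above, in percent.
RECOMMENDED = {"feature": 60, "maintenance": 30, "bug": 10}


def type_label_summary(mr_types):
    """Return the percentage of merged MRs per type label; None counts as 'undefined'."""
    counts = Counter(t if t is not None else "undefined" for t in mr_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Hypothetical milestone with 10 merged MRs.
merged = ["feature"] * 6 + ["maintenance"] * 3 + ["bug"]
summary = type_label_summary(merged)
print(summary)  # {'feature': 60.0, 'maintenance': 30.0, 'bug': 10.0}

# Deviation from the recommended split, in percentage points.
deviation = {label: summary.get(label, 0.0) - target for label, target in RECOMMENDED.items()}
```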
In Analytics Instrumentation, determining if work is applicable to ~type::maintenance or ~type::feature is not readily apparent. As a guide, we denote work which benefits the Analytics Instrumentation team and technical processes as ~type::maintenance whereas work which benefits GitLab customers or team members is considered ~type::feature.
To help our team be efficient, we explicitly define how our team uses epics and issues.
We aim to create issues in the same project where the future merge request will live. And we aim to create epics at the topmost-level group that makes the most sense for its collection of child epics and issues. For example, if an experiment is being run in the CustomersDot, the epic should be created in the gitlab-org group, and the issue should be created in the gitlab-org/customers-gitlab-com project.
We emphasize creating the epic at the topmost-level group so that it will show up on our epic roadmap. And we emphasize creating the issue in the right project to avoid having to close and move it later in the development process.
The ratio of issues to MRs is at the responsible engineer's discretion. MRs should follow the MVC principle. If it is evident in advance that an issue will require more than 2 MRs we should evaluate whether we can split the issue further to document the split of the work more clearly.
We group related issues together using parent epics and child epics, providing us with a better visual overview of our roadmap.
We use issue labels to keep us organized. Every issue has a set of required labels that the issue must be tagged with. Every issue also has a set of optional labels that are used as needed.
Required labels
- ~devops::analytics
- ~"group::analytics instrumentation"
- ~"workflow::planning breakdown", ~"workflow::ready for development", ~"workflow::in dev", etc.
- ~"type::bug", ~"type::feature", ~"type::tooling", ~"type::maintenance"

(Easy to copy list: ~devops::analytics ~"group::analytics instrumentation" ~"workflow::planning breakdown" ~"workflow::ready for development" ~"workflow::in dev" ~"type::bug" ~"type::feature" ~"type::tooling" ~"type::maintenance")
The description for an issue assigned to our group should always include the following sections:
The sections Potential Solution(s) and How to verify can initially be empty but should be filled in when preparing the issue for development.
MR labels should mirror issue labels (which is automatically done when created from an issue):
Required labels
- ~section::analytics
- ~"group::analytics instrumentation"
- ~"type::bug", ~"type::feature", ~"type::tooling", ~"type::maintenance"
We tag each issue and MR with the planned milestone or the milestone at time of completion.
Our group holds synchronous meetings to gain additional clarity and alignment on our async discussions. We aim to record all of our meetings as our team members are spread across several timezones and often cannot attend at the scheduled time.
We like to share knowledge and learn! If your group would like someone from the Analytics Instrumentation group to attend a sync call and provide a brief overview of our responsibilities and scope, please open an issue and apply the ~"group::analytics instrumentation" label (example issue).
In the same spirit, we want to learn more about the different teams at GitLab. If you'd like to participate in sharing information with our team, please comment in our Slack channel #g_analytics_instrumentation.
If you would like to propose a new knowledge session for a topic you want to learn more about, open an issue in Analytics Instrumentation and provide the details. Issue 603 gives you a good example of how this is done.
Date | Topic / Recording | Speaker |
---|---|---|
2022-08-16 | Usage of Service Ping data | Jay Stemmer |
2023-01-10 | Service Ping Analysis Engine & Service Ping usage in Customer Success | Martin Brümmer |
We also track our backlog of issues, including past due security and infradev issues, and total open System Usability Scale (SUS) impacting issues and bugs.
MR Type labels help us report what we're working on to industry analysts in a way that's consistent across the engineering department. The dashboard below shows the trend of MR Types over time and a list of merged MRs.
Flaky tests are problematic for many reasons.
Slow tests are impacting the GitLab pipeline duration.
We maintain UsageData API endpoints under the service_ping feature to track events, and because of this we must monitor our budget spend.
To investigate budget spend, see the overview and details Grafana dashboards for Analytics Instrumentation. You can also check requests contributing to spending the budget in Kibana by filtering by the service_ping feature. An example Kibana view can be found here.
Note that the budget spend is calculated proportionally by requests failing apdex or failing with an error, and not by how much the target is exceeded. For example, if we had an endpoint with a set goal of 1s request duration, then bringing the request duration from 10s to 5s would not improve the budget.
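The point about proportional budget spend can be shown with a simplified model. This sketch assumes a single 1s threshold and treats budget spend as the plain fraction of requests that are slower than the threshold or that errored; the real calculation in our monitoring may differ, and the function name is hypothetical.

```python
def budget_spend(durations_s, errored, apdex_threshold_s=1.0):
    """Fraction of requests that spend budget: slower than the threshold or failed."""
    failing = sum(
        1
        for duration, is_error in zip(durations_s, errored)
        if is_error or duration > apdex_threshold_s
    )
    return failing / len(durations_s)


no_errors = [False, False, False, False]
before = budget_spend([10.0, 10.0, 0.5, 0.5], no_errors)
after = budget_spend([5.0, 5.0, 0.5, 0.5], no_errors)  # faster, but still over 1s
fixed = budget_spend([0.9, 0.9, 0.5, 0.5], no_errors)  # now under the threshold
print(before, after, fixed)  # 0.5 0.5 0.0
```

Halving the duration of the slow requests leaves the spend unchanged, because each request either meets the target or does not. Only bringing requests under the threshold reduces it.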
An OOO coverage process helps reduce the mental load of "remembering all the things" while preparing for being away from work. This process allows us to organize the tasks we need to complete before time off and make the team successful.
Open a new issue in the Analytics Instrumentation project with the out_of_office_coverage_template.
All new team members in the Analytics Instrumentation team are provided with an onboarding issue to help them ramp up on our analytics tooling. New team members should create their own onboarding issue in the gitlab-org/analytics-section/analytics-instrumentation/internal project using the engineer_onboarding template.
Resource | Description |
---|---|
Internal Analytics Docs | Docs for instrumenting internal analytics at GitLab |
Analytics Instrumentation Monitoring and Troubleshooting | Information about troubleshooting Analytics Instrumentation infrastructure |
Analytics Instrumentation Infrastructure | Information about the infrastructure we run |
Service Ping Guide | An implementation guide for Service Ping |
Privacy Policy | Our privacy policy outlining what data we collect and how we handle it |
Analytics Instrumentation Direction | The roadmap for Analytics Instrumentation at GitLab |
GitLab Performance Snowplow Dashboards | Performance dashboards for GitLab.com via Snowplow |
FAQ | A list of questions and answers related to Service Ping and Snowplow |