According to Gartner, "Application performance monitoring (APM) is a suite of monitoring software comprising digital experience monitoring (DEM), application discovery, tracing and diagnostics, and purpose-built artificial intelligence for IT operations". The APM team at GitLab is responsible for building a suite of monitoring solutions focused on Logs, Metrics, and Tracing, all within the GitLab UI. You can read more about our monitoring visions for Logs, Metrics, and Traces.
The APM group's mission is to help our customers decrease the frequency and severity of their production issues. As such, we've defined the team's Performance Indicator (PI) to be the total number of metrics and log views. The reasons for choosing this North Star Metric (NSM) are the following:
Since the PI must be a single metric, we combine the two counts, metrics views and log views, into one number that represents the team's PI.
|Person|Role|
|---|---|
|Matt Nohr|Backend Engineering Manager, Monitor:APM|
|Reuben Pereira|Backend Engineer, Monitor:APM|
|Ryan Cobb|Backend Engineer, Monitor:APM|
|Mikołaj Wawrzyniak|Backend Engineer, Monitor:APM|
|Clement Ho|Frontend Engineering Manager, Monitor|
|Jose Ivan Vargas|Frontend Engineer, Monitor:APM|
|Dhiraj Bodicherla|Senior Frontend Engineer, Monitor:APM|
|Miguel Rincon|Senior Frontend Engineer, Monitor:APM|
|Andrei Stoicescu|Frontend Engineer, Monitor:APM|
|Justin Mandell|Product Design Manager, Configure, Monitor, Secure & Defend|
|Sofia Vistas|Software Engineer in Test, Monitor:APM (primary) & Monitor:Health (secondary)|
|Dov Hershkovitch|Senior Product Manager, Monitor:APM|
|Nadia Sotnikova|Product Designer, Monitor|
|Amy Qualls|Senior Technical Writer, Configure, Monitor|
|Kevin Chu|Group Manager of Product Management, Configure & Monitor|
The APM group is responsible for:
This team maps to the APM Group category.
The APM Group is responsible for providing the underlying libraries and tools to enable GitLab team-members to instrument their code. When adding new metrics, we need to consider a few facets: the impact on GitLab.com, customer deployments, and whether any default alerting rules should be provided.
Recommended process for adding new metrics:
We try to adhere to best practices from across the company in how we work. For example, our Product Manager owns the problem validation backlog and problem validation process as outlined in the Product Development Workflow and follows the Product Development Timeline. Engineers follow the Engineering Workflow.
Here are some additional details on how we work.
When adding a new issue for the Monitor:APM group, follow these guidelines:
When creating a new issue, consider whether it can be completed in a single milestone with the collaboration of at most one frontend engineer and/or one backend engineer and one UX team member. If your issue is larger than that, consider creating an epic or splitting your issue into smaller issues.
On a regular basis the product manager will review any new issues and schedule them for the correct milestone. This often happens during the Monitor:APM weekly meeting.
Often we need to create an issue to start a discussion about a new idea or feature. These are issues that do not have immediate implementation work, but rather are for discussion and will, in the future, lead to new issues to implement our idea.
Here is the process we use for those types of issues:
We try to break issues into small, deliverable pieces. To do this we use the workflow::planning breakdown label as described in the product development flow. This lets the team know that the issue needs to be broken down before we can start implementation. Anyone on the team can look for issues in this workflow state and break them down.
For each milestone we create a Planning issue and follow the process as described in the Ops section monthly cadence. As the list of issues is finalized, the correct Filler labels are applied to the issues and they are assigned to the correct milestone. We also highlight work that is being done for UX, testing, and technical writing. The APM Next Milestone Board can be used to see what issues are planned for the upcoming milestone.
The priority issues for any given milestone are labeled with the Deliverable label. In addition, we want to plan for work that can be completed once the Deliverable issues are complete. We label these
As an experiment starting with the 13.2 milestone, we are going to try to limit the scope of these issues to about 1-2 days' worth of development work.
To start the next milestone, the engineering manager will apply the deliverable label to any issues that we have a high likelihood of completing.
The product manager will apply the release post item label to the top issues for the upcoming milestone that we want to highlight in the Kickoff call.
When an engineer is available to start a new issue, they can self-assign the next highest priority issue. Once assigned, the engineer is responsible for keeping the workflow labels up to date and providing async issue updates (see below). If the issue will not be complete in the current milestone, the assigned engineer is also responsible for rescheduling the issue.
We use standard workflow labels on issues as described in the product development flow. Specifically, we use workflow::ready for development when the issue has enough information to start development, workflow::In dev as we are working on the issue, workflow::In review when a merge request is in review, and workflow::verification after the merge request has been merged and we are testing the change in staging and production.
The engineer assigned to an issue is responsible for keeping its workflow label up to date. We use the APM Workflow board to visualize the issues.
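The label progression described above can be sketched as an ordered list. This is only an illustration and not GitLab tooling: the helper name `next_label` is hypothetical, and the strictly linear order is an assumption (for example, a blocked state is not modeled).

```python
from typing import Optional

# Workflow labels in the order the text describes them.
WORKFLOW_LABELS = [
    "workflow::ready for development",
    "workflow::In dev",
    "workflow::In review",
    "workflow::verification",
]


def next_label(current: str) -> Optional[str]:
    """Return the label that follows `current`, or None at the last stage.

    Hypothetical helper; assumes issues move through the labels in order.
    """
    index = WORKFLOW_LABELS.index(current)  # raises ValueError for unknown labels
    if index + 1 < len(WORKFLOW_LABELS):
        return WORKFLOW_LABELS[index + 1]
    return None
```

For instance, an issue labeled workflow::In dev would move to workflow::In review once its merge request is opened for review.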
Engineers should typically ignore the suggestion from Dangerbot's Reviewer Roulette and assign their MRs to be reviewed by a frontend engineer or backend engineer from the Monitor stage. If the MR requires domain-specific knowledge from another team or from a person outside the Monitor stage, the author should assign the MR to an appropriate domain expert for review. The MR author should use the Reviewer Roulette suggestion when assigning the MR to a maintainer.
Advantages of keeping most MR reviews inside the Monitor Stage include:
Every Friday, each engineer is expected to provide a quick async issue update by commenting on their assigned issues using the following template:
<!---
Please be sure to update the workflow labels of your issue to one of the following (that best describes the status)
- ~"workflow::In dev"
- ~"workflow::In review"
- ~"workflow::verification"
- ~"workflow::blocked"
-->
### Async issue update
1. Please provide a quick summary of the current status (one sentence).
1. When do you predict this feature to be ready for maintainer review?
1. Are there any opportunities to further break the issue or merge request into smaller pieces (if applicable)?
We do this to encourage our team to be more async in collaboration and to allow the community and other team members to know the progress of issues that we are actively working on.
Towards the end of a milestone, if we find any issues that are not going to be completed, it is the responsibility of the assigned engineer to follow this process for moving the issue to the next milestone.
We only use issue weights when we have to move an issue from one milestone to the next. This is to help us understand how much remaining work we have for any issue that had to move. For example, we may schedule an issue that just needs a final review differently than an issue that has not been started. We use a simple 1 to 10 scale to estimate the remaining work:
|10||100% Remaining/Not Started|
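Only the top of the scale (10 means 100% remaining) is shown here. Assuming the scale is linear, which is an assumption rather than something stated on this page, a weight could be derived from an estimate of the remaining work as in the sketch below. The function name `remaining_work_weight` is hypothetical.

```python
import math


def remaining_work_weight(percent_remaining: float) -> int:
    """Map an estimate of remaining work (0-100%) to a 1-10 issue weight.

    Assumes a linear scale where 10 means 100% remaining. The result is
    rounded up and never drops below 1, the bottom of the scale.
    """
    if not 0 <= percent_remaining <= 100:
        raise ValueError("percent_remaining must be between 0 and 100")
    return max(1, math.ceil(percent_remaining / 10))
```

Under this assumption, an issue that has not been started gets a weight of 10, while an issue that only needs a final review lands near the bottom of the scale.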
To help us stay on track with the Development KPIs, we track merge request rate and mean time to merge:
Starting in 13.2, we are experimenting with some best practices to increase our MR Rate and decrease our Mean Time to Merge metrics. Here are some of the things we are working on:
While we try to keep our process pretty light on meetings, we do hold a Monitor APM Weekly Meeting to triage and prioritize new issues, discuss our upcoming issues, and uncover any unknowns.