
APM Group

APM

According to Gartner, "Application performance monitoring (APM) is a suite of monitoring software comprising digital experience monitoring (DEM), application discovery, tracing and diagnostics, and purpose-built artificial intelligence for IT operations". The APM team at GitLab is responsible for building a suite of monitoring solutions focused on Logs, Metrics and Tracing, all within the GitLab UI. You can read more about our monitoring visions for Logs, Metrics and Traces.

Backend Team members

| Person | Role |
| --- | --- |
| Matt Nohr | Engineering Manager, Monitor:APM |
| Reuben Pereira | Backend Engineer, Monitor:APM |
| Ryan Cobb | Backend Engineer, Monitor:APM |
| Adrien Kohlbecker | Senior Backend Engineer, Monitor:APM |
| David Wilkins | Senior Backend Engineer, Monitor:APM |
| Mikołaj Wawrzyniak | Backend Engineer, Monitor:APM |
| Kirstie Cook | Backend Engineer, Monitor:APM |

Frontend Team members

| Person | Role |
| --- | --- |
| New Vacancy - Clement Ho (Interim) | Frontend Engineering Manager, Monitor:APM |
| Jose Ivan Vargas | Frontend Engineer, Monitor:APM |
| Dhiraj Bodicherla | Senior Frontend Engineer, Monitor:APM |
| Miguel Rincon | Senior Frontend Engineer, Monitor:APM |
| Andrei Stoicescu | Frontend Engineer, Monitor:APM |

Stable counterparts

| Person | Role |
| --- | --- |
| Achilleas Pipinellis | Senior Technical Writer, Create, Package, Monitor, Secure, Defend |
| Amelia Bauerly | Product Designer, Monitor & Package |
| Sofia Vistas | Software Engineer in Test Monitor:APM & Monitor:Metrics |
| Dov Hershkovitch | Senior Product Manager, Monitor:APM |
| Nadia Sotnikova | Product Designer, Monitor |
| Matthew Nearents | Senior Product Designer, Monitor |
| Kevin Chu | Group Manager, Product, Monitor |

Responsibilities

The APM group is responsible for:

This team maps to the APM Group category.

How to work with APM

Adding new metrics to GitLab

The APM Group is responsible for providing the underlying libraries and tools to enable GitLab team-members to instrument their code. When adding new metrics, we need to consider a few facets: the impact on GitLab.com, customer deployments, and whether any default alerting rules should be provided.

Recommended process for adding new metrics:

  1. Open an issue in the desired project outlining the new metrics desired
  2. Label with the ~group::apm label, and ping @gl-monitoring for initial review
  3. During implementation consider:
    1. The Prometheus naming and instrumentation guidelines
    2. Impact on cardinality and performance of Prometheus
    3. Whether any alerts should be created
  4. Assign to an available APM Group reviewer
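
As a rough illustration of the naming and cardinality considerations in step 3, the following Python sketch checks a metric name against the character pattern from the Prometheus data model and estimates an upper bound on the number of time series a metric could create. The metric name, label names, and label counts are hypothetical and not taken from GitLab code; the helper functions are illustrative only and not part of any APM library.

```python
import re
from math import prod
from typing import Mapping

# Character pattern for metric names from the Prometheus data model.
# Colons are permitted by the pattern but are conventionally reserved
# for recording rules, so instrumented code normally avoids them.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def is_valid_metric_name(name: str) -> bool:
    """Return True if the whole name matches the Prometheus name pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def max_series(distinct_values_per_label: Mapping[str, int]) -> int:
    """Upper bound on time series for one metric.

    Each unique combination of label values is a separate series, so the
    worst case is the product of the distinct value counts of all labels.
    """
    return prod(distinct_values_per_label.values())


# Hypothetical example: a request counter with two labels.
example_name = "gitlab_example_requests_total"
example_labels = {"controller": 50, "status": 5}

if __name__ == "__main__":
    print(is_valid_metric_name(example_name))
    print(max_series(example_labels))
```

An estimate like this can be included in the issue opened in step 1 so reviewers can judge the impact on Prometheus before the metric ships.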

How We Work

We try to adhere to best practices from across the company in how we work. For example, our Product Manager owns the problem validation backlog and problem validation process as outlined in the Product Development Workflow and follows the Product Development Timeline. Engineers follow the Engineering Workflow.

Here are some additional details on how we work.

Adding New Issues

When adding a new issue for the Monitor:APM group, follow these guidelines:

When creating a new issue, consider whether it can be completed in a single milestone with the collaboration of at most one frontend engineer, one backend engineer, and one UX team member. If your issue is larger than that, consider creating an epic or splitting it into smaller issues.

On a regular basis the product manager will review any new issues and schedule them for the correct milestone. This often happens during the Monitor:APM weekly meeting.

Creating Issues for Discussion

Often we need to create an issue to start a discussion about a new idea or feature. These are issues that do not have immediate implementation work, but rather are for discussion and will, in the future, lead to new issues to implement our idea.

Here is the process we use for those types of issues:

  1. Create an issue with a title like "Discussion: My Great Idea". This issue can then be assigned to specific people for comments and be assigned to a specific milestone. We do not use epics for this type of discussion because we have found it is hard to keep track of epics on our main issue boards.
  2. We should continue to update the description of the issue as we find new information or refine our ideas.
  3. Once the discussion around the new idea gets to a point where we want to start breaking it down into implementation details, we create an epic. We use epics at this point so we can be sure to group all the issues together and still have the discussion comments in one place that can be easily referenced. We do this in one of two ways:
    1. Promote the issue to an epic.
    2. Close the original issue, create a new epic, and then add a link from the epic to the original issue.
  4. Create issues to cover the different iterations of implementation. Each issue should be small enough to be completed in a single milestone. If there are dependencies between these issues, we should be sure to include that information for planning purposes.

Breaking Down Issues

We try to break issues into small, deliverable pieces. To do this we use the workflow::planning breakdown label as described in the product development flow. This lets the team know that the issue needs to be broken down before we can start implementation. Anyone on the team can look for issues in this workflow state and break them down.

Prioritizing Issues

Before the start of a milestone, the product manager is responsible for organizing the APM Planning Board by putting all issues for the upcoming milestone in priority order. By using the planning board as a priority list and keeping it in order, we should always be able to look at the current and upcoming milestone columns to see a prioritized list of upcoming work.

Starting a Milestone

To start the next milestone, the engineering manager will apply the deliverable label to any issues that we have a high likelihood of completing.

The product manager will apply the release post item label to the top issues for the upcoming milestone that we want to highlight in the Kickoff call.

Assigning Issues

When an engineer is available to start a new issue, they can self-assign the next highest priority issue. Once assigned, the engineer is responsible for keeping the workflow labels up-to-date and providing async issue updates (see below). If the issue will not be complete in the current milestone, the assigned engineer is also responsible for rescheduling the issue.

Workflow Labels

We use standard workflow labels on issues as described in the product development flow. Specifically we use workflow::ready for development when the issue has enough information to start development, workflow::In dev as we are working on the issue, workflow::In review when a merge request is in review, and workflow::verification after the merge request has been merged and we are testing the change in staging and production.

The engineer assigned to an issue is responsible for keeping its workflow label up-to-date. We use the APM Workflow board to visualize the issues.

Assigning MRs for code review

Engineers should typically ignore the suggestion from Dangerbot's Reviewer Roulette and assign their MRs to be reviewed by a frontend engineer or backend engineer from the Monitor stage. If the MR requires domain-specific knowledge from another team or from a person outside of the Monitor stage, the author should assign the MR to an appropriate domain expert for review. The MR author should use the Reviewer Roulette suggestion when assigning the MR to a maintainer.

Advantages of keeping most MR reviews inside the Monitor Stage include:

Weekly async issue updates

Every Friday, each engineer is expected to provide a quick async issue update by commenting on their assigned issues using the following template:

<!---
Please be sure to update the workflow labels of your issue to one of the following (that best describes the status):
- ~"workflow::In dev"
- ~"workflow::In review"
- ~"workflow::verification"
- ~"workflow::blocked"
-->
### Async issue update
1. Please provide a quick summary of the current status (one sentence).
1. When do you predict this feature to be ready for maintainer review?
1. Are there any opportunities to further break the issue or merge request into smaller pieces (if applicable)?

We do this to encourage our team to be more async in collaboration and to allow the community and other team members to know the progress of issues that we are actively working on.

Rescheduling Issues

Towards the end of a milestone, if we find any issues that are not going to be completed, it is the responsibility of the assigned engineer to follow this process for moving the issue to the next milestone.

  1. Add a comment to the issue with what work is remaining.
  2. Add the ~"to schedule" label
  3. Add an issue weight (see below)
  4. Move to the next milestone

Issue Weights

We only use issue weights when we have to move an issue from one milestone to the next. This is to help us understand how much remaining work we have for any issue that had to move. For example, we may schedule an issue that just needs a final review differently than an issue that has not been started. We use a simple 1 to 10 scale to estimate the remaining work:

| Weight | Meaning |
| --- | --- |
| 1 | 10% Remaining |
| 5 | 50% Remaining |
| 10 | 100% Remaining/Not Started |
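
The scale above amounts to one weight point per 10% of remaining work. A minimal Python sketch of that mapping follows; the function name is hypothetical, and rounding in-between percentages up to the next weight is an assumption, since the table only lists the values 1, 5, and 10.

```python
def remaining_work_weight(percent_remaining: int) -> int:
    """Map the estimated percentage of work remaining to a 1-10 weight.

    Assumption: percentages between the listed values round up, so 25%
    remaining becomes weight 3 rather than 2.
    """
    if not 1 <= percent_remaining <= 100:
        raise ValueError("percent_remaining must be between 1 and 100")
    # Ceiling division on integers avoids floating-point rounding issues.
    return -(-percent_remaining // 10)


if __name__ == "__main__":
    for percent in (10, 25, 50, 100):
        print(percent, remaining_work_weight(percent))
```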

Recurring Meetings

While we try to keep our process pretty light on meetings, we do hold a Monitor APM Weekly Meeting to triage and prioritize new issues, discuss our upcoming issues, and uncover any unknowns.

Async Daily Standups

The purpose of our async standups is to allow every team member to have insight into what everyone else is doing and whether anyone is blocked and could use help. This should not be an exhaustive list of all of your tasks for the day, but rather a summary of the major deliverable you are hoping to achieve. All question prompts are optional. We use the Geekbot Slack plugin to automate our async standup in the #g_monitor_standup_apm channel. Every team member should be added to the async standup by their manager.

Repos we own or use

Issue boards