The Monitor group at GitLab is responsible for building tools that enable DevOps teams to respond to, triage, and remediate errors and IT alerts for the systems and applications they maintain. We work in parallel with the APM group, which is responsible for GitLab's suite of application performance monitoring solutions (Logging, Metrics, and Tracing). Together, we aim to provide a streamlined Operations experience within GitLab that enables the individuals who write the code to also maintain it.
This section details the happenings within the Monitor group. At any given time, it lists the top three most exciting developments or accomplishments of the team.
We recently experienced a surge in early adopters for the Incident Management category!
In 13.5 we added the Service Level Agreement timer to incidents to help teams manage and meet their Service Level Agreements with their customers.
We overhauled all documentation for Incident Management to make it better organized and discoverable.
The Monitor Group’s mission is to decrease the frequency and severity of incidents. We can accomplish this by helping our users respond to alerts and incidents with a streamlined workflow and by capturing useful artifacts for feedback and improvement. To get started, we simply want to understand whether our product is helping to address some of our customers’ needs. With that in mind, our initial Product Performance Indicator is to increase the total count of incidents within GitLab. This North Star Metric (NSM) will tell us whether we are on the right path to providing meaningful incident response tools.
We expect to track the journey of users through the following funnel:
Please view this Sisense chart for a list of events we have instrumented for Monitor categories.
This chart looks for the Snowplow categories the Monitor team has created. Currently, the Monitor team has added
Alert Management and
If any additional categories get added, please amend this page and edit the Sisense chart accordingly.
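As a rough illustration of the filtering the chart performs, the sketch below counts events whose category belongs to a tracked set of Monitor categories. The event record shape and the `category` field name are assumptions made for this example, not the real Snowplow schema or the Sisense query; only the "Alert Management" category name comes from the text above.

```python
from collections import Counter

# Assumption: category names as they would appear in the event data.
# "Alert Management" is taken from the text above; extend this set
# whenever a new Monitor category is instrumented.
MONITOR_CATEGORIES = {"Alert Management"}


def count_monitor_events(events, categories=MONITOR_CATEGORIES):
    """Count events per category, keeping only tracked Monitor categories.

    `events` is an iterable of dicts with a hypothetical "category" key.
    Events without a category, or with an untracked one, are ignored.
    """
    return Counter(
        event["category"]
        for event in events
        if event.get("category") in categories
    )


if __name__ == "__main__":
    sample = [
        {"category": "Alert Management", "action": "view"},
        {"category": "Something Else", "action": "click"},
        {"category": "Alert Management", "action": "update"},
    ]
    print(count_monitor_events(sample))
```

Keeping the categories in one set mirrors the instruction above: when a category is added, there is a single place to amend.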
|Crystal Poole||Backend Engineering Manager, Monitor:Monitor|
|Tristan Read||Senior Frontend Engineer, Monitor:Monitor|
|Olena HK.||Senior Frontend Engineer, Monitor:Monitor|
|Peter Leitzen||Senior Backend Engineer, Monitor:Monitor|
|Sarah Yasonik||Backend Engineer, Monitor:Monitor|
|Vitali Tatarintev||Senior Backend Engineer, Monitor:Monitor|
|Sean Arnold||Backend Engineer, Monitor:Monitor|
|David O'Regan||Senior Frontend Engineer, Monitor:Monitor|
|Sarah Waldner||Senior Product Manager, Monitor:Monitor|
|Amelia Bauerly||Senior Product Designer, Monitor:Monitor|
|Amy Qualls||Senior Technical Writer, Configure, Monitor, Growth (Activation, Conversion)|
|Justin Mandell||Product Design Manager, Configure, Monitor, Secure & Protect|
|Andrej Kiripolský||UX Researcher, Configure, Monitor, Enablement|
|Kevin Chu||Group Manager of Product Management, Configure, Monitor, Release|
This team maps to the Monitor Group category and focuses on:
Note: Started in 13.1 milestone.
It is common for issues that were assigned to or picked up by engineers to have the
workflow::ready for development label but not be actionable or ready for development. Sometimes this is because the scope of the issue is unclear, or because follow-up questions need to be answered before work can start. Depending on an engineer's time zone relative to the team members who can answer those questions, it can take several working days before the engineer is able to start working on the issue. This additional "waiting time" is inefficient, especially at the beginning of the milestone, when more issues are assigned or picked up and a bottleneck forms before development can begin. We call this "waiting time" issue refinement.
By distributing the work of refining issues throughout the milestone, so that issues are actionable by the time the
workflow::ready for development label is set, we can use engineers' development time more efficiently, which should lead to a higher MR rate.
For this experiment, we will have a new column on our current milestone issue board called
workflow::refinement. Engineering manager(s) will work with the PM to determine which issues should be placed in that column. Engineer(s) are expected to refine 1-2 issues per milestone.
When an engineer is ready to refine an issue, the engineer should:
workflow::ready for development
To surface blockers, mention your Engineering Manager in the issue, and then contact them via Slack and/or in your 1:1s. Also make sure to raise any blockers in your daily async standup using Geekbot.
The engineering managers want to make unblocking their teams their highest priority. Please don't hesitate to raise blockers.
The Product Manager is responsible for scheduling issues in a given milestone. During the backlog refinement portion of our weekly meeting, all parties will make sure that issues are scoped and well-defined enough to implement, and will determine whether they need UX involvement and/or technical investigation.
As we approach the start of the milestone, Engineering Managers are responsible for adding the ~deliverable label to communicate which issues we are committing to finish in the given milestone. Generally, the Engineering Manager will use the prioritized order of issues in the milestone to determine which issues to label as ~deliverable. The Product Manager will have follow-up conversations with the Engineering Managers if the deliverables do not meet their expectations or if there are other tradeoffs we should make.
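The selection described above can be sketched as a simple walk over the prioritized list. This is only an illustration under stated assumptions: the `Issue` type, its `weight` field, and the numeric `capacity` are hypothetical, and the rule of skipping an issue that does not fit is one possible interpretation. In practice the choice remains a judgment call by the Engineering Managers.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    title: str
    weight: int  # hypothetical effort estimate for the issue


def pick_deliverables(prioritized_issues, capacity):
    """Walk issues in priority order and select those that still fit.

    `capacity` is an assumed total effort budget for the milestone.
    An issue that is too large for the remaining budget is skipped,
    and lower-priority issues that do fit may still be selected.
    """
    selected = []
    remaining = capacity
    for issue in prioritized_issues:
        if issue.weight <= remaining:
            selected.append(issue)
            remaining -= issue.weight
    return selected


if __name__ == "__main__":
    backlog = [Issue("A", 3), Issue("B", 5), Issue("C", 2)]
    for issue in pick_deliverables(backlog, capacity=6):
        print(f"~deliverable: {issue.title}")
```

Anything left unselected is a natural starting point for the follow-up conversation between the Product Manager and the Engineering Managers about tradeoffs.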
We use the following values for estimating the effort of issues to help determine our capacity during the planning process.
When new bugs are reported, the engineering managers ensure that they have proper Priority and Severity labels. Bugs are discussed during the backlog refinement session and are scheduled according to severity, priority, and the capacity of the teams. Ideally, we should work on a few bugs each release regardless of priority or severity.
As new technical debt issues are created, the engineering manager and product manager will triage, prioritize and schedule these issues on a weekly basis using the Monitor - Tech Debt board. Issues with a P1 or P2 will be scheduled during milestone planning. When new issues are created by Monitor team members, add any relevant context to the description about the priority or timing of the issue, as this will help streamline the triage work.
Priorities for scheduling technical debt will apply as follows:
Every Friday, each engineer is expected to provide a quick async issue update by commenting on their assigned issues using the following template:
<!---
Please be sure to update the workflow labels of your issue to one of the following (that best describes the status)
- ~"workflow::In dev"
- ~"workflow::In review"
- ~"workflow::verification"
- ~"workflow::blocked"
-->
### Async issue update
1. Please provide a quick summary of the current status (one sentence).
1. When do you predict this feature to be ready for maintainer review?
1. Are there any opportunities to further break the issue or merge request into smaller pieces (if applicable)?
We do this to encourage our team to be more async in collaboration and to allow the community and other team members to know the progress of issues that we are actively working on.
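For engineers who prefer to prepare the weekly update from a script, the following is a minimal sketch that renders a comment body from the three answers. The function name and parameters are hypothetical. The optional last line uses GitLab's `/label` quick action to set the workflow label; whether you want a quick action in the comment is a choice for this sketch, not part of the team template.

```python
# Workflow labels listed in the template above.
WORKFLOW_LABELS = (
    "workflow::In dev",
    "workflow::In review",
    "workflow::verification",
    "workflow::blocked",
)


def async_issue_update(summary, review_eta, breakdown="None identified", label=None):
    """Render the weekly async issue update as a markdown comment body.

    The three arguments answer the three template questions in order.
    If `label` is given, it must be one of WORKFLOW_LABELS, and a
    `/label` quick action line is appended to the comment.
    """
    lines = [
        "### Async issue update",
        f"1. {summary}",
        f"1. {review_eta}",
        f"1. {breakdown}",
    ]
    if label is not None:
        if label not in WORKFLOW_LABELS:
            raise ValueError(f"unknown workflow label: {label}")
        lines.append(f'/label ~"{label}"')
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        async_issue_update(
            "Backend MR is open, frontend work started.",
            "Ready for maintainer review by Wednesday.",
            label="workflow::In dev",
        )
    )
```

The rendered text can be pasted into the issue as a comment.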
Community contributions are encouraged and prioritized at GitLab. Please check out the Contribute page on our website for guidelines on contributing to GitLab overall.
Within the Monitor stage, Product Management will assist a community member with questions regarding priority and scope. If a community member has technical questions on implementation, Engineering Managers will connect them with engineers within the team to collaborate with.
Engineers use spikes to conduct research, prototyping, and investigation to gain knowledge necessary to reduce the risk of a technical approach, better understand a requirement, or increase the reliability of a story estimate (paraphrased from this overview). When we identify the need for a spike for a given issue, we will create a new issue, conduct the spike, and document the findings in the spike issue. We then link to the spike and summarize the key decisions in the original issue.
Engineers should typically ignore the suggestion from Dangerbot's Reviewer Roulette and assign their MRs to be reviewed by a frontend engineer or backend engineer from the Monitor stage. If the MR requires domain-specific knowledge held by another team or by a person outside of the Monitor stage, the author should assign the MR to an appropriate domain expert for review. The MR author should use the Reviewer Roulette suggestion when assigning the MR to a maintainer.
Advantages of keeping most MR reviews inside the Monitor Stage include:
Product designers generally try to work one milestone ahead of the engineers, to ensure scope is defined and agreed upon before engineering starts work. So, for example, if engineering is planning on getting started on an issue in 12.2, designers will assign themselves the appropriate issues during 12.1, making sure everything is ready to go before 12.2 starts.
To make sure this happens, early planning is necessary. In the example above, for instance, we'd need to know by the end of 12.0 what will be needed for 12.2 so that we can work on it during 12.1. This takes a lot of coordination between UX and the PMs. We can (and often do) try to pick up smaller things as they come up and in cases where priorities change. But, generally, we have a set of assigned tasks for each milestone in place by the time the milestone starts so anything we take on will be in addition to those existing tasks and dependent on additional capacity.
The current workflow:
Though Product Designers make an effort to keep an eye on all issues being worked on, PMs add the UX label to specific issues needing UX input for upcoming milestones.
The week before the milestone starts, the Product Designers divide up issues depending on interest, expertise and capacity.
Product Designers start work on assigned issues when the milestone starts. We make an effort to start conversations early and to have them often. We collaborate closely with PMs and engineers to make sure that the proposed designs are feasible.
In terms of what we deliver: we will provide what's needed to move forward, which may or may not include a high-fidelity design spec. Depending on requirements, we may provide a text summary of the expected scope, a Balsamiq sketch, a screengrab, or a higher fidelity measure spec.
When we feel like we've achieved a 70% level of confidence that we're aligned on the way forward, we change the label to ~'workflow::ready for development' as a sign that the issue is appropriately scoped and ready for engineering.
We usually stay assigned to issues after they are ~'workflow::ready for development' to continue to answer questions while the development process is taking place.
Finally, when development is complete, we conduct UX Reviews on the MRs to ensure that what's been implemented matches the spec.
To develop and test Zoom features for the integration with GitLab, we now have our own Zoom sandbox account.
To request access to this Zoom sandbox account, please open an issue providing your non-GitLab email address (which can already be associated with an existing non-GitLab Zoom account).
User Type- most likely
Add- the users receive invitations via email
While we try to keep our process pretty light on meetings, we do hold a Monitor:Monitor Backlog Refinement meeting weekly to triage and prioritize new issues, discuss our upcoming issues, and uncover any unknowns.
The Monitor team uses labels for issue tracking and to organize issue boards. Many of the labels we use also drive reporting for Product Management and Engineering Leadership to track delivery metrics. It's important that labels be applied correctly to each issue so that information is easily discoverable.
||Yes||Identifies which stage of GitLab an issue is assigned to.|
||Yes||Identifies which team this issue belongs to. This triggers new issues to appear in the weekly triage report for the team's Product and Engineering managers.|
|Team||Yes||Identifies which team (or both) will develop a solution.|
|Milestone||%#.##||No||While technically not a label, if the issue is being worked on immediately, add the current milestone. If you know when the issue needs to be scheduled (such as follow-up work), add the future milestone that it should be scheduled in. Otherwise, leave it empty.|
|Priority||Yes, when scheduled||If an issue is scheduled in the current milestone it must have a
||Issues committed to being completed in the current milestone.|
||Issues which are not committed in the current milestone. These are typically either stretch goals, technical debt or non-customer facing.|
||Issues that need further input from team members in order for it to be
||Waiting on external factors or another issue to be completed before work can resume.|
||The issue is refined and ready to be scheduled in a current or future milestone.|
||Issues that are actively being worked on by a developer.|
||Issues that are undergoing code review by the development team.|
||Everything has been merged, waiting for verification after a deploy.|
In our group, the (frontend + backend) engineering managers are responsible for adding the
~deliverable label to any issue that the team publicly commits to completing, to the best of its ability, in that milestone. We are not perfect, but our goal is that 100% of the issues with that label ship in the release they are scheduled for. This allows engineering to share which issues they commit to and helps set expectations for the product manager and for the community.
Just like the rest of the company, we use PTO by Roots to track when team members are traveling, attending conferences, and taking time off. The easiest way to see who has upcoming PTO is to run the
/pto-roots whosout command in the
#g_monitor_standup Slack channel. This will show you the upcoming PTO for everyone in that channel.