This is the product direction for Monitor. If you'd like to discuss this direction directly with the product managers for Monitor, feel free to reach out to Dov Hershkovitch (GitLab, Email), Sarah Waldner (GitLab, Email Zoom call) or Kevin Chu (GitLab, Email Zoom call).
The Monitor stage comes after you've configured your production infrastructure and deployed your application to it.
The mission of the GitLab Monitor stage is to provide feedback that decreases the frequency and severity of incidents and improves operational and product performance.
The categories within the Monitor stage fit together to support the mission in the following way:
The Monitor stage directly competes in several markets, including Application Performance Monitoring (APM), Log Management, Infrastructure Monitoring, IT Service Management (ITSM), Digital Experience Management (DEM), and Product Analytics. The total addressable market for the Monitor stage was already more than $1.5 billion in 2018 and is expected to grow as businesses continue to shift to digital.
All of these markets are well-established and crowded. However, they are also being disrupted by changes in the underlying technologies. The shift to cloud, containers, and microservices architectures has changed users' expectations, and many existing vendors have struggled to keep pace. Successful vendors, such as market leader Datadog, have leveraged a platform strategy to expand into adjacent markets, and even into other stages within DevOps.
The changes in the market have also revealed opportunities that new entrants into this stage, like GitLab, can take advantage of. Specifically, the Ops section opportunities worth re-emphasizing are:
In 2 years' time, the Monitor stage categories of observability, incident management, and product feedback are the default choice for cloud-native teams using GitLab by being complete, cost-effective, and simple to set up and operate, enabling continuous improvement.
GitLab is uniquely qualified to deliver on this bold and ambitious vision because:
A trade-off in our approach is that we are explicitly not striving to be a fully turn-key experience that can be used to monitor all applications, particularly legacy applications. Removing an existing monitoring solution wholesale is painful, so a land-and-expand strategy is prudent here. As a customer recently explained, "Every greenfield application that we can deploy with your monitoring tools saves us money on New Relic licenses."
As this stage matures, we will begin to shift our attention and compete more directly with incumbent players as a holistic Monitoring solution for modern applications.
Building on our 2-year vision statement, our 3-year goal is to have built an integrated package of observability and operations tools that can displace today's front-runner in modern observability, Datadog, and compete in all Monitor categories. We'll do that by focusing on the four core workflows of Instrument, Triage, Resolve, and Improve.
The following links describe our strategy for each individual workflow:
From 2020-05 through 2020-07, the following are the goals we are pursuing within the Monitor stage.
The quarterly goals fit within the larger overarching objectives of the Monitor stage described below.
First, we plan to provide a streamlined triage experience that allows our users to quickly identify and effectively troubleshoot an application problem, as described in the following flow:
Detailed information can be found in the triage to minimal epic.
Second, we plan to dogfood our current capabilities. Monitoring and observability solutions, by their nature, have a high bar to meet before adoption. By continuing to improve the triage workflow, we will at the same time enable our GitLab teammates to use GitLab Monitor more fully.
You can see our entire public backlog for Monitor at this link; filtering by labels or milestones will allow you to explore. If you find something you're interested in, you're encouraged to jump into the conversation and participate. At GitLab, everyone can contribute!
Monitor SMAU is determined by tracking how users configure, interact with, and view the features contained within the stage. The following features are considered:
|Configure|Interact|View|
|---|---|---|
|Install Prometheus|Add/Update/Delete Metric Chart|View Metrics Dashboard|
|Enable external Prometheus instance integration|Download CSV data from a Metric chart|View Kubernetes pod logs|
|Enable Jaeger for Tracing|Generate a link to a Metric chart|View Environments|
|Enable Sentry integration for Error Tracking|Add/remove an alert|View Tracing|
|Enable auto-creation of issues on alerts|Change the environment when looking at pod logs|View operations settings|
|Enable Generic Alert endpoint|Select issue template for auto-creation|View Prometheus Integration page|
|Enable email notifications for auto-creation of issues|Use /zoom and /remove_zoom quick actions|View error list|
| |Click on metrics dashboard links in issues| |
| |Click View in Sentry button in errors list| |
See the corresponding Periscope dashboard (internal).
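As a rough illustration of how a metric like this could be derived from the tracked features, the sketch below counts the distinct users who performed at least one tracked action in each calendar month. It assumes SMAU means unique monthly users of the stage. The event tuple layout, the action names in `TRACKED_ACTIONS`, and the function `monthly_active_users` are all hypothetical and do not reflect GitLab's actual usage-tracking implementation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical action names standing in for a few of the features listed above.
TRACKED_ACTIONS = {
    "install_prometheus",
    "add_alert",
    "view_metrics_dashboard",
    "view_pod_logs",
}


def monthly_active_users(events):
    """Count distinct users per (year, month) who performed a tracked action.

    `events` is an iterable of (user_id, action, day) tuples, where `day`
    is a datetime.date. Untracked actions are ignored.
    """
    users_by_month = defaultdict(set)
    for user_id, action, day in events:
        if action in TRACKED_ACTIONS:
            users_by_month[(day.year, day.month)].add(user_id)
    return {month: len(users) for month, users in users_by_month.items()}


if __name__ == "__main__":
    sample = [
        ("alice", "view_metrics_dashboard", date(2020, 5, 1)),
        ("alice", "view_pod_logs", date(2020, 5, 20)),
        ("bob", "add_alert", date(2020, 5, 3)),
    ]
    print(monthly_active_users(sample))
```

Using a set per month means a user who triggers several tracked actions in the same month is still counted once, which is the property that distinguishes an active-user count from a raw event count.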
There are a few workflows that are critical to our users in this stage.
Each of these workflows has a designated level of maturity; you can read more about our category maturity model to help you decide which categories you want to start using and when.
This workflow is planned, but not yet available.
Starting with the highest-level alert, using preconfigured dashboards to review relevant metrics, enabling ad-hoc visualization and immediate drill-down from time-sliced metrics into logs and traces in the same screen. This workflow is planned, but not yet available.
This workflow is planned, but not yet available.
There are a few product categories that are critical for success here; each one is intended to represent what you might find as an entire product out in the market. We want our single application to solve the important problems solved by other tools in this space - if you see an opportunity where we can deliver a specific solution that would be enough for you to switch over to GitLab, please reach out to the PM for this stage and let us know.
Each of these categories has a designated level of maturity; you can read more about our category maturity model to help you decide which categories you want to start using and when.
GitLab collects and displays performance metrics for deployed apps, leveraging Prometheus. Developers can determine the impact of a merge and keep an eye on their production systems, without leaving GitLab. This category is at the "complete" level of maturity.
Consolidate all of your IT alerts in GitLab. Quickly triage and investigate problems by correlating alerts to relevant metrics, logs, traces, and errors. Elevate the critical ones to incidents for speedy resolution. This category is at the "minimal" level of maturity.
Track incidents within GitLab, providing a consolidated location to understand the who, what, when, and where of the incident. Define service level objectives and error budgets to achieve the desired balance of velocity and stability. This category is at the "viable" level of maturity.
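To make the relationship between a service level objective and its error budget concrete, here is a minimal sketch of the usual arithmetic: the budget is the fraction of a time window in which the service is allowed to miss its target. The function names, the availability-based SLO, and the 30-day window are assumptions chosen for illustration, not a description of how GitLab computes or stores these values.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the allowed downtime, in minutes, for an availability SLO.

    `slo_target` is a fraction strictly between 0 and 1, e.g. 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining_fraction(
    slo_target: float, window_days: int, downtime_minutes: float
) -> float:
    """Return the share of the error budget still unspent.

    A negative result means the budget has been exceeded.
    """
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999, 30), 1))
    print(round(budget_remaining_fraction(0.999, 30, 21.6), 2))
```

A team could treat a shrinking remaining fraction as a signal to slow feature releases in favor of stability work, which is the velocity and stability balance described above.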
GitLab makes it easy to view the logs distributed across multiple pods and services using log aggregation with Elastic Stack. Once Elastic Stack is enabled, you can view your aggregated Kubernetes logs across multiple services and infrastructure, go back in time, use infinite scroll, and search through your application logs from within the GitLab UI itself. This category is at the "viable" level of maturity.
Tracing provides insight into the performance and health of a deployed application, tracking each function or microservice which handles a given request. This makes it easy to understand the end-to-end flow of a request, regardless of whether you are using a monolithic or distributed system. This category is at the "viable" level of maturity.
Self-managed GitLab instances come out of the box with great observability tools, reducing the time and effort required to maintain a GitLab instance.
Error tracking allows developers to easily discover and view the errors that their application may be generating. By surfacing error information where the code is being developed, efficiency and awareness can be increased. This category is at the "viable" level of maturity.
Digital experience management includes both real user monitoring (passive) and synthetics monitoring (active) to allow developers to detect problems in end-to-end workflows and understand real-world performance as experienced by users. This category is planned, but not yet available.
Priority: medium • Direction
This category is planned, but not yet available.
Priority: medium • Direction
We've identified a number of other interesting issues that we are considering but have not yet planned by setting a milestone for delivery. Some are good ideas we want to do, but don't yet know when; some we may never get around to; some may be replaced by another idea; and some are just waiting for that right spark of inspiration to turn them into something special.
Remember that at GitLab, everyone can contribute! This is one of our fundamental values and something we truly believe in, so if you have feedback on any of these items you're more than welcome to jump into the discussion. Our vision and product are truly something we build together!