This is the product vision for Monitor. If you'd like to discuss this vision directly with the product manager for Monitor, feel free to reach out to Sarah Waldner via e-mail or by scheduling a video call.
The Monitor stage comes after you've configured your production infrastructure and deployed your application to it. As part of the verification and release process you've done some performance validation, but you still need to ensure your service(s) maintain the expected service-level objectives (SLOs) for your users.
GitLab's Monitor stage product offering makes instrumentation of your service easy, giving you the right tools to prevent, respond to, and recover from SLO degradation. Many DevOps teams today either lack exposure to operational tools or use ones that put them in a reactive position when complex systems fail inexplicably. Our mission is to empower your DevOps teams by finding operational issues before they hit production and enabling them to respond like pros by leveraging default SLOs and responses they proactively instrumented. GitLab Monitoring allows you to successfully complete the DevOps loop, not just for the features in your product, but for its performance and user experience as well.
Using dashboards, and in the future a status page, we provide an easy way for you to gain a holistic understanding of the state of your production services across multiple groups and projects. When you are deploying a suite of services, it's critical that you can drill into each individual service's SLO attainment as well as troubleshoot issues that span multiple services.
We track epics for all the major deliverables associated with the north stars, and category maturity levels. You can view them on our Monitor Roadmap.
We're pursuing a few key principles within the Monitor Stage.
Your team's service(s), first and foremost, need to be observable before you are able to evaluate production performance characteristics. We believe that observability should be easy. GitLab will ship with smart conventions that set up your applications with generic observability. We will also make it simple to instrument your service, so that custom metrics, the ones you'd like to build your own SLOs around, can be added with a few lines of code.
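To make "a few lines of code" concrete, here is a minimal sketch of what instrumenting a custom counter metric can look like. It is not GitLab's API or the official Prometheus client: the `SketchCounter` class and the `checkout_requests_total` metric name are hypothetical, and the class only mimics the general shape of the Prometheus text exposition format using the standard library. A real service would more likely use an established Prometheus client library.

```python
from collections import defaultdict


class SketchCounter:
    """Hypothetical, stdlib-only stand-in for a Prometheus-style counter."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        # Maps a sorted tuple of (label, value) pairs to the running total.
        self._values = defaultdict(float)

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        """Increase the counter for the given label set."""
        if amount < 0:
            raise ValueError("counters can only increase")
        key = tuple(sorted((k, str(v)) for k, v in labels.items()))
        self._values[key] += amount

    def render(self) -> str:
        """Render the counter as text a scraper could read."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in key)
            suffix = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


# The "few lines" a service owner would add (metric name is an assumption):
checkout_requests = SketchCounter(
    "checkout_requests_total", "Checkout requests by outcome."
)
checkout_requests.inc(status="ok")
checkout_requests.inc(status="ok")
checkout_requests.inc(status="error")
print(checkout_requests.render())
```

A counter labeled by outcome like this is a common starting point for an SLO, because the ratio of `error` to total requests can be derived directly from it.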
We want to help teams resolve outages faster, accelerating both the troubleshooting and resolution of incidents. GitLab's single platform can correlate the incoming observability data with known CI/CD events and source code information, to automatically suggest potential root causes.
Continuously learning and driving those insights back into your development cycle is a critical part of the DevOps loop. The tools in the Monitor stage make it possible to gain insights about production SLOs, incidents and observability sources across the multi-project systems that make up a complete application.
Container-based deployments have rapidly expanded the amount of observability information available. It is no longer possible to collate and visualize this information without automation that distills it into valuable insights, which GitLab can do for you.
We'll also provide views across a suite of applications so that managers of a large number of DevOps or Operations teams can get a quick view of the health of their application suite and their teams.
Our north stars are the guideposts for where we are headed. Our principles inform how we will get there. First and foremost we abide by GitLab's universal Product Principles. There are also a few principles unique to the Monitor stage itself.
As part of our general principle of Flow One, the Monitor stage will seek to complete the full observability feedback loop for limited use cases first, before moving on to support others. As a starting point this will mean support for modern, cloud-native developers first.
In modern DevOps organizations developers are expected to also operate the services they develop. In many cases this expectation isn't met. Whether a developer is the one operating an application or not, we will build tools that work for those doing the operator job. This means setting aside preferences, such as a developer's wish to avoid deep production troubleshooting, and instead building tools that allow those who operate to be best-in-class operators, regardless of their title.
Our users can't expect a complete set of Monitoring tools if we don't utilize it ourselves for instrumenting and operating GitLab. That's why we will dogfood everything.
We will start with GitLab Self-Monitoring and our own Infrastructure teams. We want self-managed administrator users to utilize the same tools to observe and respond to health alerts about their GitLab instance as they would to monitor their own services. We'll also complete our own DevOps loop by having our Infrastructure teams for GitLab.com utilize our incident management feature.
There are a few product categories that are critical for success here; each one is intended to represent what you might find as an entire product out in the market. We want our single application to solve the important problems solved by other tools in this space - if you see an opportunity where we can deliver a specific solution that would be enough for you to switch over to GitLab, please reach out to the PM for this stage and let us know.
Each of these categories has a designated level of maturity; you can read more about our category maturity model to help you decide which categories you want to start using and when.
GitLab collects and displays performance metrics for deployed apps, leveraging Prometheus. Developers can determine the impact of a merge and keep an eye on their production systems, without leaving GitLab. This category is at the "viable" level of maturity.
GitLab makes it easy to view the logs of running pods in connected Kubernetes clusters. By displaying the logs directly in GitLab, developers can avoid having to manage console tools or jump to a different interface. This category is at the "minimal" level of maturity.
Tracing provides insight into the performance and health of a deployed application, tracking each function or microservice which handles a given request. This makes it easy to understand the end-to-end flow of a request, regardless of whether you are using a monolithic or distributed system. This category is at the "minimal" level of maturity.
Self-managed GitLab instances come out of the box with great observability tools, reducing the time and effort required to maintain a GitLab instance. This category is planned, but not yet available.
Out-of-the-box Kubernetes cluster monitoring lets you know the health of your deployment environments with traceability back to every issue and code change as part of a single application for end-to-end DevOps. This category is at the "viable" level of maturity.
Error tracking allows developers to easily discover and view the errors that their application may be generating. By surfacing error information where the code is being developed, efficiency and awareness can be increased. This category is at the "minimal" level of maturity.
Simulate user activity within your application to detect problems in end-to-end workflows and understand real-world performance. This category is planned, but not yet available.
Track incidents within GitLab, providing a consolidated location to understand the who, what, when, and where of the incident. Define service level objectives and error budgets, to achieve the desired balance of velocity and stability. This category is at the "minimal" level of maturity.
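The relationship between an SLO and its error budget is simple arithmetic, sketched below. This is an illustration of the general concept only, not how GitLab computes it: the `error_budget` function, the `ErrorBudget` fields, and the request-based SLO (a target fraction of successful requests over some window) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    allowed_failures: float   # failures the SLO tolerates in the window
    remaining: float          # failures still available before the SLO is missed
    consumed_fraction: float  # share of the budget already spent (can exceed 1.0)


def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> ErrorBudget:
    """Compute a request-based error budget for one measurement window.

    slo_target is the required success ratio, e.g. 0.999 for "three nines".
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0 or failed_requests < 0:
        raise ValueError("request counts must be positive / non-negative")

    # The budget is whatever fraction of requests the SLO allows to fail.
    allowed = (1 - slo_target) * total_requests
    return ErrorBudget(
        allowed_failures=allowed,
        remaining=allowed - failed_requests,
        consumed_fraction=failed_requests / allowed,
    )


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% SLO.
budget = error_budget(0.999, 1_000_000, 250)
print(f"allowed={budget.allowed_failures:.0f} "
      f"remaining={budget.remaining:.0f} "
      f"consumed={budget.consumed_fraction:.0%}")
```

With numbers like these, roughly a quarter of the budget is spent, which is the kind of signal a team can use to decide whether to keep shipping quickly or to prioritize stability work.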
Easily communicate the status of your services to users and customers. This category is planned, but not yet available.
In general, we follow the same prioritization guidelines as the product team at large. Issues will tend to flow from having no milestone, to being added to the backlog, to being added to this page and/or a specific milestone for delivery.
You can see our entire public backlog for Monitor at this link; filtering by labels or milestones will allow you to explore. If you find something you're interested in, you're encouraged to jump into the conversation and participate. At GitLab, everyone can contribute!
Issues with the "direction" label have been flagged as being particularly interesting, and are listed in the section below.
Incident label to issues created by the Alert Bot
environment drop down to pod logs screen (Ultimate)
There are a number of other issues that we've identified as being interesting that we are potentially thinking about, but do not currently have planned by setting a milestone for delivery. Some are good ideas we want to do, but don't yet know when; some we may never get around to, some may be replaced by another idea, and some are just waiting for that right spark of inspiration to turn them into something special.
Remember that at GitLab, everyone can contribute! This is one of our fundamental values and something we truly believe in, so if you have feedback on any of these items you're more than welcome to jump into the discussion. Our vision and product are truly something we build together!