The product team at GitLab is working to close the DevOps loop by accelerating development on new monitoring products that will offer more observability into application performance and the health of your deployments.
Where does monitoring fit into the DevOps lifecycle?
Monitoring is the final Ops stage of the DevOps loop, coming after the production environment is configured and the application is deployed. No developer should ship code and forget about it. Monitoring makes it possible to respond proactively to both simple and complex problems, and it helps GitLab customers meet the service level objectives (SLOs) they have set with their users.
Our vision for monitoring at GitLab
We outlined big plans for building out our Ops capabilities in our 2018 GitLab product vision: “A big milestone for GitLab will be when operations people log into GitLab every day and consider it their main interface for getting work done.”
Since then, GitLab has been working diligently to build out our monitoring products to close the DevOps loop. The goal is to build instrumentation that allows developers to proactively identify SLO degradation and observe the impact of code changes across multiple deployments in real time. The "North Stars" that guide product development in the monitoring stage include:
- Instrument with ease: GitLab is set up so teams have generic observability into their application performance.
- Resolve like a pro: GitLab correlates incoming observability data with CI/CD events and source code information so troubleshooting is easy.
- Gain insights seamlessly: Our use of container-based deployments makes it simpler to continuously collect insights into production SLOs, incidents, and observability sources across complex projects and multiple applications.
One of our core principles at GitLab is to dogfood everything: after all, if it doesn’t work for us, how can it work for our customers? We are starting by having the infrastructure team for GitLab.com use the incident management system we’re developing, and by building out GitLab self-monitoring so administrators can monitor their self-managed GitLab instance the same way their developers use GitLab to monitor their applications.
We are also committed to closing the DevOps loop by taking a cloud native first approach and building tooling that gives Ops professionals more insight into application performance and the health of deployments.
Kenny Johnston, director of product (Ops) at GitLab, gave me an overview of some of the new products the monitoring team is working on to help make this vision a reality. Watch the full video of our conversation below and check out the monitoring product roadmap for an in-depth look at our goals and timeline.
Building an observability suite to close out the DevOps loop
The top priority for the monitoring team is to close the DevOps feedback loop for GitLab customers. This means that if an SLO is degraded in any way, an alert is triggered and an incident is created in GitLab, allowing for an immediate response.
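To make the loop concrete, here is a minimal Python sketch of the "SLO degraded, then open an incident" step. It is not GitLab's implementation or API: the `SLO` and `Incident` classes and the `check_slo` function are hypothetical, and it assumes an availability-style SLO expressed as the fraction of requests that must succeed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SLO:
    """Hypothetical availability SLO: the fraction of requests that must succeed."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests must succeed


@dataclass
class Incident:
    """Hypothetical stand-in for an incident issue created in GitLab."""
    title: str
    observed: float
    target: float


def check_slo(slo: SLO, total_requests: int, failed_requests: int) -> Optional[Incident]:
    """Return an Incident if the observed success ratio falls below the SLO target."""
    if total_requests == 0:
        return None  # no traffic in the window, so there is nothing to evaluate
    observed = (total_requests - failed_requests) / total_requests
    if observed < slo.target:
        # The SLO is degraded: open an incident so someone can respond right away
        return Incident(
            title=f"SLO degraded: {slo.name} at {observed:.2%} (target {slo.target:.2%})",
            observed=observed,
            target=slo.target,
        )
    return None  # SLO met, no incident needed
```

For example, with a 99.9% target, 25 failures out of 10,000 requests (99.75% success) yields an incident, while 5 failures (99.95% success) yields `None`. In a real setup the request counts would come from a metrics source such as Prometheus rather than being passed in by hand.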
Our priority product categories at this stage are metrics, cluster monitoring, and incident management, says Kenny.
“First I want to make sure that we can provide our customers with the instrumentation so that they can define an SLO, and when their application exceeds or fails to achieve that SLO, that they can respond in an instant,” says Kenny. “Once we have them doing that, we'll get a lot of good feedback, and immediate feedback from users about what tools they need for diagnostic purposes.”
Measure your performance with enhanced metrics
We already have a successful integration with the open source metrics tool Prometheus, which we use to collect and display performance metrics for applications deployed on Kubernetes. The integration is sophisticated enough that developers do not have to leave GitLab to collect important information on the impact of a merge request or to monitor production systems. Our metrics product category is at the “viable” stage, meaning customers are using the instrumentation we’ve developed to solve real problems, bringing us a step closer to closing out the DevOps loop.
Diagnostic tooling for application performance monitoring (APM), in product categories such as logging, tracing, and error tracking, is currently at the minimal viable change (MVC) stage, though the team plans to accelerate development on logging in upcoming GitLab releases.
Kenny notes that our observability suite is one of the primary ways GitLab provides value for operators who are thinking of making the move to cloud native.
“GitLab out-of-the-box keeps up with new cloud native technologies because we're constantly adopting the newest versions, and our whole convention over configuration approach means we don't leave it to you to figure it out, we've figured it out for you as a default,” explains Kenny.
Simplify Kubernetes management using GitLab
There is quite a bit of overlap between the metrics and cluster monitoring product categories at this stage, since Prometheus is used to collect metrics on applications deployed to Kubernetes. By offering out-of-the-box cluster monitoring on Kubernetes, we make it possible for operators to monitor the health of their deployed environments all in one place.
One of the high-value cluster monitoring features we’ve set up in GitLab is alerting on memory usage and CPU capacity metrics, so users are automatically alerted if either of those numbers goes out of bounds in their deployed environments.
“We'd like to start adding capabilities for cluster cost optimization, so informing users not just when they're hitting capacity but when they're significantly under capacity and should probably size down,” says Kenny. “That helps users who've configured a Kubernetes cluster to not end up wasting it because it's being underutilized and not end up wasting money.”
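The idea of alerting in both directions, when a cluster is near capacity and when it is significantly under capacity, can be sketched in a few lines of Python. This is an illustration only, not a GitLab feature or API: the `classify` and `cluster_alerts` functions, the 90% and 20% thresholds, and the `(used, capacity)` input format are all hypothetical assumptions.

```python
from enum import Enum


class CapacityStatus(Enum):
    NEAR_CAPACITY = "near capacity"     # consider scaling up
    UNDER_UTILIZED = "under capacity"   # consider sizing down to save money
    OK = "within bounds"


# Hypothetical thresholds, expressed as a fraction of the available resource
HIGH_WATERMARK = 0.90
LOW_WATERMARK = 0.20


def classify(used: float, capacity: float,
             high: float = HIGH_WATERMARK, low: float = LOW_WATERMARK) -> CapacityStatus:
    """Classify one resource (e.g. CPU cores or memory GiB) by its utilization."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    utilization = used / capacity
    if utilization >= high:
        return CapacityStatus.NEAR_CAPACITY
    if utilization <= low:
        return CapacityStatus.UNDER_UTILIZED
    return CapacityStatus.OK


def cluster_alerts(metrics: dict) -> dict:
    """Map each resource name to its status, keeping only out-of-bounds resources.

    `metrics` maps a resource name such as "cpu" or "memory" to (used, capacity).
    """
    alerts = {}
    for name, (used, capacity) in metrics.items():
        status = classify(used, capacity)
        if status is not CapacityStatus.OK:
            alerts[name] = status
    return alerts
```

For instance, `cluster_alerts({"cpu": (3.8, 4.0), "memory": (1.0, 16.0)})` flags CPU as near capacity (95% used) and memory as under-utilized (about 6% used). Using two thresholds rather than one is what turns a plain capacity alert into the cost-optimization signal Kenny describes.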
Cluster monitoring reached the “viable” stage in earlier GitLab releases as we transition to Kubernetes, and the product team is building out alerting and other cluster monitoring features in upcoming releases.
Dogfooding our new incident management system on GitLab
Creating an incident management system is key to building a robust observability suite for monitoring: “The features we've prioritized are oriented towards getting the right person the right information to enable them to restore the services they are responsible for as quickly as possible,” according to the category vision for incident management.
Because we recognize the urgency of building a functional incident management system, GitLab is using issues as the base for creating a viable platform. The goal is to stretch the capacity of our existing tooling by focusing on integrations with communication tools such as Slack and Zoom, so we can accelerate time-to-market and iterate as we go, while also building out new functionality.
The infrastructure team on GitLab.com is dogfooding the incident management system so we can put the tooling through its paces, making improvements as we go.
Outside the loop: Getting GitLab administrators to monitor GitLab using GitLab
Kenny says the product team has a strategy for creating more exposure to the monitoring capabilities GitLab has in development: putting them front and center for administrators of self-managed GitLab instances.
“Today you can create a project for your application that's an e-commerce app, and get the instrumentation to know whether the Kubernetes cluster is experiencing pain, whether SLOs that you custom define have alerts and respond to that with incidents,” says Kenny. “We'd like you to have that exact same experience, or expose you to that same experience with your GitLab self-managed instance, so that as an administrator you're using the same tools to monitor and respond to the GitLab instance as your developers would use to monitor and respond to their applications.”
By essentially setting up administrators to dogfood the monitoring features we provide to developers for application management, we can ensure those features are battle-tested on a larger application.
The core challenge of the observability suite
While the product team at GitLab has a vision and roadmap for building a comprehensive suite of observability instrumentation, there isn’t a clear consensus among monitoring experts as to what is required for a robust observability suite in this new, cloud native world.
“There's varied opinion in the new world that's Kubernetes-based about what an observability system looks like,” says Kenny. “There's a legacy view that seems to be evolving. So, we need to keep up with that and with the industry's evolution of what we consider required. We as a company just need to stay focused on what our users are asking for, and that's why I think completing that DevOps loop is important first, because then we'll start getting immediate user feedback.”
Keep an eye out for these new monitoring updates in our 12.2 and 12.3 releases.
Cover photo by Glen . on Unsplash.
“Anticipate SLO degradation and manage performance with new GitLab monitoring tooling” – Sara Kassabian