The Monitor:Observability group at GitLab is responsible for building tools that enable DevOps teams to respond to, triage and remediate errors and IT alerts for the systems and applications they maintain.
Based on the recent Opstrace acquisition, we aim to provide a streamlined Operations experience within GitLab that enables the individuals who write the code to also maintain it.
Our primary goals right now are to deliver a set of 3 Integration Milestones:
We have Geekbot standups on Wednesdays and retrospectives on Fridays. We use these async standups to communicate what we have accomplished, any current blockers, and what we plan to work on next.
Weekly Meetings: These are focused on organizing ongoing work or specific efforts such as rollouts or bigger initiatives.
Bi-weekly Cross-functional meeting: This meeting is focused on aligning the EM, PM, Principal Engineer, Developer Advocate, and UX on cross-functional objectives. Goals are set and weekly status is communicated.
Bi-monthly social hour: This meeting is non-work related and helps the team socialize and get to know each other better.
Team member coffee chats: Each team member should schedule a coffee chat with every other team member roughly every 4-6 weeks. Feel free to discuss work or non-work topics. If timezones are an issue, find another way to connect, such as an async Slack thread to check in. The goal is to get to know your other team members on a 1:1 basis.
Dev Syncs: These are developer-organized sync meetings where ICs can meet and discuss technical issues or organize technical work amongst themselves without requiring the presence of an EM.
We use several Slack channels to organize ourselves:
Currently, during our initial phase, we are using a 2-month milestone cadence. All work is organized into Epics and sub-epics and assigned to the relevant Milestone.
Normally at the beginning of the Milestone the EM will discuss an overview of the work and what relevant areas you will focus on. Sometimes issues will already be assigned to you before the Milestone begins.
If you are ever looking for additional issues to work on:
workflow:in dev label to the issue
|Sebastien Pahl||Principal Product Manager, Monitor:Observability|
|Andy Volpe||Staff Product Designer, Secure:Composition Analysis, Configure, Monitor, Secure, Govern|
|Daniel Croft||Senior Engineering Manager, Monitor & Runner|
|Joe Shaw||Senior Backend Engineer, Monitor:Observability|
|Mat Appelman||Principal Engineer, Monitor|
|Nicole Williams||Interim Senior Engineering Manager, Monitor & Runner|
|Nicholas Klick||Engineering Manager, Monitor:Observability and Acting Backend Engineering Manager, Verify:Runner SaaS|
|Kevin Chu||Group Manager of Product Management, Configure, Monitor, Release|
A Pulse survey was conducted for the Opstrace team and the feedback was focused on a few key areas:
The Observability team is involved in the introduction of several new technologies and technical components to GitLab's tech stack.
The GitLab Monitor Stage Product Direction Handbook Page has information about the product strategy for integrating GitLab and Opstrace.
We also encourage you to read our architecture documentation.
Observability and analytics features have big-data and insert-heavy requirements that are not a good fit for Postgres or Redis. ClickHouse was selected as a good fit for these features' requirements. ClickHouse is an open-source, column-oriented database management system. It is attractive for these use cases because it can efficiently filter, aggregate, and sum across large numbers of rows. ClickHouse is not intended to replace Postgres or Redis in GitLab's stack.
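To illustrate why a column-oriented layout suits filter-and-sum queries, here is a minimal sketch in plain Python. It is not ClickHouse code and uses no ClickHouse API; the field names (`project_id`, `status`, `duration_ms`), the sample data, and the helper functions are all hypothetical and exist only to contrast the two layouts.

```python
from array import array

# Hypothetical event records; field names and values are illustrative only.
rows = [
    {"project_id": 1, "status": 500, "duration_ms": 120},
    {"project_id": 2, "status": 200, "duration_ms": 35},
    {"project_id": 1, "status": 500, "duration_ms": 80},
    {"project_id": 1, "status": 200, "duration_ms": 20},
]


def to_columns(records):
    """Pivot row records into one contiguous typed array per field."""
    return {name: array("q", (r[name] for r in records)) for name in records[0]}


def sum_duration_row_store(records, project_id):
    """Row layout: each whole record is visited, including unused fields."""
    return sum(r["duration_ms"] for r in records if r["project_id"] == project_id)


def sum_duration_column_store(columns, project_id):
    """Column layout: only the two columns the query needs are scanned."""
    return sum(
        duration
        for pid, duration in zip(columns["project_id"], columns["duration_ms"])
        if pid == project_id
    )


columns = to_columns(rows)
```

Both functions return the same total for a given project; the difference is that the column version never touches the `status` column, which is the property that matters when a table has many wide columns and a very large number of rows.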
ClickHouse is the backend datastore for these features (currently under development):
ClickHouse is also being considered as a backend for:
GitLab Observability UI is a data-visualization library used to visualize metrics. We intend to extend it to visualize logging and tracing data as these features are added to the platform. GitLab Observability UI is based on a version of Grafana prior to the license change from Apache to AGPL. This approach was chosen for the following reasons:
Cortex is a highly scalable time-series database for Prometheus. Cortex has been part of the Opstrace stack since its founding, and the team has invested energy in everything from scale testing to developing its own operator.
Moving forward, we are proposing to remove Cortex from the stack for several reasons:
We would love community input on this proposal: https://gitlab.com/gitlab-org/opstrace/opstrace/-/issues/1656
Timeline: April 6 - June 22, 2022
Many of the goals, and much of the reasoning behind them, are discussed in the borrow request proposal.
"Clickhouse integrated as part of a standard Opstrace + GitLab .COM deployments with Error Tracking backed by Clickhouse enabled by default."
We will communicate ongoing progress via:
MR Type labels help us report what we're working on to industry analysts in a way that's consistent across the engineering department. The Sisense dashboard below shows the trend of MR Types over time and a list of merged MRs.