Would you like to contribute? Become a Data Champion, recommend an improvement, visit Slack #data, watch a Data Team video. We want to hear from you!
The Data Team Handbook contains a large amount of information! To help you navigate the handbook we've organized it into the following major sections:
The collective set of people, projects, and initiatives focused on advancing the state of data at GitLab is called the GitLab Data Program. Two primary groups within the Data Program use data to drive insights and business decisions. These groups complement one another and each focuses on specific areas to drive a deeper understanding of trends in the business. The two groups are the (central) Data Team and, separately, the Function Analytics Teams located in Sales, Marketing, Product, Engineering, or Finance. Watch the Data Recruiting Video to hear from some of the teams involved and what they are working on.
The Data Team reports into Business Technology and is the Center of Excellence for analytics, analytics technology, operations, and infrastructure. The Data Team is also responsible for analytics strategy, building enterprise-wide data models, providing Self-Service Data capabilities, maintaining the data platform, developing Data Pumps, and monitoring and measuring Data Quality. The Data Team is responsible for data that is defined and accessed on a regular basis by GitLab team members from the Snowflake Enterprise Data Warehouse. The Data Team builds data infrastructure to power approximately 80% of the data that is accessed on a regular basis. The Data Team also provides a Data Science center of excellence to launch new advanced analytics initiatives and provide guidance to other GitLab team members.
Function Analytics Teams reside and report into their respective divisions and departments. These teams perform specific analysis for business activities and workflows that take place within the function. These teams perform ad-hoc analysis and develop dashboards based on the urgency and importance of the analysis required, following the Data Development approach. The most important and repeatable analysis will be powered by the centralized Trusted Data Model managed by the central Data Team. Function Analytics Teams also build function-specific/ad-hoc data models and business insights models to solve for urgent and operational needs, not requiring trusted data features. Function Analytics Teams work closely with the Data Team in a variety of ways: expand GitLab's overall analytics capabilities, extend the Data Catalog, provide requirements for new Trusted Data models and dashboards, validate metrics, and help drive prioritization of work asked of the Data Team. When data gaps are found in our business processes and source systems, the team members will provide requirements to product management, sales ops, marketing ops, and others to ensure the source systems capture correct data.
The GitLab Data Program includes teams focused in the following areas:
A list of the Data Program Teams' meeting series, subject DRIs, Slack channels, and initiatives can be found on the Data Program Collaboration Hub page.
On a normal operational basis, the Data Team and Function Analyst teams work in a "Hub & Spoke" model, with the Data Team serving as the "Hub" and Center of Excellence for analytics, analytics technology, operations, and infrastructure, while the "Spokes" represent each Division's or Department's Function Analysts. Function Analysts develop deep subject matter expertise in their specific area and leverage the Data Team when needed. From time to time, the Data Team provides limited development support for GitLab Departments that do not yet have dedicated Function Analysts, or for teams that have dedicated Function Analysts but need additional support. The teams collaborate through Slack Data Channels, the GitLab Data Project, and ad-hoc meetings.
To support stable business workflows and longer-term initiatives, the Data Team has deployed three sub-teams, or Data Fusion Teams. A single Data Fusion Team includes members from Division leadership, Function Analysts, and the Data Team. Data Fusion Teams collaborate towards development of data solutions, including Dashboards, Data Models and ad-hoc analysis. Data Fusion Teams are staffed with the appropriate set of team members required to develop a full data solution. The Data Fusion Teams meet on a weekly basis, define, review, and align on priorities, define quarterly objectives, and collaboratively develop successful data solutions. Data Team staff allocation into Fusion Teams is defined in Fusion Team Assignments.
The Go-To-Market Data Fusion Team focuses on the activities required to bring GitLab's products and services to the marketplace and measuring the business performance of these activities. The team meets on a weekly basis and covers topics defined in the GTM Data Weekly Agenda, while the team's priorities are defined in the GTM Data Priorities sheet.
The Research & Development Data Fusion Team focuses on understanding how GitLab's products are used by customers, towards the goal of developing complete customer intelligence and product intelligence capabilities to make better decisions. The team meets on a weekly basis and covers topics defined in the R&D Data Weekly Agenda, while the team's priorities are defined in the R&D Data Priorities sheet.
The General & Administrative Data Fusion Team focuses on organization performance of GitLab teams, including People metrics. The team currently meets with Engineering Analytics on a bi-weekly basis as part of the Engineering & Data Team Weekly Sync and with the People Analytics Team in the People Analytics <> Data Team Sync.
The Analytics Engineering team also drives the Enterprise Data Program and supports the wider data community. The team focuses on inventorying, integrating, maintaining, and governing data at an Enterprise level. This includes collaborating with the business units and data teams to establish and facilitate commonly accepted guidelines around Enterprise data, building enterprise-wide data models, and supporting Self-Service BI and analytical capabilities by providing Data Enablement and required training to users on Enterprise Data Models.
The Data Platform Team is a critical team within the larger Data Team and focuses on development and operations of data infrastructure. The Data Platform Team is both a development team and an operations/site reliability team. The team supports all Data Fusion Teams with available, reliable, and scalable data compute, processing, and storage. Platform components include the Data Warehouse, New Data Sources, Data Pumps, Data Security, and related new data technology. The Data Platform team also drives the Data Management processes. The Data Platform Team is composed of Data Engineers.
The Data Science Team facilitates making better decisions faster by delivering descriptive, predictive, and prescriptive solutions that promote and improve GitLab's KPIs. The team also acts as a Center of Excellence for predictive analytics and supports other teams in their data science endeavours by developing tooling, processes, and best practices for data science and machine learning. A list of current projects can be found on the Data Science handbook page.
The job families are designed to support all of the routine activities expected of a Data Team. In FY22 we are introducing two new job families, Data Scientist and Analytics Engineer.
We measure the impact of Data in the following ways:
DMAU measures the direct usage of the Data Platform by GitLab Team Members based on usage of the primary analysis tools we provide: Snowflake, Tableau, and Sisense. Over time we will include additional tools such as Jupyter and Data Studio, as well as usage of data pumped into EApps such as Marketo (PQLs), Gainsight (Usage Data), and Salesforce (Propensity Scores). The DMAU worksheet stores the code and historical stats, and a visualization of these numbers can be found in the Data Monthly Active Users report.
- Data Monthly Active Users (DMAU) = Unique Sisense Users + Unique Snowflake Users + Unique Tableau Users in a given month
- Quarterly Data Monthly Active Users (Q-DMAU) = Unique Sisense Users + Unique Snowflake Users + Unique Tableau Users across all months in a quarter OR sum(months in quarter)/
- Note: Users of Sisense, Tableau and Snowflake might be double counted if they access multiple systems. We do not count distinct users across the tools.
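As a rough illustration of the definitions above, the following sketch counts unique users per tool and then adds the per-tool counts, so a person who uses two tools is counted twice, as the note describes. The `(tool, user, month)` event shape, the function names, and the sample data are assumptions made for this example; they are not the code stored in the DMAU worksheet. Only the first Q-DMAU variant (unique users per tool across all months in a quarter) is shown.

```python
from collections import defaultdict

# Hypothetical usage events: (tool, user, month). The real source tables differ.
EVENTS = [
    ("sisense", "ana", "2021-02"),
    ("sisense", "ana", "2021-02"),    # repeat visit, still one unique user
    ("snowflake", "ana", "2021-02"),  # same person in a second tool
    ("tableau", "ben", "2021-02"),
    ("snowflake", "cy", "2021-03"),
    ("snowflake", "ana", "2021-03"),
]


def unique_users_per_tool(events, months):
    """Collect the set of distinct users of each tool within the given months."""
    users_by_tool = defaultdict(set)
    for tool, user, month in events:
        if month in months:
            users_by_tool[tool].add(user)
    return users_by_tool


def dmau(events, month):
    """Sum of per-tool unique users in one month (not deduplicated across tools)."""
    return sum(len(users) for users in unique_users_per_tool(events, {month}).values())


def q_dmau(events, quarter_months):
    """Sum of per-tool unique users across all months in a quarter."""
    per_tool = unique_users_per_tool(events, set(quarter_months))
    return sum(len(users) for users in per_tool.values())


print(dmau(EVENTS, "2021-02"))
print(q_dmau(EVENTS, ["2021-02", "2021-03", "2021-04"]))
```

In the sample data, "ana" contributes once for Sisense and once for Snowflake in February, which is the double counting the note refers to.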
First we have the evaluation criteria known as Dollar Value of our Results as calculated by the Data Value Calculator. We can use the Data Team Value Calculator to calculate the dollar value of the initiatives we contribute to and the issues we complete. Additionally we want to shift to a more aspirational measurement which is to measure the ARR impact or efficiency gain from each of our data products.
In order to measure this we need to have
We measure this based on the time team members spend on Level 1 and Level 2 work.
We periodically generate Data Team CSAT to seek feedback from internal customers.
This performance indicator tracks the financial position of the actual cost vs the planned costs for the data infrastructure (warehouse, ETL pipelines, etc.).
Aligns with the following core business objectives:
This performance indicator tracks the mean time since last system failure based on set cadence.
Aligns with the following core business objectives:
This performance indicator tracks the mean time to get the system back up and running based on set cadence.
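One possible way to compute the two preceding indicators from an incident log is sketched below. It assumes each incident is recorded as a `(start, end)` pair of timestamps, reads "mean time since last system failure" as the average uptime between the end of one incident and the start of the next, and reads "mean time to get the system back up and running" as the average incident duration. The incident format, function names, and sample data are hypothetical, and the handbook's actual calculation and cadence may differ.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (failure start, service restored).
INCIDENTS = [
    (datetime(2021, 3, 1, 8, 0), datetime(2021, 3, 1, 9, 0)),
    (datetime(2021, 3, 3, 9, 0), datetime(2021, 3, 3, 12, 0)),
    (datetime(2021, 3, 6, 12, 0), datetime(2021, 3, 6, 14, 0)),
]


def mean_time_to_recovery(incidents):
    """Average time from failure start until the system is running again."""
    if not incidents:
        raise ValueError("need at least one incident")
    durations = [end - start for start, end in incidents]
    return sum(durations, timedelta()) / len(durations)


def mean_time_between_failures(incidents):
    """Average uptime between the end of one incident and the start of the next."""
    if len(incidents) < 2:
        raise ValueError("need at least two incidents")
    ordered = sorted(incidents)
    gaps = [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return sum(gaps, timedelta()) / len(gaps)


print(mean_time_to_recovery(INCIDENTS))
print(mean_time_between_failures(INCIDENTS))
```

Restricting the incident list to a reporting window (for example, one month) before calling these functions would give the value for that cadence.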
Aligns with the following core business objectives:
Data Team
Percentage of all company data in data warehouse
Aligns with the following core business objectives:
This performance indicator tracks all metrics related to achieving the service-level objective (SLO) per data source.
You can also tag subsets of the Data Team using:
Except for rare cases, conversations with folks from other teams should take place in #data, and possibly the fusion team channels when appropriate. Posts to other channels that go against this guidance should be responded to with a redirection to the #data channel, and a link to this handbook section to make it clear what the different channels are for.
The Data Team primarily uses these groups and projects on GitLab:
You can tag the Data Team in GitLab using: