The Data Team Handbook contains a large amount of information! To help you navigate the handbook we've organized it into the following major sections:
The collective set of people, projects, and initiatives focused on advancing the state of data at GitLab is called the GitLab Data Program. GitLab has two primary, distinct groups within the Data Program that use data to drive insights and business decisions. These groups are complementary to one another and are focused on specific areas to drive a deeper understanding of trends in the business. The two groups are the (central) Enterprise Data Team and, separately, the Function Analytics Teams located in Sales, Marketing, Product, Engineering, or Finance. Watch the Data Recruiting Video to hear from some of the teams involved and what they are working on.
The Data Team reports into Business Technology and is the Center of Excellence for enterprise insights & analytics (not operational), data science, data platform & infrastructure, BI technologies, master data, data governance and data quality. The Data Team is also responsible for the enterprise data strategy, building enterprise-wide data models, providing Self-Service Data capabilities, maintaining the data platform, developing Data Pumps, and monitoring and measuring Data Quality. The Data Team is responsible for data that is defined and accessed on a regular basis by GitLab team members from the Snowflake Enterprise Data Warehouse. The Data Team builds data infrastructure to power approximately 80% of the data that is accessed on a regular basis. The Data Team also provides a Data Science center of excellence to launch new advanced analytics initiatives and provide guidance to other GitLab team members.
Function Analytics Teams reside and report into their respective divisions and departments. These teams perform specific analysis for business activities and workflows that take place within the function. These teams perform ad-hoc analysis and develop dashboards based on the urgency and importance of the analysis required, following the Data Development approach. The most important and repeatable analysis will be powered by the centralized Trusted Data Model managed by the central Data Team. Function Analytics Teams also build function-specific/ad-hoc data models and business insights models to solve for urgent and operational needs, not requiring trusted data features. Function Analytics Teams work closely with the Data Team in a variety of ways: expand GitLab's overall analytics capabilities, extend the Data Catalog, provide requirements for new Trusted Data models and dashboards, validate metrics, and help drive prioritization of work asked of the Data Team. When data gaps are found in our business processes and source systems, the team members will provide requirements to product management, sales ops, marketing ops, and others to ensure the source systems capture correct data.
The GitLab Data Program includes teams focused in the following areas:
A list of the Data Program Teams' meeting series, subject DRIs, Slack channels, and initiatives can be found on the Data Program Collaboration Hub page.
On a normal operational basis, the Data Team and Function Analytics Teams work in a "Hub & Spoke" model, with the Data Team serving as the "Hub" and Center of Excellence for analytics, analytics technology, operations, and infrastructure, while the "Spokes" represent each Division's or Department's Function Analysts. Function Analysts develop deep subject matter expertise in their specific area and leverage the Data Team when needed. From time to time, the Data Team provides limited development support for GitLab Departments that do not yet have dedicated Function Analysts, or that have dedicated Function Analysts but need additional support. The teams collaborate through Slack Data Channels, the GitLab Data Project, and ad-hoc meetings.
The Data Platform & Architecture Team is part of the Enterprise Data Team and focuses on building and maintaining secure, efficient, and reliable data systems and data infrastructure. The Data Platform & Architecture Team is both a development team and an operations/site reliability team. The team supports all Data Fusion Teams with available, reliable, and scalable data compute, processing, and storage. Platform components include the Data Warehouse, New Data Sources, Data Pumps, Data Security, and related new data technology. The Data Platform team also drives the Data Management processes. The Data Platform Team is composed of Data Engineers.
The Analytics Engineering Team transforms raw data into clean, structured, and usable formats for data-driven decision-making. The Analytics Engineering Team also drives the Enterprise Data Program and supports the wider data community. The team focuses on inventorying, integrating, maintaining, and governing the data at an Enterprise level. This includes collaborating with the business units and data teams to establish and facilitate commonly accepted guidelines around Enterprise data, building enterprise-wide data models, and supporting Self-Service BI and Analytical capabilities by providing Data Enablement and required training to users on Enterprise Data Models.
The Enterprise Insights & Data Science Team utilizes analytics and Machine Learning (ML) for insights into customer behavior and company performance. The team focuses on delivering a complete view of the customer (Customer 360), predicting which customers are likely to buy, expand, or churn, developing models to predict the long-term value of customers, creating detailed customer profiles, and delivering insights on company performance. The team acts as a Center of Excellence for predictive analytics and supports other teams in their data science endeavours by developing tooling, processes, and best practices for data science and machine learning. A list of current projects can be found on the Data Science handbook page.
The job families are designed to support all of the routine activities expected of a Data Team. In FY22 we are introducing two new job families, Data Scientist and Analytics Engineer.
Our impact will be measured against 4 dimensions (these metrics will adjust as our data maturity increases and our focus areas change):
Data Monthly Active Users (DMAU): DMAU measures the direct usage of the Data Platform by GitLab Team Members based on usage of the primary analysis tools we provide: Snowflake, Tableau, and Sisense. Over time we will include additional tools such as Jupyter and Data Studio, as well as usage of data pumped into EApps such as Marketo (PQLs), Gainsight (Usage Data), and Salesforce (Propensity Scores). The DMAU worksheet stores the code and historical stats, and a visualization of these numbers can be found in the Data Monthly Active Users report.
Data Maturity Score: measured annually, evaluates our current data maturity against 8 data capabilities:

1. Strategy & Approach
2. Culture & Leadership
3. Metrics & KPIs
4. Organization & Skills
5. Architecture & Integration
6. Governance & Quality
7. Deployment & Usage
8. Technology & Operations

Dollar Value of our Results: calculated by the Data Value Calculator. We can use the Data Team Value Calculator to calculate the dollar value of the initiatives we contribute to and the issues we complete. Additionally, we want to shift to a more aspirational measurement: the ARR impact or efficiency gain from each of our data products.
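To make the DMAU definition above more concrete, here is a minimal sketch of how a monthly active user count could be derived from tool usage events. It is only an illustration: the event shape, the sample users, and the `monthly_active_users` helper are hypothetical, and the choice to count a user once per month even when they use several tools is an assumption. The actual logic lives in the DMAU worksheet.

```python
from collections import defaultdict
from datetime import date

# Hypothetical usage events: (user, tool, date of activity).
# The names and the event shape are illustrative assumptions, not the real schema.
EVENTS = [
    ("alice", "snowflake", date(2021, 5, 3)),
    ("alice", "tableau", date(2021, 5, 17)),
    ("bob", "sisense", date(2021, 5, 9)),
    ("bob", "sisense", date(2021, 6, 1)),
    ("carol", "snowflake", date(2021, 6, 12)),
]


def monthly_active_users(events):
    """Count distinct users per calendar month across all tools.

    Assumption: a user active in several tools in the same month
    is counted once for that month.
    """
    users_by_month = defaultdict(set)
    for user, _tool, day in events:
        users_by_month[(day.year, day.month)].add(user)
    return {month: len(users) for month, users in sorted(users_by_month.items())}


dmau = monthly_active_users(EVENTS)
for (year, month), count in dmau.items():
    print(f"{year}-{month:02d}: {count}")
```

Using a set per month keeps the deduplication explicit, so adding another tool later (for example Jupyter) only means adding more events rather than changing the counting logic.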
You can also tag subsets of the Data Team using:
Except for rare cases, conversations with folks from other teams should take place in #data, and possibly the fusion team channels when appropriate. Posts to other channels that go against this guidance should be responded to with a redirection to the #data channel, and a link to this handbook section to make it clear what the different channels are for.
The Data Team primarily uses these groups and projects on GitLab:
You can tag the Data Team in GitLab using:
| TECH GUIDES | INFRASTRUCTURE | DATA TEAM |
| --- | --- | --- |
| SQL Style Guide | High Level Diagram | How We Work |
| dbt Guide | System Data Flows | Team Organization |
| Python Guide | Data Sources | Calendar |
| Airflow & Kubernetes | Snowplow | Triage |
| Data CI Jobs | DataSiren | Planning Drumbeat |
| RStudio Guide | Trusted Data | Data Science Team |
| Jupyter Guide | Data Management | |
| Experimentation Best Practices | | |
| Sisense Style Guide | | |
| Tableau Style Guide | | |