The Data Team Handbook contains a large amount of information! To help you navigate the handbook we've organized it into the following major sections:
The collective set of people, projects, and initiatives focused on advancing the state of data at GitLab is called the GitLab Data Program. GitLab has two primary, distinct groups within the Data Program that use data to drive insights and business decisions. These groups are complementary to one another and are focused on specific areas to drive a deeper understanding of trends in the business. The two groups are the (central) Data Team and, separately, Function Analytics Teams located in Sales, Marketing, Product, Engineering, or Finance. Watch the Data Recruiting Video to hear from some of the teams involved and what they are working on.
The Data Team reports into Business Technology and is the Center of Excellence for analytics, analytics technology, operations, and infrastructure. The Data Team is also responsible for analytics strategy, building enterprise-wide data models, providing Self-Service Data capabilities, maintaining the data platform, developing Data Pumps, and monitoring and measuring Data Quality. The Data Team is responsible for data that is defined and accessed on a regular basis by GitLab team members from the Snowflake Enterprise Data Warehouse. The Data Team builds data infrastructure to power approximately 80% of the data that is accessed on a regular basis. The Data Team also provides a Data Science center of excellence to launch new advanced analytics initiatives and provide guidance to other GitLab team members.
Function Analytics Teams reside in and report into their respective divisions and departments. These teams perform analysis specific to the business activities and workflows that take place within their function. They perform ad-hoc analysis and develop dashboards based on the urgency and importance of the analysis required, following the Data Development approach. The most important and repeatable analysis will be powered by the centralized Trusted Data Model managed by the central Data Team. Function Analytics Teams also build function-specific or ad-hoc data models and business insights models to meet urgent and operational needs that do not require trusted data features. Function Analytics Teams work closely with the Data Team in a variety of ways: they help expand GitLab's overall analytics capabilities, extend the Data Catalog, provide requirements for new Trusted Data models and dashboards, validate metrics, and help drive prioritization of work asked of the Data Team. When data gaps are found in our business processes and source systems, team members provide requirements to product management, sales ops, marketing ops, and others to ensure the source systems capture correct data.
The teams which compose the GitLab Data Program include:
On a normal operational basis, the Data Team and Function Analytics Teams work in a "Hub & Spoke" model, with the Data Team serving as the "Hub" and Center of Excellence for analytics, analytics technology, operations, and infrastructure, while the "Spokes" represent each Division's or Department's Function Analysts. Function Analysts develop deep subject matter expertise in their specific area and leverage the Data Team when needed. From time to time, the Data Team provides limited development support for GitLab Departments that do not yet have dedicated Function Analysts, or for teams that have dedicated Function Analysts but need additional support. The teams collaborate through Slack Data Channels, the GitLab Data Project, and ad-hoc meetings.
To support stable business workflows and longer-term initiatives, the Data Team has deployed three sub-teams, or Data Fusion Teams. A single Data Fusion Team includes members from Division leadership, Function Analysts, and the Data Team. Data Fusion Teams collaborate on the development of data solutions, including Dashboards, Data Models, Data Pipelines, Data Pumps, and ad-hoc analysis. Data Fusion Teams are staffed with the appropriate set of team members required to develop a full data solution. The Data Fusion Teams meet weekly to define, review, and align on priorities, set quarterly objectives, and collaboratively develop successful data solutions.
The Go-To-Market Data Fusion Team focuses on the activities required to bring GitLab's products and services to the marketplace and on measuring the business performance of these activities. The team meets on a weekly basis and covers topics defined in the GTM Data Weekly Agenda, while the team's priorities are defined in the GTM Data Priorities sheet. Data Team staff allocation is defined in Team Organization.
The Research & Development Data Fusion Team focuses on understanding how GitLab's products are used by customers, towards the goal of developing complete customer intelligence and product intelligence capabilities to make better decisions. The team meets on a weekly basis and covers topics defined in the R&D Data Weekly Agenda, while the team's priorities are defined in the R&D Data Priorities sheet. Data Team staff allocation is defined in Team Organization.
The General & Administrative Data Fusion Team focuses on organization performance of GitLab teams, including People metrics. The G&A Fusion Team is currently under construction and is expected to launch in late FY22. The team currently meets on a weekly basis as part of the Engineering & Data Team Weekly Sync. Data Team staff allocation is defined in Team Organization.
The Data Platform & Engineering Team is a critical team within the larger Data Team and focuses on development and operations of data infrastructure. The Data Platform Team is both a development team and an operations/site reliability team. The team supports all Data Fusion Teams with available, reliable, and scalable data compute, processing, and storage. Platform components include the Data Warehouse, New Data Sources, Data Pumps, Data Security, and related new data technology. The Data Platform Team is composed of Data Engineers.
The Data Science Team is a new team within the larger Data Team and works on projects such as customer propensity to buy and churn predictions, customer clustering and segmentation, product sentiment analysis, and Machine Learning frameworks and processes. The team also focuses on creating standard tooling, processes, and practices for performing advanced analytics in support of all GitLab Teams. As the Data Science program matures, the Data Science Team plans to partner with GitLab’s new MLOps Group to provide feedback to help improve the product. The Data Science Team’s planned FY22 projects focus on Sales and Customer Adoption:
The job families are designed to support all of the routine activities expected of a Data Team. In FY22 we are introducing two new job families, Data Scientist and Analytics Engineer.
You can also tag subsets of the Data Team using:
Except for rare cases, conversations with folks from other teams should take place in #data, and possibly the fusion team channels when appropriate. Respond to posts in other channels that go against this guidance by redirecting them to the #data channel and linking to this handbook section, which makes clear what the different channels are for.
The Data Team primarily uses these groups and projects on GitLab:
You can tag the Data Team in GitLab using:
|TECH GUIDES|INFRASTRUCTURE|DATA TEAM|
|---|---|---|
|SQL Style Guide|High Level Diagram|How We Work|
|dbt Guide|System Data Flows|Team Organization|
|Python Guide|Data Sources|Calendar|
|Airflow & Kubernetes|Snowplow|Triage|
|Data CI Jobs|DataSiren|Planning Drumbeat|
|SiSense Style Guide|Trusted Data|Data Science Team|
|Experimentation Best Practices|||