Product Section Direction - Data Science

The following page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features or functionality remain at the sole discretion of GitLab Inc.

Section Overview

The Data Science section comprises two stages:

AI-powered - boosts efficiency and reduces cycle times with AI-powered workflows.
ModelOps - enables GitLab to be used for machine learning and artificial intelligence use cases.

Team and Investment

GitLab's Data Science section was introduced in late 2021 and has grown and laid the foundation of data science at GitLab throughout 2022 and 2023 along with the introduction of GitLab Duo, our suite of AI-Powered capabilities. In 2024 we continued to invest heavily in Data Science use cases across the platform and build features to enable our customers to more effectively and efficiently build software using GitLab Duo. We're also developing new MLOps capabilities that help customers build ML/AI into their software with our ModelOps stage.

To learn more about GitLab’s investment areas, please visit the Product Investments section of the GitLab Handbook.

Aligning Use Cases

This section aligns cross-functional teams and organizational structures across Product, Engineering, UX, and Technical Writing. Doing so streamlines the management chain for individuals across functions and aligns the product development areas that share a distinct focus and set of challenges. The ModelOps and AI-powered stages share some properties that other GitLab sections and stages do not:

Unlike our vertical groups and stages, both ModelOps and AI-powered stages horizontally cross all other stages and sections at GitLab. Both stages will interact with features across the platform and the data that underlies those features in order to provide their core value.
Both stages focus on consuming and leveraging product usage data to provide customer value. That data spans the entire platform and includes user activity and repository metadata. The AI Assisted group will use it to enrich GitLab features, making them more intelligent and automated. ModelOps will integrate AI/ML-specific features and workflows into the wider GitLab product, making it more efficient for customers to build AI/ML technologies into the products they develop with GitLab.
Automation and Action are key tenets for these product areas. Smart defaults will enable customers to discover new features, recommendations will surface features across the platform based on usage heuristics, and automations will reduce the overhead of managing and operating a GitLab instance.
Wide surface area. Both stages require 'T'-shaped knowledge that is broad across all GitLab features and deep in the specific knowledge areas of data science and anti-abuse.

Important PI milestones

We've established a Data Science internal handbook PI page (internal link) which will be updated monthly as part of PI review meetings. We're still working to actively orchestrate all our performance indicator metrics.

3 Year Section Themes

Reduce complexity

With complex toolchains and new vendors emerging every day, the data science landscape is a lot of glue and duct tape holding many systems together. We want to consolidate these toolchains into the GitLab platform to reduce complexity, remove maintenance burden, and enable faster model development and exploration.

As examples, GitLab will provide:

Native integrations to popular data science toolchains and open-source frameworks.
First-party solutions for ModelOps workloads.
Open APIs to allow flexibility through the platforms.

Repeatability for Collaboration

Many data science teams struggle with a lack of repeatability, cobbling together environments on local machines. These environments rarely have source code management or CI. We want to bring the DevOps best practices of SCM and CI/CD to data science teams and make it easy for them to start with repeatable and stable environments.

As examples, GitLab will provide:

Improved Python Notebook experience across GitLab
Support for more powerful compute within GitLab Runner
Simplified CI configuration for popular data science toolchains

Smooth Handoffs

Model handoffs are only one part of the collaboration needed to make data science handoffs smooth. We want to create seamless handoffs across the software development lifecycle of data science workloads, from connecting data to pipelines, to managing model code, to deploying to production. GitLab is already critical for modern software developers managing production applications. We'll bring the best of our existing DevOps platform to data scientists.

As examples, GitLab will provide:

Model registry for management and versioning of ML/AI models
Open APIs for smooth handoffs whether you are using GitLab tools or integrating your choice toolchains
Integrations across existing GitLab features to better support data science workloads and enrich our platform with AI/ML technologies
Intelligent recommendations and suggestions across existing GitLab features to increase velocity and efficiency

Data in Motion

Long gone are the days of static data. Data today is in motion. It's always being created, moved, transformed, and drifting. It's in the cloud and sometimes many clouds. Modern data science toolchains need to support cloud-native data in motion.

As examples, GitLab will provide:

Native data connectors to cloud-native data warehouses
Basic ELT tools to prepare data for data science workloads
Integrated data versioning and feature stores for tracking data definitions
Real-time platform usage insights

Pricing

We expect the Data Science section will provide multiple monetization strategies across all GitLab plans with features targeted for data science use cases and Insider Threat detection capabilities. These paid features will follow GitLab's pricing themes to determine how to package various features we develop.

Duo Addons

AI-Powered GitLab Duo features will be priced as add-ons because of the material ongoing costs of delivering this functionality through paid API calls to AI vendors (like Google Vertex AI and Anthropic), who provide powerful, state-of-the-art Large Language Models (LLMs). Usage of GitLab Duo capabilities generates millions of LLM API requests and processes billions of input and output tokens from these AI vendors. Learn more about GitLab Duo.

Ultimate

Data Science aims to make GitLab smarter and more automated using ML. Features we develop will help organizations automate their portfolio management, improve their security posture, and detect Insider Threats.

As a general rule of thumb, features will fall in the Ultimate tier when they meet one or more of the following criteria:

The feature is focused on enabling an organization or enterprise to operate at scale rather than an individual with a few smaller personal projects
The feature is natively developed or acquired by GitLab rather than being provided by an open-source project
The feature has a significant ongoing cost for GitLab to maintain and update

Some examples include:

Features provided by our acquisition of UnReview

Premium

Features targeted at Premium will focus on enabling data science use cases across existing GitLab features like source code management (SCM) and CI/CD, as well as helping protect precious intellectual property like source code hosted within GitLab. We want GitLab to natively support data science workloads, and much of the value of managing workloads is found in the Premium tier, which ModelOps will seek to enhance.

Free

Although paid features are the primary focus, there are several reasons why features for unpaid tiers might be prioritized above paid features:

Data Science workloads are increasing across all industries and verticals, though many organizations are still only dabbling in ML/AI. We want to ensure we support these organizations at every stage of the software development lifecycle which in turn will encourage them to find more value in our paid tiers as they become more advanced with their use cases.
Data Science is still very new. The wider open-source community has contributed greatly to many frameworks and tools that enable the foundations of AI/ML as we currently know them. To be good stewards in the open-source community, the basic integrations we support with popular open-source data science tools will be available in an unpaid tier by default, along with the "table stakes" set of functionality required to make each integration usable with GitLab.

As a general rule of thumb, features will fall in the Core/Free tier when they meet one or more of the following criteria:

The feature is primarily for an individual with a few small projects rather than meeting the needs of an organization or enterprise that is operating at scale
The feature is provided by an integration with an open-source project rather than being natively developed by GitLab
The ongoing cost for GitLab to maintain and update the feature is relatively minimal

Some examples include:

Basic support for Python notebooks in source code management (SCM)
Basic GPU support in self-hosted GitLab Runner

Target audience

GitLab identifies who our DevSecOps application is built for utilizing the following categorization. We list our view of who we will support when in priority order:

🟩 - Targeted with strong support
🟨 - Targeted but incomplete support
⬜️ - Not targeted but might find value

Today

To capitalize on the potential opportunities, the AI-Powered and ModelOps stages have features that make them useful to the following personas today:

🟩 - Developers
🟨 - Data scientists
🟨 - Data analysts
🟨 - Security Teams
🟨 - QA engineers / QA Teams

Medium Term (1-2 years)

As we execute our 3 year strategy, our medium-term (1-2 year) goal is to provide a single DevSecOps application that enables collaboration between developers, data teams, data scientists, and engineers across organizations.

🟩 - Developers
🟩 - Data scientists
🟩 - Data analysts
🟩 - Security Teams
🟨 - QA engineers / QA Teams

Developers

Data Science workloads can be complicated and can leverage specialized hardware and development environments not common to traditional software development teams. The ModelOps stage is focused on the intersection of data scientists exploring models and feature development and the developers who must then deploy those data science features into production.

Personas

Data Scientists

Data scientists have unique roles within organizations. They are more scientists than developers, following hypotheses and data to explore models and develop data science-powered features.

We aim to serve data scientists as they balance art and science within software engineering teams. Data scientists wear a lot of hats to get from a hypothesis to a data science feature that generates value. GitLab is not a tool of choice for data scientists, and we aim to change that by making it easy to configure, build, and execute data science feature development within GitLab.

Personas

Security Teams

The larger the organization, the harder it is for security teams to stay on top of everything happening in complex, ever-changing environments. As an organization's source code management and DevSecOps platform, GitLab holds a lot of sensitive, high-value data. We want to help security teams secure that data. This is a job to which automated data science features can be well suited, including monitoring high-value assets around the clock.

Personas

Last Reviewed: 2024-10-05
Last Updated: 2024-10-05
