The following page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features or functionality remain at the sole discretion of GitLab Inc.
The Data Science Section focuses on transforming data into useful insights and actions.
The Data Science section comprises two stages: ModelOps and Anti-Abuse.
GitLab's Data Science section was introduced in late 2021 and laid the foundation of data science at GitLab throughout 2022. In 2023 we will continue to invest in Data Science use cases across the platform and build features that enable our customers to build ML/AI into their products more effectively and efficiently.
To learn more about GitLab’s investment areas, please visit the Product Investments section of the GitLab Handbook.
Over the last decade GitLab has helped companies navigate the digital transformation that turns them into software companies.
Digital transformation is about creating new opportunities for your business to drive innovation and efficiency, improve how your teams work, and leapfrog competitors, all with the goal of delivering new and improved customer experiences.
Every industry is undergoing a transformation. Customers are expecting more and retaining them is increasingly difficult as your competitor is now just a click away. Irrespective of your industry, technology now needs to be front and center of your offering as competition is coming from unexpected sources.
Companies are increasingly leveraging their data to power next-generation software driven by machine learning (ML) and artificial intelligence (AI). We believe that the next stage of digital transformation is software companies adopting ML/AI to power next-generation, data-rich applications. This brings new challenges in managing the big data these algorithms need, along with unique challenges in running ML/AI at scale, including data cleaning, job orchestration, model training, testing, and deployment, and observability.
Leveraging over a decade of experience with DevOps best practices, we aim to support businesses through this data science transformation. This section focuses on the new challenges of building these data-rich, highly interactive ML/AI applications. Our ModelOps stage will extend the GitLab platform, enriching existing features with data science capabilities while also enabling customers to build ML/AI workloads with GitLab.
Most businesses today generate a lot of data: data about their customers, their products, metadata, and more. Businesses are drowning in data and struggle to extract value from it to power next-generation applications and experiences for their customers.
As businesses advance their digital transformations, they create more applications that generate more and more data. Just managing all that data is a challenge: storing, aggregating, cleaning, organizing, and even deleting it. That is all just management of the data, not actually doing anything with it. Many organizations also keep data in many different locations: within their applications themselves, in bespoke data stores, or in the cloud. This leads organizations to build data warehouses where they can manage and unify disparate data sources. This is where Extract, Load, Transform (ELT) and its older counterpart Extract, Transform, Load (ETL) come from. ELT platforms have become big businesses, with organizations spending large sums just to store and organize their data. Because data comes at a cost, organizations need to extract value from it, which leads to the next challenge.
With businesses generating endless data streams and spending money to store, manage, and organize them, it is easy to understand why organizations want to extract value from that data. Most businesses today have internal business intelligence groups or data analysts who comb through this data looking for insights. These insights might answer business questions about which product features to build next, or power next-generation customer experiences. It all comes down to extracting value from data, and this is usually how data science gets started within an organization.
Data analysts and data scientists work with the vast amounts of data held in their organizations' data warehouses, cleaning, organizing, and transforming it into more useful forms. As organizations become more data-driven, they tend to integrate more data into their customer-facing applications. This introduces new software development lifecycle (SDLC) challenges. Data-rich applications usually need connections to data sources, and the data flowing through these applications has to be managed, which makes software development more complex. The most modern organizations are now embedding real-time data science into their applications, further complicating their software stacks. Live data flows through applications and through ML and AI models that make real-time decisions based on that data, leading to even more complex applications. All of this introduces new SDLC challenges that must be managed by the engineering teams who build, deploy, and run these customer-facing applications.
Looking back over the past decade of software engineering, we've seen companies go through digital transformations to become software companies. Today most companies are software companies. Part of GitLab's historical success has been helping companies streamline complex software development lifecycles into our single-application DevOps platform, reducing complexity and speeding up time to value. We're now seeing these software companies embrace data science and face many of the same challenges as before:
Our Data Science section aims to help organizations solve these new challenges as they add ML/AI to their applications. But it's not just our customers' software that's going through this transformation. GitLab itself is transforming to become more intelligent. With our ModelOps stage we're integrating machine learning and artificial intelligence into the GitLab product itself, allowing GitLab to offer suggestions and recommendations. We're also leveraging the data our platform generates to provide new and advanced features to our customers. Our Anti-Abuse stage uses GitLab data within the platform to make real-time decisions that keep the platform running smoothly. In the future we'll also use this data to make the platform more responsive to real-time insider threats.
GitLab builds GitLab with GitLab; we dogfood all our own features. As we enrich our platform with ML/AI, we experience the same challenges our customers face when building ML/AI into their applications. These insights will inform the features we build into the GitLab DevSecOps platform to support ML/AI workloads, making it easier for both our customers and GitLab itself to integrate ML/AI into applications built with GitLab.
The work of the Data Science section cuts across the entire GitLab DevOps platform. It ranges from our reliance on features like source code management (SCM) and CI/CD to support machine learning (ML) and artificial intelligence (AI) workloads, to how we enhance platform features with ML/AI to make them more intelligent and automated. The section is unique in that the value it creates slices horizontally across all other GitLab sections and stages, providing a holistic approach to data science use cases across the software development lifecycle.
Both ModelOps and Anti-Abuse are components of GitLab's Data Science product strategy. ModelOps focuses on enabling data scientists to use GitLab effectively. Anti-Abuse will use data science techniques to build a user activity data system and automation to protect GitLab from abuse and misuse. Initially, Anti-Abuse's work is focused on protecting GitLab's stability against abuse, but it will also build new revenue-generating products for Insider Threat detection and User and Entity Behavior Analytics (UEBA) tooling, both of which will rely on data science techniques.
This section aligns cross-functional teams and organizational structures across Product, Engineering, UX, and Technical Writing. This streamlines the management chain for all individuals across functions and aligns the section's distinct product development focus areas and challenges. The ModelOps and Anti-Abuse stages share some properties that other GitLab sections and stages do not:
We've established a Data Science internal handbook PI page (internal link) which will be updated monthly as part of PI review meetings. We're still working to actively orchestrate all our performance indicator metrics.
With complex toolchains and new vendors emerging every day, the data science landscape is largely glue and duct tape holding many systems together. We want to bring these toolchains into the GitLab platform to reduce complexity, remove maintenance burden, and enable faster model development and exploration.
As examples, GitLab will provide:
Many data science teams struggle with a lack of repeatability, cobbling together environments on local machines. These environments rarely have source code management or CI. We want to bring the DevOps best practices of SCM and CI/CD to data science and make it easy for data scientists to start with repeatable, stable environments.
As examples, GitLab will provide:
Model handoffs are only one part of the collaboration that data science work requires. We want to create seamless handoffs across the whole software development lifecycle of data science workloads, from connecting data to pipelines, to managing model code, to deploying to production. GitLab is already critical for modern software developers managing production applications, and we'll bring the best of our existing DevOps platform to data scientists.
As examples, GitLab will provide:
Long gone are the days of static data. Today data is in motion: it is constantly being created, moved, transformed, and drifting. It lives in the cloud, sometimes across many clouds. Modern data science toolchains need to support cloud-native data in motion.
As examples, GitLab will provide:
We expect the Data Science section will provide multiple monetization strategies across all GitLab plans with features targeted for data science use cases and Insider Threat detection capabilities. These paid features will follow GitLab's pricing themes to determine how to package various features we develop.
Data Science aims to make GitLab smarter and more automated using ML. Features we develop will help organizations automate their portfolio management, improve their security posture, and detect Insider Threats.
As a general rule of thumb, features will fall in the Ultimate/Gold tier when they meet one or more of the following criteria:
Some examples include:
Features targeted at Premium will focus on enabling data science use cases across existing GitLab features like source code management (SCM) and CI/CD, and on helping protect valuable intellectual property such as source code hosted within GitLab. We want GitLab to support data science workloads natively, and much of the value of managing workloads is found in the Premium tier, which ModelOps will seek to enhance.
Although paid features are the primary focus, there are several reasons why features for unpaid tiers might be prioritized above paid features:
As a general rule of thumb, features will fall in the Core/Free tier when they meet one or more of the following criteria:
Some examples include:
GitLab identifies who our DevSecOps application is built for using the following categorization. We list, in priority order, whom we will support and when:
To capitalize on the potential opportunities, the ModelOps Stage has features that make it useful to the following personas today:
As we execute our 3-year strategy, our medium-term (1-2 year) goal is to provide a single DevSecOps application that enables collaboration between developers, data teams, data scientists, and engineers across organizations.
Data Science workloads can be complicated and can leverage specialized hardware and development environments not common to traditional software development teams. The ModelOps stage is focused on the intersection of data scientists exploring models and feature development and the developers who must then deploy those data science features into production.
Data scientists have unique roles within organizations. They are more scientists than developers, following hypotheses and data to explore models and develop data science-powered features.
We aim to serve data scientists as they balance art and science within software engineering teams. Data scientists wear many hats to get from a hypothesis to a data science feature that generates value. GitLab is not yet a tool of choice for data scientists, and we aim to change that by making it easy to configure, build, and execute data science feature development within GitLab.
The larger the organization, the harder it is for security teams to stay on top of everything happening in complex, ever-changing environments. As an organization's source code management and DevSecOps platform, GitLab holds a lot of sensitive, high-value data. We want to help security teams secure that data. This is a job to which automated data science features can be well suited, including monitoring high-value assets around the clock.
Last Reviewed: 2022-12-12
Last Updated: 2022-12-12