The following page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. As with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features or functionality remain at the sole discretion of GitLab Inc.
Enable and empower data science workloads on GitLab
GitLab ModelOps aims to bring data science into GitLab in two ways: by using it within existing features to make them smarter and more automated, and by empowering GitLab customers to build and integrate data science workloads within GitLab.
The ModelOps Stage is currently outside of the GitLab DevOps lifecycle. We believe that data science features can span across all DevOps stages, making existing features more intelligent and automated.
Watch VP of Product David DeSanto, Engineering Manager Monmayuri Ray, and Principal Product Manager Taylor McCaslin give an overview of the GitLab ModelOps stage. They discuss the three pillars of ModelOps, including how to integrate data science into DevOps, along with a brief history of how we got here and where we are going. They also cover GitLab’s recent acquisition of UnReview and how GitLab plans to leverage ML/AI within our platform to improve the user experience and to empower users to include ML/AI in their applications.
Over the past decade of software engineering, we've seen companies undergo digital transformations to become software companies. Today most companies are software companies. Part of GitLab's historical success has been helping companies streamline complex software development lifecycles into our single-application DevOps platform, reducing complexity and speeding up time to value. We're now seeing these software companies embrace data science and face many of the same challenges as before:
One of the primary goals of our ModelOps stage is to reduce the complexity of data science workloads and integrate them so they can be easily managed and developed within GitLab.
Data scientists typically do not have the experience of DevOps engineers (and vice versa). Their skills are not focused on building robust, production-ready systems. Much of data science work is experimentation, cobbling together whatever is needed to identify and produce value. Throughout this experimentation, data, packages, tools, and code accumulate on a data scientist's local machine. This creates a bespoke environment that is hard to reproduce, adds friction to handoffs, and diverges from production systems.
We want to help data scientists create repeatable environments with source code management and CI/CD at the heart of them. It should be easy for anyone on the team to explore the latest model experiment and iterate on it.
Complex toolchains and a lack of repeatable environments make handoffs difficult for data science teams. These teams may produce highly valuable models and insights for an organization, but deploying those models to production can take months. We want to help teams across the software development lifecycle (SDLC) collaborate better and hand off data, code, and models more easily, using the toolchain software engineering teams already use.
Together, these challenges lead data science teams to use specialized tools that don't integrate with each other or with the SDLC tools organizations already use. Teams end up working in silos, which creates handoff friction, finger-pointing, guesswork, and a lack of predictability. Applications don't leverage data well, models take months to reach production, and security is an afterthought. This creates risk for organizations, slows innovation, increases complexity, and lengthens time to value. All of this could be avoided with an integrated DevOps platform that natively supports data science workloads. That's exactly what we are building.
We are taking best practices from DevOps and applying them to data science workloads, from processing data workloads with DataOps to productionizing data science models. Teams streamline handoffs because they work in the same platform, built on source code management with CI/CD and integrated security testing. Organizations can reduce the risks associated with ML/AI, speed up innovation, reduce complexity, and shorten time to value.
There are two areas of relevance to GitLab ModelOps that we believe are critical to running end-to-end data science workloads on GitLab:
With our learnings about building and deploying data science workloads with DataOps and MLOps, we will be putting that experience into practice with the stage's other groups:
GitLab ModelOps currently consists of four groups, with a variety of open roles we are actively recruiting for and more roles opening throughout 2022:
To learn more about GitLab’s investment areas, please visit the Product Investments section of the GitLab Handbook.
Today, the ModelOps Stage is actively staffing up. We've recently hired multiple engineering roles and are actively hiring for many more throughout 2022 (see above).
Internal team members can watch or read the latest updates from our most recent ModelOps Group Conversation (slides, video).
We've established a ModelOps internal handbook PI page (internal link) which will be updated monthly as part of PI review meetings. We're still working to actively orchestrate all our performance indicator metrics.
With complex toolchains and new vendors emerging every day, the data science landscape is largely glue and duct tape holding many systems together. We want to consolidate these systems into the GitLab platform to reduce complexity, remove maintenance burden, and enable faster model development and exploration.
As examples, GitLab will provide:
Many data science teams struggle with a lack of repeatability, cobbling together environments on local machines. These environments rarely have source code management or CI. We want to bring DevOps best practices, including SCM and CI/CD, to data science and make it easy for data scientists to start with repeatable, stable environments.
As examples, GitLab will provide:
Model handoffs are only one part of the collaboration that data science workloads require. We want to create seamless handoffs across the entire software development lifecycle of data science workloads, from connecting data to pipelines and managing model code to deploying to production. GitLab is already critical for modern software developers managing production applications, and we'll bring the best of our existing DevOps platform to data scientists.
As examples, GitLab will provide:
Long gone are the days of stale data. Today data is in motion: it's constantly being created, moved, transformed, and drifting. It lives in the cloud, sometimes in many clouds. Modern data science toolchains need to be cloud-native and support data in motion.
As examples, GitLab will provide:
The ModelOps stage is actively staffing the team and implementing quality-of-life improvements to the GitLab experience for data scientists. The following are some highlights from recent GitLab releases:
The ModelOps team is actively working to integrate machine learning into GitLab and the following outlines where we are currently investing our efforts:
The following will NOT be a focus over the next 12 months:
ModelOps is focused on empowering data science workloads across GitLab and enriching existing GitLab features with ML.
We expect ModelOps will provide multiple monetization strategies across all GitLab plans with features targeted for data science use cases. ModelOps paid features will follow GitLab's pricing themes to determine how to package various features we develop.
ModelOps aims to make GitLab smarter and more automated using ML. Features we develop will help organizations automate their portfolio management and improve their security posture.
As a general rule of thumb, features will fall in the Ultimate/Gold tier when they meet one or more of the following criteria:
Some examples include:
Features targeted at the Premium tier will focus on enabling data science use cases across existing GitLab features such as source code management (SCM) and CI/CD. We want GitLab to natively support data science workloads, and much of the value of managing those workloads is found in the Premium tier, which ModelOps will seek to enhance.
Although paid features are the primary focus, there are several reasons why features for unpaid tiers might be prioritized above paid features:
As a general rule of thumb, features will fall in the Core/Free tier when they meet one or more of the following criteria:
Some examples include:
TBD
GitLab identifies who our DevSecOps application is built for using the following categorization. We list, in priority order, whom we plan to support and when:
To capitalize on the potential opportunities, the ModelOps Stage has features that make it useful to the following personas today:
As we execute our three-year strategy, our medium-term (1-2 year) goal is to provide a single DevSecOps application that enables collaboration between developers, data teams, data scientists, and engineers across organizations.
Data science workloads can be complicated and often rely on specialized hardware and development environments that are uncommon on traditional software development teams. The ModelOps stage focuses on the intersection between data scientists, who explore models and develop features, and the developers who must then deploy those data science features into production.
Personas
Data scientists have unique roles within organizations. They are more scientists than developers, following hypotheses and data to explore models and develop data science-powered features.
We aim to serve data scientists as they balance art and science within software engineering teams. Data scientists wear a lot of hats to get from a hypothesis to a data science feature that generates value. GitLab is not currently a tool of choice for data scientists, and we aim to change that by making it easy to configure, build, and execute data science feature development within GitLab.
Personas
The larger the organization, the harder it is for security teams to stay on top of everything happening in complex, ever-changing environments. As an organization's source code management and DevSecOps platform, GitLab holds a lot of sensitive, high-value data, and we want to help security teams secure it. Automated data science features can be well suited to this job, for example by monitoring high-value assets around the clock.
Personas
Last Reviewed: 2022-03-15
Last Updated: 2022-03-15