The following page may contain information related to upcoming products, features, and functionality. It is provided for informational purposes only; please do not rely on it for purchasing or planning purposes. As with all projects, the items mentioned on this page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
In the dynamic landscape of AI research and development, ensuring the reliability and effectiveness of the AI models upon which we build GitLab Duo AI features is paramount. To achieve this, the AI Model Validation Team relies on foundational models as benchmarks for the Centralized Evaluation Framework.
Foundational models provide a reference point for validating the correctness and reliability of Duo's AI features: by benchmarking Duo features against them, we can assess the quality of our underlying models and AI-powered features relative to the industry landscape.
GitLab continually identifies new models for inclusion in the Centralized Evaluation Framework, allowing us to iterate on our features and ensure that the underlying models remain the best fit for each feature. By comparing the performance metrics of a new model against those of foundational benchmarks and our Duo features, we can assess whether the new model achieves improvements in accuracy, efficiency, or other relevant criteria. Tracking the performance of foundational models over time also allows GitLab to monitor progress and leverage advancements in AI technology. By regularly updating benchmarks and evaluating models against historical baselines, we can rapidly identify and assess new models for use in Duo features.
Use of foundational models also helps standardize evaluation methodologies and benchmark datasets across the industry. Using established benchmarks enables consistent and reproducible evaluation of AI models, facilitating fair comparisons.
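As a rough illustration of this comparison step, the sketch below computes per-metric deltas between a candidate model and a foundational baseline evaluated on the same benchmark dataset. The function name, metric names, and scores are all hypothetical, not GitLab's actual evaluation code or data.

```python
# Hypothetical sketch: compare a candidate model's benchmark scores
# against a foundational baseline. All names and numbers here are
# illustrative assumptions, not real evaluation results.

def compare_to_baseline(candidate: dict, baseline: dict) -> dict:
    """Return per-metric deltas (candidate - baseline) for metrics
    present in both score sets. For higher-is-better metrics, a
    positive delta indicates an improvement over the baseline."""
    shared = sorted(candidate.keys() & baseline.keys())
    return {m: round(candidate[m] - baseline[m], 4) for m in shared}

# Illustrative scores on a shared benchmark dataset.
baseline_scores = {"similarity": 0.78, "exact_match": 0.41}
candidate_scores = {"similarity": 0.82, "exact_match": 0.39}

deltas = compare_to_baseline(candidate_scores, baseline_scores)
improved = [metric for metric, delta in deltas.items() if delta > 0]
```

In practice an evaluation framework would weight metrics, account for statistical noise across prompt samples, and track these deltas over time against historical baselines, but the core decision is this kind of metric-by-metric comparison.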
| Provider | Foundational Model | Feature Use |
|----------|--------------------|-------------|
| Anthropic | Claude 2 | Code Suggestion |
| Anthropic | Claude 3 Opus | Duo Chat |
| Anthropic | Claude 3.5 Sonnet | Duo Chat |
| Anthropic | Claude 3 Haiku | Duo Chat |
| GitLab | Duo Chat | Duo Chat |
| Meta | Code Llama 13B | Code Suggestion |
| Mistral | Mixtral 8x7B | Code Suggestion |
| OpenAI | GPT 3.5 | Code Suggestion |
| OpenAI | GPT 4 | Code Suggestion |
| OpenAI | GPT 4 Turbo | Code Suggestion |
| Vertex AI | Code Gecko | Code Suggestion |
| Vertex AI | code-bison | Code Suggestion |
| Vertex AI | text-bison | Code Suggestion |
| Vertex AI | Gemini Pro 1.5 | Duo Chat |
| Vertex AI | Code Gemma | Code Suggestion |
Last Reviewed: 2024-10-05
Last Updated: 2024-10-05