The following page may contain information related to upcoming products, features, and functionality. It is provided for informational purposes only; please do not rely on it for purchasing or planning purposes. As with all projects, the items mentioned on this page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
In the dynamic landscape of AI research and development, ensuring the reliability and effectiveness of the AI models upon which we build GitLab Duo AI features is paramount. To achieve this, the AI Model Validation Team relies on foundational models as benchmarks for the Centralized Evaluation Framework.
Foundational models provide a reference point for validating the correctness and reliability of Duo's AI features: by benchmarking Duo features against them, we can assess the quality of our underlying models and AI-powered features relative to the industry landscape.
GitLab continually identifies new models for inclusion in the Centralized Evaluation Framework, allowing us to iterate on our features and ensure that the underlying models remain the best fit for each feature. By comparing the performance metrics of a new model against those of foundational benchmarks and our Duo features, we can assess whether the new model achieves improvements in accuracy, efficiency, or other relevant criteria. Tracking the performance of foundational models over time also allows GitLab to monitor progress and leverage advancements in AI technology. By regularly updating benchmarks and evaluating models against historical baselines, we can rapidly identify and assess new models for use in Duo features.
Use of foundational models also helps standardize evaluation methodologies and benchmark datasets across the industry. Using established benchmarks enables consistent and reproducible evaluation of AI models, facilitating fair comparisons.
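As a rough illustration of this comparison step, the sketch below computes per-metric deltas between a candidate model and a foundational baseline evaluated on the same benchmark dataset. The function name, metric names, and scores are all hypothetical, not GitLab's actual evaluation code or data.

```python
# Hypothetical sketch: compare a candidate model's benchmark scores
# against a foundational baseline. All names and numbers here are
# illustrative assumptions, not real evaluation results.

def compare_to_baseline(candidate: dict, baseline: dict) -> dict:
    """Return per-metric deltas (candidate - baseline) for metrics
    present in both score sets. For higher-is-better metrics, a
    positive delta indicates an improvement over the baseline."""
    shared = sorted(candidate.keys() & baseline.keys())
    return {m: round(candidate[m] - baseline[m], 4) for m in shared}

# Illustrative scores on a shared benchmark dataset.
baseline_scores = {"similarity": 0.78, "exact_match": 0.41}
candidate_scores = {"similarity": 0.82, "exact_match": 0.39}

deltas = compare_to_baseline(candidate_scores, baseline_scores)
improved = [metric for metric, delta in deltas.items() if delta > 0]
```

In practice an evaluation framework would weight metrics, account for statistical noise across prompt samples, and track these deltas over time against historical baselines, but the core decision is this kind of metric-by-metric comparison.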
| Provider | Foundational Model | Feature Use |
|----------|--------------------|-------------|
| Anthropic | Claude 2 | Code Suggestion |
| Anthropic | Claude 3 Opus | Duo Chat |
| Anthropic | Claude 3.5 Sonnet | Duo Chat |
| Anthropic | Claude 3 Haiku | Duo Chat |
| GitLab | Duo Chat | Duo Chat |
| Meta | Code Llama 13B | Code Suggestion |
| Mistral | Mixtral 8x7B | Code Suggestion |
| OpenAI | GPT 3.5 | Code Suggestion |
| OpenAI | GPT 4 | Code Suggestion |
| OpenAI | GPT 4 Turbo | Code Suggestion |
| Vertex AI | Code Gecko | Code Suggestion |
| Vertex AI | code-bison | Code Suggestion |
| Vertex AI | text-bison | Code Suggestion |
| Vertex AI | Gemini Pro 1.5 | Duo Chat |
| Vertex AI | Code Gemma | Code Suggestion |
Last Reviewed: 2024-10-05
Last Updated: 2024-10-05