The following page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features or functionality remain at the sole discretion of GitLab Inc.
| | |
|---|---|
| Group | AI Model Validation |
| Content Last Reviewed | |
AI Evaluations are a critical cornerstone of successfully implementing Generative AI solutions. They ensure the reliability and quality of code generated by AI models, mitigating the risk of introducing errors into the codebase. We want to customize AI models with rigorous evaluations to align with coding standards, industry best practices (DevOps principles, secure coding, test-driven development, etc.), and specific organizational needs, resulting in accurate and contextually relevant Code Suggestions, accurate natural language chat interactions, and generally aligned model responses. By fostering continuous improvement, we want to enhance developer productivity and contribute to building a robust and maintainable codebase, all while instilling confidence in the reliability of AI-powered processes for tasks across the software development lifecycle.

Initially, as we build the foundation for AI Evaluations, our first priority is supporting Code Suggestions quality. We will then expand to support Duo Chat, and after that enable evaluations for future AI-powered features.
The AI Evaluations category will focus on assessing the performance, tuning parameters, prompt engineering techniques, and quality of algorithms for various AI models designed for code generation and completion. The models we are initially exploring for this evaluation include a subset of Google models, such as text-bison, across 12 programming languages. This evaluation is crucial in developing and improving Code Suggestions, as it deepens our understanding of how well the models are performing and identifies areas that require enhancement.
Our goal for AI evaluations on Code Suggestions is to assess what constitutes high-quality prompts, languages, different semantics of code, and the taxonomy of code completion and code generation; to map that taxonomy; and then to add similarity metrics against historically written code. We intend to improve Code Suggestions through comprehensive and robust assessment of Generative AI models, leading to a reliable, efficient, and user-friendly product. As a long-term initiative, we want to evaluate the models' quality, cost, and latency.
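As a concrete illustration of the similarity-metric idea, the sketch below scores a model suggestion against the code a developer historically wrote for the same context. This is a minimal, hypothetical example using Python's standard `difflib`; the function names and the metrics themselves are illustrative assumptions, not GitLab's actual evaluation pipeline.

```python
import difflib

def suggestion_similarity(suggestion: str, historical_code: str) -> float:
    """Return a 0..1 similarity ratio between a model suggestion and the
    code a developer historically wrote for the same context."""
    return difflib.SequenceMatcher(None, suggestion, historical_code).ratio()

def exact_match(suggestion: str, historical_code: str) -> bool:
    """Strict metric: the suggestion reproduces the historical code
    verbatim, ignoring whitespace differences."""
    return suggestion.split() == historical_code.split()

# Score a small batch of (suggestion, historical ground truth) pairs.
pairs = [
    ("def add(a, b):\n    return a + b", "def add(a, b):\n    return a + b"),
    ("def add(a, b):\n    return a - b", "def add(a, b):\n    return a + b"),
]
mean_similarity = sum(suggestion_similarity(s, h) for s, h in pairs) / len(pairs)
print(mean_similarity)
```

Aggregating such scores per language or per prompt variant is one way to surface which areas need enhancement.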
Driving usage of Code Suggestions and increasing the user acceptance rate is a critical business initiative. To support it, we will dial up the accuracy of Code Suggestions by improving the prompt engine and building a database of prompts for validating code completion at scale, yielding high-quality suggestions.
To get there, we will:

- Implement insights from this analysis to fine-tune AI models or enhance training data, improving their performance across contexts.
- Collaborate with engineering and UX teams to seamlessly integrate A/B testing and prompt transformation into the Code Suggestions workflow.
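To illustrate how A/B testing of prompt variants could plug into a suggestion workflow, here is a minimal hypothetical sketch. The variant names, bucketing scheme, and event shape are all assumptions made for illustration; they do not reflect GitLab's actual experimentation framework.

```python
# Hypothetical A/B test: assign each user to a prompt variant, then
# compare acceptance rates between variants.
VARIANTS = ("baseline_prompt", "transformed_prompt")

def assign_variant(user_id: int) -> str:
    """Deterministic 50/50 bucketing so a user always sees the same variant."""
    return VARIANTS[user_id % 2]

def acceptance_rate(events: list[tuple[str, bool]], variant: str) -> float:
    """events: (variant, accepted) pairs logged once per suggestion shown."""
    hits = [accepted for v, accepted in events if v == variant]
    return sum(hits) / len(hits) if hits else 0.0

# Example event log: baseline_prompt -> 0.5, transformed_prompt -> 1.0
events = [
    ("baseline_prompt", True),
    ("baseline_prompt", False),
    ("transformed_prompt", True),
    ("transformed_prompt", True),
]
for variant in VARIANTS:
    print(variant, acceptance_rate(events, variant))
```

In practice the acceptance-rate comparison would also need a significance test and enough traffic per bucket before declaring a winning prompt.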