Published on: June 25, 2025
14 min read
Discover how this new GitLab feature can find exact matches, use regex patterns, and see contextual results across terabytes of codebases.

TL;DR: What if you could find any line of code across 48 TB of repositories in milliseconds? GitLab's new Exact Code Search makes this possible, delivering pinpoint precision, powerful regex support, and contextual multi-line results that transform how teams work with large codebases.
Anyone who works with code knows the frustration of searching across repositories. Whether you're a developer debugging an issue, a DevOps engineer examining configurations, a security analyst searching for vulnerabilities, a technical writer updating documentation, or a manager reviewing implementation, you know exactly what you need, but traditional search tools often fail you.

These conventional tools return dozens of false positives, lack the context needed to understand results, and slow to a crawl as codebases grow. The result? Valuable time spent hunting for needles in haystacks instead of building, securing, or improving your software.

GitLab's code search functionality has historically been backed by Elasticsearch or OpenSearch. While these are excellent for searching issues, merge requests, comments, and other data containing natural language, they weren't specifically designed for code. After evaluating numerous options, we developed a better solution.
Enter GitLab's Exact Code Search, currently in beta testing and powered by Zoekt (pronounced "zookt", Dutch for "search"). Zoekt is an open-source code search engine originally created by Google and now maintained by Sourcegraph, specifically designed for fast, accurate code search at scale. We've enhanced it with GitLab-specific integrations, enterprise-scale improvements, and seamless permission system integration.

This feature revolutionizes how you find and understand code with three key capabilities:

1. Exact Match mode: Zero false positives

When toggled to Exact Match mode, the search engine returns only results that match your query exactly as entered, eliminating false positives. This precision is invaluable when:
Instead of seeing just a single line with your matching term, you get the surrounding context that's crucial for understanding the code. This eliminates the need to click through to files for basic comprehension, significantly accelerating your workflow.

Let's see how these capabilities translate to real productivity gains in everyday development scenarios:
Before Exact Code Search: Copy an error message, search, wade through dozens of partial matches in comments and documentation, click through multiple files, and eventually find the actual code. With Exact Code Search:
Before Exact Code Search: Browse through directories, make educated guesses about file locations, open dozens of files, and slowly build a mental map of the codebase. With Exact Code Search:
Before Exact Code Search: Attempt to find all instances of a method, miss some occurrences, and introduce bugs through incomplete refactoring. With Exact Code Search:
Security teams can:
Search across your entire namespace or instance to:
Before diving into our scale achievements, let's explore what makes Zoekt fundamentally different from traditional search engines — and why it can find exact matches so incredibly fast.
Zoekt's speed comes from its use of positional trigrams — a technique that indexes every sequence of three characters along with their exact positions in files. This approach solves one of the biggest pain points developers have had with Elasticsearch-based code search: false positives.
Here's how it works:
Traditional full-text search engines like Elasticsearch tokenize code into words and lose positional information. When you search for getUserId(), they might return results containing user, get, and Id scattered throughout a file — leading to those frustrating false positives for GitLab users.
Zoekt's positional trigrams maintain exact character sequences and their positions. When you search for getUserId(), Zoekt looks for the exact trigrams get, etU, tUs, Use, ser, erI, rId, Id(, and d(), all in the correct sequence and position. This ensures that only exact matches are returned.
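To make the idea concrete, here is a minimal Python sketch of a positional trigram index. It is a simplified illustration, not Zoekt's actual implementation: the function names (`build_index`, `exact_search`) and the sample files are hypothetical, and a production engine would store the index on disk and use further optimizations.

```python
from collections import defaultdict


def build_index(files):
    """Map each trigram to the (file, offset) pairs where it occurs."""
    index = defaultdict(set)
    for name, text in files.items():
        for offset in range(len(text) - 2):
            index[text[offset:offset + 3]].add((name, offset))
    return index


def exact_search(index, query):
    """Return sorted (file, offset) pairs where `query` occurs verbatim."""
    if len(query) < 3:
        raise ValueError("this sketch needs a query of at least 3 characters")
    # Start from every position of the first trigram, then require each
    # following trigram to appear at the correspondingly shifted offset.
    candidates = set(index.get(query[:3], ()))
    for shift in range(1, len(query) - 2):
        positions = index.get(query[shift:shift + 3], set())
        candidates = {(f, o) for (f, o) in candidates if (f, o + shift) in positions}
    return sorted(candidates)


# Hypothetical sample data: one file with the real call, one with scattered words.
files = {
    "user.rb": "def getUserId()\n  @id\nend\n",
    "notes.md": "get the user Id from the session\n",
}
index = build_index(files)
matches = exact_search(index, "getUserId()")
print(matches)
```

In this sketch, `notes.md` contains the words "get", "user", and "Id", but it never becomes a match because the trigram etU does not follow get at the next offset.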
The result? Search queries that previously returned hundreds of irrelevant results now return only the precise matches you're looking for. This was one of our most requested features for good reason - developers were losing significant time sifting through false positives.
Zoekt excels at exact matches and is optimized for regular expression searches. The engine uses sophisticated algorithms to convert regex patterns into efficient trigram queries when possible, maintaining speed even for complex patterns across terabytes of code.
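One way to picture the regex-to-trigram idea is as a cheap prefilter: if every match of a pattern must contain some literal text, files missing that literal's trigrams can be skipped before the regex runs at all. The Python sketch below assumes the required literal is supplied by hand (the `required_literal` parameter is hypothetical), whereas a real engine would derive it from the pattern itself. It also recomputes trigrams per file instead of consulting a prebuilt index.

```python
import re


def trigrams(text):
    """Return the set of all 3-character substrings of `text`."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def regex_search(files, pattern, required_literal):
    """Run `pattern` only on files that contain every trigram of the literal.

    `required_literal` is a substring that any match must contain; here it is
    supplied by hand rather than derived from the pattern automatically.
    """
    needed = trigrams(required_literal)
    compiled = re.compile(pattern)
    results = {}
    for name, text in files.items():
        if not needed <= trigrams(text):
            continue  # cheap rejection: the regex cannot match this file
        hits = [m.group(0) for m in compiled.finditer(text)]
        if hits:
            results[name] = hits
    return results


# Hypothetical sample data.
files = {
    "config.py": "api_token = 'abc123'\ntimeout = 30\n",
    "readme.md": "Set the timeout before running.\n",
}
found = regex_search(files, r"token\s*=\s*'[^']*'", "token")
print(found)
```

Only `config.py` reaches the regex stage here, which is the property that keeps this kind of search fast as the number of files grows.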
Exact Code Search is powerful and built to handle massive scale with impressive performance. This is not just a new UI feature — it's powered by a completely reimagined backend architecture.
On GitLab.com alone, our Exact Code Search infrastructure indexes and searches over 48 TB of code data while maintaining lightning-fast response times. This scale represents millions of repositories across thousands of namespaces, all searchable within milliseconds. To put this in perspective, that is more code than the entire Linux kernel, Android, and Chromium projects combined. Yet Exact Code Search can find a specific line across this massive codebase in milliseconds.
Our innovative implementation features:
Behind the scenes, Exact Code Search operates as a distributed system with these key components:
Exact Code Search automatically integrates with GitLab's permission system:
While Zoekt provided the core search technology, it was originally designed as a minimal library for managing .zoekt index files - not a distributed database or enterprise-scale service. Here are the key engineering challenges we overcame to make it work at GitLab's scale:
The problem: Zoekt was designed to work with local index files, not distributed across multiple nodes serving many concurrent users. Our solution: We built a comprehensive orchestration layer that:
The problem: How do you efficiently manage terabytes of index data across multiple nodes while ensuring fast updates? Our solution: We implemented:
gitlab-zoekt binary that can operate in both indexer and webserver modes

The problem: Zoekt had no concept of GitLab's complex permission system - users should only see results from projects they can access. Our solution: We built native permission filtering directly into the search flow:
The problem: Managing a distributed search system shouldn't require a dedicated team. Our solution:
Rolling out a completely new search backend to millions of users required careful planning. Here's how we minimized customer impact while ensuring reliability:
We started by enabling Exact Code Search only for the gitlab-org group - our own internal repositories. This allowed us to:
Before expanding, we focused on ensuring the system could handle GitLab.com's scale:
We gradually expanded to customers interested in testing Exact Code Search:
gitlab-org/gitlab now index in ~10 seconds)

Today, over 99% of Premium and Ultimate licensed groups on GitLab.com have access to Exact Code Search. Users can:
For technical deep dive: Interested in the detailed architecture and implementation? Check out our comprehensive design document for in-depth technical details about how we built this distributed search system.
Getting started with Exact Code Search is simple because it's already enabled by default for Premium and Ultimate groups on GitLab.com (over 99% of eligible groups currently have access).
Whether using Exact Match or Regular Expression mode, you can refine your search with modifiers:
| Query Example | What It Does |
|---|---|
| file:js | Searches only in files containing "js" in their name |
| foo -bar | Finds "foo" but excludes results with "bar" |
| lang:ruby | Searches only in Ruby files |
| sym:process | Finds "process" in symbols (methods, classes, variables) |
Pro Tip: For the most efficient searches, start specific and then broaden if needed. Using file: and lang: filters dramatically increases relevance.
Stack multiple filters for precision:
is_expected file:rb -file:spec
This finds "is_expected" in Ruby files that don't have "spec" in their name.
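The way these stacked filters combine can be sketched in a few lines of Python. This is only an illustration of the semantics described above, assuming plain substring matching on file names. The helper names (`parse_query`, `matching_files`) and the sample files are hypothetical, and the real query parser supports more modifiers than this.

```python
def parse_query(query):
    """Split a query into search terms, file filters, and negated file filters."""
    terms, include, exclude = [], [], []
    for part in query.split():
        if part.startswith("-file:"):
            exclude.append(part[len("-file:"):])
        elif part.startswith("file:"):
            include.append(part[len("file:"):])
        else:
            terms.append(part)
    return terms, include, exclude


def matching_files(files, query):
    """Return names of files that satisfy every term and filter in `query`."""
    terms, include, exclude = parse_query(query)
    result = []
    for name, text in files.items():
        if not all(s in name for s in include):
            continue  # a file: filter did not match the file name
        if any(s in name for s in exclude):
            continue  # a -file: filter ruled this file out
        if all(t in text for t in terms):
            result.append(name)
    return sorted(result)


# Hypothetical sample data.
files = {
    "lib/matchers.rb": "def is_expected\n  expect(subject)\nend\n",
    "spec/user_spec.rb": "it { is_expected.to be_valid }\n",
    "docs/testing.md": "Use is_expected in one-liners.\n",
}
selected = matching_files(files, "is_expected file:rb -file:spec")
print(selected)
```

All three sample files contain "is_expected", but only `lib/matchers.rb` passes both the file:rb and the -file:spec filters.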
Use regular expressions for powerful patterns:
token.*=.*[\"']
Watch this search performed against the GitLab Zoekt repository.
This search helps surface hardcoded passwords and tokens, which can become a security issue if they go undetected.
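To see what the pattern above catches, you can try it on a few sample lines. The sketch below uses Python's `re` module purely for illustration, and the sample lines are made up. Regex dialects differ between engines, though this particular pattern uses only basic constructs.

```python
import re

# The pattern from the example above: "token", then "=", then a quote character.
pattern = re.compile(r"token.*=.*[\"']")

# Hypothetical lines of code to test against.
lines = [
    'api_token = "s3cr3t"',
    "token = ENV['API_TOKEN']",
    "token_count += 1",
    "# rotate the token weekly",
]
flagged = [line for line in lines if pattern.search(line)]
for line in flagged:
    print(line)
```

The first two lines are flagged. The second one reads its value from the environment rather than hardcoding it, which shows why results from a broad pattern like this still need human review.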
For more detailed syntax information, check the Exact Code Search documentation.
Exact Code Search is currently in Beta for GitLab.com users with Premium and Ultimate licenses:
For self-managed instances, we offer several deployment methods:
gitlab-zoekt Helm chart

While Exact Code Search is already powerful, we're continuously improving it:
GitLab's Exact Code Search represents a fundamental rethinking of code discovery. By delivering exact matches, powerful regex support, and contextual results, it solves the most frustrating aspects of code search:
Ready to experience smarter code search? Learn more in our documentation or try it now by performing a search in your Premium or Ultimate licensed namespaces or projects. Not a GitLab user yet? Try a free trial of GitLab Ultimate with Duo!