Introducing Token-Hunter

Greg Johnson ·
Dec 20, 2019 · 9 min read

We operate business at GitLab in a “public by default” mindset so other people can benefit from our transparent business practices. Defaulting to public sharing also means we store massive amounts of data in a public format by design. Much of what we do as a company takes the form of a GitLab issue and is open for the world to see, including those individuals with nefarious goals. Naturally, for a Red Team, we’re curious about what all of this public information could do to aid someone intent on attacking GitLab. We started our investigation by identifying those secrets that are unintentionally shared across the assets we make public like issues, issue discussions, merge requests, merge request discussions, and snippets. There was no tooling available that accomplished what we set out to do, so we developed it ourselves and just released it: Token-Hunter.


API tokens are a keystone in the development world. They facilitate important functionality not only in the software developers build, but also in the deployment, maintenance, integration, and security of both closed and open source projects. Many companies providing services on the internet offer API tokens in multiple flavors that allow interaction with their systems, as does GitLab. Ideally, these tokens offer configurable access control to otherwise closed systems allowing you to impersonate a user’s session and access raw data. Developers, DevOps professionals, infrastructure professionals and the like often depend on API tokens to do their job successfully.

It’s a common and understandable mistake to make a commit to a Git repository containing one of these tokens when building software in a shared environment. Moving quickly, trying to support your fellow developer, and generally working quickly to get things done efficiently can lead to mistakes made under pressure, which can happen to us all. Popular tools that search for these commits like gitrob, TruffleHog, gitleaks, and even GitLab’s own SAST project can find leaked tokens given proper configuration. Our Red Team had early success leveraging these known techniques, tactics, and procedures (TTPs).

The tools referenced above are fantastic at finding secrets unintentionally left in source code. However, it's also a common mistake to submit sensitive data like API tokens, usernames, and passwords to public locales like GitLab snippets, issues, issue discussions, merge requests, and merge request discussions. Sharing this type of information by accident can happen easily when attempting to share relevant information to facilitate a public support request as we often do at GitLab for many different products. Though most people know not to post sensitive information in a public place directly, mistakes do happen, sometimes shortcuts are taken, logs get shared, configuration files get dropped, and information inadvertently gets leaked and leveraged. More often than not these areas of exposure are often forgotten, but not by attackers.

Exploring the wide-open

Token-Hunter is intended to complement tools like gitrob, gitleaks, TruffleHog, and others. It can be used if you’re hosting your groups and projects on, or on a self-managed GitLab instance of your own. We created Token-Hunter to support the following features:

For full details on using the tool and the functionality of each of its available arguments, visit the Token-Hunter project page on GitLab.

Taming the wild… mostly

Hitting an API to gather large amounts of raw data is daunting. Internet connections flake out, servers need maintenance, rate limits get hit, WiFi drops, performance degrades, timeouts happen, and you end up with a headache attempting to simply get the data you’d like to analyze. To counter some of these issues as pragmatically as possible, two simple algorithms were applied: request retries and dynamic page-size reduction. Request retries simply retries a failed request after a few seconds. The tool will retry a failed request twice, each after a four-second delay with a four-second backoff. In other words, the first retry will occur four seconds after the initial failed request. The second retry will occur eight seconds after the first failed retry attempt. If each of these retry attempts fails, the tool then attempts to reduce the paging size in order to complete the request. Reducing the page size reduces the number of records the request needs to return lessening the likelihood of a timeout. Though simple, these two algorithms allowed the tool to reliably pull data for nearly 1.3 million individual GitLab assets with only three recorded request errors resulting in over 1600 pattern matches.

More to explore

The ability to search discussions and other popular channels where sensitive data is likely to be shared is the key benefit of the Token-Hunter tool over other related tooling. The Red Team plans to continue iterating to support our operations, including adding support for more assets such as merge requests, commit discussions, and epics. We learned during our operation that the specifics of the regular expressions we used in relation to the context in which we were searching (posted log data format, configuration file format, code structure, etc.) largely determined our level of success. It can be necessary to tune these expressions depending on your environment and context. To start, we made a few adjustments to TruffleHog’s regular expressions to add coverage for GitLab-specific token formats. However, there’s still much room for improvement depending on your environment and objective.

Looking for a specific password for a user name? Trying to find all mentions of a specific server DNS name or IP? Expecting a specific log format that has the potential to contain an API token? Tune the regular expressions, and you just may find what you’re looking for.

We want your ideas and contributions

There is still plenty to be done and we welcome community contributions and ideas. If the tool is helpful to you in defense of your infrastructure and you’d like to contribute, there are instructions in the on how to get started. If you’re not sure what to do, pick an issue out of our issue list or add to the existing discussions. I'd like to extend a special thank you to GitLab user Ohad Dahan for his many contributions to this and other GitLab projects. These types of contributions are paramount to the continued success of open source projects.

Some of the ideas we’re currently pursuing are:

We have learned a lot from this initial attempt at gathering OSINT from rather unique and unorthodox locations, but this exercise was just a start. We hope you find the tooling useful and if you have questions or ideas to share please reach out through email, through our issue board, or on Twitter. Happy hacking!

Photo by Lightscape on Unsplash.

“.@GitLab introduces a new #opensource #infosec tool to help teams find sensitive data shared in assets unintentionally. Community contributions are welcome!” – Greg Johnson

Click to tweet

10 Steps Every CISO Should Take to Secure Next-Gen Software

Understand three software shifts impacting security, and the steps CISOs can take to protect their business.

Get the eBook
Edit this page View source