Protecting sensitive information is more critical than ever. GitLab's Secret Detection feature provides a powerful solution to identify and prevent the exposure of sensitive data. This tutorial explores how GitLab Secret Detection works, how to create custom rulesets for finding personally identifiable information, and how GitLab Duo Chat can streamline the creation of regex patterns for PII detection.
Understanding GitLab Secret Detection
GitLab Secret Detection is a security scanning feature integrated into the GitLab CI/CD pipeline. It automatically scans your codebase to identify hardcoded secrets, credentials, and other sensitive information that shouldn't be stored in your repository.
Key benefits
- Data breach prevention: Detects secrets before they're committed to your repository.
- Automated scanning: Runs as part of your CI/CD pipeline without manual intervention.
- Customizable rules: Extend detection capabilities with custom patterns.
- Compliance support: Helps meet regulatory requirements like GDPR, HIPAA, and the California Consumer Privacy Act (CCPA).
Create custom rulesets for PII detection
While GitLab's default secret detection covers common secrets like API keys and passwords, you may need custom rules to identify specific types of PII relevant to your organization.
To get started, create a new GitLab project and follow the steps below. You can follow along and see usage examples in our PII Demo Application.
Step 1: Set up Secret Detection
Ensure Secret Detection is enabled in your .gitlab-ci.yml file:
include:
  - template: Security/Secret-Detection.gitlab-ci.yml

secret_detection:
  variables:
    SECRET_DETECTION_EXCLUDED_PATHS: "rules,.gitlab,README.md,LICENSE"
    SECRET_DETECTION_HISTORIC_SCAN: "true"
Step 2: Create a custom ruleset file
Create the directory and file rules/pii-data-extension.toml, which contains the regex patterns for PII data along with an allowlist of patterns to ignore. Below are patterns to detect passport numbers (USA), phone numbers (USA), and email addresses:
[extend]
# Extends default packaged ruleset, NOTE: do not change the path.
path = "/gitleaks.toml"
# Patterns to ignore (used for tests)
[allowlist]
description = "allowlist of patterns and paths to ignore in detection"
regexTarget = "match"
regexes = ['''555-555-5555''', '''user@example.com''']
paths = ['''(.*?)(jpg|gif|doc|pdf|bin|svg|socket)''']
# US Passport Number (USA)
[[rules]]
id = "us_passport_detection"
title = "US Passport Number"
description = "Detects US passport numbers"
regex = '''\b[A-Z]{1,2}[0-9]{6,9}\b'''
keywords = ["passport"]
# Phone Number (USA)
[[rules]]
id = "us_phone_number_detection_basic"
title = "US Phone Number"
description = "Detects US phone numbers in basic format"
regex = '''\b\d{3}-\d{3}-\d{4}\b'''
keywords = ["phone", "mobile"]
# Email Address
[[rules]]
id = "email_address"
title = "Email Address"
description = "Detects email addresses"
regex = '''[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'''
keywords = ["email", "e-mail"]
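Before committing the ruleset, it can help to sanity-check the three regexes locally. The following is a minimal Python sketch, not part of GitLab's tooling: the `matching_rules` helper is a hypothetical name, the patterns are copied from the rules above, and Python's `re` engine stands in for the Go regex engine used by the scanner, so results may differ for more exotic syntax. The sketch also ignores the `keywords` pre-filter, which is why the last sample shows a match that the passport rule alone cannot distinguish from a real passport number.

```python
import re

# Patterns copied from rules/pii-data-extension.toml.
RULES = {
    "us_passport_detection": r"\b[A-Z]{1,2}[0-9]{6,9}\b",
    "us_phone_number_detection_basic": r"\b\d{3}-\d{3}-\d{4}\b",
    "email_address": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
}


def matching_rules(text):
    """Return the IDs of every rule whose regex matches somewhere in text."""
    return [rule_id for rule_id, pattern in RULES.items() if re.search(pattern, text)]


print(matching_rules("passport_number: A12345678"))   # ['us_passport_detection']
print(matching_rules("phone_number: 512-123-4567"))   # ['us_phone_number_detection_basic']
print(matching_rules("email: justin_case@example.com"))  # ['email_address']
# A non-passport identifier with the same shape also matches the raw regex.
print(matching_rules("build id: AB1234567"))           # ['us_passport_detection']
```

The last case suggests why pairing a broad pattern with keywords and an allowlist matters: the regex by itself cannot tell a passport number from any other short uppercase-prefixed ID.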
Step 3: Extend Secret Detection with the custom ruleset file
Create a directory and file .gitlab/secret-detection-ruleset.toml in the root of your repository. This file allows you to extend the standard configuration with the PII rules file and override the severity of the detected vulnerabilities (the default severity is Critical).
# Define the pii rules to add to default configuration
[[secrets.passthrough]]
type = "file"
target = "gitleaks.toml"
value = "rules/pii-data-extension.toml"
# Overwrite Phone Number (USA) PII Severity
[[secrets.ruleset]]
[secrets.ruleset.identifier]
type = "gitleaks_rule_id"
value = "us_phone_number_detection_basic"
[secrets.ruleset.override]
severity = "Medium"
# Overwrite Email Address PII Severity
[[secrets.ruleset]]
[secrets.ruleset.identifier]
type = "gitleaks_rule_id"
value = "email_address"
[secrets.ruleset.override]
severity = "Low"
Step 4: Commit your changes
Now add the changes in the above steps to your project.
cd /path/to/your/project
git add .
git commit -m "Add PII data ruleset and Secret Scanning"
git push
Once the changes are pushed, Secret Detection runs in the pipeline on the default branch.
Step 5: Test detection of PII data
Now that the Secret Detection scanner is configured, test whether it detects the new custom patterns. To do so, create a merge request that adds a new file named customer-data.yaml with the following content:
customers:
  test_user:
    phone_number: 555-555-5555
    email: user@example.com
  justin_case:
    phone_number: 512-123-4567
    passport_number: A12345678
    email: justin_case@example.com
  chris_p_bacon:
    phone_number: 305-123-4567
    passport_number: B09876543
    email: chris_p_bacon@example.com
The scanner should now perform the following:
- Ignore the phone_number and email of test_user due to patterns being in allowlist
- Detect six potential vulnerabilities due to the information present for both justin_case and chris_p_bacon
  - U.S. passport number severity is set to Critical (default)
  - U.S. phone number severity is set to Medium (override)
  - Email address severity is set to Low (override)
  - Data from rules override is added to each vulnerability
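The expected outcome can be approximated offline. The sketch below is a simplified Python model, not the scanner's actual implementation: the `scan` function and the constant names are hypothetical, the sample data mirrors customer-data.yaml with the allowlisted test phone number 555-555-5555, and the allowlist is applied to the matched text only (as `regexTarget = "match"` suggests). Path allowlists and keywords are left out.

```python
import re
from collections import Counter

RULES = {
    "us_passport_detection": r"\b[A-Z]{1,2}[0-9]{6,9}\b",
    "us_phone_number_detection_basic": r"\b\d{3}-\d{3}-\d{4}\b",
    "email_address": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
}
ALLOWLIST = [r"555-555-5555", r"user@example.com"]
# Severity overrides from .gitlab/secret-detection-ruleset.toml.
SEVERITY_OVERRIDES = {
    "us_phone_number_detection_basic": "Medium",
    "email_address": "Low",
}
DEFAULT_SEVERITY = "Critical"

SAMPLE = """\
customers:
  test_user:
    phone_number: 555-555-5555
    email: user@example.com
  justin_case:
    phone_number: 512-123-4567
    passport_number: A12345678
    email: justin_case@example.com
  chris_p_bacon:
    phone_number: 305-123-4567
    passport_number: B09876543
    email: chris_p_bacon@example.com
"""


def scan(text):
    """Return (line number, rule ID, severity, matched text) for each finding."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule_id, pattern in RULES.items():
            for match in re.finditer(pattern, line):
                value = match.group(0)
                # Skip matches whose text hits an allowlist regex.
                if any(re.search(allowed, value) for allowed in ALLOWLIST):
                    continue
                severity = SEVERITY_OVERRIDES.get(rule_id, DEFAULT_SEVERITY)
                findings.append((line_no, rule_id, severity, value))
    return findings


findings = scan(SAMPLE)
for finding in findings:
    print(finding)
print(Counter(severity for _, _, severity, _ in findings))
```

Under these assumptions the model reports six findings (two each at Critical, Medium, and Low) and nothing for test_user, which matches the behavior described above.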
Once the merge request is submitted, the Secret Detection scanner runs and reports its results.
When you click a vulnerability, you are presented with detailed vulnerability data based on what was configured in your newly set up rules.
This data allows you to determine the validity of the data present and address it accordingly.
There are additional ways to configure custom rulesets. For example, rules can be applied remotely to several projects, avoiding the need to duplicate the rules file. See the Secret Detection Configuration documentation for more information.
Common PII types to consider
When building your custom ruleset, consider including patterns for:
- Social Security Numbers
- Credit card numbers
- Driver's license numbers
- Passport numbers
- Email addresses
- Phone numbers
- IP addresses
- Physical addresses
- Medical record numbers
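As a starting point for two of the types above, here is a hedged Python sketch with candidate patterns for Social Security Numbers and IPv4 addresses. These patterns are illustrative assumptions, not rules shipped by GitLab, and the `find_candidates` helper is a hypothetical name. They are deliberately simple so the trade-off between precision and coverage is visible.

```python
import re

# Hypothetical starting-point patterns; tune them before adding to a ruleset.
CANDIDATES = {
    "us_ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "ipv4_address": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
}


def find_candidates(text):
    """Return the IDs of candidate patterns that match somewhere in text."""
    return [name for name, pattern in CANDIDATES.items() if re.search(pattern, text)]


print(find_candidates("ssn: 123-45-6789"))     # ['us_ssn']
print(find_candidates("phone: 512-123-4567"))  # []
print(find_candidates("host: 10.0.0.1"))       # ['ipv4_address']
# Dotted version strings have the same shape as an IPv4 address.
print(find_candidates("version 1.2.3.4"))      # ['ipv4_address']
```

The version-string match is the kind of false positive that keywords or an allowlist entry would need to handle.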
Leveraging GitLab Duo Chat for pattern creation
Creating effective regex patterns can be challenging. This is where GitLab Duo Chat comes in handy. GitLab Duo Chat is an AI-powered assistant that accelerates development with contextual, conversational AI. Chat is available to any user with an assigned GitLab Duo seat.
Using GitLab Duo Chat to generate PII regex patterns
- Access GitLab Duo Chat: In the upper right corner, select GitLab Duo Chat. A drawer opens on the right side of your screen.
- Prompt the chat: Enter a prompt in the chat input box and press Enter or select Send:
I need a regex pattern to detect Canadian Zip Codes in our codebase for secret detection
- Review and refine: GitLab Duo Chat will suggest a regex pattern that you can review and refine:
Canadian postal codes follow the format "A1A 1A1" (where A is a letter and 1 is a digit). Here's a regex pattern that will detect Canadian postal codes in your codebase, accounting for both spaced and non-spaced formats:
[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d
This pattern matches:
* A letter, followed by a digit, followed by a letter
* An optional space or hyphen
* A digit, followed by a letter, followed by a digit
If you need a more strict pattern that only matches uppercase letters (as official Canadian postal codes use), you can use:
[A-Z]\d[A-Z][ -]?\d[A-Z]\d
For implementation in a secret detection tool, you might want to add word boundaries to avoid partial matches:
\b[A-Z]\d[A-Z][ -]?\d[A-Z]\d\b
Would you like more details about implementing this in a specific secret detection system?
- Test the pattern: Test the pattern against sample data to ensure it works correctly. You can use sites like https://regex101.com/ to validate patterns before you submit them to your ruleset.
- Add to your ruleset: Once satisfied, add the pattern to your rules/pii-data-extension.toml file and perform some tests to validate. This sample merge request contains a newly added rule for Canadian postal codes based on the above.
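The "test the pattern" step can also be scripted. The sketch below checks the strict, word-bounded pattern from the chat response against a few sample strings in Python; the sample values and the `is_postal_code` helper are assumptions chosen for illustration, and Python's `re` module is used as a stand-in for the scanner's regex engine.

```python
import re

# Strict pattern with word boundaries, as suggested in the chat response.
POSTAL_CODE = re.compile(r"\b[A-Z]\d[A-Z][ -]?\d[A-Z]\d\b")


def is_postal_code(text):
    """Return True if text contains something shaped like a Canadian postal code."""
    return POSTAL_CODE.search(text) is not None


# Sample value -> whether the pattern is expected to match.
SAMPLES = {
    "K1A 0B1": True,    # spaced format
    "K1A0B1": True,     # unspaced format
    "K1A-0B1": True,    # hyphenated format
    "k1a 0b1": False,   # lowercase is rejected by the strict pattern
    "K1A 0B12": False,  # extra trailing digit breaks the closing word boundary
    "12345": False,     # US-style ZIP code
}

for sample, expected in SAMPLES.items():
    actual = is_postal_code(sample)
    status = "ok" if actual == expected else "UNEXPECTED"
    print(f"{sample!r}: matched={actual} ({status})")
```

Keeping both positive and negative samples in the table makes it easier to notice when a later refinement of the pattern starts missing valid formats.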
Additionally, you can use GitLab Duo Chat in:
- The GitLab Web IDE (VS Code in the cloud)
- VS Code, with the GitLab Workflow extension for VS Code
- JetBrains IDEs, with the GitLab Duo Plugin for JetBrains
- Visual Studio for Windows, with the GitLab Extension for Visual Studio
In the future, you'll be able to leverage GitLab Duo Workflow (currently in private beta) to automatically generate and add these patterns to your codebase directly from your IDE. GitLab Duo Workflow is an AI agent that transforms AI from a reactive assistant into an autonomous contributor, optimizing your software development lifecycle. Learn more about GitLab Duo Workflow.
Best practices for PII detection
- Start small: Begin with a few critical PII types and expand gradually.
- Test thoroughly: Test your patterns against sample data to avoid false positives.
- Update regularly: Review and update your rulesets as new PII requirements emerge.
- Document patterns: Maintain documentation for your custom regex patterns.
- Balance precision: Make patterns specific enough to avoid false positives but flexible enough to catch variations.
- Implement Secret Push Protection: Prevent PII data from making it into your repository.
- Set up Merge Request Approval Policies: Require approval before merging any possible PII data to your repository.
Once you have set up a PII data ruleset to meet your organization's needs, remote rulesets can scan for PII data across multiple repositories without the need to duplicate the rules file.
Handling Secret Detection findings
When GitLab Secret Detection identifies potential PII in your code:
- Review the finding: Assess whether it's a legitimate finding or a false positive.
- Remediate: Remove the sensitive data and replace it with environment variables or secrets management.
- Update history: For existing repositories, consider using tools like BFG Repo-Cleaner to remove sensitive data from history.
- Track progress: Use GitLab's security dashboard to monitor ongoing compliance.
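For the remediation step, one common approach is to read the value from the environment at runtime instead of hardcoding it. The following is a minimal Python sketch under that assumption; the `get_required_setting` helper and the `SUPPORT_CONTACT_EMAIL` variable name are hypothetical, and in a GitLab pipeline the value could be supplied through a CI/CD variable or a secrets manager.

```python
import os


def get_required_setting(name):
    """Read a setting from the environment, failing loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Example usage (hypothetical variable name):
#   support_email = get_required_setting("SUPPORT_CONTACT_EMAIL")
```

Failing fast on a missing variable avoids silently falling back to a hardcoded default, which is how sensitive values tend to creep back into the repository.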
Get started today
GitLab Secret Detection, combined with custom PII rulesets, provides a powerful defense against inadvertent exposure of sensitive information. By leveraging GitLab Duo Chat to create precise regex patterns, teams can efficiently implement comprehensive PII detection across their codebase, ensuring regulatory compliance and protecting user data.
Remember that secret detection is just one component of a comprehensive security strategy. Combine it with other GitLab security features like static application security testing, dynamic application security testing, and dependency scanning for a more robust security posture.
Start implementing these practices today to better protect your users' personal information and maintain the security integrity of your applications.
Start a free, 60-day trial of GitLab Ultimate and GitLab Duo today!
More resources
To learn more about GitLab security and compliance and how we can help enhance your AppSec workflows, follow the links below: