Published on: April 1, 2025
9 min read

Strengthen data security with custom PII detection rulesets

This tutorial explains how GitLab's customizable Secret Detection rulesets enhance data security by identifying PII patterns in code repositories. Learn how AI can help.


Protecting sensitive information is more critical than ever. GitLab's Secret Detection feature provides a powerful solution to identify and prevent the exposure of sensitive data. This tutorial explores how GitLab Secret Detection works, how to create custom rulesets for finding personally identifiable information, and how GitLab Duo Chat can streamline the creation of regex patterns for PII detection.

Understanding GitLab Secret Detection

GitLab Secret Detection is a security scanning feature integrated into the GitLab CI/CD pipeline. It automatically scans your codebase to identify hardcoded secrets, credentials, and other sensitive information that shouldn't be stored in your repository.

Key benefits

  • Data breach prevention: Detects secrets before they're committed to your repository.
  • Automated scanning: Runs as part of your CI/CD pipeline without manual intervention.
  • Customizable rules: Extends detection capabilities with custom patterns.
  • Compliance support: Helps meet regulatory requirements like GDPR, HIPAA, and the California Consumer Privacy Act (CCPA).

Create custom rulesets for PII detection

While GitLab's default secret detection covers common secrets like API keys and passwords, you may need custom rules to identify specific types of PII relevant to your organization.

To get started, create a new GitLab project and follow the steps below. You can follow along and see usage examples in our PII Demo Application.

Step 1: Set up Secret Detection

Ensure Secret Detection is enabled in your .gitlab-ci.yml file:

include:
  - template: Security/Secret-Detection.gitlab-ci.yml

secret_detection:
  variables:
    SECRET_DETECTION_EXCLUDED_PATHS: "rules,.gitlab,README.md,LICENSE"
    SECRET_DETECTION_HISTORIC_SCAN: "true"

Step 2: Create a custom ruleset file

Create the directory and file rules/pii-data-extension.toml, which contains the regex patterns for PII data along with an allowlist of patterns to ignore. Below are patterns to detect passport numbers (USA), phone numbers (USA), and email addresses:

[extend]
# Extends default packaged ruleset, NOTE: do not change the path.
path = "/gitleaks.toml"

# Patterns to ignore (used for tests)
[allowlist]
description = "allowlist of patterns and paths to ignore in detection"
regexTarget = "match"
regexes = ['''555-555-5555''', '''user@example.com''']
paths = ['''(.*?)(jpg|gif|doc|pdf|bin|svg|socket)''']

# US Passport Number (USA)
[[rules]]
id = "us_passport_detection"
title = "US Passport Number"
description = "Detects US passport numbers"
regex = '''\b[A-Z]{1,2}[0-9]{6,9}\b'''
keywords = ["passport"]

# Phone Number (USA)
[[rules]]
id = "us_phone_number_detection_basic"
title = "US Phone Number"
description = "Detects US phone numbers in basic format"
regex = '''\b\d{3}-\d{3}-\d{4}\b'''
keywords = ["phone", "mobile"]

# Email Address
[[rules]]
id = "email_address"
title = "Email Address"
description = "Detects email addresses"
regex = '''[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'''
keywords = ["email", "e-mail"]
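Before committing the ruleset, it can help to sanity-check the patterns locally. The sketch below does this in Python, under a few assumptions: the scanner evaluates regexes with Go's RE2 engine while this uses Python's `re` (the three patterns only use syntax common to both), and the `scan_line` helper with its keyword pre-check is a hypothetical approximation of how keywords and the allowlist narrow down matches, not the scanner's actual implementation.

```python
import re

# Regexes and keywords copied from rules/pii-data-extension.toml.
RULES = {
    "us_passport_detection": (
        re.compile(r"\b[A-Z]{1,2}[0-9]{6,9}\b"), ["passport"]),
    "us_phone_number_detection_basic": (
        re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), ["phone", "mobile"]),
    "email_address": (
        re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
        ["email", "e-mail"]),
}

# Allowlist regexes, applied to the matched text (regexTarget = "match").
ALLOWLIST = [re.compile(r"555-555-5555"), re.compile(r"user@example.com")]


def scan_line(line):
    """Return (rule_id, matched_text) pairs for one line (hypothetical helper)."""
    findings = []
    lowered = line.lower()
    for rule_id, (pattern, keywords) in RULES.items():
        # Assumed approximation of the keyword pre-filter: skip the rule
        # unless one of its keywords appears in the text being scanned.
        if not any(keyword in lowered for keyword in keywords):
            continue
        for match in pattern.finditer(line):
            # Drop matches that any allowlist regex recognizes.
            if any(allowed.search(match.group()) for allowed in ALLOWLIST):
                continue
            findings.append((rule_id, match.group()))
    return findings


if __name__ == "__main__":
    for sample in ("passport_number: A12345678",
                   "phone_number: 555-555-5555",
                   "email: justin_case@example.com"):
        print(sample, "->", scan_line(sample))
```

Because the allowlist entries are themselves regexes, the unescaped dot in `user@example.com` matches any character. That is harmless for test data, but worth escaping if you allowlist real domains.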

Step 3: Extend Secret Detection with the custom ruleset file

Create the directory and file .gitlab/secret-detection-ruleset.toml in the root of your repository. This file extends the standard configuration with the PII rules file and overrides the severity of the detected vulnerabilities (the default severity is Critical).

# Define the pii rules to add to default configuration
[[secrets.passthrough]]
type = "file"
target = "gitleaks.toml"
value = "rules/pii-data-extension.toml"

# Overwrite Phone Number (USA) PII Severity
[[secrets.ruleset]]
[secrets.ruleset.identifier]
type = "gitleaks_rule_id"
value = "us_phone_number_detection_basic"
[secrets.ruleset.override]
severity = "Medium"

# Overwrite Email Address PII Severity
[[secrets.ruleset]]
[secrets.ruleset.identifier]
type = "gitleaks_rule_id"
value = "email_address"
[secrets.ruleset.override]
severity = "Low"

Step 4: Commit your changes

Now add the changes in the above steps to your project.

cd /path/to/your/project
git add .
git commit -m "Add PII data ruleset and Secret Scanning"
git push

Once the changes are pushed, Secret Detection runs in the pipeline for the default branch.

Step 5: Test detection of PII data

Now that the Secret Detection scanner is configured, test whether it detects the new custom patterns. To do so, create a merge request that adds a new file named customer-data.yaml with the following content:

customers:
  test_user:
    phone_number: 555-555-5555
    email: user@example.com
  justin_case:
    phone_number: 512-123-4567
    passport_number: A12345678
    email: justin_case@example.com
  chris_p_bacon:
    phone_number: 305-123-4567
    passport_number: B09876543
    email: chris_p_bacon@example.com

The scanner should now perform the following:

  • Ignore the phone_number and email of test_user due to patterns being in allowlist
  • Detect six potential vulnerabilities due to the information present for both justin_case and chris_p_bacon
    • U.S. passport number severity is set to Critical (default)
    • U.S. phone number severity is set to Medium (override)
    • Email address severity is set to Low (override)
    • Data from rules override is added to each vulnerability
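To see where the count of six comes from, the expected outcome can be reproduced offline. The following is a rough simulation in Python, not the scanner itself: `simulate_scan` is a hypothetical helper, the severities are hardcoded from the override file rather than read from it, and keyword filtering is left out. The test_user phone number is written as 555-555-5555 so that it matches the allowlist entry.

```python
import re
from collections import Counter

SAMPLE = """\
customers:
  test_user:
    phone_number: 555-555-5555
    email: user@example.com
  justin_case:
    phone_number: 512-123-4567
    passport_number: A12345678
    email: justin_case@example.com
  chris_p_bacon:
    phone_number: 305-123-4567
    passport_number: B09876543
    email: chris_p_bacon@example.com
"""

# (rule id, regex from the ruleset, severity after the overrides are applied)
RULES = [
    ("us_passport_detection", r"\b[A-Z]{1,2}[0-9]{6,9}\b", "Critical"),
    ("us_phone_number_detection_basic", r"\b\d{3}-\d{3}-\d{4}\b", "Medium"),
    ("email_address", r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", "Low"),
]
ALLOWLIST = [r"555-555-5555", r"user@example.com"]


def simulate_scan(text):
    """Return (line_no, rule_id, severity, match) for each non-allowlisted match."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule_id, regex, severity in RULES:
            for match in re.finditer(regex, line):
                if any(re.search(allowed, match.group()) for allowed in ALLOWLIST):
                    continue  # allowlisted test data is ignored
                findings.append((line_no, rule_id, severity, match.group()))
    return findings


findings = simulate_scan(SAMPLE)
by_severity = Counter(severity for _, _, severity, _ in findings)
for finding in findings:
    print(finding)
```

Running it yields six findings, two at each severity level, and nothing for test_user.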

Once the merge request is submitted, the Secret Detection scanner runs and provides the following results:

[Screenshot: Secret Detection findings for custom PII data in the merge request]

When clicking on a vulnerability, you are presented with detailed vulnerability data based on what was configured in your newly set up rules:

[Screenshot: Expanded view of a custom PII data vulnerability]

This data allows you to determine the validity of the data present and address it accordingly.

There are additional ways to configure custom rulesets. For example, rules can be applied remotely to several projects, avoiding the need to duplicate the rules file. See the Secret Detection Configuration documentation for more information.

Common PII types to consider

When building your custom ruleset, consider including patterns for:

  • Social Security Numbers
  • Credit card numbers
  • Driver's license numbers
  • Passport numbers
  • Email addresses
  • Phone numbers
  • IP addresses
  • Physical addresses
  • Medical record numbers
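As a starting point for two of these types, here is a small Python sketch with candidate patterns for U.S. Social Security numbers and IPv4 addresses. Both regexes and their rule ids are illustrative assumptions, not part of GitLab's packaged ruleset, and neither validates value ranges, so expect false positives until they are tuned against your own data.

```python
import re

# Illustrative starting points only; neither pattern validates value ranges.
CANDIDATES = {
    # Three digits, two digits, four digits, separated by hyphens.
    "us_ssn_candidate": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # Four dot-separated groups of one to three digits.
    "ipv4_candidate": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def matches(rule_id, text):
    """Return every substring of text matched by the named candidate pattern."""
    return [m.group() for m in CANDIDATES[rule_id].finditer(text)]


if __name__ == "__main__":
    print(matches("us_ssn_candidate", "ssn: 123-45-6789"))
    print(matches("ipv4_candidate", "host = 192.168.0.1"))
    # Out-of-range octets still match, which shows the pattern's limits.
    print(matches("ipv4_candidate", "999.999.999.999"))
```

Once a pattern behaves as expected, it can be copied into a new `[[rules]]` entry in the ruleset file alongside suitable keywords.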

Leveraging GitLab Duo Chat for pattern creation

Creating effective regex patterns can be challenging. This is where GitLab Duo Chat comes in handy. GitLab Duo Chat is an AI-powered assistant that accelerates development with contextual, conversational AI. Chat is available to any user with an assigned GitLab Duo seat.

Using GitLab Duo Chat to generate PII regex patterns

  1. Access GitLab Duo Chat: In the upper right corner, select GitLab Duo Chat. A drawer opens on the right side of your screen.

  2. Prompt the chat: Enter a prompt in the chat input box and press Enter or select Send:

     I need a regex pattern to detect Canadian Zip Codes in our codebase for secret detection

  3. Review and refine: GitLab Duo Chat will suggest a regex pattern that you can review and refine:

     Canadian postal codes follow the format "A1A 1A1" (where A is a letter and 1 is a digit). Here's a regex pattern that will detect Canadian postal codes in your codebase, accounting for both spaced and non-spaced formats:

     [A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d

     This pattern matches:
     * A letter, followed by a digit, followed by a letter
     * An optional space or hyphen
     * A digit, followed by a letter, followed by a digit

     If you need a more strict pattern that only matches uppercase letters (as official Canadian postal codes use), you can use:

     [A-Z]\d[A-Z][ -]?\d[A-Z]\d

     For implementation in a secret detection tool, you might want to add word boundaries to avoid partial matches:

     \b[A-Z]\d[A-Z][ -]?\d[A-Z]\d\b

     Would you like more details about implementing this in a specific secret detection system?

  4. Test the pattern: Test the pattern against sample data to ensure it works correctly. You can use sites like https://regex101.com/ to validate patterns before you add them to your ruleset.

  5. Add to your ruleset: Once satisfied, add the pattern to your rules/pii-data-extension.toml file and perform some tests to validate. This sample merge request contains a newly added rule for Canadian postal codes based on the above.
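The suggested patterns can also be compared offline before they go into the ruleset. This minimal Python sketch contrasts the loosest and strictest variants from the chat response. The `classify` helper and the sample strings are made up for illustration, and Python's `re` stands in for the scanner's regex engine.

```python
import re

# Loosest and strictest variants from the chat response above.
LOOSE = re.compile(r"[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d")
STRICT = re.compile(r"\b[A-Z]\d[A-Z][ -]?\d[A-Z]\d\b")


def classify(sample):
    """Return (loose_hit, strict_hit) for one sample string."""
    return (LOOSE.search(sample) is not None,
            STRICT.search(sample) is not None)


if __name__ == "__main__":
    for sample in ("K1A 0B1", "k1a0b1", "M5V-3L9", "XK1A 0B19", "12345"):
        loose_hit, strict_hit = classify(sample)
        print(f"{sample!r:14} loose={loose_hit!s:5} strict={strict_hit}")
```

The strict variant rejects the lowercase sample and the code embedded in a longer token (`XK1A 0B19`), while the loose variant accepts both. Which behavior you want depends on how postal codes actually appear in your codebase.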

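Additionally, you can use GitLab Duo Chat in the Web IDE and in supported IDEs, such as VS Code and JetBrains IDEs, through GitLab's editor extensions.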

In the future, you’ll be able to leverage GitLab Duo Workflow (currently in private beta) to automatically generate these patterns and add them to your codebase directly from your IDE. GitLab Duo Workflow is an AI agent that transforms AI from a reactive assistant into an autonomous contributor, optimizing your software development lifecycle. Learn more about GitLab Duo Workflow.

Best practices for PII detection

  1. Start small: Begin with a few critical PII types and expand gradually.
  2. Test thoroughly: Test your patterns against sample data to avoid false positives.
  3. Update regularly: Review and update your rulesets as new PII requirements emerge.
  4. Document patterns: Maintain documentation for your custom regex patterns.
  5. Balance precision: Make patterns specific enough to avoid false positives but flexible enough to catch variations.
  6. Implement Secret Push Protection: Prevent PII data from making it into your repository.
  7. Set up Merge Request Approval Policies: Require approval before merging any possible PII data to your repository.
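The testing and precision advice above can be turned into a small, repeatable check. The sketch below, in Python, labels a handful of made-up sample lines and reports false positives and false negatives for a pattern. The `TIGHT` regex is a hypothetical refinement that requires the word "passport" near the number. It is an assumption for illustration, not a rule from this tutorial's ruleset.

```python
import re

# The passport regex from the ruleset, evaluated without any keyword filter.
BROAD = re.compile(r"\b[A-Z]{1,2}[0-9]{6,9}\b")
# Hypothetical tighter variant: the number must follow the word "passport".
TIGHT = re.compile(r"(?i)passport\w*\W{1,3}[A-Z]{1,2}[0-9]{6,9}\b")

# (sample line, whether it really contains a passport number); made-up data.
LABELED = [
    ("passport_number: A12345678", True),
    ("passport: AB123456", True),
    ("order_id: PO12345678", False),
    ("build: V20250401", False),
]


def evaluate(pattern, labeled_samples):
    """Return (false_positives, false_negatives) for a compiled pattern."""
    false_positives, false_negatives = [], []
    for text, is_pii in labeled_samples:
        hit = pattern.search(text) is not None
        if hit and not is_pii:
            false_positives.append(text)
        elif not hit and is_pii:
            false_negatives.append(text)
    return false_positives, false_negatives


if __name__ == "__main__":
    for name, pattern in (("broad", BROAD), ("tight", TIGHT)):
        fps, fns = evaluate(pattern, LABELED)
        print(f"{name}: false positives={fps}, false negatives={fns}")
```

On these samples, the broad pattern flags the order ID and the build tag, while the tighter variant flags neither and still catches both passport lines. Keeping a labeled sample file like this next to the ruleset also doubles as documentation for each pattern.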

Once you have set up a PII data ruleset to meet your organization's needs, remote rulesets can scan for PII data across multiple repositories without the need to duplicate the rules file.

Handling Secret Detection findings

When GitLab Secret Detection identifies potential PII in your code:

  1. Review the finding: Assess whether it's a legitimate finding or a false positive.
  2. Remediate: Remove the sensitive data and replace it with environment variables or secrets management.
  3. Update history: For existing repositories, consider using tools like BFG Repo-Cleaner to remove sensitive data from history.
  4. Track progress: Use GitLab's security dashboard to monitor ongoing compliance.
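For the remediation step, here is a minimal Python sketch of replacing a hardcoded value with an environment variable. The variable name `SUPPORT_CONTACT_EMAIL` and the `get_support_email` helper are hypothetical. In a GitLab pipeline, such a value would typically be supplied as a CI/CD variable rather than stored in the repository.

```python
import os


def get_support_email():
    """Read a contact address from the environment instead of hardcoding it."""
    # SUPPORT_CONTACT_EMAIL is a hypothetical variable name for this sketch.
    value = os.environ.get("SUPPORT_CONTACT_EMAIL")
    if not value:
        # Fail loudly so a missing variable is noticed at startup.
        raise RuntimeError("SUPPORT_CONTACT_EMAIL is not set")
    return value
```

Failing fast on a missing variable avoids silently falling back to a default that someone might later be tempted to hardcode again.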

Get started today

GitLab Secret Detection, combined with custom PII rulesets, provides a powerful defense against inadvertent exposure of sensitive information. By leveraging GitLab Duo Chat to create precise regex patterns, teams can efficiently implement comprehensive PII detection across their codebase, ensuring regulatory compliance and protecting user data.

Remember that secret detection is just one component of a comprehensive security strategy. Combine it with other GitLab security features like static application security testing, dynamic application security testing, and dependency scanning for a more robust security posture.

Start implementing these practices today to better protect your users' personal information and maintain the security integrity of your applications.

Start a free, 60-day trial of GitLab Ultimate and GitLab Duo today!

