Blog Engineering Tutorial: Secure BigQuery data publishing with GitLab
Published on: March 25, 2025
5 min read

Tutorial: Secure BigQuery data publishing with GitLab

Learn how to create repeatable, auditable, and efficient processes for automating and securing BigQuery data exports.

automation - cover

GitLab offers a powerful solution for automating and securing BigQuery data exports. This integration transforms manual exports into repeatable, auditable processes that can eliminate security vulnerabilities while saving valuable time. This tutorial explains how to implement this solution so you can quickly reduce manual operations, permission issues, and security concerns with just a few lines of GitLab YAML code.

Follow along with this step-by-step video:

The solution architecture

Our solution leverages GitLab CI/CD pipelines to automate the secure export of data from BigQuery to Google Cloud Storage. Here's the high-level architecture:

  1. SQL code is stored and version-controlled in GitLab.
  2. After code review and approval, GitLab CI/CD pipeline executes the code.
  3. The pipeline authenticates with Google Cloud.
  4. SQL queries are executed against BigQuery.
  5. Results are exported as CSV files to Google Cloud Storage.
  6. Secure links to these files are provided for authorized consumption.

Prerequisites

Before we begin, ensure you have:

  • Google Cloud APIs enabled: BigQuery API and Cloud Storage API
  • Service account with appropriate permissions:
    • BigQuery Job User
    • Storage Admin
    • Note: For this demo, we're using the service account approach for authentication, which is simpler to set up. For production environments, you might consider using GitLab's identity and access management integration with Google Cloud. This integration leverages Workload Identity Federation, which provides enhanced security and is more suitable for enterprise customers and organizations.
  • GitLab project ready to store your SQL code and pipeline configuration

Step-by-step implementation

1. Configure Google Cloud credentials.

First, set up the necessary environment variables in your GitLab project:

  • Go to your GitLab project > Settings > CI/CD.
  • Expand the Variables section.
  • Add the following variables:
    • GCS_BUCKET: Your Google Cloud Storage bucket name
    • GCP_PROJECT_ID: Your Google Cloud project ID
    • GCP_SA_KEY: Base64-encoded service account key (mark as masked)

2. Create your SQL query.

Create a file named query.sql in your GitLab repository with your BigQuery SQL query. The query looks like this:

-- This query shows a list of the daily top Google Search terms.
SELECT
   refresh_date AS Day,
   term AS Top_Term,
       -- These search terms are in the top 25 in the US each day.
   rank,
FROM `bigquery-public-data.google_trends.top_terms`
WHERE
   rank = 1
       -- Choose only the top term each day.
   AND refresh_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 2 WEEK)
       -- Filter to the last 2 weeks.
GROUP BY Day, Top_Term, rank
ORDER BY Day DESC
   -- Show the days in reverse chronological order.

This query gets the top 25 search terms from Google Trends for the current day.

3. Configure the GitLab CI/CD pipeline.

Create a .gitlab-ci.yml file in your repository root:

image: google/cloud-sdk:alpine

include:
  - template: Jobs/Secret-Detection.gitlab-ci.yml  # https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/ci/templates/Jobs/Secret-Detection.gitlab-ci.yml

execute:
  stage: deploy
  script: 
    # Set up Google Cloud authentication and install necessary components
    - export GOOGLE_CLOUD_CREDENTIALS=$(echo $SERVICE_ACCOUNT_KEY | base64 -d)
    - echo $GOOGLE_CLOUD_CREDENTIALS > service-account-key.json 
    - gcloud auth activate-service-account --key-file service-account-key.json 
    - gcloud components install gsutil
    # Set the active Google Cloud project
    - gcloud config set project $PROJECT_ID
    # Run the BigQuery query and export the results to a CSV file
    - bq query --format=csv --use_legacy_sql=false < test.sql > results.csv
    # Create a Google Cloud Storage bucket if it doesn't exist
    - gsutil ls gs://${CLOUD_STORAGE_BUCKET} || gsutil mb gs://${CLOUD_STORAGE_BUCKET}
    # Upload the CSV file to the storage bucket
    - gsutil cp results.csv gs://${CLOUD_STORAGE_BUCKET}/results.csv
    # Set the access control list (ACL) to make the CSV file publicly readable
    - gsutil acl ch -u AllUsers:R gs://${CLOUD_STORAGE_BUCKET}/results.csv
    # Define the static URL for the CSV file
    - export STATIC_URL="https://storage.googleapis.com/${CLOUD_STORAGE_BUCKET}/results.csv"
    # Display the static URL for the CSV file
    - echo "File URL = $STATIC_URL"

4. Run the pipeline.

Now, whenever changes are merged to your main branch, the pipeline will provide a link to the CSV file stored on the Google Cloud Storage bucket. This file contains the result of the executed SQL query that GitLab subjects to security checks.

Benefits of this approach

  • Security: Authentication is handled automatically via service accounts (or Workload Identity Federation for enhanced security in production environments).
  • Auditability: All data exports are tracked through GitLab commits and pipeline logs.
  • Repeatability: Consistent, predictable export process on every run, and can be scheduled.
  • Version control: SQL queries are properly versioned and reviewed.
  • Automation: Significantly fewer manual exports, reducing human error.

Try it today

By combining GitLab's DevSecOps capabilities with Google Cloud's BigQuery and Cloud Storage, you've now automated and secured your data publishing workflow. This approach reduces manual operations, resolves permission headaches, and addresses security concerns – all achieved with just a few lines of GitLab CI code.

Use this tutorial's complete code example to get started now.

We want to hear from you

Enjoyed reading this blog post or have questions or feedback? Share your thoughts by creating a new topic in the GitLab community forum. Share your feedback

Ready to get started?

See what your team could do with a unified DevSecOps Platform.

Get free trial

Find out which plan works best for your team

Learn about pricing

Learn about what GitLab can do for your team

Talk to an expert