GitLab offers a powerful solution for automating and securing BigQuery data exports. This integration turns manual exports into repeatable, auditable processes, reducing security risk while saving valuable time. This tutorial explains how to implement the solution so you can cut down on manual operations, permission issues, and security concerns with just a few lines of GitLab YAML code.
The solution architecture
Our solution leverages GitLab CI/CD pipelines to automate the secure export of data from BigQuery to Google Cloud Storage. Here's the high-level architecture:
- SQL code is stored and version-controlled in GitLab.
- After code review and approval, a GitLab CI/CD pipeline executes the code.
- The pipeline authenticates with Google Cloud.
- SQL queries are executed against BigQuery.
- Results are exported as CSV files to Google Cloud Storage.
- Secure links to these files are provided for authorized consumption.
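At its core, the flow boils down to two Cloud SDK commands, sketched below with placeholder file and bucket names; the full pipeline later in this tutorial wraps them with authentication and project setup:

# Core of the flow: run the query, then publish the results (placeholder names).
bq query --use_legacy_sql=false --format=csv < query.sql > results.csv
gsutil cp results.csv gs://my-gcs-bucket/results.csv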
Prerequisites
Before we begin, ensure you have:
- Google Cloud APIs enabled: BigQuery API and Cloud Storage API
- Service account with appropriate permissions (example setup commands follow this list):
  - BigQuery Job User
  - Storage Admin
- Note: For this demo, we're using the service account approach for authentication, which is simpler to set up. For production environments, you might consider using GitLab's identity and access management integration with Google Cloud. This integration leverages Workload Identity Federation, which provides enhanced security and is more suitable for enterprise customers and organizations.
- GitLab project ready to store your SQL code and pipeline configuration
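If you still need to create the service account, the following gcloud sketch shows one way to do it. The project ID, account name, and key file name are placeholders for this tutorial, so substitute your own:

# Enable the required APIs (placeholder project ID).
gcloud services enable bigquery.googleapis.com storage.googleapis.com --project=my-gcp-project
# Create a dedicated service account for the pipeline.
gcloud iam service-accounts create bq-exporter --project=my-gcp-project
# Grant the BigQuery Job User and Storage Admin roles.
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:bq-exporter@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:bq-exporter@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/storage.admin"
# Download a JSON key for the account (used in the next step).
gcloud iam service-accounts keys create service-account-key.json \
  --iam-account=bq-exporter@my-gcp-project.iam.gserviceaccount.com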
Step-by-step implementation
1. Configure Google Cloud credentials.
First, set up the necessary environment variables in your GitLab project:
- Go to your GitLab project > Settings > CI/CD.
- Expand the Variables section.
- Add the following variables:
- GCS_BUCKET: Your Google Cloud Storage bucket name
- GCP_PROJECT_ID: Your Google Cloud project ID
- GCP_SA_KEY: Base64-encoded service account key (mark as masked)
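To produce the value for GCP_SA_KEY, you can base64-encode the JSON key file, assuming it is named service-account-key.json as in the earlier sketch:

# Linux (GNU coreutils): -w 0 keeps the output on a single line.
base64 -w 0 service-account-key.json
# macOS:
base64 -i service-account-key.json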
2. Create your SQL query.
Create a file named query.sql in your GitLab repository with your BigQuery SQL query. The query looks like this:
-- This query shows a list of the daily top Google Search terms.
SELECT
  refresh_date AS Day,
  term AS Top_Term,
    -- These search terms are in the top 25 in the US each day.
  rank,
FROM `bigquery-public-data.google_trends.top_terms`
WHERE
  rank = 1
    -- Choose only the top term each day.
  AND refresh_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 2 WEEK)
    -- Filter to the last 2 weeks.
GROUP BY Day, Top_Term, rank
ORDER BY Day DESC
  -- Show the days in reverse chronological order.
This query returns the top Google Trends search term for each day over the last two weeks.
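Before committing, you can sanity-check the query locally with the bq CLI (part of the Google Cloud SDK); the project ID below is a placeholder:

# Run the query once from your workstation and preview the CSV output.
bq query --project_id=my-gcp-project --use_legacy_sql=false --format=csv < query.sql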
3. Configure the GitLab CI/CD pipeline.
Create a .gitlab-ci.yml file in your repository root:
image: google/cloud-sdk:alpine
include:
- template: Jobs/Secret-Detection.gitlab-ci.yml # https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/ci/templates/Jobs/Secret-Detection.gitlab-ci.yml
execute:
  stage: deploy
  script:
    # Set up Google Cloud authentication and install necessary components
    - export GOOGLE_CLOUD_CREDENTIALS=$(echo $GCP_SA_KEY | base64 -d)
    - echo "$GOOGLE_CLOUD_CREDENTIALS" > service-account-key.json
    - gcloud auth activate-service-account --key-file service-account-key.json
    - gcloud components install gsutil
    # Set the active Google Cloud project
    - gcloud config set project $GCP_PROJECT_ID
    # Run the BigQuery query and export the results to a CSV file
    - bq query --format=csv --use_legacy_sql=false < query.sql > results.csv
    # Create a Google Cloud Storage bucket if it doesn't exist
    - gsutil ls gs://${GCS_BUCKET} || gsutil mb gs://${GCS_BUCKET}
    # Upload the CSV file to the storage bucket
    - gsutil cp results.csv gs://${GCS_BUCKET}/results.csv
    # Set the access control list (ACL) to make the CSV file publicly readable
    - gsutil acl ch -u AllUsers:R gs://${GCS_BUCKET}/results.csv
    # Define the static URL for the CSV file
    - export STATIC_URL="https://storage.googleapis.com/${GCS_BUCKET}/results.csv"
    # Display the static URL for the CSV file
    - echo "File URL = $STATIC_URL"
4. Run the pipeline.
Now, whenever changes are merged to your main branch, the pipeline runs: the included Secret Detection job scans the repository, the SQL query is executed against BigQuery, and the job log prints a link to the resulting CSV file in your Google Cloud Storage bucket.
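Because the job makes the object publicly readable, anyone with the link can fetch the latest results. With a placeholder bucket name, that looks like:

# Download the published CSV via the static URL printed by the pipeline.
curl -o results.csv "https://storage.googleapis.com/my-gcs-bucket/results.csv"
# Or copy it with gsutil if you prefer authenticated access.
gsutil cp gs://my-gcs-bucket/results.csv .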
Benefits of this approach
- Security: Authentication is handled automatically via service accounts (or Workload Identity Federation for enhanced security in production environments).
- Auditability: All data exports are tracked through GitLab commits and pipeline logs.
- Repeatability: A consistent, predictable export process on every run, which can also be triggered on a schedule.
- Version control: SQL queries are properly versioned and reviewed.
- Automation: Significantly fewer manual exports, reducing human error.
Try it today
By combining GitLab's DevSecOps capabilities with Google Cloud's BigQuery and Cloud Storage, you've now automated and secured your data publishing workflow. This approach reduces manual operations, resolves permission headaches, and addresses security concerns – all achieved with just a few lines of GitLab CI code.
Use this tutorial's complete code example to get started now.