Secure Technical Documentation

Architecture

Overview

The architecture supporting the Secure features is split into two main parts.

flowchart LR
  subgraph G1[Scanning]
    Scanner
    Analyzer
    CI[CI Jobs]
  end
  subgraph G2[Processing, visualization, and management]
   Parsers
   Database
   Views
   Interactions
  end
  G1 --Report Artifact--> G2

Scanning

The scanning part is responsible for finding vulnerabilities in given resources and exporting the results. The scans are executed in CI jobs via several small projects called Analyzers, which can be found in our Analyzers sub-group. The Analyzers are small wrappers around in-house or external security tools called Scanners that integrate them into GitLab. The Analyzers are mainly written in Go and rely on our Common Go library.

Some third-party integrators also make additional Scanners available by following our integration documentation; these integrations leverage the same architecture.

The results of the scans are exported as JSON reports that must follow the Secure Report Format and are uploaded as CI Job Report Artifacts to make them available for processing after the pipeline completes.

This part is mainly covered by the following groups:

Processing, visualization, and management

Once the data is available as a Report Artifact, it can be processed by the GitLab Rails application to enable our security features:

Depending on the context, the security reports can be stored in the database or stay as Report Artifacts for on-demand access.

This part is mainly covered by the Threat Insights group.

That said, the boundaries can sometimes be blurry, so we try to delineate them as clearly as possible.

ClickHouse Datastore

Key workloads across the Secure features rely on high rates of writes and aggregated analysis across historical data. These OLAP scenarios are poorly suited to transactional datastores like PostgreSQL and instead benefit from datastores designed for batch-based inserts, high read ratios, and wide tables.

In these cases, the introduction of ClickHouse to the GitLab technology stack provides an important opportunity to scale Secure features for the future.

ClickHouse as a datastore has the potential to power several key workflows within the section including:

Security Dashboards

Security dashboards provide historical aggregate data for tracking active vulnerabilities across projects and namespaces. These requests are analytical aggregation queries of read-only data for which ClickHouse is heavily optimized.

Beyond improving the performance of the existing aggregations, an OLAP datastore opens up more flexible options by allowing on-demand aggregation by additional fields, for example report type and classification alongside severity.

Vulnerability Lists

Vulnerability lists provide tabular data and interactivity for reviewing, assessing, and triaging vulnerabilities within projects and namespaces. These requests are read-heavy, span wide tables, and are often filtered. With a shift toward query-based view aggregation, columnar stores provide a significant advantage: they can fetch only the columns needed for a given view rather than reading full records.

In addition, with ongoing architecture work aimed at deferring persistence until user interaction, such as the work to Create Vulnerabilities on-the-fly, there is significant potential for performance improvements in shifting vulnerability finding storage from PostgreSQL to ClickHouse.

Research

Brown bag sessions

Secure team members also share knowledge through brown bag sessions on various topics.


Data model for Dependencies Information

This document explores additions to the Security reports and to the database schemas that would enable new features for Dependency Scanning and License Scanning.

Secure Architecture - Feedback (Dismiss, create an issue or a Merge Request)

Once a Finding is reported for a project, users can interact with it in multiple ways. One of them is called Feedback and allows the user to:

- dismiss the finding
- create an issue from the finding
- create an MR from the finding

These features are described in detail in our user documentation. It has been called Feedback because the initial intent was to gather feedback from users about reported Findings and possibly leverage that to increase the signal-to-noise ratio.