The Composition Analysis group at GitLab is charged with developing solutions that perform Container Scanning and Software Composition Analysis. See the exhaustive list of projects the group maintains.
The Composition Analysis group largely follows GitLab's Product Development Flow.
We leverage the issue's health status feature to communicate the progress of the issue.
All issues should be marked On Track at the beginning of a milestone. This is currently done manually by the Engineering Manager.
Raising risk early is important. The more time we have, the more options we have. As such, the team reviews issues every week and discusses items that Need Attention or are At Risk to possibly course correct and re-assign resources based on the team's priorities.
Follow these steps when raising or downgrading risk:
- On Track - the work will be completed within the planned milestone.
- Needs Attention - the issue is blocked or has other factors that need to be discussed.
- At Risk - the issue is in jeopardy of missing the cutoff to ship within the planned milestone.

In addition to the above workflow, the Composition Analysis group can be involved in some experiments, which might temporarily alter how we work.
There are no experiments in progress at the moment.
After the 19th, we conduct an asynchronous retrospective. You can find current and past retrospectives for the Composition Analysis team at https://gitlab.com/gl-retrospectives/secure-sub-dept/composition-analysis.
On top of our development roadmap, engineering teams need to respond to additional requests (support, community contributions, security vulnerabilities). This increases context switching and, depending on the load, can drastically impact our ability to achieve our plan. As a result, the Composition Analysis development team actively reserves capacity each iteration to triage and address these requests. Each milestone, an engineer is designated to handle each of these responsibilities, following the schedule defined in this epic. The rotation follows the development cycle, which means it runs from the 18th of the current month to the 17th of the following month. The total time allocation represents around 10% of the engineering team.
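As a rough illustration of the rotation dates described above, the sketch below computes the rotation window that contains a given day. The function name `rotation_window` is hypothetical and not part of any GitLab tooling; the only facts taken from this page are the 18th and 17th boundaries.

```python
from datetime import date

# From the text above: a rotation runs from the 18th of one month
# to the 17th of the following month.
ROTATION_START_DAY = 18


def rotation_window(today: date) -> tuple[date, date]:
    """Return the (start, end) dates of the rotation that contains `today`."""
    if today.day >= ROTATION_START_DAY:
        start = date(today.year, today.month, ROTATION_START_DAY)
    else:
        # Before the 18th we are still in the rotation that began last month.
        if today.month == 1:
            start = date(today.year - 1, 12, ROTATION_START_DAY)
        else:
            start = date(today.year, today.month - 1, ROTATION_START_DAY)

    # The rotation ends on the 17th of the month after it started.
    if start.month == 12:
        end = date(start.year + 1, 1, ROTATION_START_DAY - 1)
    else:
        end = date(start.year, start.month + 1, ROTATION_START_DAY - 1)
    return start, end


if __name__ == "__main__":
    start, end = rotation_window(date.today())
    print(f"Current rotation: {start.isoformat()} to {end.isoformat()}")
```

The schedule in the epic remains the source of truth for who is on rotation; this only shows how the date boundaries fall.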
Time allocation: 15% of 1 engineer.
Time allocation: 15% of 1 engineer.
Time allocation: 15% of 1 engineer.
~priority::1. (See Bugs triaging process)
~priority::1. (See Infradev triaging process)

These items must be triaged continuously throughout the milestone, which means they must be checked multiple times a week.
We are responsible for triaging vulnerabilities reported on 2 sets of projects: our analyzers and their upstream scanner software. For the latter, we've set up mirrors to run our security scans.
HIGH and CRITICAL vulnerabilities that are no longer detected, and resolve those that have been confirmed as fixed.
No longer detected activity filter.
- Upstream scanner vulnerabilities that are no longer detected
No longer detected activity filter.

All vulnerabilities must be remediated by their respective SLA. This process only describes how to generate the FedRAMP vulnerability issues required for the report.
We prioritize findings by their CVSS severities and SLAs, and currently focus on security findings with these severity levels:
An exception is made for Container scanning findings - we focus only on findings with Critical severity.
Please use all the time you have set aside. If you complete all the findings at Critical and High, please continue to triage - we want to address all findings, but we are working in a risk-based order.
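The prioritization above can be sketched as a small filter and sort. This is only an illustration: the general focus list (Critical and High) is inferred from the surrounding text, and the field names `severity` and `report_type`, the `container_scanning` value, and both function names are assumptions rather than a description of any existing tool.

```python
# Assumption: the general focus levels are Critical and High, as implied
# by the surrounding text. Container Scanning findings are Critical only.
FOCUS_SEVERITIES = {"critical", "high"}
CONTAINER_SCANNING_FOCUS = {"critical"}

# Severity levels ordered from most to least urgent.
SEVERITY_ORDER = ["critical", "high", "medium", "low", "info", "unknown"]


def in_focus(severity: str, report_type: str) -> bool:
    """Return True if a finding falls within the current triage focus."""
    if report_type == "container_scanning":
        allowed = CONTAINER_SCANNING_FOCUS
    else:
        allowed = FOCUS_SEVERITIES
    return severity.lower() in allowed


def triage_order(findings: list[dict]) -> list[dict]:
    """Sort findings so in-focus items come first, then by severity."""
    return sorted(
        findings,
        key=lambda f: (
            not in_focus(f["severity"], f["report_type"]),
            SEVERITY_ORDER.index(f["severity"].lower()),
        ),
    )


if __name__ == "__main__":
    sample = [
        {"id": "A", "severity": "Medium", "report_type": "dependency_scanning"},
        {"id": "B", "severity": "High", "report_type": "container_scanning"},
        {"id": "C", "severity": "High", "report_type": "dependency_scanning"},
        {"id": "D", "severity": "Critical", "report_type": "container_scanning"},
    ]
    for finding in triage_order(sample):
        print(finding["id"], finding["severity"], finding["report_type"])
```

Out-of-focus findings are kept at the end of the list rather than dropped, since we still want to triage them once the higher-risk items are done.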
We use the Vulnerability Report with filters to focus on items matching our policy and reported on the relevant projects.
For each "Detected" item, investigate and either dismiss or confirm it. If it's not clear whether there's indeed a threat, escalate to our Application Security team.
For vulnerabilities discovered in upstream scanners, an issue must be created in GitLab's issue tracker, and we should work with the relevant Open Source community to help provide a resolution. As a last resort, we can patch locally or fork the upstream project temporarily to fix the vulnerability sooner.
When there is no doubt a vulnerability is a false positive, it can be "Dismissed". Select the "Dismiss" option from the vulnerability status options. Finally, make sure to comment on the vulnerability status change notification to explain why.
Because CVSS severity is set generically and automated scanners do not have the full context of an application, many findings that may be high risk in other environments or scenarios are low risk for our users. The containers ingest code from a user project and that user has developer access, and the containers are ephemeral and tied to a specific pipeline.
In some other cases, a finding is related to an upstream dependency or Operating System and there is no fix available and no fix planned. Please be sure to mark these issues using the labels blocked or blocked upstream.
When an issue is both blocked for a few releases and low risk, you may dismiss the finding with a note explaining the reasoning. If there is an open issue, notify the Application Security team with your specific reasoning and close the issue (if applicable). In the future we will want to tag everything related to these findings as won't fix or blocked when they are being closed; for now that is only available on issues and not on findings.
The following class of container scan vulnerabilities can be considered low risk:
To add items to the list above, discuss repeatable finding patterns with Application Security, get approval from a leader in the security section, and add them to this list.
If the vulnerability impacts a dependency:
For all other confirmed vulnerabilities, create a security issue to discuss and track the remediation.
When a vulnerability has been remediated, it can be "Resolved". When doing so, comment how it was remediated, then select the "Resolve" option from the vulnerability status options, and close the related vulnerability issue.
Unfortunately, a security issue can't yet be created via the "create issue" button on the vulnerability page or security dashboard: the button only works when creating an issue in the same project where the error was reported, and we've disabled the embedded issue tracker in our projects.
Instead, in our workflow we open all our issues in the main GitLab project.
As a workaround, you can copy and paste the content of the vulnerability page (this keeps markdown formatting!). Please also follow our Security guidelines about creating new security issues.
You can leverage quick actions to add the necessary labels.
/confidential
/label ~security ~"type::bug" ~"bug::vulnerability"
/label ~"section::sec" ~"devops::secure" ~"group::composition analysis"
<!-- depending on the affected project: -->
/label ~"Category:Software Composition Analysis"
/label ~"Category:Container Scanning"
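If you create these issues often, the quick actions above can be assembled programmatically. The sketch below is a hypothetical helper, not an existing GitLab tool: the function name and the dictionary keys are assumptions, while the quick-action lines themselves are copied from the block above.

```python
# Category labels copied from the quick actions above. The dictionary keys
# are hypothetical identifiers chosen for this example.
CATEGORY_LABELS = {
    "software_composition_analysis": "Category:Software Composition Analysis",
    "container_scanning": "Category:Container Scanning",
}


def security_issue_quick_actions(category: str) -> str:
    """Build the quick-action block to paste into a new security issue."""
    if category not in CATEGORY_LABELS:
        raise ValueError(f"unknown category: {category!r}")
    lines = [
        "/confidential",
        '/label ~security ~"type::bug" ~"bug::vulnerability"',
        '/label ~"section::sec" ~"devops::secure" ~"group::composition analysis"',
        # Only the label for the affected project's category is included.
        f'/label ~"{CATEGORY_LABELS[category]}"',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(security_issue_quick_actions("container_scanning"))
```

Only one category label is emitted, which matches the "depending on the affected project" comment in the quick actions.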
It's important to add the ~security and ~"bug::vulnerability" labels as described above, because the AppSec Escalation Engine will automatically pick up any issues with these labels and add additional labels ~security-sp-label-missing and ~security-triage-appsec as well as mention the issue in the #sec-appsec Slack channel. At this point, the Stable Counterpart or Application Security team triage person will pick up the issue and assign a severity as part of the appsec triage rotation.
Once the issue is created, please add it to the vulnerability's linked items for ease of tracking.
Developers reporting the security issue should help the Application Security team assess the impact of the vulnerability, and update the issue description with an Impact section.
If immediate feedback is required, then add a comment to the vulnerability issue with an @-mention directed at one of the Security Engineers listed in the Stable Counterpart section, or ping them on Slack.
For each open issue that has no Priority label ("Open" column), briefly investigate the bug (under 2 hours) and comment with your findings. Ideally you'd suggest Priority and Severity levels to guide the PM's decision. Depending on how confident you are, you can either set the labels yourself, or make a suggestion in a comment and ping the PM.
Track how long you actually spent investigating each bug in the Composition Analysis Bug Triaging Time Tracker spreadsheet.
Please refer to our infradev process for more details.
If the image release process is failing, an incident should be created to track how it was detected, escalated, and resolved. Documenting our incidents makes it possible to search for previous incidents by keyword, labels, and other issue filters. We open all of our incidents in the main GitLab project.
Open a new incident and add a description of the problem along with any reproduction steps. Add the following labels so that, in the future, we can track the incidents that have impacted Composition Analysis.
<!--
Select one of the following severities
Ref: https://about.gitlab.com/handbook/engineering/quality/issue-triage/#severity
-->
/label ~"severity::1"
/severity S1
/label ~"severity::2"
/severity S2
/label ~"severity::3"
/severity S3
/label ~"severity::4"
/severity S4
<!--
Select one of the following priorities
Ref: https://about.gitlab.com/handbook/engineering/quality/issue-triage/#priority
-->
/label ~"priority::1"
/label ~"priority::2"
/label ~"priority::3"
/label ~"priority::4"
/label ~"section::sec" ~"devops::secure" ~"group::composition analysis" ~"type::bug" ~"bug::availability"
<!--
Select one of the following categories
-->
/label ~"Category:Dependency Scanning"
/label ~"Category:Container Scanning"
/label ~"Category:License Compliance"
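The template above asks you to pick exactly one severity, one priority, and one category. A minimal sketch of that selection is shown below, assuming a hypothetical helper named `incident_quick_actions`; the label and quick-action strings are taken from the template, everything else is illustrative.

```python
# Categories listed in the incident template above.
INCIDENT_CATEGORIES = {
    "Dependency Scanning",
    "Container Scanning",
    "License Compliance",
}


def incident_quick_actions(severity: int, priority: int, category: str) -> str:
    """Build the quick actions for an incident with one severity, priority, and category."""
    if severity not in range(1, 5):
        raise ValueError("severity must be between 1 and 4")
    if priority not in range(1, 5):
        raise ValueError("priority must be between 1 and 4")
    if category not in INCIDENT_CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    lines = [
        # The template pairs each severity label with the matching /severity action.
        f'/label ~"severity::{severity}"',
        f"/severity S{severity}",
        f'/label ~"priority::{priority}"',
        '/label ~"section::sec" ~"devops::secure" ~"group::composition analysis" ~"type::bug" ~"bug::availability"',
        f'/label ~"Category:{category}"',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(incident_quick_actions(severity=2, priority=1, category="Container Scanning"))
```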
To help our Product Manager prioritize maintenance issues, the engineering team assigns them a priority label.
~maintenance::refactor).

The Composition Analysis group maintains several projects to provide our scanning capabilities.
Additional notes:
As some of our analyzers rely on open source software, we include them in our security testing to increase coverage and reduce risk.
To do so, we mirror their repository and execute our security scans on them:
The vulnerabilities reported on the currently used version of the scanner are automatically reported in the group level Vulnerability Report and triaged as part of our security vulnerabilities triaging process.
SCANNER_VERSION variable in the analyzer's Dockerfile). Use exact commit if there is no git tag for the corresponding release we use.
VERSION-security-checks where VERSION is the version of the upstream scanner we currently use (e.g. v6.12.0).
.gitlab-ci.yml configuration file to configure all compatible security scans.
VERSION-security-checks the default branch, so that reported vulnerabilities are showing up on the dashboards and vulnerability reports.

We check for new releases of the upstream scanners on a monthly basis, as part of our release issue. When an update is available, a new issue is created using the update scanner issue template and added to the next milestone.
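The VERSION-security-checks naming convention used on the mirrors can be derived from the analyzer's Dockerfile. The sketch below is an illustration only: it assumes the Dockerfile declares the version as `ARG SCANNER_VERSION=...` or `ENV SCANNER_VERSION ...`, which may not match every analyzer, and both function names are hypothetical.

```python
import re
from typing import Optional

# Assumption: the version is declared on a single ARG or ENV line.
# Real analyzer Dockerfiles may declare SCANNER_VERSION differently.
_SCANNER_VERSION_RE = re.compile(
    r"^[ \t]*(?:ARG|ENV)[ \t]+SCANNER_VERSION(?:=|[ \t]+)(\S+)",
    re.MULTILINE,
)


def scanner_version(dockerfile_text: str) -> Optional[str]:
    """Return the SCANNER_VERSION value found in a Dockerfile, or None."""
    match = _SCANNER_VERSION_RE.search(dockerfile_text)
    if match is None:
        return None
    # Drop optional surrounding quotes around the value.
    return match.group(1).strip("\"'")


def security_checks_branch(version: str) -> str:
    """Return the mirror branch name for a given upstream scanner version."""
    return f"{version}-security-checks"


if __name__ == "__main__":
    dockerfile = "FROM alpine\nARG SCANNER_VERSION=v6.12.0\n"
    version = scanner_version(dockerfile)
    if version is not None:
        print(security_checks_branch(version))
```

The version string is passed through unchanged, so whether it carries a leading `v` depends on how the analyzer declares it.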
Every analyzer relying on an upstream scanner has a "How to update the upstream Scanner" section in their readme detailing the process. This includes a verification for possible new security vulnerabilities and a license check which are detailed below.
Before releasing an analyzer with a newer version of its upstream scanner, we must ensure it is free of security vulnerabilities matching our current policy.
NEW_VERSION-security-checks.
.gitlab-ci.yml configuration file from the current VERSION-security-checks branch.
NEW_VERSION-security-checks and proceed with the update of the analyzer to use this newer version.

Before releasing an analyzer with a newer version of its upstream scanner, we must ensure its license has not changed or is still compatible with our policy.
We also track our backlog of issues, including past due security and infradev issues, and total open System Usability Scale (SUS) impacting issues and bugs.
MR Type labels help us report what we're working on to industry analysts in a way that's consistent across the engineering department. The dashboard below shows the trend of MR Types over time and a list of merged MRs.
Flaky tests are problematic for many reasons.
We are currently reviewing our error budget and identifying where we spend most of it. See this issue.
As part of FY21-Q4 OKRs, we've started tracking and monitoring the Largest Contentful Paint for our web pages. The results can be viewed on this Grafana dashboard.