Vulnerability Management is the recurring process of identifying, classifying, prioritizing, mitigating, and remediating vulnerabilities. This overview focuses on infrastructure vulnerabilities and the operational vulnerability management process. The process is designed to provide insight into our environments, leverage GitLab for vulnerability workflows, promote healthy patch management and other preventative best practices, and remediate risk, all with the end goal of better securing our environments.
To achieve these goals, we’ve partnered with Tenable and have deployed their software-as-a-service (SaaS) solution, Tenable.io, as our vulnerability scanner. Tenable.io allows us to focus on what is important: scanning for vulnerabilities, analyzing the results, and ingesting vulnerability data into GitLab as the starting point for our vulnerability management process.
Arguably the most important step for a successful vulnerability management process is defining the scope that the process will cover. Security and Infrastructure partnered to come up with a scope that would make sure all of our critical environments and systems were covered during deployment. The following environments are currently in scope for GitLab.com production:
|Digital Ocean||gitlab b.v.||yes||no|
For additional information on production systems in these environments, please see https://about.gitlab.com/handbook/engineering/security/security-assurance/security-compliance/sec-controls.html#what-is-considered-production.
Note: If you believe a system you are responsible for should be included in the vulnerability management process, please contact Security Operations.
With these environments scoped out and Tenable scanners deployed, we can begin the vulnerability management process. Keep in mind that vulnerability management is a feedback loop - vulnerability scanners provide the vulnerability data which is analyzed and ingested to mitigate and remediate found vulnerabilities. Feedback from this process feeds into preventative initiatives that further secure our environments.
Currently, we break down vulnerability management into the following steps:
This step is where we scan resources in our environments to identify vulnerabilities. Once set up, scans run on a regular cadence that meets or exceeds our compliance framework requirements.
Vulnerability scan data is exported and analyzed to provide consolidated vulnerability data we can ingest into GitLab.com for vulnerability remediation tracking. This is currently a manual process where we export vulnerability data into a spreadsheet and pull out pertinent information.
Tenable also provides reporting functionality that is used by our Compliance team to run reports for audits.
Currently, we export vulnerabilities as CSV files. These exports are filtered to be specific to the project/account (for example, gitlab-production in GCP gets its own report). Once exported, we analyze and consolidate the data into different views, including: unique vulnerabilities, vulnerability count, vulnerability count by asset, and vulnerability by severity. Once completed, we group vulnerabilities by solution and open a corresponding issue in the Vulnerability Management issue tracker. These issues are where all discussion and documentation for the vulnerability will occur. We also open a linked issue in the Infrastructure issue tracker, which is where the vulnerabilities get scheduled for review and remediation.
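The consolidation step can be sketched in a few lines of Python. This is a minimal illustration only, not the tooling we use: the column names (`asset`, `plugin_name`, `severity`) and the sample rows are hypothetical, and a real Tenable.io CSV export may label its columns differently.

```python
import csv
import io
from collections import Counter

# Hypothetical export contents; column names are assumptions for illustration.
SAMPLE_EXPORT = """asset,plugin_name,severity
web-01,OpenSSL outdated,High
web-02,OpenSSL outdated,High
web-01,Kernel privilege escalation,Critical
db-01,Weak SSH ciphers,Medium
"""


def consolidate(csv_text):
    """Build the summary views described above from one CSV export."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        # Each distinct vulnerability, regardless of how many assets it affects.
        "unique_vulnerabilities": sorted({r["plugin_name"] for r in rows}),
        # Total number of findings (one row per asset/vulnerability pair).
        "vulnerability_count": len(rows),
        "count_by_asset": dict(Counter(r["asset"] for r in rows)),
        "count_by_severity": dict(Counter(r["severity"] for r in rows)),
    }


views = consolidate(SAMPLE_EXPORT)
print(views["count_by_asset"])
```

In practice the same function would be run once per filtered export (for example, one call for the gitlab-production report).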
The vulnerability issue template is shown below:
Once the data is in a format from which we can pull out the most important information, we can ingest it into GitLab.com. Issues are opened in the Vulnerability Management tracker to track the remediation process of the vulnerability. Another issue is opened in the Infrastructure issue tracker linking to the Security Operations issue, so that the work can be properly prioritized and scheduled according to the Infrastructure team’s workflow.
Currently, we group vulnerabilities on a number of factors, such as severity and solution, to consolidate the work required to remediate them. If there are 10 unique vulnerabilities, but all require the same solution, it makes more sense to open a single issue to work through remediation.
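As a rough sketch of this grouping, the snippet below collapses findings that share a solution into one draft issue. The field names and sample findings are hypothetical, and labeling each group with its highest severity is an assumption made for the example rather than a documented rule.

```python
from collections import defaultdict

# Ranking used only to pick the highest severity within a group (assumption).
SEVERITY_RANK = {"Critical": 3, "High": 2, "Medium": 1}

# Hypothetical findings; field names are illustrative.
findings = [
    {"asset": "web-01", "name": "OpenSSL flaw A", "severity": "High", "solution": "Upgrade openssl"},
    {"asset": "web-02", "name": "OpenSSL flaw B", "severity": "Critical", "solution": "Upgrade openssl"},
    {"asset": "db-01", "name": "Weak SSH ciphers", "severity": "Medium", "solution": "Harden sshd_config"},
]


def group_into_issues(items):
    """Return one draft issue per distinct solution."""
    groups = defaultdict(list)
    for finding in items:
        groups[finding["solution"]].append(finding)

    issues = []
    for solution, members in groups.items():
        worst = max(members, key=lambda f: SEVERITY_RANK[f["severity"]])
        issues.append({
            "title": solution,
            "severity": worst["severity"],
            "vulnerabilities": sorted({f["name"] for f in members}),
            "assets": sorted({f["asset"] for f in members}),
        })
    return issues


issues = group_into_issues(findings)
for issue in issues:
    print(issue["title"], issue["severity"], len(issue["vulnerabilities"]))
```

Here three findings produce two draft issues, because both OpenSSL findings are resolved by the same package upgrade.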
Vulnerability issues should be tagged with the vulnerability type label. The following labels exist to track the vulnerability remediation workflow:
~vulnerability::vulnerable: This label identifies that the vulnerability has been opened, but not validated and is considered impactful to our environments per the assigned priority label. With this label a vulnerability issue should not be closed.
~vulnerability::validated: This label identifies that the vulnerability has been validated as legitimate and is scheduled for mitigation or remediation. With this label a vulnerability issue should not be closed.
~vulnerability::falsepositive: This label identifies that the vulnerability has been validated as a false positive and is no longer impactful to our environments. With this label a vulnerability issue can be closed.
~vulnerability::exception: This label identifies that the vulnerability has been validated as legitimate and has an approved exception issue to account for a business need. In extreme circumstances, a vulnerability issue can be closed with an exception.
~vulnerability::mitigated: This label identifies that the vulnerability has been validated and triaged. The impact has been reduced through compensating controls, but not remediated (it is still actively identified on vulnerability scans). With this label a vulnerability issue should not be closed.
~vulnerability::remediated: This label identifies that the vulnerability has been remediated and the remediation has been validated. With this label a vulnerability issue can be closed.
We also add the VM label to all vulnerability issues to scope the issues in the Vulnerability Management issue board.
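The closing rules attached to each label can be expressed as a small lookup. This is an illustrative sketch, not an automation we run: `can_close` is a hypothetical helper, and treating `vulnerability::exception` as not closable by default (since closure happens only in extreme circumstances) is an assumption made for the example.

```python
# Whether an issue carrying each workflow label may be closed,
# following the label descriptions above.
CLOSABLE = {
    "vulnerability::vulnerable": False,
    "vulnerability::validated": False,
    "vulnerability::falsepositive": True,
    # Closable only in extreme circumstances; kept False here so that
    # such closures remain a deliberate manual decision (assumption).
    "vulnerability::exception": False,
    "vulnerability::mitigated": False,
    "vulnerability::remediated": True,
}


def can_close(labels):
    """Return True if the issue's workflow label permits closing it."""
    workflow = [label for label in labels if label in CLOSABLE]
    if len(workflow) != 1:
        # Scoped labels are mutually exclusive, so exactly one is expected.
        raise ValueError("expected exactly one vulnerability:: workflow label")
    return CLOSABLE[workflow[0]]


print(can_close(["VM", "vulnerability::remediated"]))
print(can_close(["VM", "vulnerability::mitigated"]))
```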
Validation is an important part of vulnerability management. This is where we investigate to ensure that the vulnerability being reported has properly been identified.
Vulnerabilities can sometimes be identified during a scan, but are not actually on the system. This can happen for a number of reasons, but most commonly is the result of misflagged ports or services. These are classified as false positives and would go through the process to be closed as a false positive.
Remediation is the part of the process in which a validated vulnerability is fixed. The remediation process is tracked in the corresponding vulnerability issue in the Vulnerability Management issue tracker. SLAs are in place to help prioritize vulnerabilities based on severity. Once a vulnerability is remediated, we run follow-up scans on the impacted systems to validate that the vulnerability is indeed remediated.
For improved tracking of remediation issues, we are using GitLab Epics. The remediation epic includes monthly subepics that track remediation progress for that month. If remediation SLAs do not require a vulnerability to be remediated in a month, it will be rolled over into the following subepic until remediated or its due date passes.
There are several ways a vulnerability issue can be closed - below are some common vulnerability workflows using the vulnerability labels as reference:
The most common workflow is to close a vulnerability issue as Remediated. This means that a vulnerability has been validated and remediation has taken place. Below is the workflow:
A vulnerability must always be validated - but sometimes the validation can prove that a vulnerability is a false positive. Below is the workflow:
Sometimes issues arise that prevent a vulnerability from being remediated or mitigated. While these commonly result in an open vulnerability issue with the Exception status, there are unique cases where an issue can be closed as an exception. Below is the workflow:
Issues closed via the Exception process are very rare. Generally, an exception is a non-permanent way to assume risk on a vulnerability due to extenuating circumstances in which remediation cannot take place within the required SLAs. Below is the described workflow:
Another common workflow is when a vulnerability is validated and a fix is scheduled for some time in the future (within the SLA). If we're able to, we will put mitigation in place in the interim to reduce the risk from the vulnerability. Below is the described workflow:
The last step is for Security Operations and Infrastructure to determine what we can learn from each vulnerability remediated. This may be an improvement on the vulnerability management process itself or establishing preventive mechanisms for a repetitive vulnerability type. This feedback will be documented in the vulnerability issue and could result in additional issues being opened.
As stated above, this process is a cyclical loop. Vulnerability scans are recurring, providing new vulnerability data that feed new vulnerability issues and update/escalate open issues.
Security and Infrastructure have come up with remediation SLAs based on a multitude of factors, such as severity, scope, and impact. All of these factors are considered when mapping the priority to GitLab’s priority labels. The SLAs are as follows:
| Priority | Severity Mapping | Time to mitigate | Time to remediate |
|----------|------------------|------------------|-------------------|
| S1/P1 | Zero-day | Within 24 hours | Within 72 hours (when technically feasible) |
| S2/P2 | Critical | N/A | Within 30 days |
| S3/P3 | High | N/A | Within 60 days |
| S4/P4 | Medium | N/A | Within 90 days |
S1/P1 vulnerabilities discovered in scans would be worked on immediately through the incident management process and adhere to any timelines determined as such. This includes a 15-minute engagement, 24-hour mitigation, and 72-hour remediation SLA (from time of reporting).
Note: Mitigation SLAs only apply to S1/P1 vulnerabilities. These vulnerabilities often coincide with broad, industry-impacting zero-day vulnerabilities, and in such events the 72-hour remediation target may be impossible to meet. These exceptions will be documented and noted as they occur.
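To make the SLA table concrete, the sketch below computes mitigation and remediation deadlines from a reporting time. The function name and data structures are hypothetical, and starting the clock at the time of reporting for every priority is an assumption; the text above states it explicitly only for S1/P1.

```python
from datetime import datetime, timedelta

# Remediation windows taken from the SLA table.
REMEDIATION_SLA = {
    "S1/P1": timedelta(hours=72),
    "S2/P2": timedelta(days=30),
    "S3/P3": timedelta(days=60),
    "S4/P4": timedelta(days=90),
}

# Only S1/P1 carries a mitigation SLA.
MITIGATION_SLA = {"S1/P1": timedelta(hours=24)}


def sla_deadlines(priority, reported_at):
    """Return the mitigation and remediation deadlines for a vulnerability."""
    mitigation = MITIGATION_SLA.get(priority)
    return {
        "mitigate_by": (reported_at + mitigation) if mitigation is not None else None,
        "remediate_by": reported_at + REMEDIATION_SLA[priority],
    }


reported = datetime(2024, 1, 1, 9, 0)
print(sla_deadlines("S1/P1", reported))
print(sla_deadlines("S2/P2", reported))
```

A `None` value for `mitigate_by` corresponds to the N/A entries in the table.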
We understand that it is not always technologically feasible to keep all packages up-to-date due to application conflicts, or that a business decision may be made to not remediate a vulnerability because remediation would impact performance too greatly. Low-risk vulnerabilities that may not get prioritized within the remediation SLAs should have an exception approved for them, documenting the low likelihood of exploit due to layered security, other compensating controls, means of exploitation, etc.
With this in mind, we have an exception process. If you've identified a vulnerability that is a candidate for an exception, please open an [exception issue] in the Vulnerability Management issue tracker.
Please fill out all the pertinent information requested in the template. For reference, the information required is as follows:
You will also need to describe the business need for the exception and document any existing/implemented compensating controls.
We currently allow exception lengths based on priority/severity as follows:
After the issue is open, the requestor should set the due date to match that of the associated remediation issue and assign the issue to the proper approver. The severity and priority of the vulnerability dictate the approval process. This is documented below:
| Priority/Severity | Approver |
|-------------------|----------|
| P1/S1 | VP of Security or Infrastructure |
| P2/S2 | Director of Security, Operations |
| P3/S3 | Security Manager, Security Operations |
| P4/S4 | Security Engineer, Security Operations |
To ensure we are scanning all possible hosts in the scoped environments, we leverage Tenable connectors. These connectors run as a service account in our environment projects/accounts and pull metadata regarding all compute assets, thus populating an up-to-date view of all the assets in our environment. These imports run on a 24-hour schedule, meaning we always have a daily view of our assets across the environments.
We set up our traditional scans using subnets that encompass all of our assets. When new subnets are created in our environments (for example, VPC networks in GCP or AWS), part of the process of setting up that network is making sure that (if in scope) authenticated vulnerability scans are set up. If a connector import ever finds an asset outside of the subnets we are currently scanning, an investigation is launched to determine the validity of that host. These hosts would show up as validated, but unassessed. If it is a legitimate host, we will add the subnet the asset is in to our scanning schedule. While we prefer to set up the scans before new networks are created, this feedback loop ensures we never miss assets when scanning.
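The comparison between imported assets and scanned subnets can be illustrated with Python's standard `ipaddress` module. This is a sketch under stated assumptions: the function name, the subnet ranges, and the asset addresses are all hypothetical, and it does not use any Tenable API.

```python
import ipaddress


def unscanned_assets(asset_ips, scanned_subnets):
    """Return the asset IPs that fall outside every scanned subnet."""
    networks = [ipaddress.ip_network(subnet) for subnet in scanned_subnets]
    return [
        ip for ip in asset_ips
        if not any(ipaddress.ip_address(ip) in network for network in networks)
    ]


# Hypothetical subnets currently covered by scans.
scanned = ["10.0.0.0/24", "10.0.1.0/24"]
# Hypothetical asset addresses reported by a connector import.
assets = ["10.0.0.5", "10.0.1.20", "10.0.2.7"]

print(unscanned_assets(assets, scanned))
```

Any address returned here would be a candidate for the investigation described above, and, if legitimate, its subnet would be added to the scanning schedule.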
Vulnerability scans occur on a weekly basis in our scoped environments. The schedule can be seen below:
The start times are always consistent - however, scan durations may fluctuate based on a multitude of factors. Generally, the production scans complete in under 2 hours. We’ve segmented the gitlab-production scans to reduce impact to the environment. We’ve also enabled load throttling, so if increased load is detected on the systems/networks being scanned, Tenable will reduce its footprint to further reduce impact.
The target groups used in these scans are set up using GCP VPC network ranges to ensure any newly provisioned resources are scanned without manually entering each resource's IP into the scan. We will leverage similar functionality during our AWS and Azure deployments.
Note: for more information, please visit the #tenable-notifications channel on Slack, where there are links to documentation breaking down which hosts are in which scan group.
If you have any questions or concerns related to vulnerability management, please contact Security Operations in the #security-department channel on Slack, by tagging @sec-ops-team in Slack or @gitlab-com/gl-security/secops in a GitLab issue, or by opening an issue in the Security Operations issue tracker. All work being done to improve this process is also tracked in the issue tracker.
Any questions regarding ownership around vulnerability management can be answered in GitLab’s tech stack documentation.