
GitLab Security Releases


Resources

Issues:

Documentation:

Overview

Security releases are naturally very similar to patch releases, but on a much shorter timeline.

A number of teams and departments participate in each type of security release:

At GitLab, we have two types of releases:

We can also define two customers we are serving with security releases:

Participating teams

The Security team's involvement in the security release process is detailed in the security release docs.

Responsibilities can be summarised as:

The Development team's involvement in the security release process is detailed in the security release docs.

Responsibilities can be summarised as:

The Infrastructure department participates in the security release process through two teams: the Delivery team and the Reliability teams.

The Delivery team's responsibilities are detailed in the security release docs and relate to release management tasks. The Reliability teams are responsible for applying hot patches for the internal customer.

The Delivery team's responsibilities can be summarised as:

The Quality team's involvement in the security release process is detailed in the security release docs.

Responsibilities can be summarised as:

The Marketing team's involvement relates to sending a notice to the security mailing list.

Critical and non-critical security releases (per 2019-01)

The decision on what goes into a critical release and what goes into a non-critical release is initially made by the Security team, which triages the initial security report.

The severity and priority labels are defined in the Security team handbook.

Once the labels are applied, Security hands the issue off to the Development team, which makes its own estimate of how difficult the report is to address. The Development team can also shed new light on the issue and, together with the Security team, discuss changing the severity and priority labels.

For the highest severity/priority, the critical security release process is started. The Development team creates a hot-patch and hands it off to the Infrastructure team, which applies it to the internal customer's environments. In the same case, the Security team informs the Delivery team (responsible for releases) of the incoming highest severity/priority patch. The Delivery team then prepares a possible timeline for the critical security release.

For lower severity/priority reports, the non-critical security process is followed. It has a different release timeline and a different urgency for providing fixes. Issues triaged with lower severity/priority are scheduled using the regular development processes, and the internal customer receives the fix in a similar way to external customers. The Delivery team creates a non-critical security release on a predefined schedule.
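The triage routing described above can be sketched as a small function. This is a hypothetical, minimal illustration, not GitLab tooling. It assumes a single combined S/P level where 1 is the highest and only level 1 starts the critical process. The handover strings paraphrase the two paragraphs above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Process(Enum):
    CRITICAL = "critical"          # hot-patch first, release as soon as possible
    NON_CRITICAL = "non-critical"  # regular scheduling, predefined release date


@dataclass(frozen=True)
class SecurityReport:
    # Hypothetical model: combined severity/priority level, 1 = highest.
    sp_level: int


def route(report: SecurityReport) -> Process:
    """Choose the release process for a triaged report (assumption: only S/P 1 is critical)."""
    if report.sp_level == 1:
        return Process.CRITICAL
    return Process.NON_CRITICAL


# Handovers per process, paraphrased from the text above (illustrative only).
HANDOVERS: Dict[Process, List[str]] = {
    Process.CRITICAL: [
        "Development creates a hot-patch",
        "Infrastructure applies the hot-patch to the internal customer",
        "Security informs Delivery of the incoming patch",
        "Delivery prepares a critical release timeline",
    ],
    Process.NON_CRITICAL: [
        "Development schedules the fix through the regular process",
        "Delivery ships the fix in the next scheduled non-critical security release",
    ],
}
```

For example, `HANDOVERS[route(SecurityReport(2))]` yields the two-step non-critical path. The point is that only the critical path involves a hot-patch and an ad hoc timeline.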

Internal and external customers (per 2019-01)

Customers are defined from the point of view of GitLab Inc. The internal customer is any team that runs a GitLab instance which potentially holds important data that can affect GitLab Inc.'s business operations.

In 2019-01, the internal customer is the Infrastructure department, whose teams operate the GitLab installations for GitLab.com, dev.gitlab.org, and ops.gitlab.net.

External customers are no less important than the internal customer. The distinction exists only because of the internal customer's public exposure and the distribution channels at its disposal.

For critical security releases, hot-patches are applied to critical infrastructure as soon as they become available. When the Development team has prepared the final fix, the release is deployed and the hot-patches are removed. The same release is then shipped to external customers, who are notified through various announcement channels.

For non-critical security releases, no hot-patches are provided for the internal customer. The internal customer receives the final release artifact, which is deployed to its various environments first. External customers then receive the same release artifact.

Open questions in security releases

Given the process overview above, there are some open questions that should be addressed to make security releases more efficient.

  1. Do all listed teams need to participate in the process?
  2. What are the SLAs for releasing the fixes to the public for each of the severity/priority labels?
  3. How do we ensure that fixes do not leak to the public, given the public nature of our work?
  4. How do we ensure that severity/priority items are upgraded or downgraded when new information is received, and how do we communicate this efficiently?
  5. What is the requirement behind releases containing only security fixes?

Do all listed teams need to participate in the process?

Given the subject of this document, the Security and Development teams will always be involved at minimum. No process change or automation is going to alter that.

The other teams are up for discussion, so let's focus on what can be removed, changed, or left unchanged in their process.

Two Infrastructure department teams, Delivery and Reliability, participate in the process. The Reliability teams are currently involved only in verifying and applying hot-patches to GitLab.com environments. Through automation, the Delivery team has created a new patching tool that runs fully unattended. This means the Reliability teams could be involved only in confirming the fix, since the fix can directly affect their on-call duties.

REMOVE: Reliability teams' on-call person applying the hot-patch
CHANGE: Reliability teams' on-call person approving the hot-patch
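The proposed change can be sketched as an approval gate in front of an unattended patcher. The patcher applies the hot-patch itself, and the on-call person only confirms it. This is a hypothetical sketch: `HotPatch`, `approve`, and `apply_unattended` are illustrative names and do not describe the Delivery team's actual patching tool.

```python
from dataclasses import dataclass
from typing import Optional


class ApprovalRequired(Exception):
    """Raised when an unattended hot-patch run has no on-call approval."""


@dataclass
class HotPatch:
    # Hypothetical model of a hot-patch queued for GitLab.com environments.
    name: str
    approved_by: Optional[str] = None
    applied: bool = False


def approve(patch: HotPatch, on_call: str) -> None:
    # The Reliability on-call person's remaining task: confirm the fix.
    patch.approved_by = on_call


def apply_unattended(patch: HotPatch) -> None:
    # The tool applies the patch on its own, but refuses without approval.
    if patch.approved_by is None:
        raise ApprovalRequired(f"{patch.name} has not been approved by on-call")
    patch.applied = True
```

This keeps on-call in control of changes that affect their duties without requiring them to run the patching steps by hand.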

The Delivery team is responsible for releasing fixes to both external and internal customers. Releasing to the internal customer can be done through automation and CI/CD. For reasons described in the CI/CD blueprint, this is a longer road rather than a simple process or automation change. Releasing to external customers has an additional dimension: it requires backporting to older versions and manual tasks to ensure the backports apply cleanly.

Due to the above, the Delivery team's involvement in security releases will remain UNCHANGED.

The Quality team is responsible for providing access to backport releases for external customers and for running the automated GitLab QA suite. Thanks to work done in CY18 by the Quality team and the Distribution team, the Quality team's involvement in security releases can be automated. The Distribution team has created the gitlab-provisioner tool, which can create a full HA environment using the omnibus-gitlab package. This task is also fully automated in nightly pipelines, where a full environment is created and GitLab QA is run against it.

With some additional changes to this setup, we can automatically create an omnibus-gitlab environment and execute automated QA against it. If the Security team needs access for manual verification, some additional work can be done to allow that.

REMOVE: Quality team does not need to participate in security releases

The Marketing team is responsible for communication with the public. It is not fully clear which tasks need to be executed, but a brief investigation suggests this involves scheduling an email campaign for the security mailing list. Scheduling a tweet used to be another task, but this is difficult to confirm at this point.

If the email campaign and tweet scheduling can be automated, the Marketing team can be removed from the process completely. If automation is not possible, one of the other involved teams can take over this task. Since the Security team both starts the process and owns its final tasks, assigning this task to the Security team could remove another handover. In either case:

REMOVE: Marketing team does not participate in security releases
CHANGE: Automate the email campaign and tweet, or change the owner of the task

Conclusion: We can remove two teams, Quality and Marketing, from the process by automating their tasks or changing task ownership.

What are the SLAs for releasing the fixes to the public for each of the severity/priority labels?

The Security team has defined Severity and Priority labels for security issues.

From the Security process side, this description leaves one thing unclear: a definition of the timelines for delivering the fix and for delivering the release that contains it.

For S/P 2 and S/P 3 issues, time to remediate is defined as 60 and 90 days, respectively. This allows the Development teams to schedule S/P 2/3 items through the regular scheduling process. The fixes are released in one of the scheduled non-critical security releases, which happen once a month. As a result, S/P 2/3 issues have a similar (if not identical) time to remediate and time to release for both internal and external customers.

For S/P 1 items, things are more complicated because time to remediate and time to release are defined differently for internal and external customers.

For the internal customer, the defined time to remediate of "As soon as possible" means that, upon a report of an S/P 1 issue, the process:

At this point, the Delivery team needs to know the time to release. Multiple releases are being prepared in parallel at any point in time, for both internal and external customers. For the internal customer, the ongoing monthly release means that a number of deployments of untagged (unreleased) GitLab versions are in motion across environments. For external customers, a patch release might be in preparation. For various reasons, these pipelines might depend on or block each other.

For this reason, it is important that we have a clear SLA defined for time to release, based on customer requirements and obligations.

This defined time would be used to decide how to proceed with releases that are in progress or scheduled.

An example timeline with time to release set to 72h:

A defined time to release gives the Delivery team a tool for better defining the critical process. It also gives other participants a clearer timeline for providing critical security fixes to external customers.

Conclusion: We need to define time to release for S/P 1 items in addition to existing definitions.
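The existing windows and the proposed S/P 1 time to release can be expressed as simple deadline arithmetic. This is a sketch under stated assumptions. The 60-day and 90-day remediation windows come from the label definitions above. The 72h S/P 1 time to release is only the example value discussed above, not an agreed SLA. Measuring it from the moment of triage is also an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

# Time to remediate per S/P level, from the Security team's label definitions.
# S/P 1 is "as soon as possible", so it has no fixed window in this table.
TIME_TO_REMEDIATE = {
    2: timedelta(days=60),
    3: timedelta(days=90),
}

# Assumption: the 72h example value, not an agreed SLA.
PROPOSED_SP1_TIME_TO_RELEASE = timedelta(hours=72)


def remediation_deadline(sp_level: int, reported_at: datetime) -> Optional[datetime]:
    """Return the remediation deadline, or None when no fixed window exists."""
    window = TIME_TO_REMEDIATE.get(sp_level)
    if window is None:
        return None
    return reported_at + window


def sp1_release_deadline(triaged_at: datetime) -> datetime:
    # Assumption: the time-to-release clock starts when the S/P 1 report is triaged.
    return triaged_at + PROPOSED_SP1_TIME_TO_RELEASE


def sp1_release_on_time(triaged_at: datetime, released_at: datetime) -> bool:
    return released_at <= sp1_release_deadline(triaged_at)
```

For an S/P 2 report filed on 2019-01-01, `remediation_deadline` returns 2019-03-02. With a defined S/P 1 time to release, the Delivery team can compare `sp1_release_deadline` against the in-flight monthly and patch pipelines when deciding what to pause or reorder.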

How do we ensure that fixes do not leak to the public, given the public nature of our work?

Transparency and working in public is a workflow that all teams involved in this process are used to in their day-to-day work. Working on security releases requires a shift in workflow and more attention, which can be challenging. Below, we describe each team's workflow and how the challenge is resolved.

The Security team receives reports through various channels, most (if not all) of them private by default. For the Security team, non-public work is more of a routine, but that does not mean there are no challenges to overcome.

Once the security report is triaged, the Security team needs to interact with the Development and Delivery teams through issues in projects owned by those teams. These projects are almost exclusively public. By creating confidential issues, the Security team resolves the challenging part of this public interaction.

Development teams work in public projects day to day. Most assigned issues are public, and code review also happens in public. For them, working in private is a major shift. All processes and tooling are configured for public workflows, which makes private workflows an afterthought.

When the Development team is engaged on a security issue, it works with the Security team in a confidential issue. Merge requests and all code pushed to a public project in GitLab are always public, so code review has to be carried out differently from the regular workflow. This is resolved by moving security development to a separate instance (dev.gitlab.org), where projects are private. The security process docs for developers provide steps and some tooling to aid this workflow. However, the workflow is still error prone because it requires manual intervention by developers. Accidents periodically happen where a code leak forces a security release to go out sooner (or partially). The optimal solution would be confidential merge requests. In the absence of that feature, we could have a separate private group on GitLab.com, outside the default project paths, where security work is carried out. The upside of this workaround is that it allows the same tools and simpler automation, because no separate GitLab instance is required. A proposal and discussion for this workaround is in gitlab-ce/55648.

Like the Development teams, the Delivery team works almost exclusively in public, and all its tooling and process are public. When a security release is started, the Delivery team works from a checklist in a confidential issue. All fixes created by developers are on a private instance. The Delivery team therefore coordinates the release on GitLab.com but actually prepares it on dev.gitlab.org, where the code fixes are created. Another challenge is that all tooling, by default, executes commands that assume public releases. When a release is being prepared, forgetting to append --security to a number of commands could therefore leak the release to the public. One possible solution is to carry out all release-related work in a non-public project, so that any work interacting with the public requires an extra step.
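The "extra step" idea amounts to inverting the tooling default. Private becomes the default, and public becomes an explicit opt-in, so a forgotten flag fails safe instead of leaking. This is a hypothetical sketch: `release-tool`, `--public`, and `plan` are illustrative names, not existing Delivery tooling. The remote hostnames are taken from the text.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI; the real release tooling uses an opt-in --security flag instead.
    parser = argparse.ArgumentParser(prog="release-tool")
    parser.add_argument("version", help="version to tag, e.g. 10.8.2")
    # Inverted default: work stays private unless --public is given explicitly.
    parser.add_argument(
        "--public",
        action="store_true",
        help="publish to the public instance (explicit extra step)",
    )
    return parser


def plan(argv: Optional[List[str]] = None) -> str:
    """Describe where the tag would be created, without performing any action."""
    args = build_parser().parse_args(argv)
    remote = "gitlab.com" if args.public else "dev.gitlab.org"
    return f"tagging {args.version} on {remote}"
```

With this design, `plan(["10.8.2"])` targets dev.gitlab.org, and only `plan(["10.8.2", "--public"])` touches the public instance. An omitted flag can delay a publication but cannot leak a security release.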

To summarise:

| Team | Task | Current solution | Possible solution |
|------|------|------------------|-------------------|
| Security | Report triage | Confidential issue | Confidential issue |
| Development | Fix the vulnerability | Work on dev.gitlab.org | Confidential merge request OR work in private group on GitLab.com |
| Delivery | Create a security release | Work on dev.gitlab.org | Work in private group on GitLab.com |

How do we ensure that severity/priority items are upgraded or downgraded when new information is received, and how do we communicate this efficiently?

As previously mentioned, the Security team is responsible for triaging incoming security reports and assigning severity/priority labels. The Development team can change the severity/priority based on new facts gathered during the work. In some cases, a vulnerability's severity/priority label is lower than the impact its fix can create. Take gitlab-ce/50319 as an example to retrospect on. The issue was assigned S/P 2, and the fix was developed and released in a non-critical security release. The fix was important, but it did affect external customers' workflows, e.g. support-forum/#4151.

Conclusion: How does the non-security impact of a security fix affect the severity/priority of the security issue? Put differently, if the fix for a security vulnerability has a larger impact on users and their workflows, how do we go about changing its priority?

What is the requirement behind releases containing only security fixes?

The security release process is separate from the patch release process, and this is also reflected in the release artifact: each security release contains only security fixes.

For critical security releases, this makes a lot of sense because:

Non-critical security releases have a different sense of urgency:

As such, non-critical security releases create a lot of overhead for both the Delivery team and the internal and external customers. To illustrate the current workload, consider this table from the 10.8 release cycle:

| Version | Type | Teams | Urgency | Visibility | User perceived impact |
|---------|------|-------|---------|------------|-----------------------|
| 10.8.0 | Monthly release | Development, Product, Delivery | High | Public | High |
| 10.8.1 | Bug patch release | Development, Delivery | Normal | Public | Medium |
| 10.8.2 | Non-critical security release | Security, Development, Delivery | Normal | Private | Medium |
| 10.8.3 | Bug patch release | Development, Delivery | Low | Public | Low |
| 10.8.4 | Bug patch release | Development, Delivery | Low | Public | Low |
| 10.8.5 | Non-critical security release | Security, Development, Delivery | Low | Private | Low |

The table above shows a total of 6 releases in the 10.8 release cycle. The monthly release comes first and is also the most urgent, because it is a marketing event for the company. It also involves a number of release candidates created in the run-up to the public release, which makes it highly complex. All of its work is carried out in public, and all other work is secondary to it, which makes it a highly urgent job. From the user's perspective, the monthly release is where we deliver new features, so its impact on all our users is high.

The other releases in the table have varying impact on the participants in the process, and combined they can affect participants in different ways.

For example, the 10.8.1 bug fix release affected both internal and external customers, so some users were expecting the fixes as soon as possible. This in turn delayed the release of security fixes that were already prepared and only waiting for a convenient time to be released.

The requirement to have separate releases for bug fixes and security fixes actually affected both releases. The release manager had to prepare two separate releases in parallel but execute them sequentially. Internal and external customers had to upgrade GitLab twice to receive fixes that were similar in urgency and impact.

If non-critical security fixes could have been released at the same time as bug fixes, the same process would have looked a bit different:

| Version | Type | Teams | Urgency | Visibility | User perceived impact |
|---------|------|-------|---------|------------|-----------------------|
| 10.8.0 | Monthly release | Development, Product, Delivery | High | Public | High |
| 10.8.1 | Patch release | Security, Development, Delivery | Normal | Private | Medium |
| 10.8.2 | Bug patch release | Development, Delivery | Low | Public | Low |
| 10.8.3 | Patch release | Security, Development, Delivery | Low | Private | Low |

If you observe the table above, you will see three significant changes:

From the security perspective, the process does not change. Security fixes are still kept private until the announcement, so they pose no additional threat to any customer. From the release perspective, there is less pressure to create separate releases that have the same impact.
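The combined model can be sketched as a grouping step. Bug fixes and non-critical security fixes that are ready at the same time ship in one patch release, so users upgrade once. This is a hypothetical sketch: the `Fix` type is illustrative, and keeping critical security fixes in a dedicated release is an assumption carried over from the critical process described earlier, not a defined policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fix:
    # Hypothetical model of a fix waiting to be released.
    title: str
    kind: str              # "bug" or "security"
    critical: bool = False


def is_critical_security(fix: Fix) -> bool:
    return fix.kind == "security" and fix.critical


def plan_releases(pending: List[Fix]) -> List[List[Fix]]:
    """Group pending fixes into releases under the combined patch model.

    Critical security fixes keep their own release (assumption); bug fixes
    and non-critical security fixes share a single patch release.
    """
    critical = [f for f in pending if is_critical_security(f)]
    combined = [f for f in pending if not is_critical_security(f)]
    releases: List[List[Fix]] = []
    if critical:
        releases.append(critical)
    if combined:
        releases.append(combined)
    return releases
```

Any combined release that contains a security fix would still be prepared privately, which matches the Private visibility of the patch releases in the table above.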

Conclusion

Given the number of participants in security releases, and the impact these releases have on participants and consumers, we should work towards agreeing on and committing to implement the following: