Zero Trust at GitLab: The data classification and infrastructure challenge

Aug 21, 2019
Mark Loveless

Update: This is part 3 of an ongoing Zero Trust series. See our next post: Zero Trust at GitLab: Mitigating challenges with data zones and authentication scoring.

Zero Trust is the practice of shifting access control from the perimeter of the org to the individuals, the assets, and the endpoints. For GitLab, Zero Trust means that all devices trying to access an endpoint or asset within our GitLab environment will need to authenticate and be authorized. This is part three of a multi-part series.

Check out these other posts to get up to speed:

One of the main objectives of the Security team at GitLab is to protect data, whether it is customer data or employee data. Rather than viewing Zero Trust Networking (ZTN) simply as an authentication solution, we also see it as a way to strengthen our data protection. This poses specific challenges both for the data itself and for the infrastructure it resides on.

Dealing with data classification

We’ve established a data classification policy at GitLab so that we understand which protections each kind of data requires. The policy’s emphasis is on mapping access controls to data, so that data at each level of sensitivity can be protected appropriately. To aid understanding and allow quicker identification, the four data classification levels are mapped to colors: RED, ORANGE, YELLOW, and GREEN, with RED being the most sensitive data and GREEN being public data.
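As a rough illustration of mapping access controls to classification levels, the Python sketch below models the four colors as an ordered enum and attaches a set of minimum controls to each level. The specific controls (`mfa`, `managed_device`, `explicit_grant`) and which level requires which are hypothetical assumptions chosen for illustration, not GitLab’s actual policy.

```python
from dataclasses import dataclass, fields
from enum import IntEnum


class DataClass(IntEnum):
    """The four classification levels; a higher value means more sensitive."""
    GREEN = 0   # public data
    YELLOW = 1
    ORANGE = 2
    RED = 3     # most sensitive data


@dataclass(frozen=True)
class Controls:
    """Hypothetical controls a request may present, on top of basic
    authentication, which every request is assumed to have passed already."""
    mfa: bool = False
    managed_device: bool = False
    explicit_grant: bool = False


# Hypothetical minimum controls per level; illustrative only, not GitLab's policy.
REQUIREMENTS = {
    DataClass.GREEN: Controls(),
    DataClass.YELLOW: Controls(mfa=True),
    DataClass.ORANGE: Controls(mfa=True, managed_device=True),
    DataClass.RED: Controls(mfa=True, managed_device=True, explicit_grant=True),
}


def satisfies(level: DataClass, presented: Controls) -> bool:
    """Return True if every control required for `level` is present in `presented`."""
    required = REQUIREMENTS[level]
    return all(
        getattr(presented, f.name) or not getattr(required, f.name)
        for f in fields(Controls)
    )


if __name__ == "__main__":
    laptop = Controls(mfa=True, managed_device=True)
    print(satisfies(DataClass.ORANGE, laptop))  # True
    print(satisfies(DataClass.RED, laptop))     # False: no explicit grant
```

Using an `IntEnum` keeps the levels comparable (`DataClass.RED > DataClass.GREEN`), which is convenient whenever a rule needs to say "this level or above."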

This data classification is a huge step in the right direction when it comes to handling ZTN. That being said, there are a few areas of data classification where we anticipate challenges with regard to ZTN:

As you can see, on the surface there seems to be no problem with securing our data with the assistance of ZTN. But once you start to explore the "edge cases," you begin to realize that they are not actually edge cases at all, but working examples of how we interact with our data. In most cases this will not be a problem, as we have granular control over our data, but with ZTN we need to account for the changing state of our data. Above all, we want to avoid making an authentication decision based on a particular classification of data on a system when the classification of that data is known to change over time.
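To make that pitfall concrete, here is a minimal Python sketch contrasting two ways of deciding access. `authorize_by_system` relies on a label assigned to a whole system once, at setup time, while `authorize` looks up a resource’s classification at the moment of each request. The registry, the resource and system IDs, and the idea of a per-user `clearance` level are hypothetical, introduced only for illustration.

```python
from enum import IntEnum


class DataClass(IntEnum):
    GREEN = 0
    YELLOW = 1
    ORANGE = 2
    RED = 3


# Anti-pattern: one static label per system, captured when the system was onboarded.
SYSTEM_LABELS = {"issue-tracker": DataClass.GREEN}


def authorize_by_system(system: str, clearance: DataClass) -> bool:
    """Decide using the system's static label, ignoring later reclassification."""
    return clearance >= SYSTEM_LABELS[system]


class ClassificationRegistry:
    """Hypothetical source of truth for each resource's *current* classification."""

    def __init__(self) -> None:
        self._levels: dict[str, DataClass] = {}

    def set_level(self, resource_id: str, level: DataClass) -> None:
        self._levels[resource_id] = level

    def current_level(self, resource_id: str) -> DataClass:
        # Resources with no recorded classification default to the most restrictive level.
        return self._levels.get(resource_id, DataClass.RED)


def authorize(registry: ClassificationRegistry, resource_id: str,
              clearance: DataClass) -> bool:
    """Decide per request, using the classification as it stands right now."""
    return clearance >= registry.current_level(resource_id)


if __name__ == "__main__":
    registry = ClassificationRegistry()
    registry.set_level("issue-42", DataClass.GREEN)
    print(authorize(registry, "issue-42", DataClass.YELLOW))       # True
    registry.set_level("issue-42", DataClass.RED)  # reclassified after sensitive data was added
    print(authorize(registry, "issue-42", DataClass.YELLOW))       # False
    print(authorize_by_system("issue-tracker", DataClass.YELLOW))  # True: stale label
```

Defaulting unknown resources to RED is one conservative design choice: the decision fails closed when classification metadata is missing.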

Because granular data access is typically controlled at the system level, it might seem that we should be just fine. A closer look at our infrastructure may indicate otherwise, so a more detailed examination is required.

The infrastructure

First, the infrastructure needs to be defined, including at least a rough picture of where the data resides and how it is accessed. For the systems we directly manage and control down to the lowest level, we have a good grasp of what we have to work with and what controls are available to regulate access to the data they contain. However, a sizable part of our infrastructure resides on systems we do not fully control.

In the modern cloud age, software as a service (SaaS) applications have become an important part of everyday business operations. Instead of a company maintaining servers in its own server room, a vendor hosts the application in the cloud and makes it accessible over the internet. Each customer company has its own private set of data maintained by the SaaS provider, and different price tiers may offer different features for manipulating and controlling that data. Examples include Expensify for handling expenses, BambooHR for handling HR functions, and so on. GitLab is no exception. Deploying a new application is often as easy as setting up accounts, and while we’re working to unify our authentication process under Okta, that effort is not yet fully deployed.

Because we are an all-remote company, our infrastructure is all-remote too. We do the bulk of our company activity inside GitLab.com itself, but we also use offerings from roughly two dozen SaaS companies. These include the usual suspects such as Slack and Zoom, as well as the previously mentioned Expensify and BambooHR, plus Zendesk and many others.

Simply put, our infrastructure poses some unique challenges:

Fortunately, we can leverage a number of compliance efforts within the company to gain insight into what level of control we can impose on each system.
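One way to put that insight to work is a simple inventory that records, for each system, how much control we have over it, the most sensitive data it may hold, and whether its login goes through single sign-on. The Python sketch below flags vendor-controlled systems that hold sensitive data but are not yet behind SSO. The system names, fields, and the ORANGE threshold are hypothetical placeholders, not a description of GitLab’s real inventory.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class DataClass(IntEnum):
    GREEN = 0
    YELLOW = 1
    ORANGE = 2
    RED = 3


class ControlLevel(Enum):
    FULL = "full"      # we manage the system down to the lowest level
    VENDOR = "vendor"  # SaaS: only the controls the vendor (and our price tier) expose


@dataclass(frozen=True)
class System:
    name: str
    control: ControlLevel
    max_data_class: DataClass  # most sensitive data the system may hold
    sso: bool                  # is login unified under the central identity provider?


def needs_review(system: System, threshold: DataClass = DataClass.ORANGE) -> bool:
    """Flag vendor-controlled systems without SSO that hold data at or above `threshold`."""
    return (system.control is ControlLevel.VENDOR
            and not system.sso
            and system.max_data_class >= threshold)


# Hypothetical inventory entries for illustration only.
INVENTORY = [
    System("self-managed-db", ControlLevel.FULL, DataClass.RED, sso=False),
    System("chat-saas", ControlLevel.VENDOR, DataClass.YELLOW, sso=True),
    System("hr-saas", ControlLevel.VENDOR, DataClass.RED, sso=False),
]

if __name__ == "__main__":
    print([s.name for s in INVENTORY if needs_review(s)])  # ['hr-saas']
```

Systems we fully control are excluded here on the assumption that we can apply our own controls to them directly; the interesting gaps are where control is limited to what a vendor offers.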

What's next

It sure seems like we have a lot of unique challenges! But we do have a huge leg up. For many organizations, the arrival of ZTN means the end of the corporate VPN and the removal of huge chunks of the perimeter network. GitLab doesn’t have a corporate VPN to dismantle, and as we’ve said before, we’re an all-remote company, so there is no perimeter.

We’ve discussed a lot of challenges; in the next installment of this series, we’ll start talking about a few specific measures we are designing to help make things easier. If you’re researching, implementing, or considering ZTN, what challenges are you tackling? Tell us in the comments.

Special shout-out to the entire security team for their input on this blog series.

Photo by Pixabay on Pexels
