Published on August 21, 2019
The classification of data is a huge step in the right direction when it comes to handling Zero Trust, but it comes with its own set of challenges.
Update: This is part 3 of an ongoing Zero Trust series. See our next post: Zero Trust at GitLab: Mitigating challenges with data zones and authentication scoring.
Zero Trust is the practice of shifting access control from the perimeter of the org to the individuals, the assets, and the endpoints. For GitLab, Zero Trust means that all devices trying to access an endpoint or asset within our GitLab environment will need to authenticate and be authorized. This is part three of a multi-part series.
Check out these other posts to get up to speed:
One of the main objectives for the Security team at GitLab is to protect data, regardless of whether it is our customer data or employee data. Instead of simply viewing Zero Trust Networking (ZTN) as some type of solution for authentication, we also look at it as a way to further our data protection. This poses specific challenges for both the data and the infrastructure the data resides upon.
We’ve established a data classification policy at GitLab, so we understand the protections necessary. The emphasis of the data classification policy is to define a mapping between access controls and data, so that data can be protected appropriately for its level of sensitivity. To help with understanding and to allow for quicker identification, the four data classification levels are mapped to a color coding. The color codings are RED, ORANGE, YELLOW, and GREEN, with RED being the most sensitive data, down to GREEN being public data.
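One way to picture the four levels is as an ordered enumeration. The following is a minimal sketch and not GitLab's actual implementation: the names `Classification` and `may_access`, and the idea of comparing a numeric "clearance" against a data level, are assumptions made purely for illustration.

```python
from enum import IntEnum


class Classification(IntEnum):
    """The four color-coded levels; a higher value means more sensitive data."""
    GREEN = 0   # public data
    YELLOW = 1
    ORANGE = 2
    RED = 3     # most sensitive data


def may_access(clearance: Classification, data_level: Classification) -> bool:
    """Hypothetical rule: allow access when clearance covers the data level."""
    return clearance >= data_level
```

Because the levels are ordered, a policy can be written as a comparison rather than a lookup table, which is a design convenience and not something the policy itself prescribes.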
This classification of data is a huge step in the right direction when it comes to handling ZTN. That being said, there are a few areas of data classification where we anticipate challenges with regard to ZTN:
The state of data changes over time. Data that is in one classification may change over time based upon any number of factors. An example is ORANGE sales data for a public company: if disclosed before a certain date, this could lead to insider trading. However, after a certain date the sales data would become public, or GREEN data.
Tracking of data/metadata. The tracking of data and its metadata, including origin and classification that define requirements for handling, is non-trivial. Applying labels (data classification labels, not to be confused with the labeling done within the GitLab software itself) to data helps in enabling control of the data. These labels can be related to the data’s function as well as conditional access controls needed.
Time limits on certain data. Applying data classification labels to data will require time limits. In the above example, the label is “part of the US DoD project team,” and access to this data may expire after 30 days and need to be removed/re-applied for/auto-extended under certain circumstances, etc.
Capability of data. Departmental data collected might be subject to classification based upon what it is capable of instead of what it actually seems to be (think Tenable scanning data). The same would apply to customer-generated data, such as ZenDesk tickets, because we cannot control what a customer might say or what a security scan might find. It is possible that someone could have access to a system or even manage parts of that system, yet should not be able to see all of its data.
Movement of data. Departmental data that is transferred between systems, whether by automated or manual means, could affect its own classification or that of the surrounding data, especially if the data is transformed or cleansed in some way. For example, situations and potential security problems reported via ZenDesk or HackerOne are often transferred to GitLab.com so we can “work the issue.” These are often sanitized to a degree. Here is a more detailed example to illustrate this:
ORANGE data impact, but the real impact affects more than one customer, so the problem is considered an impact to RED data. After a patch and resolution of the problem, we make the details of the situation public and include vulnerability, patch, and resolution information. We state it was reported to us by “a customer.” Association with XYZ corporation would still be ORANGE data. However, the previous RED classification of the problem itself is now considered GREEN since the problem is resolved and we have made the problem and its solution public.

As you can see, on the surface there seems to be no problem with securing our data with the assistance of ZTN, but once you start to explore “edge cases,” you begin to reach the conclusion that these are not actually edge cases, but working examples of how we interact with our data. In most cases, this will not be a problem as we have granular control over our data, but when it comes to ZTN we need to make sure we consider the changing state of our data. The main thing we wish to avoid is an authentication decision being made based upon a particular classification of data on a system when the classification of that data is known to change over time.
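To make the concern about changing state concrete, here is a minimal sketch of an access check that looks up the classification at request time instead of relying on a value stored earlier. Everything in it is an assumption for illustration only: the `DataRecord` and `AccessLabel` types, the `public_after` field, the example dates, and the rule that clearance must be at least the data's level. The 30-day label lifetime mirrors the time-limit example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Optional


class Classification(IntEnum):
    GREEN = 0   # public data
    YELLOW = 1
    ORANGE = 2
    RED = 3     # most sensitive data


@dataclass
class DataRecord:
    name: str
    level: Classification
    # Hypothetical field: from this moment on the data is treated as GREEN.
    public_after: Optional[datetime] = None

    def level_at(self, when: datetime) -> Classification:
        """Return the classification that applies at the given time."""
        if self.public_after is not None and when >= self.public_after:
            return Classification.GREEN
        return self.level


@dataclass
class AccessLabel:
    name: str
    granted_at: datetime
    lifetime: timedelta = timedelta(days=30)  # assumed default expiry

    def is_valid(self, when: datetime) -> bool:
        """A label is valid from its grant time until its lifetime elapses."""
        return self.granted_at <= when < self.granted_at + self.lifetime


def may_access(clearance: Classification, record: DataRecord,
               when: datetime, label: Optional[AccessLabel] = None) -> bool:
    """Decide at request time, using the classification current at `when`."""
    if label is not None and not label.is_valid(when):
        return False  # an expired label blocks access regardless of clearance
    return clearance >= record.level_at(when)


# Usage with made-up dates: ORANGE sales data that becomes public later.
sales = DataRecord("sales figures", Classification.ORANGE,
                   public_after=datetime(2019, 9, 1))
before = may_access(Classification.YELLOW, sales, datetime(2019, 8, 1))
after = may_access(Classification.YELLOW, sales, datetime(2019, 9, 2))
```

The point of the sketch is only that the decision takes a timestamp as input, so the same request can yield a different answer once the data's classification or a label's validity has changed.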
Granular data access is typically controlled at the system level, so we should be just fine. A closer look at our infrastructure may indicate otherwise, so a more detailed examination is required.
The infrastructure needs to be defined, including some semblance of where the data resides and how it is accessed. For the systems we directly manage and control down to the very lowest level, we have a good grasp on what we have to work with and what controls are available to regulate access to the data they contain. However, a decent part of our infrastructure resides on systems we do not fully control.
In the modern cloud age, software as a service (SaaS) applications have become an important part of everyday business operations. Instead of maintaining servers in a server room, a vendor uses the cloud and makes the application accessible over the internet. Each company has its own private set of data maintained by the SaaS provider, and may have different levels of features, based upon price, that allow it to manipulate and control the data. Examples include Expensify for handling expenses, BambooHR for handling HR functions, and so on. GitLab is no exception to this trend. Deployment is often as easy as setting up accounts, and while we’re working to unify our authentication process under Okta, it is still not fully deployed.
As we are an all-remote company, our infrastructure is all-remote. We do the bulk of our company activity inside the GitLab.com software itself, but we also use roughly two dozen SaaS companies’ offerings as well. There are the usual suspects such as Slack and Zoom, but as mentioned we are currently using Expensify, BambooHR, ZenDesk, and many others.
Simply put, our infrastructure poses some unique challenges:
Fortunately we can leverage a number of the compliance efforts within the company to gain insight into what levels of control we can impose onto each system.
It sure seems like we have a lot of unique challenges! But we do have a huge leg up. For many organizations, the coming of ZTN means the end of the corporate VPN and the falling of huge chunks of the perimeter network. GitLab doesn’t have a corporate VPN to dismantle, and as we’ve said before we’re an all-remote company so there is no perimeter.
We’ve discussed a lot of challenges. In the next installment of this series, we’ll start talking about a few specifics we are designing to help make things easier. If you’re researching, implementing, or considering ZTN, what are the challenges you’re tackling? Tell us in the comments.
Special shout-out to the entire security team for their input on this blog series.