Zero Trust is the practice of shifting access control from the network perimeter to the assets, individuals, and the respective endpoints. For GitLab, Zero Trust means that all users and devices trying to access an endpoint or asset within our GitLab environment will need to authenticate and be authorized. This is part 6 of 6 in our series.
- Part one: The evolution of Zero Trust
- Part two: Zero Trust at GitLab: Problems, goals, and coming challenges
- Part three: Zero Trust at GitLab: The data classification and infrastructure challenge
- Part four: Zero Trust at GitLab: Mitigating challenges with data zones and authentication scoring
- Part five: Zero Trust at GitLab: Implementation challenges
We've talked pretty openly about forming our ZTN approach and the challenges we expect along the way – as well as the challenges we've already met. If there is an area of ZTN that we've not addressed, or if you're interested in diving deeper into the topic, we invite you to join us October 29, 3-4 pm ET for our Zero Trust Reddit AMA where you can Ask Us Anything!
Where we are
I guess it makes sense to talk about where we are at with this whole ZTN thing. In addition to establishing policies for team members (based upon job descriptions and placement in the org chart), we have classified our data and mapped out our environment so we know where all of the parts are. But there are a few items we want to explain with a bit of detail.
Using Okta, we have managed to get (as of this writing) 70 of our SaaS apps under some semblance of control. This “control” has varied heavily – some SaaS apps integrated cleanly and seamlessly with Okta, and some work kinda-sorta-good-enough to call them integrated. The majority of SaaS integrations work fine because they use SAML and integrate easily in minutes. We can provision and deprovision accounts with simple assignments. Departments like People Ops can do provisioning within minutes instead of days. For some of the integrations, we can force the user to go through Okta, and in a few cases where we have sensitive data, we have extra security steps. For example, to access BambooHR, users have to go through Okta first (using multi-factor authentication, aka MFA) instead of accessing it directly, and they have to perform one more MFA-style authentication step just for BambooHR.
Are there problems with this? Sure. Not everything integrates as well as Greenhouse or BambooHR, because each SaaS vendor has implemented its own APIs and done its own SAML setup. Some don’t offer consistent interfaces to integrate with, which means that in some cases our team members can bypass Okta and go straight into the SaaS app, while in others they are forced to use Okta. This workflow inconsistency is sometimes frustrating for team members. We’re constantly updating our team member instructions on Okta usage and trying to communicate them to all team members as best we can, but we are impacting some users’ workflows. For example, if you sign in via Okta, you need to keep that tab open in your browser, otherwise your Okta session will end and you’ll find yourself repeatedly “MFAing” until you’re blue in the face. Many people are not used to working that way, and not having all SaaS apps working exactly the same doesn’t help. But overall, the time savings and security are great gains for ZTN and we are quite happy with the implementation.
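To make the layering described above concrete, here is a minimal sketch of how the "which steps does a user go through" rule could be written down. This is not Okta's API or our actual configuration; the `AppPolicy` class, its fields, and the step names are all hypothetical and only illustrate the idea that some apps force the Okta path and sensitive apps add one more step.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AppPolicy:
    """Hypothetical description of one SaaS integration."""
    name: str
    sso_enforced: bool  # True if users cannot bypass the identity provider
    sensitive: bool     # True if the app holds sensitive data


def required_steps(app: AppPolicy) -> List[str]:
    """Return the (hypothetical) authentication steps a user goes through."""
    steps: List[str] = []
    if app.sso_enforced:
        # The user is forced through the identity provider and its MFA prompt.
        steps += ["sso_login", "mfa"]
    else:
        # The integration cannot prevent a direct login to the app.
        steps.append("direct_or_sso_login")
    if app.sensitive:
        # Sensitive apps get one more MFA-style step of their own.
        steps.append("app_specific_mfa")
    return steps


if __name__ == "__main__":
    bamboohr = AppPolicy("BambooHR", sso_enforced=True, sensitive=True)
    print(required_steps(bamboohr))
```

Writing the rule down this way also makes the workflow inconsistency visible: any app with `sso_enforced=False` is one where team members can still skip the identity provider.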
As I write this, we are getting ready to start the Okta ASA (Advanced Server Access) rollout to Staging to give it a good test. Like SaaS, we expect a few hiccups here and there – especially since this is a new product for Okta, released earlier this year. And talk about workflow changes – if you thought browser-based application users were picky, command line SSH users are a bizarre bunch indeed. Command line junkies practically have their own religion around workflow and we’re introducing a change to that workflow. Yes, it is a minor change, but it already concerns me, truthfully because I am one of those oddball Linux users who lives on the command line, and I tend to get fairly picky after a couple decades of being able to adjust and customize every aspect of my experience.
This will seem like a weird one, but mitigating a security issue actually helped us out from a ZTN perspective. A security issue reported via our HackerOne program allowed malicious users to gather IP addresses from unsuspecting victims via embedded image files. The solution was to route those images through Camo proxy. The Camo proxy was widely deployed to ensure all possible links were protected, and it had the side benefit of ensuring communications going through the proxy were encrypted. Encrypting communications was one of the items we wanted as a part of ZTN and, as it turned out, we’d already done it.
A sound foundation
There are two things we want from our servers and containers and databases. First, we want them buttoned down tight and properly secured. All of these systems have robust controls, and we can perform all kinds of monitoring, but we have to do it at scale. Tightening security controls is especially important if you are using some of the Zero Trust-ish solutions out there to regulate access to these systems. We’re talking about automation of access provisioning, so we want to make sure that the minimal access levels required for data stored on systems remain minimal. This means no escalation of privileges due to configuration mistakes or security vulnerabilities. We also want to make sure that all services being offered up by these systems are as secure as possible against compromise, either locally or remotely.
Second, we want complete visibility into our infrastructure. If a newly disclosed vulnerability potentially impacts our systems, or a security incident happens, we want to be able to quickly assess the state of the environment, ensure patches are installed, receive alerts based upon custom triggers to help monitor everything, and so on.
We are using Tenable (mainly for assessments) and Uptycs (mainly for monitoring and alerting) in our environment to help with this visibility. Both certainly handle the basics just fine; in fact, Tenable has been quite up to the task. We are facing a few challenges with Uptycs as we’d like to do more than what the product currently offers. This may not sound like traditional ZTN territory, but it is. It does no good to offer up state-of-the-art authentication and authorization to resources that are poorly maintained and monitored. Like everything else in our company, we face issues with scale – our infrastructure needs to grow, and managing the security of that infrastructure must also scale well. Right now we can manage the security of our environment just fine. In fact, it is quite strong, but a lot of it relies on manual intervention, which has scaling issues. We have a lot of hash marks in the “win” column with Tenable, but as we scale and expand we’re challenged by Uptycs. In the spirit of openness, we’ll keep you posted on how this progresses.
The log ride
To get a grip on all of this activity, we need to be able to grab all the logs, toss them into one place, and make sense out of them. Our goal is two-fold: we need to understand how our system is being used so we can fine-tune it, and we need to be able to detect anomalous events that could signify potential breaches. All of our systems put out logs, and we’ve designed systems to monitor those logs. It is nice to automate alerts so that as odd events occur, we’re immediately notified, and in some cases, issues are automatically opened for further triage. We’ve started down this path with the deployment of several technologies related to the Logging Working Group. We’re in the first steps, and we expect that logs generated from the various ZTN implementations will help improve the logging efforts, perhaps even propel them along quicker as we work out the kinks.
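As a rough illustration of the "odd events trigger an alert, and sometimes an issue" idea, here is a minimal sketch that counts failed MFA attempts per user in a batch of already-collected log events. It is not how our logging pipeline actually works; the event fields (`user`, `action`, `outcome`), the threshold, and the issue-title format are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List


def find_anomalies(events: Iterable[Dict[str, str]], max_failures: int = 3) -> List[str]:
    """Return users with more than `max_failures` failed MFA attempts."""
    failures = Counter(
        event["user"]
        for event in events
        if event["action"] == "mfa" and event["outcome"] == "failure"
    )
    return sorted(user for user, count in failures.items() if count > max_failures)


def triage_titles(users: List[str]) -> List[str]:
    """Build (hypothetical) issue titles for further triage."""
    return [f"Repeated MFA failures for {user}" for user in users]


if __name__ == "__main__":
    sample = (
        [{"user": "alice", "action": "mfa", "outcome": "failure"}] * 5
        + [{"user": "bob", "action": "mfa", "outcome": "failure"}] * 2
        + [{"user": "bob", "action": "mfa", "outcome": "success"}]
    )
    for title in triage_titles(find_anomalies(sample)):
        print(title)
```

A real trigger would also need a time window and some tuning to avoid flagging the "MFAing until you're blue in the face" case caused by a closed browser tab, which is exactly the kind of fine-tuning the usage data is meant to support.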
The budget issue
A big ZTN question we get involves budget. After all, one company’s solution may involve a couple of small purchases and a large effort of tweaking and reconfiguring existing technology that is already deployed. Another company might have to make some major investments in new products just to get started. In other words, how do you budget for a solution when you don’t know exactly what that solution will look like?
This is probably one of those things a lot of organizations do not discuss, at least in any detail outside of “it’s expensive”. The idea of ZTN as a concept is an easy sell to most organizations because the benefits are so great. At the lofty bullet-point level on vendor slides, they often seem completely undeniable. But when you break down a concept into digestible and deployable components, you are often into interesting budget territory. Getting a department to buy into the concept is much easier than getting a department to alter their budget and purchase the XYZ product, deploy it, maintain it, and oh yeah please give the security department all of the logs. Of course this is a slight exaggeration to convey a point, but it is more often on the mark than not. We simply couldn’t fully budget for most of this because we didn’t know what we were going to be deploying until we found a particular solution.
In this case we have to be able to show an ROI, which means we need to help a department understand the benefits and actually show an improvement to that department’s bottom line. For example, Okta has allowed us to change some onboarding and offboarding processes from days into minutes – and it's a massive timesaver. The push for Okta ASA is because our Infrastructure department saw the gains realized from our Okta rollout, and asked for something similar. Regardless of which department’s budget this could go against, it has to be sold to someone internally. Showing an ROI that clearly states we could financially benefit in one or more areas is really the only way to go about it. Showing the benefits is critical when you are searching for solutions to problems with no idea which solution will work.
Since a lot of people ask for advice on ZTN in general, I’d like to share some impressions from our experience. Here are some major things that really have helped us.
Break down your needs into simple components
You do this by defining the problem end-to-end. For us, we could break it down into user identification and authentication, device identification and authorization, data classification, and policy enforcement. Each part was further broken down into smaller pieces – which includes a lot of what we covered in previous blog posts. This deconstruction helped us understand all of the areas we needed to work with.
Look at areas of winning
If a deployed technology is already solving part of the problem, can it be expanded? If it can’t, why not? Where are the gaps? List those gaps and use them to identify possible solutions during the review. We covered this topic in detail in a previous blog post, ZTN implementation challenges.
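The "list the gaps" step can be sketched as a simple set difference between what each component needs and what deployed tools already cover. The component names below come from the breakdown earlier in this post; the individual requirement names and the `identity_provider` entry are hypothetical placeholders, not our actual requirements list.

```python
from typing import Dict, List, Set


def find_gaps(needs: Dict[str, Set[str]], deployed: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per component, the requirements no deployed tool covers."""
    covered: Set[str] = set().union(*deployed.values())
    gaps: Dict[str, List[str]] = {}
    for component, requirements in needs.items():
        missing = requirements - covered
        if missing:
            gaps[component] = sorted(missing)
    return gaps


if __name__ == "__main__":
    # Requirement names are illustrative assumptions.
    needs = {
        "user identification and authentication": {"sso", "mfa"},
        "device identification and authorization": {"device_inventory"},
        "data classification": {"classification_labels"},
        "policy enforcement": {"access_rules", "mfa"},
    }
    deployed = {
        "identity_provider": {"sso", "mfa"},
    }
    for component, missing in find_gaps(needs, deployed).items():
        print(f"{component}: {', '.join(missing)}")
```

The output is the shopping list: each remaining entry is a specific thing to ask a vendor about, rather than a packaged "ZTN solution".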
Ignore the vendor “spin”
There are vendors that sell solutions where they claim to be solving ZTN. In my ancient past, I worked for a company that sold (among other things) system administration tools. One day our boss handed us a list of compliance guidelines for three different standards. We were to go through each bullet item for each standard, point out the system administrative tools and the various system checks in our products that lined up with each bullet item, and write them down. This process took a few days, and by the end of the week each compliance standard had a list of checks. The product team grouped these checks together, and just like that we were a compliance company. Now the product line was actually quite good and robust, which made this fairly easy, but in pivoting the company to being compliance-focused, it took the marketing team longer to print up flyers than it took us to do the tech part. Yes, we were incomplete – we weren’t asked to write additional checks, we were asked to just use existing checks. But we literally were ready in less than a week with something we could call compliance.
My point here is that I often get the feeling that ZTN vendors do the same thing. They looked over their existing product line, figured out what they could even remotely claim as being a part of a “Zero Trust” solution, and overnight became a ZTN solutions provider. Of course, if your own organization’s world view on what ZTN is lines up with a particular vendor, great! Buy it. But, for GitLab, we had to break down what we wanted the various components of our technology and data to do and align them with our own ideas of ZTN, refine our model, and then go find vendors that did extremely specific things. For example, we’ve approached Okta with the specific problems we are trying to solve – and they have products that solve them. For the most part we’ve ignored the whole “ZTN packaged solutions” approach and gone after the core of what their products do, and we’re solving our problems as a result.
We’re getting there. We have a lot of wins, and a number of interesting challenges. Every once in a while we will post a new blog to keep you current on our security saga with Zero Trust, and hopefully you can learn from our examples – including our challenges – and help make your systems, data, and users as secure as possible. We hope you’ll follow along and, if you’ve got a ZTN viewpoint to share, we invite you to comment below.
Special shout-out to the entire security team for their input on this blog series.
Photo by Puria Berenji on Unsplash.
“We take a look back at how far we've come in our ZTN implementation at @GitLab, and at the progress we still need to make.” – Mark Loveless