The move to a microservices-based architecture creates more attack surface for nefarious actors, so when our security researchers discovered a file upload vulnerability within GitLab, we patched it right up in our GitLab 12.7.4 security release. Here we dive deeper into the problems that led to this vulnerability and use it to illustrate the underlying concept of parser differentials.
File Uploads in GitLab
To understand the file upload vulnerability we need to go a bit deeper into file uploads within GitLab, and have a look at the involved components.
GitLab Workhorse
The first relevant component is GitLab's very own reverse proxy, called gitlab-workhorse. gitlab-workhorse fulfills a variety of tasks, but for this specific example we only care about certain kinds of file uploads.
The second component is gitlab-rails, the Ruby on Rails-based heart of GitLab. It's the main application part of GitLab and implements most of the business logic.
The following source code excerpts from gitlab-workhorse are based on the 8.18.0 release, which was the most recent version at the time the vulnerability was identified.
Consider the following route, defined in internal/upstream/routes.go, which handles file uploads for Conan packages:
// Conan Artifact Repository
route("PUT", apiPattern+`v4/packages/conan/`, filestore.BodyUploader(api, proxy, nil)),
The route defined above will pass any PUT request to paths underneath /api/v4/packages/conan/ to the BodyUploader. Within this BodyUploader some magic happens. Well, actually, it's not magic: the BodyUploader receives the uploaded file and lets the gitlab-rails backend know where the file has been placed. This happens in internal/filestore/file_handler.go.
Also worth mentioning: any routes not matched in gitlab-workhorse will be passed on to the backend without modification. That's especially important in our discussion for non-PUT requests to paths under /api/v4/packages/conan/.
// GitLabFinalizeFields returns a map with all the fields GitLab Rails needs in order to finalize the upload.
func (fh *FileHandler) GitLabFinalizeFields(prefix string) map[string]string {
    data := make(map[string]string)
    key := func(field string) string {
        if prefix == "" {
            return field
        }
        return fmt.Sprintf("%s.%s", prefix, field)
    }
    if fh.Name != "" {
        data[key("name")] = fh.Name
    }
    if fh.LocalPath != "" {
        data[key("path")] = fh.LocalPath
    }
    if fh.RemoteURL != "" {
        data[key("remote_url")] = fh.RemoteURL
    }
    if fh.RemoteID != "" {
        data[key("remote_id")] = fh.RemoteID
    }
    data[key("size")] = strconv.FormatInt(fh.Size, 10)
    for hashName, hash := range fh.hashes {
        data[key(hashName)] = hash
    }
    return data
}
So gitlab-workhorse replaces the uploaded file in the request with the path where it has stored the file on disk, so that the gitlab-rails backend knows where to pick it up.
Observe the following original request, as received by gitlab-workhorse:
PUT /api/v4/packages/conan/v1/files/Hello/0.1/root+xxxxx/beta/0/export/conanfile.py HTTP/1.1
Host: localhost
User-Agent: Conan/1.22.0 (Python 3.8.1) python-requests/2.22.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: close
X-Checksum-Sha1: 93ebaf6e85e8edde99c1ed46eaa1b5e1e5f4ac78
Content-Length: 1765
Authorization: Bearer [.. shortened ..]

from conans import ConanFile, CMake, tools
class HelloConan(ConanFile):
    name = "Hello"
[.. shortened ..]
This is what this request will look like to gitlab-rails after gitlab-workhorse has processed it (excerpted from api_json.log):
{
  "time": "2020-02-20T14:49:44.738Z",
  "severity": "INFO",
  "duration": 201.93,
  "db": 67.34,
  "view": 134.59,
  "status": 200,
  "method": "PUT",
  "path": "/api/v4/packages/conan/v1/files/Hello/0.1/root+xxxxx/beta/0/export/conanfile.py",
  "params": [
    {
      "key": "file.md5",
      "value": "719f0319f1fd5f6fcbc2433cc0008817"
    },
    {
      "key": "file.path",
      "value": "/var/opt/gitlab/gitlab-rails/shared/packages/tmp/uploads/582573467"
    },
    {
      "key": "file.sha1",
      "value": "93ebaf6e85e8edde99c1ed46eaa1b5e1e5f4ac78"
    },
    {
      "key": "file.sha256",
      "value": "f7059b223cd4d32002e5e34ab1ae5b4ea12f3bd0326589b00d5e910ce02c1f3a"
    },
    {
      "key": "file.sha512",
      "value": "efbe75ea58bd817d42fd9ca5ac556abd6fbe3236f66dfad81d508b5860252d32d1b1868ee03c7f4c6174a0ba6cc920a574b5865ca509f36c451113c9108f9a36"
    },
    {
      "key": "file.size",
      "value": "1765"
    }
  ],
  "host": "localhost",
  "remote_ip": "172.17.0.1, 127.0.0.1",
  "ua": "Conan/1.22.0 (Python 3.8.1) python-requests/2.22.0",
  "route": "/api/:version/packages/conan/v1/files/:package_name/:package_version/:package_username/:package_channel/:recipe_revision/export/:file_name",
  "user_id": 1,
  "username": "root",
  "queue_duration": 16.59,
  "correlation_id": "aSEqrgEfvX9"
}
In particular, the params entry file.path is of interest, as it denotes the file system path where gitlab-workhorse has placed the uploaded file.
gitlab-rails
This gitlab-workhorse-modified request, as gitlab-rails will see it, is handled in lib/uploaded_file.rb within the from_params method:
01 def self.from_params(params, field, upload_paths)
02   path = params["#{field}.path"]
03   remote_id = params["#{field}.remote_id"]
04   return if path.blank? && remote_id.blank?
05
06   file_path = nil
07   if path
08     file_path = File.realpath(path)
09
10     paths = Array(upload_paths) << Dir.tmpdir
11     unless self.allowed_path?(file_path, paths.compact)
12       raise InvalidPathError, "insecure path used '#{file_path}'"
13     end
14   end
15
16   UploadedFile.new(file_path,
17     filename: params["#{field}.name"],
18     content_type: params["#{field}.type"] || 'application/octet-stream',
19     sha256: params["#{field}.sha256"],
20     remote_id: remote_id,
21     size: params["#{field}.size"])
22 end
Here we can see how the uploaded file reference is handled. Lines 10-13 in the snippet above implement a whitelist of paths from which a gitlab-workhorse-uploaded file will be accepted. Dir.tmpdir, which resolves to the path /tmp, is added to the whitelist as well. In the subsequent lines, a new UploadedFile is constructed from file.path and the other parameters gitlab-workhorse has set.
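The allowed_path? helper itself is not part of the excerpt. As a rough sketch of the idea, and not GitLab's actual implementation, a check of this kind could resolve the path first and then compare it against each allowed directory. The helper name path_allowed? and the directory layout below are made up for illustration.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not GitLab's allowed_path?): returns true when the
# resolved path lies inside one of the allowed directories.
def path_allowed?(path, allowed_dirs)
  real = File.realpath(path) # resolves symlinks and ".." segments
  allowed_dirs.any? do |dir|
    # The trailing separator keeps "/x/uploads-evil" from passing as
    # being inside "/x/uploads".
    real.start_with?(File.realpath(dir) + File::SEPARATOR)
  end
end

Dir.mktmpdir do |root|
  uploads = File.join(root, 'uploads')
  secrets = File.join(root, 'secrets')
  FileUtils.mkdir_p([uploads, secrets])

  upload = File.join(uploads, '582573467')
  secret = File.join(secrets, 'token')
  File.write(upload, 'data')
  File.write(secret, 'hunter2')

  puts path_allowed?(upload, [uploads]) # true
  puts path_allowed?(secret, [uploads]) # false
  # Traversal is resolved before the comparison, so this is rejected too.
  puts path_allowed?(File.join(uploads, '..', 'secrets', 'token'), [uploads]) # false
end
```

Note that a check like this only constrains where a file lives. It says nothing about who supplied the path, which is why the origin of file.path matters so much in what follows.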
gitlab-workhorse bypass
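So we've seen the inner workings of both gitlab-workhorse and gitlab-rails when it comes to file uploads for Conan packages. To recap: the client sends a PUT request with the file to gitlab-workhorse, gitlab-workhorse stores the file on disk and forwards a modified request containing file.path to gitlab-rails, and gitlab-rails picks up the file from that path.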
From an attacker's perspective it would be nice to meddle with the modified PUT request. In particular, control over the file.path parameter would allow us to grab arbitrary files from /tmp and the defined upload_paths. But as gitlab-workhorse sits right in front of gitlab-rails, we can't just pass those parameters or otherwise interact directly with gitlab-rails without going via gitlab-workhorse.
We can, however, achieve this by leveraging the fact that gitlab-workhorse parses HTTP requests differently than gitlab-rails does. In particular, we can use Rack::MethodOverride in gitlab-rails, which is a default middleware in Ruby on Rails applications. The Rack::MethodOverride middleware allows us to send a POST request and let gitlab-rails know "well, actually this is a PUT request! ¯\_(ツ)_/¯". With this little trick we can sneak past the gitlab-workhorse route which would intercept the PUT request, as gitlab-workhorse is not aware of the method override and only sees a POST. So by specifying either a _method=PUT parameter or an X-HTTP-METHOD-OVERRIDE: PUT HTTP header, we can indeed directly point gitlab-rails to files on disk. The method override is used a lot in Ruby on Rails applications to allow simple <form>-based POST requests to use other REST-based methods like PUT and DELETE by overriding the <form>'s POST request with the _method parameter.
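The mismatch can be reproduced in a few lines. The following is a simplified model, not code from either project: proxy_intercepts? stands in for the gitlab-workhorse route match, and effective_method mimics what Rack::MethodOverride does for the override header (the _method form parameter is left out). Both helper names and the request hash layout are assumptions made for illustration.

```ruby
CONAN_PREFIX = '/api/v4/packages/conan/'
ALLOWED_OVERRIDES = %w[GET HEAD PUT POST DELETE OPTIONS PATCH].freeze

# Model of the proxy: only a literal PUT under the Conan prefix is intercepted.
def proxy_intercepts?(request)
  request[:method] == 'PUT' && request[:path].start_with?(CONAN_PREFIX)
end

# Model of the backend: a POST may be rewritten by an override header,
# similar in spirit to Rack::MethodOverride.
def effective_method(request)
  return request[:method] unless request[:method] == 'POST'

  override = request[:headers].fetch('X-HTTP-Method-Override', '').upcase
  ALLOWED_OVERRIDES.include?(override) ? override : request[:method]
end

request = {
  method: 'POST',
  path: '/api/v4/packages/conan/v1/files/Hello/0.1/lol+wat/beta/0/export/conanmanifest.txt',
  headers: { 'X-HTTP-Method-Override' => 'PUT' }
}

puts proxy_intercepts?(request) # false: the proxy sees a POST and passes it on
puts effective_method(request)  # PUT: the backend treats it as an upload
```

The two functions look at the same request and disagree about its method, and that disagreement is all the bypass needs.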
So a POST request to the right Conan endpoint with a file.path and file.size parameter will do the trick.
A full request using this bypass would look like this:
POST /api/v4/packages/conan/v1/files/Hello/0.1/lol+wat/beta/0/export/conanmanifest.txt?file.size=4&file.path=/tmp/test1234 HTTP/1.1
Host: localhost
User-Agent: Conan/1.21.0 (Python 3.8.1) python-requests/2.22.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: close
X-HTTP-Method-Override: PUT
X-Checksum-Deploy: true
X-Checksum-Sha1: ee96149f7b93af931d4548e9562484bdb6ac8fda
Content-Length: 4
Authorization: Bearer [.. shortened ..]

asdf
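Instead of uploading a file, this would let us get hold of the file /tmp/test1234 from the GitLab server's file system. To recap the exploit: the POST request does not match the PUT route in gitlab-workhorse and is passed on unmodified, and gitlab-rails then treats it as a PUT because of the method override and trusts the attacker-supplied file.path.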
We fixed this issue within gitlab-workhorse by signing requests which pass through gitlab-workhorse; the signature is then verified on the gitlab-rails side.
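The idea behind such a fix can be sketched with an HMAC over the fields the proxy adds. This is an illustration of the approach only: the shared secret, the helper names, and the choice of HMAC-SHA256 over a JSON payload are assumptions, and GitLab's actual implementation may differ in format and algorithm.

```ruby
require 'openssl'
require 'json'

# Assumption: a secret known only to the proxy and the backend.
SHARED_SECRET = 'example-secret-shared-by-proxy-and-backend'

# Proxy side: sign the upload fields it injected into the request.
def sign_fields(fields, secret = SHARED_SECRET)
  payload = JSON.generate(fields.sort) # stable ordering of key/value pairs
  OpenSSL::HMAC.hexdigest('SHA256', secret, payload)
end

# Constant-time comparison, so timing does not reveal how much matched.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize

  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Backend side: accept the fields only if the signature verifies.
def trusted_fields?(fields, signature, secret = SHARED_SECRET)
  secure_equal?(sign_fields(fields, secret), signature.to_s)
end

from_proxy = {
  'file.path' => '/var/opt/gitlab/gitlab-rails/shared/packages/tmp/uploads/582573467',
  'file.size' => '1765'
}
signature = sign_fields(from_proxy)
forged = { 'file.path' => '/tmp/test1234', 'file.size' => '4' }

puts trusted_fields?(from_proxy, signature) # true
puts trusted_fields?(forged, signature)     # false: the fields were changed
puts trusted_fields?(forged, nil)           # false: no signature at all
```

With a check like this in place, parameters that did not come from the proxy no longer verify, regardless of which HTTP method the backend ends up seeing.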
How parser differentials can introduce vulnerabilities
Let's take a huge step back and see from a high-level perspective what just happened. We've had gitlab-workhorse and gitlab-rails both looking at a POST request. But gitlab-rails ultimately saw a PUT request due to the overridden HTTP method.
What occurred here is a case of a parser differential, as gitlab-workhorse and gitlab-rails parsed the incoming HTTP request differently. The term parser differential originates from the Language-theoretic Security (LangSec) approach. It denotes the fact that two (or more) different parsers "understand" the very same message in different ways. Or, as the LangSec handout describes it:
Different interpretation of messages or data streams by components breaks any assumptions that components adhere to a shared specification and so introduces inconsistent state and unanticipated computation.
Indeed, such issues and the resulting unanticipated computation become more and more common when we look at modern web environments. The days of web applications being a stand-alone bunch of scripts invoked on a web server are long gone. The rise of microservices leads to complex environments, and the very same message (or HTTP request) might be interpreted by several different services in several different ways. Just as shown in the above example, this sometimes comes with security implications.
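The same effect shows up well outside of HTTP method handling. As a separate, made-up illustration that is unrelated to the GitLab issue, two components that disagree on which value of a repeated query parameter counts will reach different conclusions about the very same request. The two "components" here are hypothetical.

```ruby
require 'uri'

query = 'role=user&role=admin'
pairs = URI.decode_www_form(query) # [["role", "user"], ["role", "admin"]]

# Hypothetical component A validates the first occurrence.
first_wins = pairs.assoc('role').last

# Hypothetical component B acts on the last occurrence.
last_wins = pairs.to_h['role']

puts first_wins # user
puts last_wins  # admin
```

Neither reading is wrong on its own. The trouble starts when one component's decision is trusted by another that parsed the message differently.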
From the point of view of a pragmatic bug hunter, the idea of parser differentials is very interesting, as those issues can yield unique security bugs. Consider, for instance, this RCE in CouchDB. The HTTP desync attack technique, which has gotten a lot of attention in the bug bounty community, is also a matter of parser differentials.
From the developer perspective, we need to be aware of other components and their parsing behavior in order to avoid security issues which arise from interpreting the same message differently.
Cover Photo by Marta Branco on Pexels