Blog Company GitLab Annex solves the problem of versioning large binaries with git
February 17, 2015
4 min read

GitLab Annex solves the problem of versioning large binaries with git

GitLab solves the biggest limitation of git compared to some older centralized version control systems has been the maximum size of the repositories.

annex.jpg

WARNING git-annex support on GitLab EE was deprecated in GitLab 8.17 (2017/02/22), and was permanently removed in GitLab 9.0 (2017/03/22). Read through the migration guide from git-annex to git-lfs. Updated on 2017/04/04.

The biggest limitation of git compared to some older centralized version control systems has been the maximum size of the repositories. The general recommendation is to not have git repositories larger than 1GB to preserve performance. Although GitLab has no limit (some repositories in GitLab are over 50GB!) we subscribe to the advice to keep repositories as small as you can.

Not being able to version control large binaries is a big problem for many larger organizations. Video, photo's, audio, compiled binaries and many other types of files are too large. As a workaround, people keep artwork-in-progress in a Dropbox folder and only check in the final result. This results in using outdated files, not having a complete history and the risk of losing work.

In GitLab 7.8 Enterprise Edition this problem is solved by integrating the awesome git-annex. Git-annex allows managing large binaries with git, without checking the contents into git. You check in only a symlink that contains the SHA-1 of the large binary. If you need the large binary you can sync it from the GitLab server over rsync, a very fast file copying tool.

Using GitLab Annex

For example, if you want to upload a very large file and check it into your Git repository:

git clone [email protected]:group/project.git
git annex init 'My Laptop'            # initialize the annex project
cp ~/tmp/debian.iso ./                # copy a large file into the current directory
git annex add .                       # add the large file to git annex
git commit -am "Added Debian iso"      # commit the file meta data
git annex sync --content              # sync the git repo and large file to the GitLab server

Downloading a single large file is also very simple:

git clone [email protected]:group/project.git
git annex sync                        # sync git branches but not the large file
git annex get debian.iso              # download the large file

To download all files:

git clone [email protected]:group/project.git
git annex sync --content              # sync git branches and download all the large files

By integrating git-annex into GitLab it becomes much easier and safer to use. You don't have to set up git-annex on a separate server or add annex remotes to the repository. Git-annex without GitLab gives everyone that can access the server access to the files of all projects. GitLab annex ensures you can only access files of projects you work on (developer, master or owner role).

How it works

As far as we know GitLab is the first git repository management solution that integrates git-annex. This is possible because both git-annex and GitLab stay very close to the unix paradigms. Internally GitLab uses GitLab Shell to handle ssh access and this was a great integration point for git-annex. We've added a setting to GitLab Shell so you can disable GitLab Annex support if you don't want it.

Of course we are very thankful for the awesome git-annex software that makes all of this possible. It was mainly written by Joey Hess whose work is sponsored by community fundraisers. It may be interesting to know that git-annex is written in Haskell, making it very fast and reliable.

To use GitLab annex you have to use GitLab Enterprise Edition 7.8 (which will be released on February 22) or GitLab.com. You'll have to use the Git over SSH protocol to connect to your GitLab server instead of the Git over HTTPS protocol. We look forward to feedback and enhancements from the rest of the community. For example, it would be nice if the GitLab UI shows the file size of the large file instead of the symlink. But let's first celebrate this milestone, being able to easily version control your large files!

Update

We made a video of Dmitriy, CTO of GitLab, explaining how git-annex integrates with GitLab Enterprise Edition. The video is raw and unedited.

We want to hear from you

Enjoyed reading this blog post or have questions or feedback? Share your thoughts by creating a new topic in the GitLab community forum. Share your feedback

Ready to get started?

See what your team could do with a unified DevSecOps Platform.

Get free trial

New to GitLab and not sure where to start?

Get started guide

Learn about what GitLab can do for your team

Talk to an expert