The biggest limitation of git compared to some older centralized version control systems has been the maximum size of the repositories. The general recommendation is to not have git repositories larger than 1GB to preserve performance. Although GitLab has no limit (some repositories in GitLab are over 50GB!) we subscribe to the advice to keep repositories as small as you can.
Not being able to version control large binaries is a big problem for many larger organizations. Video, photo's, audio, compiled binaries and many other types of files are too large. As a workaround, people keep artwork-in-progress in a Dropbox folder and only check in the final result. This results in using outdated files, not having a complete history and the risk of losing work.
In GitLab 7.8 Enterprise Edition this problem is solved by integrating the awesome git-annex. Git-annex allows managing large binaries with git, without checking the contents into git. You check in only a symlink that contains the SHA-1 of the large binary. If you need the large binary you can sync it from the GitLab server over rsync, a very fast file copying tool.
For example, if you want to upload a very large file and check it into your Git repository:
git clone email@example.com:group/project.git git annex init 'My Laptop' # initialize the annex project cp ~/tmp/debian.iso ./ # copy a large file into the current directory git annex add . # add the large file to git annex git commit -am "Added Debian iso" # commit the file meta data git annex sync --content # sync the git repo and large file to the GitLab server
Downloading a single large file is also very simple:
git clone firstname.lastname@example.org:group/project.git git annex sync # sync git branches but not the large file git annex get debian.iso # download the large file
To download all files:
git clone email@example.com:group/project.git git annex sync --content # sync git branches and download all the large files
By integrating git-annex into GitLab it becomes much easier and safer to use. You don't have to setup git-annex on a separate server or add annex remotes to the repository. Git-annex without GitLab gives everyone that can access the server access to the files of all projects. GitLab annex ensures you can only access files of projects you work on (developer, master or owner role).
As far as we know GitLab is the first git repository management solution that integrates git-annex. This is possible because both git-annex and GitLab stay very close to the unix paradigms. Internally GitLab uses GitLab Shell to handle ssh access and this was a great integration point for git-annex. We've added a setting to GitLab Shell so you can disable GitLab Annex support if you don't want it.
Of course we are very thankful for the awesome git-annex software that makes all of this possible. It was mainly written by Joey Hess whose work is sponsored by community fundraisers. It may be interesting to know that git-annex is written in Haskell, making it very fast and reliable.
To use GitLab annex you have to use GitLab Enterprise Edition 7.8 (which will be released on February 22) or GitLab.com. You'll have to use the Git over SSH protocol to connect to your GitLab server instead of the Git over HTTPS protocol. We look forward to feedback and enhancements from the rest of the community. For example, it would be nice if the GitLab UI shows the file size of the large file instead of the symlink. But let's first celebrate this milestone, being able to easily version control your large files!
We made a video of Dmitriy, CTO of GitLab, explaining how git-annex integrates with GitLab Enterprise Edition. The video is raw and unedited.