Git 2.43 was officially released on November 20, 2023, and included some improvements from GitLab's Git team. Here are some highlights from the work our team has done on Git and why it matters.
Segmenting objects across packfiles
In Git 2.43, Christian Couder added a --filter option to the git repack command. Any supported filter (see the filter-spec docs) can be passed to git repack --filter. Git then packs the objects that the filter excludes into a separate packfile.
A --filter-to option was also added. With this option, Git writes the packfile containing the filtered-out objects to the specified location on the filesystem.
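To make the new options concrete, here is a minimal Go sketch that assembles a filtered repack command. We use Go because Gitaly is written in Go. The helper name repackArgs, the blob:none filter, and the repository path are illustrative assumptions. The -a -d flags reflect our reading of the git-repack documentation, which suggests a full repack in a bare repository when using --filter. Check that documentation for the exact value --filter-to expects before relying on it.

```go
package main

import (
	"fmt"
	"os/exec"
)

// repackArgs builds the arguments for a filtered repack (hypothetical helper).
// Assumptions: a full repack (-a -d) is wanted, and filterTo is passed through
// verbatim; see git-repack(1) for the exact form --filter-to expects.
func repackArgs(filterSpec, filterTo string) []string {
	args := []string{"repack", "-a", "-d", "--filter=" + filterSpec}
	if filterTo != "" {
		// Write the pack holding the filtered-out objects to another location.
		args = append(args, "--filter-to="+filterTo)
	}
	return args
}

func main() {
	// Example: move all blobs into their own packfile.
	cmd := exec.Command("git", repackArgs("blob:none", "")...)
	cmd.Dir = "/path/to/bare.git" // hypothetical repository path
	// Print the command instead of running it.
	fmt.Println(cmd.String())
}
```

Keeping argument construction in a small pure function makes it easy to test separately from actually running git.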
Why it matters
Gitaly servers host Git repositories and incur storage costs. In many repositories, however, not all objects need to be accessed all the time. Letting Git move some repository data into a separate packfile paves the way for storage optimizations: we can segment a repository's data and place certain kinds of objects on cheaper storage, such as slower disks or object storage.
Checking reference existence
In Git, checking whether a reference exists meant relying on Git returning an error when it couldn't find it. Until now, however, Git had no generic way to check whether a reference exists, and the underlying code did not handle certain edge cases well. One example is a symbolic reference whose target branch does not exist.
Patrick Steinhardt added the --exists option to git show-ref as a generic way to check for reference existence.
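As a hedged illustration, here is how a Go caller might interpret git show-ref --exists. We assume the exit codes described in git-show-ref(1): 0 when the reference exists, 2 when it is missing, and 1 when the lookup fails for another reason. The names RefStatus, statusFromExitCode, and refExists are hypothetical and do not come from Gitaly.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// RefStatus is a hypothetical result type for a reference existence check.
type RefStatus int

const (
	RefExists RefStatus = iota
	RefMissing
	RefLookupError
)

// statusFromExitCode maps the exit codes we assume for `git show-ref --exists`:
// 0 = exists, 2 = missing, anything else = lookup error.
func statusFromExitCode(code int) RefStatus {
	switch code {
	case 0:
		return RefExists
	case 2:
		return RefMissing
	default:
		return RefLookupError
	}
}

// refExists runs `git show-ref --exists <ref>` in repoDir (needs Git 2.43+).
func refExists(repoDir, ref string) (RefStatus, error) {
	cmd := exec.Command("git", "show-ref", "--exists", ref)
	cmd.Dir = repoDir
	err := cmd.Run()
	var exitErr *exec.ExitError
	switch {
	case err == nil:
		return RefExists, nil
	case errors.As(err, &exitErr):
		status := statusFromExitCode(exitErr.ExitCode())
		if status == RefLookupError {
			return status, fmt.Errorf("git show-ref --exists %s: %w", ref, err)
		}
		return status, nil
	default:
		// git could not be started at all, for example because it is not installed.
		return RefLookupError, err
	}
}

func main() {
	status, err := refExists(".", "refs/heads/main")
	fmt.Println(status, err)
}
```

Distinguishing "missing" from "lookup failed" is the point of the dedicated exit code: a caller no longer has to guess from a generic error.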
Why it matters
The Gitaly team has started work to upstream the reftable backend into the Git project. The new option gives us a consistent way to validate reference existence, which let us fix a number of tests so that they work with the reftable backend.
Find missing commit objects
The --missing option of git rev-list reports objects that are referenced but missing from a repository. Up to this release, however, the option only worked with blobs and trees. Missing commits would cause git rev-list to fail with a fatal error.
In Git 2.43, Karthik Nayak extended the --missing option to work with commit objects.
Why it matters
Gitaly's next-generation repository replication implementation relies on a write-ahead log (WAL) that logs every write to a repository.
The upcoming WAL creates a separate log entry per transaction, and some of these transactions contain reference updates. For those transactions, we need to identify the new Git objects being added to the repository. The WAL implementation stages these new objects in a quarantine directory.
We can now use git-rev-list(1) with the --missing option to identify all the newly added objects that are required, as well as the boundary commits that connect the quarantine directory to the main object directory.
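The post does not show how Gitaly invokes rev-list here, so the following is only a sketch under stated assumptions. Suppose git rev-list --objects --missing=print is run against a view of the repository where only the quarantined objects are available. The objects it lists are then the newly added ones. The IDs it prints with a leading ? mark where the traversal reaches objects outside the quarantine, such as boundary commits. The hypothetical helper parseRevList splits that output into the two groups. The object IDs in main are placeholders.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseRevList splits the output of `git rev-list --objects --missing=print`
// into found and missing object IDs (hypothetical helper). With
// --missing=print, missing objects are printed prefixed with '?'; with
// --objects, found trees and blobs may be followed by a space and a path.
func parseRevList(output string) (present, missing []string, err error) {
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := sc.Text()
		if line == "" {
			continue
		}
		if strings.HasPrefix(line, "?") {
			missing = append(missing, line[1:])
			continue
		}
		// Keep only the object ID and drop the optional path.
		oid, _, _ := strings.Cut(line, " ")
		present = append(present, oid)
	}
	return present, missing, sc.Err()
}

func main() {
	// Placeholder IDs; real output contains full object IDs.
	out := "c0ffee\nbead01 src/main.go\n?deadbeef\n"
	present, missing, err := parseRevList(out)
	if err != nil {
		panic(err)
	}
	fmt.Println("present:", present)
	fmt.Println("missing:", missing)
}
```

Because missing commits no longer abort the walk in Git 2.43, a single rev-list call can yield both groups instead of failing at the first commit outside the quarantine.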
Read gitattributes from HEAD in bare repos
Starting in Git 2.43, thanks to a change by John Cai, Git reads attributes by default from the tree that HEAD points to in bare repositories.
Why it matters
To reduce some technical debt around how Git attributes are read in a repository, we added the ability to pass a tree object directly to Git through the --attr-source flag.
Passing HEAD to --attr-source would fail, however, when HEAD pointed to an unborn branch. Gitaly would therefore have needed a separate call to check whether HEAD was unborn before passing it in.
With this change, Git reads attributes from HEAD by default, so we don't need to pass anything in. Git also silently ignores HEAD if it is unborn, which is the behavior we want in Gitaly. As a result, Gitaly needs no code changes for this to work.
This simplifies things on the Gitaly side as we work to remove technical debt around gitattributes. That debt was introduced when Git lacked support for reading gitattributes in bare repositories.
Bug fixes
Patrick Steinhardt fixed a bug in git rev-list --stdin.
Steinhardt also addressed an existing issue in commit-graphs: commits parsed from the commit-graph weren't always checked for existence. The new GIT_COMMIT_GRAPH_PARANOIA environment variable can now be enabled to make Git always check for object existence.
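As a minimal sketch, a Go caller could opt a single git invocation into this check by setting the variable in that command's environment. We assume it is read as a boolean, so "1" enables it. The helper name withCommitGraphParanoia and the rev-list command in main are illustrative.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withCommitGraphParanoia returns a copy of env with GIT_COMMIT_GRAPH_PARANOIA
// set to 1 (hypothetical helper). Any existing value is replaced so the
// setting is unambiguous.
func withCommitGraphParanoia(env []string) []string {
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if strings.HasPrefix(kv, "GIT_COMMIT_GRAPH_PARANOIA=") {
			continue
		}
		out = append(out, kv)
	}
	return append(out, "GIT_COMMIT_GRAPH_PARANOIA=1")
}

func main() {
	cmd := exec.Command("git", "rev-list", "--all")
	// Only this command sees the variable; the parent process is unchanged.
	cmd.Env = withCommitGraphParanoia(os.Environ())
	fmt.Println(cmd.Env[len(cmd.Env)-1])
}
```

Scoping the variable to one exec.Cmd keeps the extra existence checks confined to the commands that need them.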