January 11, 2024

The contributions we made to the Git 2.43 release



Git 2.43 was officially released on November 20, 2023, and included some improvements from GitLab's Git team. Here are some highlights from the work our team has done on Git and why it matters.

Segmenting objects across packfiles

In Git 2.43, Christian Couder added a --filter option to the git repack command. It accepts any supported filter (see the filter-spec docs) and causes the filtered-out objects to be packed into a separate packfile.

A --filter-to option was also added. Providing this option will cause Git to write the filtered packfile to the specified location on the filesystem.
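For illustration, here is a minimal Python sketch that shells out to git to split a repository's objects this way. The helper names are hypothetical. The flag combination follows the guidance in git-repack(1): run the filtered repack in a bare repository with -a and -d, and skip writing a bitmap index, because a bitmap assumes a single packfile that contains every object.

```python
import subprocess
from typing import List, Optional


def build_filtered_repack_args(repo: str, filter_spec: str,
                               filter_to: Optional[str] = None) -> List[str]:
    """Build a `git repack` invocation (Git 2.43+) that moves objects
    matching filter_spec into a separate packfile. Hypothetical helper."""
    args = [
        "git", "-C", repo, "repack",
        "-a",  # pack everything that is referenced, not just loose objects
        "-d",  # remove packs made redundant by the new ones
        # A bitmap index assumes one pack holding all objects, so skip it
        # when objects are split across packs.
        "--no-write-bitmap-index",
        f"--filter={filter_spec}",  # e.g. "blob:none" or "blob:limit=1m"
    ]
    if filter_to is not None:
        # Location for the pack of filtered-out objects; see git-repack(1)
        # for the exact form this value takes.
        args.append(f"--filter-to={filter_to}")
    return args


def filtered_repack(repo: str, filter_spec: str,
                    filter_to: Optional[str] = None) -> None:
    """Run the repack, raising CalledProcessError if git fails."""
    subprocess.run(build_filtered_repack_args(repo, filter_spec, filter_to),
                   check=True)
```

For example, build_filtered_repack_args("/srv/repos/example.git", "blob:limit=1m") would move blobs of 1m or larger into their own pack. The filtered-out objects are still part of the repository, so a pack written elsewhere with --filter-to has to stay reachable, for example through Git's alternates mechanism.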

Why it matters

Gitaly servers host Git repositories and incur storage costs. In many repositories, however, not all objects need to be accessed all the time. Allowing Git to offload some repository data into a different packfile paves the way for storage optimizations: we can choose to segment the repository data and place certain kinds of objects on cheaper storage, such as slower disks or object storage.

Checking reference existence

In Git, checking whether a reference exists has meant relying on Git returning an error when it couldn't find the reference, because there has not been a generic way to perform this check. Some edge cases were also not handled well by the underlying Git code, for example a symbolic reference whose target branch does not exist.

Patrick Steinhardt added the --exists option to git show-ref as a generic way to check for reference existence.

Why it matters

The Gitaly team has started work to upstream the reftable backend into the Git project. This new option gives the Git test suite a consistent way to validate reference existence, which fixes a number of tests so that they work with the reftable backend.
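As a rough sketch, assuming a Python caller: git show-ref --exists exits with 0 when the reference exists, 2 when it is missing, and 1 when the lookup fails for another reason, so a caller can branch on the exit code without parsing any output. The RefStatus type and helper names are hypothetical.

```python
import subprocess
from enum import Enum


class RefStatus(Enum):
    EXISTS = "exists"
    MISSING = "missing"
    ERROR = "error"


def interpret_show_ref_exists(returncode: int) -> RefStatus:
    """Map the exit code of `git show-ref --exists` to a status."""
    if returncode == 0:
        return RefStatus.EXISTS
    if returncode == 2:
        return RefStatus.MISSING
    # Exit code 1 means the lookup failed for a reason other than the
    # reference being missing; any other code is treated as an error too.
    return RefStatus.ERROR


def ref_status(repo: str, refname: str) -> RefStatus:
    """Check a fully qualified reference name such as "refs/heads/main"."""
    result = subprocess.run(
        ["git", "-C", repo, "show-ref", "--exists", refname],
        capture_output=True,
    )
    return interpret_show_ref_exists(result.returncode)
```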

Find missing commit objects

git rev-list's --missing option provides information about objects that are referenced but missing from a repository. Up to this release, however, this option only worked with blobs and trees; missing commits caused git rev-list to fail with a fatal error.

In Git 2.43, Karthik Nayak extended the --missing option to work with commit objects.
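A minimal sketch of consuming that output, assuming a Python caller (the helper names are hypothetical): with --missing=print, git rev-list lists each missing object with a ? prefix instead of failing, and as of Git 2.43 that includes missing commits.

```python
import subprocess
from typing import List, Tuple


def split_rev_list_output(output: str) -> Tuple[List[str], List[str]]:
    """Split `git rev-list --objects --missing=print` output into the
    object IDs that were found and those reported as missing."""
    present: List[str] = []
    missing: List[str] = []
    for line in output.splitlines():
        if not line:
            continue
        if line.startswith("?"):
            # Missing objects are printed as "?<oid>".
            missing.append(line[1:].strip())
        else:
            # Present objects are "<oid>" or "<oid> <path>"; keep the oid.
            present.append(line.split(" ", 1)[0])
    return present, missing


def list_with_missing(repo: str, *revs: str) -> Tuple[List[str], List[str]]:
    """Walk revs and report present and missing objects."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--missing=print", *revs],
        capture_output=True, text=True, check=True,
    )
    return split_rev_list_output(result.stdout)
```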

Why it matters

Gitaly's next-generation repository replication implementation relies on a write-ahead log (WAL) that logs every write to a repository.

The upcoming WAL creates a separate log entry for each transaction, and some of these transactions contain reference updates. For those transactions, we need to identify the new Git objects being added to the repository. The WAL implementation stages these new objects in a quarantine directory.

With commit support in --missing, we can now use git-rev-list(1) to identify all of the newly added objects that are required, as well as the boundary commits that connect the quarantine directory to the main object directory.
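A hypothetical sketch of that idea, not Gitaly's actual code: point GIT_OBJECT_DIRECTORY at the quarantine directory and drop any alternates, so that Git can see only the newly written objects. Running git rev-list --objects --missing=print from the new tips then prints the new objects normally, and prefixes with ? the objects they reference that live in the main object directory, such as parent commits or unchanged trees and blobs. Because missing objects cannot be read, the walk stops at them. The function names and directory layout are assumptions.

```python
import os
import subprocess
from typing import Dict, List, Optional, Tuple


def quarantine_only_env(quarantine_dir: str,
                        base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Environment in which Git sees only the quarantined objects."""
    env = dict(os.environ if base is None else base)
    # Use the quarantine directory (an absolute path) as the object store...
    env["GIT_OBJECT_DIRECTORY"] = quarantine_dir
    # ...and drop alternates, so objects that exist only in the main
    # object directory look missing to this process.
    env.pop("GIT_ALTERNATE_OBJECT_DIRECTORIES", None)
    return env


def new_and_boundary_objects(repo: str, quarantine_dir: str,
                             new_tips: List[str]) -> Tuple[List[str], List[str]]:
    """Return (objects found in quarantine, objects referenced from outside it)."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--missing=print", *new_tips],
        env=quarantine_only_env(quarantine_dir),
        capture_output=True, text=True, check=True,
    )
    new: List[str] = []
    boundary: List[str] = []
    for line in result.stdout.splitlines():
        if line.startswith("?"):
            boundary.append(line[1:].strip())  # e.g. an existing parent commit
        elif line:
            new.append(line.split(" ", 1)[0])  # "<oid>" or "<oid> <path>"
    return new, boundary
```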

Read gitattributes from HEAD in bare repos

Starting in Git 2.43, thanks to a change by John Cai, Git reads attributes by default from the tree that HEAD points to when running in a bare repository.

Why it matters

To reduce some tech debt around how git attributes are read in a repository, we added the ability to pass a tree object directly to Git through the --attr-source flag.

However, passing HEAD to --attr-source fails when HEAD points to an unborn branch, so Gitaly would have needed a separate call to check whether HEAD was unborn before passing it in.

With this change, Git reads attributes from HEAD by default, so we don't need to pass in anything, and it silently ignores HEAD if it is unborn, which is the behavior we want in Gitaly. As a result, this works without any code changes in Gitaly.

This simplifies the Gitaly side as we work to remove technical debt around gitattributes that was introduced when Git lacked support for reading gitattributes in bare repositories.
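To illustrate the difference, here is a hypothetical Python helper that queries attributes with git check-attr. Passing attr_source="HEAD" mirrors the explicit --attr-source approach, which fails on an unborn HEAD; leaving it out relies on the Git 2.43 default of reading attributes from HEAD's tree in a bare repository.

```python
import subprocess
from typing import Dict, List, Optional


def check_attr_args(repo: str, path: str,
                    attr_source: Optional[str] = None) -> List[str]:
    """Build a `git check-attr --all` invocation for one path."""
    args = ["git", "-C", repo]
    if attr_source is not None:
        # Explicit tree-ish to read .gitattributes from. Passing "HEAD"
        # fails when HEAD points to an unborn branch.
        args.append(f"--attr-source={attr_source}")
    # Without --attr-source, Git 2.43+ reads attributes from HEAD's tree
    # in a bare repository and ignores an unborn HEAD.
    args += ["check-attr", "--all", "--", path]
    return args


def parse_check_attr(output: str) -> Dict[str, str]:
    """Parse "<path>: <attribute>: <info>" lines into {attribute: info}."""
    attrs: Dict[str, str] = {}
    for line in output.splitlines():
        parts = line.rsplit(": ", 2)
        if len(parts) == 3:
            _, name, info = parts
            attrs[name] = info
    return attrs


def attributes_for(repo: str, path: str) -> Dict[str, str]:
    """Return all attributes set for path, using Git's default source."""
    result = subprocess.run(check_attr_args(repo, path),
                            capture_output=True, text=True, check=True)
    return parse_check_attr(result.stdout)
```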

Bug fixes

Patrick Steinhardt fixed a bug in git rev-list --stdin.

Steinhardt also addressed an existing issue where commits parsed from a commit-graph weren't always checked for existence in the object database. Setting the new GIT_COMMIT_GRAPH_PARANOIA environment variable makes Git always perform this existence check.
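A minimal sketch, assuming a Python caller (the helper names are hypothetical), of turning the check on for individual git invocations by setting the variable only in the child process's environment.

```python
import os
import subprocess
from typing import Dict, Optional


def paranoid_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with commit-graph existence checks enabled."""
    env = dict(os.environ if base is None else base)
    # Ask Git to confirm that commits loaded from the commit-graph really
    # exist in the object database, at some performance cost.
    env["GIT_COMMIT_GRAPH_PARANOIA"] = "true"
    return env


def run_git_paranoid(repo: str, *args: str) -> str:
    """Run a git command with the check enabled and return its stdout."""
    result = subprocess.run(["git", "-C", repo, *args], env=paranoid_env(),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_git_paranoid("/srv/repos/example.git", "rev-list", "--count", "HEAD") counts commits without trusting a possibly stale commit-graph, while leaving other processes unaffected.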

We want to hear from you

Enjoyed reading this blog post or have questions or feedback? Share your thoughts by creating a new topic in the GitLab community forum.
