Jul 3, 2018 - Robert Speicher    

How we solved GitLab's CHANGELOG conflict crisis

How we eliminated changelog-related merge conflicts and automated a crucial part of our release process.

Since its very first commit more than six years ago, GitLab has had a changelog detailing the noteworthy changes in each release. Shortly after Enterprise Edition (EE) was introduced, it got a changelog of its own. Whenever anyone – whether it was a community contributor or a GitLab employee – contributed a new feature or fix to the project, a changelog entry would be added to let users know what improved.

As GitLab gained in popularity and started receiving more contributions, we'd constantly see merge conflicts in the changelog when multiple merge requests attempted to add an entry to the list. This quickly became a major source of delays in development, as contributors would have to rebase their branch in order to resolve the conflicts.

This post outlines how we completely eliminated changelog-related merge conflicts, removed bottlenecks for contributions, and automated a crucial part of our release process.

At the beginning, GitLab's CHANGELOG file would look something like this:

v 8.0.0 (unreleased)
  - Prevent anchors from being hidden by header (Stan Hu)
  - Remove satellites
  - Better performance for web editor (switched from satellites to rugged)
  - Faster merge
  - ...
  - Ability to fetch merge requests from refs/merge-requests/:id

v 7.14.1
  - Improve abuse reports management from admin area
  - Ability to enable SSL verification for Webhooks

v 7.14.0
  - Fix bug where non-project members of the target project could set labels on new merge requests.
  - Upgrade gitlab_git to 7.2.14 to ignore CRLFs in .gitmodules (Stan Hu)
  - ...
  - Fix broken code import and display error messages if something went wrong with creating project (Stan Hu)

When a developer made a change in the upcoming release, 8.0.0 in this example, they would add a changelog entry at the bottom:

diff --git a/CHANGELOG b/CHANGELOG
index de2066f..0fc2c18 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -5,6 +5,7 @@ v 8.0.0 (unreleased)
   - Faster merge
   - ...
   - Ability to fetch merge requests from refs/merge-requests/:id
+  - Made literally everything better. Evvvvverything!

 v 7.14.1
   - Improve abuse reports management from admin area

At the same time, another developer might have made a similar change in their branch:

diff --git a/CHANGELOG b/CHANGELOG
index de2066f..5f81cfd 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -5,6 +5,7 @@ v 8.0.0 (unreleased)
   - Faster merge
   - ...
   - Ability to fetch merge requests from refs/merge-requests/:id
+  - Made a few things worse. Woops!

 v 7.14.1
   - Improve abuse reports management from admin area

Now when one branch was merged, it'd create a conflict in the other:

diff --cc CHANGELOG
index 5f81cfd,0fc2c18..0000000
--- a/CHANGELOG
+++ b/CHANGELOG
@@@ -5,7 -5,7 +5,11 @@@ v 8.0.0 (unreleased
    - Faster merge
    - ...
    - Ability to fetch merge requests from refs/merge-requests/:id
++<<<<<<< HEAD
 +  - Made a few things worse. Woops!
++=======
+   - Made literally everything better. Evvvvverything!
++>>>>>>> developer-1

  v 7.14.1
    - Improve abuse reports management from admin area

This resulted in a ton of wasted time as something would get merged, and then every other open branch adding a changelog entry would need to be rebased. The situation only got worse as the number of contributors to GitLab grew over time.

Our initial, boring solution to the problem was to begin adding empty placeholder entries at the beginning of each monthly release cycle. The changelog for the upcoming unreleased version might look like this:

v8.1.0 (unreleased)
  -
  -
  -
  -
  -
  -
  -
  - (and so on)

A developer would make their change and then choose a random spot in the list to add a changelog entry. This worked for a while, until the placeholders began to be filled out as we got closer to the release date. Eventually two (or more) merge requests would attempt to add different entries at the same placeholder, and one being merged created a conflict in the others.

The problem was lessened, but not solved.

Not only was this a huge waste of time for developers, it created an additional headache for release managers when they cherry-picked a commit into a stable branch for a patch release. If the commit included a changelog entry, which any change intended for a patch release should have, cherry-picking that commit would bring in the contents of the changelog at the point of that commit, often including dozens of unrelated changes. The release manager would have to manually remove the unrelated entries, often doing this multiple times per release. This was compounded when we had to release multiple patch versions at once due to a security issue.

Brainstorming solutions

Frustrations with the process finally reached a tipping point, and an issue was created to discuss a solution. Yorick had the original idea that would ultimately form the foundation of our solution. During a trip around the world, myself, Douwe, and Marin were in Brooklyn, NY, and during a walk around the city one beautiful summer evening we ended up with a proposal to finally solve the problem.

Each changelog entry would be its own YAML file in a CHANGELOG/unreleased folder. When a release manager went to cherry-pick a merge into a stable branch in preparation for a release, they'd use a custom script that would perform the cherry-pick and then move any changelog entry added by that action to a version-specific subfolder, such as CHANGELOG/8.9.4. At the time of release, any entries in the version's subfolder would be compiled into a single Markdown changelog file, and then deleted.

With an idea of where we wanted to end up but no idea how to get there, I started with a spike.

A turning point

After a few days of working on the spike, I had a realization that we didn't need the cherry-picking concept at all:

Cherry picking a merge commit into a stable branch will add that merge's CHANGELOG/unreleased/whatever-its-called.yml file to the stable branch. Upon tagging a release with release-tools, we can consider everything in that stable branch's "unreleased" folder as part of the tagged release. We collect those files, compile them to Markdown, remove them from the stable branch and master, and that's our changelog for the release.

This was a major "aha" moment, as it greatly simplified the workflow for release managers. They could continue their existing workflow, and the release flow would transparently handle the rest. It also meant we could handle everything in our release-tools project, which is responsible for tagging a release and kicking off our packaging.

Even though we ended up not using a lot of the work that went into it, my original spike was still valuable. It allowed us to see pain points early on, refine the process, and find a better solution. It also gave me additional experience interacting with Git repositories programmatically via Rugged, and that would go on to be especially useful as we implemented the final tooling.

Building the building blocks

We knew there were several components that we'd need to build:

  1. Something to read and represent the individual YAML data files
  2. Something to compile individual entries into a Markdown list
  3. Something to insert the compiled Markdown into the correct spot in an existing list of releases
  4. Something to remove the files that had been compiled, and then commit the updated CHANGELOG.md file to the repository

All of these components were created in a single merge request and refined through several code review cycles. The commits listed there are all fairly atomic and may be interesting to read through on their own. The code review that happened in the merge request was incredibly valuable, and allowed us to really simplify some code that was hard to wrap one's head around, even for me as the original author!

Automated testing

Of course, we wouldn't consider this solution complete until we had automated tests guaranteeing the behavior and consistency of the automated compilation, including reading from and writing to multiple branches across multiple repositories.

I ended up using Rugged to create fixture repositories that would create a repeatable testing environment, which we could then verify with custom RSpec matchers.

Hooking into the release process

At this point we were fairly confident the changelog compilation worked, so it was time to hook it into our existing release process.

While testing this integration on a real release, we uncovered a pretty hilarious (but dangerous) oversight. I'll let the commit that fixed it speak for itself:

Protect against deleting everything when there are no changelog entries

On a stable branch with no changelog entry files, the resulting empty array was passed to Rugged::Index#remove_all which, when given an empty array, removes everything. This was not ideal.

Developer tooling

The final pieces of the puzzle were creating a tool to help developers create valid changelog entries easily, and adding documentation. Both were handled in this merge request.

This tool allows developers to run bin/changelog, passing it the title of their change, to generate a valid changelog entry file. Additional options are in the documentation.

Future plans

This changelog process has worked beautifully for us since it was introduced, and we know it might be just as useful to other projects. We're investigating a way to make it more generic so that it can remove a tedious chore for more developers.

I worked on this project as part of our Edge team, now known as the Quality team. If you're interested in this kind of internal tooling or other automation, we're hiring! Check out our open positions.

Photo by Patrick Tomasso on Unsplash

Install GitLab in 2 minutes

With Ubuntu, Debian, CentOS, openSUSE, and Raspbian packages or from source

Install GitLab Now