Celebrating Git's 20th anniversary with creator Linus Torvalds

The Git version control system was first released on April 7, 2005, by the father of the Linux kernel, Linus Torvalds. To mark the 20th anniversary of this important project that is nowadays used by almost every single developer, I interviewed Linus about the history of Git, why he handed over maintainership of Git, and what he considers to be its most important milestones.

In 2005, you were already the maintainer of the thriving Linux kernel. Why did you decide to start a new version control system?

So, I got into it from really despising version control.

I had used the traditional version control systems (CVS/RCS/SCCS) both as an end user (i.e., tracking open source projects like GCC) and as a developer (we used CVS at Transmeta for everything) and absolutely hated the experience with a passion.

And yes, back then most projects that used CVS had probably moved to SVN, but honestly, I always felt that SVN was just "lipstick on a pig." It was just CVS in another form, with some UI improvements, but none of the fundamentals fixed, and a few new problems added.

The problems with CVS and its ilk are too many to even list, and, happily, they have largely become irrelevant and younger developers have probably never even had to deal with any of it. I absolutely refused to deal with it for the kernel, even though a few subsystems (notably the networking side) were actually using CVS to track their code back in the '90s.

Anyway, back then I lived in the Bay Area, and Larry McVoy, who I knew from other projects (mainly lmbench), had started BitMover, which had a new version control model called BitKeeper, or BK, for short.

BK wasn't open source, but Larry liked open source projects and really felt that the lack of version control was holding the kernel back. He wasn't wrong, but the traditional source code managers (SCMs) really didn't work for me at all. Larry spent some time showing me and David Miller (networking maintainer and existing CVS user) what BitKeeper could do.

BK wasn't perfect, and it was based on Source Code Control System (SCCS) like so many other traditional SCMs were, and thus had the same broken "history per file" model that everybody else had, and that causes huge and fundamental issues with file renaming and deletion.

But BK also wasn't just that "lipstick" thing. It may have used SCCS at a low level, but on a higher level it fixed some really fundamental things, and did proper distributed development, and had a real global – not per-file – history that made merging code from different trees actually work.

With CVS, creating branches and merging them were things you had to plan and discuss with people, and they were major events. With BK, every repository was a branch. We take that for granted now, and Git obviously took it much further by having many branches per repository, but even the much more limited BK model was really a big deal at the time.

Again, BK wasn't perfect. As mentioned, it did do per-file history, which really is a big fundamental problem that makes renaming and file merging simply not work reliably, and inevitably causes chaos and pain (for CVS people, think Attic, shudder). And it had some scalability issues, too, but those took a while to become more than a bit problematic.

But the biggest problem with BK was the licensing, and while over the years (we used BK from 2002 to 2005) a lot of kernel maintainers did end up switching over to it, it was always a bit of a friction point. And that friction came to a head in late 2004, and the use of BK for the kernel basically became untenable a few months later.

I was in the situation that for three years I'd finally used source control that worked, and it really had solved a lot of problems. There was no way I was going back to the days before source control, but in the years we'd been using BK, nothing better had really come out of the open source community.

Sure, people knew that CVS and SVN didn't work well, and there were projects that tried alternate approaches, but some of those approaches were even worse (basically amounting to "fancy patch tracking"), or had some good ideas but in the process made some entirely new horrible design mistakes (Monotone).

So, I looked around for a while, and decided that I didn't have any options – I had to write my own.

Now, technically, it actually did take only a few days to make the first version of Git, and hey, it's all there in the Git commit history. It's easy enough to see how it goes from pretty much zero to being usable enough that I started applying patches from others a week later (and being actively used for the kernel a few days after that).

But that ignores the fact that I had been thinking about the problem for a while by then. Writing code is easy. Getting a good design is what matters. So there was a fair amount of background to those few days that is pretty important, and that part doesn't show up in the history.

And hey, that first version was very, very rough, and didn't do a lot that was to come later. But you can definitely already see much of the core design in those first few days.

Can you give us a short recount of the first days and weeks of how the Git project was started?

I had basically decided that I would stop kernel development until I had an alternative that worked for me. The main goals were to be distributed and high performance, and be something you could absolutely rely on to catch any corruption.

But I really do want to stress that I wasn't interested in SCMs, per se. I was interested in the end result, not in the process. So Git was never like the kernel for me: I do Linux because I think kernels are interesting - I did Git because I had to.

Which then directly segues into your next question.

You handed over the maintainership of Git to Junio Hamano after a couple of months, and Junio is still the maintainer. Why did you hand over maintainership and what made you pick Junio?

Handing over maintainership was not a hard choice. It was very much: "The moment somebody else comes along that I can trust to keep it going, I'll go back to doing just the kernel."

Which is not to say that I just threw things over the wall and prayed for the best. I ended up maintaining Git for something like four months because I felt I needed to find somebody who would stick around, and had that hard-to-explain quality of "GoodTaste"(TM).

Junio had been one of the very early people involved (he literally showed up the first week of development), but it's not like I just said, "Tag, you're it." It takes a while to see who sticks around, and who writes code and makes decisions that make sense.

And I think Junio has been exemplary. I get much too much credit for the few months I spent on Git - particularly in light of the 20th anniversary. I'll take credit for getting the core design right, and getting the project started, but it really is Junio who has led the project (not to belittle the hundreds of other people involved, but still).

The initial version of the Mercurial version control system was released only 12 days after the initial version of Git, on April 19, 2005. Many people claim that Mercurial's user experience was superior to Git's, but nowadays Git is significantly more popular. Why do you think that Git has won over Mercurial?

Oh, a big part of it is obviously just network effects, and SCMs have very strong network effects. It's why CVS survived as long as it did despite its limitations.

So, the fact that the kernel used Git mattered (and then at some point it got to be very popular in the Ruby on Rails community, and then it took off everywhere).

But I really do think that the design of Git is superior. The core model is both very simple and very powerful, and I think that made it easier to translate into other environments. JGit was an early example of that, but you obviously have implementations like the MSgit virtual filesystem, etc.

And while Git was famously somewhat hard to use early on, I really do think that some of that comes from having done things "right," where people coming from other environments found Git non-intuitive because Git really made a few hard decisions that a traditional SCM person would never have made.

The Git project has not stood still since you handed maintainership over to Junio, and its community is always busy working on new features. What do you think the most important milestones were after you left the project?

That's really hard for me to say, mainly because I obviously made Git work for me, and so the things I use have worked from pretty much Day One. Just as an obvious example: Making Git work on Windows was obviously a huge step for other people, but it affected me not at all ;)

There's obviously all the infrastructure within Git itself to make it a lot easier to use, but I think most of the big milestones have all been around people taking the Git infrastructure and building things around it. Those often end up feeding back into Git features, of course, but, at the same time, the milestone is about something external.

To give an obvious example: All the big Git hosting sites were big milestones. Making Git be distributed was what made those so much easier to do, but the milestone was how the hosting then made it so easy for users to use Git for various projects.

If you had the capacity to work on Git full time again, would there be anything that you would like to implement?

Absolutely not. Git did everything I really needed from very early on – my use is actually fairly limited, and I only really care about one project.

And I say "absolutely not" because I refer you to that earlier answer: I was never really interested in SCMs at all to begin with. I think a large reason for why Git ended up being so different - mostly in good ways - from other SCMs was that I approached it more like I would a distributed journaling filesystem, not really a traditional SCM.

Is there any feature or design decision in Git that you have come to regret in retrospect?

Design decisions? No. I still think the high-level design is just very good, and you can discuss various Git concepts without ever getting into the nitty-gritty complexity of actual implementation.

And I think that's important in a project. You need a certain high-level design principle to guide the conceptual direction of a project.

Sometimes people take that too far, and think that the high-level design means that the implementation must then slavishly follow some core principle. And that's wrong, too – the implementation will have lots of nasty corner cases because reality is hard and people want odd things, but there needs to be some kind of top-level design that you can point to and reason about at a high level before you get your hands dirty with the nasty reality.

And I think Git has a good balance of that. A very straightforward object store design (call them "structured Merkle trees" if you are a CS person, or you might just think of them as a "content addressable storage" if you are a filesystem person). That core design is there – but at the same time, it's realistically just a very tiny part of the actual code. Most of the code is about all the things you can do with the core design, but that basic clarity of design still gives the project some kind of high-level structure.
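As an editorial illustration of the "content addressable storage" idea described above, here is a minimal Python sketch. It is not Git's code: the `ObjectStore` class and its methods are hypothetical names invented for this example, and the tree serialization is a simplified text format (Git's real tree objects are binary and carry file modes). The one detail taken from Git is how a blob is named: the SHA-1 of the header `blob <size>\0` followed by the content. The sketch shows two properties: an object's name is derived from its content, so corruption is detectable on read, and a tree's name depends on the names of its children, so a change to any file changes the tree's name as well.

```python
import hashlib


class ObjectStore:
    """Toy in-memory content-addressable store (hypothetical, not Git's code)."""

    def __init__(self):
        self._objects = {}  # maps object id -> (kind, payload)

    @staticmethod
    def object_id(kind: str, payload: bytes) -> str:
        # Hash "<kind> <size>\0" plus the payload, mirroring Git's framing.
        header = f"{kind} {len(payload)}".encode() + b"\0"
        return hashlib.sha1(header + payload).hexdigest()

    def put(self, kind: str, payload: bytes) -> str:
        # The key is computed from the content, never chosen by the caller.
        oid = self.object_id(kind, payload)
        self._objects[oid] = (kind, payload)
        return oid

    def get(self, oid: str):
        # Re-hash on read: a mismatch means the stored bytes were altered.
        kind, payload = self._objects[oid]
        if self.object_id(kind, payload) != oid:
            raise ValueError(f"object {oid} is corrupt")
        return kind, payload

    def put_tree(self, entries: dict) -> str:
        # Simplified text serialization (an assumption of this sketch):
        # one "<child id> <name>" line per entry, sorted by name.
        lines = [f"{oid} {name}" for name, oid in sorted(entries.items())]
        return self.put("tree", "\n".join(lines).encode())


store = ObjectStore()
readme_v1 = store.put("blob", b"hello\n")
tree_v1 = store.put_tree({"README": readme_v1})

# Changing the file content changes the blob id, and therefore the tree id.
readme_v2 = store.put("blob", b"hello, world\n")
tree_v2 = store.put_tree({"README": readme_v2})
```

Storing the same bytes twice yields the same id, which is why identical files are stored only once in a store of this kind.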

It's the same kind of high-level structure that Unix itself had, whether you said "everything is a file" or you were talking about process handling. There are a few "concepts" that drive the design, but then 99% of the code is about the ugly harsh details of what you build on top of that to make it all useful in the real world.

I have two mantras in technology: "If I have seen further, it is by standing on the shoulders of giants" (Newton) and "Genius is 1% inspiration and 99% perspiration" (Edison).

But talking about the 99% perspiration: While I am very happy with the big design, there are certainly various details that I would have done differently if I were to do Git today.

But honestly, they aren't that important. What's much more important is all the good details that have been done over the last two decades.

The Linux kernel has started to use Rust as a programming language for some of its subsystems. Do you think it makes sense to start using newer programming languages like this in Git?

I suspect that when it comes to Git, there's less reason to try to mix languages, which is always somewhat painful.

In the kernel, the end result is one single kernel binary – even if much of it can be loaded dynamically as modules, it is still linked together into effectively one single binary.

And that makes using multiple languages more complex. But, on the other hand, the kernel also has more reason to worry about memory safety and, thus, look at newer languages.

In Git, if somebody wants to write parts of it in Rust or another language, I suspect it makes much more sense to just go for a separate implementation rather than try to mix languages in one binary.

Many of the core Git ideas are simple enough that just having parallel implementations of the core likely isn't too painful, and then you can target particular problem spaces where a different language makes more sense.

And we've seen that in Git already, of course: That's exactly what JGit is. The use of a different language was due to a different web-based environment where that language choice was much more natural.

I know that there are already Rust implementations of some of the core Git functionality, and I think the situation is similar: I suspect they make more sense in specific situations than in some kind of overall "let's convert things to Rust" kind of way.

So for anybody who is interested in implementing things in Rust, I'd suggest looking for target areas where the advantages of Rust are more obvious. I don't think C has actually been all that problematic in the standard Git source base.

New version control systems are popping up every couple of years. Do you think that Git will stay relevant in the future?

I already mentioned the network effects in SCMs, and I think that means that to replace Git you have to be not just slightly better, you have to be enormously better. Or so compatible that you effectively are just a new implementation of Git.

And I do think the SCM situation has changed – Git doesn't have the kinds of huge gaping fundamental problems that SCMs had before Git. So being "enormously better" is fairly hard.

So, yes, I would expect Git to stay relevant for the foreseeable future, with people working on improvements around Git rather than replacements.

Note: This interview has been edited for length and clarity.
