Do you really need Git? Here’s why you should reconsider Subversion
If you’re a small team creating a single product and want to make fast progress, you don’t need the distraction of managing a multiplicity of small Git projects
For years I thought Git was the only obvious choice for source control, because everyone said it was. Then I did my own research and found out it's not as obvious a choice as it seems.
Like many small teams, we need to be able to work productively and independently. There's a strong drive to create modules (when possible, microservices) that allow a team to make a lot of progress without stepping on each other's toes.
This got me googling, and I found an interesting post resulting in a fascinating discussion about monorepos vs project repos. Here’s what I discovered.
The average team doesn’t need Git. Git merging sucks, and few people know how to use Git well (but nobody wants to admit they don’t know how to use Git).
The immediate impulse in most teams is to create lots of small Git repos. This is the way we know it's done in open source. And it makes sense for many open-source use cases, for example when you want to reuse code across a large array of projects with decoupled release cycles. The problem is, it makes a lot less sense for an organization that's working on a single product.
Separate project repos are not conducive to keeping all your projects in sync. This is why we have things like the npm package manager and Maven (I shudder at the sound of “Maven”. Those who, like me, are Java survivors can perhaps relate). And it's why we spend countless hours debating things like semver.
When you’re working in a small team on a single product, and want to make progress instead of spending 50% of your time debating the structure of your project and managing versioning of your various internal dependencies, it just doesn’t make sense to have a multiplicity of independent repos. But we do it anyway, because it’s fun to create lots of repos (makes me feel like I’m accomplishing a lot), and also:
Git merging sucks, and few people know how to use Git well (but nobody wants to admit they don’t know how to use Git).
I remember my days of innocence, when Subversion was a perfectly fine choice for a version control system. I long for those days.
It turns out, Subversion is still a fine choice. In fact, it's a better choice today than it was back then. I found out there are just a bunch of myths about Subversion, and few good reasons why you shouldn't use it.
And there are actually few good reasons why you need a DVCS, unless you're a nerd who has nothing better to do than create a bunch of atomic commits and branches on your repo while you're flying or holed up in some remote nook of the earth.
This is the conclusion I came to. If you want to dive deeper into this conversation, you can read on for some highlights of what other people have said.
In On Monolithic Repositories, the author points out that there are principled and practical reasons why people argue for separate repos (but none of them are very good reasons).
Gregory Szorc: The principled camp will say that separate repositories constitute a loosely coupled (dare I say service oriented) architecture that maps better to how software is consumed, assembled, and deployed and that erecting barriers in the form of separate repositories deliberately enforces this architecture. I agree. However, you can still maintain a loosely coupled architecture with monolithic repositories. The Subversion model of checking out a single tree from a larger repository proves this. Furthermore, I would say architecture decisions should be enforced by people (via code review, etc), not via version control repository topology. I believe this principled argument against monolithic repositories to be rather weak.
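The Subversion model Szorc mentions is worth spelling out: in SVN, any subdirectory of the repository can be checked out as its own working copy. A minimal sketch, assuming `svn` and `svnadmin` are installed, using a throwaway local repository (all paths and project names here are invented for illustration):

```shell
set -e
rm -rf /tmp/monorepo /tmp/tree /tmp/wc-payments

# A throwaway local repository (file:// URL, no server needed).
svnadmin create /tmp/monorepo

# A monorepo-style layout with two project directories.
mkdir -p /tmp/tree/projects/payments /tmp/tree/projects/search
echo "payments code" > /tmp/tree/projects/payments/main.txt
echo "search code" > /tmp/tree/projects/search/main.txt
svn import /tmp/tree file:///tmp/monorepo -m "initial layout"

# Check out ONLY the payments subtree: the working copy is just
# the directory this team works on, not the whole repository.
svn checkout file:///tmp/monorepo/projects/payments /tmp/wc-payments

ls /tmp/wc-payments   # main.txt (plus .svn metadata); search/ is absent
```

So a single monolithic repository doesn't force everyone to carry every project around; each team's working copy can be scoped to its own subtree.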
From Hacker News, some good points. Git is not a good tool for subprojects:
You’re running into problems because you have multiple people all trying to push to the same branch, and Git errs on the side of extreme caution rather than creating a merge that you didn’t specifically request. If you use merge requests instead, as supported by Github/Gitlab, you don’t get blocked by other people’s commits.
(Actual merge conflicts are an orthogonal problem. They’ll happen whenever you have multiple people editing the same code, regardless of what VCS you use or how it’s organized.)
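The behavior that comment describes is easy to reproduce locally. In the sketch below (throwaway repos; the names "central", "alice", and "bob" are invented for illustration), two clones commit to the same branch; the second push is rejected as non-fast-forward until that clone rebases on the updated remote:

```shell
set -e
rm -rf /tmp/central /tmp/alice /tmp/bob

# A bare repo standing in for the shared remote.
git init --bare /tmp/central
git -C /tmp/central symbolic-ref HEAD refs/heads/main

# Alice's clone seeds the shared "main" branch.
git clone /tmp/central /tmp/alice
cd /tmp/alice
git config user.email alice@example.com && git config user.name Alice
git symbolic-ref HEAD refs/heads/main   # work on an unborn "main"
git commit --allow-empty -m "base"
git push origin main

# Bob clones after main exists.
git clone /tmp/central /tmp/bob
cd /tmp/bob
git config user.email bob@example.com && git config user.name Bob

# Alice lands another commit first.
cd /tmp/alice
echo a > alice.txt && git add alice.txt && git commit -m "alice: add file"
git push origin main

# Bob commits to the same branch; his push is rejected
# (non-fast-forward) rather than Git inventing a merge for him.
cd /tmp/bob
echo b > bob.txt && git add bob.txt && git commit -m "bob: add file"
git push origin main || echo "push rejected, as the comment describes"

# The unblock: rebase onto the updated remote branch, then push.
git pull --rebase origin main
git push origin main
```

This is exactly the rebase dance the Gerrit comment below alludes to; with everyone on one branch, whoever pushes second pays the rebase.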
The Linux kernel example is a bit of a tricky one. Yes, kernel.org hosts a lot of repos, but they’re all different versions of the same codebase. They have a common commit ancestry, and commits get merged from one to another. So they’re not really separate modules in the sense that we’re talking about; they’re more like namespaces that each contain a set of branches.
On dealing with a large git repo:
I guess it comes down to organisational style. Using merge requests requires some benevolent dictator (poor sod?) to perform the merges. For us, if your commit is approved in Gerrit, it’s your task to merge it. If some other commit was merged in between, you have to rebase, possibly resolving merge conflicts. So I guess that with a monolithic repo, that wouldn’t scale, but it could be made to scale by appointing someone to perform the merge requests. I’m not sure I would like to be that person…
Subversion doesn’t meet your needs? How about Perforce?
We use Perforce (since before DVCSs came along) and follow the monolithic approach with 100s of projects. We use git and mercurial when interfacing to clients’ repositories and for small, local temporary repos to structure work that shouldn’t clutter our main repository.
The monolithic approach works so smoothly because Perforce makes it trivial to only check out the directories we’re working on. It has huge advantages when we’re working on interdependent projects.
Perforce does have some historical oddities (top level directories are called “depots” and are slightly different than normal directories), but the ability to branch, merge, and check out using normal filesystem concepts is a huge usability boon.
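For what it's worth, modern Git can approximate the Perforce workflow described above: `git sparse-checkout` (Git 2.25+) restricts the working tree to chosen directories of a monorepo. A throwaway sketch, with invented paths, not a claim about how the commenter's team works:

```shell
set -e
rm -rf /tmp/monorepo-git /tmp/wc

# A monorepo with two top-level project directories.
git init /tmp/monorepo-git
cd /tmp/monorepo-git
git config user.email dev@example.com && git config user.name Dev
mkdir -p projects/payments projects/search
echo pay > projects/payments/main.txt
echo find > projects/search/main.txt
git add . && git commit -m "monorepo layout"

# Clone, then restrict the working tree to the one directory
# we actually work on; the other project is removed from disk.
git clone /tmp/monorepo-git /tmp/wc
cd /tmp/wc
git sparse-checkout init --cone
git sparse-checkout set projects/payments

ls projects   # payments only; search/ is no longer checked out
```

The full history still lives in the clone, so this is less economical than a Perforce client view or an SVN subtree checkout, but the day-to-day working-copy experience is similar.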
The many-repos approach doesn’t make sense for startups:
I agree with the author… In the context of startups that are very much in “discovery” mode (still doing big changes to their product), breaking the repository into subparts is just a source of confusion for the tech lead. I would also argue it is bad for devs, because they then need to manage version binding between different repos. A many-repos approach is only justifiable if you have many teams with many tech leads, each independently doing releases.
Note: the views expressed in this article are my own views and do not represent the views of my employer. However, we’re always on the lookout for great engineers who want to make a difference.