The Importance of Good Revision Control

I spent five years as a developer at a company that used Visual SourceSafe. This was not back in the 90s, nor even the early 2000s. This was a span from 2006 to 2011. For those who don't know, Visual SourceSafe is less a revision control system and more a shared network directory with a locking mechanism. Any branching and merging features it has are broken to the point of being unusable, and it thus makes working in large groups nearly impossible. How, then, did I work for five years with such a system? Simple. I didn't know any better. I do now.

Revision control came into being out of the need to return to a known point in development history. In this sense, it is more of a continuous backup mechanism than a way of sharing code among a team of developers. Viewed this way, revision control is rarely treated as part of a project's technical implementation, but rather as a policy process within an organization. I have come to disagree with this point of view, as revision control can be a limiting factor not only on a team's method of development, but also on its ability to maintain a codebase.

Consider my current team's problem:

We have a legacy codebase that is in desperate need of some refactoring, and we have a good idea of the type of architecture we want to end up with. However, our current revision control is set up more to serve the needs of the business unit than the needs of us developers. Our branches are based on features of individual projects, and are only merged back into the trunk of the codebase when these features are released. There is no branch for general maintenance, and there are no intermediate branches between projects and trunk. This gives the business unit the ability to decide the order in which projects go live, but prevents the developers from sharing code between branches until a project is released. Indirectly and unintentionally, this means that the business unit is responsible for how and where any refactorings take place. Not only that, but any refactoring done in one branch may take up to six months or so to reach another branch. Meanwhile, we are referring to this as "agile" development.

If revision control were merely a policy problem, then it would have no bearing on design decisions. The problem with my team's current codebase is that it is implemented almost entirely at the UI layer, and therefore has multiple implementations of what should be shared functionality. With our current method of revision control, it is slow and difficult to refactor common code into a service layer. Doing so requires that the refactoring be done in the branch for a specific project. The business unit then decides when that project reaches the trunk, and only then can the refactored code be shared with the other project branches. With this system, architecture control is taken out of the hands of the programmers, and put into the hands of the business unit.

Why is this the fault of the revision control software? Because the revision control software dictates how easy it is to do branching and merging. My current team uses Subversion, which on the whole is fairly good software. It supports branching and merging, but it doesn't make them cheap in practice. Creating a branch is a quick copy on the server, but a fresh checkout of that branch downloads all of its code to the client machine, and every switch, commit, and merge is a round trip to the shared repository, which can be quite costly in terms of time. This means that programmers are less likely to create short-lived branches, and more likely to put multiple changes into a single commit. With so much overhead needed in order to use branching, we end up using a single branch per project, and committing to that branch with large changesets. Our revision control software now limits our branching, and our own wishes for a branching and merging strategy are superseded by the requirements of the business unit, which we have no choice but to fulfill.

What this all means is that our choice of revision control software has an indirect but very real impact on our architectural decisions. We are serving the software more than the software is serving us. What we need is a way to fulfill the requirements of the business unit without sacrificing design decisions within our own codebase. If Subversion can't meet that requirement, then my team needs to find something that can. Enter git.

Git is a revision control system that was created a few years ago by none other than Linus Torvalds of Linux fame. The kernel needed a new revision control system, and since nobody could agree on any of the existing solutions, Linus wrote one of his own. It's worth taking a little time to think about this. Linus Torvalds is the top maintainer of what is probably the largest open source project, at least in terms of number of contributors. He maintains an ever-growing codebase of software that is used by all levels of industry and government, and merges changes from thousands of developers. I can think of no person who has ever had a larger revision control problem than Linus Torvalds. Necessity breeds invention, and so Linus created git. The end result has completely changed the way open source projects are maintained, and the way developers think about revision control. Now git is the de facto revision control system for open source projects, and has made huge inroads into the proprietary world as well.

The best part about git, in my opinion, is that branching is cheap. It takes very little storage to create a new branch, very little time to switch to it, and usually very little time to merge it back into another branch. This means that developers are more likely to create small, short-lived branches for their changes. Since commits are made locally in git (and the individual commits are preserved when the branch is later merged into a remote branch), developers are also more likely to keep their commits small.
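
To make "cheap" a little more concrete: in git, a branch is essentially a named pointer to a commit, so creating one copies no files. The toy model below is only a sketch of that idea, not git's actual implementation; the `Commit` and `Repo` classes, their methods, and the branch name `fix-typo` are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Commit:
    """An immutable snapshot that points back at its parent commits."""
    message: str
    parents: Tuple["Commit", ...] = ()


@dataclass
class Repo:
    """Toy repository (hypothetical): branches are names pointing at commits."""
    branches: Dict[str, Commit] = field(default_factory=dict)
    current: str = "master"

    def commit(self, message: str) -> Commit:
        # The new commit records the current branch tip as its parent,
        # then the branch pointer moves forward to the new commit.
        tip = self.branches.get(self.current)
        new = Commit(message, (tip,) if tip is not None else ())
        self.branches[self.current] = new
        return new

    def branch(self, name: str) -> None:
        # Creating a branch adds one pointer; no history is copied.
        self.branches[name] = self.branches[self.current]

    def switch(self, name: str) -> None:
        # Switching only changes which pointer new commits will advance.
        if name not in self.branches:
            raise KeyError(f"no such branch: {name}")
        self.current = name


repo = Repo()
repo.commit("initial import")
repo.branch("fix-typo")            # short-lived branch, created instantly
repo.switch("fix-typo")
repo.commit("fix typo in README")  # master is untouched by this commit
```

In this model, the cost of a branch is one dictionary entry no matter how large the history is, which is roughly why small, short-lived branches stop feeling like a burden.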

Now let's look at what this does for my team's current refactoring problem. Since branches are so cheap, we can maintain the branches the business unit requires alongside branches for system-wide refactoring. These branches can be merged into each other often, since merging is such a quick operation, and therefore projects can share code with each other before being released to production. Meanwhile, our trunk can stay identical to the current production code, and merges of various projects can happen in a staging branch. Now we are free to develop as we like, without spending so much time worrying about the needs of the business unit.
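
One way to picture this layout is sketched below. It is a toy model in which a branch is just the set of changes it contains and a merge is a set union; real merges can of course conflict. The branch names (`trunk`, `staging`, `refactoring`, `project-a`, `project-b`) and the change labels are hypothetical, chosen only to mirror the description above.

```python
from typing import Dict, Set

# Each branch is modelled as the set of changes it contains.
branches: Dict[str, Set[str]] = {"trunk": {"production-code"}}


def create_branch(name: str, source: str) -> None:
    """Start a new branch containing everything the source branch has."""
    branches[name] = set(branches[source])


def merge(source: str, target: str) -> None:
    """Bring every change from source into target (conflicts are ignored)."""
    branches[target] |= branches[source]


# Branches the business unit asks for, plus ones the developers own.
for name in ("project-a", "project-b", "refactoring", "staging"):
    create_branch(name, "trunk")

branches["project-a"].add("feature-a")
branches["project-b"].add("feature-b")
branches["refactoring"].add("extract-service-layer")

# Share the refactoring with both projects long before either is released.
merge("refactoring", "project-a")
merge("refactoring", "project-b")

# The business unit picks project-b to go live first; rehearse it in staging.
merge("project-b", "staging")

# Both projects now carry the refactoring, and trunk still matches production.
shared = branches["project-a"] & branches["project-b"]
print(sorted(shared))
print(sorted(branches["trunk"]))
```

The point of the sketch is the ordering: the refactoring reaches every project branch without waiting on a release, while the business unit still decides what enters staging and, eventually, trunk.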

Now that I have had greater experience with different revision control systems, I look back on my use of SourceSafe for all those years. I wonder how many of our development problems back then were caused by the use of a poor revision control system. Did it ever affect our design decisions? Was our productivity lowered? Did it make it easier for us to write bad code? Harder to write good code? I can't answer any of these questions with assurance, but I tend to think I could probably answer all of them in the affirmative if I thought about it long enough. I have since learned that revision control is not just a solution to a process problem, but a technical implementation problem. A solution should be chosen based on technical needs, rather than just what provides us with backups.