Locking and git-tf and... danger?

August 28, 2012

If you haven't yet seen or heard about the new git-tf tool from Microsoft, this blog post probably won't make any sense. So... go check it out over at gittf.codeplex.com. It's okay, I'll wait.

Okay, got it? Yeah, I know, it's cool, right? Well, it is, but like any software project, there are a few problems. One of them is in the whole git-tf checkin process and how it needs to take a lock on your tree before it can continue. I mean, locks are annoying, right? If git-tf requires a lock to do a checkin, then that's going to fail if somebody else has already taken a lock somewhere in that folder. And, of course, the default in TFS is to take a lock on binaries, so if you've got some of those in your tree then git-tf is going to complain all the time when you try to check in.

It's even worse when you run into this the first time and decide that we can't have been so obnoxious as to force you to take a lock. Surely those developers put in some no-lock option, maybe even called --no-lock, that will avoid locking. And we did! But with our appropriately scary warning in the checkin help:

--no-lock Does not take a lock on the server path before committing (dangerous)

...you can see why this is a problem. I mean, it's dangerous?! Who would want to use that!?

Okay, let's back up and explain why we decided to take this lock.

Team Foundation Server's version control model has good concurrency. Let's assume that you and I are both working on the latest version and we both edit some files. If we both hit the checkin button at the same time, this is a classic race condition, but one that's harmless. Our checkins get serialized and - assuming we haven't edited any of the same files to produce a conflict - it really doesn't matter all that much which one of us wins this race and which one loses. Both of our changesets get checked in and the next time we do a get latest, we get the other person's changes, too. Things get a little bit more complicated if we edit the same file, but not much. The winner of this race gets their changes checked in - the loser has to resolve a conflict and then check in again. This is pretty classic version control behavior and a well-understood problem.

git, of course, also has good concurrency. Let's assume that you and I are both on HEAD and we both edit some files. If we both were to commit to our local git repository, now we both have new commits that share a common parent. At some point, we'll need to merge this branch in the graph, but this is no big deal. Again, a race condition that doesn't matter.

git-tf, on the other hand, can't cope well with this race condition. (It might be a bit much to have called it dangerous, but that little word scared you enough to read this blog post, so I think we'll stick with that term for a little while longer.) Here's how git-tf checkin works, in a nutshell:

1. Finds the latest TFS changeset that you've fetched and merged into your graph. This is the "high water mark".
2. Takes a lock on the TFS path you're bridging. Unless, of course, you're living dangerously and using that crazy --no-lock flag.
3. Ensures that this high water mark is actually the latest TFS changeset. If not, you will be told to fetch that changeset and merge.
4. Pends the changes to TFS that are representative of the changes in your git commit and checks them in.
5. Updates the aforementioned "high water mark" to the changeset that was just checked in.
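The steps above can be sketched as a toy model. This is not git-tf's code: `ToyServer`, `ToyBridge`, `PathLocked` and `NeedFetch` are hypothetical names invented for illustration, and the "server" is just an in-memory list where changeset N is the Nth entry.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class PathLocked(Exception):
    """Someone else already holds a lock on the server path."""


class NeedFetch(Exception):
    """The server has changesets that have not been fetched and merged."""


@dataclass
class ToyServer:
    """Hypothetical in-memory stand-in for one TFS server path."""
    changesets: List[str] = field(default_factory=list)
    lock_holder: Optional[str] = None

    def latest(self) -> int:
        # Changeset numbers in this toy are simply 1, 2, 3, ...
        return len(self.changesets)

    def lock(self, who: str) -> None:
        if self.lock_holder not in (None, who):
            raise PathLocked(f"locked by {self.lock_holder}")
        self.lock_holder = who

    def unlock(self, who: str) -> None:
        if self.lock_holder == who:
            self.lock_holder = None

    def checkin(self, who: str, change: str) -> int:
        if self.lock_holder not in (None, who):
            raise PathLocked(f"locked by {self.lock_holder}")
        self.changesets.append(change)
        return self.latest()


@dataclass
class ToyBridge:
    """Hypothetical model of the bridge's bookkeeping, not git-tf itself."""
    user: str
    high_water_mark: int  # step 1: latest changeset fetched and merged

    def checkin(self, server: ToyServer, change: str, no_lock: bool = False) -> int:
        if not no_lock:
            server.lock(self.user)                       # step 2
        try:
            if server.latest() != self.high_water_mark:  # step 3
                raise NeedFetch("fetch and merge first")
            new_id = server.checkin(self.user, change)   # step 4
            self.high_water_mark = new_id                # step 5
            return new_id
        finally:
            if not no_lock:
                server.unlock(self.user)


server = ToyServer(changesets=["c1", "c2", "c3"])
me = ToyBridge(user="me", high_water_mark=3)
print(me.checkin(server, "my change"))  # prints 4
```

With the lock held, nobody else can add a changeset between the check in step 3 and the checkin in step 4, so in this model the high water mark always ends up matching the server.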

Let's take another look at this race condition once git-tf gets in the picture. Imagine that I'm using git-tf and you're using some other TFS client like Visual Studio or the TFS plug-in for Eclipse. The latest version is changeset 3, and I've made a git commit on top of that. My git repository looks like this: the commit for changeset 3, with my new commit 4fac65... on top of it.

I'm ready to git-tf checkin these changes, and I'll do so without a lock. Meanwhile, you've also made some changes against TFS and you're going to check in at the same time. If you happen to sneak your changeset in between steps 3 and 4 (above), then I've updated my high water mark to my changeset... but I never got your changeset.

In a perfect world, I would pull your changeset down as a git commit such that it has a parent of 3, and my git commit 4fac65... has your changeset as a parent. This keeps the linear model of TFS changesets that git-tf strives so hard for. Except that if I do that, my commit can't be 4fac65... anymore, it gets a completely new ID because those are computed based on their parents. Ouch.
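To see why the ID has to change, here is a small sketch of how a commit ID is derived: git takes a SHA-1 over a short header plus the commit's text, and that text names the parent. The tree ID, parent IDs, author and timestamp below are all made-up placeholders, and `commit_id` is a hypothetical helper, not something git-tf provides.

```python
import hashlib


def commit_id(tree: str, parents: list, message: str) -> str:
    """Hash a commit the way git does: SHA-1 over a header plus the commit body."""
    # Made-up identity and timestamp, held fixed so that only the parents vary.
    who = "Example Dev <dev@example.com> 1346112000 +0000"
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


tree = "c" * 40            # placeholder tree ID
changeset_3 = "a" * 40     # placeholder ID of the commit for changeset 3
your_changeset = "b" * 40  # placeholder ID of the commit for the other changeset

original = commit_id(tree, [changeset_3], "my change")
reparented = commit_id(tree, [your_changeset], "my change")
print(original != reparented)  # True: same content, different parent, different ID
```

In practice the tree would usually differ as well once the other changeset's files are included, which is one more reason the original ID can't survive.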

The way it is, I'll never even see your changes until there's a new changeset and I can fetch that, since git-tf's high water mark already includes the latest changeset on the server. So it turns out that not locking can be dangerous after all.
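The lost changeset can be reproduced in a few lines. This is a minimal sketch under the same simplifications as the description above: `unlocked_checkin` is a hypothetical function, the server history is a plain list where changeset N is entry N, and a fetch is assumed to ask only for changesets newer than the high water mark.

```python
def unlocked_checkin(history, high_water_mark, my_change, sneaked_in=None):
    """Toy version of a checkin with --no-lock; returns the new high water mark."""
    # Step 3: confirm the high water mark is the latest changeset.
    if len(history) != high_water_mark:
        raise RuntimeError("fetch and merge first")
    # Race window: another client checks in between steps 3 and 4.
    if sneaked_in is not None:
        history.append(sneaked_in)
    # Step 4: check in my change.
    history.append(my_change)
    # Step 5: the high water mark becomes the changeset just created.
    return len(history)


history = ["c1", "c2", "c3"]   # server history; changeset N is history[N - 1]
in_my_graph = {1, 2, 3}        # changesets present in my git repository

mark = unlocked_checkin(history, 3, "my change", sneaked_in="your change")
in_my_graph.add(mark)

to_fetch = list(range(mark + 1, len(history) + 1))  # what a fetch would ask for
missing = [n for n in range(1, len(history) + 1) if n not in in_my_graph]
print(mark, to_fetch, missing)  # 5 [] [4]
```

The high water mark says 5, a fetch finds nothing newer, and changeset 4 is nowhere in my graph.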

Of course, most people don't have so much churn in their code tree that this is an issue. So you could just check the history in TFS to make sure that nobody else snuck in a new changeset in those few seconds. And if they did, of course, you could just stick a new dummy changeset in TFS so that you could pull the latest version that has the other changes, too. But this isn't exactly an ideal workflow.

So the question, of course, becomes: can we avoid taking a lock, but stay out of danger? We think so. We spun up a conversation over on our CodePlex site about this - so if you're interested in this topic, we'd love your feedback.