Locking and git-tf and... danger?

August 28, 2012

If you haven't yet seen or heard about the new git-tf tool from Microsoft, this blog post probably won't make any sense. So... go check it out over at gittf.codeplex.com. It's okay, I'll wait.

Okay, got it? Yeah, I know, it's cool, right? Well, it is, but like any software project, it has a few problems. One of them is in the whole git-tf checkin process and how it needs to take a lock on your tree before it can continue. I mean, locks are annoying, right? If git-tf requires a lock to do a checkin, then that's going to fail if somebody else has already taken a lock somewhere in that folder. And, of course, the default in TFS is to take a lock on binaries, so if you've got some of those in your tree then git-tf is going to complain all the time when you try to check in.

It's even worse when you run into this the first time and decide that we can't have been so obnoxious as to force you to take a lock. Surely those developers put in some no-lock option, maybe even called --no-lock, that will avoid locking. And we did! With an appropriately scary warning in the checkin help:

--no-lock Does not take a lock on the server path before committing (dangerous)

...you can see why this is a problem. I mean, it's dangerous?! Who would want to use that!?

Okay, let's back up and explain why we decided to take this lock.

Team Foundation Server's version control model has good concurrency. Let's assume that you and I are both working on the latest version and we both edit some files. If we both hit the checkin button at the same time, this is a classic race condition, but a harmless one. Our checkins get serialized and, assuming we haven't edited any of the same files and produced a conflict, it really doesn't matter much which one of us wins this race and which one loses. Both of our changesets get checked in, and the next time we do a get latest, we get the other person's changes, too. Things get a little more complicated if we edit the same file, but not much. The winner of this race gets their changes checked in; the loser has to resolve a conflict and then check in again. This is pretty classic version control behavior and a well-understood problem.
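
To make that behavior concrete, here is a toy model in Python. It is a hypothetical illustration, not how TFS is actually implemented: the server numbers changesets sequentially, remembers which changeset last touched each file, and rejects a checkin when one of its files has changed since the changeset the client started from. `ToyServer`, `ConflictError` and the file names are all made up for the example.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """A file being checked in changed after the client's base changeset."""


@dataclass
class ToyServer:
    # Hypothetical model, not the real TFS data model: the newest changeset
    # number plus, for each file, the changeset that last modified it.
    latest: int = 0
    last_changed: dict = field(default_factory=dict)

    def checkin(self, base: int, paths: list) -> int:
        """Check in edits to `paths` that were made against changeset `base`."""
        for path in paths:
            if self.last_changed.get(path, 0) > base:
                # Someone else changed this file after our base: the loser
                # of the race has to resolve the conflict and try again.
                raise ConflictError(path)
        self.latest += 1
        for path in paths:
            self.last_changed[path] = self.latest
        return self.latest


server = ToyServer()
print(server.checkin(base=0, paths=["a.cs"]))  # my checkin wins: prints 1
print(server.checkin(base=0, paths=["b.cs"]))  # yours, other file: prints 2
try:
    server.checkin(base=0, paths=["a.cs"])     # same file, stale base
except ConflictError as err:
    print("conflict on", err)                  # prints "conflict on a.cs"
```

The model handles one call at a time, which stands in for the server serializing simultaneous checkins: whichever call arrives second is the loser of the race, and it only fails when it touched a file the winner also changed.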

git, of course, also has good concurrency. Let's assume that you and I are both on HEAD and we both edit some files. If we both commit to our local git repositories, we now both have new commits that share a common parent. At some point, we'll need to merge this branch in the graph, but this is no big deal. Again, a race condition that doesn't matter.
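
On the git side, the same situation is just a fork in the commit graph. A minimal Python sketch, with made-up commit names standing in for real commit IDs: each commit maps to its list of parents, the two new commits share the old HEAD as a common ancestor, and merging simply adds a commit with two parents.

```python
def ancestors(graph: dict, commit: str) -> set:
    """Return `commit` plus every commit reachable through its parents."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


# Hypothetical commit names; each one maps to its list of parent commits.
graph = {"head": [], "mine": ["head"], "yours": ["head"]}

# Both new commits share the old HEAD as a common ancestor...
common = ancestors(graph, "mine") & ancestors(graph, "yours")
print(common)  # prints {'head'}

# ...and resolving the fork is just a new commit with two parents.
graph["merge"] = ["mine", "yours"]
```

Neither commit has to be rewritten to do this, which is why the race is harmless in plain git.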

git-tf, on the other hand, can't cope well with this race condition. (It might be a bit much to have called it dangerous, but that little word scared you enough to read this blog post, so I think we'll stick with that term for a little while longer.) Here's how git-tf checkin works, in a nutshell:

1. Finds the latest TFS changeset that you've fetched and merged into your graph. This is the "high water mark".
2. Takes a lock on the TFS path you're bridging. Unless, of course, you're living dangerously and using that crazy --no-lock flag.
3. Ensures that this high water mark is actually the latest TFS changeset. If not, you will be told to fetch that changeset and merge.
4. Pends the changes to TFS that are representative of the changes in your git commit and checks them in.
5. Updates the aforementioned "high water mark" to the changeset that I just checked in.
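
Those steps can be sketched in Python against a toy server. Everything here is hypothetical: `ToyTfsPath`, `BridgeState`, `git_tf_checkin` and its `no_lock` parameter are illustrative stand-ins, not git-tf's real code or a real TFS API, and the lock is modelled as a single lock over the whole bridged path that blocks anyone else's checkin.

```python
from dataclasses import dataclass
from typing import Optional


class LockedError(Exception):
    """Somebody else holds the lock on the bridged path."""


class NeedsFetchError(Exception):
    """The server has changesets newer than the high water mark."""


@dataclass
class ToyTfsPath:
    # Hypothetical stand-in for one TFS server path; not a real TFS API.
    latest: int = 3
    lock_owner: Optional[str] = None

    def lock(self, owner: str) -> None:
        if self.lock_owner not in (None, owner):
            raise LockedError(self.lock_owner)
        self.lock_owner = owner

    def unlock(self, owner: str) -> None:
        if self.lock_owner == owner:
            self.lock_owner = None

    def checkin(self, owner: str) -> int:
        # A lock held by anyone else blocks this checkin.
        if self.lock_owner not in (None, owner):
            raise LockedError(self.lock_owner)
        self.latest += 1
        return self.latest


@dataclass
class BridgeState:
    high_water_mark: int  # latest changeset fetched and merged into git


def git_tf_checkin(server: ToyTfsPath, state: BridgeState,
                   no_lock: bool = False) -> int:
    me = "git-tf"
    mark = state.high_water_mark           # 1. find the high water mark
    if not no_lock:
        server.lock(me)                    # 2. lock the path (unless no_lock)
    try:
        if server.latest != mark:          # 3. is the mark still the latest?
            raise NeedsFetchError(f"fetch changeset {server.latest} and merge")
        new = server.checkin(me)           # 4. pend the changes and check in
        state.high_water_mark = new        # 5. advance the high water mark
        return new
    finally:
        if not no_lock:
            server.unlock(me)


server = ToyTfsPath(latest=3)
state = BridgeState(high_water_mark=3)
print(git_tf_checkin(server, state))  # prints 4
```

With the lock held from step 2 through step 4, no other client can check in between the check in step 3 and the checkin in step 4, which is exactly the window the lock exists to close.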

Let's take another look at this race condition once git-tf gets in the picture. Imagine that I'm using git-tf and you're using some other TFS client like Visual Studio or the TFS plug-in for Eclipse. The latest version is changeset 3, and I've made a git commit on top of that. My git repository has the commit for changeset 3, with my commit, 4fac65..., as its child.

I'm ready to git-tf checkin these changes, and I'll do so without a lock. Meanwhile, you've also made some changes against TFS and you're going to check in at the same time. If you happen to sneak your changeset in between steps 3 and 4 (above), then yours becomes changeset 4, mine becomes changeset 5, and I've updated my high water mark to my changeset... but I never got your changeset.
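
Here is that interleaving replayed as a tiny Python sketch, under simplifying assumptions: changesets are plain integers, every changeset touches the bridged path, and the `replay` function is hypothetical, not part of git-tf.

```python
def replay(sneak_in: bool):
    """Replay a --no-lock git-tf checkin; return (high water mark, missed)."""
    server = [1, 2, 3]               # changesets on the bridged path
    fetched = {1, 2, 3}              # what my git repository already has
    mark = max(fetched)              # step 1: the high water mark is 3
    # Step 2 is skipped: no lock, so nothing stops other clients.
    if server[-1] != mark:           # step 3: the check passes here
        raise RuntimeError("fetch and merge first")
    if sneak_in:
        server.append(server[-1] + 1)  # your checkin lands: changeset 4
    server.append(server[-1] + 1)      # step 4: my checkin
    mine = server[-1]
    fetched.add(mine)                # my git commit now stands for it
    mark = mine                      # step 5: the mark moves to my changeset
    missed = [cs for cs in server if cs not in fetched]
    return mark, missed


print(replay(sneak_in=False))  # prints (4, [])
print(replay(sneak_in=True))   # prints (5, [4])
```

In the second run the high water mark says 5, so git-tf believes it is up to date, yet changeset 4 never made it into the git repository.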

In a perfect world, I would pull your changeset down as a git commit whose parent is the commit for changeset 3, and my git commit 4fac65... would have your changeset as its parent. This keeps the linear model of TFS changesets that git-tf strives so hard for. Except that if I do that, my commit can't be 4fac65... anymore; it gets a completely new ID, because a git commit's ID is computed from its contents, and those contents include its parents. Ouch.
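
Why a new parent means a new ID can be seen by hashing a commit object by hand. This Python sketch builds the text of a git commit object (`tree`, `parent`, `author` and `committer` lines, a blank line, then the message), prefixes git's `commit <size>\0` header and takes the SHA-1. The tree ID, parent IDs, identity and timestamp are all made up; only the parent line differs between the two hashes.

```python
import hashlib

# Made-up tree ID, identity and timestamp, held fixed so only the parent varies.
TREE = "a" * 40
IDENT = "A U Thor <author@example.com> 1346112000 -0700"


def commit_id(tree: str, parents: list, message: str) -> str:
    """SHA-1 of a git commit object built from the given fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {IDENT}", f"committer {IDENT}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    header = f"commit {len(body)}\0".encode()  # git's "<type> <size>\0" prefix
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical IDs for the git commits that represent changesets 3 and 4.
CS3 = "3" * 40
CS4 = "4" * 40

original = commit_id(TREE, [CS3], "My change")   # my commit on top of 3
rewritten = commit_id(TREE, [CS4], "My change")  # same change on top of 4
print(original == rewritten)  # prints False: new parent, new ID
```

Same tree, same author, same message; moving the commit onto a different parent is still enough to produce a completely different ID.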

The way it is, I'll never even see your changes until there's a new changeset and I can fetch that, since git-tf's high water mark already includes the latest changeset on the server. So it turns out that not locking can be dangerous after all.

Of course, most people don't have so much churn in their code tree that this is an issue. So you could just check the history in TFS to make sure that nobody else snuck in a new changeset in those few seconds. And if they did, of course, you could just stick a new dummy changeset in TFS so that you could pull the latest version that has the other changes, too. But this isn't exactly an ideal workflow.
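
That manual history check amounts to a simple filter. In this hypothetical sketch, `path_history` is assumed to be the changeset numbers from the TFS history of the bridged path (however you look them up), `checked_against` is the high water mark git-tf compared with before checking in, and `mine` is the changeset my checkin produced. Anything strictly in between is a changeset that git-tf skipped over.

```python
def snuck_in(path_history, checked_against, mine):
    """Changesets on the bridged path that landed after the high water mark
    git-tf checked against but before my own checkin."""
    return sorted(cs for cs in path_history if checked_against < cs < mine)


# The race from above: I checked against 3, you got 4, I got 5.
print(snuck_in([1, 2, 3, 4, 5], checked_against=3, mine=5))  # prints [4]
```

A non-empty result is the signal to check in a dummy changeset so that git-tf has something new to fetch.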

So the question, of course, becomes: can we avoid taking a lock but stay out of danger? We think so. We spun up a conversation over on our CodePlex site about this, so if you're interested in this topic, we'd love your feedback.