Edward Thomson

(Re)introducing git-dad

June 18, 2017  •  8:18 PM

After I dropped the git-recover script, @MordodeMaru asked me on Twitter if we could have a git dad command to help you out when you're in a jam:

But when I started thinking about my stepdad and his banter with his coworkers, I thought that if a git dad command was really going to help you out, it would help you when you mistyped the git add command. It could bring you a bit of levity… a dad joke!

And what better day than Father's Day to make that happen!

After I dropped the git-recover script, @MordodeMaru asked me on Twitter if we could have a git dad command to help you out when you're in a jam:

But when I started thinking about my stepdad and his banter with his coworkers, I thought that if a git dad command was really going to help you out, it would help you when you mistyped the git add command. It could bring you a bit of levity… a dad joke!

And what better day than Father's Day to make that happen:

Now when you mistype git add as git dad, it will still add your file to the index, but it will also give you the prize of a dad joke.

All you have to do is grab git-dad and put it in your PATH.

On Dad Jokes and Calculus

I'd love to claim credit for this wonderful addition to the Git ecosystem, but just as I was getting ready to publish this, I did a quick search for "git dad" and I realized that Tim Petterson had already come up with the idea.

And, honestly, I would like to claim that I just happened to have the same idea. That this was totally independent discovery, like Calculus (and almost as important a contribution to humanity). But the truth is that I probably heard him talking about it. Perhaps it was in his awesome talk at Git Merge about aliases this year. Anyway, I'm sure that somewhere I got the idea from him and it stuck in my head, lying dormant until it was resurrected on Twitter.

But why would we need a second version of git dad? Surely one is enough.

You'll notice that this solution is a bit different than his solution, though. If you have an alias that starts with a bang (!) it will execute a non-Git command. (Normally, a Git alias just invokes another Git command; starting your alias with a ! allows you to invoke any command.)

But, if you have an alias that runs a non-Git command, then the alias will only be executed from the root of the repository's working directory. So an alias for !git add will work from the root of the repository's working directory, but not if you're inside some folder beneath that.

Using a script instead of an alias will solve this problem.

I did like his idea of using icanhazdadjoke.com instead of hardcoding some dad jokes. It's a bit slower than if they were hardcoded, but let's face it, that extra time spent is totally worth it to have a fresh, neverending supply.

Happy father's day!

Introducing git-recover

June 15, 2017  •  2:04 PM

I'm old enough to remember the old Norton UNERASE command: it was part of the old Norton Utilities for MS-DOS. It made clever use of the FAT filesystem to find files that were recently deleted, show them to you and let you undelete them.

git-recover brings that idea to your Git repositories.

Every time you add a version of a file to your git repository - that is to say, every time you run git add - Git will put a copy of that file in its object database. That means that if you accidentally delete a file that you were working on, if you ever ran git add on it, you can probably recover it.

Tell me more

I thought you'd never ask!

(Wait, you didn't? If you really aren't interested in the nitty gritty of how Git manages the index, then I guess you can skip this section. But who isn't interested in that!?!)

Git's index is a "staging area" that will become the next commit. If you recall from my discussion about how Git works, a commit in Git is a snapshot of the entire repository at a single point in time. And the index is also a snapshot: it contains a list of all the files in the repository that will make up the next commit.

You can see this if you look at the index, and Git provides a tool to do just that: git ls-files --stage. When I've just cloned a repository:

% git clone /tmp/foo_repo .
% git ls-files --stage
100644 6af0abcdfc7822d5f87315af1bb3367484ee3c0c 0   foo.txt

And when I add a new file to this repository, I can inspect the index again, and will see the new file:

% git add bar.txt
% git ls-files --stage
100644 ce013625030ba8dba906f756967f9e9ca394464a 0   bar.txt
100644 6af0abcdfc7822d5f87315af1bb3367484ee3c0c 0   foo.txt

Note that the entry for bar.txt contains the object ID of the file. When you run git add, Git actually adds the file to its object database, and takes the resulting object ID (the SHA-1 hash of the file) and places that in the index.

You can see the file on disk - Git has added it to the repository as a loose object:

% ls -Flas .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
4 -r--r--r--  1 ethomson  staff  21 14 Jun 23:58 .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a

So Git has prepared this new file for our commit. But what if we don't commit this file? What if, instead, we git rm it? Or if we make some more changes to bar.txt and add those instead?

% echo "different changes" > bar.txt
% git add bar.txt

Now we've overwritten our original changes to bar.txt:

% git ls-files --stage
100644 4a95512212b2f24397fe2df5a2554935bd0a032a 0   bar.txt
100644 6af0abcdfc7822d5f87315af1bb3367484ee3c0c 0   foo.txt

You can see that the object ID for bar.txt had changed - reflecting our new file. But what's happened to the original file we added? Where is object ce01362?

It's still in our object database:

% ls -Flas .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
4 -r--r--r--  1 ethomson  staff  21 14 Jun 23:58 .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a

But we never committed it, so this object is not pointed to by any commit in the graph. Nor is it in our index anymore. This unreference blob is "garbage" and - eventually - Git will garbage collect it.

But until it does, we can recover it!

Using git-recover

The simplest way to use git-recover is to use it in interactive mode: just run git recover -i. It will show you the first few lines of each "orphaned" file - those that were once git added to the repository but were never committed - and let you recover them (or not).

% git recover -i
Recoverable orphaned git blobs:

61c2562a7b851b69596f0bcad1d8f54c400be977  (Thu 15 Jun 2017 12:20:22 CEST)
> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
> tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
> veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
> commodo consequat. Duis aute irure dolor in reprehenderit in voluptate

Recover this file? [y,n,v,f,q,?]: 

You can also run git-recover without any arguments, and it will show you all the "orphaned" blobs that you can recover. You can then inspect an object to decide if it's something that you're interested in (using git show).

% git recover
Recoverable orphaned git blobs:

61c2562a7b851b69596f0bcad1d8f54c400be977  Thu 15 Jun 2017 12:20:22 CEST

When you find the object that you want to recover, you can run git-recover <objectid> to pull it out of the object database and write it to disk.

You can specify the filename to write with the (optional) -f flag:

% git recover 61c2562 -f greeking.txt
Writing 61c2562: greeking.txt.

Specifying the filename is helpful, because you may have rules set up in your .gitattributes file on a per-file or per-file extension basis. Using the -f flag will make sure that these rules are executed.

How to get it

git-recover is a shell script - you can just download it and go.

When you put git-recover in your PATH, then it becomes a proper git command, and you can run git recover (notice the space instead of the dash).

Please open an issue or a pull request if you have problems or improvements.

Git Conditional Includes

June 6, 2017  •  1:29 PM

One of the features that slipped quietly into Git 2.13 is the notion of "conditional includes" for configuration files, and it's one of my favorite new features. Although it's really simple, it's also extremely important for people who work both on Enterprise software and open source projects.

When I work on my open source projects, I end up pushing them to GitHub - and I have to authenticate when I do, so GitHub knows that I'm ethomson@edwardthomson.com. And when I'm working at my day job, I push my repositories to Visual Studio Team Services - and I have to authenticate there, too. So VSTS knows that I'm ethomson@microsoft.com.

But since Git is a distributed version control system, there's no authentication when I'm working in my local repository. When I commit, my changes go into the repository on my computer. That means that I need to tell Git who I am - my name and my email address - and Git will dutifully record that information the commits as I create them.

The problem here is that now I'm on my own to manage the settings for my name and email address - and I want to make sure that I keep my day job separate from the work I do in open source. I don't want to use my work email address on my open source projects because I don't want anybody to run git log and think that it's a project that's sponsored by my employer. And if I ever change jobs, the email address in the repository's history will stop working.

More importantly, I don't want to use my personal email address in our corporate Git history. People tend to get a little jumpy when they see that somebody has checked in code and their email address doesn't end in microsoft.com. And although my personal address is pretty boring, I imagine this is especially important for people who's email addresses are a bit more "creative". Otherwise, you may show up at work on monday morning with a clever new nickname. (I'm talking to you, reeferman42@example.com.)

So I try to make sure to keep my professional work separate from my personal work.

Before Git 2.13 introduced conditional includes, I would have had to set up my corporate email address in my global Git configuration, and then remember to change it in any new repository where I want to use my personal email address. Unfortunately, it's really easy to forget to do this when all you want to do is clone a repository and quickly create a quick pull request. That means that it's easy to accidentally end up pushing a change with the wrong identity.

With conditional configuration includes, you have a lot more control over how your Git configuration is applied. You can set up some configuration to be applied based on the directory that you're in, so it's much easier to set up.

I keep all my libgit2-related repositories in one directory: C:\LibGit2. This is where libgit2, LibGit2Sharp, Rugged and the like live; all my other open source projects live in C:\Projects. Every other repository on my machine is work-related. So my global Git configuration (C:\Users\ethomson\.gitconfig on my Windows machine, or /Users/ethomson/.gitconfig on my Mac) looks like:

[user]
    name = Edward Thomson
    email = ethomson@microsoft.com
[includeIf "gitdir:C:/LibGit2/"]
    path = .gitconfig.oss
[includeIf "gitdir:C:/Projects/"]
    path = .gitconfig.oss

And I have a second file in my home directory, .gitconfig.oss that specifies only the configuration that I want to use when I'm hacking on open source projects, like libgit2:

[user]
    email = ethomson@edwardthomson.com

Note that the trailing slash is required, and on Windows, you need to specify your path using forward slashes - not backslashes. I make sure to keep these updated in my dotfiles repository, so that I won't forget them on a new machine.

Now when I'm working on a project that lives beneath C:\Projects, like my little hexdump utility that lives in C:\Projects\HexDump - or when I'm working on LibGit2Sharp, which lives in C:\LibGit2\LibGit2Sharp, then Git will set my identity as ethomson@edwardthomson.com.

For everything else - like the Visual Studio Team Services repository - it will use my work email address, ethomson@microsoft.com, ensuring that nobody at work gives me an embarassing nickname.

At least, if they do, it won't be because of my email address.

Managing Dotfiles with Git

March 28, 2017  •  7:15 PM

Professional cooks have a term: "mise en place", which translates literally to "everything in its place". In a kitchen, it means that your station is prepared and well-organized; all the ingredients for every dish that you cook are prepared and set out in front of you so that they're ready to use. But perhaps most importantly: your station is clean, because professional cooks have another phrase:

Messy station equals messy mind. Clean station equals clean mind.

This idea extends far beyond the kitchen and - yes - even into software engineering and IT. Don't believe me? Take a look at your computer… Unless you're excruciatingly well-organized, you've probably got a junk folder somewhere. I call mine "Temp" on the very optimistic idea that I'm just throwing things in there for a little while until I find the right place for them.

But the reality is that most things never leave; they just accumulate.

Just like in a kitchen, all this crud and all these little annoyances take a mental tax and make you less productive. That's why one of my guilty pleasures is to "repave" my machine, reinstalling everything onto a freshly formatted drive with no trace of the old detritus, then carefully reinstalling only the apps that I'm interested in and copying over only the data that's worth keeping.

I try to do this every year or two, both to my main personal machine and my main work machine. And it turns out that I just finished this up, since last week I had a little free time on my hands:

At a high level, this process is pretty straightforward: I back up my home directory (twice), format the drive, reinstall the operating system, and carefully copy the contents of my home directory back, tidying as I go. On the whole, this is pretty tedious, except for one set of files: my dotfiles.

Years of working with large networks of Unix machines has taught me to version control my dotfiles so that I can get up and running on any new machine quickly. I keep my dotfiles checked in to a git repository, except for the truly important ones - the ones that I need to keep secure, like my SSH keys - which I keep with me.

SSH Keys

When I'm setting up a new workstation, I start by copying over my SSH keys since I'll actually need them in order to clone the rest of my dotfiles. (I skip this step if I'm setting up a new account on a remote machine, since I'll just use SSH Agent Forwarding to keep my keys on my local machine.)

I keep my most secure bits on a USB key that I keep with me. Most USB keys have some sort of attachment for a keyring - hence the name. Maybe it's a little lanyard, or a metal ring, or even a plastic tab of some sort.

USB keys that will inevitably fall off your keychain
These little keychain hooks are all going to break and your SSH keys will be left behind somewhere.

But just because you can attach these to a keyring doesn't mean that you should attach these to a keyring. That little hook is going to fall apart over time as you pull your keys out of your pocket or they jostle around in your purse, and one day your USB key isn't going to be attached to your keyring anymore. And then you've lost a copy of your SSH private key.

What I use is a USB key that is made from a hunk of aluminum:

A USB Key that might not fall apart
My keychain will fall apart before this USB key does. This one was made by Lacie, but they discontinued it a few years ago. Now Samsung produces one and there's a generic one as well.

This is much, much less likely to fall off my keychain unexpectedly. Even with that in mind, of course, I still turn on full-disk encryption to protect me in case I accidentally leave my keys laying around after a night out at the pub.

This particular key is an encrypted HFS+ partition for Mac OS, but I have a separate key for setting up new Windows machines, which is encrypted with Bitlocker To Go. (But more on that later.)

The Rest of the Dotfiles

After copying over my SSH public and private key into my home directory, I run a script that clones my dotfiles repository from my personal Visual Studio Team Services account. (I use VSTS for private repositories and GitHub for all my open source repositories.) I keep this on my USB key so that it's handy:

#!/bin/sh
git clone --separate-git-dir=.dotfiles.git ethomson.visualstudio.com:DefaultCollection/personal/_git/dotfiles .
rm .git
echo '*' > .dotfiles.git/info/exclude

This script is pretty simple, but there are a few odd things going on:

  1. It uses the --separate-git-dir option to git clone. It does this so that my git repository in my home directory is called .dotfiles.git, instead of the usual name, .git. This is important so that when I run git somewhere in my home directory it doesn't accidentally do work inside my dotfiles repository.

    In particular, I might be somewhere beneath my home directory - say in ~/src/my_new_project - and run git status. If I haven't yet inited a repository for my new project, then I'll actually see the status for the git repository in my home directory that controls my dotfiles. This is confusing at best; it's best to make this a separate git directory instead.

  2. This removes the .git file that git clone creates. When I clone with a separate-git-dir, git will helpfully set up a .git file that points to that separate git dir so that future invocations of git can find it. This would be useful if my goal was to put the git directory on another device but still have it work transparently as if it were a normal git repository.

    That transparency is exactly the behavior I don't want, though. By removing this file, I'll get the isolation that I wanted in step 1, but I'll have to specify the git-dir every time I run git to work on my dotfiles. (More on that below.)

  3. This sets up an info/exclude in the dotfiles git repository with a * wildcard. This means that all untracked files in my home directory will be ignored. As a result, I will have to explicitly add new files to my dotfiles repository. Without this, git status would show all the files underneath my home directory which is very noisy and - worse - makes it very easy to accidentally add things like my SSH keys.

    I could also accomplish this with a .gitignore, but then I've got a .gitignore hanging out in my home directory and the whole point of repaving my machine was to clean up the unnecessary cruft in my home directory.

Managing Dotfiles

As a consequence of using a separate git directory for my dotfiles, I can't simply run git commands - like git status - in my home directory. If I do, git will tell me that it can't find a repository:

fatal: Not a git repository (or any of the parent directories): .git

Instead, I need to explicitly pass the --git-dir option to git so that it can find my repository. The easiest way to do that is to set up an alias in my shell's startup configuration.

I have this in my .zshrc:

alias dotfiles="git --git-dir=$HOME/.dotfiles.git"

Now instead of running git status, I run dotfiles status:

On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   .zshrc

no changes added to commit (use "git add" and/or "git commit -a")

And it's straightforward to dotfiles add .zshrc, dotfiles commit and finally dotfiles push to get my changes back up to my hosted git repository. I simply replace git with dotfiles in any command I want to run when I want to work on my dotfiles repository.

The only thing to remember is that - by virtue of my info/excludes above - I'm ignoring all the files in my home directory that aren't already part of the repository. This means that when I create a new file in my home directory, git ignores its presence:

On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean

This is generally what I want; most of my home directory probably shouldn't be under version control or pushed to every computer that I use. But when I do create a file that I actually want to push to my dotfiles repository, it's easy for me to add:

dotfiles add -f .dotfile

The -f flag is necessary to override the info/excludes file. If I forget it, git will remind me. Once the file is staged, now git will begin tracking it, and dotfiles status will show:

On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	new file:   .dotfile

This straightforward setup lets me get my home directory configured on a new machine quickly and easily, and if I make any changes on that machine, I can share them with my home directory on every computer just by pushing and pulling.

The Git Contributor Summit

February 5, 2017  •  5:20 PM

The annual Git Merge conference just wrapped up, and it was another exciting year. As always, the speakers were excellent, the training was informative, and the after-party was a blast. But my favorite part was a part of Git Merge that most people don't see: the Git Contributor Summit.

The Git Contributor Summit is held in conjunction with Git Merge, and it allows the people who work on the git project, and related projects like libgit2 and jgit, to get together and talk shop.

The most important part of the Git Contributor Summit, I think, is the face-to-face time that it enables. This is a rare pleasure for most of us; the git contributors tend to chat on the mailing list, the libgit2 contributors talk in GitHub issues or in our Slack room, and the jgit contributors… well, I don't contribute to jgit, but I assume they have all the same problems that the rest of us do where we start to lack the personal touch.

And the personal touch is critical. When I first started working on the libgit2 project, I suggested bringing in some dependency on another project… and the maintainer wasn't enthused by this idea. In fact, he hated it. He hated the idea so much that he threatened to fly to the United States and murder my family if I actually merged that change.

Needless to say: we didn't really get off on the right foot.

But not too long after that, we met at a Contributor Summit and I discovered that he's actually a lovely, friendly person… he just happens to have a bit of an odd sense of humor. And so I knew that the next time he threatened to murder my family that he was just kidding.

Of course, it's not all just social time. Jeff King is part of the team of three maintainers who see to the actual business of the git project and he kicks off the contributor summit with that business. He catches us up on the health of the project as an entity and its budgetary and legal concerns.

But primarily we talk about the technical issues around git: the problems we have, and how we intend to fix them. Often these are discussions about scaling git in various directions, but we also talk about making git easier to use and, perhaps more importantly, easier to contribute to.

The amazing part about this is that we do this working together, despite having different interests and different needs. There are employees from git hosting companies, hackers who work on git in their spare time, and people who maintain git in their organization, whether that's a school, a small website or one of the world's largest news organizations.

And the hosting companies like GitHub, GitLab, Atlassian, and Microsoft all sit down together to talk about the problems they have in common and the ways they've each solved them, and they do it politely and without a hint of competition. There's no bragging about success or laughing at failures, just working together for the benefit of all our users.

It's an honor to be able to sit down with all the Git Contributors and work together on these solutions, and something I look forward to every year. Thanks so much to the organizers of Git Merge, and to the organizers and attendees of the Git Contributor Summit, for making it happen.