Introducing git-recover

June 15, 2017

I'm old enough to remember the old Norton UNERASE command: it was part of the old Norton Utilities for MS-DOS. It made clever use of the FAT filesystem to find files that were recently deleted, show them to you and let you undelete them.

git-recover brings that idea to your Git repositories.

Every time you add a version of a file to your git repository - that is to say, every time you run git add - Git will put a copy of that file in its object database. That means that if you accidentally delete a file that you were working on, if you ever ran git add on it, you can probably recover it.

Tell me more

I thought you'd never ask!

(Wait, you didn't? If you really aren't interested in the nitty gritty of how Git manages the index, then I guess you can skip this section. But who isn't interested in that!?!)

Git's index is a "staging area" that will become the next commit. If you recall from my discussion about how Git works, a commit in Git is a snapshot of the entire repository at a single point in time. And the index is also a snapshot: it contains a list of all the files in the repository that will make up the next commit.

You can see this if you look at the index, and Git provides a tool to do just that: git ls-files --stage. When I've just cloned a repository:

% git clone /tmp/foo_repo .
% git ls-files --stage
100644 6af0abcdfc7822d5f87315af1bb3367484ee3c0c 0   foo.txt

And when I add a new file to this repository, I can inspect the index again, and will see the new file:

% git add bar.txt
% git ls-files --stage
100644 ce013625030ba8dba906f756967f9e9ca394464a 0   bar.txt
100644 6af0abcdfc7822d5f87315af1bb3367484ee3c0c 0   foo.txt

Note that the entry for bar.txt contains the object ID of the file. When you run git add, Git actually adds the file to its object database, and takes the resulting object ID (the SHA-1 hash of the file) and places that in the index.

You can see the file on disk - Git has added it to the repository as a loose object:

% ls -Flas .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
4 -r--r--r--  1 ethomson  staff  21 14 Jun 23:58 .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a

So Git has prepared this new file for our commit. But what if we don't commit this file? What if, instead, we git rm it? Or if we make some more changes to bar.txt and add those instead?

% echo "different changes" > bar.txt
% git add bar.txt

Now we've overwritten our original changes to bar.txt:

% git ls-files --stage
100644 4a95512212b2f24397fe2df5a2554935bd0a032a 0   bar.txt
100644 6af0abcdfc7822d5f87315af1bb3367484ee3c0c 0   foo.txt

You can see that the object ID for bar.txt had changed - reflecting our new file. But what's happened to the original file we added? Where is object ce01362?

It's still in our object database:

% ls -Flas .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
4 -r--r--r--  1 ethomson  staff  21 14 Jun 23:58 .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a

But we never committed it, so this object is not pointed to by any commit in the graph. Nor is it in our index anymore. This unreference blob is "garbage" and - eventually - Git will garbage collect it.

But until it does, we can recover it!

Using git-recover

The simplest way to use git-recover is to use it in interactive mode: just run git recover -i. It will show you the first few lines of each "orphaned" file - those that were once git added to the repository but were never committed - and let you recover them (or not).

% git recover -i
Recoverable orphaned git blobs:

61c2562a7b851b69596f0bcad1d8f54c400be977  (Thu 15 Jun 2017 12:20:22 CEST)
> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
> tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
> veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
> commodo consequat. Duis aute irure dolor in reprehenderit in voluptate

Recover this file? [y,n,v,f,q,?]: 

You can also run git-recover without any arguments, and it will show you all the "orphaned" blobs that you can recover. You can then inspect an object to decide if it's something that you're interested in (using git show).

% git recover
Recoverable orphaned git blobs:

61c2562a7b851b69596f0bcad1d8f54c400be977  Thu 15 Jun 2017 12:20:22 CEST

When you find the object that you want to recover, you can run git-recover <objectid> to pull it out of the object database and write it to disk.

You can specify the filename to write with the (optional) -f flag:

% git recover 61c2562 -f greeking.txt
Writing 61c2562: greeking.txt.

Specifying the filename is helpful, because you may have rules set up in your .gitattributes file on a per-file or per-file extension basis. Using the -f flag will make sure that these rules are executed.

How to get it

git-recover is a shell script - you can just download it and go.

When you put git-recover in your PATH, then it becomes a proper git command, and you can run git recover (notice the space instead of the dash).

Please open an issue or a pull request if you have problems or improvements.