Advent Day 8: Git-LFS

December 8, 2018

This is day 8 of my Git Tips and Tricks Advent Calendar. If you want to see the whole list of tips as they're published, see the index.

Distributed version control systems like Git are revolutionary: instead of having to communicate with the server to roll back to a previous version, view the changes introduced in a particular version, or even make a commit, you can do it all locally. That's because when you clone a Git repository, you get every version of every file that's ever been checked in. That's why it's called cloning a repository: you get a complete copy from the server.

This is a huge productivity gain for working with versions of source code and text files and ... a terrible pain if you want to check in large files like images, movies or audio into your repository. That's because when you clone a repository, you'll get every version of those large binaries that was ever checked in, too.

Consider a PNG image that's seemingly not too big, maybe only 100 KB. That sounds harmless until you have 100 revisions of that file: now you're pulling down roughly 10 MB on every clone just to get its history.
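As a back-of-the-envelope check of that figure, here's a small Python sketch that multiplies it out. It assumes every revision is stored at full size, which is pessimistic but not unreasonable for already-compressed formats like PNG; the function name is made up for illustration.

```python
def history_size_bytes(file_size_bytes: int, revisions: int) -> int:
    """Worst-case bytes needed to keep every revision of one binary file.

    Assumes no savings from compression between revisions
    (an assumption for this estimate, not a measurement).
    """
    return file_size_bytes * revisions


KB = 1000
MB = 1000 * KB

total = history_size_bytes(100 * KB, 100)
print(f"{total / MB:.0f} MB")  # prints "10 MB"
```

The real number depends on how well your particular files compress, so treat this as an upper-bound estimate rather than what Git will actually store.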

Worse still, some hosting providers limit what you can store in your Git repository: GitHub, for example, doesn't let you push files larger than 100 MB.

The solution to this is the Git Large File Storage extension.

When you use Git LFS, you can still have large files in your project, and you can still manage them with Git. But instead of adding large files directly to the repository, you'll add them to a separate Git LFS storage area. What's added to the Git repository is a "stub" file or a "pointer" file that describes where to find the actual binary in that Git LFS storage area. A pointer file looks like this:

version https://git-lfs.github.com/spec/v1
oid sha256:7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730
size 4
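To make the layout of that pointer file concrete, here's a minimal Python sketch that builds and parses text in the same shape, using only the three fields shown above. The function names are hypothetical, and this is not how the Git LFS client itself is implemented; it just illustrates that the pointer is a content hash plus a byte count.

```python
import hashlib

# The version line exactly as it appears in the pointer file above.
LFS_SPEC_VERSION = "https://git-lfs.github.com/spec/v1"


def make_pointer(content: bytes) -> str:
    """Build pointer-file text for the given file content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        f"version {LFS_SPEC_VERSION}\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


def parse_pointer(text: str) -> dict:
    """Split pointer text into its key/value fields."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer = make_pointer(b"abc\n")  # any 4-byte file gives "size 4"
info = parse_pointer(pointer)
print(info["size"])  # prints "4"
```

Because the pointer is a few short lines of text no matter how big the real file is, keeping a hundred revisions of it in history costs almost nothing.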

This is helpful because when you clone a repository, the history you get contains only the stub files and none of the old binaries. When you check out a particular version, Git LFS downloads the binaries that version needs from the server's Git LFS storage on demand. So you'll only have to download the versions of the large files that you actually need, and not previous versions.

(Some of the old versions may be cached on your local disk so that you can move back and forth between branches without having to download the LFS files repeatedly.)
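Here's a toy Python model of that on-demand download and local cache. Everything in it (the class names, the in-memory dictionaries standing in for the server and the local disk) is an assumption made for illustration; it is not the Git LFS protocol, just the idea of fetching by content hash and remembering what you've already fetched.

```python
import hashlib


class LfsServer:
    """Stand-in for the remote LFS storage area: maps oid -> content."""

    def __init__(self):
        self.objects = {}
        self.downloads = 0  # counts how often a client had to fetch

    def upload(self, content: bytes) -> str:
        oid = hashlib.sha256(content).hexdigest()
        self.objects[oid] = content
        return oid

    def download(self, oid: str) -> bytes:
        self.downloads += 1
        return self.objects[oid]


class LocalCache:
    """Stand-in for the local cache of already-downloaded objects."""

    def __init__(self, server: LfsServer):
        self.server = server
        self.objects = {}

    def checkout(self, oid: str) -> bytes:
        # Only go to the server when the object isn't cached yet.
        if oid not in self.objects:
            self.objects[oid] = self.server.download(oid)
        return self.objects[oid]


server = LfsServer()
v1 = server.upload(b"logo, first draft")
v2 = server.upload(b"logo, final version")

cache = LocalCache(server)
cache.checkout(v2)       # first checkout: fetched from the server
cache.checkout(v2)       # second checkout: served from the cache
print(server.downloads)  # prints "1"
```

The old revision `v1` is never downloaded unless you actually check it out, which is the whole point.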

This is also good for the hosting providers: they can store your Git repository on storage media that's optimized for high I/O throughput, while putting your LFS data in blob storage that's optimized for large file reads.

Thankfully, this is all mostly transparent. Once you've installed Git LFS, you can work with files just like you normally do. And hosting providers like GitHub and Azure Repos will load your LFS data to show you your repository contents exactly as you'd expect.

The only caveat is that you need to set up Git LFS for large files before you add them to your repository. If you commit them first and turn on Git LFS afterwards, the old commits don't change: since Git history is immutable, you'll still have the large files in history (unless you take the trouble to rewrite it). So make sure you set up Git LFS before you start adding lots of large files.
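Setting up mostly comes down to telling Git which paths should go through the LFS filter. You'd normally do that by running `git lfs track` with a pattern, which records an entry in your `.gitattributes` file. As a sketch of what those entries look like, this Python snippet generates them for a few example patterns; the helper name and the list of patterns are made up for illustration.

```python
def lfs_attribute_line(pattern: str) -> str:
    """Return a .gitattributes entry that routes `pattern` through Git LFS."""
    return f"{pattern} filter=lfs diff=lfs merge=lfs -text"


# Example patterns only; pick the ones that match your own large files.
patterns = ["*.png", "*.mp4", "*.wav"]

gitattributes = "\n".join(lfs_attribute_line(p) for p in patterns) + "\n"
print(gitattributes, end="")
```

Commit `.gitattributes` before the large files themselves, so that they go into Git LFS from their very first version.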

Git LFS isn't critical if you're just checking in one version of a large file that never changes (after all, you probably need that version to build or run your application), but if you're checking in multiple versions of images, videos or audio files, Git LFS is a lifesaver.