Mirroring Git Repositories

July 22, 2019

One of the distinctive features of a DVCS like Git is that your repository is portable. Unlike a centralized version control system, where you get started by doing a checkout or a "get latest" of the remote code, with Git you do a clone. The name is carefully chosen: you are actually getting a full copy of the repository, with all the history, branches and tags.

As a result, it's very easy to move or copy a repository from one place to another. For example, you might want to do the early development of your project privately, then open source it on GitHub when you're ready to release.

And with simple cases, it is easy. You can git clone --mirror to get a clone of a remote repository with all the information, then take that and git push --mirror it to another location. The problem with that, though, is that git clone --mirror does too good a job and gets too much information.
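As a rough sketch, the simple case looks something like the following. The function name naive_mirror and its two arguments are illustrative placeholders; only git clone --mirror and git push --mirror are real Git commands.

```shell
#!/bin/sh
# Sketch of the naive approach: copy every ref from one repository to another.
# naive_mirror is a hypothetical helper name, not part of Git.
naive_mirror() {
    src=$1
    dst=$2
    workdir=$(mktemp -d) || return 1
    status=0

    # --mirror implies --bare and copies every ref under refs/ as-is.
    git clone --quiet --mirror "$src" "$workdir/repo.git" &&
    # --mirror pushes every local ref, and deletes refs on the target
    # that do not exist locally.
    git -C "$workdir/repo.git" push --quiet --mirror "$dst" || status=$?

    rm -rf "$workdir"
    return "$status"
}
```

Calling naive_mirror <source> <target> works well between plain repositories; the trouble starts when the source is a hosting provider that adds references of its own.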

First, some background: Git stores information about the structure of your repository in data structures called references. You may have seen these before: branches are stored in references named refs/heads/<branchname>, tags are stored in references named refs/tags/<tagname>, and notes are stored in the refs/notes namespace. When you mirror, Git tries to clone and then push all of these references since, after all, transferring them is the point of a mirror.
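To see these namespaces for yourself, you can ask a repository which references it advertises. This is a small sketch: list_refs is a hypothetical helper name, and the argument can be a URL or a local path.

```shell
#!/bin/sh
# Print the name of every reference a repository advertises.
# list_refs is a hypothetical helper name, not part of Git.
list_refs() {
    # ls-remote prints "<object-id><TAB><refname>"; --refs leaves out
    # HEAD and peeled tag entries, and cut keeps only the name column.
    git ls-remote --refs "$1" | cut -f2
}
```

Run against a hosted repository, the output typically includes names outside refs/heads, refs/tags and refs/notes, which are the provider's own references.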

Here's the problem: your hosting provider also stores information in references. For example, both GitHub and Azure Repos store information about pull requests in read-only references (under refs/pull). This is great in the general case, because it lets you download pull requests locally to review, build and debug them. But it's frustrating when you want to mirror, because you can't push to these special references.
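As an illustration of downloading a pull request locally: GitHub publishes the head of pull request N as refs/pull/N/head, so a sketch like the one below fetches it into a local branch. The helper name fetch_pull_request and the pr-N branch naming are assumptions made for this example, and other providers may lay out their references differently.

```shell
#!/bin/sh
# Fetch one pull request's head into a local branch, from inside a clone.
# fetch_pull_request and the "pr-<number>" branch name are hypothetical.
fetch_pull_request() {
    remote=$1   # remote name or URL, e.g. origin
    number=$2   # pull request number
    # Map the provider's read-only ref onto an ordinary local branch.
    git fetch --quiet "$remote" "refs/pull/$number/head:refs/heads/pr-$number"
}
```

For example, fetch_pull_request origin 7 would create a local branch named pr-7, assuming the remote has a pull request with that number.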

That means that if you just naively git clone --mirror from one GitHub repository, and then try to git push --mirror to another repository, your push will show a lot of errors about how you can't push those private, read-only references that are custom to GitHub.

Instead, you can fetch just the references that you care about: in particular, the branches (refs/heads), tags (refs/tags) and notes (refs/notes). By selecting only these sets of references, you won't fetch the private, read-only references, and you won't try to push them to the other repository.

Here's a script that can help. I call it mirror.sh:
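A minimal sketch of such a script, assuming a POSIX shell and a target that accepts force pushes, might look like the following. The function name mirror_refs and the temporary bare repository are choices made for this illustration; the three refspecs are the namespaces named above.

```shell
#!/bin/sh
# mirror.sh (sketch): copy branches, tags and notes from <source> to <target>,
# leaving provider-specific namespaces such as refs/pull behind.
# mirror_refs is a hypothetical function name.
mirror_refs() {
    src=$1
    dst=$2
    workdir=$(mktemp -d) || return 1
    status=0

    # A bare scratch repository holds the refs between fetch and push.
    # The leading "+" lets a ref be overwritten even if it is not a
    # fast-forward, which is what a mirror wants.
    git init --quiet --bare "$workdir" &&
    git -C "$workdir" fetch --quiet "$src" \
        '+refs/heads/*:refs/heads/*' \
        '+refs/tags/*:refs/tags/*' \
        '+refs/notes/*:refs/notes/*' &&
    git -C "$workdir" push --quiet "$dst" \
        '+refs/heads/*:refs/heads/*' \
        '+refs/tags/*:refs/tags/*' \
        '+refs/notes/*:refs/notes/*' || status=$?

    rm -rf "$workdir"
    return "$status"
}

# Run only when invoked with both arguments: mirror.sh <source> <target>
if [ "$#" -eq 2 ]; then
    mirror_refs "$1" "$2"
fi
```

Unlike git push --mirror, this sketch does not delete branches or tags on the target that have disappeared from the source; adding that behavior is left as a design choice.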

To mirror some remote repository <source> over to some other remote repository <target>, you can just run mirror.sh <source> <target>.