Mirroring Git Repositories
One of the distinctive features of a DVCS like Git is that it makes your repository portable. Unlike a centralized version control system, where you get started by doing a checkout or a "get latest" of the remote code, with Git you do a clone. This name is carefully chosen: you are actually getting a full copy of the repository, with all the history, the branches and the tags.
As a result, it's very easy to move or copy a repository from one place to another. For example, you might want to do the early development of your project privately, then open source it on GitHub when you're ready to release.
And with simple cases, it is easy. You can git clone --mirror to get a clone of a remote repository with all the information, then take that and git push --mirror it to another location. The problem with that, though, is that git clone --mirror does too good a job and gets too much information.
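For reference, the simple case can be sketched as a small shell function. The function name naive_mirror and the temporary working directory are illustrative assumptions; only git clone --mirror and git push --mirror come from the text above.

```shell
#!/bin/sh
# Naive mirror: copy every reference from one repository to another.
# naive_mirror is a hypothetical helper name; <source> and <target> are
# repository URLs or local paths.
naive_mirror() {
    src=$1
    dst=$2
    workdir=$(mktemp -d) || return 1

    # clone --mirror creates a bare clone that copies all refs as they are;
    # push --mirror then sends everything under refs/ to the target.
    git clone --quiet --mirror "$src" "$workdir/repo.git" &&
        git -C "$workdir/repo.git" push --quiet --mirror "$dst"
    status=$?

    rm -rf "$workdir"
    return "$status"
}
```

This works between plain Git repositories. It is the version that runs into trouble once a hosting provider adds its own references.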
First, some background: Git stores information about the structure of your repository in data structures called references. You may have even seen these before: branches are stored in references named refs/heads/<branchname>. Tags are stored in references named refs/tags/<tagname>. And notes are stored in the refs/notes namespace. When you mirror, Git tries to clone and then push all of these references since, obviously, transferring them is the point of the mirror.
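To see these references in a repository you already have, something like the following should work. The helper name show_refs is an assumption for illustration; git for-each-ref is the underlying command.

```shell
# Print each branch, tag and notes reference with the type of object it
# points at. show_refs is a hypothetical helper; the repository path
# argument defaults to the current directory.
show_refs() {
    repo=${1:-.}
    git -C "$repo" for-each-ref \
        --format='%(refname) %(objecttype)' \
        refs/heads refs/tags refs/notes
}
```

Each output line has the form "refs/heads/<branchname> commit", so the three namespaces are easy to tell apart.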
Here's the problem: your hosting provider also stores information in references. For example, both GitHub and Azure Repos store information about pull requests in read-only references. This is great in the general case, because it lets you download pull requests locally to review them, build them and debug them. But it's frustrating when you want to mirror, because these special references are read-only.
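As an illustration of downloading a pull request locally, here is a minimal sketch. It assumes the host publishes each pull request as refs/pull/<id>/head, which is GitHub's layout; other providers may name these references differently. The helper name fetch_pull_request and the default branch name pr-<id> are made up for the example.

```shell
# Fetch the head of pull request <id> from <remote> into a local branch.
# Assumes the refs/pull/<id>/head layout (GitHub's convention);
# fetch_pull_request is a hypothetical helper name.
# Run it from inside a clone of the repository.
fetch_pull_request() {
    remote=$1
    id=$2
    branch=${3:-pr-$id}
    git fetch --quiet "$remote" "refs/pull/$id/head:refs/heads/$branch"
}
```

For example, fetch_pull_request origin 42 would create a local branch named pr-42 that you can check out, build and debug.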
That means that if you just naively git clone --mirror from one GitHub repository, and then try to git push --mirror to another repository, your push will show a lot of errors about how you can't push those private, read-only references that are custom to GitHub.
Instead, you can fetch just the references that you care about: in particular, the branches (refs/heads), tags (refs/tags) and notes (refs/notes). By selecting only these sets of references, you won't clone the private, read-only references, and you won't try to push them back up to the other repository.
Here's a script that can help. I call it mirror.sh:
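A minimal sketch of such a script, built only from the approach described above, might look like this. The function name mirror_refs, the temporary bare repository, and the use of --prune (so that branches deleted from the source are also deleted from the target) are assumptions of this sketch, not necessarily what the original script does.

```shell
#!/bin/sh
# mirror.sh (sketch): copy only branches, tags and notes from <source>
# to <target>, skipping any provider-specific references.
# mirror_refs is a hypothetical helper name.
mirror_refs() {
    src=$1
    dst=$2
    workdir=$(mktemp -d) || return 1

    # Fetch the three namespaces into a scratch bare repository, then push
    # the same namespaces to the target. The leading "+" allows
    # non-fast-forward updates; --prune removes target refs in these
    # namespaces that no longer exist in the source.
    git init --quiet --bare "$workdir/repo.git" &&
        git -C "$workdir/repo.git" fetch --quiet "$src" \
            '+refs/heads/*:refs/heads/*' \
            '+refs/tags/*:refs/tags/*' \
            '+refs/notes/*:refs/notes/*' &&
        git -C "$workdir/repo.git" push --quiet --prune "$dst" \
            '+refs/heads/*:refs/heads/*' \
            '+refs/tags/*:refs/tags/*' \
            '+refs/notes/*:refs/notes/*'
    status=$?

    rm -rf "$workdir"
    return "$status"
}

# Run only when invoked with exactly two arguments: <source> <target>.
if [ "$#" -eq 2 ]; then
    mirror_refs "$1" "$2"
fi
```

Because the refspecs name each namespace explicitly, anything else the host stores under refs/ is never fetched and therefore never pushed. Drop --prune if you would rather not delete anything on the target.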
To mirror some remote repository <source> over to some other remote repository <target>, you can just run mirror.sh <source> <target>.