Friday, February 26, 2010

Stupid git tricks

I had a directory in CVS that I used as a catch-all for random scripts. (For example, this is where cpanspec lived before I moved it to SourceForge CVS.) Now that I'm using GitHub, I'm trying to split these scripts up into separate git repos. This is the procedure I've come up with...

First I use git cvsimport to pull in the whole CVS tree:
git cvsimport -d :ext:user@host:/cvsroot -C myscript cvs_module
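This will create a directory named myscript. Next, go into that directory and use git filter-branch to remove everything but the file(s) we care about (in this case, a single file that is also named myscript). Note that the find command below only looks at regular files in the top-level directory, which is all my CVS module had.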
git filter-branch --prune-empty --tree-filter 'find . -maxdepth 1 -type f \! -name myscript -delete' HEAD
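If you want to see what the filter does before running it on a real import, here's a rough sketch that tries it on a throwaway repository. Everything in it other than the filter-branch command itself is made up for the demo: the otherscript file, the "demo" committer identity, and the temporary directory. I've also given find an explicit starting directory, which some find implementations require.

```shell
#!/bin/sh
# Sketch: exercise the tree filter on a scratch repository.
set -e

# Newer git versions print a long warning before filter-branch runs;
# this variable silences it and is ignored by older versions.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Hypothetical identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q .

# Three commits: one touching both files, one touching only the
# unrelated file, one touching only the file we want to keep.
echo one > myscript
echo two > otherscript
git add myscript otherscript
git commit -q -m "add both scripts"
echo three >> otherscript
git commit -q -a -m "change otherscript only"
echo four >> myscript
git commit -q -a -m "change myscript"

# Same filter as above: delete every top-level file except myscript.
git filter-branch --prune-empty --tree-filter \
    'find . -maxdepth 1 -type f \! -name myscript -delete' HEAD

# Every file name that still appears anywhere in the rewritten history.
files_in_history=$(git log --name-only --pretty=format: | sed '/^$/d' | sort -u)
# Number of commits left; the otherscript-only commit became empty
# and was dropped by --prune-empty.
commit_count=$(git rev-list --count HEAD)

echo "files in history: $files_in_history"
echo "commits left: $commit_count"
```

The second commit disappears entirely because, once otherscript is gone, it no longer changes anything.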
This leaves some stale objects behind. To get rid of them, remove everything other than master in .git/refs/heads/, the entire .git/refs/original/ directory, and any unrelated tags in .git/refs/tags/ (that was enough in my example, which had no branches and such). Then clean up with a few git commands:
git gc --aggressive
git prune
git repack -a -d
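Instead of deleting files under .git/refs/ by hand, the same cleanup can probably be done with git's own plumbing. This is not what I ran above, just a sketch of an alternative: delete the backup refs with git update-ref, expire the reflog, and let git gc prune right away. The scratch repository, the otherscript file, and the "demo" identity are again invented for the example.

```shell
#!/bin/sh
# Sketch: drop filter-branch's backup refs and prune without touching
# .git/refs/ directly.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
# Hypothetical identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q .
echo one > myscript
echo two > otherscript
git add myscript otherscript
git commit -q -m "add both scripts"

git filter-branch --prune-empty --tree-filter \
    'find . -maxdepth 1 -type f \! -name myscript -delete' HEAD

# filter-branch saves the pre-rewrite history under refs/original/.
# Delete each of those refs so the old commits become unreachable.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
    git update-ref -d "$ref"
done

# The reflog still points at the old commits; expire all of it.
git reflog expire --expire=now --all

# Repack and prune unreachable objects immediately instead of
# waiting for the default grace period.
git gc --quiet --prune=now

leftover_refs=$(git for-each-ref refs/original/ | wc -l | tr -d ' ')
unreachable=$(git fsck --unreachable 2>/dev/null | grep -c '^unreachable' || true)

echo "backup refs left: $leftover_refs"
echo "unreachable objects: $unreachable"
```

Unrelated tags and extra branches would still need to be removed separately (for example with git tag -d and git branch -D) before the prune can reclaim their objects.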

The total number of objects listed by git gc and git repack should be much smaller than the original number git cvsimport reported. (I also confirmed that git fsck --unreachable doesn't find anything.)
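To compare object counts without reading them off the gc and repack output, something like the following might help. It's a small sketch built on git count-objects -v; the object_total helper name is my own invention, and the scratch repository is only there so the snippet runs on its own. In practice you would call the helper inside the imported repository before and after the cleanup.

```shell
#!/bin/sh
# Sketch: report the total number of objects (loose plus packed).
set -e

# Hypothetical helper: sums the "count" and "in-pack" fields printed
# by git count-objects -v for the repository in the current directory.
object_total() {
    git count-objects -v |
        awk '$1 == "count:" || $1 == "in-pack:" { n += $2 } END { print n + 0 }'
}

# Demo on a scratch repository with a made-up identity.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q .
echo one > myscript
git add myscript
git commit -q -m "add myscript"

# One commit, one tree, and one blob.
total=$(object_total)
echo "objects: $total"
```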

[Update] Apparently I had found this answer to my problem a while back and forgot about it. Oops.
