Friday, February 26, 2010

Stupid git tricks

I had a directory in CVS that I used as a catch-all for random scripts. (For example, this is where cpanspec lived before I moved it to SourceForge CVS.) Now that I'm using GitHub, I'm trying to split these scripts up into separate git repos. This is the procedure I've come up with...

First I use git cvsimport to pull in the whole CVS tree:
git cvsimport -d :ext:user@host:/cvsroot -C myscript cvs_module
This will create a directory named myscript. Next, go into that directory and use git filter-branch to remove everything but the file(s) we care about (in this case, myscript again).
git filter-branch --prune-empty --tree-filter 'find . -maxdepth 1 -type f \! -name myscript -delete' HEAD
This leaves some stale objects behind. To clean them up, remove everything other than master in .git/refs/heads/, the entire .git/refs/original/ directory, and any unrelated tags in .git/refs/tags/ (that was enough in my example, which had no branches and such), then finish with a few git commands:
git gc --aggressive
git prune
git repack -a -d

The total number of objects listed by git gc and git repack should be much smaller than the original number git cvsimport reported. (I also confirmed that git fsck --unreachable doesn't find anything.)

[Update] Apparently I had found this answer to my problem a while back and forgot about it. Oops.

Monday, February 22, 2010

Where do we go next with RAID?

So a friend of mine sent me a link to this blog post.  A couple of things jumped out at me...
When a drive fails in a 7 drive, 2 TB SATA disk RAID 5, you’ll have 6 remaining 2 TB drives. As the RAID controller is reconstructing the data it is very likely it will see an URE. At that point the RAID reconstruction stops.
And later...
RAID proponents assumed that disk failures are independent events, but long experience has shown this is not the case: 1 drive failure means another is much more likely.
That sounds an awful lot like what I've been saying for 8 or 9 years now...  (Well, not specifically about 2TB drives, but you know what I mean...  :-)
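
The "very likely" in the first quote is easy to sanity-check with some back-of-the-envelope arithmetic. This is only a rough sketch: the URE rate of 1 per 10^14 bits is an assumption (a figure commonly quoted on consumer SATA spec sheets, not something stated in the quote), errors are treated as independent per bit, and the function name is made up for illustration.

```python
import math


def prob_ure_during_rebuild(surviving_drives, drive_size_tb, ure_rate_per_bit=1e-14):
    """Chance of at least one unrecoverable read error (URE) while
    reading every surviving drive in full during a rebuild.

    ure_rate_per_bit is an ASSUMED spec-sheet value, not a measurement.
    """
    # Decimal terabytes -> bits that must be read to rebuild the array.
    bits_to_read = surviving_drives * drive_size_tb * 1e12 * 8
    # P(no URE) = (1 - rate) ** bits. log1p/expm1 keep precision for tiny rates.
    return -math.expm1(bits_to_read * math.log1p(-ure_rate_per_bit))


if __name__ == "__main__":
    # The quoted scenario: 7-drive RAID 5, one drive dead, 6 x 2 TB left to read.
    p = prob_ure_during_rebuild(surviving_drives=6, drive_size_tb=2)
    print(f"Chance of hitting a URE during the rebuild: {p:.0%}")
```

Under those assumptions the rebuild has to read about 9.6 x 10^13 bits, and the chance of hitting at least one URE comes out a bit over 60%. A drive with a better spec'd error rate changes the number a lot, so treat this as an illustration of the argument rather than a prediction.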

So the myth that I've been hearing for the last 15 years or so is that you get speed and data security with RAID 5.  The fact is that the speed of an intact array is terrible, and using the word "speed" with regard to a degraded array would be an oxymoron.  Add that to the odds of a failure of one of your "good" drives during a rebuild, and you get one big pile of fail.

The advantage of RAID 5 is capacity.  Period.  Any other RAID solution costs more in terms of raw storage capacity.  RAID 6 gives you one less drive of capacity in exchange for improving your odds of a successful rebuild, but as you all know, I still don't trust it for anything that we don't have a mirror of somewhere.

We've been doing a lot of RAID 1 and RAID 1+0, which is fine, but ultimately you have the same problem there: another drive is likely to fail while you're trying to rebuild the array.  You also get the added bonus problem that errors may go undetected.  It may kill performance, but the parity data on RAID 5 and 6 does give you an added safety net, since you can detect corrupted data.

For some of our largest arrays, we've been doing mirrored (or rsync'd) RAID 5 or 6, which, while extraordinarily wasteful in terms of storage space, gives us very good odds of recovery from catastrophic hardware failure.
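
To put rough numbers on the capacity trade-off between the layouts mentioned so far, here is a minimal sketch. It assumes n equal-sized drives, no hot spares, and (for the mirrored layouts) two identical arrays of n/2 drives each. The layout names and the function are hypothetical labels for illustration, not anything a RAID tool defines.

```python
def usable_drives(layout, n):
    """Usable capacity, in drives' worth, for n equal-sized drives.

    Layout names are made-up labels; n is assumed even wherever it is halved.
    """
    if layout == "raid5":
        return n - 1            # one drive's worth of parity
    if layout == "raid6":
        return n - 2            # two drives' worth of parity
    if layout == "raid10":
        return n // 2           # every drive has a mirror
    if layout == "mirrored_raid5":
        return n // 2 - 1       # two RAID 5 arrays of n/2 drives, one a copy
    if layout == "mirrored_raid6":
        return n // 2 - 2       # two RAID 6 arrays of n/2 drives, one a copy
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    n = 12
    for layout in ("raid5", "raid6", "raid10", "mirrored_raid5", "mirrored_raid6"):
        usable = usable_drives(layout, n)
        print(f"{layout:15s} {usable:2d} of {n} drives usable ({usable / n:.0%})")
```

With 12 drives, RAID 5 keeps 11 drives' worth of space and a mirrored RAID 6 keeps 4, which is the "extraordinarily wasteful" part in concrete terms.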

I have to wonder if the real answer here might ultimately be to add parity to a stripe/mirror set, so that any combination of drive failures in an array of n drives that leaves you with at least (n-2)/2 working drives is easily recoverable...  (Maybe doing RAID 6 over pairs of mirrored drives would be sufficient.  I have to think on that a bit...)
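
One way to think on it is to brute-force the failure combinations. The sketch below is one reading of "RAID 6 over pairs of mirrored drives": drives 2k and 2k+1 form mirrored pair k, a pair is lost only when both of its drives fail, and the RAID 6 layer survives up to two lost pairs. It only counts whole-drive failures and ignores UREs and correlated failures, so it says nothing about the rebuild risks discussed above. All names are hypothetical.

```python
from itertools import combinations


def survives(failed, n):
    """True if the array is still recoverable with this set of failed drives.

    Assumed layout: drives 2k and 2k+1 are mirrored pair k, and RAID 6
    runs across the n/2 pairs, tolerating up to two lost pairs.
    """
    lost_pairs = sum(
        1 for k in range(n // 2) if 2 * k in failed and 2 * k + 1 in failed
    )
    return lost_pairs <= 2


def survival_counts(n):
    """Map failed-drive count -> (recoverable combinations, total combinations)."""
    counts = {}
    for k in range(n + 1):
        combos = list(combinations(range(n), k))
        ok = sum(1 for combo in combos if survives(set(combo), n))
        counts[k] = (ok, len(combos))
    return counts


if __name__ == "__main__":
    for failed, (ok, total) in survival_counts(8).items():
        print(f"{failed} failed drives: {ok} of {total} combinations recoverable")
```

For 8 drives, every combination of up to five failures is recoverable under this model, and 24 of the 28 possible six-drive failures are as well. The price is capacity: only n/2 - 2 drives' worth of space is usable.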

Thursday, February 18, 2010

Open-Source Point of Sale?

Dear Lazyweb,

I need an open-source POS solution for a client. They have a small cafeteria-type restaurant + gift shop. Currently they are using a craptastic closed-source commercial solution that offers no support despite requiring a huge service contract.

Lots of bonus points for something web-based, since the POS terminals they have are rather low-end.

FWIW, we've tried the following:
Posterita comes the closest to being what we want (web-based, AJAX-y, etc.), but development has gone closed-source apparently.
OpenBravo POS is probably the most functional, but it's difficult to figure out how to do much of anything with it.
OFBiz has a nice, simple POS app, but it's horribly buggy (and rather slow too).

Given an infinite amount of free time, I'd probably hack on the Adempiere + Posterita (last open-source release) combo, but, well, time is not on my side here...