Monday, February 22, 2010

Where do we go next with RAID?

So a friend of mine sent me a link to this blog post.  A couple of things jumped out at me...
When a drive fails in a 7 drive, 2 TB SATA disk RAID 5, you’ll have 6 remaining 2 TB drives. As the RAID controller is reconstructing the data it is very likely it will see an URE. At that point the RAID reconstruction stops.
And later...
RAID proponents assumed that disk failures are independent events, but long experience has shown this is not the case: 1 drive failure means another is much more likely.
That sounds an awful lot like what I've been saying for 8 or 9 years now...  (Well, not specifically about 2TB drives, but you know what I mean...  :-)
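
To put a rough number on that first quote, here is a back-of-the-envelope sketch. It assumes an unrecoverable read error rate of 1 per 10^14 bits read (a figure commonly quoted on consumer SATA spec sheets, not something stated in the quote), decimal terabytes, and independent bit errors. The function name is made up for illustration.

```python
import math


def p_ure_during_rebuild(surviving_drives, drive_tb, ure_rate_bits=1e14):
    """Probability of hitting at least one URE while reading every
    surviving drive in full during a rebuild.

    ASSUMPTIONS: ure_rate_bits is "one error per this many bits read"
    (1e14 is a typical consumer SATA spec, not a measured value), errors
    are independent, and a terabyte is 10**12 bytes.
    """
    bits_to_read = surviving_drives * drive_tb * 1e12 * 8
    # P(no URE) = (1 - 1/rate) ** bits; log1p keeps the tiny term accurate.
    p_clean = math.exp(bits_to_read * math.log1p(-1.0 / ure_rate_bits))
    return 1.0 - p_clean


if __name__ == "__main__":
    # The 7-drive, 2 TB RAID 5 from the quote: 6 drives left to read.
    print(f"{p_ure_during_rebuild(6, 2):.1%}")
```

Under those assumptions the rebuild has to read about 9.6 x 10^13 bits, so the chance of tripping over at least one URE comes out at roughly 60%. A drive rated at 1 per 10^15 bits changes the picture a lot, which you can check by passing `ure_rate_bits=1e15`.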

So the myth that I've been hearing for the last 15 years or so is that you get speed and data security with RAID 5.  The fact is that the speed of an intact array is terrible, and to use the word "speed" with regard to a degraded array would be an oxymoron.  Add that to the odds of a failure of one of your "good" drives during a rebuild, and you get one big pile of fail.

The advantage of RAID 5 is capacity.  Period.  Any other RAID solution costs more in terms of raw storage capacity.  RAID 6 gives you one less drive of capacity in exchange for improving your odds of a successful rebuild, but as you all know, I still don't trust it for anything that we don't have a mirror of somewhere.
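
The capacity trade-off is easy to sketch. The helper below is a hypothetical function (not from any RAID tool) that just counts how many drives' worth of space each level leaves you, ignoring hot spares, metadata, and filesystem overhead.

```python
def usable_drives(level, n):
    """Number of drives' worth of usable space for n equal-sized drives.

    Hypothetical helper for illustration; level names are plain strings.
    """
    if level == "raid5":
        if n < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return n - 1          # one drive's worth of parity
    if level == "raid6":
        if n < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return n - 2          # two drives' worth of parity
    if level == "raid10":
        if n < 4 or n % 2:
            raise ValueError("RAID 1+0 needs an even number of drives, 4 or more")
        return n // 2         # every drive has a mirror partner
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    drive_tb = 2
    for level, n in [("raid5", 7), ("raid6", 7), ("raid10", 8)]:
        print(level, n, "drives ->", usable_drives(level, n) * drive_tb, "TB usable")
```

With 2 TB drives, seven drives give 12 TB under RAID 5 and 10 TB under RAID 6, while eight drives in RAID 1+0 give 8 TB.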

We've been doing a lot of RAID 1 and RAID 1+0, which is fine, but ultimately you have the same problem there with likely failures while trying to rebuild an array, plus the added bonus problem that errors may go undetected.  They may kill performance, but the parity checks on RAID 5 and 6 do give you an added safety net since you can detect corrupted data.

For some of our largest arrays, we've been doing mirrored (or rsync'd) RAID 5 or 6, which, while extraordinarily wasteful in terms of storage space, gives us very good odds of recovery from catastrophic hardware failure.

I have to wonder if the real answer here might ultimately be to add parity to a stripe/mirror set, so that any combination of drive failures in an array of n drives that leaves you with at least (n-2)/2 working drives is easily recoverable...  (Maybe doing RAID 6 over pairs of mirrored drives would be sufficient.  I have to think on that a bit...)
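
As a way of thinking on that, here is a small brute-force sketch of the RAID 6 over mirrored pairs idea. It assumes drives 2i and 2i+1 form mirror i, that a mirror is lost only when both of its drives fail, and that the RAID 6 layer tolerates the loss of any two mirrors. It ignores UREs and rebuild windows entirely, and the function names are made up.

```python
from itertools import combinations


def survives(failed, pairs, parity=2):
    """True if the array is still readable with the given failed drives.

    ASSUMED layout: drives 2*i and 2*i + 1 are mirror i; the parity layer
    on top can lose up to `parity` whole mirrors (2 for RAID 6).
    """
    dead_mirrors = sum(
        1 for i in range(pairs) if 2 * i in failed and 2 * i + 1 in failed
    )
    return dead_mirrors <= parity


def survival_fraction(pairs, k):
    """Fraction of all k-drive failure combinations the array survives."""
    total = 0
    ok = 0
    for combo in combinations(range(2 * pairs), k):
        total += 1
        ok += survives(set(combo), pairs)
    return ok / total


if __name__ == "__main__":
    pairs = 6  # 12 drives total
    for k in range(1, 2 * pairs + 1):
        print(k, "failed drives:", f"{survival_fraction(pairs, k):.3f}")
```

For a 12-drive array, this model survives every possible combination of 5 failures and 904 of the 924 possible 6-drive failures. The luckiest patterns (one drive lost from each of four mirrors plus two mirrors lost outright) survive 8 dead drives, and nothing survives 9.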


Stephen Smoogen said...

RAID 51 and RAID 61 are done in some places, but the rebuild times seem to go up quite a bit (I think it's quadratic). One really crazy guy on IRC said he did RAID 66, but I am not sure how true that was.

I think at a certain point, we will have to go to something like NUMA disk arrays where you put your disk data all over and it gets stored many many times on solid state stuff.

cowbutt said...

I've been thinking the same way: RAID5 and 6 are only for mostly-quiescent bulk data, such as archived log files, multimedia files, etc.

I use RAID1 at home and RAID10 at work. Remember that both these RAID levels have checksums too; over every block, at the drive level. If a block experiences corruption, the drive will throw a read error and the RAID controller or layer should attempt to read the block from another device in the array.

pbrobinson said...

In the enterprise environment I work in, we tend to use RAID-6 with a block-based sync to a remote site (in case the site is lost). The Linux tool that does this is ... We also use an 'enterprise' storage product that has the equivalent as well. I know the former Sun ZFS team were also looking at adding an option of a third parity disk (raid-7?) as a means of further mitigating the probability of failures during a disk rebuild.