Re: RAIDing LUNs in a SAN



Mick wrote:
Hi,

I'm trying to get a basic heads-up on the best approach to distributing
the disks in a SAN to various servers. Hopefully you'll excuse the
newbie-ish nature of all this :-)

To demonstrate with a theoretical example:
If I was trying to divide a 10 disk SAN between 3 servers...2 of which
need RAID 5 arrays and the last requiring RAID 1 I could:

a) allocate 3 disks to server 1, 3 to Server 2 & 4 to server 3 and then
RAID these accordingly

or

b) convert 6 disks to RAID 5, 4 to RAID 1 and then split the RAID 5
array into 2 chunks (1 for each of the first two servers)

It seems to me that *if* both of these are possible approaches then (b)
gives more usable space to each of the first two servers.
(5/6*diskspace each, rather than 2/3*diskspace)

Does that make sense?

It may, for the specific example you provide: 3-disk RAID-5 sets waste an unreasonable amount of space, especially given the compromise in performance which they often entail when compared with mirroring, and the difference in likelihood of a second whole-disk failure using 6 disks rather than 3 should be relatively negligible (i.e., if you can't afford the risk of using a 6-disk array, you quite possibly can't afford it using a 3-disk array).
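The space arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration: it assumes equal-sized disks and an even split of the shared array, and the function names are made up for the example.

```python
from fractions import Fraction


def raid5_usable_fraction(disks: int) -> Fraction:
    """Fraction of raw capacity left for data in a single-parity (RAID-5) set."""
    if disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return Fraction(disks - 1, disks)


def usable_disks_per_server(disks: int, servers: int) -> Fraction:
    """Usable capacity per server, in units of one disk, assuming an even split."""
    if disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return Fraction(disks - 1, servers)


# Option (a): each server gets its own 3-disk RAID-5 set.
option_a = usable_disks_per_server(3, 1)
# Option (b): one 6-disk RAID-5 set split evenly between two servers.
option_b = usable_disks_per_server(6, 2)

print("usable fraction, 3 disks:", raid5_usable_fraction(3))
print("usable fraction, 6 disks:", raid5_usable_fraction(6))
print("disks' worth of space per server, (a) vs (b):", option_a, "vs", option_b)
```

Under these assumptions each server ends up with 2 disks' worth of space in option (a) and 2.5 disks' worth in option (b), from the same 6 physical disks.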

For larger arrays, it might not be a good trade-off. E.g., creating a 50-disk (or even a 20-disk) RAID-5 array and splitting it up would entail what many people would consider to be unacceptable risk of a second whole-disk failure (even when failures are strictly uncorrelated, which, as Thor pointed out, may not be the case). Besides, if you can afford to run a non-negligible risk of data loss anyway, you need to start to question your need for RAID at all.

Furthermore, sharing a single array between multiple servers may maximize total throughput (if one server's load is lighter, the other gets the benefit of more disks to spread its own load across) but also couples each server's performance to the others' load. Sometimes, that's what you want; others, it isn't.

Finally, as was mentioned recently in another thread here, today's disk sizes carry a non-negligible risk that, while rebuilding a failed disk, you'll encounter an unreadable sector on one of the survivors, resulting in limited data loss (probably 'only' in a single file). For some applications this is an entirely tolerable risk to run, e.g., if most files could easily be restored from a backup and the array will continue with the rest of the rebuild rather than throw a fit when it can't restore one sector; for others, *any* loss could be catastrophic.

To use your own numbers, the chance that a disk will fail over the 5-year nominal service life of a 6-disk RAID-5 array varies from under 20% (using 1.4 million hour MTBF drives) to about 40% (using 600,000 hour MTBF drives) - though proactive replacement based on, e.g., S.M.A.R.T. logs might improve those odds. Obviously, the chance that a second disk will experience an uncorrelated failure during the brief rebuild interval is extremely small (though, once again, not all failure modes are uncorrelated).
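For anyone who wants to plug in their own figures, here is a rough sketch of that calculation in Python. It assumes independent failures at a constant rate of 1/MTBF per drive (a simple exponential model, not anything a drive vendor specifies), and the 24-hour rebuild window is a made-up placeholder.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days, ignoring leap years


def p_any_failure(disks: int, mtbf_hours: float, hours: float) -> float:
    """Chance that at least one of `disks` drives fails within `hours`,
    assuming independent failures at a constant rate of 1/MTBF per drive."""
    return -math.expm1(-disks * hours / mtbf_hours)


service_life = 5 * HOURS_PER_YEAR
rebuild_hours = 24  # assumption: pick whatever your array actually needs

for mtbf in (1_400_000, 600_000):
    first = p_any_failure(6, mtbf, service_life)    # any of the 6 disks, over 5 years
    second = p_any_failure(5, mtbf, rebuild_hours)  # any of the 5 survivors, during rebuild
    print(f"MTBF {mtbf:>9,} h: first failure {first:.1%}, "
          f"second failure during rebuild {second:.3%}")
```

With this model the first-failure chance comes out at roughly 17% and 35% for the two MTBF figures, in the same range as the numbers above, while the second-failure chance during a one-day rebuild stays well below 0.1%. None of this accounts for correlated failures.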

But the 5 survivors could contain from 370 GB to 2.5 TB of data, all of which must be used to reconstruct the failed disk. If the uncorrectable error rate is 1 per 10^14 bits (about 1 in 10 TB), that means you have something between a 4% and 25% chance that the reconstruction will fail to recreate something - likely a *far* greater probability than that of a second whole-disk failure (though, again, you may be able to mitigate this risk if the array performs background 'scrubbing' activity to detect failing sectors before they become completely unreadable).
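The same estimate can be sketched in Python. It assumes bit errors are independent and that every byte on the survivors has to be read once. Note that 10^14 bits is exactly 12.5 TB rather than the rounded 10 TB used above, so the sketch gives somewhat lower figures, though the conclusion is the same.

```python
import math


def p_unreadable_sector(bytes_read: float, errors_per_bit: float = 1e-14) -> float:
    """Chance of hitting at least one uncorrectable read error while reading
    `bytes_read` bytes, assuming independent errors at `errors_per_bit`."""
    bits_read = bytes_read * 8
    return -math.expm1(-bits_read * errors_per_bit)


# Data held on the 5 surviving disks, using decimal GB/TB.
for label, size_bytes in (("370 GB", 370e9), ("2.5 TB", 2.5e12)):
    print(f"{label} to read: {p_unreadable_sector(size_bytes):.1%} "
          "chance of an unreadable sector during rebuild")
```

Under these assumptions the chance is roughly 3% for 370 GB and 18% for 2.5 TB, still orders of magnitude above the chance of a second whole-disk failure during the rebuild.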

That's one reason why RAID-6 (double-parity protection that can tolerate concurrent failures of any *two* disks) is gaining in popularity these days.

- bill