Re: Windows RAID

On 13/07/2011 08:08, Yousuf Khan wrote:
On 12/07/2011 5:29 PM, David Brown wrote:
For the most part, I'd agree with you - RAID1 (or RAID10 for better
performance) is a good choice for many applications because it gives you
fast performance and fast recovery. But it has only one disk redundancy
- two disks failing (or one disk failed, and an unrecoverable read error
on the other) means lost data if the two failures are within one pair.
For the same reasons that some users want RAID6 instead of RAID5,
paranoid users might worry about the one disk redundancy of (two-copy)
RAID1. You could always go for triple-copy RAID1 - Linux mdadm will do
raid10,f3 layouts to get fast reads striped over all the disks. But
RAID15 will give you /three/ disk redundancy, which is plenty for the
most serious worriers. And it will do so with better space efficiency
than triple-copy RAID1 (if you have more than 6 disks).
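
The "more than 6 disks" figure can be checked with a little arithmetic. The following is only a rough sketch in Python, assuming RAID15 means a RAID5 set built on two-disk RAID1 pairs and that all disks are the same size; the function names are made up for illustration.

```python
from fractions import Fraction


def raid15_efficiency(n_disks: int) -> Fraction:
    """Usable fraction of raw capacity for RAID5 over two-disk RAID1 pairs.

    Assumes identical disks. RAID5 needs at least 3 members, so at least
    3 mirror pairs (6 disks) are required.
    """
    if n_disks < 6 or n_disks % 2:
        raise ValueError("need an even number of disks, at least 6")
    pairs = n_disks // 2
    # One pair's worth of capacity goes to parity; the rest holds data once.
    return Fraction(pairs - 1, n_disks)


def triple_mirror_efficiency() -> Fraction:
    """Usable fraction for triple-copy RAID1: every block is stored 3 times."""
    return Fraction(1, 3)


if __name__ == "__main__":
    for n in (6, 8, 10, 12):
        print(n, raid15_efficiency(n), triple_mirror_efficiency())
```

At 6 disks the two layouts break even at one third of raw capacity; from 8 disks upwards RAID15 stores more.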

For somebody extolling the virtues of software RAID, I'm surprised you
aren't aware of one of the major reasons for using software RAID, which
is that software RAID can make either RAID 0+1 or RAID 1+0 volumes.
What's the difference? One is striping and then mirroring (RAID 0+1,
let's call it "Rx" from now on), while the other is mirroring and then
striping (RAID 1+0, call it "Ry"). The most common form is Rx, of
course. Less common is Ry.

I was under the impression that Ry - a RAID0 stripe of RAID1 mirrors - is more common. But I'll let you continue...

Let's say you got 4 identical disks which you're going to stripe and
mirror with each other: A,B,C,D. In Rx, you would stripe A with B, and C
with D, and then you would mirror AB to CD. In Ry, you would first
mirror A with C, and mirror B with D, you would then stripe AC with BD.


In Ry, you have a 2-in-3 (about 67%) chance of surviving a two-disk
failure. Say your first disk failure occurred in the AC pair, and that
A failed. The second failure can then only occur in the remaining three
disks: the chance that it hits the mirror partner C is only 1 in 3
(about 33%), and 2 in 3 that it lands within the BD mirror pair. As long
as the failure occurs in the BD pair, you won't lose any data. The more
mirrored pairs you have, the higher your chances of surviving a two-disk
failure are. And another big advantage is that Ry doesn't take any more
or less space than Rx.
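
The two-in-three figure for the four-disk case can be checked by brute force. This is a small sketch in Python, assuming every pair of disks is equally likely to be the failed pair; the helper names are invented for illustration and the layouts are the A,B,C,D ones described above.

```python
from fractions import Fraction
from itertools import combinations


def survives_raid10(failed, pairs):
    """Ry (RAID 1+0): data survives unless some mirror pair lost both disks."""
    return all(not set(pair) <= failed for pair in pairs)


def survives_raid01(failed, stripes):
    """Rx (RAID 0+1): data survives only if one whole stripe set is untouched."""
    return any(not (set(stripe) & failed) for stripe in stripes)


def two_disk_survival(disks, survives, groups):
    """Fraction of all two-disk failure combinations that lose no data."""
    cases = [set(c) for c in combinations(disks, 2)]
    ok = sum(survives(failed, groups) for failed in cases)
    return Fraction(ok, len(cases))


if __name__ == "__main__":
    # Ry: mirror A with C and B with D, then stripe the mirrors.
    print(two_disk_survival("ABCD", survives_raid10, [("A", "C"), ("B", "D")]))  # 2/3
    # Rx: stripe A with B and C with D, then mirror the stripes.
    print(two_disk_survival("ABCD", survives_raid01, [("A", "B"), ("C", "D")]))  # 1/3
    # Ry with three mirror pairs: more pairs, better odds.
    print(two_disk_survival("ABCDEF", survives_raid10,
                            [("A", "D"), ("B", "E"), ("C", "F")]))  # 4/5
```

Under the same assumption, Rx on the same four disks survives only one two-disk failure in three, and Ry with three mirror pairs survives four in five.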

Yes, I know of the advantages of RAID10 (what you call Ry) over RAID01 (Rx). It is also much better when you are doing a re-build - you only need to copy one disk's worth of data, not half the array.

RAID01 may have an advantage in the topology, such as having each RAID0 set in its own cabinet with its own controller, so that you have redundancy in that hardware too.

This is something that is available to software RAID (sophisticated ones
anyway), but it's unclear if it's available to hardware RAID. Most of
the time hardware RAID means RAID5 anyways, and mirroring + striping are
not given a lot of thought. In most cases, mirroring is used mainly on
OS disks, rather than data disks. But in some cases they can be used in
data disks where performance is critical, especially write performance.

I don't have your experience and knowledge of large external storage arrays, so I can't comment on what they do or do not support there.

I have a server with an LSI hardware raid card (I would have preferred to use Linux software raid on the server, but it's almost impossible to get non-raid access to disks on that card, so it is running hardware RAID5). That card can certainly do RAID10.

My understanding of usage is that RAID1 is popular for, as you say, OS disks - or other usage where you only have a small amount of data. Large data stores normally use RAID5 (or RAID6) because of better space efficiency. But there are also data types which don't fit well with RAID5 - big databases are the typical example. Here the extra overhead needed for sub-stripe writes to RAID5 can be a big performance hit, and you often can't afford the bandwidth taken during a RAID5 rebuild. (This all depends on your usage patterns, of course, and the type of hardware and RAID system you have available.)

Having had another brief look at the Wikipedia article on RAID, I think it's interesting that it discusses RAID51 (and even RAID61), but not the RAID15 I described in an earlier post. For the same reason that RAID10 is generally a better choice than RAID01, RAID15 would provide better on-average failure tolerance and more efficient rebuilds than RAID51. Perhaps the topology of the hardware and connections makes RAID51 more attractive, or perhaps it is simply easier to build from existing hardware components (just take two normal RAID5 arrays, and then mirror them at a higher level).


Another important point about RAID10 in the context of Linux software RAID is that it can be done with only two disks (or any number greater than 1). It can also be done in a "far" layout, which gives excellent read performance (often faster than a simple RAID0 stripe) at the cost of slower write performance. This makes a bigger difference if you have only a few disks, and if there are fewer parallel accesses.

Is there really a need for such a system? Probably not - it's a
theoretical combination. There are also many other causes of failure
that you should think about before the risk of a two disk failure
becomes an important concern - it may make more sense to have a cluster
with redundant servers rather than just redundant disks.

My experience is that chances of two disk failures are significantly
reduced with hot-spares available.

Certainly a hot spare will reduce the time during which you have no (or lower) redundancy.

The big concern, according to what I have read, is of getting unrecoverable read errors (UREs) during a rebuild. If you have a large RAID5 array, you have a lot of data that needs to be read during the rebuild - every sector of every remaining disk. UREs are rare, but if you read enough data the chances of hitting one are no longer negligible. If you have redundancy in your disk set, then it's not a problem if you hit a URE - the RAID controller/software reconstructs the data from the rest of the stripe. But if you don't have that redundancy, then it can't do that and some of the data for that stripe is permanently lost. Starting the rebuild quickly does not mitigate this effect.
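
To get a feel for the numbers, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions that are not from this thread: that read errors are independent, and that the error rate is a fixed figure per bit read. The 1e-14 rate and the disk sizes in the example are purely illustrative, not measurements.

```python
import math


def p_ure_during_rebuild(bytes_read: float, ure_per_bit: float) -> float:
    """Chance of at least one unrecoverable read error while reading bytes_read.

    Assumes independent errors at a constant per-bit rate, which is a
    simplification. Computes 1 - (1 - p)**bits in a numerically stable way.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # Hypothetical example: a degraded 5-disk RAID5 of 2 TB disks must read
    # all 4 surviving disks in full to rebuild the replacement.
    surviving_disks = 4
    disk_bytes = 2e12
    assumed_rate = 1e-14  # assumed errors per bit read, for illustration only
    p = p_ure_during_rebuild(surviving_disks * disk_bytes, assumed_rate)
    print(f"chance of hitting a URE during the rebuild: {p:.0%}")
```

The point of the sketch is only that the probability grows with the amount of data read, not with how quickly the rebuild starts.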

(I haven't seen this effect myself - I don't deal with enough disks or data to make it statistically likely.)

There's more chance of the data backplane failing than of a two-disk
failure. If the data backplane fails then none of the disks will be
available, no matter what kind of planning you made against
two-disk failures. So it's more useful to get redundant data backplanes

I'll accept that argument, at least regarding a second complete drive failure, since I've no statistics to argue otherwise. Certainly it makes sense to look at all possible failure situations, and how these risks can be reduced, or the effects of failure reduced. Other simple examples include redundant power supplies and redundant fans. And perhaps redundant locks on the door - there is no point in having extra redundancy on the disks if the statistically biggest chance of data loss is physical theft!

But two-disk redundancy is getting popular (typically RAID6) for large
arrays, as single-disk redundancy is not always enough - and RAID15
would give you that as well as very good recovery properties and
excellent speeds.

Could be getting popular, but I think just hot-spares by themselves do
more to guard against two-disk failures, than all of these fancy schemes
like RAID6 or RAID 1+0. I'm looking at it from a practical experience
vs. theoretical point of view. Two-disk failures are entirely correlated
with the time between failures, so the longer you wait, the greater the
chance of a second failure.

If you've got the extra disk, why would you prefer RAID5 + hot spare over RAID6? RAID6 has slightly more overhead for small writes, but it also has slightly faster reads (since you have an extra spindle online).