Re: General musings and/or recommendations on number of global spares to keep?



On Wed, 24 Aug 2005 21:21:02 GMT, Dan Stromberg
<strombrg@xxxxxxxxxxxxxxx> wrote:

>
>I've been working on a Sun StorEdge 3511 with dual RAID controllers and
>three expansion boxes.
>
>We originally purchased the equipment expecting to get 16 terabytes of
>usable space.
>
>Now that it's "all set up", we're really seeing more like 14 or 15
>terabytes, depending on how you do the calculation.
>
>The Sun channel partner we're working with is advising that we go from our
>current 4 global spares, down to either 1 or 2 global spares, using the
>additional 3 or 2 disks for data.
>
>The number of disks in the system totals 48, including data and parity and
>global spares.
>
>Please be sure to use a fixed-pitch font when viewing the tables found below.
>
>What we have right now is:
>
>global spares: 0,16,32,48
>
>Raidset  Disks used                      Data:parity ratio
>0        1,2,3,4,5,6,7,8,9,10            9:1
>1        11,17,18,19,20,21,22,23,24,25   9:1
>2        26,27,33,34,35,36,37,38,39,40   9:1
>3        41,42,43,49,50,51,52,53,54,55   9:1
>4        56,57,58,59                     3:1
>
>
>And the vendor is suggesting that we move to something like:
>
>global spares: 0
>
>Raidset  Disks used                      Data:parity ratio
>0        1,2,3,4,5,6,7,8,9,10            9:1
>1        11,17,18,19,20,21,22,23,24,25   9:1
>2        26,27,33,34,35,36,37,38,39,40   9:1
>3        41,42,43,49,50,51,52,53,54,55   9:1
>4        56,57,58,59,16,32,48            6:1
>
>...or...:
>
>global spares: 0,16
>
>Raidset  Disks used                      Data:parity ratio
>0        1,2,3,4,5,6,7,8,9,10            9:1
>1        11,17,18,19,20,21,22,23,24,25   9:1
>2        26,27,33,34,35,36,37,38,39,40   9:1
>3        41,42,43,49,50,51,52,53,54,55   9:1
>4        56,57,58,59,32,48               5:1
>
>
>Does anyone have any comments on:
>
>1) The sanity of these 10-disk RAID 5 sets?
>
>2) The degree of loss of reliability incurred by moving 3 disks from
>global spare to data?
>
>3) The degree of loss of reliability incurred by moving 2 disks from
>global spare to data?
>
>
>To answer these questions, you probably need to know how the storage is to
>be used. This single, large QFS filesystem will be used by a variety of
>researchers and students from around The University of California, Irvine,
>but was purchased primarily by the Earth System Science part of the
>Physical Sciences department, which in turn will primarily be storing many
>approximately 100 megabyte files which comprise time series related to
>climatology simulations.
>
>They don't feel that the storage has to be blazing fast, and 100% uptime
>isn't paramount, however they very much do not want to lose their data.
>
>The filesystem will not be backed up - we simply don't have anything large
>enough to back it up -to-, so if some part of the storage solution
>goes kerflooey, we're totally... er... out of luck, and they'll probably
>be looking at me (the primary sysadmin on the storage configuration),
>wondering why their data is gone.
>
>Thanks!


I actually thought you were a little paranoid on your layout until I
got to the last section. Now I think you're not paranoid enough.

If availability is paramount, and you're not backing it up somewhere,
then I think RAID 5 alone is a resume trigger.
Regrettably, any solution that would let me sleep at night would
require a lot more capacity than you currently have. But RAID 1+0
would be my recommendation, with 4 global spares.
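
To put rough numbers on that capacity gap, here is a minimal sketch of
the data-disk arithmetic for the layouts in this thread. It assumes all
48 disks are the same size, that each RAID 5 set gives up exactly one
disk's worth of capacity to parity, and that RAID 1+0 mirrors disks in
pairs. The helper names are purely illustrative.

```python
# Rough data-disk counts for the layouts discussed in this thread.
# Assumptions: equal-sized disks, one parity disk's worth of capacity
# per RAID 5 set, and pairwise mirroring for RAID 1+0.
# The helper names below are illustrative, not from any real tool.

TOTAL_DISKS = 48

def raid5_data_disks(set_sizes):
    """Data disks across several RAID 5 sets (one parity disk per set)."""
    return sum(size - 1 for size in set_sizes)

def raid10_data_disks(total_disks, spares):
    """Data disks for RAID 1+0: half of whatever is not a spare."""
    return (total_disks - spares) // 2

layouts = {
    "current RAID 5, 4 spares": raid5_data_disks([10, 10, 10, 10, 4]),
    "RAID 5, 1 spare": raid5_data_disks([10, 10, 10, 10, 7]),
    "RAID 5, 2 spares": raid5_data_disks([10, 10, 10, 10, 6]),
    "RAID 1+0, 4 spares": raid10_data_disks(TOTAL_DISKS, 4),
}

for name, data_disks in layouts.items():
    print(f"{name}: {data_disks} data disks")
```

Under those assumptions, RAID 1+0 with 4 spares leaves 22 data disks
against 39 in the current layout, which is why it would take a lot more
hardware to hold the same amount of data.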

Now, having said that, and putting aside the no-backup policy
(/shiver), I think you could easily get away with 2 global spares for
the number of drives you have.

For 168 drives I have between 3 and 6 global spares. So for 48 drives
I would personally be comfortable with 2 spares, *if* I had backups.
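
To put those spare counts on a common footing, here is a quick sketch
of drives per global spare. It is plain arithmetic on the numbers in
this thread, with an illustrative helper name, and says nothing about
actual failure rates.

```python
# Drives covered by each global spare, using the figures in this thread.
# The helper name is illustrative; this is a ratio, not a reliability model.

def drives_per_spare(total_drives, spares):
    """Total drives in the array (spares included) divided by spare count."""
    return total_drives / spares

configs = [
    ("168 drives, 3 spares", 168, 3),
    ("168 drives, 6 spares", 168, 6),
    ("48 drives, 4 spares", 48, 4),
    ("48 drives, 2 spares", 48, 2),
    ("48 drives, 1 spare", 48, 1),
]

for label, total, spares in configs:
    print(f"{label}: one spare per {drives_per_spare(total, spares):.0f} drives")
```

By that measure, 2 spares for 48 drives (one per 24) is still a bit more
generous than the 168-drive setup at its best (one per 28).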

Honestly, the no backup policy is freaky. Especially since they
actually want the data to stick around.

Question: If you're running QFS, why not slap SAM-FS on it as well and
use the tape store as a pseudo backup plan? It could be proclaimed
additional capacity for the users while acting as a safety net for
them too.

~F