HW Raid5 as Parity Drive?



... Y'all be missing the point (and the math) by a mile. You didn't read my post carefully. I said DATA LOSS, not drive failure.

 

Agree ... I noted earlier that the statistics being posted weren't right.

 

Using a RAID-0 parity effectively makes NO difference in the likelihood of data loss => it does very slightly increase the likelihood of a drive failure; and in fact even adds the special case of a 2-drive failure with no data loss (both parity drives). But once you have a drive fail, as long as you immediately do a drive rebuild, the likelihood of data loss is simply the probability of one other drive failing -- which is more likely to be due to an unrecoverable bit error than a physical drive failure. The likelihood of that bit error is not affected by use of a RAID-0 array. The likelihood of a failure for other reasons is in fact very slightly higher due to the simple fact that there are more drives in play ... but this is VERY tiny, and is no different than if you had one additional data drive.

And the probability of a drive failing for a non-bit-error reason in the next 8-10 hours is a LOT lower than the probability of it failing in the next year  :)
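
To put rough numbers on that comparison, here is a minimal sketch. It assumes a constant failure rate over time and borrows the 1% annual failure rate used elsewhere in this thread purely for illustration; the function name is made up for this example.

```python
HOURS_PER_YEAR = 8760

def failure_probability(annual_rate, hours):
    """Probability of a failure within `hours`, assuming a constant hazard rate."""
    return 1.0 - (1.0 - annual_rate) ** (hours / HOURS_PER_YEAR)

annual = 0.01  # assumed 1% per year, illustrative only
p_rebuild = failure_probability(annual, 10)            # a 10-hour rebuild window
p_year = failure_probability(annual, HOURS_PER_YEAR)   # a full year

print(f"10-hour rebuild window: {p_rebuild:.2e}")
print(f"full year:              {p_year:.2e}")
print(f"ratio:                  {p_year / p_rebuild:.0f}x")
```

Under these assumptions the per-drive risk during a rebuild is several hundred times smaller than the annual figure, which is the point being made above. Real drives do not have a constant hazard rate, so treat this as an order-of-magnitude illustration only.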

 

 

Link to comment

@bubbaQ

I was not referring to "data loss" but the passage:

Disks don't just blink out like light bulbs. For the most part, they start to act up and show signs that you would start receiving warnings about long before they actually fail.

 

Of course data loss will occur after a double drive loss - that would be 2 out of n in our unRAID world.

And a raid0 parity setup means 2 out of (n+1) compared to the "initial" setup.

And yes, the difference is very little but it is there.
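
A quick way to see how small that difference is: count the drive pairs that could fail together. This is only a sketch; the array size of 24 is an assumption taken from the example further down, and it counts every pair as fatal, ignoring the special case where both failed drives belong to the parity stripe.

```python
from math import comb

def double_failure_pairs(drives):
    """Number of distinct drive pairs that could fail together."""
    return comb(drives, 2)

n = 24  # assumed array size, illustrative only
before = double_failure_pairs(n)      # single parity drive
after = double_failure_pairs(n + 1)   # one extra drive from the RAID0 parity

print(before, after, f"{after / before:.3f}")
```

For 24 drives that is 276 pairs versus 300, roughly a 9% increase in the number of fatal combinations.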

 

If you look at failure statistics, you see that the failure rates of some brands' drives are 4-5X those of other brands. You can't say that a RAID0 of, say, HGST drives is more likely to fail than a single WD or Seagate drive with a much higher failure rate.

 

It's all statistics - the numbers we are dealing with are very small.

So small, that you can't handle them with feelings.

If you want to talk about reliability and failure probability you have to stay with the figures.

To a first approximation, doubling the value is doubling the risk - period.

But you are right, changes in this business usually matter only if they are in the range of orders of magnitude.

 

e.g. a tolerable hazard rate for catastrophic events in safety engineering is usually lower than 1E-9 failures/hour, or 1 failure per 1E9 hours (1 FIT).

A disk drive is rated at an MTBF of 1 to 2 million hours = 1 to 0.5 failures per million hours (1000 to 500 FIT).

 

Based on a failure rate of 500 FIT for a disk drive, a 2 out of 24 scenario will have a failure frequency of 100 FIT = 1 loss in 10 million hours.

This is based on a "system life time" of 720h (the monthly parity check interval, at which a failure is detected and resolved).

Doing fault detection at a yearly interval will lead to a failure frequency of 1000 FIT = 1 loss in 1 million hours.

 

If you run this array without parity protection the value for 1 out of 24 is about 10000FIT = 1 loss in 100000 hours.
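
The figures above can be roughly reproduced with a short sketch. The post does not state the formula it used; the approximation n(n-1) * lambda^2 * T for a double failure within one detection interval T is an assumption here, chosen because it lands on the same orders of magnitude. Function names are hypothetical.

```python
FIT = 1e-9  # 1 FIT = 1 failure per 1e9 hours

def double_failure_rate_fit(n, drive_fit, interval_hours):
    """Approximate rate (in FIT) of two drives failing within one detection interval.

    Assumed model: n * (n - 1) * lambda^2 * T, with independent drives and a
    constant per-drive failure rate lambda.
    """
    lam = drive_fit * FIT
    return n * (n - 1) * lam ** 2 * interval_hours / FIT

def single_failure_rate_fit(n, drive_fit):
    """Rate (in FIT) of any one of n drives failing (no parity protection)."""
    return n * drive_fit

monthly = double_failure_rate_fit(24, 500, 720)    # monthly parity check
yearly = double_failure_rate_fit(24, 500, 8760)    # yearly fault detection
unprotected = single_failure_rate_fit(24, 500)     # 1 out of 24, no parity

print(f"{monthly:.0f} FIT, {yearly:.0f} FIT, {unprotected:.0f} FIT")
```

This gives about 99 FIT, 1209 FIT and 12000 FIT, in line with the rounded 100, 1000 and 10000 FIT quoted above.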

 

One could say, 100000 hours is quite a lot - my array will never reach that. I will swap the drives long before...

Well, then consider the millions (or billions) of drives in the field in the equation ...

 

Link to comment

There are three kinds of lies - lies, damn lies, and statistics.

 

You should, of course, have given Disraeli credit where credit is due  :)

... even Mark Twain didn't use this without a footnote giving Disraeli credit for first saying it.  (Although admittedly historians have doubts about the actual origin)

 

:) :)

 

But statistics DO provide a basis for at least having some idea about just how much risk is involved in various configurations => although there are so many different variables (quality of the individual drives; size of the drives; total number of drives in the array; etc.) that statistically there's effectively NO difference in the net risk of a single drive vs. a RAID array for parity. Remember that for data loss there has to be a 2nd drive failure -- so the only "at risk" time is during the rebuild of the first failed drive ... and the likelihood of a failure during that very short period is almost exclusively based on the unrecoverable bit error rate => which is identical for a single drive and a RAID-0 array.

 

 

Link to comment


First, this is flat-out wrong. Each disk has a URE rate. A URE on either disk in a 2-disk RAID-0 would prevent a rebuild of the entire array. The URE rate of a RAID-0 "disk" is not the same as that of a single disk, and it increases dramatically the more disks you have in RAID-0.
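
For a sense of scale, here is a hedged sketch of the chance of hitting at least one URE while reading a given amount of data during a rebuild. It treats the spec-sheet figure of 1 in 10^14 as an independent per-bit error probability, which is a simplification, and the 3 TB and 6 TB sizes are just example values.

```python
import math

def p_at_least_one_ure(bytes_read, ure_per_bit=1e-14):
    """P(at least one URE) when reading `bytes_read` bytes.

    Assumes each bit fails independently with probability `ure_per_bit`.
    Computes 1 - (1 - p)^bits in a numerically stable way.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))

TB = 10 ** 12
for size_tb in (3, 6):
    p = p_at_least_one_ure(size_tb * TB)
    print(f"read {size_tb} TB: P(URE) = {p:.3f}")
```

In this model the result depends only on the total number of bits read: reading two 3 TB disks in full touches as many bits as reading one 6 TB disk.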

Link to comment

Absurdity, n.: A statement or belief manifestly inconsistent with one's own opinion.

 

Ambrose Bierce

 

Well that sure adds a lot to the discussion.

 

Basic probability question with small numbers that are easy to understand...

 

If there are 10 balls, one of which is blue and one of which is white, and you are asked to reach into a bag and draw one at random, what is the probability you'll get a white ball? 1 in 10. The probability of a blue ball? 1 in 10. What's the probability of getting either a blue or a white ball? 2 in 10, or 1 in 5.

 

The Same Probability question with bigger numbers...

 

If one disk is manufactured with an expected URE rate of 1 per 10^14 bits read, and another disk is manufactured with the same URE rate of 1 per 10^14 bits, is the probability of having a URE at any given time on either of the two disks 1 in 10^14?

 

I think the answer is no, while for some reason you seem to think yes.
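
Both versions of the question can be worked out exactly. This is a small sketch using exact fractions; it assumes the two draws in the ball example are mutually exclusive outcomes and that the two disks err independently of each other.

```python
from fractions import Fraction

# Balls: drawing white and drawing blue are mutually exclusive, so add them.
p_white = Fraction(1, 10)
p_blue = Fraction(1, 10)
p_either_ball = p_white + p_blue

# Disks: a URE on one disk does not exclude a URE on the other (assumed
# independent), so use 1 - P(neither), which equals 2p - p^2.
p_ure = Fraction(1, 10 ** 14)
p_either_disk = 1 - (1 - p_ure) ** 2

print(p_either_ball)                   # exact fraction
print(float(p_either_disk / p_ure))    # how many times the single-disk rate
```

For probabilities this small the p^2 term is negligible, so the chance of a URE on either of the two disks is, for practical purposes, twice that of one disk.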

 

Yes, my earlier math was incorrect, due to a mistake in polynomial expansion, but here are the new numbers.

 

Rows are the number of disks in the RAID 0 array, columns are the number of disks in the unRAID array, assuming a 1% failure rate per year. This does not account for UREs and is only the probability of having two disks fail at the same time; including the risk of UREs would make these numbers higher.

 

       1      2      3      4      5      6      7      8
1  1.00%  0.01%  0.03%  0.06%  0.10%  0.15%  0.20%  0.27%
2  1.99%  0.03%  0.07%  0.12%  0.17%  0.24%  0.31%  0.40%
3  2.97%  0.07%  0.14%  0.23%  0.32%  0.42%  0.52%  0.63%
4  3.94%  0.13%  0.26%  0.39%  0.53%  0.67%  0.82%  0.97%
5  4.90%  0.20%  0.40%  0.60%  0.80%  0.99%  1.19%  1.39%
6  5.85%  0.29%  0.58%  0.85%  1.12%  1.38%  1.64%  1.89%
7  6.79%  0.40%  0.79%  1.15%  1.50%  1.83%  2.14%  2.45%
8  7.73%  0.53%  1.02%  1.49%  1.92%  2.32%  2.71%  3.07%
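
The first column and the first row of this table are consistent with a plain binomial model: independent disks, 1% annual failure rate each, no rebuild within the year. The sketch below shows that model; the formulas behind the remaining cells are not stated in the post, so it does not attempt to reproduce them. Function names are made up for this example.

```python
def p_at_least_one(n, p=0.01):
    """P(at least one of n independent disks fails in a year)."""
    return 1 - (1 - p) ** n

def p_at_least_two(n, p=0.01):
    """P(at least two of n independent disks fail in the same year)."""
    return 1 - (1 - p) ** n - n * p * (1 - p) ** (n - 1)

# Percentages rounded to two decimals, as in the table
first_column = [round(100 * p_at_least_one(n), 2) for n in range(1, 9)]
first_row = [round(100 * p_at_least_two(n), 2) for n in range(2, 9)]

print(first_column)
print(first_row)
```

The first list matches column 1 (1.00% up to 7.73%) and the second matches row 1 from column 2 onward (0.01% up to 0.27%).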

 

Link to comment


I think it is absurd that you think you can, with mathematical precision, derive any meaningful results from this analysis. You are assuming all drives have the same failure rate (which we know isn't true even for different sizes of the same drive model), that all failures are absolute, that failures cannot be averted with proper preventative maintenance, and that failures are not dependent on external factors (like heat and vibration) that may affect some disks more than others. It is absurd that you are somehow going to ignore all of that, which is hugely significant with the very small percentages you are dealing with, and make some accurate prediction to the hundredths of a percent of the impact on data loss of having two 3T drives in a RAID0 vs. one 6T drive. I say any of the factors you are treating as constants would make a markedly bigger difference than the one you are varying. This ain't pulling balls out of a hat!

Link to comment


Are you controlling for multiple RAID0 failures?

 

Assuming 1% annual failure rate per disk and RAID0 parity.

 

Annual probability of data loss:

                       # of parity disks
# of data disks      0        1        2        3
       1          1.000%   0.010%   0.020%   0.030%
       2          1.990%   0.059%   0.078%   0.136%
       3          2.970%   0.117%   0.174%   0.230%
       4          3.940%   0.193%   0.268%   0.342%
       5          4.901%   0.287%   0.379%   0.470%
       6          5.852%   0.398%   0.507%   0.614%
       7          6.793%   0.525%   0.650%   0.774%

 

Our numbers are similar. My formulas essentially assume that if one parity disk fails they all are gone and the subsequent loss of a data disk is required for data loss.

 

This exercise is useful to understand the impact of adding RAID0 parity to increase speed.  Looking at row 1 we can see the probability of loss increases with additional RAID0 parity disks. However, doubling from .01% to .02% is inconsequential in most cases.

 

The OP asked about using RAID5 for parity. The effect of RAID1 is shown in the following table:

                    RAID1 # of parity disks
# of data disks      0         1         2         3
       1          1.0000%   0.0100%   0.0001%   0.0000%
       2          1.9900%   0.0591%   0.0214%   0.0205%
       3          2.9701%   0.1170%   0.0620%   0.0609%
       4          3.9404%   0.1931%   0.1215%   0.1206%
       5          4.9010%   0.2868%   0.1995%   0.1988%
       6          5.8520%   0.3976%   0.2954%   0.2938%
       7          6.7935%   0.5248%   0.4087%   0.3693%

 

RAID 5 will be identical except for the last column. RAID5 would have slightly worse (greater) numbers in column 3.

 

These tables show that using a RAID0 parity for increased performance should not create a reliability problem, and that using a RAID1 or RAID5 parity provides very little reliability improvement. In most usage cases, the effect of any RAID parity drive on data loss is inconsequential.
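
The single-data-disk rows of both tables can be reproduced with a very simple model. The sketch below is an assumption about that model, not the exact formulas used for the tables (which are not given): independent disks, 1% annual failure rate, and data loss only if the data disk and the parity both fail in the same year.

```python
def loss_raid0_parity(k, p=0.01):
    """One data disk plus a k-disk RAID0 parity.

    The parity is lost if ANY stripe member fails; data loss also needs
    the data disk to fail.
    """
    return p * (1 - (1 - p) ** k)

def loss_raid1_parity(k, p=0.01):
    """One data disk plus a k-way RAID1 mirror as parity.

    The parity is lost only if ALL mirror members fail; data loss also
    needs the data disk to fail.
    """
    return p * p ** k

for k in (1, 2, 3):
    r0 = 100 * loss_raid0_parity(k)
    r1 = 100 * loss_raid1_parity(k)
    print(f"{k} parity disk(s): RAID0 {r0:.3f}%  RAID1 {r1:.4f}%")
```

Both variants stay far below the 1% risk of running the single data disk with no parity at all, which matches the conclusion above that the choice of RAID level for the parity matters little.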

Link to comment

A drive failure that happens to me is 100% failure rate. A drive failure that happens to someone else is a statistic.

 

All drives fail eventually. The tricky bit is figuring out when. A RAID system helps reduce the risk when we get a prediction wrong.

 

All these drive manufacturers' statistics are good for is determining profit and loss predictions for warranty and pricing of the entire run of drives. They mean absolutely NOTHING in practice for a single person with even hundreds of disks.

 

The probability that your specific drive was dropped in shipping is orders of magnitude higher than any failure statistic based on the manufacturer's data.

Link to comment
