RAID 5 Doomed Article



They failed statistics 101.... like the other article.

 

The gist is that 12 TB is roughly 1x10^14 bits.  If the unrecoverable read error rate for each drive is 1 in 1x10^14 bits, they think you will get one error during the rebuild of a 12TB array.

 

But having 12 drives that each have a 1-in-1x10^14 bit error rate is NOT the same as 1 drive with a 1-in-1x10^14 bit error rate.

Link to comment

They failed statistics 101.... like the other article.

 

The gist is that 12 TB is roughly 1x10^14 bits.  If the unrecoverable read error rate for each drive is 1 in 1x10^14 bits, they think you will get one error during the rebuild of a 12TB array.

 

But having 12 drives that each have a 1-in-1x10^14 bit error rate is NOT the same as 1 drive with a 1-in-1x10^14 bit error rate.

 

100% true. They also don't consider that a quoted read error rate of 1 in 1x10^14 is a manufacturer's value, which means that to justify it the manufacturer's statistical analysis must show, at a 95% confidence level (subject to further refinement through safety factors), that the rate is no worse than 1 in 1x10^14.  Most of the drives on the market would therefore have a better (lower) actual read error rate than the marketed value (unless they have some legal way around this, the engineers have to "prove" the values are legit).
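As a purely illustrative aside (this is not the manufacturers' actual qualification method, and the test size below is made up): one standard way to put a ~95% upper confidence bound on an error rate when a test observes zero errors is the "rule of three", where the bound is roughly 3/N for N trials.

```python
# "Rule of three": if zero errors are observed in n independent trials, a ~95%
# upper confidence bound on the per-trial error rate is about 3 / n.
# Purely illustrative - the test size is hypothetical, not a real drive-qual run.
bits_tested = 3e15                    # hypothetical: 3x10^15 bits read with no URE seen
ber_upper_bound = 3 / bits_tested     # ~1e-15, i.e. comfortably better than 1/10^14
print(f"~95% upper bound on BER: {ber_upper_bound:.0e}")
```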

 

Cheers,

Matt

Link to comment
There's also the striped, all-or-nothing approach of RAID 5 to take into account.

 

Which brings me back to my question from several weeks ago: what does unRAID do when it encounters an uncorrectable read error while operating in degraded mode (i.e. with a failed data drive)?

 

It needs to continue, and not mark the drive as bad.

 

 

Link to comment

Which brings me back to my question from several weeks ago: what does unRAID do when it encounters an uncorrectable read error while operating in degraded mode (i.e. with a failed data drive)?

 

It needs to continue, and not mark the drive as bad.

 

I agree.

 

unRAID does not take a drive out of service for a read error - only for a write error.  In this scenario (rebuilding a drive), I don't think that unRAID would stop for a read error.  It would likely assume the sector was all zeros and continue on.  This is just a guess.

 

TOM IF YOU ARE READING COULD YOU CONFIRM OR DENY?  INQUIRING MINDS WANT TO KNOW!

Link to comment

If array parity is valid, then for an unrecoverable...

Write error: the drive is 'disabled' but parity is updated (so that drive contents can be reconstructed);

Read error: the block is 'reconstructed' by reading all other drives plus parity.  The result is then re-written to the bad block (if this subsequent write fails then see the 'write error' case above).

 

If array parity is not valid, then all unrecoverable errors are 'passed up' to the caller - this will result in the originating application getting an I/O error, or possible loss of data if we're talking about a cache flush write.

 

A future feature would be to disable all write-behind, from Samba all the way to the driver, if array parity is not valid.  But this would really slow down writes.
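(For illustration only: here is a minimal sketch, in Python, of what "reconstructed by reading all other drives plus parity" means for single-parity XOR. This is not unRAID's actual code; the block size and three-data-drive layout are made up.)

```python
from functools import reduce

BLOCK_SIZE = 4096  # bytes per block; made-up value for illustration

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def compute_parity(data_blocks):
    """The parity block is the XOR of all data blocks at the same offset."""
    return reduce(xor_blocks, data_blocks, bytes(BLOCK_SIZE))

def reconstruct_missing(surviving_blocks, parity):
    """Rebuild an unreadable block by XOR-ing parity with every surviving data block."""
    return reduce(xor_blocks, surviving_blocks, parity)

# Example: three data drives plus parity; drive 1 returns a read error.
d0 = bytes([0x11]) * BLOCK_SIZE
d1 = bytes([0x22]) * BLOCK_SIZE   # pretend this block could not be read
d2 = bytes([0x33]) * BLOCK_SIZE
parity = compute_parity([d0, d1, d2])

rebuilt = reconstruct_missing([d0, d2], parity)
assert rebuilt == d1              # the lost block is recovered exactly
```

The flip side, and the reason for this whole thread, is that the XOR needs every other block in the stripe to be readable: lose a drive and then hit a read error on another drive at the same offset, and there is nothing left to reconstruct that block from.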

Link to comment

But the question is --- if a drive is being rebuilt, and you get a read error from one of the drives (parity or data) during the rebuild, would unRAID terminate the reconstruct of the drive?  Or would it just go on to the next sector and complete the reconstruction on a best effort basis?

Link to comment

But the question is --- if a drive is being rebuilt, and you get a read error from one of the drives (parity or data) during the rebuild, would unRAID terminate the reconstruct of the drive?  Or would it just go on to the next sector and complete the reconstruction on a best effort basis?

 

The reconstruct will continue.

Link to comment

The reconstruct will continue.

 

Thank you.

 

Will there be any indication the unrecoverable error was experienced?

 

There will be the original error posted in the system log, but otherwise no.  In looking at the code & thinking about this, we should increment the 'Sync Errors' counter when this happens (as we do for errors detected during Parity Check).  This will be in the next release.

Link to comment

Some recovery tools will remap bad sectors and fill the remapped sector with some searchable string like "UNRECOVERABLE DATA UNRECOVERABLE DATA ..." so that, after recovery, the user could search the files for that string and figure out what file(s) were impacted. 

 

Having unRAID do something similar during a drive rebuild would be a nice enhancement.  It would allow a user to figure out what got corrupted, rather than just knowing something got trampled with no means of figuring out what it was.  No real harm in it - if you get a bad read, you know that sector is not going to rebuild correctly - might as well put something identifiable in there.

 

You'd likely want to do this on both the restored disk AND the disk that gave the read error (unless it was parity).  Corresponding info should be in the syslog to guide a person to the affected drives.
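To make the idea concrete, here's a rough sketch of what the marker-fill could look like (this is not an existing unRAID feature; the marker string follows the suggestion above, and the sector size and mount path are made-up examples):

```python
import os

SECTOR_SIZE = 512                   # bytes; illustrative value
MARKER = b"UNRECOVERABLE DATA "     # searchable filler string, per the suggestion above

def marker_sector() -> bytes:
    """One sector's worth of the repeating marker, to write in place of a
    sector that could not be reconstructed during the rebuild."""
    filler = MARKER * (SECTOR_SIZE // len(MARKER) + 1)
    return filler[:SECTOR_SIZE]

def find_affected_files(root: str):
    """After the rebuild, scan the filesystem for files containing the marker,
    i.e. the files that overlap the unreadable sectors."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    if MARKER in f.read():    # a real tool would scan in chunks
                        hits.append(path)
            except OSError:
                pass                          # skip files we can't open or read
    return hits

# e.g. print(find_affected_files("/mnt/disk1"))  # hypothetical mount point
```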

 

I think that this would be a great advertising point!  "Stripe kill" is such a hot topic of criticism of RAID-5.  A robust story about how unRAID gracefully handles this deadly (and relatively common) occurrence, giving the user the ability to recover almost all of their data and the tools to figure out what, if anything, got corrupted, would be a great selling point IMO.

Link to comment
  • 12 years later...
On 1/28/2009 at 10:18 AM, bubbaQ said:

They failed statistics 101.... like the other article.

 

The gist is that 12 TB is roughly 1x10^14 bits.  If the unrecoverable read error rate for each drive is 1 in 1x10^14 bits, they think you will get one error during the rebuild of a 12TB array.

 

But having 12 drives that each have a 1-in-1x10^14 bit error rate is NOT the same as 1 drive with a 1-in-1x10^14 bit error rate.

 

12 years later, I came here to say that it doesn't matter how many drives you have, or whether they are part of one storage array or each plugged into its own laptop.

The official number for the Bit Error Rate (BER) per bit read for consumer HDDs (PC/laptop drives) is 1 in 100,000,000,000,000 (1/10^14).

The proper interpretation of BER is this: "the chance that a bit is unreadable".

* Every time you read a bit with a BER of 1/10^14, it's like you are rolling the dice with a chance of 1/6 to get a "six" (let's imagine for a moment that the "six" is that "read error" situation).

And when you go on to read the next bit, it's like you are rolling the dice again.

* If you want to calculate the probability of not getting a "six" in 30 straight rolls, you start with the probability of no six on a single roll: 1 - 1/6 = 5/6. This is the probability of getting anything but the "six". Next you raise 5/6 to the power of 30 (the number of throws): (5/6)^30 = 0.0042, or a 0.42% chance of not getting a "six" in 30 dice rolls. Now, to calculate the probability of getting a "six" at least once in 30 dice rolls, we just have to "invert" that probability: 1 - 0.0042 = 0.9958, i.e. 99.58%.

 

By the same logic, to calculate the probability of not getting a single bit error in a 12TB read, we have to identify the number of independent bit reads (aka dice rolls):

12TB = 96Tbit = 96,000,000,000,000 bits

Each bit has a BER of 1/10^14 (read error), meaning that the probability of a successful read is 1 - 1/10^14 = 0.99999999999999. This is the probability of a successful read of a SINGLE bit.

Now we raise this number to the power of 96,000,000,000,000 (the number of independent dice rolls): 0.99999999999999 ^ 96,000,000,000,000 ≈ 0.3829, or a 38.3% chance of having NO bit errors across the 12TB read. Which results in 1 - 38.3% = a 61.7% chance of at least one bit error.
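For anyone who wants to check the arithmetic, the whole calculation fits in a few lines of Python (using the 1/10^14 BER and 96 Tbit figures from above):

```python
# Dice warm-up: chance of at least one "six" in 30 rolls
p_no_six = (5 / 6) ** 30
print(f"no six in 30 rolls: {p_no_six:.4f}")           # ~0.0042
print(f"at least one six:   {1 - p_no_six:.4f}")       # ~0.9958

# 12 TB read with a per-bit error rate of 1 in 10^14
ber = 1e-14
bits_read = 12e12 * 8                                   # 12 TB = 96,000,000,000,000 bits
p_clean = (1 - ber) ** bits_read
print(f"no bit error in 12 TB read: {p_clean:.3f}")     # ~0.383
print(f"at least one bit error:     {1 - p_clean:.3f}") # ~0.617
```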

 

The important part here is to realize how independent and decoupled the bit reads are from each other throughout the 12TB read. It doesn't matter if you (re)read all 12TB from a single 1TB disk, or across a 100TB array made of 20 disks, or in a cumulative read performed by 12 separate laptops with their own drives inside, 1TB read by each laptop. Each bit read is as independent an event as each roll of a die, no matter how many different and unique dice you use throughout the experiment.

 

I just thought I'd add my two cents while I was waiting for the Parity-Sync to complete on my Unraid. Plus, this topic is still indexed very well by Google after all these years. Okbye!

 

P.S. Some might say that a 61.7% chance of a bit error is way too high based on anecdotal evidence and personal experience, and they would be right. Firstly, because UREs don't happen due to a single bit flip, thanks to the ECC built into HDD firmware, which can tolerate a flip per sector. Secondly, because I find the BER of 1/10^14 way too conservative.

The above example calculates the probability of a bit flip that most likely goes unnoticed thanks to ECC, not an actual URE. It also shows that there is no difference in how many drives are participating in the 12TB read. Every bit read is an independent event from the standpoint of probability.
Edited by m-a-x
Link to comment
