unRAID mention in RAID reliability review


Nyago123


Yes...interesting author comment:

 

Between the MTBF portion here, and when the UBER results are online, you will see that single parity schemes like RAID 4 and 5 are very scary. Real-world, a RAID 4 array will have a higher MTBF failure rate versus a RAID 5 array because the rebuild speed is much slower.

 

 

Link to comment

Yes...interesting author comment:

 

Between the MTBF portion here, and when the UBER results are online, you will see that single parity schemes like RAID 4 and 5 are very scary. Real-world, a RAID 4 array will have a higher MTBF failure rate versus a RAID 5 array because the rebuild speed is much slower.

 

 

I'm not sure how that can be true.  To rebuild ANY RAID 4 or RAID 5 array you must read all the data on the "working" disks in the array and write all the data to the disk being rebuilt.  The exact same amount of data is being manipulated.  You are limited by the read speed of the slowest drive involved, or the write speed of the drive being rebuilt, whichever is slower.

 

Any other limit on the data rate of the disk controllers and/or the PCI/PCIe bus has absolutely nothing to do with RAID 4 versus RAID 5.
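As a sanity check on this, a minimal sketch of the rebuild arithmetic (the capacity and speeds below are illustrative assumptions, not figures from the thread):

def rebuild_hours(capacity_gb, slowest_read_mbps, rebuild_write_mbps):
    # Rebuild is gated by the slower of reading the surviving disks and
    # writing the replacement disk, exactly as described above.
    bottleneck = min(slowest_read_mbps, rebuild_write_mbps)
    return capacity_gb * 1024 / bottleneck / 3600

print(rebuild_hours(1000, 60, 55))   # ~5.2 hours, for RAID 4 and RAID 5 alike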

 

Joe L. 

 

 

Link to comment

A little over 2 years ago this was discussed in a thread entitled "How Reliable is an unRAID array?"

 

You can find it here: http://lime-technology.com/forum/index.php?topic=1751.msg12137#msg12137

 

It has links to other sites with reliability calculators and papers written by drive manufacturers and others.

 

Now our array sizes have grown and disk sizes have doubled or quadrupled, but the time to re-construct a drive has stayed close to the same, since the SATA interface and disk read/write speeds are faster.
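A quick illustration of why those two trends roughly cancel (the drive generations and speeds below are assumed, not measured):

for capacity_gb, speed_mbps in [(250, 30), (500, 60), (1000, 110)]:
    # Rebuild time is just capacity divided by sustained throughput, so as
    # both grow together the rebuild window stays roughly flat.
    hours = capacity_gb * 1024 / speed_mbps / 3600
    print(capacity_gb, "GB at", speed_mbps, "MB/s ->", round(hours, 1), "hours")
# All three generations land in the same ~2.4-2.6 hour ballpark.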

 

I'll still say you are more likely to lose data from human error than from anything else.

 

Joe L.

Link to comment

Time periods are not a good x-axis.

 

UREs are expressed as a function of bytes read from the drive, not time periods.

 

Reading 100TB from a 1TB drive (100 full passes) reads the same amount of data, and should experience the same URE count regardless of whether that 100TB was read over 30 days, or 300 days.

 

You also have to consider that different drives have different URE rates, by a factor of 10 and even 100. 
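A minimal illustration of the point, assuming the common consumer-SATA rating of one error per 10^14 bits:

URE_RATE = 1e-14              # errors per bit read; a typical consumer-SATA spec
bits_read = 100e12 * 8        # 100 TB read, e.g. 100 full passes of a 1 TB drive
print(bits_read * URE_RATE)   # 8.0 expected UREs, whether read over 30 days or 300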

Link to comment

Time periods are not a good x-axis.

 

UREs are expressed as a function of bytes read from the drive, not time periods.

 

Reading 100TB from a 1TB drive (100 full passes) reads the same amount of data, and should experience the same URE count regardless of whether that 100TB was read over 30 days, or 300 days.

 

You also have to consider that different drives have different URE rates, by a factor of 10 and even 100.  

 

Actually, a time period is a fairly standard x-axis for these types of calculations, because you are computing the probability of failure within a given time period. That model specifically does not include a UBER calculation, and it uses a static MTBF. It also assumes 15TB of data on each array and a 50MB/s rebuild speed, which is important: if you instead used each array's maximum capacity during the MTTR (Mean Time To Recover), you would end up making RAID 10 and 60 look much better due to their larger number of parity disks, lower usable capacity, and therefore smaller amount of data to rebuild.
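For concreteness, those two assumptions pin down the rebuild window the model uses (a minimal sketch, using decimal TB and MB; the article may round differently):

ARRAY_TB = 15          # data per array, per the model's assumption
REBUILD_MBPS = 50      # rebuild speed, per the model's assumption
mttr_hours = ARRAY_TB * 1e12 / (REBUILD_MBPS * 1e6) / 3600
print(mttr_hours)      # ~83 hours running degraded during each rebuild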

 

On rebuild speeds, you have to be VERY careful with this. Most models assume things like the XOR happening in one media revolution. I actually had someone run RAID 4 vs. RAID-DP with WAFL on 10-drive FC arrays on current-gen hardware to show that real-world RAID 4 rebuilds are slower.

 

If you are curious, NetApp's 6xxx series (their high-end) docs (see http://www.netapp.com/us/products/storage-systems/fas6000/fas6000-tech-specs.html) spec Maximum RAID Group Sizes at:

RAID 6 (RAID-DP): FC – 28 (26 data disks plus 2 parity disks); SATA – 16 (14 data disks plus 2 parity disks)

RAID 4 (4): FC – 14 (13 data disks plus 1 parity disk); SATA – 7 (6 data disks plus 1 parity disk)

 

That note next to RAID 4 says: (4) RAID 6 is the recommended configuration for drives greater than 144GB.

 

Just food for thought on why enterprise storage does not use single-parity RAID 4/5 that much anymore.

 

Also, URE does depend a lot on the amount of data written, but in a healthy array it is basically a non-issue at the moment, since having two bad sectors in exactly the wrong spots on two disks is a low-probability event. The more data written, the more errors get written (if not scrubbed), and the higher the chance of failure when an array is in a degraded state.

Link to comment

The shortsightedness in that review has to do with usage patterns. Typically, in an unRAID array the large majority of the drives are spun down, while ALL the drives in RAID 0/1/5/10/50 are always spinning equally. Thus, for a data drive, 1 year of usage in a typical RAID system is much closer to N+ years of usage in a typical unRAID system.

Link to comment

The shortsightedness in that review has to do with usage patterns. Typically, in an unRAID array the large majority of the drives are spun down, while ALL the drives in RAID 0/1/5/10/50 are always spinning equally. Thus, for a data drive, 1 year of usage in a typical RAID system is much closer to N+ years of usage in a typical unRAID system.

 

Good luck modeling that. The lubrication in hard drives is formulated to work best at operating temperature, so a drive that is not up to temperature effectively has inefficient lubrication, which increases your failure rate. Spin-up not only adds a lot of latency (not an issue in low-end consumer storage), it is also much rougher on drives than constant spinning, since platters tend to be very well balanced. Most RAID vendors, and even a lot of software RAID solutions, let you spin down RAID sets. If you want to see this, boot a 10-drive system and watch power usage versus idle spinning power usage; that differential can easily be over 10W per drive, because it takes a lot of power to convert into mechanical movement. The point is, I still have not seen anyone do a good Markov-type (or better) model of this difference, because it is really hard to model hypothetical wear in a hypothetical usage scenario, throwing out all the other causes of failure, without resorting to pure speculation.

 

Then again, you are 100% correct that a low-end consumer system whose usage resembles streaming sequential data from a USB drive will likely see less drive wear over time than a higher-end NAS/SAN application that services multiple users and needs a lot of performance. Not only that, if the spin-up lag is acceptable, you actually save a ton of power with RAID 4, assuming you have 1-2 concurrent users and relatively infrequent access to the files on any single drive. That logic is perfectly sound for the mechanical portion of drive failure (as opposed to things like PCB failures). The article in question was not aimed at low-end systems with low-end usage scenarios, though, hence the relatively large RAID sets.
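A rough estimate of what that differential is worth over a year (the drive count is hypothetical; only the ~10W figure comes from the post above):

WATTS_SAVED_PER_DRIVE = 10   # the spinning-vs-idle differential cited above
DRIVES_SPUN_DOWN = 8         # hypothetical mostly-idle unRAID data drives
kwh_per_year = WATTS_SAVED_PER_DRIVE * DRIVES_SPUN_DOWN * 8760 / 1000
print(kwh_per_year)          # ~700 kWh/year if they stayed spun down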

Link to comment
Actually, a time period is a fairly standard x-axis for these types of calculations, because you are computing the probability of failure within a given time period. That model specifically does not include a UBER calculation, and it uses a static MTBF.

 

If he is using pure MTBF, then his numbers for RAID 4 and RAID 5 are a crock.

 

For example, the first graph appears to be wrong. (I haven't found his actual calculations... did he publish them?)  If he was solely using MTBF (i.e. a second drive failure during the rebuild of the first), then the probabilities for RAID 4/5 should be much lower... it appears he is making some common mistakes of a layperson doing statistical analysis.  The probability of having a drive failure in one of 20 one-year-old drives, and then having a second failure among the remaining 19 during the 11 hours needed to rebuild, is not 13.7%.
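To make the objection concrete, here is a sketch of the double-failure math under the same constant-failure-rate simplification a pure-MTBF model makes (the 500,000-hour MTBF is an assumed spec value, not taken from the article):

import math

MTBF = 500_000                             # hours; assumed spec-sheet figure
lam = 1 / MTBF                             # constant per-drive failure rate
p_first = 1 - math.exp(-20 * 8760 * lam)   # some drive of 20 fails within a year
p_second = 1 - math.exp(-19 * 11 * lam)    # a second drive of the remaining 19
                                           # fails during the 11-hour rebuild
print(round(p_first, 3))                   # ~0.296
print(p_first * p_second)                  # ~1.2e-04, nowhere near 13.7%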

 

Lookup "bathtub curve" and send the author a link.... he needs it.

 

 

EVERY drive will fail.  Period.  If you have a critical system, then you replace drives when they reach 3 years old.  Running a RAID failure graph out to 10 years is asinine.

 

The article defines:

 

UBER – Unrecoverable Bit Error Rate is basically a RAID 4 or RAID 5 array's worst enemy, as an error bit could cause a failed array rebuild.

 

A UBER event (a URE) can be fatal to RAID 4 or RAID 5... but it is not fatal to unRAID.  unRAID continues the rebuild even if it encounters a URE.

 

 

 

Link to comment

I looked more at this guy's "equations" and he is full of crap.

 

For example, he refuses to use the manufacturer's MTBF numbers because he doesn't trust them.  In actuality, he doesn't understand them.

 

A 500,000-hour MTBF does not mean the manufacturer is claiming the drive will last 500,000 hours.  It means that in a large group of drives, you should expect one failure per 500,000 hours of cumulative usage.  It does not apply except to large populations... it does not apply to a population of 1, and it doesn't really apply well to a small population, such as the 20 used in his example.

 

But then he uses a simplistic MTBF / #drives calculation... which is only relevant if you are using the manufacturer's stated MTBF of 500,000 hours, not some made-up number based on "5 years" for the MTBF.
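A quick sketch of the difference that substitution makes (assuming the stated 500,000-hour spec figure):

SPEC_MTBF = 500_000          # hours, the manufacturer's stated figure
DRIVES = 20
print(SPEC_MTBF / DRIVES)    # 25,000 h: about one pool failure every 2.9 years
print(5 * 8760 / DRIVES)     # 2,190 h (~91 days) if "5 years" is plugged in
                             # as the MTBF, which is the made-up substitution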

 

He is confusing "service life" with MTBF.  Yes, drives have a 3- or 5-year service life.  That is not the MTBF.  MTBF is a statistical number, intended to be used in statistical calculations covering populations large enough to be statistically valid.

 

Large-scale testing shows that annual drive mortality is in the 6-8% range in commercial environments (e.g. Google's)... yet he claims a 98% chance that 1 out of 20 drives will fail in 1 year?  Give me a break.

 

 

Link to comment

A 500,000-hour MTBF does not mean the manufacturer is claiming the drive will last 500,000 hours.  It means that in a large group of drives, you should expect one failure per 500,000 hours of cumulative usage.  It does not apply except to large populations... it does not apply to a population of 1, and it doesn't really apply well to a small population, such as the 20 used in his example.

 

He is confusing "service life" with MTBF.  Yes, drives have a 3- or 5-year service life.  That is not the MTBF.  MTBF is a statistical number, intended to be used in statistical calculations covering populations large enough to be statistically valid.

 

That's very interesting. So in effect, the number does not mean much unless you know the pool of drives it was measured against.

 

 

Other interesting links:

http://en.wikipedia.org/wiki/Mean_time_between_failures

http://www.eweek.com/c/a/Data-Storage/Hard-Disk-MTBF-Flap-or-Farce/

http://www.t-cubed.com/faq_mtbf.htm

Link to comment

That's very interesting. So in effect, the number does not mean much unless you know the pool of drives it was measured against.

 

You don't have to actually know anything about the pool per se... you just have to be willing to accept the pool as representative of your population (i.e. your drives are not some special case that is under-represented in the pool or widely disparate from the pool average).

 

There is a LOT of bogus claptrap about MTBF out there.  I'd wager that over half of the articles that mention MTBF use it wrong.  Even some drive manufacturers' PR blurbs get it wrong!

 

 

Link to comment

Also, just as drive manufacturers have been accused of inflating MTBFs, they have been overly conservative with UREs.  A 2005 study by Microsoft found a field-tested UCE rate of only 1 per 466TB read, on a population of drives rated at 10^-14... so the actual field-tested UCE rate was about 3x10^-16.
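The arithmetic behind that figure (assuming decimal terabytes):

bits_per_error = 466e12 * 8   # one unrecoverable error per 466 TB read
print(1 / bits_per_error)     # ~2.7e-16 errors/bit vs. the 1e-14 spec rating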

 

Vendor drive specifications predict one unrecoverable read error per 10^15 to 10^16 bits read for SCSI drives, and one per 10^14 to 10^15 bits read for SATA drives.  SSDs have an MTBF of 3,000,000 hours and URE rates around 10^-16.

 

But the killer is the 4K sectors on the Advanced Format WD EARS drives... they improve the overall error rate by two orders of magnitude, to an expected 10^-17 or even less, because the larger sectors have more space for ECC.

 

 

Link to comment

Another article: http://blog.kj.stillabower.net/?p=93

[image: 3fail.png]

•Limit your RAID 5 array to 10 drives.  50% of 11+ drive RAID 5 arrays don't survive 10 years.  Throw in a 3-year rate of about 20% and, with your data, you are at about the odds of playing Russian roulette with a six-shooter.

•If your card can handle it, RAID 6 is the option to have for any large array.

 

I too would like to see a second parity disk implemented in unRAID, unless performance would be so slow as to be unusable. Reference: Limetech post: http://lime-technology.com/forum/index.php?topic=2634.msg21695#msg21695 User posts: http://lime-technology.com/forum/index.php?topic=6865.0 , http://lime-technology.com/forum/index.php?topic=5342.0

 

Benchmarks of hardware RAID 5 vs RAID 6 performance:

http://www.tomshardware.com/reviews/SERIAL-RAID-CONTROLLERS-AMCC,1738-13.html

http://www.tomshardware.com/reviews/SERIAL-RAID-CONTROLLERS-AMCC,1738-14.html

 

 

 

Link to comment

You can draw any conclusions you like, but mine is that the writer is a bit unrealistic.

 

They say the only simulation that was "bullet-proof" was a RAID 1 with 4 drives.  Then, when questioned, they clarified that as follows:

 

Q:    I'm curious what you mean by 'A RAID1 x 4' – do you mean 1 drive mirrored three times? Or do you mean 4 RAID1's (= 8 drives)?

 

A:    Raid1 x4 would be a four (4) drive array using the Raid1 standard. 4 total drives, all with the same data – (3 redundant copies)

 

So... that grad student's simulations found that if you had 4 identical copies of the exact same data on 4 different drives, the likelihood of all 4 failing at the same time was nearly non-existent.      (It took a graduate-level education to come to that conclusion?)
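A back-of-the-envelope version of that conclusion (the 5% annual failure rate is an assumption, and failures are treated as independent):

AFR = 0.05        # assumed 5% annual failure rate per drive
print(AFR ** 4)   # 6.25e-06: all four independent copies lost in the same
                  # year, before even counting time to replace a dead mirror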

Link to comment

Now, if those 4 drives had been the old IBM DeathStars, the likelihood of them dying was 100%.

 

With that said, I still have a 60GB IBM DeathStar with 53K power-on hours on it that still passes all the SMART short/long/offline tests, though it did report 3 UNC errors in the past, around the 43K-hour mark. More amazingly, after a full preclear_disk.sh pass it reported the following. I guess there are always exceptions to the norm.

 

Model Family:     IBM Deskstar 40GV & 75GXP series (all other firmware)
Device Model:     IBM-DTLA-307060
Serial Number:    YQDYQFXD358
Firmware Version: TX8OA50C
User Capacity:    61,492,838,400 bytes

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   060    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   253   253   024    Pre-fail  Always       -       136 (Average 55)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       2701
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       4
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   093   093   000    Old_age   Always       -       53451
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1712
192 Power-Off_Retract_Count 0x0032   098   098   050    Old_age   Always       -       2701
193 Load_Cycle_Count        0x0012   098   098   050    Old_age   Always       -       2701
194 Temperature_Celsius     0x0002   171   171   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       4
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       3

Link to comment

RAID 1, no matter how many copies, is still subject to human error, hardware error, viruses, natural disasters, etc. The internet is filled with stories of people who cared enough about their data to build one big "server" and, in the process, destroyed the thing the server was supposed to protect. I would not bet on a RAID 1 x4 server, let alone an unRAID server, being infallible in practice, at least not in my hands. One French bank lost important data archives in the 90s, despite having tape backups, because a fire burned down the building where the tapes were stored.

Link to comment

Archived

This topic is now archived and is closed to further replies.
