Article: Why RAID 5 stops working in 2009


bubbaQ

Recommended Posts

http://blogs.zdnet.com/storage/?p=162

 

The gist is that once you have 12 TB of data and a DRIVE fails, you are likely to hit a read error somewhere in that 12 TB during the rebuild, because drive RELIABILITY has not increased in step with drive CAPACITY.

 

When a read error occurs on a second disk during a rebuild after a disk failure in RAID5, your RAID5 array is kaputnick.

 

In unRAID, I believe the rebuilding drive would be restored, up to the point of the read failure. 

 

But I am curious what would happen afterwards.  Does unRAID have the ability to continue and rebuild the rest of the failed drive?  If so, that would be a big plus, as you would lose only one sector of the failed drive (since that sector could not be rebuilt from parity) and the unreadable sector of the non-failed drive that could not be recovered.

 

BTW, this is one reason I do not support the idea of replacing smaller drives in unRAID with larger ones and letting them rebuild.  I prefer to put the new, larger drive in the system, copy the data, then stop the array and remove the old drive.

 

Which generates a similar question... what does unRAID do during a parity rebuild, if a read error occurs on a disk?

Link to comment

OK.  I read the article.  I'm not going to suggest that the author doesn't give us something to think about.  But, come on, his article is so full of assumptions presented as fact that I have a hard time accepting that what he's stating must be so in all (or even most) scenarios.

 

e.g. "As we now know, a single disk failure means a second disk failure is much more likely."  Do we REALLY know that????

 

Says who??? The math??  Now, if the fate of one drive were intrinsically determined by the prior fate of another drive in an array, then fine.  However, we're making a HUGE assumption of causality here (again, that the fate of one drive directly impacts the fate of another drive in the array).  Sorry, I'm not buying the argument.

 

What I do buy is that this is all about reducing risk: as drive capacities increase, so does the rate of failure (purely a technology issue that may be resolved in the future), and the RAID technologies protecting our data aren't currently keeping up at the same pace.  I get it.

 

I don't pretend to understand all the numbers and calculations that are presented; however, I can't help but think that something about the logic is flawed (or, at least, incomplete).

Link to comment

I too have read the article. It's obviously written to get a reaction and, TBH, that's a good thing.

 

The one thought that I keep coming back to is the relation between speed of rebuild and size of array. With the relatively slow speed of unRAID parity creation or disk recovery, and with the average user realistically looking at 10TB+ arrays within 3 years, the chance of a failure within the unprotected period increases. Disk size is increasing WAY faster than parity speed.
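
As a back-of-the-envelope illustration (my own assumed numbers, not measured unRAID figures), the unprotected window is simply drive size divided by sustained rebuild speed, so it grows linearly with capacity unless rebuild speed keeps pace:

# Rough rebuild-window estimate; the speeds are illustrative assumptions,
# not measured unRAID numbers.
drive_size_tb = 1.5          # size of the failed/replaced drive
rebuild_speed_mb_s = 50      # assumed sustained rebuild rate
hours = (drive_size_tb * 1e12) / (rebuild_speed_mb_s * 1e6) / 3600
print(f"{drive_size_tb} TB at {rebuild_speed_mb_s} MB/s ~ {hours:.1f} hours unprotected")
# ~8 hours for a 1.5 TB drive; a 3 TB drive at the same speed is ~17 hours.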

 

That, combined with the fact that during these periods the disks are thrashing their asses off, increases the chance of failure.

 

The real probability is beyond me but it is my real worry.

Link to comment

BubbaQ - my guess would be that unRAID would complete the rebuild in spite of the error. The drive error count would be incremented and the sector # written to the log. I don't have any specific experience, so this is just a guess based on the way other things work, and it is consistent with Tom's overall philosophy of minimizing data loss.  I hope Tom will confirm or deny, as I have never heard of any user experience where this scenario has occurred.

 

I am a big proponent of monthly parity checks as a way to minimize the chances of sector errors at times when the array is unprotected. I don't know much about RAID5 and whether those systems internally do something similar. My personal experience has been that bad sectors tend to crop up in areas of disks that I access the least. Forcing every sector to be read has worked so far in keeping my array healthy.  Short of keeping backups of our entire array contents, which is impractical for most of us, it is the best preventive maintenance we can do.

Link to comment

Mathematically, there is no relationship that increases the probability of a second drive failing after the first.

 

But in practice, there is a relationship, in that the drives in an array are (again, in practice) usually similar in age, environment, and usage.... and age, environment, and usage are factors in drive failure.

 

The gist is that 12 TB = about 1x10^14 bits... and the unrecoverable read error rate for modern drives is quoted at about 1 error per 1x10^14 bits read... so he claims that means there are even-money odds that you get a read error for every 12 TB of data you read.  There are some significant mathematical errors in that simplification that I won't bore you with... but he is simply wrong.
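
Just to show where the "even-money" figure comes from if you take the spec at face value, here's the calculation (assuming the quoted 1-per-10^14-bits rate and independent errors, both of which are questionable):

import math
# Chance of at least one unrecoverable read error (URE) while reading 12 TB,
# taking the 1-in-1e14-bits spec at face value and assuming independent errors.
bits_read = 12e12 * 8                 # 12 TB is roughly 0.96 x 10^14 bits
ure_per_bit = 1e-14
p_clean = math.exp(bits_read * math.log1p(-ure_per_bit))
print(f"P(at least one URE) ~ {1 - p_clean:.2f}")   # ~0.62 -- likely, not certain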

 

But the CONCEPT of the article is valid.... larger array sizes will make such a scenario more likely, because drive capacities grow much faster than drive reliability, and a more graceful handling of a read error on an array should be available.  Would you rather lose one sector, or lose the whole array?

 

Personally, I'd like to be told what sector it is, and what file is using that sector.  Now on a conventional RAID array, the filesystem is usually invisible to the array software, so it can't do this... but in unRAID, the unRAID management interface COULD do this.  unRAID could even keep a list of "corrupted" and unrecoverable files and report them to you at the end of the parity check/rebuild.

Link to comment

Interesting topic, thanks for bringing it up.

 

That was a very interesting article.  It does contain some merit and food for thought.

I've yet to encounter the situation.

I found it interesting how a date was chosen as the tipping point. In reality, it is array size vs. MTBF and BER of the devices.

 

In some sense it makes me think unRAID should have multiple smaller arrays.

It could do this by dividing the larger array into smaller chunks protected by individual parity drives.

It was interesting how the author states RAID6 is not much more protection than RAID5.

 

A comment by a reader makes a lot of sense:

the proper procedure for raid maintenance is to 1) always have a cold spare and 2) always replace failed drives with a new drive, and return the old drive for warranty maintenance.

 

This is how I've handled things in the past.

 

What I would like to add is, the chance of multiple drive issues is increased greatly when you purchase multiple drives at the same time.

I've seen this issue over and over again, not just with RAID5. 

In our web server farm, if too many drives were bought from one vendor at the same time, we would see them fail in a pattern.

My suggestion, which worked out well, was to choose multiple vendors and to spread the purchase over time.

 

In some sense, it's interesting to see the warnings of building large arrays with unRAID.

 

Points to consider in the unRAID community:

 

Multiple drive purchases at one time... (as explained above)

 

Using old drives to gain every last ounce of use out of them. This is one of the great things about unRAID: I can use a hodgepodge of old drives. Yet the chances of multiple drive failures are greatly increased when trying to get every last ounce of life out of them.

 

Periodic parity checks - I'm a proponent of this procedure also. In fact I think it should be added as a standard unRAID feature, with scheduling just like the mover script.  There are RAID cards that do this automatically.  A term new to me, "BIT ROT", makes sense here. Sectors go bad; periodic scans and checks will help you become aware of potential issues, allowing you to be proactive, not reactive.

 

SMART monitoring - I'm a proponent of this also; again, I think it should be added as a standard unRAID feature.

 

Cold or Warm Spare - This is a budget concern, but the faster you replace a disk, the less likely you will see a multiple drive failure. This does not save you from bit rot. LOL.. Periodic parity checks are the only way around that one!

 

 

Some questions to ponder...

 

>> With the relatively slow speed of unRAID parity creation or disk recovery.

Is it really that slow?  You have to consider how many spindles are involved, then bus speed.

Parity check is reasonable if you have drives arranged on efficient buses. Parity create is slower because there is a write involved.
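
To put rough numbers on the bus side (theoretical maximums, so real throughput is lower): a parity check reads all spindles at once, so a shared bus gets divided across them, e.g.:

# Rough per-drive throughput during a parity check when drives share one bus.
# Bus figures are theoretical maximums; real-world numbers are lower.
buses = {"PCI 32-bit/33MHz": 133, "PCIe x1 (gen 1)": 250}   # MB/s
drives_reading = 8                                          # data disks + parity
for name, mb_s in buses.items():
    print(f"{name}: ~{mb_s / drives_reading:.0f} MB/s per drive")
# Plain PCI works out to ~17 MB/s per drive with 8 spindles -- the bus,
# not the disks, becomes the bottleneck.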

 

>> That combined with the fact that during these periods the disks are thrashing their asses off increasing the chance of failure.

And in this case, if the array is otherwise idle and just creating parity, are they really thrashing, or just reading sequentially?

To me, thrashing is significant inefficient head movement, yet during a parity check/create, access is efficient as long as nothing else is causing movement in the machine.

 

>> e.g. "As we now know, a single disk failure means a second disk failure is much more likely."  Do we REALLY know that?

I agree. I've always questioned this way of thinking. It's not a given as so many people will repeat... Yet the chance of impending failure is greatly increased for drives purchased at the same time and within the same batch of production. I've seen this countless times over the years. One bad batch can wreak havoc.

 

>> this is one reason I do not support the idea of replacing smaller drives in unRAID with larger ones and letting them rebuild.  I prefer to put the new, larger drive in the system, copy the data, then stop the array and remove the old drive.

 

What is the difference here?

Do you feel the rebuild is susceptible to bit rot and you will not be able to reconstruct the new drive reliably?

If the drive has been removed and replaced, you still have the old data on the old drive.

If a parity check is executed before replacement, then the chances of every drive being readable are higher.

If a drive is swapped with a larger drive and the rebuild fails, you have the chance of putting the old drive back and potentially accessing the second failed drive virtually.

Just some thoughts.

So what does adding a new drive to the system buy you vs swap replacement?

A new drive requires a parity calculation to be executed at some point.

Removal of the old drive requires a parity calculation (of course these can be combined into one step).

 

Link to comment
Personally, I'd like to be told what sector it is, and what file is using that sector.  Now on a conventional RAID array, the filesystem is usually invisible to the array software, so it can't do this... but in unRAID, the unRAID management interface COULD do this.  unRAID could even keep a list of "corrupted" and unrecoverable files and report them to you at the end of the parity check/rebuild.

 

This is a very interesting concept. I remember reading articles where an LBA error reported via SMART could be translated into numbers usable by a filesystem debugger for locating the bad file.  In fact, the article also mentioned using the badblocks program to force writes over the bad blocks, thus causing reallocation.
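
The translation step itself is just arithmetic. A hedged sketch of the conversion those articles describe (the partition offset and block size below are made-up example values; on a real system you would read them from fdisk and the filesystem superblock, then feed the result to a filesystem debugger such as debugfs on ext2/3):

# Convert a SMART-reported LBA into a filesystem block number.
# The concrete numbers are illustrative only.
bad_lba = 123456789          # LBA reported by SMART (512-byte sectors)
part_start_lba = 63          # first LBA of the partition (from fdisk)
fs_block_size = 4096         # filesystem block size in bytes
fs_block = (bad_lba - part_start_lba) * 512 // fs_block_size
print(f"filesystem block {fs_block}")
# On ext2/3, debugfs's icheck maps that block to an inode and ncheck maps
# the inode to a path; other filesystems need their own debugger.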

 

My friend and I were discussing this yesterday. He mentioned that whenever he sees a drive starting to go bad, he takes it offline and runs shred on it, forcing writes to the whole drive, thus reallocating bad sectors.  Not the best approach, but it's saved a few drives of mine at times.

 

Link to comment

Has capacity increased more than reliability?

Is he not using URE out of context?

 

 

BTW, this is one reason I do not support the idea of replacing smaller drives in unRAID with larger ones and letting them rebuild.  I prefer to put the new, larger drive in the system, copy the data, then stop the array and remove the old drive.

 

Say you replace a data drive and another drive (not the parity) fails during the 'rebuild' of the new drive. Can we put the old drive back in and then force Unraid to recognize parity as correct using the restore button and 'mdcmd set invalidslot 99'? Will this not then get us back to a position where the failed drive could be rebuilt?

 

I'd guess the best way would be to move the data off the drive then write 0's to the drive you want to remove. Remove the drive and then use the restore button and 'mdcmd set invalidslot 99' to remove the drive from Unraid. That should keep the array protected during the whole process.

 

Peter

 

Link to comment

I'd guess the best way would be to move the data off the drive then write 0's to the drive you want to remove. Remove the drive and then use the restore button and 'mdcmd set invalidslot 99' to remove the drive from Unraid. That should keep the array protected during the whole process.

 

What's needed is a graceful way to remove a drive from the array. I would prefer to remove a drive by zeroing out its parity slot without writing to the associated drive.  There's gotta be a way to read the parity drive, pretend slot (drive) number X is 0, and re-write the parity sector.
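
Conceptually that should be possible, because parity is a plain XOR across the data drives: a drive full of zeros contributes nothing, so you could either zero the drive first (as described above) or XOR the departing drive's contents back out of parity. A toy sketch of the math only, not anything unRAID currently exposes:

# Toy illustration: removing a drive from XOR parity, one "sector" per drive.
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

d1, d2, d3 = b"\x10\x22", b"\x0f\x30", b"\x55\x01"   # made-up data sectors
parity = xor_bytes(xor_bytes(d1, d2), d3)

# XOR the departing drive (d3) back out of parity...
parity_without_d3 = xor_bytes(parity, d3)
# ...which leaves exactly the parity of the remaining drives,
assert parity_without_d3 == xor_bytes(d1, d2)
# and a zeroed drive contributes nothing, so zero-then-remove works too.
assert xor_bytes(parity_without_d3, b"\x00\x00") == parity_without_d3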

 

Say you replace a data drive and another drive (not the parity) fails during the 'rebuild' of the new drive. Can we put the old drive back in and then force Unraid to recognize parity as correct using the restore button and 'mdcmd set invalidslot 99'? Will this not then get us back to a position where the failed drive could be rebuilt?

 

In theory this could work, in practice we would have to find out (and be careful about it).

I think it's one of the reasons to do a parity check before any drive replacement operations take place.

 

 

Link to comment

Related to this, something we're constantly asked is, "can you increase the number of disks in the array to X?"  Many have asked for an X of 20, some 24.

 

What we are going to do is create a Pro-only feature that raises the array width max up to 32, but also includes the ability to configure a "Q-parity" drive (like RAID-6).  This drive will have the same size restriction as the Parity drive, that is, it must be as large as or larger than any data disk.

 

In this system (as in a RAID-6 system), there will be two redundancy disks, "P" which is the ordinary XOR parity, and "Q" which is a Reed-Solomon code.  This will allow unRAID to recover from any 2 disk errors, with minimal performance impact.
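
For the curious, here is a toy sketch of the P/Q idea over GF(2^8). This is my own illustration of the standard RAID-6 style math, not unRAID's actual implementation; the point is that Q is an independent equation, so a data byte can be recovered even when P is also unavailable:

# Toy P/Q parity over GF(2^8) -- an illustration of the math, not unRAID code.
def gf_mul(a, b):                 # multiply in GF(2^8), polynomial x^8+x^4+x^3+x^2+1
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
        b >>= 1
    return p

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a):                    # a^254 is the multiplicative inverse in GF(2^8)
    return gf_pow(a, 254)

data = [0x37, 0xA2, 0x5C, 0xF0]   # one byte from each data drive (made up)
P, Q = 0, 0
for i, d in enumerate(data):
    P ^= d                        # ordinary XOR parity
    Q ^= gf_mul(gf_pow(2, i), d)  # Reed-Solomon style weighted sum

# Suppose data drive 2 AND the P drive both fail: recover the byte from Q alone.
lost = 2
q_rest = 0
for i, d in enumerate(data):
    if i != lost:
        q_rest ^= gf_mul(gf_pow(2, i), d)
assert gf_mul(Q ^ q_rest, gf_inv(gf_pow(2, lost))) == data[lost]

Losing two data drives works similarly by solving the P and Q equations together, which is where the "any 2 disk errors" claim comes from.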

 

Link to comment

Related to this, something we're constantly asked is, "can you increase the number of disks in the array to X?"  Many have asked for an X of 20, some 24.

 

What we are going to do is create a Pro-only feature that raises the array width max up to 32, but also includes the ability to configure a "Q-parity" drive (like RAID-6).  This drive will have the same size restriction as the Parity drive, that is, it must be as large as or larger than any data disk.

 

In this system (as in a RAID-6 system), there will be two redundancy disks, "P" which is the ordinary XOR parity, and "Q" which is a Reed-Solomon code.  This will allow unRAID to recover from any 2 disk errors, with minimal performance impact.

 

 

How can we utilize that many discs, especially on the PCI-e bus, not to mention cases that would fit that many discs?

Link to comment

How can we utilize that many discs, especially on the PCI-e bus, not to mention cases that would fit that many discs?

 

It's not so much the discs that are the issue as it is the case to house them.

Supermicro has a case that houses 24 disks. Then there are external cases via multilane or port-multiplier connections.

24 discs is easily feasible; more than that would take some effort.

 

I was planning on using the Norco case, but I've decided to use my CM Stacker with 4 5-in-3 modules and remove the top USB/power unit (and rig something into the back). If I put in eSATA controllers, each port is capable of 5 disks... you can see how far the possibility could grow. Is it practical? Probably not. I think when the disks start expanding past a case, you run the risk of cable issues, and thus multiple drive failure issues.

Link to comment

Related to this, something we're constantly asked is, "can you increase the number of disks in the array to X?"  Many have asked for an X of 20, some 24.

 

What we are going to do is create a Pro-only feature that raises the array width max up to 32, but also includes the ability to configure a "Q-parity" drive (like RAID-6).  This drive will have the same size restriction as the Parity drive, that is, it must be as large as or larger than any data disk.

 

In this system (as in a RAID-6 system), there will be two redundancy disks, "P" which is the ordinary XOR parity, and "Q" which is a Reed-Solomon code.  This will allow unRAID to recover from any 2 disk errors, with minimal performance impact.

 

 

How can we utilize that many discs, especially on the PCI-e bus, not to mention cases that would fit that many discs?

I think that is the point... Most people will reach hardware limits LONG before reaching the software limit on the number of disks.  I can't even imagine how long an all-IDE array might take to compute parity with 30+ disks (possibly days).  It might not be practical even with an all-SATA array.

 

An equally attractive possibility, for those who can handle the hardware, is multiple logical "arrays" in a single case, each array having its own parity disk.  At least with that you can compute parity on a smaller set of drives, and a single disk failure only needs to query half of the drives to reconstruct the missing data.

 

The true issue for many of us is if the array, when in a degraded state, is still able to serve the media we have stored on it at a rate fast enough to be usable.  I'm thinking you would have to have a very good set of hardware to be able to serve a full bitrate ISO image from a failed disk on an array with 30+ disks.

 

Tom mentioned "multiple logical arrays"  as a possibility long ago...

 

Joe L.

Link to comment
The true issue for many of us is if the array, when in a degraded state, is still able to serve the media we have stored on it at a rate fast enough to be usable.  I'm thinking you would have to have a very good set of hardware to be able to serve a full bitrate ISO image from a failed disk on an array with 30+ disks.

 

This is a factor of the controllers and the quality of the drives present.

Reads are done from multiple drives/sectors to rebuild the failed sector.

Sort of like a stripe, so performance shouldn't be that much of an issue.

There are guys on the AVS forum who have huge arrays using raid5/raid6.

I'm sure unRAID will be able to serve an ISO or two; it's the writes that will suffer.

The issue of practicality boils down to parity generation time and wiring all the drives.

I think removing limits is a positive step. It's up to the end user to determine the risk.

The addition of a second Q-parity is a huge plus.

 

The possibility of splitting the arrays with multiple individual parity disks would be a welcomed addition.

I would certainly use it for external units such as a 5 drive external case using a parity drive within that casing.

Link to comment

How can we utilize that many discs, especially on the PCI-e bus, not to mention cases that would fit that many discs?

 

It's not so much the discs that are the issue as it is the case to house them.

 

The other issue I see is the Power Supply. I got the Corsair 650 thinking this could handle 17 discs. How much of a PS would you need for a 24 or 32 disc system? In theory it sounds great being able to go to 32 discs, but in practice I see this being very difficult to achieve. A 2nd unRAID server seems to make much more sense to me. Thoughts?
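
Rough power math for the drives alone (the per-drive figures are typical assumed values, not measurements): the killer is spin-up, when a 3.5" drive can briefly pull 25-30 W, mostly on the 12 V rail:

# Back-of-the-envelope PSU sizing for the drives only; per-drive wattages are
# assumed typical values -- check your drives' datasheets and add the rest
# of the system on top.
drives = 24
spinup_w = 30     # brief surge per drive at power-on, mostly 12V
active_w = 8      # rough steady-state draw per spinning drive
print(f"spin-up all at once: ~{drives * spinup_w} W")   # ~720 W
print(f"steady state:        ~{drives * active_w} W")   # ~192 W
# Staggered spin-up cuts the surge dramatically, which is how large
# enclosures get away with more modest power supplies.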

Link to comment

How can we utilize that many discs, especially on the PCI-e bus, not to mention cases that would fit that many discs?

 

It's not so much the discs that are the issue as it is the case to house them.

 

The other issue I see is the Power Supply. I got the Corsair 650 thinking this could handle 17 discs. How much of a PS would you need for a 24 or 32 disc system? In theory it sounds great being able to go to 32 discs, but in practice I see this being very difficult to achieve. A 2nd unRAID server seems to make much more sense to me. Thoughts?

 

There are so many external drive cases out there with many options.

 

Ramping a machine up to 32 drives is not that difficult if you get past the single-case mindset.

There are port multipliers or multi-lane connectors whereby SATA paths are combined into a single cable.

 

For example:

http://www.newegg.com/Product/Product.aspx?Item=N82E16816133007

 

Other options

http://www.caloptic.com/cgi-bin/quikstore.cgi?product=sBOX-X&detail=yes

http://www.datoptic.com/cgi-bin/web.cgi?category=RACKMOUNT_CHASSIS

http://www.computervideogear.com/sata/sata-enclosure-sr-hd-pro.htm

I've seen external SATA tower units as large as 15 drives.

 

FWIW, it's possible. I know it's expensive and may not be worth it for some, and I'm not saying it's always practical.

Link to comment
