99.999% reliability


jumperalex

Five 9's is a VERY ambitious target, as you can easily see from these articles. I've worked in organizations where the goal was 4 9's, and we spent many millions to achieve that level of reliability. Technology has evolved quite a bit since then, so it's a bit easier to add that extra 9 -- but it's still clearly an expensive proposition.
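
To put the "extra 9" in perspective, here is a minimal sketch that converts a number of nines into the downtime allowed per year. It assumes the figure is read as availability (uptime) over a 365-day year; the function name `downtime_minutes_per_year` is made up for illustration.

```python
def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime in minutes per 365-day year for a given number of nines.

    Assumption: "N nines" means availability of 1 - 10**(-N),
    e.g. 5 nines = 99.999% uptime.
    """
    unavailability = 10.0 ** (-nines)
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    return unavailability * minutes_per_year


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Each additional nine cuts the yearly downtime budget by a factor of ten, which goes a long way toward explaining why the last nine is the expensive one.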

 

For personal use, I'd settle for dual parity in UnRAID  :)

 

Link to comment

Even more interesting is the paper. http://arxiv.org/ftp/arxiv/papers/1501/1501.00513.pdf

 

If you have a lot of data, even unlimited spares will not protect you.

 

Still think single parity is good?

 

yeah that's what I really meant. I didn't want to direct link to a .pdf

 

Well if you have a lot of data it means multiple independent arrays, not just a bigger array.

 

As for single parity [shrug] I mean I'm not sure we need 99.999% uptime with zero human intervention. Most of us have the luxury of taking down our arrays (ignoring lack of hot-swap mandating a shutdown) to deal with failures. In fact I'd say we all do because we aren't losing $millions a minute.

Link to comment

While you can take an outage, for me it is more about data loss events. Which the paper shows are going to happen with even RAID6 and unlimited spares. See Table II.

 

Data loss is always a risk, even at 99.999% :)

 

But their main point / goal is what it would take to get that AND do it with no intervention. If you are willing to accept human intervention, reliability and data integrity are much easier (structurally) to obtain, especially when you are dealing with smaller arrays. And a willingness to take the array down means your spare failure rate can be assumed to be lower, which will make quite a difference.

 

Mind you I'm in no way intending to argue against the need for dual parity. The article makes that clear enough ... as if we didn't already know it anyway :o
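
For a rough feel of why a second parity drive matters, here is a back-of-the-envelope sketch. It is not the paper's simulation. It assumes independent drives with a constant failure rate derived from an annual failure rate (AFR), and it ignores unrecoverable read errors and correlated failures. The AFR, rebuild window, and drive count in the usage lines are made-up illustrative numbers, and the function names are hypothetical.

```python
import math


def p_fail_within(afr: float, hours: float) -> float:
    """Probability that one drive fails within `hours`.

    Assumption: exponential lifetimes with a constant rate chosen so that
    the probability of failing within one 365-day year equals `afr`.
    """
    rate_per_hour = -math.log(1.0 - afr) / (365 * 24)
    return 1.0 - math.exp(-rate_per_hour * hours)


def p_at_least_k_failures(drives: int, k: int, afr: float, hours: float) -> float:
    """Probability that at least `k` of `drives` independent drives fail
    within `hours` (binomial model)."""
    p = p_fail_within(afr, hours)
    p_fewer = sum(
        math.comb(drives, i) * p**i * (1.0 - p) ** (drives - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    surviving = 8         # drives left after the first failure (illustrative)
    afr = 0.03            # 3% annual failure rate (illustrative)
    rebuild_hours = 24.0  # time to replace and rebuild (illustrative)

    # Single parity loses data if 1 more drive fails during the rebuild;
    # dual parity only if 2 more fail in the same window.
    single = p_at_least_k_failures(surviving, 1, afr, rebuild_hours)
    dual = p_at_least_k_failures(surviving, 2, afr, rebuild_hours)
    print(f"single parity, loss during rebuild: {single:.2e}")
    print(f"dual parity, loss during rebuild:   {dual:.2e}")
```

In this simplified model a shorter rebuild window, which is what prompt human intervention buys you, lowers both probabilities. The gap between the single and dual parity lines is the cushion the second parity drive provides.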

Link to comment

Are we reading the same material? Here is the first paragraph:

 

"Abstract: As the prices of magnetic storage continue to decrease, the cost of replacing failed disks becomes increasingly dominated by the cost of the service call itself. We propose to eliminate these calls by building disk arrays that contain enough spare disks to operate without any human intervention during their whole lifetime. To evaluate the feasibility of this approach, we have simulated the behavior of two-dimensional disk arrays with n parity disks and n(n–1)/2 data disks under realistic failure and repair assumptions. Our conclusion is that having n(n+1)/2 spare disks is more than enough to achieve a 99.999 percent probability of not losing data over four years. We observe that the same objectives cannot be reached with RAID level 6 organizations and would require RAID stripes that could tolerate triple disk failures."
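
To make the abstract's formulas concrete, here is a small sketch that tabulates the disk counts for a given n. The counts come straight from the abstract (n parity disks, n(n-1)/2 data disks, n(n+1)/2 spares). The function name `array_layout` and the derived "usable fraction" column are my own additions for illustration.

```python
def array_layout(n: int) -> dict:
    """Disk counts for the two-dimensional array described in the abstract.

    n parity disks, n(n-1)/2 data disks, n(n+1)/2 spare disks.
    `usable_fraction` (data disks / all disks) is an illustrative extra,
    not a figure from the paper.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to have any data disks")
    parity = n
    data = n * (n - 1) // 2    # equals the number of distinct pairs of n items
    spares = n * (n + 1) // 2
    total = parity + data + spares
    return {
        "parity": parity,
        "data": data,
        "spares": spares,
        "total": total,
        "usable_fraction": data / total,
    }


if __name__ == "__main__":
    for n in (4, 6, 8, 10):
        layout = array_layout(n)
        print(
            f"n={n:2d}: {layout['data']:2d} data, {layout['parity']:2d} parity, "
            f"{layout['spares']:2d} spares, {layout['total']:3d} total, "
            f"{layout['usable_fraction']:.0%} usable"
        )
```

Note that with n(n+1)/2 spares the spare pool is as large as the data and parity disks combined, so well under half of the purchased disks hold data at any time.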

 

Human intervention only makes things worse. My days are filled with the effects of human intervention on storage systems. The data in Table II show that no number of spares will reach the goal, so spare failure rate has nothing to do with it.

Link to comment
  • 2 weeks later...

Any one person, myself included, can anecdotally observe amazingly high reliability. But as a matter of practice, designing a system that is statistically likely to achieve 99.999% reliability is very difficult, as the study shows and as practical experience confirms, whether from reading this forum or from knowing "the biz".

 

UnRaid is nowhere close to achieving such a thing, to the point that it would be legally and morally suspect for them to claim it. But that is OK, because they aren't claiming it and we aren't using it in the hope that it does it anyway. If you NEED that level of reliability then you will need to pay for it, and it will be worth it, because at that point downtime is measured in $Millions lost per unit time, if not more, and you're not taking the risk on a system with a single, non-hot-swappable parity drive.

Link to comment

High reliability systems are more focused on uptime than specifically on data loss. No matter how many failures a system can tolerate, there still needs to be a solid backup strategy to avoid loss of data. Clearly data integrity is also important ... but data can be restored from backups, whereas you can't transact business if the system is down [whether the business is banking, stock market trading, managing a critical medical procedure, or managing a key strategic asset].

 

 

Link to comment

Systems with 99.999% reliability against data loss and zero maintenance have existed for a long time.

Books and cave paintings. Songs and anecdotes have been shown to suffer some data loss, but distribution was fast and cheap :)

 

Anyway, as far as I understand, Unraid does not protect against data loss; it protects against file loss.

/René

 

Link to comment
