How does dual parity work, actually?

shEiD · July 14, 2017

I am still kinda confused, how does dual parity work. Does it really mean, that having unRAID use dual parity I could recover from ANY 2 drives failing?

Like they say, a picture is worth a thousand words...

Edited July 14, 2017 by shEiD

JorgeB · July 14, 2017

Does it really mean, that having unRAID use dual parity I could recover from ANY 2 drives failing?

Yes

shEiD · July 14, 2017

@johnnie.black thanks for the answer.

So what black magic is stored on those 2 parity drives? Any links where I could read up on that?

I mean how does only 2 drives store the info on whats in 28 drives?

Oh, and btw, the latest version drive limits are still the same as in the 6.2 Release Notes, that is 2 parity and 28 data drives?

JorgeB · July 14, 2017

There's no magic:

https://wiki.lime-technology.com/Parity#How_parity_works

shEiD · July 14, 2017

@johnnie.black Thanks

itimpi · July 14, 2017

59 minutes ago, shEiD said:

@johnnie.black thanks for the answer.

So what black magic is stored on those 2 parity drives? Any links where I could read up on that?

I mean how does only 2 drives store the info on whats in 28 drives?

Oh, and btw, the latest version drive limits are still the same as in the 6.2 Release Notes, that is 2 parity and 28 data drives?

Your comment about the data drives seems to be a standard misconception about parity. There is no actual data stored on a parity drive. Instead there is the information that can be combined with ALL the other drives to reconstitute the sector contents of a failed/missing drive. It is the fact that ALL the non-failed drives are involved in recovering a failed drives is what limits the number of failed drives that can be recovered to be equal to the number of parity drives. The link given explains some of the mathematics around how this is done and also that the mathematics used for the second parity drive is different to that used for the first one. This also explains why monitoring the SMART Pending Sectors attribute is important as it implies a sector cannot be read reliably even though the drive appears to otherwise be OK. This would, however, affect the ability to reconstitute the sector accurately on ANOTHER drive if it it has failed.

SSD · July 14, 2017

@shEiD @johnnie.black @itimpi

Oh how we love to be comforted!

While it is true that the mathematics show you are protected from two failures, drives don't study mathematics. And they don't die like light bulbs. In the throes of death they can do nasty things, and those nasty things can pollute parity. And if it pollutes one parity, it pollutes both parties. So even saying single parity protects against one failure is not always so, but let's say it protects against 98% of them. Now the chances of a second failure are astronomically smaller than a single failure. And it does not protect in the 2% that even a single failure isn't protected, and that 2% may dwarf the percentage of failures dual parity is going to rescue. I did an analysis a while back - the chances of dual parity being needed in a 20 disk array is about the same as the risk of a house fire. And that was with some very pessimistic failure rate estimates.

Now RAID5 is different. First, RAID5 is much faster to kick a drive that does not respond in a tight time tolerance than unRaid (which only kicks a disk in a write failure). And second, if RAID5 kicks a second drive, ALL THE DATA in the entire array is lost. With no recovery possible expect backups. And it takes the array offline - a major issue for commercial enterprises that depend on these arrays to support their businesses. With unRaid the exposure is less, only affecting the two disks that "failed", and still leaving open other disk recovery methods that are very effective in practice. And typically our media servers going down is not a huge economic event.

Bottom line - you need backups. Dual parity is not a substitute. Don't be sucked into the myth that you are fully protected from any two disk failures. Or that you can use the arguments for RAID6 over RAID5 to decide if dual parity is warranted in your array. A single disk backup of the size of a dual parity disk might provide far more value than using it for dual parity! And dual parity only starts to make sense with arrays containing disk counts in the high teens or twenties.

(@ssdindex)

shEiD · July 14, 2017

@itimpi Methingks, maybe I just phrased it wrong. I do understand, that parity does not store any regular data, but recovery stuff. And I understand about the math of how many drives can be restored. I was only interested in HOW dual parity works. With single parity and XOR is very simple. With dual - that's what I was interested in. I will need to read up on Reed-Solomon, but the math will probably be above my head.

@bjp2006 I will be running an array with 28 data drives. I wish I could do more Or better still, I hope unraid will implement multiple pools and hence no drive limit as a result. I could easily connect 70 drives to my server with my current setup. And I can't afford to build another server atm, so I'll be stuck with 28 drives, I guess, unless I could run multiple unraids on ESXi...

As for your comments on dual parity. With28 drives in a pool, I would feel uneasy with single parity. Or am I wrong?

As for recovery - how does unraid react, if additional drive (above the tolerance) goes bad while recovery is in progress? I mean, if another drive just up and dies right there - I understand that recovery at this point is mute. But how about, if during recovery some additional drive gives a single bad/unreadable sector? Does unraid just up and gives up, or recovers everything, except that sector? I mean, if it would complete the recovery, there would be a single "bad / partially-recovered" on the recovered drive(s)? I'm just thinking logically, with basically no experience in unraid, so excuse my ignorance, if I'm talking rubbish.

Squid · July 14, 2017

9 minutes ago, shEiD said:

or recovers everything, except that sector?

This. So you may wind up with a corrupted file or two. Contrasted with RAID5/6 where at that point, you will have lost all your files.

Furthermore, since unRaid doesn't stripe the data, if you happen to have say 3 drives catch fire simultaneously, then you will only have lost the files stored on those drives. And if of those 3 drives catching fire at the same time, if 2 of them were parity drives then you would only lose the files stored on the single data drive.

Edited July 14, 2017 by Squid

shEiD · July 14, 2017

@Squid Awesome! And that's why I never did hardware raid, or why I'm reluctant to use Freenas...

All I need for that is Dynamix File Integrity plugin or is tehre something better?

Squid · July 14, 2017

4 minutes ago, shEiD said:

@Squid Awesome! And that's why I never did hardware raid, or why I'm reluctant to use Freenas...

All I need for that is Dynamix File Integrity plugin or is tehre something better?

FIP will let you know about any corruption, but won't fix it. Nothing beats backups of the irreplaceable data. Side Note: My secondary server is still using 10 year old drives, all files have checksums on it, files are checked on a rotating basis every week, and have never had a case of bitrot turn up. YMMV

Edited July 14, 2017 by Squid

JorgeB · July 14, 2017

46 minutes ago, shEiD said:

With dual - that's what I was interested in. I will need to read up on Reed-Solomon, but the math will probably be above my head.

If you want to read:

https://www.kernel.org/pub/linux/kernel/people/hpa/raid6.pdf

Frank1940 · July 14, 2017

33 minutes ago, Squid said:

.... have never had a case of bitrot turn up ...

If you ever find a case of bit rot, PLEASE post up all about it. Personally, I think it is about as likely to occur as an asteroid the size of Rhode Island hitting your computer. Modern hard drives expect that not all of the data can be read every time the heads pass over it. They have multiple levels of error detection and correction built-in to accommodate this expectation. If the data can't be recovered through these procedures, they are suppose to respond with a error at that point. (And the read operation becomes much slower as the more data is read and more calculations are made in the attempt to reconstruct the data.) The only way a case of bit rot should occur would be if that were some combination of errors were not detected properly and the correction routines returned incorrect data as the proper data. ....

Edited July 14, 2017 by Frank1940

SSD · July 14, 2017

4 minutes ago, Frank1940 said:

If you ever find a case of bit rot, PLEASE post up all about it. Personally, I think it is about as likely to occur as an asteroid the size of Rhode Island hitting your computer.

I guess it depends on the definition of "bitrot".

If bitrot means that data is returned by a drive that is different than the data that was actually written (due to decay of the signal on the media, not due to something like bad cache in the drive), I agree with you.

But if bitrot is related to the gradual degradation in magnetic signal on a disk making that data harder and harder to read over time, I DO think that indeed does happen - and is, in fact, is happening on a continuous basis. And real "bitrot" is when the degradation reaches a point where the data cannot be reliably read even with the error corrections built into the drive, causing the drive to have a read error.

The question then becomes, at what point has the signal degraded to the point that it becomes unreadable? Is that a year, 5 years, 25 years, 100 years? 1000 years? Is it consistent across the media, or are some spots weaker in magnetic material. Or are some heads less sensitive imprecise in head placement to make a less than perfect signal on the disk harder and harder to read and correct.

When we rebuild a disk, we are in fact re-writing every sector. This, in itself, would help us protect against bitrot. In the old days, Spinrite had a mode where it would do exactly that - rewrite every sector on the disk with exactly the same data.

But I agree we have not seen anything we can ascribe the bitrot in the unRAID forums. So have to believe the timeline for the magnetic decay to be pretty lengthy, at least 7-8 years. But if one were worried about it, they could rebuild each disk onto itself or around its 3rd or 4th birthday, and feel better that the signal is refreshed and bitrot pushed out in time. Who knows, drives may start working harder and harder in the presence of magnetic degradation, prior to true bitrot, triggering retry cycles we are not even aware of, and adding wear and tear to the drive.

(Of course drives that are continuously updated with new files would be less susceptible to bitrot, although perhaps become susceptible to bitburnout, a very different phenomenon! )

SSD · July 14, 2017

1 hour ago, shEiD said:

@bjp2006 I will be running an array with 28 data drives. I wish I could do more Or better still, I hope unraid will implement multiple pools and hence no drive limit as a result. I could easily connect 70 drives to my server with my current setup. And I can't afford to build another server atm, so I'll be stuck with 28 drives, I guess, unless I could run multiple unraids on ESXi...

As for your comments on dual parity. With28 drives in a pool, I would feel uneasy with single parity. Or am I wrong?

I am bjp999, not bjp2006!

With 28 drives I think dual parity is worthwhile, although i would stop short of saying absolutely necessary. Being here for a long time I've seen a lot of issues, and very very very few where dual parity would have helped. The vast majority of issues are related to bad or loose cabling, and worse yet, users that don't recognize that is what is going on and make matters worse. If you have drive cages, solid wiring, monitor smart reports, and periodically replace drives as they get old, I could argue that dual parity is a nice to have and not a necessity. And lacking those things, I'd recommend those things before putting dual parity in place.

And I'd also say 28 drives is a lot. If you have strung together a fleet of aging small drives, and assembled into a large array, I don't recommend it. But if they are relatively large, modern drives it should be fine.

1 hour ago, shEiD said:

As for recovery - how does unraid react, if additional drive (above the tolerance) goes bad while recovery is in progress? I mean, if another drive just up and dies right there - I understand that recovery at this point is mute. But how about, if during recovery some additional drive gives a single bad/unreadable sector? Does unraid just up and gives up, or recovers everything, except that sector? I mean, if it would complete the recovery, there would be a single "bad / partially-recovered" on the recovered drive(s)? I'm just thinking logically, with basically no experience in unraid, so excuse my ignorance, if I'm talking rubbish.

unRAID does not give up, from what I understand (never had it happen myself). It would continue to rebuild on a best effort basis. Unfortunately, even if you knew that it had happened due to a log entry, you'd have not way to know what file was impacted unless you kept checksums.

I'd say this is very different than a RAID rebuild, one error and it stops and invalidates the array. Maybe good for an enterprise where the requirement for absolute data integrity is paramount. But not for a media array, where you'd rather have a randomly corrupted JPG than be pushed into doing a full array restore.

JorgeB · July 14, 2017

As my sig indicates I use dual parity on all my servers that have 12 or more data disks, although 2 disks failing at the same time is rare, it happened to me on 2 occasions during a rebuild, not a complete disk failure, but a few read errors on a second disk during the rebuild making some of the rebuild files corrupt, at the time dual parity was not yet available.

Also feel that checksums are invaluable, not because of bit rot per se, but for when a problem like above happens, or a disk redballing during a write operation, without them there's no way of knowing if or which files are corrupt.

shEiD · July 14, 2017

I understand that Dynamix File Integrity plugin does not fix the data. I meant, that I will need it to know which files have been "badly" recovered, in that case, if some drives above the threshold give read errors. So I could know which files I need to replace, which is easy, as most of them are media.

As for everyone everywhere constantly repeating about having backups... With all due respect... I bought unraid for the simplest reason, that it wastes the least amount of drives for a reasonable protection, and if shit hits the fan - I'm still left with most of my data NOT gone, as in raid or freenas. And most importantly - if I had the money to waste on the drives just for having a backup of 100TB+ media files, I would not be here I would be sitting on my yacht and my personal IT guy would take care of all this.

@bjp999 what are the chances of there being two of you?

Edit:

Oh, I forgot to mention - all of my data drives are/will be WD Reds:

5x WD Red 10TB
15x WD Red 6TB
8x WD Red 3TB
For parity I was thinking, maybe WD Gold 10TB? or Reds would be ok?

On the very first SMART warning, I will be replacing the drive with a WD Red 10TB, or maybe there will be something larger by then. If a 3TB drive fails, I still have 7 of those "left over", so maybe will use those, but any 6TB or 10TB gets replaced with 10TBs.

Edited July 14, 2017 by shEiD

Frank1940 · July 14, 2017

32 minutes ago, bjp999 said:

But if bitrot is related to the gradual degradation in magnetic signal on a disk making that data harder and harder to read over time, I DO think that indeed does happen - and is, in fact, is happening on a continuous basis. And real "bitrot" is when the degradation reaches a point where the data cannot be reliably read even with the error corrections built into the drive, causing the drive to have a read error.

I agree with you here. The point is that the drive should not be giving out bad data (or 'best guess' of what it thinks the data is), it should just refuse to deliver any data that is suspect and give a drive failure error instead. So the issue is 'seen' by the user as a drive failure and, depending on the OS, the message could be as simple as a CRC error.

32 minutes ago, bjp999 said:

The question then becomes, at what point has the signal degraded to the point that it becomes unreadable? Is that a year, 5 years, 25 years, 100 years? 1000 years? Is it consistent across the media, or are some spots weaker in magnetic material. Or are some heads less sensitive imprecise in head placement to make a less than perfect signal on the disk harder and harder to read and correct.

That is the $64,000 question! I have a friend who says that he has 30 year-old home recorded VHS tapes that are as good as the day they were made. And there are other folks who lost their home-made videos of their kids in about ten years. All magnetic recorded media will eventually fail... All of the CD/DVD/BluRay ROM will fail... All of the CD/DVD/BluRay WR media will fail even sooner... I have seen statements that the late 20th century and most of the twenty-first century will be come known to future historians as the century without a written archive because of this problem.

Frank1940 · July 14, 2017

14 minutes ago, shEiD said:

... As for everyone everywhere constantly repeating about having backups... With all due respect... I bought unraid for the simplest reason, that it wastes the least amount of drives for a reasonable protection, and if shit hits the fan - I'm still left with most of my data NOT gone, as in raid or freenas. And most importantly - if I had the money to waste on the drives just for having a backup of 100TB+ media files, I would not be here...

You might want to read this:

As a further comment on Backups, I would point out that most thoughtful people only backup (offsite) those files which would be impossible to replace by any method. That would be personal financial records, personal photographs, and other items such as this. Most media files are available from other sources and while obtaining them be time-consuming, it is doable.

I would rather doubt that you have a 100TB of personal media files that you have taken unless you could afford that yacht. BUT I would bet you probably have at least 300GB of personal files that you could never replace in the case of some catastrophic event that involved the building where your server is. It is those files that you should be most concerned about developing a backup strategy to safeguard against loss that you are comfort with.

shEiD · July 14, 2017

@Frank1940 Yep, exactly - a couple of TBs of data needs to be backed up, but that's it. All the other stuff would take a very long time to get back, but it's doable.

And then there's grey area, for example - about 5 years ago I lost a 2TB drive filled with TV Shows. That was a "special" in a manner that all the shows were "short-lived", as in cancelled after 1 season or even after a couple of episodes. To this day, more than half of the stuff from that drive - nowhere to be found. Some of it I found, but in a way worse quality, that I had before. So that one hurt

SSD · July 14, 2017

If instead of dual parity, if you loaded up a disk of exactly that size with your most important data, and put that in a safety deposit box or even at a friend or family member's home, you'd have made more valuable use of that space than dual parity IMO.

Of course if you have that or equivalent already, then dual parity is not a bad idea for a 28 disk array!

tdallen · July 14, 2017

With a 28 drive array, I would guess that there is a high likelihood that you would wind up having multiple disks from the same manufacturer, of the same size, purchased around the same time, from the same manufacturing lots. It's undesirable but seems inevitable in an array that large - and the corresponding slightly higher risk of multiple simultaneous failure is all I'd need to rationalize dual parity.

It's probably been mentioned but is worth repeating - neither parity nor dual parity is going to save your bacon when a drive fails. Parity (single or dual) plus the fact that the rest of your array is healthy and capable of contributing to a rebuild is what saves your bacon. Dual parity buys you a little extra safety buffer, but you still need to stay vigilant about drive health.

SSD · July 14, 2017

1 hour ago, shEiD said:

Oh, I forgot to mention - all of my data drives are/will be WD Reds:

5x WD Red 10TB

15x WD Red 6TB

8x WD Red 3TB

For parity I was thinking, maybe WD Gold 10TB? or Reds would be ok?

On the very first SMART warning, I will be replacing the drive with a WD Red 10TB, or maybe there will be something larger by then. If a 3TB drive fails, I still have 7 of those "left over", so maybe will use those, but any 6TB or 10TB gets replaced with 10TBs.

10TB drives are pricey. Maybe you already have - but might suggest looking at the 8T Seagate Archives. Shucking externals would get you to $180 each. Especially when you start thinking about the cost of single vs dual parity, it matters. Perhaps that could save enough money to buy 8x 8TB drives instead of 5 10T, and keep those 8 3T drives for backups. That would bring your drive count down from 28 to 23, allow you to backup 24T of data, for about the same price.

HGST 8T are another option. HGST is the reliability king. A lot of people here love the Reds, but my experience has not been that stellar.

shEiD · July 14, 2017

@bjp999 All the drives that in that list I already have, and then some. I thought about getting WD Red 8TB drives before my last purchase, but got those 10TB WD Reds. The problem lies with unRAID having a 28 drive limit in the array. Without buying some 10TB drives, I would not have been able to even fit all my data in one unRAID to begin with, as my server now has 36 drives connected. Not to mention that I have a bunch of offline drives full of stuff, that just sit in a drawer. Would love to get that data online too, but will see how it goes.

I have 40+ WD Reds, the oldest of them being 3TB drives maybe 4 years old? Only one WD Red "died" on me. But then a couple of months later I connected it to windows machine - and it's OK. Go figure.

On the other hand, I have only 2 HGST 4TB drives, about the same age (~4 years). One just gave me reallocation warnings, like 3 days ago Moved all data out and will remove from server on next reboot.

As for the Seagate Archive drives, are you talking about shingled ones? Nah... Maybe as backup, but I have plenty for backup as it is. I mean for backing up just the important bits. No point in backing up all the media.

At the end of the day, after having so many drives die on me over the years, I just feel kinda good about WD Reds, at least for now. /me knocks on wood. I'm sure you know what I mean. By now I am willing to pay a little extra for a drive, if I am reasonably sure it will be more reliable. Of course nowdays - any drive is lottery.

Also, less (bigger) drives - less points of failure. Right?

@tdallen Yes and yes. Many "batches" of drives bought at the same time. I had 2 drives fail in an ~24-36 hour period twice over the last 5 years. That's why dual parity was the thing that brought me back to trying unraid. I don;t even remember, when I bought a Pro license. I know I tried unraid for short periods of time like 2-3 times before. But never switched to it. But dual parity, plus Docker, plus VM with hardware passthrough - now we are talking completely different unraid

Although, the migration is gonna be f***in nightmare. 100TBs+ of data in a StableBit DrivePool. That means - all the files scattered all over the drives. The main reason, why I am not on unraid already. I'm getting depressed even thinking about it

Edited July 14, 2017 by shEiD

Frank1940 · July 14, 2017

31 minutes ago, shEiD said:

Also, less (bigger) drives - less points of failure. Right?

Well, from what I have seen from the BackBlaze data that seems to be the case. The larger capacity drives seem to have approximately the same failure rates as smaller capacity drives. But the spread of the actual failure percentages (between both manufacturers and individual drive models) might be concealing some real difference. (Remember, we are dealing with very low failure rate rates in most cases anyway.) However, if there is any significant statistical difference, it would be less than a couple of percent at the absolute maximum.

There is another factor to be considered. BackBlaze appears to retire their servers on a regular basis as they fill-up with data. Apparently, they scrap all of the drives in the server at this point. While most of us don't do this, many folks have thought about the problem of small drives, limited case space for additional drives and the relentless increase for additional storage space. One approach is to have a large capacity parity drive(s) and to replace the older smaller data hard drives when the requirement for additional storage becomes necessary. As a part of this plan, they also replaced any failed drive with a drive that is at least as large as the parity drive. This way, they avoid having an array filled with drives nearing the end of the 'bathtub' curve and minimize the number of drives in the array at the same time.

Edited July 14, 2017 by Frank1940

How does dual parity work, actually?

Recommended Posts

Link to comment

Top Posters In This Topic

Popular Days

Top Posters In This Topic

Popular Days

Popular Posts

SSD

shEiD

itimpi

Posted Images

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Join the conversation