Parity Check problem



Can someone explain to me why one would need to check parity on a scheduled basis? Checking implies that it could be wrong, and that would mean my array is NOT sufficiently protected at all! And unRAID promises me a protected array, does it not?

 

Correct me if I'm wrong, but the parity is calculated and saved in real time when you write anything to the array, is it not? Or did I totally misunderstand the principles of unRAID's 'protection', and the parity 'protection' is not real time at all but done on a scheduled basis? I sure hope not...

Link to comment

The ability to re-construct a drive in the event of a failure relies on ALL the other drives being able to work perfectly.  It is very possible that you might never discover a hardware issue with a single drive, or the inability of a failing component to service multiple concurrent accesses of all the drives unless you periodically test them. 

 

You are correct though in that parity is real time.  However, that does NOTHING to tell you whether your power supply can handle simultaneous IO on all the drives at once (and that is critical in the event of a drive re-construction).

 

It is quite possible that a monthly check will access a drive you have not spun up recently (since unRAID only spins up the drives you are using).

 

So... it is not so much checking parity, but checking your hardware.

Link to comment

Well that raises another question. Since the parity is calculated over all disks in real time or, better, at 'write time', that means all disks should be spun up if you write something to any disk, doesn't it? So a single parity calculation forces the system to do simultaneous IO over all drives at once.

 

Doesn't that invalidate the need for this hardware check?

Link to comment

Well that raises another question. Since the parity is calculated over all disks in real time or, better, at 'write time', that means all disks should be spun up if you write something to any disk, doesn't it?

No, just the parity disk and the disk you are writing.  They are both read first and then written, and in that way all the others can stay sleeping.

So a single parity calculation forces the system to do simultaneous IO over all drives at once.

No, it only does I/O to the two involved drives.

Doesn't that invalidate the need for this hardware check?

No, it does not eliminate the need, since your assumptions on how it works were flawed.

 

If you think about it at the single bit level it will make sense.  unRAID uses even parity, so for a given bit position across all the disks (parity disk included) there is an even number of bits set to a 1.

 

We will either have to change the parity bit from a 1 to a 0, or 0 to a 1, or leave it as its existing value to end up with an even number of bits set to a "1" when we write to a data disk bit position.

 

Now... Let's write bit position 1000000000 on one of the data disks to a "1".  If that bit originally was a "1" before writing to it, the parity bit does not need to change, regardless of what it was.  (the number of "1" bits across that bit position did not change.)

 

If that bit on the data drive was a "0", then the parity bit must change (from its current value of 1 to 0, or from a current value of 0 to 1) to keep the total number of "1" bits even.    We do not care about what is on the other data disks, as we are only changing parity to keep an even number of "1" bits based on the single data disk we are writing at that instant.  We know if one of the data disk bits flipped state, the parity bit must also flip.

 

To learn if the original value on the data disk was a 1 or 0, we read it first.  Then we update it based on what we are writing to the data disk.

To learn the current value of the parity disk bit, we read it, and change it if needed, but only if the data bit changed state.

 

unRAID never spins up the other data disks. It can update parity by first reading both the parity disk and the data disk, then writing them both.  It does not care what is on the other disks, as long as it knows there is an even number of "1" bits across the bit position. 
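The read-modify-write update described above can be sketched in a few lines of Python. This is only an illustration of the even-parity (XOR) idea, not unRAID's actual code; the function names `parity_of` and `write_block` and the sample bytes are made up for the example.

```python
from functools import reduce

def parity_of(blocks):
    """Even parity across all disks: bytewise XOR of every block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def write_block(old_data, old_parity, new_data):
    """Read-modify-write: needs only the target disk and the parity disk.

    A parity bit flips exactly where the data bit flips (old XOR new),
    so the other data disks never have to be read.
    """
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

# Three hypothetical data disks, two bytes each.
disks = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = parity_of(disks)

# Overwrite disk 1 using only its old contents and the old parity.
new_data = b"\xff\x00"
new_parity = write_block(disks[1], parity, new_data)
disks[1] = new_data

# The shortcut gives the same result as recomputing parity over every disk.
print(new_parity == parity_of(disks))
```

The last line compares the two-disk shortcut against a full recalculation over all disks, which is the reason the sleeping disks can stay asleep.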

 

Obviously, unRAID works on sets of bits, not individual ones, but you get the idea. (actually, it works on large sets of blocks of bits)

Repeat the same logic a few trillion times and you can re-construct an entire disk.
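As a rough sketch of that re-construction step (again just the XOR idea in Python, with made-up disk contents and a hypothetical helper `xor_blocks`), the missing disk is whatever makes each bit position come out even again:

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Made-up contents for three data disks plus their parity disk.
data_disks = [b"unRAID", b"parity", b"checks"]
parity = xor_blocks(data_disks)

# Pretend disk 1 has failed; every surviving disk AND parity must be readable.
failed = 1
survivors = [d for i, d in enumerate(data_disks) if i != failed]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == data_disks[failed])
```

Note that, unlike a normal write, this touches every remaining disk at once, which is the load a periodic check exercises.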

 

 

Link to comment

My current 'problem' is that I don't have hands-on experience with unRAID's prepping. At the time of my previous message I was in the middle of preclearing. Now I find the next step is... creating (!) the parity. I was always under the assumption that once the system was precleared, the parity would build up in a cumulative fashion as stuff was actually written to the array, but I understand now that there is an initial parity calculation as well.

 

Also, I missed the fact that, once an initial parity has been calculated, you only need the parity disk and the disk being written to in order to calculate the new parity, so the other disks don't have to spin up. That all makes sense now.

 

I can imagine you would do this parity/hardware check once in a while, for a system that is on 24/7, with most of the drives spun down. But if you regularly boot the machine or wake it from sleep etc., I guess you will find out soon enough if your machine is failing ;)

Link to comment

I can imagine you would do this parity/hardware check once in a while, for a system that is on 24/7, with most of the drives spun down.

Many of us have a scheduled task that automatically does this once a month.  (there is an installable package in unMENU for this, but you can perform it any way you like, and at any frequency you choose)

Link to comment

I can imagine you would do this parity/hardware check once in a while, for a system that is on 24/7, with most of the drives spun down. But if you regularly boot the machine or wake it from sleep etc., I guess you will find out soon enough if your machine is failing ;)

With the low power systems available now, the power draw of a system with the drives spun down is so minimal it doesn't make a whole lot of sense to constantly power it down and reboot it.

Link to comment

Plus there is a significant increase in power usage when you boot your server, so turning it off to save power is partially negated by the fact that you have to turn it on again (spinning up all drives).

Link to comment

Getting a bit OT but...  ::)

 

The impact of the server depends greatly on how you live...  ;)

 

My new server is not the best system out there but a pretty average 9-drive system perhaps?

It uses 53W with all the drives spun down. (6x140mm fans)

 

That would make 1.272 kWh/day, whereas my whole studio flat used 1.22 kWh last Saturday while I was away for the weekend.

So if I leave my server running when I'm not there (or sleeping) it doubles my base energy consumption.  >:(

 

53W 24/7 for a year = 464 kWh/year = around 25% increase for my energy bill.
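For anyone who wants to plug in their own numbers, the arithmetic is just watts times hours. A quick Python sketch, using the 53W idle figure from this post (the helper name `kwh` is only for illustration):

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365

def kwh(watts, hours):
    """Energy in kWh for a constant draw in watts over a number of hours."""
    return watts * hours / 1000

idle_watts = 53  # server with all drives spun down

per_day = kwh(idle_watts, HOURS_PER_DAY)   # about 1.27 kWh
per_year = per_day * DAYS_PER_YEAR         # about 464 kWh

print(f"{per_day:.3f} kWh/day, {per_year:.0f} kWh/year")
```

What that is worth as a share of the bill depends entirely on the rest of the household's usage, so the percentage is left out here.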

Then again... Should you live in a mansion with AC then...  8)

 

If I only use it for a few hours a day it stays shut the rest of the day.  :)

 

 

Link to comment

Now to get even more off topic... booting that same system might pull 280+W for 5 min and then settle down to 180W for 30min to an hour before starting to sleep unused drives... does that power use outweigh the time off?
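One rough way to put numbers on that question is sketched below in Python. It assumes the server would otherwise sit at the 53W idle figure quoted earlier, that it draws 0W when switched off, and that the boot numbers above hold; all of those are assumptions, and the function names are made up for the example.

```python
def boot_overhead_wh(idle_w, peak_w, peak_min, settle_w, settle_min):
    """Extra energy (Wh) a cold boot costs versus idling for the same minutes."""
    peak_extra = (peak_w - idle_w) * peak_min / 60
    settle_extra = (settle_w - idle_w) * settle_min / 60
    return peak_extra + settle_extra

def break_even_hours(idle_w, overhead_wh):
    """Hours the server must stay off before the idle energy saved covers the boot."""
    return overhead_wh / idle_w

IDLE_W = 53  # assumed idle draw with drives spun down

# 280W for 5 min, then 180W for either 30 or 60 min before drives sleep.
short_wh = boot_overhead_wh(IDLE_W, 280, 5, 180, 30)
long_wh = boot_overhead_wh(IDLE_W, 280, 5, 180, 60)

print(f"overhead: {short_wh:.0f} to {long_wh:.0f} Wh")
print(f"break-even: {break_even_hours(IDLE_W, short_wh):.1f} to "
      f"{break_even_hours(IDLE_W, long_wh):.1f} hours off")
```

Under these assumptions the boot overhead is paid back after a few hours of being off, so it is the length of the off period that decides it.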

 

One option some people have used: PWM fans and scripts to slow or stop the fans when the drives are spun down, or based on system/drive temps.

That same 53W system might drop to 30-35W.

 

 

There is no correct answer here..

Everyone has a personal preference and need. I have systems that are on 24x7 and those that are only on when I need them. Most of my systems that are 24x7 are Atoms or Sandy/Ivy Bridges configured for minimal heat and power consumption. Then, to add to that lineup, I have ESXi boxes to merge several servers and desktops into a single system and cut the number of physical machines even further.

 

 

Back on topic: checking parity is sort of like playing a recording back, so you know it was written correctly.

 

One of the best reasons for the automated parity checks is in case of bad/unreadable sectors on a drive. It is not a question of if, but when, a drive will have bad sectors or completely die. The check is an insurance policy that everything is good. You might have had driveX lose a sector or block of sectors, and you have not run a parity check in a while. Then driveZ completely dies. You go to rebuild driveZ, but driveX has those hidden issues... you have now got a problem... you can't rebuild driveZ correctly at this point...
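A toy version of that driveX/driveZ scenario in Python is sketched below. It models the damage as a silently changed byte, which is a simplification (a real unreadable sector would typically show up as a read error), and the drive contents and helper names are invented for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def parity_check(data_disks, parity):
    """Return the byte offsets where stored parity disagrees with the data."""
    expected = xor_blocks(data_disks)
    return [i for i, (e, p) in enumerate(zip(expected, parity)) if e != p]

drive_x, drive_y, drive_z = b"movies", b"photos", b"backup"
parity = xor_blocks([drive_x, drive_y, drive_z])

# driveX quietly goes bad at offset 2 while it sits spun down.
damaged_x = drive_x[:2] + b"\x00" + drive_x[3:]

# A scheduled check would flag the problem while driveZ is still healthy.
errors = parity_check([damaged_x, drive_y, drive_z], parity)
print("mismatched offsets:", errors)

# Without the check, driveZ dies and is rebuilt from a damaged driveX.
rebuilt_z = xor_blocks([damaged_x, drive_y, parity])
print("rebuild correct:", rebuilt_z == drive_z)
```

The rebuild comes out wrong at exactly the offset where driveX was damaged, which is the "hidden issue" a periodic check is meant to surface early.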

 

Think of it this way: you back up your PC, the PC dies, and you go to restore the backup, but you never tested it and you find that it does not work or is corrupt. It's not a backup until you know you can restore it.

 

 

Link to comment

Off topic a bit as well, but Johnm: do you use any sort of CRC/MD5/SHA-1 verification when copying files to/from your server? This is something else I've been thinking of doing. For now I'm using Teracopy w/ auto verify and that seems to work OK. Obviously monthly parity checks are also still required :-)
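For reference, a do-it-yourself version of that kind of copy verification could look roughly like the Python sketch below. It is not how Teracopy works internally; `file_digest` and `copy_and_verify` are hypothetical helper names, and SHA-1 is just the default picked here.

```python
import hashlib
import shutil

def file_digest(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Hash a file in chunks so large media files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_and_verify(src, dst, algorithm="sha1"):
    """Copy src to dst, then re-read both and compare their digests."""
    shutil.copyfile(src, dst)
    return file_digest(src, algorithm) == file_digest(dst, algorithm)
```

Calling `copy_and_verify("movie.mkv", "/mnt/user/Movies/movie.mkv")` (paths are placeholders) returns `True` when the digests match. One caveat: reading the copy back straight away may be served from the OS cache rather than the disk, so this mostly catches transfer errors, not bad sectors.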

Link to comment
