
For fun: how can parity be improved


NAS


Been a while since we had a for fun technical thread.

 

Thoughts on how parity can be made better, anything goes.

 

Some old ideas to start off: multiple parity disks, multiple parity sets each containing a subset of disks, and disks kept outside of parity altogether.

 

A new one perhaps... why do we need to spin all disks simultaneously to build parity? Running all disks for a day is the same as fully loading your server for a day, and fully loading a server should be avoided as it increases the probability of something breaking. Since most users have PCI-bus-based disks, spinning all disks at once causes silly slow speeds, not, as you would intuitively think, a faster set of disks. Why not add a single data disk to the parity disk at a time? It trades more parity disk writes for a cooler, quieter, unstressed server. There are things to take into account, like writing to an already processed data disk, but that's not insurmountable. There is a small chance the parity creation process might even be faster, as you remove PCI bus contention.
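
To make the one-disk-at-a-time idea concrete, here is a rough sketch in plain Python (this is not unRAID or md driver code, and the device paths are made up). Parity starts as a zeroed image at least as large as any data disk, and each data disk is XOR-folded into it in turn, so only two drives need to spin at any moment:

```python
import os

BLOCK = 1024 * 1024  # 1 MiB chunks; large blocks keep both drives streaming

def fold_disk_into_parity(parity_path, data_path):
    """XOR one data disk into a pre-zeroed parity image, block by block.

    Assumes the parity image is at least as large as the data disk.
    """
    with open(parity_path, "r+b") as parity, open(data_path, "rb") as data:
        while True:
            dblock = data.read(BLOCK)
            if not dblock:
                break
            pblock = parity.read(len(dblock))
            merged = bytes(p ^ d for p, d in zip(pblock, dblock))
            parity.seek(-len(pblock), os.SEEK_CUR)
            parity.write(merged)

# Build parity by visiting the disks one after another instead of all at once.
# for disk in ["/dev/md1", "/dev/md2", "/dev/md3"]:   # hypothetical devices
#     fold_disk_into_parity("/dev/parity", disk)      # hypothetical parity device
```

The trade-off NAS describes is visible in the sketch: the parity device is read and rewritten once per data disk instead of once in total.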

 

Next idea: built-in parity benchmarking of disks. Picture a page on the GUI that tells you which would be the best disk to use for parity. So many factors come into play for real-world disk performance that we could make picking the parity disk easier for the end user.
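
Something like `hdparm -t` already measures raw sequential read speed per disk, so a GUI page could largely just collect and compare those numbers. A rough sketch of the measurement itself, with made-up device names (note it only samples the fast outer tracks, so it is indicative rather than definitive):

```python
import time

SAMPLE = 256 * 1024 * 1024   # read 256 MiB from the start of each disk
BLOCK = 1024 * 1024

def sequential_read_speed(device):
    """Return an approximate sequential read speed in MB/s for one device."""
    read_so_far = 0
    start = time.time()
    with open(device, "rb") as disk:
        while read_so_far < SAMPLE:
            chunk = disk.read(BLOCK)
            if not chunk:
                break
            read_so_far += len(chunk)
    elapsed = time.time() - start
    return (read_so_far / (1024 * 1024)) / elapsed

# for dev in ["/dev/sdb", "/dev/sdc", "/dev/sdd"]:   # hypothetical devices
#     print(dev, round(sequential_read_speed(dev)), "MB/s")
```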

 

Keep the ideas coming :)

Link to comment
Since most users have PCI-bus-based disks, spinning all disks at once causes silly slow speeds, not, as you would intuitively think, a faster set of disks. Why not add a single data disk to the parity disk at a time? It trades more parity disk writes for a cooler, quieter, unstressed server.

 

I like this idea... however, the "unstressed" part is not entirely true.  This would put much greater stress on the Parity drive (read: greatly increase the chances of parity drive failure).  Not a big deal, as you would simply replace the parity drive and rebuild it, but still something to note/think about.

 

Link to comment

Since most users have PCI-bus-based disks, spinning all disks at once causes silly slow speeds, not, as you would intuitively think, a faster set of disks. Why not add a single data disk to the parity disk at a time? It trades more parity disk writes for a cooler, quieter, unstressed server.

 

I like this idea... however, the "unstressed" part is not entirely true.  This would put much greater stress on the Parity drive (read: greatly increase the chances of parity drive failure).  Not a big deal, as you would simply replace the parity drive and rebuild it, but still something to note/think about.

 

 

Agreed, the parity disk would take writes for every single disk added if you added them one volume at a time.

 

Perhaps what would be nice is a parity generate mode for all disks, and a parity generate mode for a single disk (and a remove option for a single disk).

 

While entirely possible, has anyone delved into the complexity of the driver to do this?

 

 

 

As far as speeding up parity goes, it seems hardware SIL chipset RAID0 helps, but not greatly.

It would be interesting to see if a hardware caching controller would do better.

I've considered this option. If unRAID were to support other disks rather than just SATA, this would be an interesting test.

 

I know the argument: a caching controller has risks. Yet some have batteries, and if you have a UPS the risks get smaller and smaller. Combined with the ability to do a parity check/refresh on demand, I think the risk is worth it if I can get a big boost in speed.

 

 

When all is said and done, would it be better to have RAID1 on the cache drive, or to spend time on advanced control of the parity drive (other than the RAID6-style second parity drive option)?

 

 

Link to comment

You are obviously correct about the parity disk. However, depending on how I look at it, the parity disk is either the disk I care about the least (as it only holds parity data that can be recreated) or the most (as it holds parity data I may absolutely need).

 

Giving it some more thought, what I really don't like is the idea of 20 disks spinning to create parity for, say, 30 hours non-stop. No real-world unRAID usage even comes close to this... so what we have is unRAID being tasked really heavily just for parity and proportionately very lightly for actual usage.

 

Perhaps if disks could be done in bundles rather than an all or nothing approach this would be a good compromise.

 

Also, when adding a drive I would be happy to live with the expense of the array being unavailable for a couple of hours, meaning just updating the parity for this one new disk (and not reading every single disk again). This logically leads to wanting to be able to remove just one disk as well.

 

Then one step further... "please replace this disk with this one". Given the cap on the number of disks and the ever-decreasing cost of disks, this one is arguably the most useful (and could possibly be done without even touching the parity calculations).

 

In theory, with a bit of thought, you could make it so that you only need to create a full parity set once.

 

My current test system is based on the older PCI-based official board and Supermicro cards. I estimate creating a full parity set, once I hit 20 x 1TB disks on this server, at close to 32 hours. That is getting towards being impractical/risky in itself.

Link to comment

Processing one drive at a time, you would be without parity protection while this process took place, until the last data disk was processed.

 

What you would have to have is a second "temporary" parity disk that you would build parity on. Process one data disk at a time, XORing the data on each pass. After all are processed, you could swap that disk in as parity.

 

The problem is handling writes to disks that have already been processed.... more overhead and more work.
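
For what it's worth, the fix-up for a write that lands on an already-processed disk is just another XOR; the messy part is the bookkeeping and locking, not the math. A conceptual sketch in Python (nothing to do with the actual driver):

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))

def patch_temp_parity(temp_parity_block, old_data_block, new_data_block):
    """Fix up the temporary parity after a write hits an already-processed disk.

    P_new = P_old XOR old_data XOR new_data
    (XOR the stale data out, XOR the replacement data in.)
    """
    return xor_bytes(temp_parity_block, xor_bytes(old_data_block, new_data_block))
```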

Link to comment

My current test system is based on the older PCI-based official board and Supermicro cards. I estimate creating a full parity set, once I hit 20 x 1TB disks on this server, at close to 32 hours. That is getting towards being impractical/risky in itself.

 

I suppose you could change your motherboard, get off the PCI bus, and use a Seagate 1.5TB for parity to increase your speed too.

 

I think if my parity build took that long I would split it into multiple servers.

 

In any case, I think creating parity by reading all disks (at least once) is essential for testing your system's health.

If you can not operate like that safely, then you can never handle a disk failure or a rebuild.

In this case, single-disk incremental inserts add code complexity and increase the chance that someone will never test a real-world failure condition until it's too late.

 

Don't get me wrong here. I want to improve my write speed and parity generation.

I'm wondering if advanced controller support would be easier to add than incremental parity insert.

This would be done at the kernel level with a driver, and in emhttp at the application level.

 

 

 

Link to comment

You are obviously correct about the parity disk. However, depending on how I look at it, the parity disk is either the disk I care about the least (as it only holds parity data that can be recreated) or the most (as it holds parity data I may absolutely need).

Actually, if you have a data disk failure, your parity disk is no more important, and no less important, than every single one of your remaining disks in the array. Every single one of the data disks, in combination with the parity disk, is needed to restore the failed disk.

 

The parity drive by itself is useless without ALL the data drives used to create it (unless there was only one data drive, in which case the parity drive is an exact copy of the data drive, since unRAID uses even parity; in that case you can just consider it a clone and use it as needed).

Giving it some more thought, what I really don't like is the idea of 20 disks spinning to create parity for, say, 30 hours non-stop. No real-world unRAID usage even comes close to this... so what we have is unRAID being tasked really heavily just for parity and proportionately very lightly for actual usage.

 

Perhaps if disks could be done in bundles rather than an all or nothing approach this would be a good compromise.

The tradeoff would be that parity would not be valid for any of the drives until the entire set of them was calculated. It is true, though, that when adding disks you could just read the existing parity, all the new disks, and then write the new parity. If you assume the old parity is correct, there is no need to read all the other data drives. Also, you would never discover the weird errors that might occur if your hardware was not up to the task of accessing all the disks at the same time, which you must do to either run in degraded mode with a failed drive (very important) or rebuild a failed drive (equally important).
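
A per-stripe sketch of the "trust the old parity" shortcut described here, in plain Python (purely illustrative): only the existing parity block and the newly added disks' blocks need to be read.

```python
def parity_after_adding(old_parity_block, new_disk_blocks):
    """New parity for one stripe when disks are added to an existing, trusted parity.

    P_new = P_old XOR d_new1 XOR d_new2 XOR ...
    The existing data disks never need to be touched.
    """
    block = old_parity_block
    for new_block in new_disk_blocks:
        block = bytes(p ^ d for p, d in zip(block, new_block))
    return block
```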

Also when adding a drive I would be happy to live with the expense of the array being unavailable for a couple of hours meaning just updating the parity for this one new disk  (and not every single disk again). This logically leads you to wanting to just be able to remove one disk.

 

Then one step further... "please replace this disk with this one". Given the nature of the cap on the number of disks and the ever decreasing costs of disks this one is arguably the most useful. (and could possibly be done without even touching parity calculations)

 

In theory, with a bit of thought, you could make it so that you only need to create a full parity set once.

 

My current test system is based on the older PCI-based official board and Supermicro cards. I estimate creating a full parity set, once I hit 20 x 1TB disks on this server, at close to 32 hours. That is getting towards being impractical/risky in itself.

If you compare the rebuild times with commercial arrays of similar size, I think you will find it pretty decent. There is no magic to calculating parity; it is entirely about whether the calculation is valid or not, and whether you have the bandwidth to read and write the disks fast enough. It is EXACTLY why large array sizes are impractical (and 20 disks is probably over that limit for some people). Tom has increased the number of disks only because the throughput to the disks has increased. Disk sizes will increase, and that will increase parity check/calc times. No matter what you do, you are still limited by the rotational speed of the disk platters... there is no way to update parity on a sector you just read until that same sector rotates back under the disk head. All you can do here is read and write in as large blocks as you can, so you can read and write to the disks as fast as possible...
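
For reference, a toy model of the full build loop being discussed: read one large chunk from every data disk, XOR them together, write the result to parity, repeat. The block size argument is the "as large as you can" knob; the real md driver does this in the kernel far more efficiently, so treat this as illustration only (paths are hypothetical).

```python
def build_parity_stripewise(data_paths, parity_path, block=4 * 1024 * 1024):
    """Classic full parity build: all data disks read in parallel stripes."""
    data = [open(p, "rb") for p in data_paths]
    try:
        with open(parity_path, "wb") as parity:
            while True:
                chunks = [d.read(block) for d in data]
                longest = max(len(c) for c in chunks)
                if longest == 0:
                    break                       # every disk is exhausted
                acc = bytearray(longest)        # shorter disks contribute zeros
                for c in chunks:
                    for i, byte in enumerate(c):
                        acc[i] ^= byte
                parity.write(acc)
    finally:
        for d in data:
            d.close()

# build_parity_stripewise(["/dev/md1", "/dev/md2"], "/dev/parity")  # hypothetical
```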

 

Joe L.

Link to comment
there is no way to update parity on a sector you just read until that same sector rotates back under the disk head. All you can do here is read and write in as large blocks as you can, so you can read and write to the disks as fast as possible...

This is where an advanced battery-backed caching controller comes in. The write is cached and can be consolidated until a real cache flush is needed. I just don't know if it will help more than a hard drive with a huge cache itself.

(Other than that the controller operates at PCIe speed while the drive operates at SATA speed.)

Link to comment

You are obviously correct about the parity disk. However, depending on how I look at it, the parity disk is either the disk I care about the least (as it only holds parity data that can be recreated) or the most (as it holds parity data I may absolutely need).

Actually, if you have a data disk failure, your parity disk is no more important, and no less important, than every single one of your remaining disks in the array. Every single one of the data disks, in combination with the parity disk, is needed to restore the failed disk.

 

The parity drive by itself is useless without ALL the data drives used to create it (unless there was only one data drive, in which case the parity drive is an exact copy of the data drive, since unRAID uses even parity; in that case you can just consider it a clone and use it as needed).

 

Yes, I agree. However, in a situation where a disk has not failed (the vast majority of the server's life) the parity disk is arguably the least important disk, since losing it does not by itself constitute any risk of losing data.

 

I am not disagreeing; I am just pointing out that I don't really mind how much my parity drive is thrashed.

 

I know you could argue this either way (hence my original comment), but the essence of my statement is that I personally would prefer not to be thrashing 20 disks for 30 hours as a trade-off for thrashing my parity disk longer.

 

 

 

Giving it some more thought, what I really don't like is the idea of 20 disks spinning to create parity for, say, 30 hours non-stop. No real-world unRAID usage even comes close to this... so what we have is unRAID being tasked really heavily just for parity and proportionately very lightly for actual usage.

 

Perhaps if disks could be done in bundles rather than an all or nothing approach this would be a good compromise.

The tradeoff would be that parity would not be valid for any of the drives until the entire set of them were calculated.  It is true though, that when adding disks you could just read the existing parity, all the new disks, and then write the new parity.  If you assume the old parity is correct, there is no need to read all the other data drives.   Also, you would never discover the weird errors that might occur if your hardware was not up to the task  when you need to access all the disks at the same time to either run in degraded mode with a failed drive (very important) or when rebuilding a failed drive. (equally important)   

 

 

I think validating existing parity and adding disks to an existing parity set should be considered two different things. Whilst I agree it is simpler not to assume parity is correct, and spinning all disks for every parity task is one way of doing that, I don't think you necessarily have to always assume the worst.

 

Or, put another way, the default should be to assume parity is correct. It almost always is. If it is not, then that's a whole other thing.

 

Also when adding a drive I would be happy to live with the expense of the array being unavailable for a couple of hours meaning just updating the parity for this one new disk  (and not every single disk again). This logically leads you to wanting to just be able to remove one disk.

 

Then one step further... "please replace this disk with this one". Given the nature of the cap on the number of disks and the ever decreasing costs of disks this one is arguably the most useful. (and could possibly be done without even touching parity calculations)

 

In theory, with a bit of thought, you could make it so that you only need to create a full parity set once.

 

My current test system is based on the older PCI-based official board and Supermicro cards. I estimate creating a full parity set, once I hit 20 x 1TB disks on this server, at close to 32 hours. That is getting towards being impractical/risky in itself.

If you compare the rebuild times with commercial arrays of similar size, I think you will find it pretty decent. There is no magic to calculating parity; it is entirely about whether the calculation is valid or not, and whether you have the bandwidth to read and write the disks fast enough. It is EXACTLY why large array sizes are impractical (and 20 disks is probably over that limit for some people). Tom has increased the number of disks only because the throughput to the disks has increased. Disk sizes will increase, and that will increase parity check/calc times. No matter what you do, you are still limited by the rotational speed of the disk platters... there is no way to update parity on a sector you just read until that same sector rotates back under the disk head. All you can do here is read and write in as large blocks as you can, so you can read and write to the disks as fast as possible...

 

Joe L.

 

I understand that the speed is favourable and probably better than most. However, I tend to ignore the others when thinking about things, i.e. saying it's better than someone else's doesn't really interest me... what's more interesting is whether it's as good as it can be in isolation.

 

We all wanted more and more disks in the array, but we have to keep in mind that not that long ago my server was the pinnacle of officially supported hardware, and when it takes two days to do anything with parity there is a finite risk that something's going to go wrong.

 

 

P.S. It's a pain quoting quotes; I don't think I will be doing that much :)

Link to comment
I personally would prefer not to be thrashing 20 disks for 30 hours as a trade off for thrashing my parity disk longer

 

The disks are not "thrashing"... they are engaged in a sequential read of the entire disk from beginning to end.

 

30 hours of sustained, sequential reads of a disk is in no way stressful for a modern disk, nor should you try to jump through hoops to avoid it.

Link to comment

I personally would prefer not to be thrashing 20 disks for 30 hours as a trade off for thrashing my parity disk longer

 

The disks are not "thrashing"... they are engaged in a sequential read of the entire disk from beginning to end.

 

30 hours of sustained, sequential reads of a disk is in no way stressful for a modern disk, nor should you try to jump through hoops to avoid it.

 

I tend to agree with this. I've had disks spinning for years without failure.

 

I think there is a tendency to retire older disks to an unRAID array. It's confidence in these older disks that is the issue.

 

In addition, I've said this before and I'll say it again: the chances of multiple disk failure are so much higher when you buy spindles from the same vendor at the same time.

 

 

I'm wondering where this thread is going. What in parity itself needs to be improved the most?

 

Speed? Build time? Redundancy (i.e. two parity disks, not just a RAID1 copy: something like RAID6, so you can handle multiple drive failures)?

 

Link to comment

In my opinion, the Parity drive is the MOST important drive in my entire system.  This drive gives me the ability to rebuild any other single failed drive  if something were to happen.  If building Parity causes a drive to fail, I will send it out for recovery if the data is that important.  If building Parity causes the system to fail because of some faulty component, I would rather know now instead of when I am trying to rebuild the drive from Parity.  I would like to see the Parity drive receive the highest priority, not the lowest.

 

In fact, Parity is so important to heavy users and businesses that the main focus of this thread should be how we can help Tom add a second Parity drive while maintaining reasonable performance. In general it is recommended that anyone running a normal RAID environment with more than 7-8 disks should be running RAID6 or dual parity drives. I am about to cross that 8-disk threshold and would love the option of adding a second Parity drive.
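
For anyone curious what a second, RAID6-style parity actually involves: the Q parity is not just another XOR but a sum over GF(2^8), which is what lets you solve for two missing disks. A minimal sketch of how P and Q would be formed for one byte position (illustrative only; real implementations such as Linux md's raid6 use lookup tables and vector instructions, and the two-disk recovery math is not shown here):

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) using the polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def p_and_q(data_bytes):
    """Return the (P, Q) parity bytes for one byte position across the data disks."""
    p = 0
    q = 0
    g = 1                      # coefficient 2^i for disk i
    for d in data_bytes:
        p ^= d                 # ordinary XOR parity (what unRAID has today)
        q ^= gf_mul(g, d)      # Q weights each disk by a distinct coefficient
        g = gf_mul(g, 2)
    return p, q

# p, q = p_and_q([0x12, 0x34, 0x56])   # three data disks at one byte offset
```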

 

The Parity process we have today works great.  Sure there are some graphical or minor improvements (outside of write performance) that could be tweaked, but I would really like to see Tom's valuable time spent adding new features that would attract more customers.

 

Just my opinions.

Link to comment

I am not in favor of a change in how parity currently works and processes.  I like to stress the system during the parity checks - it gives me confidence that my system is up to the challenge of recovery in a real failure scenario.  I DO think there is a limit to how many drives you should try to protect in one array - and that the time it takes to perform a parity check should be a part of that decision.

 

Changing the subject to another aspect of how parity could be improved, I'd like to see an option for data-oriented protection. Parity works well if drives fail as they are supposed to. But it is not uncommon for people to get unexplained sync errors right before a disk failure. The affected person always asks "did I lose any data" and the answer is normally "I don't think so". Having unRAID use its spare cycles to calculate MD5 checksums, or better yet Reed-Solomon recovery blocks (something like PAR2 but with incremental update features), would address this. With such a feature in place, using just a small percentage of the disk capacity, you'd be able to DETECT any file corruption, with a good chance to correct much of it.
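
The detection half of this idea is cheap to prototype: a manifest of MD5 checksums built during idle time and re-verified after a scare. A rough sketch (plain Python; the share and manifest paths are made up, and the PAR2-style Reed-Solomon correction half is a much bigger job and not shown):

```python
import hashlib
import os

def md5_of_file(path, block=1024 * 1024):
    """MD5 of one file, read in large chunks to keep the disk streaming."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(block), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root, manifest_path):
    """Walk a share and record an MD5 for every file found."""
    with open(manifest_path, "w") as out:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                out.write(f"{md5_of_file(path)}  {path}\n")

# build_manifest("/mnt/disk1", "/boot/manifests/disk1.md5")   # hypothetical paths
```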

Link to comment

 

I'm wondering where this thread is going. What in parity itself needs to be improved the most?

 

Speed? Build time? Redundancy (i.e. two parity disks, not just a RAID1 copy: something like RAID6, so you can handle multiple drive failures)?

 

 

Well stated, and perhaps we need a Poll:

 

What Parity feature would you like to see improved or added to unRAID?

 

a.  Write speed on a non-cache-enabled system

b.  Second Parity Drive (similar to RAID 6)

c.  Reduced system stress for Parity drive creation

d.  Software Mirrored Parity Drive support

e.  Other

 

 

The fact is most major NAS/SAN vendors have the ability to do dual parity in some fashion (ReadyNAS, QNAP, Drobo, etc.)

 

After writing this, I realize Tom needs to do more polls, or the community needs to come up with them. I would like to understand what Tom's ambitions are and what the target audience of unRAID is (media enthusiasts, businesses, backups, etc.). This community has a myriad of great ideas, and understanding where Tom wants to go will help drive the ideas in that direction.

 

Link to comment

I personally would prefer not to be thrashing 20 disks for 30 hours as a trade off for thrashing my parity disk longer

 

The disks are not "thrashing"... they are engaged in a sequential read of the entire disk from beginning to end.

 

30 hours of sustained, sequential reads of a disk is in no way stressful for a modern disk, nor should you try to jump through hoops to avoid it.

 

That's a very good point and I accept that logic.

 

So, power and heat considerations aside, and accepting that "stressing the server" is actually a good thing (unless it fails), we are left with the conclusion that the current way of building the initial parity is best. The time it takes is just the time it takes within the bounds of current HDD, motherboard and controller technology.

 

So do we think the current methods of adding, removing or replacing a drive in the data set are as optimal as they could be from a parity point of view?

Link to comment

 

So do we think the current methods of adding, removing or replacing a drive in the data set are as optimal as they could be from a parity point of view?

 

I believe adding a drive with the preclear script that exists is the best way of adding a disk. Perhaps not the easiest for the basic user, but hopefully 5.0 with APIs will fix that. Maybe just changing the way a format is done, or adding a "prepare disk" button on the main screen.

 

I would like to see a "remove drive from array" option, where all the data would be migrated off the disk and onto the other disks, and then the drive would be removed from the drive set. I believe right now the only way to remove a drive is to pull it from the set and have parity rebuilt, correct? Windows Home Server has a similar feature, which is really nice.

 

Link to comment

I don't mind the current setup and how parity is created.  Granted it does raise the temps in the system a little but I can handle that.

 

 

I think the one feature I would like to see the most is a RAID6-like dual parity. Just in case something god-forsaken happens and I lose two data disks, I would be able to recover from both of those failures.

Link to comment

I believe you are correct that there is no official way of removing a drive other than to remake parity. This sounds less than efficient, as it is an extended period where the data is unprotected. I see no reason why the parity could not be removed directly without accessing the other drives.

 

Also, is adding a disk the same? I have no experience with the preclear script, but I was under the impression that it was a clever format and stress test only.

 

Someone correct me ?

Link to comment

I believe you are correct that there is no official way of removing a drive other than to remake parity. This sounds less than efficient, as it is an extended period where the data is unprotected. I see no reason why the parity could not be removed directly without accessing the other drives.

 

Also, is adding a disk the same? I have no experience with the preclear script, but I was under the impression that it was a clever format and stress test only.

 

Someone correct me ?

 

The preclear script reads the disk, writes the disk as many times as you want (up to 20), and then sets the special signature so that unRAID knows it has been cleared.

 

It does NOT do any formatting. That is still done when the disk is added to unRAID, and it only takes a couple of minutes.

Link to comment

I believe you are correct that there is no official way of removing a drive other than to remake parity.

There is an un-official way to remove a drive without losing parity... It does lose the data on that drive, as it requires you to write zeros to all the bytes on the disk, calculating the effect on the parity drive as you go. Then you can remove the drive with a few (non-standard) steps and still keep parity protection. It is described in the wiki. Basically, you zero the drive before removing it. This is NOT what you want to do if you want to move the data to another machine...
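
The arithmetic behind that zero-then-remove trick, for a single stripe: overwriting a data block with zeros leaves parity equal to the old parity XOR the old data (XOR with zeros changes nothing), and once the whole drive is zeros it contributes nothing at all, so it can be dropped without touching the other disks. A one-function sketch:

```python
def parity_after_zeroing(parity_block, old_data_block):
    """Parity for this stripe after the data block has been overwritten with zeros.

    P_new = P_old XOR old_data XOR 0 = P_old XOR old_data
    """
    return bytes(p ^ d for p, d in zip(parity_block, old_data_block))
```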

 

This sounds less than efficient, as it is an extended period where the data is unprotected. I see no reason why the parity could not be removed directly without accessing the other drives.

You can remove parity at any time.  When assigning it back the unRAID array has no way of knowing you did not write to other disks in the interim, so it re-computes parity.

 

Also, is adding a disk the same? I have no experience with the preclear script, but I was under the impression that it was a clever format and stress test only.

It is NOT a format at all. Never was, never will be. It is a stress test, and a clearing of the drive: writing zeros to almost all of it, except for a very few special bytes in the Master Boot Record area of the disk (within the first 512 bytes).

 

Do you know what parity is at all? Each parity bit on the parity drive is based on the mathematical ADDITION of the equivalent bit across all the drives. If the sum of the data drive bits is an even number, the parity bit for that position is a zero, keeping the total sum for that bit position an even number.

 

If the sum of the data bits for a given bit position is an odd number, the parity bit is set to a one, making the total number of bits in that position set to a 1 equal to an even number.

 

Now you can see why adding a drive that is completely zero has no effect on the parity calculation. An even number of bits on the existing drives+parity, plus a zero for the new drive, is still an even number.

 

Same with removing a drive that has been completely zeroed out in the non-standard way I talked about earlier.  Any even number minus zero will still be an even number.

 

So we can either add or delete a drive that is completely zero fairly quickly, without impacting parity. The preclear script sets up a special signature that lets unRAID know the drive has been completely zeroed out, so it does not do the clearing step itself.
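
A tiny worked example of the even-parity rule described above, for one bit position across four data drives (and showing why an all-zero drive changes nothing):

```python
data_bits = [1, 0, 1, 1]            # the same bit position on each data drive
parity_bit = sum(data_bits) % 2     # 1 here, because three 1-bits is odd

# Parity is chosen so the total count of 1-bits (data + parity) is even.
assert (sum(data_bits) + parity_bit) % 2 == 0

# Adding a brand-new, all-zero drive leaves the total unchanged:
assert (sum(data_bits + [0]) + parity_bit) % 2 == 0
```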

Someone correct me ?

See above... ;)
Link to comment

Yes, I know what parity is.

 

You have to realise, though, that there are so many unofficial addons, with dozens of pages of posts, that no one can follow them all.

 

So where we are at, then, is that we have:

 

You can remove a disk by zeroing it. Very clever, but having to zero a disk is inconvenient (i.e. if you wanted to pull a data disk out of the array... one of the biggest features of unRAID is independent disks).

You can add a disk, but only if it is blanked, and only then if you run the preclear script (as above... nice, but not as nice as it could be).

No slick way of migrating a drive

 

What I am talking about here is the real ability to insert and remove data disks from the array without having to fully recalculate all of the parity... without caveats and tricks.

Link to comment

That makes sense.

 

So to be absolutely clear then:

 

Every time you add, remove or replace a disk in unRAID, currently there is no way other than recreating the whole parity from scratch again.

Not true.

 

When you replace a drive, parity is not re-created. Instead, the parity disk in combination with all the other data disks is read to determine what should be written to the disk being replaced. Remember the even number of bits: if the number of bits in all the remaining drives+parity is an even number, then the missing bit on the drive being replaced must be a zero, since if it were a "1" there would be an odd number of bits set to one. If the number of bits in all the remaining drives+parity is an odd number, then the missing bit must be a one, since if it were a "0" there would be an odd number of bits set to one. This process is done 8 times for each byte, and 8,000,000,000,000 times for a 1TB drive.
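
The reconstruction described above, reduced to code for one block: the missing drive's contents are simply the XOR of parity with all the surviving data drives (a sketch, not the actual driver logic):

```python
def rebuild_missing_block(parity_block, surviving_blocks):
    """Recompute one block of a failed drive from parity plus every other drive."""
    out = bytearray(parity_block)
    for block in surviving_blocks:       # all surviving data disks at this stripe
        for i, byte in enumerate(block):
            out[i] ^= byte               # whatever makes each bit position even again
    return bytes(out)
```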

 

When adding a new drive, the new drive is completely zeroed, so it has no effect on parity at all. Then it is added to the array, and then a file-system is created on it (with parity updated for the formatting writes, since the drive is already part of the array).

 

When removing a drive, you can remove the drive and recalculate all parity from the remaining drives (by pressing Restore and then Start), or zero it all first, then remove it and force unRAID to think parity is good. (But this is highly non-standard. I've only done it once, just to see how it would work.)

 

Joe L.

Link to comment

I was editing as you were typing...

 

...

 

What I am talking about here is the real ability to insert and remove data disks from the array without having to fully recalculate all of the parity... without caveats and tricks.

 

I should also be clear that when I was talking about replacing a drive I used confusing terminology... I am talking about replacing a working drive with another working drive (aka my array is full; take this 250GB one and replace it with a 1TB one).

 

Whilst the current route is to assume the drive has failed and rebuild it from parity, there's no reason you couldn't use a drive-to-drive copy method.

Link to comment

Yes, I know what parity is.

Good
No slick way of migrating a drive (with data)

You got it.  Mathematically not possible unless it is all zeros...

What I am talking about here is the real ability to insert and remove data disks from the array without having to fully recalculate all of the parity.

Yup, no way... not if they have data on them and in the protected part of the array.  If not part of the unRAID "md" array, and not part of the parity calcs, you can do as you wish.... and they have no effect on parity at all.

 

Link to comment
