Sudden Change in Parity-Check Times



Definitely strange => the parity rebuild time for the 2nd parity seems about right, but the parity check time is WAY off what it should be. The speed after the 1TB point ... where the only drive involved in the check is the 3TB Red ... should be FAST!

 

I'm off to catch a plane in a couple of hours, but it'll be interesting to see if this gets resolved in the next few days before I get a chance to check the forum again.

 

FWIW, my initial thought when I read your very first post was that you had simply not factored in the extra 2TB that needed to be tested ... and that was the reason for the longer check times ==> UNTIL I read a bit further and realized that the actual check speed had dramatically slowed down even in the first 1TB.

 

One thought (we've already touched on this earlier) => Were all of the faster checks with just single parity? ... and have all of the slow checks been with dual parity? This may indeed simply be a computational bottleneck with your Sempron. Open the Web GUI and see what the CPU load is on the Dashboard during parity checks. If it's "pegged" at 100% for the whole time, that's almost certainly the issue. Of course, be sure you've got page updates disabled while checking this. The 2nd parity computation probably still works okay because it's ONLY computing that parity.

 

If you go back to the first post, you will find that the times (and speeds) were initially fine for dual parity until sometime after b22 was released. 

 

Updating the CPU is not an option that I would consider for this server (more on that below). I will never need more than four data drives in it. And to be honest, the only reason it has a dual-parity setup is that I had a spare 1TB drive lying around. I thought it would be interesting to have some information on the viability of using dual parity on low-end hardware, and initially it appeared that there was no performance penalty for doing so. And now there is! So if LimeTech wishes to look into this situation, I would be happy to cooperate with them.

 

I, personally, think that having dual parity for arrays with fewer than six-to-eight data drives is unjustified from a statistical standpoint. Of course, this assumes that the user does look at the status of the array and checks its health on a regular basis. Once you get to, say, sixteen drives, it should almost be mandatory. Between those two points, it is up to the comfort level of the user to determine which to use: single or dual. Of course, all of this depends on the due diligence of the user. If the user never looks at the array, having quadruple parity won't save him from data loss!
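For anyone who wants to put rough numbers on that statistical argument, here is a back-of-the-envelope sketch. The annual failure rate and rebuild window below are assumptions picked purely for illustration, not measured values:

```python
# Back-of-the-envelope sketch of the dual-parity argument above.
# Assumptions (for illustration only): 3% annual failure rate per
# drive, and a 24-hour window to rebuild a failed drive.

def p_concurrent_failure(n_data_drives, afr=0.03, rebuild_hours=24):
    """Rough probability that at least one MORE drive fails while a
    single-parity array is rebuilding an already-failed drive."""
    p_fail_in_window = afr * rebuild_hours / (365 * 24)
    # Every remaining data drive is at risk for the whole rebuild.
    return 1 - (1 - p_fail_in_window) ** n_data_drives

for n in (4, 8, 16):
    print(f"{n} data drives: {p_concurrent_failure(n):.6%}")
```

The exact probabilities swing wildly with the assumed AFR and rebuild time; the only point is that the exposure grows with the number of drives still spinning during a rebuild, which is why the drive count matters more than anything else in this argument.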

 

A further comment on upgrading the CPU: upgrading the CPU in this system has several issues. First, the power consumption of a more powerful chip for this MB is significantly higher. And that MB is several years old at this point. If I were to do anything, I would replace both the MB and CPU and probably move to Intel, as AMD is taking their CPU line down the combined CPU/GPU route with the emphasis on the GPU part.

 

As you can see from these comments, I am not enamored with the idea of using dual parity on this server. I only did so because I could easily do it at no cost and thought it might provide some useful information. One of these days, I will probably revert to a single-parity setup on this system.

Link to comment

If you go back to the first post, you will find that the times (and speeds) were initially fine for dual parity until sometime after b22 was released. 

 

Sorry, I wasn't very clear: the latest v6.2 RC has higher CPU utilization than the first v6.2 betas, due (I believe) to kernel upgrades. But if CPU limited, any v6.2 release using dual parity will show constantly high CPU utilization during a parity check. For example, if you have a 6-disk array with 5 x 1TB + 1 x 2TB and the CPU is at 90% at the beginning of a parity check, it will still be high at the end, when only one disk is being checked.

Link to comment

 

Sorry, I wasn't very clear: the latest v6.2 RC has higher CPU utilization than the first v6.2 betas, due (I believe) to kernel upgrades. But if CPU limited, any v6.2 release using dual parity will show constantly high CPU utilization during a parity check. For example, if you have a 6-disk array with 5 x 1TB + 1 x 2TB and the CPU is at 90% at the beginning of a parity check, it will still be high at the end, when only one disk is being checked.

 

Are you saying that the kernel and the basic core of unRAID are using a higher percentage of the CPU when active than in the earlier beta series? If that is the case, then I can see that it is the root cause of the issue which I am seeing.

 

In my opinion, if this is the case then LimeTech should be changing their hardware recommendations for a minimum CPU. At present, their recommendation is "A 64-bit capable processor (1.0 GHz or better)". While they do mention that "To add applications or virtual machines, additional requirements will apply", I think that dual parity should be considered a basic part of the core unRAID setup and not an application, as nothing has to be downloaded and installed to use it.

 

BTW, those folks who suggest using the GUI to monitor CPU usage have never used unRAID with a low-end CPU. The GUI itself can suck up 40+% of the CPU cycles when it is active! (It's difficult to get a good reading, as the GUI load on the CPU is constantly varying.)

Link to comment

I am using the AMD Sempron 145 on my test server; the motherboard, an Asus M4A88TD-M EVO, allows unlocking of the CPU cores and transforms the Sempron into an Athlon II X2 4400e.

 

This is a dual core processor and CPU load on an idle system hovers between 5% and 10% (as observed by the GUI).

 

Have you tried to unlock your Sempron?

 

Link to comment

I am using the AMD Sempron 145 on my test server; the motherboard, an Asus M4A88TD-M EVO, allows unlocking of the CPU cores and transforms the Sempron into an Athlon II X2 4400e.

 

This is a dual core processor and CPU load on an idle system hovers between 5% and 10% (as observed by the GUI).

 

Have you tried to unlock your Sempron?

 

If you look, I have two systems. The Testbed server (which is the one that this whole thread is discussing) has a Biostar A770E which I got back in March 2011. I did a bit of checking, and it appears that chances are good that the Sempron 140 in that system does have a working second core, but that Biostar MB does not support unlocking. So trying it is not an option. (The Media server has the ASUS MB, and that one does support unlocking.)

 

You have to understand that this whole dual-parity tryout was an experiment on my part. (I am a retired EE and I love to play around to see how things work and what their limits are.) My work involved a lot of statistics, and I developed a pretty good feel for using that 'science' and for how numbers can be used to show about anything unless one is really willing to look behind the curtain at the raw data. I just have this feeling that dual parity is not really going to be a miracle cure for data loss for most users with fewer than about eight data drives. Most of them would probably be better served by increasing the drive size rather than the number of drives in their arrays, as bigger drives (looking at the trends and not a particular model) have no worse failure rates than smaller drives. Doing drive replacement to increase storage capacity has a better chance of keeping the power-on time of the array drives in the bottom of the bathtub reliability curve. (Plus, keeping smaller drives in the array until they get to the end-of-life portion of that curve is not a smart move!)

 

I also realize that there are a lot of folks who wear both a belt and suspenders and will want to use dual parity. Perhaps some day a few of them may benefit, but if they think it can reduce the need for constant monitoring of their servers, the day will come when they have three drives out of service.

 

For these reasons, I will be going back to a single-parity setup as soon as it becomes clear that the dual-parity setup is not going to be of any benefit to unRAID development.

Link to comment

Are you saying that the kernel and basic core of unRAID is using a higher percentage of the CPU when it is active than the earlier beta series?

 

I'm not sure, but I believe the kernel updates are the main reason for these changes. I had some time and let a parity check run for a few seconds on all the v6.2 releases; there's a significant difference in CPU usage from b18 to rc3, but the increase is not linear: mostly it goes up, but, surprisingly to me, it sometimes also comes down.

 

This was done on a very low-end CPU; most users with a dual core or above wouldn't notice any difference. The test server used was an HP Microserver N40L (AMD Turion 1.5GHz dual core, but with only one core enabled), dual parity + 3 data disks.

 

output_1q52_OW.gif

 

I did two full parity checks, and although CPU usage is very high on the RCs, the parity check time from b18 to rc3 only changed a little; a few more percent, though, and it's going to slow down a lot. The disks used are old 160 to 320GB disks, hence the low average speed.

 

6.2b18: Duration: 1 hour, 29 minutes, 45 seconds. Average speed: 59.4 MB/sec

6.2rc3: Duration: 1 hour, 35 minutes, 19 seconds. Average speed: 56.0 MB/sec

Link to comment

To my other point, with dual parity, CPU usage is high even when checking only one disk, this screenshot shows it well:

 

Screenshot_2016_08_16_12_57_40.png

 

Array is made of:

Parity:      320GB

Parity2:    250GB

3 x Data: 160GB each

 

 

In the first part it's checking all disks, then only both parities, and finally only parity1. The CPU usage only goes down because the HDD speed goes down as the check moves to the inner sectors; but if, say, at some point the speed is 80MB/s, CPU usage will be the same with 5 disks or with only 1 disk. This only happens with dual parity, I assume due to how it works, though it's strange, since there are no parity2 calculations being made in the last part, when it's only checking parity1.

 

 

Link to comment

 

 

I did two full parity checks, and although CPU usage is very high on the RCs, the parity check time from b18 to rc3 only changed a little; a few more percent, though, and it's going to slow down a lot. The disks used are old 160 to 320GB disks, hence the low average speed.

6.2b18: Duration: 1 hour, 29 minutes, 45 seconds. Average speed: 59.4 MB/sec

6.2rc3: Duration: 1 hour, 35 minutes, 19 seconds. Average speed: 56.0 MB/sec

 

I also suspect that these old (slow) disks helped to keep down the CPU cycles/sec required to service the parity-checking routine. That might explain why you did not observe the drop in speed in the RC releases that I saw, as your disks were already the limiting factor.

 

In most cases, CPUs have enough horsepower that the disks are always the limiting factor. I have the case that sits in the middle. That is to say, with the early beta releases, disk I/O is the limiting factor on the speed of the parity check, and with the RC releases, the CPU becomes the limiting factor.

 

The thing that we can't determine (at our end) is whether (1) changes in the kernel are at fault, or (2) LimeTech made some change in the parity-checking routine which increased the required CPU cycles.

 

EDIT: Let me add one more thing for those folks who are casually reading this thread. If the CPU has sufficient processing power (horsepower, if you want), then the parity-check times for BOTH single parity and dual parity will be virtually identical. In other words, the CPU can make the required calculations faster than the disks can be read. Thus, the speed of the process is constrained by the speed of the slowest hard drive. And that is the ideal situation!
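For the curious, this is roughly where the extra work for the second parity comes from. Linux's RAID-6 code computes P as a plain XOR across the data blocks and Q as a Reed-Solomon syndrome over GF(2^8); the sketch below is a minimal, unoptimized Python illustration of that scheme (the kernel does the same math with SSE2/AVX2 vector code, which is why raw CPU matters so much):

```python
# Minimal sketch of why dual parity costs more CPU than single parity.
# P is a plain XOR across the data blocks; Q is a Reed-Solomon
# syndrome over GF(2^8), which needs a multiply per byte per disk.

def gf_mul2(x):
    """Multiply by 2 in GF(2^8) with polynomial 0x11d (as Linux raid6 uses)."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
    return x & 0xff

def pq_syndromes(blocks):
    """Compute the P and Q parity bytes for one stripe of data blocks."""
    p = bytearray(len(blocks[0]))
    q = bytearray(len(blocks[0]))
    # Horner's rule, highest-numbered disk first:
    # Q = (...((D_{n-1}*2 ^ D_{n-2})*2 ^ ...)*2 ^ D_0
    for block in reversed(blocks):
        for i, byte in enumerate(block):
            p[i] ^= byte                    # single-parity work: one XOR
            q[i] = gf_mul2(q[i]) ^ byte     # dual-parity adds a GF multiply
    return bytes(p), bytes(q)

data = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
p, q = pq_syndromes(data)
print(p.hex(), q.hex())
```

Note that Q has to be computed over every data byte regardless of how many disks are being checked at a given point, which fits the observation above that CPU load stays high for the whole dual-parity check.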

Link to comment

I also suspect that in these old (slow) disks helped to keep the down the required CPU cycles/sec required to service the parity checking routine.  That might explain why you did not observe the drop in speed in the RC releases that I saw as your disks were already the limiting factor. 

 

Exactly, with faster disks the system I used would be CPU limited.

 

 

The thing that we can't determine (at our end) is whether (1) changes in the kernel are at fault, or (2) LimeTech made some change in the parity-checking routine which increased the required CPU cycles.

 

Can't be sure but doubt it's (2), that would mean changes to the MD driver in almost every release, and there's nothing about that in the release notes, but only Tom could confirm that.

Link to comment

 

 

I also suspect that these old (slow) disks helped to keep down the CPU cycles/sec required to service the parity-checking routine. That might explain why you did not observe the drop in speed in the RC releases that I saw, as your disks were already the limiting factor.

 

Exactly, with faster disks the system I used would be CPU limited.

 

 

The thing that we can't determine (at our end) is whether (1) changes in the kernel are at fault, or (2) LimeTech made some change in the parity-checking routine which increased the required CPU cycles.

 

Can't be sure but doubt it's (2), that would mean changes to the MD driver in almost every release, and there's nothing about that in the release notes, but only Tom could confirm that.

 

No. Actually you can confirm that. Just do diffs of /usr/src/linux*/drivers/md/ on each version of unraid. Pay attention to md.c and unraid.c.
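If anyone wants to script that comparison rather than eyeball it, a sketch along these lines would work; the release paths in the comment are hypothetical, so point them at wherever you unpacked each release's kernel source:

```python
# One way to do the comparison suggested above: pull unraid.c (or md.c)
# out of two unpacked release trees and diff them with Python's difflib.

import difflib

def diff_sources(old_text, new_text, old_label="a", new_label="b"):
    """Unified diff of two source files given as strings."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_label, tofile=new_label))

# Inline strings for demonstration; with real trees you'd read e.g.
# Path("/usr/src/linux-b18/drivers/md/unraid.c").read_text() instead.
old = "static int check(void)\n{\n    return 0;\n}\n"
new = "static int check(void)\n{\n    return 1;\n}\n"
print(diff_sources(old, new, "b18/unraid.c", "rc3/unraid.c"))
```

An empty diff across releases would rule out (2) above; any hunks in md.c or unraid.c would point at a LimeTech-side change.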

 

Link to comment

johnnie.black ---

 

I am wondering what the PassMark CPU scores were for the systems you have been testing with, and, if you have a lower-end system where the parity check speeds are virtually identical, what that PassMark number is.

 

Doing a very rudimentary extrapolation from the PassMark of 739 for my CPU, I come up with a PassMark CPU number of 1600 (rounding it to 2000 would probably be prudent) to allow the speeds for both parity check operations (dual and single) to be about equal. The reason for my query is that someone has to establish some sort of new parameter for the minimum CPU hardware which will allow unRAID to run its basic NAS function without performance issues caused by the CPU. The PassMark number is available for virtually every CPU ever made and is easy to understand. Of course, if changes in hard-drive technology improve the read speed (or if SSDs become larger and cheaper), the PassMark number would probably require adjustment to accommodate the new drive technology.
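The arithmetic behind that extrapolation, made explicit for anyone who wants to plug in their own numbers. The linear scaling of required throughput with PassMark score is a rough assumption, and the speeds used here are the Sempron 140's rc3 results (97.0 MB/s single parity, disk limited, vs 48.4 MB/s dual parity, CPU limited):

```python
# Rudimentary PassMark extrapolation, as described above. Assumption
# (rough rule of thumb, not a measured law): the PassMark score needed
# scales linearly with how far the CPU-limited check falls short of
# the disk-limited speed.

def required_passmark(current_score, disk_limited_speed, cpu_limited_speed):
    """Scale the current PassMark by the observed slowdown ratio."""
    return current_score * disk_limited_speed / cpu_limited_speed

# Sempron 140 (PassMark 739): ~97.0 MB/s single parity vs ~48.4 MB/s
# dual parity on rc3.
estimate = required_passmark(739, 97.0, 48.4)
print(round(estimate))  # lands on the order of 1500; rounding up adds margin
```

With these inputs the formula gives a number a bit under 1600, so rounding the recommendation up to 2000 builds in headroom for faster disks.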

 

 

Link to comment

It's difficult to give a number, because it depends on other factors, like Intel or AMD, disk speed and, mostly, the total number of disks, but I would agree that 2000 is a good PassMark base number.

 

If there aren't other limits, like disk speed and/or controller bottlenecks, and the parity check is CPU limited, it will always be slower with dual parity; but with a decent dual core, speeds are still good. These are some tests I did when the first v6.2 beta was released:

 

Celeron D430 (Passmark 585)

SP + 7 Data disks: 102.0MB/s

DP + 6 Data disks: 84.5MB/s

 

SP + 11 Data disks: 77.5MB/s

DP + 10 Data disks: 68.9MB/s

 

Pentium G2030 (Passmark 2923)

SP + 23 Data disks: 191.7MB/s

DP + 22 Data disks: 164.2MB/s

 

SP + 28 Data Disks: 164.2MB/s

DP + 27 Data Disks: 138.6MB/s

 

AMD A4-6300 (Passmark 2220)

SP + 15 Data disks: 164.2MB/s

DP + 14 Data disks: 107.4MB/s

 

SP + 19 Data disks: 137.4MB/s

DP + 18 Data disks: 87.2MB/s

 

AMD A8-7600 (Passmark 5177)

SP + 15 Data disks: 201.4MB/s

DP + 14 Data disks: 106.0MB/s

 

SP + 19 Data disks: 150.3MB/s

DP + 18 Data disks: 83.6MB/s

 

 

Speeds are both max and average, as all tests were done with SSDs. I purposely kept the total number of disks the same for these tests, so a normal user adding parity2 and maintaining the same number of data disks will probably notice an even bigger slowdown.

 

AMD CPUs appear to take a larger hit with dual parity, though I only tested on APUs, so the FX line could be different.

Link to comment

It's difficult to give a number, because it depends on other factors, like Intel or AMD, disk speed and, mostly, the total number of disks, but I would agree that 2000 is a good PassMark base number.

 

If there aren't other limits, like disk speed and/or controller bottlenecks, and the parity check is CPU limited, it will always be slower with dual parity; but with a decent dual core, speeds are still good. These are some tests I did when the first v6.2 beta was released:

 

AMD CPUs appear to take a larger hit with dual parity, though I only tested on APUs, so the FX line could be different.

 

Were any of your tests done with the RC releases, or were they all done with the early beta releases?

 

Now let me post up my numbers for v6.2 rc3 in the same format as you did:

 

AMD Sempron 140 (Passmark 739)

SP + 3 Data disks:  97.0MB/s

DP + 3 Data disks:  48.4MB/s

 

Actually, it seems to me that something is going on and it is not wholly a PassMark thing. The Intel CPUs simply do a better job with a lower PassMark number. (Perhaps it has something to do with how the kernel is being optimized?)

 

I can now see that a recommendation for CPU selection for budget unRAID servers is not going to be a simple process...

 

 

Link to comment

Were any of your tests done with the RC releases, or were they all done with the early beta releases?

 

They were all done at the same time, beta 18 or 19; I can't say for sure which one.

 

OK, here is my speed for the beta 19 release:

 

AMD Sempron 140 (Passmark 739)

DP + 3 Data disks:  105.3MB/s

 

And here is the speed for 6.1.9 (as I never did a single parity check speed check for beta 19):

 

AMD Sempron 140 (Passmark 739)

SP + 3 Data disks:  106.2MB/s

 

An additional piece of data that I do have for both b19 and rc3 is the build time for the parity 2 disk:

 

It was 2hrs, 45m, 14s for beta 19

  and

It was 3hrs, 16m, 40s for rc3.

 

Which was about 19% longer for rc3 than for beta 19. So I suspect that you would not see the same results if you repeated the tests with rc3. I wonder how many folks are really using dual parity at this point (after all, it is a big hardware commitment, that second parity disk, to even try it out), and whether the ones who are using it have high-end processors so they can get decent performance out of VMs. I seem to recall from an earlier feature survey for version 6 that the majority of users wanted dual parity over all of the other choices in the survey. I wonder what is going to happen when version 6.2.0 is finally released and those folks, who had to have it, make the commitment to add dual parity...

Link to comment

So I suspect that you would not see the same results if you repeated the tests with rc3.

 

Probably not, and it won't be easy for me to re-test. But remember, only CPU-limited servers will show a difference; most servers are disk or bus/controller limited. For example, all but my SSD server are disk-speed limited, so I've seen no difference in parity check speed from 6.1 with single parity to 6.2 with dual parity, even with the extra disk; average speed is usually around the 100MB/s mark. These servers all have Sandy/Ivy Bridge Celeron and Pentium CPUs and from 8 to 22 disks total.

 

I do have some stats from my SSD server (Skylake based Pentium G4400 with DP + 28 Data) with two different releases:

6.2b21 - 210.5MB/s

6.2rc3 - 209.5MB/s

 

This is the average speed, as this server currently has a controller bottleneck. I estimate this CPU's limit with 30 disks is ~230MB/s, so a $60 CPU is capable of enough speed for the maximum possible unRAID array size with the fastest HDDs available at the moment.

 

 

 

 

I wonder what is going to happen when version 6.2.0 is finally released and those folks, who had to have it, make the commitment to add dual parity...

 

I recognize that I'm somewhat obsessed with parity check speeds (though there are some valid reasons for not having excessively slow checks/rebuilds, like server availability, and the more time a rebuild takes, the longer the server is at risk, though now less so with dual parity), but I believe that most users will be disk/controller limited or won't care; only a few (mostly, I think, those with single-core CPUs) will have a big slowdown and notice/complain.

 

 

Link to comment

Frank, you may want to try the new rc4; CPU utilization with dual parity is as low as it has ever been, same as beta18.

 

Will do. I have just installed the update and assigned the same drive again as the Parity 2 disk. The parity sync is now in progress; that speed will give the first clue as to any improvement. I'll be back with results tomorrow.

 

Link to comment

Just read this from LimeTech on the rc4 release thread:

 

    http://lime-technology.com/forum/index.php?topic=51308.msg491919#msg491919

 

Looking at the second link, it appears that any Intel processor released before Sandy Bridge in 2011 and any AMD processor before Bulldozer will be behind the eight ball. And if it is low power, then the situation is a double whammy.

 

EDIT: And it does help to explain why there was no decent correlation of speed results with PassMark numbers...

Link to comment

Results for 6.2 rc4:

 

Event: unRAID Parity sync / Data rebuild
Subject: Notice [ROSE] - Parity sync / Data rebuild finished (0 errors)
Description: Duration: 3 hours, 14 minutes, 8 seconds. Average speed: 257.6 MB/s
Importance: normal

 

(Apparently, the average speed for the parity sync is calculated over the 3TB array size, but since the Parity2 and data disks are only 1TB, that is all that parity 2 is actually calculated over.)

 

Event: unRAID Parity check
Subject: Notice [ROSE] - Parity check finished (0 errors)
Description: Duration: 12 hours, 58 minutes, 22 seconds. Average speed: 64.2 MB/s
Importance: normal

 

As you can see, this was an improvement over the 6.2 rc3 results:

 

Event: unRAID Parity check
Subject: Notice [ROSE] - Parity check finished (0 errors)
Description: Duration: 17 hours, 15 minutes, 42 seconds. Average speed: 48.3 MB/s
Importance: normal

Link to comment

Your results seem to confirm my suspicions that these speed changes are kernel related, out of LT's control.

 

If you want, you can look in your syslog for the RAID6 algorithm speed with 2 or 3 different releases. On my server, these appear to correspond to CPU usage, i.e., lower speed = higher CPU usage. It looks like this:

 

Tower kernel: raid6: using algorithm sse2x4 gen() 4183 MB/s

 

This appears to indicate that with a certain kernel release some CPUs will be a little faster or slower than other releases.
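If you save a syslog from each release, a couple of lines of Python can pull that figure out for comparison; the regex simply matches the kernel benchmark line quoted above:

```python
# Extract the RAID6 gen() benchmark result from a saved syslog, so the
# number can be compared across unRAID releases.

import re

RAID6_RE = re.compile(r"raid6: using algorithm (\S+) gen\(\) (\d+) MB/s")

def raid6_speed(syslog_text):
    """Return (algorithm, MB/s) from a syslog dump, or None if absent."""
    m = RAID6_RE.search(syslog_text)
    return (m.group(1), int(m.group(2))) if m else None

line = "Tower kernel: raid6: using algorithm sse2x4 gen() 4183 MB/s"
print(raid6_speed(line))  # ('sse2x4', 4183)
```

Running this against the syslog from each release you test would give a quick table of algorithm and throughput to set beside the parity check times.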

Link to comment

Even on what I consider to be a P.O.S. CPU in my main server, I don't think it's that bad:

 

raid6: using algorithm sse2x4 gen() 13394 MB/s

 

My secondary server, on the other hand, which is about as low as you can go, is a different story:

 

raid6: using algorithm sse2x4 gen() 2980 MB/s
Link to comment

Based on the numbers from some of my other servers, I'd say a gen() speed >10000MB/s is ideal for dual parity, though a lower number can be OK for smaller arrays:

 

Celeron G550

raid6: using algorithm sse2x4 gen() 9984 MB/s

 

Pentium G2030

raid6: using algorithm sse2x4 gen() 11800 MB/s

 

Xeon E3-1220

raid6: using algorithm sse2x4 gen() 12082 MB/s

 

Like Tom pointed out in the rc4 thread, Haswell and newer CPUs are much faster using AVX2 instructions.

Link to comment
