Sudden Change in Parity-Check Times



I have been investigating a situation with my Testbed server for more than a week and I can't seem to find a solution to it.  I thought I would post up as much information as I can about what happened and see if anyone else has any ideas.

 

As many of you do, I have a monthly non-correcting parity check scheduled to run on the first day of the month.  When I checked the results of those tests for both of my servers, I saw that the time for the Testbed server had gone from 8hr, 0min, 28sec (on June 1st) to 17hr, 40min (on Aug 1st).  (For some now-forgotten reason, I terminated the July check early.)  Below are the times for the last few successfully completed checks:

 

Date                   |Duration (sec)|Avg Speed|Errors
2015 Dec  3 06:42:02   |28346|  105.9 MB/s|0
2016 Jan  1 08:52:05   |28323|  105.9 MB/s|0
2016 Jan 17 22:29:12   |28306|  106.0 MB/s|0
2016 Feb  1 08:50:45   |28244|  106.2 MB/s|0
2016 Mar  1 08:51:05   |28263|  106.2 MB/s|0
2016 Mar 22 02:27:56   |28502|  105.3 MB/s|0
2016 Apr  1 08:53:11   |28390|  105.7 MB/s|0
2016 May  1 08:59:52   |28791|  104.2 MB/s|0
2016 Jun  1 09:00:29   |28828|  104.1 MB/s|0
2016 Aug  1 18:40:04   |63603|   47.2 MB/s|0
2016 Aug  6 07:10:39   |62014|   48.4 MB/s|0

 

For anyone who wants to know, the actual running time of each check is the Duration column, in seconds.
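
If you want to double-check the conversion yourself, a quick bash snippet (plain arithmetic, nothing unRAID-specific) turns the Duration column back into the wall-clock times quoted above:

# Convert the Duration column (seconds) to h:m:s
for s in 28828 63603; do
  printf '%d sec = %dh %dm %ds\n' "$s" $((s/3600)) $((s%3600/60)) $((s%60))
done
# 28828 sec = 8h 0m 28s   (Jun 1)
# 63603 sec = 17h 40m 3s  (Aug 1)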

 

I have checked the SMART status and run the SMART short test on all of the drives.  Neither the tests nor an inspection of the SMART attributes revealed anything to me.

 

I rebooted the server on Aug 3rd (always the standard advice given by tech support when something unusual happens!) and ran a second parity check on Aug 6th, starting in the evening when I could check its progress.  (When checking, I open the GUI in a browser window, check the progress by refreshing the screen, and close the window when done.  The whole process takes less than a minute.)

 

At 704GB into the check, the speed was 45.8MB/s

At 1.53TB into the check, the speed was 47.4 MB/s

At the finish, the speed was 48.4MB/s

 

Perhaps a quick review of the array would be useful at this time.  It consists of a WD Red 3TB parity drive, a Hitachi 1TB parity2 drive and (3) Hitachi 1TB data drives.  So at the first look at the parity-check speed (at 704GB), all five drives were involved.  At the second look, at 1.53TB, only the WD Red 3TB was left running.  (The other drives had actually spun down by the time I looked!)

 

I thought some sort of side effect of version 6.2-rc3 might have reared its ugly head.  My Media server (running 6.1.9 at the time) has a virtually identical MB-CPU combination, so I decided to test that premise.  I formatted an SD card, stopped that server, copied the contents of its flash drive (actually an SD card in a Kingston card reader with a unique GUID) to the new SD card, and made it bootable.  I shut down the server, swapped the SD cards and rebooted the Media server.  I then updated it from 6.1.9 to 6.2-rc3 with the update plugin.

 

On Aug 1st (running 6.1.9), the time was 7hr, 16min, 29sec.  On Aug 5th (running 6.2-rc3), it was 7hr, 17min, 11sec.  So the unRAID version was not a factor.

 

At this point, I am at a loss as to what is happening.  I am attaching a diagnostics file from August 6th.  While I didn't see anything in it, I am not an expert at reading syslogs.  Perhaps one of you might spot something.

 

 

rose-diagnostics-20160806-1535.zip

Link to comment

One thing that jumps out is the sync_thresh value, it's too low compared to the sync_window value:

 

md_num_stripes 6096
md_sync_window 2744
md_sync_thresh 192

 

It should be close to it unless you're using a SASLP or SAS2LP, try with these settings:

 

md_num_stripes 6096
md_sync_window 2744
md_sync_thresh 2700
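
If you want to try them without a reboot, something like this from the console should work (a sketch; mdcmd is the stock unRAID tool, and the same values can also be set under Settings -> Disk Settings):

# Apply the suggested tunables at runtime with unRAID's mdcmd
mdcmd set md_num_stripes 6096
mdcmd set md_sync_window 2744
mdcmd set md_sync_thresh 2700
# Verify -- exact variable names in the status dump vary by release
mdcmd status | grep -iE 'stripe|sync'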

 

Link to comment

One thing that jumps out is the sync_thresh value, it's too low compared to the sync_window value ...

<< snip >>

 

I reset them back to the defaults: 1280 / 384 / 192.  (That is what the Media server is set at, and the GUI agreed those are the defaults.)

 

At 1hr, 19min in, the average is 45.8MB/s.  So I don't think that was the problem.

 

After you pointed out they weren't at the defaults, I remembered that I had played around with a tunables adjustment program back in the dim, distant past.  (This is my Testbed server and I do 'play' with it at times!  ::)        )

Link to comment

If the settings don't make a difference, the only other thing I can think of is the CPU.  A parity check with dual parity has noticeably higher CPU utilization; it won't make much of a difference with Xeons and the like, but slower CPUs and/or big arrays can see a noticeable slowdown.  I've also noticed that with most Linux kernel updates since unRAID v5, CPU utilization during a parity check goes a little higher, especially with AMD CPUs.

 

This is a very small array to be CPU limited, but you can easily confirm by unassigning parity2 and starting a check.  I believe the other server you updated to 6.2 is using single parity, hence the normal speed.

 

If I'm right and it's the CPU, one more thing to keep in mind: unlike with single parity, CPU utilization with DP is high even when checking a single disk.  I.e., on this server, after the 1TB mark only one disk is active, but CPU utilization will be the same as when checking all disks.  I don't know why it's like this, but I've noticed it since the first 6.2 beta.
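
You can confirm which disks are still being read from the console while the check runs; a rough sketch using /proc/diskstats (standard Linux, nothing unRAID-specific -- field 6 is cumulative sectors read, always in 512-byte units):

# Sample sectors read per disk over 5 seconds -> per-disk MB/s
awk '$3 ~ /^sd[a-z]+$/ {print $3, $6}' /proc/diskstats | sort > /tmp/ds1
sleep 5
awk '$3 ~ /^sd[a-z]+$/ {print $3, $6}' /proc/diskstats | sort > /tmp/ds2
join /tmp/ds1 /tmp/ds2 | awk '{printf "%s %.1f MB/s\n", $1, ($3-$2)*512/5/1e6}'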

 

If it's not the CPU I'm out of ideas.

Link to comment

If the settings don't make a difference, the only other thing I can think of is the CPU ...

<< snip >>

 

Johnnie.Black -----

 

Thanks for your insight.  I am an hour or so into the non-correcting parity check and it is running at about 94MB/s.  So it must be the CPU, in combination with some changes made in the Linux kernel since I first decided to try out the dual-parity concept.

 

The reason I say that is that I remember running a parity check after I added the parity2 drive, and it was very close to the single-parity speed at that time.  (Otherwise, I would have posted up something then...)  Unfortunately, I believe a lot of compromises are being made in basic unRAID server functionality to accommodate VM performance and reliability.  And those changes probably most affect those with low-end CPUs.

 

I will post back with the final time and speed after it finishes.

Link to comment

I could be wrong, but I believe this is out of LT's control.  I've been noticing it with most kernel upgrades since the first v5 betas; sometimes the kernel upgrade is the only major change in a release.  It's only noticeable on less powerful CPUs, and for some reason AMD seems to be the worst affected.

 

I have a single-core 1.8GHz Celeron D430 that can still do 100MB/s+ with dual parity plus 4 data disks, but it's been getting a little slower with each new release.

Link to comment

If the settings don't make a difference, the only other thing I can think of is the CPU ...

<< snip >>

 

Johnnie.Black -----

 

<< snip>>

 

The reason I say that is that I remember running a parity check after I added the parity2 drive, and it was very close to the single-parity speed at that time.  (Otherwise, I would have posted up something then...)

 

<< snip >>

 

Quick check: more than halfway through and the speed is still above 100MB/s!    Looking back at my posts, I found this one:

 

  http://lime-technology.com/forum/index.php?topic=47626.msg457606#msg457606

 

So it appears something has occurred in the kernel, in and/or after the 6.2-b22 release, that is reducing performance for low-end processors.  I suspect that no one is really interested in determining what is happening, as I appear to be the first one to mention it.  It could well be that most of the folks using low-end hardware will never implement dual parity in any case.  The only other possible issue would be memory, but the server has 6GB in it and I would think that would be more than enough.  Have you ever observed whether memory affects dual-parity checking speed?

Link to comment

I think most users running lower performance CPUs are using single parity, some are probably still on v5.

 

I never found RAM quantity to have an impact on parity check speed, except maybe with less than 2GB.  Single vs. dual channel does, however, have an impact for CPU-limited servers (about 20%).  With 6GB you're using 4+2 or 2+2+2; AFAIK only some Intel chipsets support asymmetrical dual channel, so changing to a dual-channel config may improve it a little.

 

Nevertheless, I believe that to enjoy dual parity you'll need to upgrade your CPU, or even better, change to an Intel platform.  You don't need a high-end CPU; a recent dual core is enough, e.g. a Pentium G4400 is enough for excellent dual-parity performance on a 30-disk unRAID server.

Link to comment

I think most users running lower performance CPUs are using single parity ...

<< snip >>

 

I know that the next MB/CPU will be in the Intel family.  AMD is moving more toward a high-performance GPU integrated with the CPU, and there is no need for any sort of performance GPU in a server.  I am simply not one who will ever go down the route of a combined server/VM machine.  I want to be able to totally hide my servers in an out-of-the-way corner of the house.

 

BTW, my server memory is actually 2x 1GB sticks and 2x 2GB sticks in the appropriate channel slots for dual-channel operation.  (I did a Google search at the time to make sure I did it right.)

Link to comment

BTW, my server memory is actually 2x 1GB sticks and 2x 2GB sticks in the appropriate channel slots for dual-channel operation.  (I did a Google search at the time to make sure I did it right.)

 

Forgot that possibility  :-[

 

In that case, only a CPU upgrade will significantly improve dual parity check speed.

Link to comment

The parity check completed, with a single parity drive being the only hardware change, still running 6.2-rc3.  Here are the results:

 

    Duration: 8 hours, 35 minutes, 21 seconds. Average speed: 97.0 MB/sec

 

Johnnie.Black, I will post a link up in the rc3 release thread and point to this thread...

 

 

Link to comment

My dual-parity testbed also has a pretty old CPU ... a Pentium E5300.  But I didn't notice any significant difference between parity check speeds when I added the 2nd parity drive.    Looking more closely, however, even that old Pentium scores 1556 on PassMark ... and your Sempron 145 only scores 807, so I suspect that's indeed the difference.

 

One thing you might check => Be sure you've got display updates disabled during parity checks.  That makes a BIG difference on these older/slower CPUs -- even a single-parity system has appreciably slower checks if the display updates are set to real time (I don't remember the exact numbers -- they're buried in one of the early v6 threads) -- but I posted the details when I first installed v6, and the checks were a LOT slower than with v5.    Once I disabled display updates, the speeds were back at v5 levels.

 

The latest version of v6.2 has those updates disabled by default when parity operations are in progress -- but you might want to confirm that's the case on your system ... Settings - Display Settings - Page Update Frequency

 

Link to comment

Found a few of the earlier posts I had mentioned above ...

 

When I first installed v6 on one of my systems (not the one I'm now using for my testbed, but an almost-identical one that had a slightly faster Pentium E6300), my parity checks took 31% longer than they had on v5.

[ https://lime-technology.com/forum/index.php?topic=43023.msg412445#msg412445 ]

 

Later in that thread, I found that it improved a LOT by simply disabling display updates -- still took longer, but not by nearly as much.

 

Still later, based on another thread discussing issues with v6 and Marvell-based controllers, Eric Shultz posted a fix for those [changing the nr_requests value for all of the disks attached to those controllers ... I simply changed it for all of my disks, which worked just fine] -- and I was able to get the parity check times back down to the levels they had been on v5.

[ http://lime-technology.com/forum/index.php?topic=42629.msg417261#msg417261 ]

 

I don't believe that nr_requests modification is needed anymore with the latest versions; but turning off display updates absolutely helps a LOT with slower CPUs.    Now that I think about it, I believe the Go file in my media server still has all the nr_requests lines to set them to 8 instead of the default 128 (which caused the issues) ... I'll have to confirm whether removing all those lines indeed is "okay" ... and I may also test whether adding these to my dual-parity testbed has any impact.
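
For reference, the nr_requests lines in my Go file amount to a simple per-disk setting; a sketch of the equivalent in loop form (my actual file lists the drives individually):

# In /boot/config/go -- set nr_requests to 8 for every disk
# (the original fix targeted only the Marvell-attached drives)
for q in /sys/block/sd*/queue/nr_requests; do
  echo 8 > "$q"
done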

 

 

 

Link to comment

 

<<snip >>

One thing you might check => Be sure you've got display updates disabled during parity checks.  That makes a BIG difference on these older/slower CPUs -- even a single-parity system has appreciably slower checks if the display updates are set to real time (I don't remember the exact numbers -- they're buried in one of the early v6 threads) -- but I posted the details when I first installed v6, and the checks were a LOT slower than with v5.    Once I disabled display updates, the speeds were back at v5 levels.

 

 

For some reason, it was not disabled on the Testbed server.  (It was disabled on my Media server.)  However, I doubt it had much of an impact, because when I was running the tests I was fully aware that keeping the GUI open severely impacts the speed of parity checks.  (And the scheduled parity check on August 1st ran at least seven hours before I would even have realized that something was amiss.)  When I checked the progress during my testing, it was a quick link to the server in a new browser window, a look at the speed and progress, and then the browser window was closed.  That takes about thirty to forty-five seconds at the most.  During the tests that are a part of this post, I probably checked the progress four or five times in each test.  I am not talking about a change of 5% in speed; this is a change of 50%...

 

Your results tend to confirm that folks with low-end processors can expect a performance hit with dual parity.

 

<< snip >>

Still later, based on another thread discussing issues with v6 and Marvell-based controllers, Eric Shultz posted a fix for those [changing the nr_requests value for all of the disks attached to those controllers ... I simply changed it for all of my disks, which worked just fine] -- and I was able to get the parity check times back down to the levels they had been on v5.

[ http://lime-technology.com/forum/index.php?topic=42629.msg417261#msg417261 ]

 

I don't believe that nr_requests modification is needed anymore with the latest versions; but turning off display updates absolutely helps a LOT with slower CPUs.    Now that I think about it, I believe the Go file in my media server still has all the nr_requests lines to set them to 8 instead of the default 128 (which caused the issues) ... I'll have to confirm whether removing all those lines indeed is "okay" ... and I may also test whether adding these to my dual-parity testbed has any impact.

 

Are you going to do this and report back what you observe?  I could try this but it will take a couple of days to get back to a spot where I can.  I am also assuming that I can now change the nr_requests by changing it under the 'Disk Settings' icon on the 'Settings' tab.  Has anyone (to your knowledge) reported any problems/issues with changing the setting from its default of 128 to something lower?

Link to comment

... Are you going to do this and report back what you observe?  I could try this but it will take a couple of days to get back to a spot where I can.  I am also assuming that I can now change the nr_requests by changing it under the 'Disk Settings' icon on the 'Settings' tab.  Has anyone (to your knowledge) reported any problems/issues with changing the setting from its default of 128 to something lower?

 

I'll try and do this when I get back, but we're leaving tomorrow for 3 weeks away from the Texas heat  :)

[Going to Maine, where the temps are at least 20 degrees lower than we've been having here !! ]

... so it'll be the first week of September before I do any "fiddling".

 

However, as for your last question => I'm not aware of ANY issues with setting nr_requests to a lower value.  My media server (just checked) still has the command in the Go file to set all of the disks to 8, and it's running fine.  I'm more interested in whether lowering it from 128 speeds up my test server ... although I don't think that's needed on the later releases (and it's running the latest RC).  Yes, it can be changed very easily with the Disk Settings menu; but we're packing today and leaving in the morning, so I don't want to "mess" with anything today.

 

 

Link to comment

It's been over 100 most days in the past week; although it's projected to be "cool" next week -- not much over 90 until the end of the week.    But the projections for where we'll be are about 20 degrees lower than that ... so we're definitely looking forward to relaxing a bit.    ... and Texas doesn't have those sumptuous Maine lobsters !!  :)

Link to comment

Has anyone (to your knowledge) reported any problems/issues with changing the setting from its default of 128 to something lower?

 

The nr_requests tweak is most useful for Marvell controllers, especially the SASLP and SAS2LP.  I have it set to 8 on all my servers with one or both of these, with no issues whatsoever.  I doubt it will make much difference in your case, but it certainly won't hurt to try.
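
Before changing anything, you can see what the disks are currently using (the default is 128):

# Show the current nr_requests for every disk
grep . /sys/block/sd*/queue/nr_requests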

Link to comment

Has anyone (to your knowledge) reported any problems/issues with changing the setting from its default of 128 to something lower?

<< snip >>

 

I decided to give it a try.  I have already reset the nr_requests to 8, reassigned the parity2 disk, and started the rebuild of parity.  It should take about four hours.  I will probably start the non-correcting parity check later today, so I should know what the impact is sometime tomorrow...

Link to comment

Promised update:

 

1--- The rebuild of parity2 (1TB) took 3hrs, 16mins, 40sec.  (Remember that number!)

 

2--- The non-correcting parity check took 17hrs, 15mins, 42sec.  (Wow!  More than five times as long as the parity2 build, for only a 3TB array.)

        Average speed was 48.3MB/s. 

 

I did three quick looks at the average speed during this check.  The checks were done by opening the GUI in a browser window, checking the speed and percent complete and closing the browser window.

 

The first was at 10 minutes in, and the average speed was 43MB/s.

 

The second was at 4hrs, 28mins --- 723GB (24.1%) --- and the average speed was 43.6MB/s.

 

The third was at 9hrs, 59mins --- 1.68TB (56.1%) --- and the average speed was 50MB/s.

 

While this check was running, the daily array-health status report ran at 12:20AM (this morning), and the following information was part of that report:

 

Parity check in progress.
Total size: 3 TB
Elapsed time: 11 hours, 49 minutes
Current position: 2.02 TB (67.2 %)
Estimated speed: 54.1 MB/sec
Estimated finish: 5 hours, 3 minutes

 

Please observe one important item!  The Parity Drive is a 3TB drive and all of the other drives are 1TB.  So, basically, for the last 2TB of any check the only thing that is happening is that this 3TB parity drive is being read.  Does the slow speed of what is basically a read operation on a single drive seem odd to anyone besides me?  (And remember that based on the parity2 rebuild time, the non-correcting parity check time should be closer to 10hrs!)
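
For anyone checking my arithmetic, here is the back-of-envelope estimate behind that 10hr figure (the ~75MB/s average for the Red's slower outer 2TB is my assumption, not a measurement):

# First 1TB at the measured parity2-rebuild pace, last 2TB as a
# lone read on the 3TB Red at an assumed ~75 MB/s average
first=11800                        # 3h 16m 40s, measured
last=$(( 2 * 10**12 / 75000000 ))  # ~26667 sec at the assumed speed
total=$(( first + last ))
printf 'Expected: ~%dh %dm\n' $((total/3600)) $((total%3600/60))
# Expected: ~10h 41m -- versus the 17h 15m actually observed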

Link to comment

Definitely strange =>  the parity rebuild time for the 2nd parity seems about right; but the parity check time is WAY off what it should be.    The speed after the 1TB point ... where the only drive involved in the check is the 3TB Red ... should be FAST !!

 

I'm off to catch a plane in a couple of hours, but it'll be interesting to see if this gets resolved before I get a chance to check the forum again in a few days.

 

FWIW, my initial thought when I read your very first post was that you had simply not factored in the extra 2TB that needed to be tested ... and that was the reason for the longer check times ==> UNTIL I read a bit further and realized that the actual check speed had dramatically slowed down even in the first 1TB.

 

One thought (we've already touched on this earlier) => Were all of the faster checks with just single parity?  ... and have all of the slow checks been with dual parity?    This may indeed simply be a computational bottleneck with your Sempron.    Open the Web GUI and see what the CPU load is on the Dashboard during parity checks.    If it's "pegged" at 100% for the whole time that's almost certainly the issue.  Of course, be sure you've got page updates disabled while checking this.      The 2nd parity computation probably still works okay because it's ONLY computing that parity.
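
If you want a reading that doesn't involve the GUI at all, you can sample the CPU from a telnet/SSH console instead; vmstat should be available on unRAID's Slackware base, so something like this will do (one sample every 5 seconds for a minute):

# Watch CPU from the console during the check; if the id (idle)
# column sits near 0 the whole time, the CPU is pegged
vmstat 5 12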

 

Link to comment

Your current CPU scores 807 on PassMark.

 

For $45 you can buy a Phenom-II x4 945  [ http://www.ebay.com/itm/AMD-CPU-Phenom-II-X4-945-3-0GHz-Socket-AM3-HDX945WFK4DGM-GI-95W-/182235005911?hash=item2a6e0d83d7:g:wt0AAOSw7W5XOOSg ]

This CPU scores 3725 on PassMark ... more than 4.5 times as much "horsepower" as your current CPU !!  :)

 

If your CPU utilization indeed shows "pegged" during parity checks with dual parity, I suspect this is all you need to make a BIG difference  8)

 

Link to comment

Please observe one important item!  The Parity Drive is a 3TB drive and all of the other drives are 1TB.  So, basically, for the last 2TB of any check the only thing that is happening is that this 3TB parity drive is being read.  Does the slow speed of what is basically a read operation on a single drive seem odd to anyone besides me?  (And remember that based on the parity2 rebuild time, the non-correcting parity check time should be closer to 10hrs!)

 

I've noticed this since the first v6.2 beta: with dual parity, CPU utilization during a parity check is constant when using different-size disks, even when only one disk is left, as was the case here.  I'm guessing this has something to do with how parity2 works.  If you use single parity with v6.2, CPU utilization will drop as it reads fewer disks, like v6.1.

Link to comment
