[Plugin] Parity Check Tuning



11 hours ago, itimpi said:

This is quite normal in the current version.    

OK - understood.  Just thought I should check.

 

11 hours ago, itimpi said:

If you last tried changing the settings using the 2021-01-01/02 versions then there was a permissions issue that would mean the pause/resume would not have been correctly scheduled to run.

That almost certainly was the case.  Thanks again.

 

11 hours ago, itimpi said:

 If it does not then I would welcome a syslog covering the problem period with at least Debug logging set (Testing level is even better but more voluminous).

OK - I will see how it goes.  I can do some further testing after the weekend.   (I am still doing disk upgrades at present and I'm letting the parity processes run to completion to get through it as fast as possible.) 

 

Thanks for the plug-in by the way - it is genuinely very useful.

  • 2 weeks later...

@itimpi

 

On 1/9/2021 at 1:29 AM, S80_UK said:

OK - I will see how it goes.  I can do some further testing after the weekend.   (I am still doing disk upgrades at present and I'm letting the parity processes run to completion to get through it as fast as possible.) 

 

So my parity check is still not pausing.  Last night I did a run with test logging enabled.  Syslog is attached.  This was after a fresh install of the plugin, and after a fresh reboot of the server to try to eliminate any effects of earlier activity.

 

The critical point is at line 4386 when the log shows the pause attempt like this...

 

Jan 17 08:00:01 Tower-V6 parity.check.tuning.php: TESTING: ----------- PAUSE begin ------
Jan 17 08:00:01 Tower-V6 parity.check.tuning.php: TESTING: disk2 temp=*C, status=cool (drive settings: hot=42C, cool=35C)
Jan 17 08:00:01 Tower-V6 parity.check.tuning.php: DEBUG: Pause request
Jan 17 08:00:01 Tower-V6 parity.check.tuning.php: TESTING: disk3 temp=38C, status=warm (drive settings: hot=42C, cool=35C)
Jan 17 08:00:01 Tower-V6 parity.check.tuning.php: DEBUG: ...action not configured for manual Non-Correcting Parity Check (check P)
Jan 17 08:00:01 Tower-V6 parity.check.tuning.php: TESTING: disk4 temp=*C, status=cool (drive settings: hot=42C, cool=35C)
Jan 17 08:00:01 Tower-V6 parity.check.tuning.php: TESTING: ----------- PAUSE end ------

 

The pause was configured for 08:00 - I have highlighted the obvious error report, but I have no clue as to the cause. The pause was then immediately cancelled and the parity check resumed and ran to completion nearly four hours later.

 

Please let me know if there's anything else I can try.

syslog.txt

Edited by S80_UK

Have you enabled pause/resume for manually initiated checks in the plugin settings?    That log suggests the plugin thinks you have not!  If you have, can you PM the paritytuning.cfg file from the plugins folder on the flash drive.

 

I have a version I am about ready to release that has improved notification support as well as fixing a number of other minor issues.    I would like to know if there is a bug I need to fix if you DO have the setting for pause/resume of manual checks set.

On 1/3/2021 at 8:24 PM, itimpi said:

What I am trying to assess is the pros and cons of providing such a feature.   In particular how it might be misused in a way that could lead to data loss.  If I DO implement it I would give positions as a percentage rather than a sector number.

 

I missed the new feature (resume even across an array stop/start) added at the start of 2021; it is great. I have never tried this plugin or the official pause/resume, mainly because I do not run parity checks on a schedule; instead I find a suitable period each month to do that.

It would be really useful if the plugin could start and stop at custom points. Sometimes we really do not need a whole-disk sync/check because time is critical.

- Sync only the start and end areas of the disks. This would complete very quickly and should effectively prevent partition information corruption problems and, as a result, reduce cases of a rebuilt disk ending up unmountable. However, this needs the size of every member disk to be checked, with a sync at a different region for each.

- Sync only the area that needs protecting. We usually buy higher-capacity disks, so the parity disk ends up bigger than the member disks; it would save a lot of time if the last region of the parity disk were not checked/synced.

- It would be good if positions could be entered in TB, e.g. 0, 3-5, 10 ....

 

 

----------------------------------------------------

 

On my first try it could not pause or resume; the log captured the mdcmd "pause":

 

Quote

Jan 18 07:10:16 X299 crond[2429]: exit status 255 from user root /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "pause" &> /dev/null
Jan 18 07:16:32 X299 ool www[24013]: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php 'updatecron'
Jan 18 07:18:47 X299 ool www[29453]: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php 'updatecron'
Jan 18 07:19:53 X299 ool www[746]: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php 'updatecron'
Jan 18 07:20:16 X299 crond[2429]: exit status 255 from user root /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "pause" &> /dev/null
Jan 18 07:21:45 X299 ool www[523]: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php 'updatecron'
Jan 18 07:21:45 X299 parity.check.tuning.php: TESTING: ----------- UPDATECRON begin ------
Jan 18 07:21:45 X299 parity.check.tuning.php: TESTING: Deleted /boot/config/plugins/parity.check.tuning/parity.check.tuning.cron
Jan 18 07:21:45 X299 parity.check.tuning.php: DEBUG: created cron entries for running increments
Jan 18 07:21:45 X299 parity.check.tuning.php: TESTING: updated cron settings are in /boot/config/plugins/parity.check.tuning/parity.check.tuning.cron
Jan 18 07:21:45 X299 parity.check.tuning.php: TESTING: ----------- UPDATECRON end ------
Jan 18 07:25:01 X299 parity.check.tuning.php: TESTING: ----------- PAUSE begin ------
Jan 18 07:25:01 X299 parity.check.tuning.php: DEBUG: Pause request
Jan 18 07:25:16 X299 crond[2429]: exit status 255 from user root /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "pause" &> /dev/null
Jan 18 07:35:06 X299 kernel: mdcmd (37): nocheck Pause
Jan 18 07:35:06 X299 kernel: md: recovery thread: exit status: -4
Jan 18 07:36:39 X299 parity.check: TESTING: ----------- RESUME begin ------
Jan 18 07:36:39 X299 parity.check: DEBUG: Resume request
Jan 18 07:38:11 X299 parity.check: TESTING: ----------- PAUSE begin ------
Jan 18 07:38:11 X299 parity.check: DEBUG: Pause request

 

 

If I execute it from the command line, I get the output below:

Quote

root@X299:~# parity.check resume
TESTING: ----------- RESUME begin ------
DEBUG: Resume request

Fatal error: Uncaught ArgumentCountError: Too few arguments to function actionDescription(), 0 passed in /usr/local/bin/parity.check on line 1092 and exactly 2 expected in /usr/local/bin/parity.check:1104
Stack trace:
#0 /usr/local/bin/parity.check(1092): actionDescription()
#1 /usr/local/bin/parity.check(343): configuredAction()
#2 {main}
  thrown in /usr/local/bin/parity.check on line 1104
root@X299:~# 

 

 
root@X299:~# parity.check pause
TESTING: ----------- PAUSE begin ------
DEBUG: Pause request

Fatal error: Uncaught ArgumentCountError: Too few arguments to function actionDescription(), 0 passed in /usr/local/bin/parity.check on line 1092 and exactly 2 expected in /usr/local/bin/parity.check:1104
Stack trace:
#0 /usr/local/bin/parity.check(1092): actionDescription()
#1 /usr/local/bin/parity.check(359): configuredAction()
#2 {main}
  thrown in /usr/local/bin/parity.check on line 1104
root@X299:~# 

 

 

root@X299:/boot/config/plugins/parity.check.tuning# cat parity.check.tuning.cfg
parityTuningIncrements="yes"
parityTuningFrequency="daily"
parityTuningResumeCustom="15 0 * * *"
parityTuningResumeHour="6"
parityTuningResumeMinute="30"
parityTuningPauseCustom="30 3 * * *"
parityTuningPauseHour="7"
parityTuningPauseMinute="25"
parityTuningUnscheduled="yes"
parityTuningRecon="yes"
parityTuningClear="no"
parityTuningNotify="no"
parityTuningRestart="yes"
parityTuningHeat="no"
parityTuningHeatShutdown="no"
parityTuningDebug="test"
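
As a reading aid for the settings above, here is a minimal Python sketch of how the hour and minute values could map onto cron expressions. It is not the plugin's code (the plugin is PHP); the helper names `parse_cfg` and `cron_expr` are hypothetical, and the rule that the Hour/Minute pair is used when the frequency is "daily", with the Custom expression used otherwise, is an assumption rather than something confirmed in this thread.

```python
import shlex

# Subset of the settings shown above, copied as-is.
SAMPLE_CFG = '''
parityTuningFrequency="daily"
parityTuningResumeCustom="15 0 * * *"
parityTuningResumeHour="6"
parityTuningResumeMinute="30"
parityTuningPauseCustom="30 3 * * *"
parityTuningPauseHour="7"
parityTuningPauseMinute="25"
'''

def parse_cfg(text):
    """Parse KEY="value" lines into a dict (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split removes the surrounding double quotes.
        parts = shlex.split(raw)
        settings[key] = parts[0] if parts else ""
    return settings

def cron_expr(settings, action):
    """Return a cron expression for action 'Pause' or 'Resume'.

    ASSUMPTION: "daily" uses the Hour/Minute pair; any other
    frequency falls back to the Custom expression.
    """
    if settings.get("parityTuningFrequency") == "daily":
        minute = int(settings[f"parityTuning{action}Minute"])
        hour = int(settings[f"parityTuning{action}Hour"])
        return f"{minute} {hour} * * *"
    return settings[f"parityTuning{action}Custom"]

settings = parse_cfg(SAMPLE_CFG)
pause_expr = cron_expr(settings, "Pause")    # "25 7 * * *"
resume_expr = cron_expr(settings, "Resume")  # "30 6 * * *"
```

Under that assumption the pause would fire at 07:25, which is consistent with the "PAUSE begin" entry logged at 07:25:01 above.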

 

Edited by Vr2Io

@vr2gb are you sure you are on the latest version of the plug-in?    Permission issues were a problem at one point but these have been fixed for a while as far as I know.    I am not seeing them on my test systems.  
 

If you had the version with permission issues installed you may need to go into the plugin settings; make a nominal change; and then re-apply the settings to get scheduled pause/resume to work correctly again (although thinking about it I may be able to make a change that does that automatically as part of the plug-in installation).

 

EDIT:  On checking the code I see that the cron schedules are already being re-generated during plug-in installation.

 

17 hours ago, itimpi said:

Have you enabled pause/resume for manually initiated checks in the plugin settings?    That log suggests the plugin thinks you have not!  If you have, can you PM the paritytuning.cfg file from the plugins folder on the flash drive.

 

I have a version I am about ready to release that has improved notification support as well as fixing a number of other minor issues.    I would like to know if there is a bug I need to fix if you DO have the setting for pause/resume of manual checks set.

I shall PM the cfg file to you.  I had not enabled it for manual checks, but this was not a manual check - it was a scheduled check run at 2am using the regular parity check scheduler, so I would not have expected to need to enable the scheduler for manual checks for that purpose.     Have I misunderstood how it works?

6 hours ago, itimpi said:

@vr2gb are you sure you are on the latest version of the plug-in?    Permission issues were a problem at one point but these have been fixed for a while as far as I know.    I am not seeing them on my test systems.  
 

If you had the version with permission issues installed you may need to go into the plugin settings; make a nominal change; and then re-apply the settings to get scheduled pause/resume to work correctly again (although thinking about it I may be able to make a change that does that automatically as part of the plug-in installation).

 

EDIT:  On checking the code I see that the cron schedules are already being re-generated during plug-in installation.

 

I tried again and did some troubleshooting, but still no joy. However, the error is only returned while a parity sync/check is in progress.

Could it be caused by my not having a cache pool, so there is no mover status? Thanks for your plugin and effort.

 

[Attached screenshot: 4.PNG]

Edited by Vr2Io

I see the problem - I introduced a regression in the actionDescription function within the code.   I will fix it and push out an update shortly.   It was very obvious because you used the CLI option.  I must add more comprehensive tests of that kind to my regular test plan :) 

2 minutes ago, itimpi said:

I see the problem - I introduced a regression in the actionDescription function within the code.   I will fix it and push out an update shortly.   It was very obvious because you used the CLI option.  I must add more comprehensive tests of that kind to my regular test plan :) 

 

Thanks a lot, waiting for the update.

3 hours ago, Vr2Io said:

 

Thanks a lot, waiting for the update.

Update now available.

 

Hopefully no other regressions surface.   I did ensure this version successfully runs the CLI commands that were failing in your earlier report.

 

Please do not hesitate to report any other issues or even what you think might be just minor anomalies.

 

1 hour ago, itimpi said:

Update now available.

 

Hopefully no other regressions surface.   I did ensure this version successfully runs the CLI commands that were failing in your earlier report.

 

Please do not hesitate to report any other issues or even what you think might be just minor anomalies.

 

I have just updated and tried it. The CLI seems OK, but I cannot save any settings because of the error below, so I cannot test whether pause/resume works. (It is the same after a reboot following the update. It affects the official parity settings too, so neither parity scheduler can be changed and saved.)

 

Jan 19 01:23:14 X299 root: error: /update.php: missing csrf_token

 

Thanks

 

 

Edited by Vr2Io
2 minutes ago, Vr2Io said:

I have just updated and tried it. The CLI seems OK, but I cannot save any settings because of the error below, so I cannot test whether pause/resume works. (It is the same after a reboot following the update.)

 

Jan 19 01:23:14 X299 root: error: /update.php: missing csrf_token

 

Thanks

 

 

Weird :)  I can reproduce that error but as I did not change anything that relates to the Settings page I have no idea why it should suddenly occur.  I did not even think to test that area I must admit :( 

42 minutes ago, Vr2Io said:

I have just updated and tried it. The CLI seems OK, but I cannot save any settings because of the error below, so I cannot test whether pause/resume works. (It is the same after a reboot following the update. It affects the official parity settings too, so neither parity scheduler can be changed and saved.)

 

Jan 19 01:23:14 X299 root: error: /update.php: missing csrf_token

 

Thanks

 

 

I have just cleared my browser's cache and the problem has disappeared.   See if the same applies to you?  That makes some sense as csrf_token messages are typically seen when you have browser windows open to the Unraid GUI across a server reboot.

 

EDIT:  weirder - I now have this happening on one server and not on another !!!!

 


I am going to have to back out the most recent changes and reintroduce them one at a time to see what causes this.    The only thing that springs to mind is a variable name collision with something in the standard Unraid code on the Settings page.   As I say it is weird as there have been no changes to the actual plugin's settings page for weeks.


I have pushed a new version that fixes the csrf issue on the plugin's Settings page on my servers and means the Apply button works again - I would like confirmation it has done it for others.

 

The only change I made was to make the names of some variables recently added to the code less generic (and sure to be unique to my code) so it looks like the problem was that I had inadvertently used a name used elsewhere in existing Unraid code where it had a different meaning.  I suspect that this is a coding trap other developers could fall into if they use short variable names.
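
To illustrate the trap in general terms, here is a minimal Python sketch of two pieces of code sharing one set of page variables. Everything in it is hypothetical: the variable names, the `build_page_vars` helper, and the host value are invented for illustration, and the thread does not say which name actually collided.

```python
def build_page_vars(host_vars, plugin_vars):
    """Merge plugin variables into the host page's variables.

    Later writes win, so a plugin variable with the same name as a
    host variable silently replaces the host's value.
    """
    merged = dict(host_vars)
    merged.update(plugin_vars)
    return merged

# Hypothetical host variable with a short, generic name.
host = {"token": "host-value"}

# A plugin using the same generic name clobbers the host's value.
clobbered = build_page_vars(host, {"token": None})

# A plugin-specific prefix avoids the collision.
safe = build_page_vars(host, {"parityTuningToken": None})
```

In this sketch `clobbered["token"]` is `None` while `safe["token"]` is still the host's value, which is the kind of difference that making names less generic is meant to achieve.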


It fully works now!

 

About the history record: would you review the speed figure if you have time? The figure may not be useful. It seems the figure is taken while a disk operation is happening, i.e. at a pause (pause, then take the figure) or at a resume (resume, then take the figure without waiting for things to settle), so it varies a lot.

 

Thanks.

 

[Attached screenshot: 4.PNG]

 

Another problem: a message (not always there; it seems not to appear if the array is mounted in maintenance mode) is shown in the footer, and it masks other information such as temperature and fan speed.

[Attached screenshot: 4.PNG]

Edited by Vr2Io
1 hour ago, Vr2Io said:

About the history record: would you review the speed figure if you have time? The figure may not be useful. It seems the figure is taken while a disk operation is happening, i.e. at a pause (pause, then take the figure) or at a resume (resume, then take the figure without waiting for things to settle), so it varies a lot.

 

This SHOULD in principle be accurate.   The plugin takes into account pause/resume by working out the length of time that the process was actually running when calculating speed.   However the fact you queried it made me check the calculation and I see the speed calculation is currently always being based on the total number of sectors on the parity disk rather than the point reached.  This means at the moment it will be correct if the parity check completes but wrong if it does not.  I will fix this.

 

1 hour ago, Vr2Io said:

Another problem: a message (not always there; it seems not to appear if the array is mounted in maintenance mode) is shown in the footer, and it masks other information such as temperature and fan speed.

 

A missing $ on a variable name so I will fix this.  The effect should not be serious as it is where the plugin is trying to tell the difference between whether a check being run completed before an unclean shutdown occurred or was still in progress at the time.  The error means that the check will always (possibly incorrectly) be assumed to have been completed and recorded/reported that way.  Another thing to fix though.  What is surprising is that I did not get this flagged up in my system as I have the PHP warning level turned up on my test systems - but it is a rather obscure code path.

 

4 minutes ago, itimpi said:

This SHOULD in principle be accurate.   The plugin takes into account pause/resume by working out the length of time that the process was actually running when calculating speed.   However the fact you queried it made me check the calculation and I see the speed calculation is currently always being based on the total number of sectors on the parity disk rather than the point reached.  This means at the moment it will be correct if the parity check completes but artificially high if it does not.  I will fix this.

 

How about calculating how many sectors are processed in each period between pauses? That should be meaningful and straightforward.

1 minute ago, Vr2Io said:

How about calculating how many sectors are processed in each period between pauses? That should be meaningful and straightforward.

This is implicit if I know how much time each increment was actually running for :)

 

When retrospectively analyzing the running of the parity check (as recorded in the file 'parity.check.progress') I was already tracking both the total size of the parity disk and the point reached by each increment.  Correcting the speed calculation was just a case of using $reachedSector instead of $totalSectors.
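
The effect of that one-variable change can be sketched in Python as follows. This is only an illustration of the arithmetic described here, not the plugin's PHP code: the function names, the (start, end) representation of increments, and the 512-byte sector size are all assumptions.

```python
def running_seconds(increments):
    """Total time spent actually running.

    `increments` is a list of (start, end) timestamps in seconds,
    one pair per increment, so paused time is excluded.
    """
    return sum(end - start for start, end in increments)

def average_speed(sectors, seconds, sector_bytes=512):
    """Average speed in bytes per second.

    ASSUMPTION: 512-byte sectors, chosen only for this example.
    """
    if seconds <= 0:
        raise ValueError("running time must be positive")
    return sectors * sector_bytes / seconds

# Hypothetical figures: a check that stopped 40% of the way through.
total_sectors = 1000
reached_sector = 400
increments = [(0, 100), (500, 600)]  # two 100 s runs with a pause between

elapsed = running_seconds(increments)                   # 200
wrong_speed = average_speed(total_sectors, elapsed)     # 2560.0, too high
fixed_speed = average_speed(reached_sector, elapsed)    # 1024.0
```

The two figures agree only when the check runs to completion, because only then does the point reached equal the total size of the parity disk.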

 

I will need to revisit this calculation when/if allowing for partial checks that do not necessarily start from the beginning to also take into account the start sector (currently assumed to be 0) but that is not required yet.   Not sure yet whether I want such partial checks recorded in the History - do you have a view on that?
