[Plugin] Parity Check Tuning



Released a version that displays the version number of the plugin that is running on the plugin's GUI pages (top right).

 

This is to help with a recent report where the version of the plugin shown as installed in the syslog did not agree with what was showing on the Plugins tab.  It makes it easier to check the version actually running, which should help with support going forward.

Link to comment
  • 4 weeks later...
  • 2 weeks later...

I am seeing this over and over in my system log.

How do I resolve?

 

Nov 17 14:57:01 MediaTower crond[2014]: failed parsing crontab for user root: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "resume" &> /dev/null
Nov 17 14:57:01 MediaTower crond[2014]: failed parsing crontab for user root: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "pause" &> /dev/null
Nov 17 15:12:01 MediaTower crond[2014]: failed parsing crontab for user root: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "resume" &> /dev/null
Nov 17 15:12:01 MediaTower crond[2014]: failed parsing crontab for user root: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "pause" &> /dev/null
Nov 17 15:27:01 MediaTower crond[2014]: failed parsing crontab for user root: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "resume" &> /dev/null
Nov 17 15:27:01 MediaTower crond[2014]: failed parsing crontab for user root: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "pause" &> /dev/null

Link to comment
8 hours ago, a12vman said:

I am seeing this over and over in my system log.

How do I resolve?

What version of the plugin are you using?   There was a bug of this sort some time ago but this should not happen with any recent release.

 

You may find that going into the plugin's settings, making any change, and then hitting Apply fixes things, as that causes the cron settings to be regenerated.
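
If you want to sanity-check what that regeneration produces, conceptually it amounts to something like the sketch below (illustrative only, not the plugin's actual code: the .cron file name, the schedule times and the update_cron call are assumptions).

<?php
// Illustrative sketch only - not the plugin's actual code.  Pressing Apply
// rewrites the plugin's .cron fragment on the flash drive and then asks
// Unraid to rebuild the root crontab from all plugin fragments.  The file
// name, schedule times and the update_cron helper are assumptions here.
$script = '/usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php';
$cron   = "# parity.check.tuning generated entries (example times only)\n"
        . "0 9 * * * $script \"pause\" &> /dev/null\n"
        . "30 22 * * * $script \"resume\" &> /dev/null\n";

file_put_contents('/boot/config/plugins/parity.check.tuning/parity.check.tuning.cron', $cron);

// Assumed: Unraid's update_cron helper merges plugin *.cron fragments into
// the crontab that crond actually reads.
exec('update_cron');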

 

BTW:  if you have the latest version of the plugin installed you will see the version number displayed at the top right of the plugin's settings.

Link to comment

My Version is 2021.10.10

 

I went into the parity tuning config, changed my interval from Monthly to Yearly, and clicked Apply.  Then I changed it from Yearly to Monthly and clicked Apply.

 

I am still getting these messages in my SysLog:

 

Nov 18 05:48:11 MediaTower crond[2014]: failed parsing crontab for user root: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "resume" &> /dev/null
Nov 18 05:48:11 MediaTower crond[2014]: failed parsing crontab for user root: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "pause" &> /dev/null
Nov 18 05:49:01 MediaTower crond[2014]: failed parsing crontab for user root: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "resume" &> /dev/null
Nov 18 05:49:01 MediaTower crond[2014]: failed parsing crontab for user root: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "pause" &> /dev/null
Nov 18 05:56:01 MediaTower crond[2014]: failed parsing crontab for user root: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "resume" &> /dev/null
Nov 18 05:56:01 MediaTower crond[2014]: failed parsing crontab for user root: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "pause" &> /dev/null

Link to comment

Not sure why you might be seeing this when, as far as I know, you are the only one.  I need to see why it is happening for you.


To help me look further, can you please let me have the .cfg and the .cron files from the plugin's folder on the flash drive?  If possible I would also like a copy of your system's diagnostics taken after you change a setting in the plugin.  Even better if you can have the Testing mode logging option enabled in the plugin's settings, as this will give me more information in the diagnostics.

 

In addition, have you managed to reboot the server, so as to confirm there is nothing being cached by Unraid in RAM?

Link to comment
  • 1 month later...

Hi @itimpi, a Happy New Year to you. 

Here's a strange one...

 

EDIT: IGNORE THIS....

 

Parity check started (as scheduled) at 02:30 this morning (reported as 02:31 in Unraid and the email notification).

Paused at 09:00 this morning (as scheduled) and reported correctly in Unraid and email (so 6 hours 30 minutes roughly)

 

Unraid reports Parity History with a runtime of:

    Elapsed time: 15 hours, 44 minutes (paused) <---- seems very strange

    Current position: 3.52 TB (58.7 %)  <---- seems completely normal

The plugin reports "Version: 2021.10.10"

 

Is there anything that you would like to see?  I have not seen this strange elapsed time report before.   I shall allow the second part of the check to run tonight and let you know what happens.

I am not too worried, if I am honest, but I thought you would like to have the possible bug report.

Take care,

Les.

 

EDIT: So Elapsed time includes the run time and the paused time.  I had not appreciated that, and had thought it only included the active time spent checking parity.  My apologies.
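
For what it's worth, the numbers line up with that reading if the history was viewed at around 18:15 (assumed here, since the viewing time is not recorded above):

    02:31 start  ->  09:00 pause    =  6 h 29 min checking
    09:00 pause  ->  ~18:15 viewed  =  9 h 15 min paused
    total elapsed since the start   = 15 h 44 min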

Edited by S80_UK
Link to comment
  • 3 weeks later...
On 9/19/2021 at 4:34 PM, itimpi said:


I tried to recreate your symptoms but have so far not succeeded.

 

One thing that occurred to me: is there any chance a temperature-related pause could have been active at the time the scheduled pause should have happened?  If so, maybe the plugin is getting confused and resuming when the temperatures return to good values when it should not.  It is surprisingly hard to set this up as a robust test scenario that works correctly, which is why I thought it was worth asking first before trying to simulate that specific set of circumstances.

 

Revisiting a slightly old comment here, but it looks like the scenario you mentioned might have occurred for me today.

Parity was scheduled to pause at 9:00am, but a temperature-related pause triggered at 8:28am, and the check then resumed at 9:07am.

 

Attached a diagnostics file as well, in case it's something worth looking into.

 

Feb  1 08:28:01 UnraidOS Parity Check Tuning: Paused Correcting Parity Check  (23.5%% completed): Following drives overheated: parity(45C) disk1(43C)
Feb  1 08:28:01 UnraidOS kernel: mdcmd (49): nocheck PAUSE
Feb  1 08:28:01 UnraidOS kernel:
Feb  1 08:28:01 UnraidOS kernel: md: recovery thread: exit status: -4
Feb  1 08:43:03 UnraidOS emhttpd: spinning down /dev/sdb
Feb  1 09:00:16 UnraidOS emhttpd: spinning down /dev/sde
Feb  1 09:00:42 UnraidOS emhttpd: spinning down /dev/sdc
Feb  1 09:07:01 UnraidOS Parity Check Tuning: Resumed Correcting Parity Check  (23.5%% completed) as drives now cooled down
Feb  1 09:07:07 UnraidOS kernel: mdcmd (50): check RESUME
Feb  1 09:07:07 UnraidOS kernel:
Feb  1 09:07:07 UnraidOS kernel: md: recovery thread: check P ...
Feb  1 09:07:07 UnraidOS emhttpd: read SMART /dev/sde
Feb  1 09:07:07 UnraidOS emhttpd: read SMART /dev/sdb
Feb  1 09:07:07 UnraidOS emhttpd: read SMART /dev/sdc

 

unraidos-diagnostics-20220201-1155.zip

Link to comment
23 minutes ago, mozyman said:

 

Revisiting a slightly old comment here, but it looks like the scenario you mentioned might have occurred for me today. Parity was scheduled to pause at 9:00am, but a temperature-related pause triggered at 8:28am, and the check then resumed at 9:07am.

Thanks for the feedback - I will certainly look into why it resumed when it should not have.  I think I know why that might have happened, and it is a simple fix, but I still need to check it out.
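
To make the suspected interaction concrete, the missing guard is presumably something along these lines (a minimal sketch with made-up names, not the plugin's actual code):

<?php
// Minimal sketch with made-up names - not the plugin's actual code.  The
// point is only that cooling down is necessary but not sufficient for a
// resume: any other pause reason still in force must win.
function shouldResumeAfterCooldown(bool $drivesCooledDown,
                                   bool $insideScheduledPauseWindow): bool
{
    if (!$drivesCooledDown) {
        return false;               // still too hot - stay paused
    }
    // Inside the scheduled pause window the check must stay paused even
    // though the drives have cooled down again.
    return !$insideScheduledPauseWindow;
}

// The case reported above: drives cooled at 09:07, but the scheduled
// pause window had already started at 09:00, so no resume should happen.
var_dump(shouldResumeAfterCooldown(true, true));   // bool(false)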

 

I am currently testing a new feature that has been requested a few times: a setting to automatically pause while mover is running, as contention between mover and a parity check slows them both down.  At the moment I am not sure whether to make the default for this setting on or off.  The advantage of defaulting to on is that users who might otherwise have missed that this feature was added may realise it when they get a notification saying the check was paused because mover is running.
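
For a feel of what that pause amounts to, it is roughly the following (a hedged sketch, not the implementation; it reuses the "nocheck PAUSE" command that appears in the syslog extract above, while the mover path, the pgrep test and the mdcmd path are assumptions):

<?php
// Rough illustration only - not the plugin's implementation.
exec('pgrep -f /usr/local/sbin/mover', $pids);

if (!empty($pids)) {
    // mover is running: pause the array check; a later monitor run would
    // issue "check RESUME" once mover has finished
    exec('/usr/local/sbin/mdcmd nocheck PAUSE');
}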

 

I also notice that it looks like your regularly scheduled parity check is set to be correcting.  It is normally recommended that you set this to be non-correcting so you do not inadvertently corrupt parity if you have a drive acting up, and then only run correcting checks manually as needed.

 

 

Link to comment






It is normally recommended that you set this to be non-correcting so you do not inadvertently corrupt parity if you have a drive acting up, and then only run correcting checks manually as needed.


Hi, sorry to hijack this topic, but is this the commonly agreed procedure?
If so, what is the recommended workflow to get everything straightened out if a sync error is detected (and not corrected)?

Link to comment
12 minutes ago, Fireball3 said:

If so, what is the recommended workflow to get everything straightened out if a sync error is detected (and not corrected)?

Normally one wants to next try to work out why the parity error occurred in the first place.  It could be something obvious like an unclean shutdown, but there are all sorts of hardware errors (RAM, power supply, drives) that could be occurring that triggered the errors.  You therefore want to be sure you are not currently having any hardware-related issues that you know about before letting the system try to correct parity.

 

Once you are happy that there is no hardware error and all drives appear fine, you then manually trigger a correcting check via the Check button on the Main tab.

 

Link to comment



It could be something obvious like an unclean shutdown, but there are all sorts of hardware errors (RAM, power supply, drives) that could be occurring that triggered the errors.  You therefore want to be sure you are not currently having any hardware-related issues that you know about before letting the system try to correct parity.

Assuming the system has been properly commissioned and tested before being taken into service, it's not reasonable to tear it apart every time a sync error pops up.
At some point one should be able to rely on what it shows - a sync error.
That's why the correcting check is on by default, and from my understanding there is still no commonly agreed recommendation that this should be done another way.
Link to comment
4 minutes ago, Fireball3 said:


 


Assuming the system has been properly commissioned and tested before being taken into service, it's not reasonable to tear it apart every time a sync error pops up.
At some point one should be able to rely on what it shows - a sync error.
That's why the correcting check is on by default, and from my understanding there is still no commonly agreed recommendation that this should be done another way.

Note I said nothing about tearing the system apart before running a correcting check.  I just said that it should be a human who makes the decision as to whether it is appropriate to run a correcting check.
 

If the hardware is performing optimally then it is irrelevant as you will not get a sync error.

 

We frequently see cases in the forum where a drive has started misbehaving for some reason and the user did not notice.   By that time they may have corrupted parity badly enough to prejudice data recovery actions.

 

Link to comment




If the hardware is performing optimally then it is irrelevant as you will not get a sync error.

Of course you will, and that's why the check is done. But it is expected to be only read errors on the drives. That was the reason why I joined Unraid in the first place, after I found out that my drives were throwing read errors while lying on the shelf.


We frequently see cases in the forum where a drive has started misbehaving for some reason and the user did not notice. By that time they may have corrupted parity badly enough to prejudice data recovery actions.

I remember having had this discussion in this forum some years ago already. If this is a frequent and serious issue, shouldn't there be a better solution for dealing with sync errors?
1. There should be no automatic correction
2. There should be an easy way to inspect the potentially affected data
3. The resync should be limited to the affected area without the need to run another 24h parity check

Parity checks put the most stress on the system - at least on mine.
Link to comment
1 minute ago, Fireball3 said:

The resync should be limited to the affected area without the need to run another 24h parity check

Just thought I should mention that by using the Parity Problems Assistant that is installed as part of the Parity Check Tuning plugin (under the Tools tab), it is possible to do this.  I have not had any feedback as to whether users have successfully used it in practice, or whether there are any suggestions for improving it.

Link to comment



Just thought I should mention that by using the Parity Problems Assistant that is installed as part of the Parity Check Tuning plugin (under the Tools tab), it is possible to do this.  I have not had any feedback as to whether users have successfully used it in practice, or whether there are any suggestions for improving it.


Thanks for the heads up!
Obviously I missed some improvements, since I didn't have to jump on every new release as my servers are doing well as they are.
In the meantime I'm increasingly following the principle: "never change a running system".
That said, pausing the parity check is a much-anticipated feature that would fit my needs.
Link to comment
  • 2 weeks later...

Just tried it myself and got the same results, so I can look at why.

 

Well-timed report, as I was just about to make a new release with (amongst other changes) fixes to make the plugin compatible with changes at the core Unraid level that I have been told about, which will affect the plugin and which are coming in 6.10.0-rc3.

 

EDIT:  Found the bug (a misnamed variable in my code).  The effect was that the Settings would always display "No", but the rest of the plugin would always act as if it was set to "Yes".
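
For anyone curious how a single misnamed variable produces that split behaviour, it boils down to this pattern (illustrative only; the variable names here are invented, not the plugin's):

<?php
// Illustrative only: a misspelt variable makes the GUI and the runtime
// disagree about the same setting (names here are invented).
$pauseOnMover = 'yes';                        // value loaded from the .cfg file

// The settings page tests a misspelt name, which is never set, so the
// option always renders as "No"...
$displayed = isset($pauseOnMove) ? $pauseOnMove : 'no';

// ...while the rest of the plugin reads the correctly named variable and
// so acts on the real value.
$active = $pauseOnMover;

echo "displayed: $displayed, active: $active\n";  // displayed: no, active: yes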

Link to comment
