[Plugin] Parity Check Tuning


5 minutes ago, Masterwishx said:

 

I think it was a pause, but maybe you can check the log and tell. It's not a big problem, though; I can make parity run at 0:30, after the mover at 0:00.

I posted because we talked before...

 

Rather than perusing the logs you might want to look at the parity.check.tuning.progress.save file on the flash drive.    This should contain an entry for each pause or resume.

 

I am currently overhauling the logging messages at each level to make sure you get told about all pause and resume actions even at the Basic logging level.   That should give far less verbose output while still conveying the key information.

Link to comment
Mar 21 23:52:43 Tower Parity Check Tuning: PHP error_reporting() set to E_ERROR|E_WARNING|E_PARSE|E_CORE_ERROR|E_CORE_WARNING|E_COMPILE_ERROR|E_COMPILE_WARNING|E_USER_ERROR|E_USER_WARNING|E_USER_NOTICE|E_STRICT|E_RECOVERABLE_ERROR|E_USER_DEPRECATED

Every couple of minutes I'm finding this in my logs. I presume this is nothing major?

Link to comment
6 hours ago, shiftylilbastrd said:

Trying to set the resume and pause times.

After clicking apply the resume time saves as entered but the pause time reverts to 00:00

image.thumb.png.3ed3857c6919a85df900008a86fb72d8.png

Pause time was set for 06:00 and this is after hitting apply

This is a regression where several fields in the settings are not correctly displaying the current settings.   I have fixed this for a release that is forthcoming shortly.  It was caused by me making a global edit that somehow missed some fields, and I did not pick that up.
 

You will find that the values you set ARE being saved when you hit Apply if you examine the file parity.check.tuning.cfg in the plugins folder on the flash drive. 

Link to comment
8 hours ago, itimpi said:

This is a regression where several fields in the settings are not correctly displaying the current settings.   I have fixed this for a release that is forthcoming shortly.  It was caused by me making a global edit that somehow missed some fields, and I did not pick that up.
 

You will find that the values you set ARE being saved when you hit Apply if you examine the file parity.check.tuning.cfg in the plugins folder on the flash drive. 

Thanks.

 

Also getting this error almost constantly. Is it related?

Mar 22 08:49:01 Unraid Parity Check Tuning: PHP error_reporting() set to E_ERROR|E_WARNING|E_PARSE|E_CORE_ERROR|E_CORE_WARNING|E_COMPILE_ERROR|E_COMPILE_WARNING|E_USER_ERROR|E_USER_WARNING|E_USER_NOTICE|E_STRICT|E_RECOVERABLE_ERROR|E_USER_DEPRECATED
Mar 22 08:49:08 Unraid Parity Check Tuning: PHP error_reporting() set to E_ERROR|E_WARNING|E_PARSE|E_CORE_ERROR|E_CORE_WARNING|E_COMPILE_ERROR|E_COMPILE_WARNING|E_USER_ERROR|E_USER_WARNING|E_USER_NOTICE|E_STRICT|E_RECOVERABLE_ERROR|E_USER_DEPRECATED
Mar 22 08:49:17 Unraid emhttpd: shcmd (2038663): /usr/local/sbin/update_cron
Mar 22 08:49:18 Unraid Parity Check Tuning: PHP error_reporting() set to E_ERROR|E_WARNING|E_PARSE|E_CORE_ERROR|E_CORE_WARNING|E_COMPILE_ERROR|E_COMPILE_WARNING|E_USER_ERROR|E_USER_WARNING|E_USER_NOTICE|E_STRICT|E_RECOVERABLE_ERROR|E_USER_DEPRECATED
Mar 22 08:49:30 Unraid flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
Mar 22 08:49:51 Unraid Parity Check Tuning: PHP error_reporting() set to E_ERROR|E_WARNING|E_PARSE|E_CORE_ERROR|E_CORE_WARNING|E_COMPILE_ERROR|E_COMPILE_WARNING|E_USER_ERROR|E_USER_WARNING|E_USER_NOTICE|E_STRICT|E_RECOVERABLE_ERROR|E_USER_DEPRECATED
Mar 22 08:51:08 Unraid Parity Check Tuning: PHP error_reporting() set to E_ERROR|E_WARNING|E_PARSE|E_CORE_ERROR|E_CORE_WARNING|E_COMPILE_ERROR|E_COMPILE_WARNING|E_USER_ERROR|E_USER_WARNING|E_USER_NOTICE|E_STRICT|E_RECOVERABLE_ERROR|E_USER_DEPRECATED
Mar 22 08:56:01 Unraid Parity Check Tuning: PHP error_reporting() set to E_ERROR|E_WARNING|E_PARSE|E_CORE_ERROR|E_CORE_WARNING|E_COMPILE_ERROR|E_COMPILE_WARNING|E_USER_ERROR|E_USER_WARNING|E_USER_NOTICE|E_STRICT|E_RECOVERABLE_ERROR|E_USER_DEPRECATED

 

Link to comment
42 minutes ago, shiftylilbastrd said:

Thanks.

 

Also getting this error almost constantly. Is it related?

Mar 22 08:49:01 Unraid Parity Check Tuning: PHP error_reporting() set to E_ERROR|E_WARNING|E_PARSE|E_CORE_ERROR|E_CORE_WARNING|E_COMPILE_ERROR|E_COMPILE_WARNING|E_USER_ERROR|E_USER_WARNING|E_USER_NOTICE|E_STRICT|E_RECOVERABLE_ERROR|E_USER_DEPRECATED
Mar 22 08:49:08 Unraid Parity Check Tuning: PHP error_reporting() set to E_ERROR|E_WARNING|E_PARSE|E_CORE_ERROR|E_CORE_WARNING|E_COMPILE_ERROR|E_COMPILE_WARNING|E_USER_ERROR|E_USER_WARNING|E_USER_NOTICE|E_STRICT|E_RECOVERABLE_ERROR|E_USER_DEPRECATED
Mar 22 08:49:17 Unraid emhttpd: shcmd (2038663): /usr/local/sbin/update_cron
Mar 22 08:49:18 Unraid Parity Check Tuning: PHP error_reporting() set to E_ERROR|E_WARNING|E_PARSE|E_CORE_ERROR|E_CORE_WARNING|E_COMPILE_ERROR|E_COMPILE_WARNING|E_USER_ERROR|E_USER_WARNING|E_USER_NOTICE|E_STRICT|E_RECOVERABLE_ERROR|E_USER_DEPRECATED
Mar 22 08:49:30 Unraid flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
Mar 22 08:49:51 Unraid Parity Check Tuning: PHP error_reporting() set to E_ERROR|E_WARNING|E_PARSE|E_CORE_ERROR|E_CORE_WARNING|E_COMPILE_ERROR|E_COMPILE_WARNING|E_USER_ERROR|E_USER_WARNING|E_USER_NOTICE|E_STRICT|E_RECOVERABLE_ERROR|E_USER_DEPRECATED
Mar 22 08:51:08 Unraid Parity Check Tuning: PHP error_reporting() set to E_ERROR|E_WARNING|E_PARSE|E_CORE_ERROR|E_CORE_WARNING|E_COMPILE_ERROR|E_COMPILE_WARNING|E_USER_ERROR|E_USER_WARNING|E_USER_NOTICE|E_STRICT|E_RECOVERABLE_ERROR|E_USER_DEPRECATED
Mar 22 08:56:01 Unraid Parity Check Tuning: PHP error_reporting() set to E_ERROR|E_WARNING|E_PARSE|E_CORE_ERROR|E_CORE_WARNING|E_COMPILE_ERROR|E_COMPILE_WARNING|E_USER_ERROR|E_USER_WARNING|E_USER_NOTICE|E_STRICT|E_RECOVERABLE_ERROR|E_USER_DEPRECATED

 

That is not actually an error.    It is an informative message that I only intended to happen when logging level was set to Testing, but I accidentally set it to occur at all log levels.
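
For anyone wondering what that message actually encodes: PHP's error_reporting level is a bitmask, and the log line simply lists which levels are switched on. Here is a minimal Python sketch (the plugin itself is PHP; Python is used here only for illustration) that rebuilds the mask from the logged names using the standard PHP constant values. The helper names `encode` and `decode` are made up for this example and are not part of the plugin.

```python
# Standard PHP error-level constants and their bit values.
PHP_LEVELS = {
    "E_ERROR": 1,
    "E_WARNING": 2,
    "E_PARSE": 4,
    "E_NOTICE": 8,
    "E_CORE_ERROR": 16,
    "E_CORE_WARNING": 32,
    "E_COMPILE_ERROR": 64,
    "E_COMPILE_WARNING": 128,
    "E_USER_ERROR": 256,
    "E_USER_WARNING": 512,
    "E_USER_NOTICE": 1024,
    "E_STRICT": 2048,
    "E_RECOVERABLE_ERROR": 4096,
    "E_DEPRECATED": 8192,
    "E_USER_DEPRECATED": 16384,
}

# The level list exactly as it appears in the syslog line.
LOGGED = (
    "E_ERROR|E_WARNING|E_PARSE|E_CORE_ERROR|E_CORE_WARNING|"
    "E_COMPILE_ERROR|E_COMPILE_WARNING|E_USER_ERROR|E_USER_WARNING|"
    "E_USER_NOTICE|E_STRICT|E_RECOVERABLE_ERROR|E_USER_DEPRECATED"
)


def encode(names):
    """OR the named levels together into one integer mask."""
    mask = 0
    for name in names:
        mask |= PHP_LEVELS[name]
    return mask


def decode(mask):
    """List the level names whose bit is set in the mask."""
    return [name for name, bit in PHP_LEVELS.items() if mask & bit]


logged_names = LOGGED.split("|")
mask = encode(logged_names)
# Levels that exist in PHP but are absent from the logged list.
not_reported = [name for name in PHP_LEVELS if name not in logged_names]

print(mask)
print(not_reported)
```

Run as-is, this shows that the logged setting is every level except E_NOTICE and E_DEPRECATED, which fits the message being a statement of the current setting rather than a fault.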

 

I have fixed the error you reported, but have found I seem to have also created a regression in the Parity Problems Assistant that I am currently tracking down.

Link to comment

The update I released today should fix all issues I know about, so please report any new anomalies you spot.

 

For those who use a language other than English with Unraid, please note that I have NOT yet updated the plugin’s translations file to include any changes for this version of the plugin.    I will start working on this, but if you spot any text unexpectedly coming out in English then please let me know so I can check that specific text against my translation file.

Link to comment

Using Version: 2022.03.23, the plugin seems to be generating invalid crontab entries.

 

This is from /etc/cron.d/root

 

# Generated schedules for parity.check.tuning
0 4 * *1 /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "resume" &> /dev/null
0 6 * * 1 /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "pause" &> /dev/null
*/17 * * * * /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "monitor" &>/dev/null

 

This is from my log:

 

Mar 28 06:54:01 trantor crond[2424]: failed parsing crontab for user root: *1 /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "resume" &> /dev/null

 

I've tried changing the times, and still get the invalid crontab entry without the space (*1)

Screenshot 2022-03-28 at 15-30-12 trantor_Scheduler.png
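
For reference, a line like the one above can be caught with a quick field check before crond ever sees it. This is a minimal Python sketch, assuming the /etc/cron.d/root layout shown in this post (five schedule fields followed directly by the command, no user column). The pattern `FIELD` and the function `invalid_entries` are names invented for this example, and the check deliberately ignores month/day names and @-style shortcuts.

```python
import re

# One schedule field: '*', N or N-M, each with an optional /step,
# combined with commas. Names ("mon") and "@daily" are not handled.
FIELD = re.compile(r"^(\*|\d+(-\d+)?)(/\d+)?(,(\*|\d+(-\d+)?)(/\d+)?)*$")

# The entries as posted above, copied verbatim from /etc/cron.d/root.
SAMPLE = '''# Generated schedules for parity.check.tuning
0 4 * *1 /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "resume" &> /dev/null
0 6 * * 1 /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "pause" &> /dev/null
*/17 * * * * /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "monitor" &>/dev/null
'''


def invalid_entries(crontab_text):
    """Return (line_number, line) for entries with a malformed schedule."""
    bad = []
    for number, line in enumerate(crontab_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are fine
        parts = stripped.split()
        schedule = parts[:5]
        has_command = len(parts) > 5
        if not has_command or not all(FIELD.match(f) for f in schedule):
            bad.append((number, stripped))
    return bad


for number, line in invalid_entries(SAMPLE):
    print(f"line {number}: {line}")
```

With the sample above, only the "resume" entry is flagged: the missing space turns `* 1` into the single token `*1`, so the entry has four usable schedule fields and the command path lands in the fifth.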

Link to comment

According to that screenshot you have set custom pause/resume using crontab type entries to pause/resume every 5 minutes!

 

That is the sort of thing I do during testing (one of the reasons I added the crontab format option) but not something I would expect to see in normal use :) 

 

Link to comment

The issue is that I have 2 Unraid NAS servers: one with 6.10 rc4, which seems to run with no issues, and the big older one with 6.9.x, which is having the pause and resume issue. Both have the same settings.
Yesterday I uninstalled the plugin and let the parity check finish and it finished without overheating or other issues on the 6.9.x one.


Sent from iPhone with Tapatalk

Link to comment
1 hour ago, Anubclaw said:

The issue is that I have 2 Unraid NAS servers: one with 6.10 rc4, which seems to run with no issues, and the big older one with 6.9.x, which is having the pause and resume issue. Both have the same settings.
Yesterday I uninstalled the plugin and let the parity check finish and it finished without overheating or other issues on the 6.9.x one.


Sent from iPhone with Tapatalk

Did you try changing the increment frequency to be Daily rather than Custom?    I checked and the Custom value is what you currently end up with if you have never gone into the plugin settings to make a change.   One of the not-visible internal changes was to how default values are set if the user never explicitly sets values.   I can change the Frequency to default back to Daily in such a case.

 

Since you mentioned only having this problem on your 6.9.2 system I will try to see if I can get different behaviour on a 6.9.2 system compared to a 6.10.0-rc* variant.

Link to comment

Ah OK, I will change it to Daily then and see what happens :)  Yep, I usually did not set anything up explicitly besides the parity check in parts, and did not fiddle with the time settings :) One idea: in case I configured something wrong, maybe there should be a “reset settings to defaults” button that explicitly sets the plugin back to the way you set it up initially. It is hard for me to find the right values for the heat pausing, so I ended up disabling it on 6.9.x (which did not actually have an effect). On 6.10 rc4 it seems to have been disabled by default.

Link to comment
5 hours ago, Anubclaw said:

Ah OK, I will change it to Daily then and see what happens :)  Yep, I usually did not set anything up explicitly besides the parity check in parts, and did not fiddle with the time settings :) One idea: in case I configured something wrong, maybe there should be a “reset settings to defaults” button that explicitly sets the plugin back to the way you set it up initially. It is hard for me to find the right values for the heat pausing, so I ended up disabling it on 6.9.x (which did not actually have an effect). On 6.10 rc4 it seems to have been disabled by default.

 

If you had never explicitly set anything up then you would have the default settings. Once you set up any setting then the ones displayed get saved and used going forward.  I pushed out an update earlier today to give better defaults for those who never change any of the plugin's settings.

 

Did you read the built-in help regarding how the heat pausing values work?  If so and it was not clear, maybe you can mention what you found confusing so it can be improved.

 

It is possible to revert to default values by either deleting the .cfg file in the plugins folder on the flash drive, or by removing and reinstalling the plugin.   I will think about whether adding a Defaults button adds value. It would be trivial to implement but may be unnecessary now that I have made the initial defaults more like what a typical user would want.

Link to comment

My Parity Tuning is configured to pause parity by disk temperature, but as soon as the temperature drops below the warning disk threshold the parity check doesn't resume. Maybe because the disk spin-down time is set to 15 minutes and that is not enough time for the disks to cool down?

unraid.png

Link to comment
3 hours ago, Quejo said:

My Parity Tuning is configured to pause parity by disk temperature, but as soon as the temperature drops below the warning disk threshold the parity check doesn't resume. Maybe because the disk spin-down time is set to 15 minutes and that is not enough time for the disks to cool down?

unraid.png


Not quite sure what you are trying to say here?    Are you saying the drives DO cool below the resume threshold but you do not get a resume?  I am not sure that the drive spin down timeout is relevant, but 15 minutes does seem rather aggressive.

 

Perhaps turning on the Debug (or even better Testing) mode of logging in the plugin’s settings and providing your system’s diagnostics after you think you have a problem would help?    Also make sure you are on the current 2022-03-31 release of the plugin as some earlier ones were not handling temperatures correctly.

 

EDIT:  Just found that the 2022-03-31 release ended up with the same files as the 20:22-03-31 release which had unintended default settings.  Pushed a 2022-04-02 release with correct updated default settings

Link to comment
On 4/2/2022 at 3:18 AM, itimpi said:


Not quite sure what you are trying to say here?    Are you saying the drives DO cool below the resume threshold but you do not get a resume?  I am not sure that the drive spin down timeout is relevant, but 15 minutes does seem rather aggressive.

 

Perhaps turning on the Debug (or even better Testing) mode of logging in the plugin’s settings and providing your system’s diagnostics after you think you have a problem would help?    Also make sure you are on the current 2022-03-31 release of the plugin as some earlier ones were not handling temperatures correctly.

 

EDIT:  Just found that the 2022-03-31 release ended up with the same files as the 20:22-03-31 release which had unintended default settings.  Pushed a 2022-04-02 release with correct updated default settings

That's what happens, even after setting the spin-down delay to 1 hour. I think the drives go to standby before cooling down, so they reach the resume threshold temperature while sleeping and the parity check never resumes automatically.

tower-diagnostics-20220403-2152.zip

Edited by Quejo
Link to comment
3 hours ago, Quejo said:

That's what happens, even after setting the spin-down delay to 1 hour. I think the drives go to standby before cooling down, so they reach the resume threshold temperature while sleeping and the parity check never resumes automatically.

I do not think it is related to the spindown (at least not directly).   In the diagnostics posted I can see the parity disk getting detected as being too hot (at 57C), a pause being issued, and the drives subsequently starting to cool down.   When the drives are spun down they are treated as cooled down, so at that point a resume should be issued.  This is designed behaviour as the plugin assumes that drives do not spin down in a normal parity check (maybe this assumption will need revisiting) unless the check has gotten beyond the drive size.   The problem appears to be that the plugin has lost track of the fact that the pause happened due to drives overheating, so it does not issue a resume even after it thinks they have cooled down.  I suspect there must be a bug somewhere for this to happen.
 

Having said that, it may simply be due to the fact that you are currently outside the time slot allocated for running increments, so that the observed behaviour is actually correct.

 

Is there any chance of repeating what you did, but this time with the plugin's Testing mode of logging active?   That should allow me to pin down exactly why the plugin is not issuing a resume.

Link to comment
6 hours ago, itimpi said:

I do not think it is related to the spindown (at least not directly).   In the diagnostics posted I can see the parity disk getting detected as being too hot (at 57C), a pause being issued, and the drives subsequently starting to cool down.   When the drives are spun down they are treated as cooled down, so at that point a resume should be issued.  This is designed behaviour as the plugin assumes that drives do not spin down in a normal parity check (maybe this assumption will need revisiting) unless the check has gotten beyond the drive size.   The problem appears to be that the plugin has lost track of the fact that the pause happened due to drives overheating, so it does not issue a resume even after it thinks they have cooled down.  I suspect there must be a bug somewhere for this to happen.

I can confirm that the issue is not related to disks spinning down. I have disabled the spin-down delay but the behavior stays the same.

6 hours ago, itimpi said:

Having said that, it may simply be due to the fact that you are currently outside the time slot allocated for running increments, so that the observed behaviour is actually correct.

But shouldn't the plugin resume parity as soon as the next time window arrives?

6 hours ago, itimpi said:

Is there any chance of repeating what you did, but this time with the plugin's Testing mode of logging active?   That should allow me to pin down exactly why the plugin is not issuing a resume.

Testing mode just activated. Just give it some time to get some logs.

Link to comment
9 minutes ago, Quejo said:

But shouldn't the plugin resume parity as soon as the next time window arrives?

It appears that you do not have the option to run Manual checks in increments set.   In that case the plugin is meant to resume the check when the disks cooled off, but it did not.  If that option WAS set then it would be correct to pause until the next increment window happened.
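
To make the intended rules easier to follow, here is a rough Python sketch of the decision as described in this thread. It is not the plugin's code (the plugin is PHP), and every name in it (`Drive`, `is_cool`, `should_resume`, the parameters) is hypothetical. It assumes a spun-down drive has no temperature reading and that "cooled" means at or below the resume threshold; the plugin's exact comparison may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Drive:
    name: str
    temperature: Optional[int]  # None stands for a spun-down drive


def is_cool(drive: Drive, resume_threshold: int) -> bool:
    # Per the thread, spun-down drives are treated as cooled down.
    return drive.temperature is None or drive.temperature <= resume_threshold


def should_resume(
    paused_for_heat: bool,
    drives: List[Drive],
    resume_threshold: int,
    increments_apply: bool,
    in_increment_window: bool,
) -> bool:
    """Decide whether a paused check should be resumed now."""
    if not paused_for_heat:
        # Only a heat-triggered pause is undone by cooling.
        return False
    if not all(is_cool(d, resume_threshold) for d in drives):
        return False
    if increments_apply and not in_increment_window:
        # With increments enabled, wait for the next window.
        return False
    return True


drives = [Drive("parity", None), Drive("disk1", 38)]

# Manual check, increments not enabled: resume once the drives are cool.
print(should_resume(True, drives, 40, False, False))
# If the "paused for heat" flag has been lost, no resume is ever issued.
print(should_resume(False, drives, 40, False, False))
```

The second call mirrors the suspected bug: once the reason for the pause is forgotten, cool drives alone never trigger a resume.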

 

Hopefully the diagnostics with Testing logging mode set will allow me to pin down more exactly what happened.  If not I will have to build in some additional logging to get to the root cause.

Link to comment

I just noticed this in my logs with unRAID 6.9.2.

 

Apr 4 21:14:02 nasserv Parity Check Tuning: Pause
Apr 4 21:14:02 nasserv Parity Check Tuning: Following drives overheated: disk23(108C)

 

However, checking the SMART values, the disk never exceeded 53C.

 

This is the first time I've seen such an error... anyone ever have this issue before?

 

 

Checking the disk's SMART values, it seems Attribute 2 is 108. Could it be that it misread that value?

 

2	Throughput performance	0x0004	128	128	054	Old age	Offline	Never	108
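
To illustrate the kind of mix-up suspected here, this is a small Python sketch that picks the temperature out of SMART attribute rows by attribute ID rather than by position. It says nothing about how the plugin actually reads temperatures. The first row is the one from this post; the second row, including its raw value, is made up for the example. IDs 194 and 190 are the usual temperature attributes, and the function names are hypothetical.

```python
import re

# Row copied from the post above (tab-separated columns).
ROW_FROM_POST = (
    "2\tThroughput performance\t0x0004\t128\t128\t054\t"
    "Old age\tOffline\tNever\t108"
)
# Hypothetical temperature row, invented purely for this example.
HYPOTHETICAL_TEMP_ROW = (
    "194\tTemperature celsius\t0x0002\t142\t142\t000\t"
    "Old age\tAlways\tNever\t42 (Min/Max 20/53)"
)


def parse_row(row):
    """Split one attribute row into the fields this sketch needs."""
    cols = row.split("\t")
    return {"id": int(cols[0]), "name": cols[1], "raw": cols[-1]}


def temperature_from_rows(rows, temp_ids=(194, 190)):
    """Return the raw temperature from the first matching attribute ID."""
    by_id = {parsed["id"]: parsed for parsed in map(parse_row, rows)}
    for attr_id in temp_ids:
        if attr_id in by_id:
            match = re.match(r"\d+", by_id[attr_id]["raw"])
            if match:
                return int(match.group())
    return None  # no temperature attribute present


print(temperature_from_rows([ROW_FROM_POST, HYPOTHETICAL_TEMP_ROW]))
print(temperature_from_rows([ROW_FROM_POST]))
```

Keyed on the ID, the raw value 108 of attribute 2 can never be reported as a temperature; with only that row available the function returns None instead of guessing.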

 

I just updated the plugin to the latest version to see if it'll resolve the issue, and now it's stopping the parity at a lower temperature:

 

Apr 4 22:35:02 nasserv Parity Check Tuning: Paused Parity Sync/Data Rebuild (2.5% completed): Following drives overheated: disk19(42C) disk23(42C)

Apr 4 22:35:02 nasserv Parity Check Tuning: Paused Parity Sync/Data Rebuild (2.5% completed): Following drives overheated: disk19(42C) disk23(42C)

 

My warning temp is 45C and critical is 55C.

Edited by coolspot
Link to comment
15 hours ago, itimpi said:

It appears that you do not have the option to run Manual checks in increments set.   In that case the plugin is meant to resume the check when the disks cooled off, but it did not.  If that option WAS set then it would be correct to pause until the next increment window happened.

 

Hopefully the diagnostics with Testing logging mode set will allow me to pin down more exactly what happened.  If not I will have to build in some additional logging to get to the root cause.

So I enabled increments for manual parity checks and changed the increment pause and resume times to extend the parity window.

tower-diagnostics-20220405-0011.zip

Sem título.jpg

Edited by Quejo
Link to comment
