[Plugin] Parity Check Tuning



On 3/6/2023 at 3:17 PM, itimpi said:

Not sure if you catch everything if logging to a share as this will stop as soon as Unraid starts unmounting any drives. 

 

I added syslog logging to the USB drive but can't really see why the shutdown is unclean. When I stop the array before shutting down, it's OK.

But if I don't, I get a message on boot that an unclean shutdown was detected and a parity check will run, although the check does not actually run every time...

I also checked the USB drive on Windows with chkdsk and it's OK.

 

I increased the VM timeout from 30s to 60s, but I'm not really using VMs now, just a lot of Dockers.

I read the manual link and it says 300s for VMs...

I increased the Docker timeout from 10s to 20s and the disk timeout from 90s to 180s.

But somehow I still get unclean shutdowns...

 

Is the VM/Docker shutdown time-out for one VM/container only, or for all of them?

 

Edited by Masterwishx
Link to comment

Hello! I was having an issue before where my incremental parity checks were not reading the disk temperatures correctly when the disks had spun down (they were reporting "=*").

 

I have updated to the latest version of the Parity Tuning Script, and now the script doesn't appear to be collecting/detecting the disk temperature at all anymore.

 

Here is a snippet from the syslog (with Testing logs enabled):

 

***

 

 

Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ----------- MONITOR begin ------
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR /boot/config/forcesync marker file present
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR manual marker file present
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR parityTuningActive=1, parityTuningPos=886346616
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR appears there is a running array operation but no Progress file yet created
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ... appears to be manual parity check
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR MANUAL record to be written
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR Current disks information saved to disks marker file
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR written header record to  progress marker file
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ... appears to be manual parity check
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR written MANUAL record to  progress marker file
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR Creating required cron entries
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   Created cron entry for scheduled pause and resume
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   Created cron entry for 6 minute interval monitoring
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   updated cron settings are in /boot/config/plugins/parity.check.tuning/parity.check.tuning.cron
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR CA Backup not running, array operation paused
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ... no action required
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR global temperature limits: Warning: 50, Critical: 55
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR plugin temperature settings: Pause 3, Resume 8
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   array drives=0, hot=0, warm=0, cool=0, spundown=0, idle=0
Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   Array operation paused but not for temperature related reason
Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ----------- MONITOR end ------
 

***

 

The Parity Check Tuning log clearly shows "hot=0, warm=0, cool=0, spundown=0", but there are several disks above 55C.

 

And here's a screenshot of the disk temps in the WebUI.

 

(thanks again for reading this message)

2023-03-11_diskTemps_Screenshot 2023-03-11 133428.png

Link to comment

@rsbuc  I did not think anything had changed in this area, but it looks like it might have. Can you please get me the testing mode log from a time when the temperature should be detected as high, so I can see what might be going wrong? It is always hard to debug on my own system, where I have to artificially force temperature events, so I am never sure whether it reflects real-world scenarios.

Link to comment

@rsbuc  Thanks for that log. It shows that for some reason the loop that is meant to go through the drives checking temperatures is not being executed. I just need to work out why. My guess is that I incorrectly changed something while tidying up the code for the 6.12 release.

Link to comment

@rsbuc  The only way I can see that the temperatures for drives do not get checked is if the file /var/local/emhttp/disks.ini does not exist or has unexpected contents.  The code currently assumes it always exists with a specific layout.   Can you please check that you have that file and ideally if it does exist post a copy so I can check the contents as that is the place I get the temperatures from.

 

I can add a check to the plugin to flag an error if this file does not exist, and also look to see if I can get the same information from elsewhere in such a case. 
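
As a rough illustration of the kind of check described above, here is a minimal Python sketch (not the plugin's actual code). It assumes that disks.ini has one section per drive with a quoted `temp` value that is non-numeric (for example "*") when the drive is spun down, and that the "Pause 3, Resume 8" settings in the log are margins below the global warning limit. The function names and the hot/warm/cool rules are hypothetical.

```python
import configparser
from pathlib import Path


def load_disks_ini(path="/var/local/emhttp/disks.ini"):
    """Return the file contents, or raise a clear error if the file is missing."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"{path} is missing; cannot read disk temperatures")
    return p.read_text()


def classify_disks(ini_text, warning=50, pause_margin=3, resume_margin=8):
    """Count drives by temperature band (assumed rules, see the note above)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    counts = {"drives": 0, "hot": 0, "warm": 0, "cool": 0, "spundown": 0}
    for section in parser.sections():
        # Values are assumed to be quoted, e.g. temp="45"
        temp = parser[section].get("temp", "").strip('"')
        counts["drives"] += 1
        if not temp.isdigit():
            counts["spundown"] += 1      # no numeric reading, e.g. "*"
        elif int(temp) >= warning - pause_margin:
            counts["hot"] += 1           # would trigger a pause
        elif int(temp) <= warning - resume_margin:
            counts["cool"] += 1          # cool enough to resume
        else:
            counts["warm"] += 1
    return counts


sample = '["parity"]\ntemp="56"\n["disk1"]\ntemp="45"\n["disk2"]\ntemp="40"\n["disk3"]\ntemp="*"\n'
print(classify_disks(sample))
```

If a check like this reports `drives=0`, as in the log above, that would point to the file being absent or laid out differently from what the loop expects.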

Link to comment

@Masterwishx  Those snippets are insufficient to be certain but it is the presence of the file /config/forcesync on the flash drive (that is created by Unraid when the array is started) that makes the plugin think there has been an unclean shutdown.   A more complete log would be needed to look further. 

 

I could obviously simply not generate the notification but it would be better to work out why it might be generated spuriously.  It is only meant to be an information message as the plugin is going to do nothing in such a case but better for it to be correct.
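
For anyone wanting to check this by hand, the marker-file test can be sketched in a few lines of Python. This is only an illustration, not the plugin's code; it assumes the flash drive is mounted at /boot (as the /boot/config/forcesync path in the syslog earlier in this thread suggests), and the function name is made up.

```python
from pathlib import Path


def unclean_shutdown_suspected(flash_root="/boot"):
    """True if the forcesync marker is still present on the flash drive.

    The marker is created when the array starts, so finding it at boot
    suggests the previous shutdown did not complete cleanly.
    """
    return (Path(flash_root) / "config" / "forcesync").exists()


if __name__ == "__main__":
    print("forcesync present:", unclean_shutdown_suspected())
```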

Link to comment
4 hours ago, Nodiaque said:

Hello,

 

Is it normal that since I installed this plugin, every time the server is rebooted from the UI, it says the server had an unclean reboot and wants to run a parity check?

 

Thank you

It should not if the plugin code for detecting the fact that Unraid is intending to start a parity check is working correctly.   

 

Is the parity check actually starting? (The plugin does not do this itself.)

 

If not, then the message is incorrect, so I need to work out why it is being displayed when it should not be. I need to accurately detect whether a shutdown is clean or not, as this affects the option to restart array operations on reboot from the point previously reached, which is only done if the shutdown is clean.

Link to comment
17 minutes ago, Nodiaque said:

Yes the parity check does run 

Then that suggests you really DID get an unclean shutdown as it is Unraid that starts that check, not the plugin.   The plugin notification was just something I added because it seemed informative and since I was detecting whether an unclean shutdown had occurred for other reasons it was easy to generate the notification.   I will be interested in scenarios where you get that notification and a parity check does NOT start so the notification is incorrect.


You probably need therefore to investigate why you might be getting unclean shutdowns.   You may find this section of the online documentation accessible via the ‘Manual’ link at the bottom of the GUI or the DOCS link at the top of each forum page to be of use to give you some ideas on troubleshooting this.

 

Link to comment
  • 3 weeks later...

This morning I noticed that the parity check was running although it has been configured to run in increments and between 1:00:00 and 5:30:00 AM. I went through syslog and realised that parity check was resumed when the mover finished. Mover is configured at 6:00:00 (after the parity check has ended) so parity check should definitely not be resumed when it finishes.

I haven't noticed this behavior before (i.e. during last month's scheduled check) but cannot be certain.

Below is the syslog excerpt containing relevant entries:
 

--- Parity check scheduled to run between 1:00:00 and 5:30:00 -> OK
Apr  4 01:00:02 TMS-740 Parity Check Tuning: Resumed: Scheduled Correcting Parity-Check
Apr  4 01:00:02 TMS-740 Parity Check Tuning: Resumed: Scheduled Correcting Parity-Check (28.1% completed)
Apr  4 01:00:07 TMS-740 kernel: mdcmd (40): check resume
Apr  4 01:00:07 TMS-740 kernel: 
Apr  4 01:00:07 TMS-740 kernel: md: recovery thread: check P Q ...
Apr  4 05:30:01 TMS-740 Parity Check Tuning: Paused: Scheduled Correcting Parity-Check
Apr  4 05:30:06 TMS-740 kernel: mdcmd (41): nocheck pause
Apr  4 05:30:06 TMS-740 kernel: 
Apr  4 05:30:06 TMS-740 kernel: md: recovery thread: exit status: -4
Apr  4 05:30:12 TMS-740 Parity Check Tuning: Paused: Scheduled Correcting Parity-Check (40.7% completed)

--- Mover scheduled to run on 6:00:00 -> OK
Apr  4 06:00:01 TMS-740 root: mover: started

--- Mover took ~14mins this time -> OK
Apr  4 06:14:25 TMS-740 root: mover: finished

--- Parity check resuming -> NOK
Apr  4 06:18:43 TMS-740 Parity Check Tuning: Resumed: Mover no longer running
Apr  4 06:18:48 TMS-740 kernel: mdcmd (42): check resume
Apr  4 06:18:48 TMS-740 kernel: 
Apr  4 06:18:48 TMS-740 kernel: md: recovery thread: check P Q ...
Apr  4 06:18:48 TMS-740 Parity Check Tuning: Resumed: Mover no longer running: Scheduled Correcting Parity-Check (40.7% completed)

--- Manually pausing parity check after noticing that the parity check was still running
Apr  4 09:56:34 TMS-740 kernel: mdcmd (43): nocheck Pause
Apr  4 09:56:35 TMS-740 kernel: md: recovery thread: exit status: -4
Apr  4 09:58:23 TMS-740 ool www[3302]: /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php 'updatecron'
Apr  4 09:58:23 TMS-740 Parity Check Tuning: Configuration: Array#012(#012    [parityTuningScheduled] => 1#012    [parityTuningManual] => 0#012    [parityTuningAutomatic] => 0#012    [parityTuningFrequency] => 0#012    [parityTuningResumeCustom] => #012    [parityTuningResumeDay] => 0#012    [parityTuningResumeHour] => 1#012    [parityTuningResumeMinute] => 0#012    [parityTuningPauseCustom] => #012    [parityTuningPauseDay] => 0#012    [parityTuningPauseHour] => 5#012    [parityTuningPauseMinute] => 30#012    [parityTuningNotify] => 0#012    [parityTuningRecon] => 1#012    [parityTuningClear] => 1#012    [parityTuningRestart] => 0#012    [parityTuningMover] => 1#012    [parityTuningCABackup] => 1#012    [parityTuningHeat] => 1#012    [parityTuningHeatHigh] => 3#012    [parityTuningHeatLow] => 8#012    [parityTuningHeatNotify] => 1#012    [parityTuningHeatShutdown] => 0#012    [parityTuningHeatCritical] => 2#012    [parityTuningHeatTooLong] => 30#012    [parityTuningLogging] => 0#012    [parityTuningLogTarget] => 0#012    [parityTuningMonitorDefault] => 17#012    [parityTuningMonitorHeat] => 7#012    [parityTuningMonitorBusy] => 6#012    [parityTu

 

Link to comment
3 hours ago, itimpi said:

@henris  Interesting - I will look into that.   It might be helpful if you could post (or PM me) a copy of the parity.tuning.cfg file from the plugins folder on the flash drive so I can see if any other setting might be relevant as I am checking this out.

Here you go. Thank you for the fast response and for the plugin itself, it has been a core plugin for many years and worked wonderfully.

 

parityTuningIncrements="1"
parityTuningFrequency="0"
parityTuningResumeCustom=""
parityTuningResumeHour="1"
parityTuningResumeMinute="0"
parityTuningPauseCustom=""
parityTuningPauseHour="5"
parityTuningPauseMinute="30"
parityTuningUnscheduled="1"
parityTuningRecon="1"
parityTuningClear="1"
parityTuningNotify="0"
parityTuningHeat="1"
parityTuningDebug="no"
parityTuningAutomatic="0"
parityTuningRestart="0"
parityTuningHeatHigh="3"
parityTuningHeatLow="8"
parityTuningHeatNotify="1"
parityTuningHeatShutdown="0"
parityTuningLogging="0"
parityTuningScheduled="1"
parityTuningManual="0"
parityTuningResumeDay="0"
parityTuningPauseDay="0"
parityTuningMover="1"
parityTuningCABackup="1"
parityTuningLogTarget="0"

 

Link to comment

Thanks for that.   I am reasonably certain I have tracked down the issue (1 line of code in the wrong place) but I need to do testing to confirm.    
 

While checking, I also found that the logic is slightly flawed: if the parity check is legitimately paused because mover is running, but mover then completes after the scheduled time for the increment to end, the plugin will resume the check without taking into account that it is past the end of the current increment. I will need to fix that as well.
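
To make the second point concrete, here is a small Python sketch of the missing check. It is not the plugin's code, and the function names are hypothetical; it only shows the idea of testing the current time against the increment window before resuming once mover has finished.

```python
from datetime import time


def within_increment(now, resume_at, pause_at):
    """True if `now` falls inside the scheduled increment window."""
    if resume_at <= pause_at:
        return resume_at <= now < pause_at
    # Window wraps past midnight, e.g. 23:00 to 05:30.
    return now >= resume_at or now < pause_at


def should_resume_after_mover(now, resume_at, pause_at, paused_for_mover):
    """Resume only if the pause was caused by mover AND the window is still open."""
    return paused_for_mover and within_increment(now, resume_at, pause_at)


# Window from the posted settings: resume 01:00, pause 05:30.
print(should_resume_after_mover(time(6, 18), time(1, 0), time(5, 30), True))
print(should_resume_after_mover(time(3, 0), time(1, 0), time(5, 30), True))
```

With the 01:00 to 05:30 window from the posted settings, a mover run finishing at 06:18 would not trigger a resume, while one finishing at 03:00 would.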

Link to comment
  • 2 weeks later...

Using version 2023.04.08 for the first time, the Action in my History page has only a hyphen ("-") for the most recent run which was completed at noon today. It was running in 4 increments according to the same page. Normally, I would expect the Action to show "Scheduled Non-Correcting Parity-Check" like it did in March.

Link to comment
1 hour ago, daTroll said:

Using version 2023.04.08 for the first time, the Action in my History page has only a hyphen ("-") for the most recent run which was completed at noon today. It was running in 4 increments according to the same page. Normally, I would expect the Action to show "Scheduled Non-Correcting Parity-Check" like it did in March.

Cannot think of any recent change in the plugin that should have that result (famous last words :)).

 

Could you perhaps post (or PM me) the following files from the flash drive:

  • /config/parity-checks.log
  • /config/plugins/parity.check.tuning/parity.check.tuning.progress.save

If you still have diagnostics covering the period when the check took place then the syslog might just be relevant although I think the above two files should allow me to see where the issue might be.  Also, what release of Unraid are you running?

Link to comment
