[Plugin] Parity Check Tuning



11 hours ago, wgstarks said:

Last night my parity check paused during an appdata backup (as configured) but never resumed. After about 16 hours I went ahead and resumed the parity check manually.

 

 

brunnhilde-diagnostics-20230417-1858.zip 185.17 kB · 0 downloads

From the diagnostics, it looks like you have no array operation type scheduled to run in increments (please confirm this). I think there is a bug in the plugin such that, in that scenario, it does not correctly resume the check after mover or ca.backup operations finish. I will look into rectifying that.

Link to comment
55 minutes ago, daTroll said:

Yeah, that could be it.. I'm running 6.12.0-rc2. The files you asked for are attached (the server hasn't been rebooted yet after the last parity check).

Looks like it is a plugin bug that I will have to fix, rather than an issue with the Unraid release :(  I did not think anything had changed in the code in the affected area, but I must have been wrong.

 

If you look in the parity-checks.log file (it is a text file) then you will see that the last line has an empty field that on the other entries is set to "check P".   This entry is generated by the plugin so I need to work out why that field was not populated as according to the progress.save file you sent it should have been.  As a temporary fix you can add the missing field to the parity-checks.log entry and it will then display the history entry as expected.
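For anyone who would rather script the temporary fix than edit by hand, here is a minimal Python sketch. It assumes the operation type is the 6th '|'-separated field, which is inferred from the sample parity-checks.log entries posted in this thread and is not a documented format. The function names are hypothetical and not part of the plugin.

```python
# Sketch only: the field position (6th field, index 5) is an assumption
# based on sample parity-checks.log entries, not a documented format.
OPERATION_FIELD = 5


def fill_missing_operation(line: str, default: str = "check P") -> str:
    """Return the entry with an empty operation field set to `default`."""
    fields = line.rstrip("\n").split("|")
    if len(fields) > OPERATION_FIELD and fields[OPERATION_FIELD].strip() == "":
        fields[OPERATION_FIELD] = default
    return "|".join(fields)


def repair_log(text: str) -> str:
    """Apply the fix to every non-blank line and leave blank lines alone."""
    return "\n".join(
        fill_missing_operation(entry) if entry.strip() else entry
        for entry in text.splitlines()
    )
```

Work on a copy of the file and keep the original until the history displays as expected.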

Link to comment
23 minutes ago, wgstarks said:

I’ll check this evening but IIRC you are right. Will try enabling increments if it isn’t already and report the outcome.

You do not need to enable increments if you do not want to.  I have confirmed that the plugin fails to resume if no increments are enabled, and I have already worked out a fix to test.

Link to comment
13 hours ago, daTroll said:

 

Yeah, that could be it.. I'm running 6.12.0-rc2. The files you asked for are attached (the server hasn't been rebooted yet after the last parity check)
 

The release I have just made should fix the parity history reporting issue.   Please let me know if it does not.

Link to comment
On 4/20/2023 at 12:24 AM, itimpi said:

Glad to have confirmation.  
 

It was not a big code change: it simply involved changing the return value of a function I had written from ‘false’ to ‘true’, as I had got the logic backwards :) 

I don't think this worked for me. The parity check pauses when mover runs, but doesn't start up again when it's finished. Is there a setting I need to change for this to happen?

rack-diagnostics-20230421-1650.zip

Link to comment
46 minutes ago, psikoh said:

I don't think this worked for me. The parity check pauses when mover runs, but doesn't start up again when it's finished. Is there a setting I need to change for this to happen?

rack-diagnostics-20230421-1650.zip 614.5 kB · 0 downloads

Difficult to tell anything from your diagnostics as the syslog is continually being spammed by error messages of the form:

Apr 21 16:09:16 Rack kernel: megaraid_sas 0000:02:00.0: 3368374 (735372555s/0x0004/CRIT) - Enclosure PD 21(c Port 0 - 7/p1) hardware error

You really need to get this fixed.   It was enough to tell me that the plugin detected mover running and ending, but not why it did not resume after mover ended.
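Until the hardware error is sorted out, the spam can be stripped from a copy of the syslog before reading it. A minimal Python sketch follows, assuming every noisy line contains the driver name shown in the message above. The function name and the marker constant are hypothetical.

```python
from typing import Iterable, Iterator

# Assumption: the noisy lines all contain the driver name seen in the
# message above; adjust the marker if the spam looks different.
NOISE_MARKER = "megaraid_sas"


def drop_noise(lines: Iterable[str], marker: str = NOISE_MARKER) -> Iterator[str]:
    """Yield only the syslog lines that do not contain `marker`."""
    for line in lines:
        if marker not in line:
            yield line
```

For example, `list(drop_noise(open("syslog.txt")))` would keep the plugin's "Mover running" messages while discarding the enclosure errors (the file name here is just a placeholder).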

 

Perhaps you can let me have a copy of the following files (either post them or PM them to me) from the flash drive so I can check out your configuration and what happened:

  • config/plugins/parity.check.tuning/parity.check.tuning.cfg   
  • config/plugins/parity.check.tuning/parity.check.tuning.progress.save  (or if there is one the version without a .save extension)

 

If I cannot see the issue from examining the above files, then it might also be useful if you set the logging option in the plugin's settings to Testing mode, but select the option to log only to flash (to avoid all the spam in the main syslog). Then recreate the issue and let me have the file 'config/plugins/parity.check.tuning/parity.check.tuning.log'.  After doing that, you will want to reset the logging option to a lower level to avoid excessive writes to the flash drive.

 

Link to comment
10 hours ago, DontWorryScro said:
error: /update.php: missing csrf_token

 

This is in my error logs.  When I try to apply Parity Tuning settings, it just launches a new blank tab and the above message shows up in the logs.  I've made sure I am up to date on all plugins.  What else can I do?

 

 

snuts-diagnostics-20230423-1703.zip 282.04 kB · 2 downloads

The only way I could think that might happen is if you had a browser window kept open after a reboot :)  is that a possibility in your case?

Link to comment
7 hours ago, itimpi said:

The only way I could think that might happen is if you had a browser window kept open after a reboot :)  is that a possibility in your case?

 

I just went around and shut down any and all browsers open on any PC/phone to be sure, and I am still getting the error.  I uninstalled and reinstalled the plugin and now it is set to the default parameters, but I still can't update the settings at all: when I click on Apply I get the missing csrf_token error and a new tab opens that never loads anything.

 

 

Maybe I'll try a reboot of Unraid.  It did finish a 14TB data rebuild yesterday and hasn't been restarted since then.  Maybe a nice reboot will shake out the cobwebs.

 

Edit: No dice.  A full reboot did not change this behavior.

Edited by DontWorryScro
additional info
Link to comment
13 hours ago, DontWorryScro said:

Any other ideas?  There's no open browsers.  Do my diagnostics show anything revealing?

Nothing that can explain the error message occurring, but I will look again to see if I can replicate it in any way.   

 

What they DO show is the array operation being paused due to CA Backup running, and not being resumed when it is detected that CA Backup is no longer running.   The diagnostics also give me a copy of your current plugin settings, so I can use those for testing.   I will add some additional logic to see if I can detect why the resume is not happening: the plugin has detected that the CA Backup completed and that a resume is required, but it is not actually issuing it.

 

Maybe if I release an updated plugin for this, your other issue might disappear as well :) 

Link to comment

Firstly thanks for this plugin, have been using it for a while and your work is greatly appreciated.

 

I have a few strange issues, and I'm unsure whether they are due to configuration errors on my part. Let me try to give an overview.

 

My default parity check options are to trigger a custom parity check on the last Monday of every month, and cumulative parity checks are disabled here as shown:

[Screenshot: scheduler parity check settings]

 

My settings for your plugin are to resume daily at 3:00, pause at 17:30 and pause if mover gets in the way. I have just now enabled the debugging option to see if that provides any more detail.

 

[Screenshot: Parity Check Tuning plugin settings]

 

Looking through the syslogs I can see the parity check is resumed as expected at 3:00, is correctly paused when mover interferes, but throws exit status 255 after the mover exits and does not resume.

 

Apr 26 03:00:01 medianator Parity Check Tuning: Resumed: Scheduled Correcting Parity-Check
Apr 26 03:00:01 medianator Parity Check Tuning: Resumed: Scheduled Correcting Parity-Check (71.7% completed)
Apr 26 03:00:07 medianator kernel: mdcmd (63): check resume
Apr 26 03:00:07 medianator kernel: 
Apr 26 03:00:07 medianator kernel: md: recovery thread: check P ...

Apr 26 06:00:24 medianator Parity Check Tuning: Mover running
Apr 26 06:00:29 medianator kernel: mdcmd (64): nocheck PAUSE
Apr 26 06:00:29 medianator kernel: 
Apr 26 06:00:29 medianator kernel: md: recovery thread: exit status: -4
Apr 26 06:00:29 medianator Parity Check Tuning: Paused: Mover running: Scheduled Correcting Parity-Check (82.6% completed)
Apr 26 06:04:11 medianator  crond[1200]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Apr 26 06:06:26 medianator Parity Check Tuning: Mover no longer running
Apr 26 06:06:31 medianator  crond[1200]: exit status 255 from user root /usr/local/emhttp/plugins/parity.check.tuning/parity.check.tuning.php "monitor" &>/dev/null
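A quick way to confirm from a syslog that a check was left paused is to look at the last Paused/Resumed message the plugin wrote. The Python sketch below does that. The message texts are copied from the excerpt above, and treating them as stable markers is an assumption rather than documented plugin behaviour. The function name is hypothetical.

```python
import re

# Message texts are copied from the syslog excerpt; treating them as
# stable markers is an assumption, not documented plugin behaviour.
PLUGIN_STATE = re.compile(r"Parity Check Tuning: (Paused|Resumed)\b")


def ends_paused(lines):
    """Return True if the last plugin Paused/Resumed message is 'Paused'."""
    state = None
    for line in lines:
        match = PLUGIN_STATE.search(line)
        if match:
            state = match.group(1)
    return state == "Paused"
```

Run against the excerpt above, the last matching message is the 06:00:29 "Paused: Mover running" line, so the function reports that the check ended in the paused state.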

 

Additionally, despite the syslog showing "Scheduled Correcting Parity-Check", the parity operation history now references my last 3 scheduled checks as "Data-Rebuild" rather than a scheduled check. Once the check is complete each month there are no errors and everything looks fine.

 

[Screenshot: parity operation history]

 

Does this give you an idea of what could have gone awry? Any suggestions would be welcome.

Link to comment

I am reworking the code that handles pause/resume around mover/CA backup running, so hopefully that will either fix the issue of resume not working, or if not at least give some insight as to why.

 

As to the history giving strange results, can you look at the config/parity-checks.log file on the flash drive (it is a human-readable text file) to see what the last few entries show (and possibly let me have a copy)? That way I can determine whether the problem has crept in around displaying the history, or whether it is actually an issue with recording it in that file (which of the two it is should be obvious from the entries).   If the latter, I would appreciate a copy of the parity.check.tuning.progress.save file from the plugins folder on the flash drive, as that would have been used to generate the last history entry.

Link to comment

Thanks for taking the time to respond. I noticed that the entries corresponding to the ones in my previous screenshot are identified as "recon D5" rather than "check P" so I assume that's where the "Action" column comes from.

 

config\parity-checks.log

 

2022 Dec 27 14:38:55|101329|177.6MB/s|0|0|check P|131928|2|SCHEDULED  Correcting Parity Check
2023 Jan 31 16:14:10|51244|351.3MB/s|0|0|check P|51244|1|SCHEDULED  Correcting Parity Check
2023 Feb  9 22:54:05|105093|171.3MB/s|0|0|recon D5|105093|1|AUTOMATIC Parity Sync/Data Rebuild
2023 Feb 27 02:00:22|3|6.0 TB/s|0|0|recon D5|17578328012|3|1|Scheduled Correcting Parity Check
2023 Mar  1 08:06:32|18361|980.4 MB/s|0|0|recon D5|17578328012|18361|1|Manual Correcting Parity Check
2023 Mar 30 04:34:41|83202|216.3 MB/s|0|0|recon D5|17578328012|336267|5|Scheduled Correcting Parity-Check

 

Requested files attached. Thanks for your help.

 

parity.check.tuning.progress.save parity-checks.log

Link to comment

@Jimmeh  I think I have tracked down why you were getting that 255 error in the syslog (and the operation not resuming) and am testing my fix.

 

Hopefully the two files you sent me will allow me to track down why the history entry is going wrong.   I can see from the .save file that the generated record is wrong and should have had ‘check P’ in the operation type field.   If you want you can edit the entries in the parity-checks.log file to have ‘check P’ instead of ‘recon D5’ to get them displayed correctly.
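If editing by hand is tedious, the relabelling could be scripted along the lines of the Python sketch below. It assumes the operation is the 6th '|'-separated field and that the final field is a free-text description, both inferred from the entries posted earlier in the thread. It only touches entries whose description mentions a parity check, so a genuine "Parity Sync/Data Rebuild" entry keeps its 'recon D5' label. The function name is hypothetical.

```python
import re

# Assumptions: the operation is the 6th '|'-separated field and the final
# field is a description; both are inferred from the posted log entries.
CHECK_DESC = re.compile(r"parity[- ]check", re.IGNORECASE)


def relabel_entry(line: str) -> str:
    """Set a 'recon ...' operation to 'check P' when the description says parity check."""
    fields = line.split("|")
    if (
        len(fields) > 6
        and fields[5].startswith("recon")
        and CHECK_DESC.search(fields[-1])
    ):
        fields[5] = "check P"
    return "|".join(fields)
```

As with any manual edit, work on a copy of parity-checks.log and keep the original until the history page looks right.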

Link to comment

@Jimmeh I notice that you have your regular scheduled check to be a correcting one.   It is normally recommended that this be non-correcting so that a drive that is acting up will not end up corrupting parity.  You normally only want to run correcting checks manually when you are reasonably certain you have no outstanding hardware issues with any drives.

Link to comment
42 minutes ago, itimpi said:

@Jimmeh I notice that you have your regular scheduled check to be a correcting one.   It is normally recommended that this be non-correcting so that a drive that is acting up will not end up corrupting parity.  You normally only want to run correcting checks manually when you are reasonably certain you have no outstanding hardware issues with any drives.

Fair enough, I'll make that change.

 

Thanks again!

Link to comment

Yikes, I just now noticed that my issues aren't just with Parity Tuning.  I tried to change the frequency of Mover from Daily to Monthly to confirm that other settings weren't also acting up, and unfortunately they are.  Changing the frequency of Mover does not get recognized as a change: the Apply button stays greyed out.  I'm also attaching another diagnostics file, since I have uninstalled and reinstalled Parity Tuning since the last time I attached diagnostics and the behavior is slightly different now.

Any help is appreciated.  This is sort of a problem for me if I can't change settings in Scheduler.  Tell me you see something in my diagnostics that is a red flag now.  Crossing fingers.

image.png.79648755baf9deb6e585dbe9bdc5e4b3.png

snuts-diagnostics-20230427-1817.zip

Link to comment
5 hours ago, itimpi said:

@DontWorryScro  Could not spot anything obvious in the diagnostics.

 

I did notice that some of your disks appear to be 100% full, and mover is complaining that there is no space to move files off a cache pool to the array.   I would not expect this to cause your symptoms, but who knows :) 

 

 

Upon further investigation it seems every single thing in all of Scheduler is borked.  I thought it was only Parity Tuning because that was the only thing I had been trying to update.  Would my best bet at this point be trying to update to the latest RC of Unraid to see if it fixes my Scheduler?

Link to comment
