[Plugin] Parity Check Tuning


I am quite sure it has not worked for a long time.

I have been getting lots of disk overheated and critical overheat warnings during parity checks, at least for the last 1-2 years.

But it paused nevertheless before it became a real problem (as far as I can see from the code, it also has a check for the critical temperature).

It just happened now that I had accidentally set the critical value too high, which caused two disks to actually die.

That's when I started to realize that it should never have reached critical at all with the plugin active.

Edited by Zerginator
Link to comment
23 minutes ago, Zerginator said:

I am quite sure it did not work for a long time.

 

Sorry about that - the code that handles temperature issues had not been touched for some time and I assumed that the lack of problem reports meant it was working as I thought it was. 

 

I have just pushed an update that seemed to pause and resume as expected in my quick test when disks overheat (without reaching the critical value).   Let me know if it now works for you.

 

The critical value should not have been relevant to the pause/resume on temperature.   It was intended for the option to shut down the whole server if disks reached that value.

 

Link to comment

It just paused correctly at 3° below the warning temperature.

I noticed a change in the settings compared to yesterday's version (from the beginning of October): the custom crontab settings are no longer hidden when the frequency is not set to custom.

 

Screenshot of still open tab from before update:

2023-10-26 13_12_38-Spiele-PC - TeamViewer.png

 

Screenshot of settings after update:

2023-10-26 13_13_06-NAS_Scheduler – Mozilla Firefox.png

Edited by Zerginator
Added image captions
Link to comment
38 minutes ago, Zerginator said:

Just paused correctly at 3° below warning temperature.


I cannot reproduce the settings issue.   My custom settings appear/disappear as I change the frequency setting.  

 

Glad to hear that you correctly got the pause.   I guess you now need to wait to see if the resume happens as expected.

Link to comment

Yes, everything works just as expected now!

The plugin is a must-have for me, as this server is a low-noise, low-power, always-on unit with 2,5" drives.

It runs all the containers I need daily, and the disks run perfectly cool in normal usage.

2,5" laptop disks are just not built for the 24-hour continuous use that happens during a parity check, and they overheat, so the plugin is the perfect solution for this case. The additional rebuild time does not matter much, as the array has dual parity.

Maybe the settings issue will be gone after a reboot. I will report back here when the rebuild is finished.

Link to comment

@Zerginator From the description of your use case, you might also find the option to resume array operations after a restart useful.

I personally use it so I can shut down my server overnight when it is not being used, and then resume any running array operation from the point previously reached when I restart it the next day.

Link to comment

Thanks for the suggestion, but the opposite is generally the idea: to have a server that is always on for image uploading, Syncthing, WebDAV, VPN and so on, and that is built for low energy consumption (only 2,5" SSD with cache disks, so they are mostly spun down) in a small case.

I use the SilverStone SST-CS280, a brilliant micro ATX case with 8x 2,5" hot-swap bays. Only the cooling is bad, as there is no active fan for the HDDs, and using 15mm 5TB disks means they are nearly touching each other.

The whole setup with a good Core i7 CPU idles at around 25W, and all services (more than 30 Docker containers) are always available.

I have another server with good cooling, a lot of CPU and graphics power, and plenty of storage that is only activated on demand.

 

Link to comment
  • 2 weeks later...

I'm not quite sure when this started, but my incremental parity checks don't continue to run after the first pause. The syslog says:

Nov 7 08:00:46 Unraid Parity Check Tuning: Send notification: mover no longer running: (type=normal link=/Settings/Scheduler)

Nov 7 08:00:46 Unraid Parity Check Tuning: Array operation not resumed - outside increment window: Scheduled Non-Correcting Parity-Check (25.3% completed)

 

But my custom cron job looks like: image.thumb.png.3a125be670ac6822434d97d5db1b1034.png

 

It looks like the plugin was last updated on 10-28. Has something changed within the last release or two that would stop the crontab from working correctly?

Link to comment
5 hours ago, dataslayer said:

I'm not quite sure when this started, but my incremental parity checks don't continue to run after the first pause.

Nothing has changed recently in the handling around crontab.    
 

The messages you display suggest that it is nothing to do with the increment pause/resume via crontab but instead to do with a pause being triggered by mover running and then the code that detects mover has completed deciding not to resume the check because it thinks it is outside the increment window.  
 

If you are not on the latest release then you could get this symptom when NOT using the custom scheduling option, but as far as I know this is no longer the case.
 

The resume code that checks for current increment window is probably not working correctly when using the Custom option because I think it only looks at the increment start/stop times for the non-custom options.   It could be quite complicated to get it working for all combinations of the custom option so could you try setting the start and end times in the daily option to be the same as those you have set in the custom option and then see if it works as expected when you revert to the Custom option?  I may have to rethink how I am handling pause/resume around mover running so that using the Custom option does not break the resume.

Link to comment

I’m confused - the log said 

Nov 7 08:00:46 Unraid Parity Check Tuning: Send notification: mover no longer running: (type=normal link=/Settings/Scheduler)

“No Longer Running” 

 

And then 

Nov 7 08:00:46 Unraid Parity Check Tuning: Array operation not resumed - outside increment window: Scheduled Non-Correcting Parity-Check (25.3% completed)

“not resumed - outside increment window”

 

That tells me that the mover was not the issue but the cron was, because it decided it was outside the window I specified, even though the window was Monday - Friday, 8 AM - 11 AM.

 

Anyway, I dropped the custom crontab for now and changed the setting from custom to daily, 8 - 11. The only problem with that is that it might run on weekends, which is what I don't want. But this is a start for the test.

 

It did run again this morning and continued properly:

Nov 9 08:00:45 Unraid Parity Check Tuning: Send notification: mover no longer running: (type=normal link=/Settings/Scheduler)

Nov 9 08:00:50 Unraid kernel: mdcmd (46): check resume

Nov 9 08:00:50 Unraid kernel:

Nov 9 08:00:50 Unraid kernel: md: recovery thread: check P Q ...

Nov 9 08:00:50 Unraid Parity Check Tuning: Resumed: Scheduled Non-Correcting Parity-Check (25.3% completed)

 

So maybe it was the custom crontab. But it has worked in the past; it was working flawlessly for several months until recently.

Link to comment
1 hour ago, dataslayer said:

So maybe it was the custom crontab. But it has worked in the past; it was working flawlessly for several months until recently.

The current code that checks whether it is inside a timeslot after mover finishes does not look at the custom settings but only at those set on the daily/weekly option.    That was why I suggested you set the daily option to correspond to the timeslot in the custom option.    Having done that, you can reinstate the custom option so that you do not even start increments at the weekend.
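
For illustration, here is a minimal Python sketch of a timeslot check that also honours the days of the week, which is the part the daily option cannot express. This is not the plugin's actual code: the function name, its parameters, and the assumption that the log lines are from 2023 are all mine, and it assumes a window that does not cross midnight.

```python
from datetime import datetime, time


def in_increment_window(now, start, end, weekdays):
    """Return True if `now` falls inside the allowed increment window.

    `weekdays` uses datetime.weekday() numbering: Monday=0 ... Sunday=6.
    Hypothetical helper; assumes start < end (no window across midnight).
    """
    return now.weekday() in weekdays and start <= now.time() < end


# Window from the thread: Monday to Friday, 08:00 to 11:00.
WEEKDAYS = {0, 1, 2, 3, 4}
START, END = time(8, 0), time(11, 0)

# Timestamp taken from the syslog line (year 2023 is an assumption).
mover_finished = datetime(2023, 11, 7, 8, 0, 46)
should_resume = in_increment_window(mover_finished, START, END, WEEKDAYS)
```

With a check like this, a mover run finishing at 08:00:46 on a weekday would count as inside the window, while the same time on a Saturday would not.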

 

I think I have a solution for an alternative way of handling the timeslot checking which will also work when using custom scheduling, but I want to make sure it is well tested before I release it.

Link to comment
  • 3 weeks later...

I have just pushed an update with the most notable changes being:

- Improved handling of custom scheduling being used for increment pause/resume times

- Set most notifications to be at normal (green) priority.   The ones left at warning (orange) and error (red) level are ones that the user really wants to take notice of.   I would welcome any feedback on whether this is an improvement, or suggestions for changing the priority of specific notifications.

 

Hopefully I have not broken anything, but please report any anomalies that are spotted.

 

Link to comment
2 hours ago, Masterwishx said:

Assistant page is working, but throws the same error in syslog

It appears that some DOS end-of-line characters have crept into the .page files, and this is upsetting the underlying language translation sub-system.    I have just made sure that all files have Linux-style end-of-line characters and am now checking to see if this rectifies the issue.

 

EDIT:   Just pushed an update that corrects the EOL issue as it fixes the problem on my test system.   Please confirm if it fixes it for you as well.

 

The effect of the problem was that the plugin was functioning in terms of its background tasks, but the Settings page was not displaying.
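
For anyone curious about what the end-of-line fix amounts to, here is a minimal Python sketch that detects DOS (CRLF) line endings and converts them to Linux (LF) style. It is only an illustration of the technique, not what was actually used for the plugin, and the function names are hypothetical.

```python
def has_dos_eol(data: bytes) -> bool:
    """Return True if the data contains any DOS-style CRLF line ending."""
    return b"\r\n" in data


def to_unix_eol(data: bytes) -> bytes:
    """Convert CRLF line endings to LF; all other bytes are left untouched."""
    return data.replace(b"\r\n", b"\n")


# Example with in-memory content standing in for a .page file.
sample = b"first line\r\nsecond line\r\n"
fixed = to_unix_eol(sample)
```

Working on bytes rather than text avoids any automatic newline translation by the file layer, so the result is exactly what gets written back.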

 

Link to comment
  • 4 weeks later...

@itimpi Thank you for this awesome plugin! 

 

Question for ya. I just kicked off my first parity check with the PCT plugin installed and have my parity checks scheduled from 1am to 9am until the check is done.

 

My first increment paused at 28%. My question is: can you transfer, remove, or add files on the data drives during this pause without causing issues when the parity check resumes?

 

 

Link to comment
2 hours ago, DaveHavok said:

My first increment paused at 28%. My question is: can you transfer, remove, or add files on the data drives during this pause without causing issues when the parity check resumes?

Yes - I do this all the time.   The Unraid parity operations handle this correctly.  You can even do it if you are rebuilding a parity or data drive without it causing problems.

Link to comment
On 1/4/2024 at 12:59 AM, itimpi said:

 

Yes - I do this all the time.   The Unraid parity operations handle this correctly.  You can even do it if you are rebuilding a parity or data drive without it causing problems.


Excellent! Many thanks and so far, no issues with the plug-in. 

- Tonight is the last night it will run for my quarterly parity check
- I have the parity check running from 1am - 9am to reduce impact on Plex traffic
- It will take a total of 4 evenings to finish a parity check of a 14TB drive
- I do have the heat pause options configured and enabled - this is an excellent option and it's been working great!
- When temperatures are within 2 degrees of the warning threshold, it pauses until an 8 degree drop and then starts back up again.

I would even say this should be a required plugin, given how powerful and useful it is. 
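
The pause/resume behaviour described in the list above is a form of hysteresis, and a minimal Python sketch of the idea looks like this. It is not the plugin's code: the class, its field names, and the 45° warning value are hypothetical, and I am assuming the 8 degree drop is measured from the temperature at which the pause triggers.

```python
from dataclasses import dataclass


@dataclass
class TempPauseController:
    """Hypothetical hysteresis controller; not the plugin's real code."""
    warning_temp: int       # disk warning threshold in degrees C
    pause_margin: int = 2   # pause when within this many degrees of warning
    resume_drop: int = 8    # resume after cooling this far below the pause point
    paused: bool = False

    def update(self, hottest_disk_temp: int) -> str:
        """Feed the hottest disk temperature; returns the action to take."""
        pause_at = self.warning_temp - self.pause_margin
        resume_at = pause_at - self.resume_drop
        if not self.paused and hottest_disk_temp >= pause_at:
            self.paused = True
            return "pause"
        if self.paused and hottest_disk_temp <= resume_at:
            self.paused = False
            return "resume"
        return "no change"


# Assumed warning threshold of 45: pause at 43, resume at 35.
controller = TempPauseController(warning_temp=45)
actions = [controller.update(t) for t in (40, 43, 40, 35)]
```

The gap between the pause and resume points is what stops the check from flapping on and off while the disks hover around a single threshold.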

Link to comment
6 hours ago, DaveHavok said:


Excellent! Many thanks and so far, no issues with the plug-in. 

 

Thanks for the great feedback and confirmation that all is working well.   

Link to comment
