[Plugin] Parity Check Tuning



Since I have remote Plex users these days, I decided I needed to do something about my monthly parity check.

 

I manually paused the monthly during the day, and after restarting and completing it on the 2nd night, I installed this plugin.

 

I get the error mentioned when checking History, but the plugin hasn't actually needed to do anything yet since there haven't been any parity checks since I installed it.

 

Don't know if that is useful to debugging this issue or not.

Link to post

3 hours ago, trurl said:

Since I have remote Plex users these days, I decided I needed to do something about my monthly parity check.

 

I manually paused the monthly during the day, and after restarting and completing it on the 2nd night, I installed this plugin.

 

I get the error mentioned when checking History, but the plugin hasn't actually needed to do anything yet since there haven't been any parity checks since I installed it.

 

Don't know if that is useful to debugging this issue or not.


The error displayed trying to look at the parity check history is completely divorced from the main operation of the plugin so in that sense you should not be affected by it.

 

In addition, it does not seem to be fatal even when trying to display the history.  However, since the line it complains about is in a file that handles multi-language support in unRaid, this may not be true if you have a language other than English set in unRaid (I have not tested whether this is the case).   I am still trying to track down why it happens in the first place on one of my unRaid systems and not on another, both of which are running unRaid 6.9.1.

Link to post
10 hours ago, itimpi said:

 

Just thought I would let you know that tracking down the cause of this bug is proving more elusive than I had expected.  As I mentioned I can reproduce this bug on one 6.9.1 system and not another.  I have checked and they both have identical copies of the script for displaying the history, and on one it works without error and the other displays the error you see.  Still trying to pin down the difference that causes this issue.

I have this message too.  Is there anything that I can provide that may help?

Link to post
22 minutes ago, S80_UK said:

I have this message too.  Is there anything that I can provide that may help?

Not at the moment as I have a system that can reproduce the symptoms.

Link to post

Finally tracked down what has been causing the error line when displaying the Parity Check history.    Turned out to be a hidden ESC character that had crept into the script file.   Was not enough to cause a syntax check of the file to show an error, but it was enough to upset displaying the history results (deep inside one of the unRaid GUI support functions).   I will shortly issue an update to fix this.
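
For anyone wanting to check their own files for this kind of thing, here is a minimal sketch of scanning for stray control bytes such as ESC (0x1b). It is not the plugin's code: the helper name and the sample content are made up for illustration.

```python
# Hypothetical helper: report control bytes (such as ESC, 0x1b) hiding in a file.
ALLOWED = {0x09, 0x0A, 0x0D}  # tab, line feed and carriage return are expected


def find_control_bytes(data: bytes):
    """Return a list of (line, column, byte_value) for unexpected control bytes."""
    hits = []
    line, col = 1, 0
    for b in data:
        col += 1
        if (b < 0x20 or b == 0x7F) and b not in ALLOWED:
            hits.append((line, col, b))
        if b == 0x0A:  # a line feed starts a new line
            line += 1
            col = 0
    return hits


# Made-up content with an ESC hidden at the end of the second line.
sample = b"line one\nline two\x1b\nline three\n"
for line, col, value in find_control_bytes(sample):
    print(f"line {line}, column {col}: byte 0x{value:02x}")
```

To check a real file, pass it the result of `open(path, "rb").read()`. Reading in binary mode matters, since a text editor or text-mode read may silently hide or mangle the offending byte.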

Link to post

Hi @itimpi - I just thought I would update you.  I have run several scheduled and manual checks over the past few days  using 6th April version and 2nd April version, and I have tested with and without scheduled pauses of the checking.  All tests have worked and run to completion exactly as they should.  The only cases I have not tested are for unplanned checks after an unclean shutdown. 

 

So thanks again for your work on this, it's looking very good. 

Link to post
1 minute ago, S80_UK said:

Hi @itimpi - I just thought I would update you.  I have run several scheduled and manual checks over the past few days  using 6th April version and 2nd April version, and I have tested with and without scheduled pauses of the checking.  All tests have worked and run to completion exactly as they should.  The only cases I have not tested are for unplanned checks after an unclean shutdown. 

 

So thanks again for your work on this, it's looking very good. 


Thanks for the feedback.   Nice to know there is no issue outstanding that I should be looking at with any sense of urgency.  
 

I think the unplanned checks should be OK: since it was a new feature, it got a bit more testing in the recent releases than previously existing functionality.

Link to post
On 4/2/2021 at 1:12 PM, itimpi said:

 

Not sure why you should get this if on the latest release of the plugin :( 

 

To help with diagnosing the cause, can you install the new version of the plugin (2021-04-02) I have just released and then, in the plugin's settings, set the logging level to "Testing" and select one of the options for logging to flash?   The plugin will now create a file called 'parity.tuning.log' in the plugins folder on the flash drive when it is running, so if you can recreate the issue and let me have that file it should help me pinpoint where things are going wrong.

 

 

Hi, I did not really have the time over the past few weeks, but now I have.  I installed the newest version, 2021.04.13, and set it to Testing with logging to both Syslog and Flash. 

 

0,7% later I get the error while the drives aren't hot (36-37°C). 

 

You said the log is the "parity.tuning.log", but I've only a "parity.check.tuning.log". Is that the one? It is in the folder "/boot/config/plugins/parity.check.tuning". I saved all of the files from this specific folder.

 

I attached the file and hope it helps to make things clear. Maybe you need another file or something else? Just ask and I can provide it. Thank you for your help. The temperature feature is THE reason I installed this plugin. 

 

Kind Regards

 

Nils

 

 

EDIT:

What I don't understand: when the check is running at about 260MB/s, why should it need over 13 hours when it reaches 0,7% in less than a minute? If 0,7% took exactly a minute, the estimated time should be about 143 minutes.

 

Edit 2:

I resumed and it stopped again at 1,4% (2x0,7...). with disk temperatures of 38, 39 and 40°C. 

What I noticed: I could not click on resume, because the button showed "pause". Refreshing the site helps and the button showed "resume".

 

Edit 3:

Same with 2,1%. Same temperatures as with 1,4%. It could be a very big coincidence, but it seems to pause with every 0,7% of progress.

 

Edit 4:

2,8% was no problem, but it stopped at 2,9%. Maybe some rounding? I resumed and disabled pause when overheating. When the check reached 4,2% I re-enabled the pause when overheating.

0,4% later, at 4,6%, it paused again. I attached that log, too.

 

parity.check.tuning.log

parity.check.tuning.log

Edited by Marino
Link to post
52 minutes ago, Marino said:

but I've only a "parity.check.tuning.log"

That is the correct file - sorry about giving you the wrong name initially.

54 minutes ago, Marino said:

What I don't understand: when the check is running at about 260MB/s, why should it need over 13 hours when it reaches 0,7% in less than a minute? If 0,7% took exactly a minute, the estimated time should be about 143 minutes.

Early on the speed is just a rough estimate - it gets more accurate as you go further.
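
For reference, the 143-minute figure in the question comes from a straight-line extrapolation. A small sketch of that arithmetic (the function name is made up, and this is not how unRaid works out its own estimate):

```python
def naive_total_minutes(percent_done: float, elapsed_minutes: float) -> float:
    """Extrapolate the total run time, assuming progress stays perfectly linear."""
    if percent_done <= 0:
        raise ValueError("need some progress to extrapolate from")
    return elapsed_minutes * 100.0 / percent_done


# 0.7% completed after one minute, as in the question above.
print(round(naive_total_minutes(0.7, 1.0)))  # 143
```

With only a minute of data, a small error in either number moves the result a long way, which fits the point that the early figure is only a rough estimate.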

 

56 minutes ago, Marino said:

What I noticed: I could not click on resume, because the button showed "pause". Refreshing the site helps and the button showed "resume".

 

You are correct - you have to do a refresh to get it to show correctly.   I have asked if there is any way to force a refresh from within the plugin but had no reply that gives a way to do this.

 

58 minutes ago, Marino said:

Same with 2,1%. Same temperatures as with 1,4%. It could be a very big coincidence, but it seems to pause with every 0,7% of progress.

It will be a coincidence as there is nowhere in the plugin that is monitoring the percentage - it is just used for display purposes.

 

I'll give more feedback when I have had a chance to look at that log

Link to post

@Marino Found out what looks like the cause of your temperature problems.   I think the plugin is working correctly but you have misunderstood the way the temperature values are used for the temperature related pause/resume :(   From the log I think that you have entered actual temperatures rather than the amount away from the warning threshold set for the drive in the drive's settings?

The reason you do not get an immediate pause is that the task that looks for over-heating drives only runs at regular intervals (currently set to be every 7 minutes). 

 

As an example if the warning threshold on a particular drive is 50C then entering values of 

Pause=2 means pause at 48C (50-2)

Resume=7 means resume at 43C (50-7)

Unraid provides a global value for the warning threshold under Settings->Disk Settings but allows you to override the global value at the individual drive level by clicking on it on the Main tab. The plugin works this way as different drives can have different values set at the unRaid level so using relative values means each drive can potentially have different pause/resume temperatures. 
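
The relative-offset rule described above can be sketched as follows. This is only an illustration in Python, not the plugin's code: the function name is made up, and the 50C global warning threshold is just the value from the example.

```python
GLOBAL_WARNING_C = 50  # assumed global warning threshold, as in the example above


def thresholds(pause_offset, resume_offset, drive_warning=None,
               global_warning=GLOBAL_WARNING_C):
    """Return (pause_at, resume_at) for one drive.

    A per-drive warning threshold, if set, overrides the global one.
    """
    warning = drive_warning if drive_warning is not None else global_warning
    return warning - pause_offset, warning - resume_offset


print(thresholds(2, 7))                    # (48, 43): the worked example above
print(thresholds(2, 7, drive_warning=45))  # (43, 38): drive with its own threshold
print(thresholds(47, 42))                  # (3, 8): absolute temperatures entered by mistake
```

The last line shows why entering actual temperatures leads to repeated pauses: with an assumed 50C warning threshold, a pause value of 47 means "pause at 3C", which any running drive exceeds.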

 

Can you please confirm that my analysis is correct?   If it is I will enhance the built-in help with a worked example of the type given above.  I will also add some sort of upper limit to the values that can be entered to try and pick this type of misunderstanding up from the outset on the plugin's settings page.

Link to post

Yes, this is correct. I entered 47°C for pause and 42°C for resume. Could it be that this was the correct way to enter the temperature in the past? I haven't used the Unraid server very much in the last several years. Before that I used the plugin and it was working.

 

Maybe these are my old settings on a newer plugin? I also used time for pause and resume in the past. Now I see it is in crontab format. This has changed too. Maybe the settings for temperature too?

 

Thanks for looking at the log. I am in the middle of checking parity, as I am getting a new drive which should become the new parity drive, and the "old 12 TB" drive should then become a data disk. In the past the temperatures increased a lot while building parity and I don't know why. This plugin should help me get this job done without any damage.

 

The first time I built parity on this (3x12TB), the disks reached 53°C while sitting in the middle of the airflow. I built the first parity with an open window in winter (cold and dry outside)... That's why the plugin is important for me. Now 60% of the parity has been checked and the temperatures are the same as at the beginning (39-40°C). I don't know why they increased that much when building the first parity. 

Link to post
31 minutes ago, Marino said:

Yes, this is correct. I entered 47°C for pause and 42°C for resume. Could it be that this was the correct way to enter the temperature in the past? I haven't used the Unraid server very much in the last several years. Before that I used the plugin and it was working.

It has always been specified this way.   At one point the temperature option was not working properly, so it may well be that when you had that setting it was simply having no effect.

 

32 minutes ago, Marino said:

I also used time for pause and resume in the past. Now I see it is in crontab format. This has changed too. Maybe the settings for temperature too?


If you have specified Daily (which is the default) for the frequency then you specify time in hours + minutes.   Originally this was the only option.     If you specify Custom as the frequency then you can use cron tab format which gives you more control at the expense of not being as convenient to use.   This was added some time ago now and is very useful to me when testing as it allows for options that are not simply daily.
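
To relate the two formats: a daily time corresponds to a standard five-field cron expression (minute, hour, day of month, month, day of week). A minimal sketch with a made-up helper name; it shows standard cron syntax only and is not taken from the plugin's code:

```python
def daily_to_cron(hour: int, minute: int) -> str:
    """Build the standard cron expression for 'every day at hour:minute'."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Field order is: minute hour day-of-month month day-of-week
    return f"{minute} {hour} * * *"


print(daily_to_cron(22, 30))  # 30 22 * * *
print(daily_to_cron(6, 0))    # 0 6 * * *
```

A Custom entry can then vary the last three fields, which is what allows schedules that are not simply daily.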

Link to post

Maybe that is the case. I set it to 47°C, the option wasn't working, and I didn't notice because the check ran without problems and the disks weren't hot. The last time I checked the parity was over a year ago (most of the time the server was not running).

 

Good to hear that Daily uses the normal time format. Because the server is switched off most of the time, I'll start the parity check manually, so the "normal" time format fits better for me ;)

 

Thank you for your help and for your awesome plugin!

Link to post
  • 3 weeks later...

Having a problem with parity check reporting overheated, but I cannot figure out why:

 

Quote

2021 May 01 12:39:15 TOWER Parity Check Tuning: TESTING: ----------- UPDATECRON begin ------
2021 May 01 12:39:15 TOWER Parity Check Tuning: TESTING: Deleted cron marker file 
2021 May 01 12:39:15 TOWER Parity Check Tuning: DEBUG:   created cron entry for scheduled pause and resume
2021 May 01 12:39:15 TOWER Parity Check Tuning: DEBUG:   created cron entry for default monitoring 
2021 May 01 12:39:15 TOWER Parity Check Tuning: TESTING: updated cron settings are in /boot/config/plugins/parity.check.tuning/parity.check.tuning.cron
2021 May 01 12:39:15 TOWER Parity Check Tuning: TESTING: ----------- UPDATECRON end ------
2021 May 01 13:17:02 TOWER Parity Check Tuning: TESTING: ----------- MONITOR begin ------
2021 May 01 13:17:02 TOWER Parity Check Tuning: TESTING: progress marker file present
2021 May 01 13:17:02 TOWER Parity Check Tuning: TESTING: disks marker file present
2021 May 01 13:17:02 TOWER Parity Check Tuning: TESTING: hot marker file present
2021 May 01 13:17:02 TOWER Parity Check Tuning: DEBUG:   Parity check appears to be paused
2021 May 01 13:17:02 TOWER Parity Check Tuning: TESTING: plugin temperature settings: Pause 3, Resume 8
2021 May 01 13:17:02 TOWER Parity Check Tuning: TESTING: Drive 84 appears to be critical
2021 May 01 13:17:02 TOWER Parity Check Tuning: TESTING: parity temp=84, status=critical (drive settings: hot=27, cool=18)
2021 May 01 13:17:02 TOWER Parity Check Tuning: TESTING: Drive 84 appears to be critical
2021 May 01 13:17:02 TOWER Parity Check Tuning: TESTING: disk1 temp=84, status=critical (drive settings: hot=27, cool=18)
2021 May 01 13:17:03 TOWER Parity Check Tuning: TESTING: Drive 82 appears to be critical
2021 May 01 13:17:03 TOWER Parity Check Tuning: TESTING: disk2 temp=82, status=critical (drive settings: hot=27, cool=18)
2021 May 01 13:17:03 TOWER Parity Check Tuning: TESTING: Drive 91 appears to be critical
2021 May 01 13:17:03 TOWER Parity Check Tuning: TESTING: disk3 temp=91, status=critical (drive settings: hot=27, cool=18)
2021 May 01 13:17:03 TOWER Parity Check Tuning: DEBUG:   array drives=4, hot=4, warm=0, cool=0
2021 May 01 13:17:03 TOWER Parity Check Tuning: Paused Correcting Parity Check  (3.7%% completed): Following drives overheated: parity(84) disk1(84) disk2(82) disk3(91) 
2021 May 01 13:17:03 TOWER Parity Check Tuning: TESTING: PAUSE (HOT) record to be written
2021 May 01 13:17:04 TOWER Parity Check Tuning: TESTING: written PAUSE (HOT) record to  progress marker file 
2021 May 01 13:17:04 TOWER Parity Check Tuning: TESTING: ----------- MDCMD begin ------
2021 May 01 13:17:04 TOWER Parity Check Tuning: TESTING: progress marker file present
2021 May 01 13:17:04 TOWER Parity Check Tuning: TESTING: disks marker file present
2021 May 01 13:17:04 TOWER Parity Check Tuning: TESTING: hot marker file present
2021 May 01 13:17:04 TOWER Parity Check Tuning: DEBUG:   detected that mdcmd had been called from sh with command mdcmd nocheck PAUSE 
2021 May 01 13:17:04 TOWER Parity Check Tuning: TESTING: CANCELLED record to be written
2021 May 01 13:17:04 TOWER Parity Check Tuning: TESTING: written CANCELLED record to  progress marker file 
2021 May 01 13:17:05 TOWER Parity Check Tuning: TESTING:  array operation still running - so not time to analyze progess
2021 May 01 13:17:05 TOWER Parity Check Tuning: TESTING: Deleted cron marker file 
2021 May 01 13:17:05 TOWER Parity Check Tuning: DEBUG:   created cron entry for scheduled pause and resume
2021 May 01 13:17:05 TOWER Parity Check Tuning: DEBUG:   created cron entry for default monitoring 
2021 May 01 13:17:05 TOWER Parity Check Tuning: TESTING: updated cron settings are in /boot/config/plugins/parity.check.tuning/parity.check.tuning.cron
2021 May 01 13:17:05 TOWER Parity Check Tuning: TESTING: ----------- MDCMD end ------
2021 May 01 13:17:05 TOWER Parity Check Tuning: TESTING: Heat notification message: Pause: Following drives overheated: parity(84) disk1(84) disk2(82) disk3(91) 
2021 May 01 13:17:06 TOWER Parity Check Tuning: TESTING: Send notification: Pause: Following drives overheated: parity(84) disk1(84) disk2(82) disk3(91) <br>Correcting Parity Check (3.7%% completed)
2021 May 01 13:17:06 TOWER Parity Check Tuning: TESTING: ... using /usr/local/emhttp/webGui/scripts/notify -e 'Parity Check Tuning' -i 'normal' -s '[TOWER] Pause' -d 'Following drives overheated: parity(84) disk1(84) disk2(82) disk3(91) <br>Correcting Parity Check (3.7%% completed)'
2021 May 01 13:17:10 TOWER Parity Check Tuning: TESTING: ----------- MONITOR end ------
 

My drive settings have the temperature warning at 45C and each drive shows the same.  I don't know where the 27C value for the pause is coming from.

 

Any ideas where to check?

Link to post

Strange - it looks like some of those messages may be reported in Celsius and others in Fahrenheit?    I wonder if that is causing an inconsistency somewhere?   What temperature unit do you have set under the unRaid Display settings?   Also, can you check which version of the plugin is installed?
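
To illustrate the suspicion: if the drive readings in the log (82 to 91) are in Fahrenheit while the thresholds (hot=27, cool=18) are in Celsius, the comparison is mixing units. A quick sketch of the conversion follows; treating the readings as Fahrenheit is an assumption about the log, not confirmed behaviour.

```python
def f_to_c(temp_f: float) -> float:
    """Convert a temperature from Fahrenheit to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# Readings copied from the log above; their unit is assumed to be Fahrenheit.
readings = {"parity": 84, "disk1": 84, "disk2": 82, "disk3": 91}
for name, temp in readings.items():
    print(f"{name}: {temp}F is about {f_to_c(temp):.1f}C")
```

Read that way, the drives would be at roughly 28 to 33C, which are ordinary operating temperatures and well below a 45C warning threshold.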

 

It might also be useful if you can provide the contents of your config/plugins/dynamix/dynamix.cfg file from the flash drive so I can check the temperature unit setting on your system.   I would expect the temperatures in the log messages to have C or F appended, and that does not appear to be happening.   On checking the code, the only way I can see this happening is if a temperature unit type is not set in the .cfg file, as the plugin does not assume any default (which I can change so it does).

 

EDIT:  I have found that you can definitely get some unexpected behaviour if the temperature unit is not set at the unRaid level.  I am ready to push out an update for the plugin that applies a default if it is not set at the unRaid level.  I would be grateful if you could let me know whether going into Settings -> Display Settings, setting Celsius and hitting Apply fixes your problem, as that would confirm I am fixing the correct issue.
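
The "apply a default if not set" idea can be sketched like this. It is an illustration in Python rather than the plugin's own code; the function name is made up, and the `[display]` section and `unit` key are taken from the layout of dynamix.cfg as posted in this thread.

```python
import configparser


def temperature_unit(cfg_text: str, default: str = "C") -> str:
    """Return the display temperature unit, falling back to a default if unset."""
    # interpolation=None so that values containing '%' (e.g. date="%c") are left alone
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    raw = parser.get("display", "unit", fallback="")
    unit = raw.strip().strip('"')  # values in the file are wrapped in double quotes
    return unit if unit in ("C", "F") else default


with_unit = '[display]\ndate="%c"\nunit="F"\n'
without_unit = '[display]\ndate="%c"\n'
print(temperature_unit(with_unit))     # F
print(temperature_unit(without_unit))  # C
```

The fallback covers both a missing `unit` entry and a missing `[display]` section, so later code always has a unit to work with.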

 

 

 

 

Link to post
Quote

[parity]
mode="3"
day="6"
hour="0 0"
write=""
dotm="1"
[ssmtp]
service="::NO:NO:none"
SetEmailPriority="True"
Subject="unRAID Status: "
port="465"
UseTLS="YES"
UseSTARTTLS="NO"
UseTLSCert="NO"
[notify]
entity="1"
normal="3"
warning="3"
alert="3"
unraid="3"
plugin="3"
docker_notify="3"
report="3"
display="0"
date="d-m-Y"
time="H:i"
position="top-right"
path="/tmp/notifications"
system="*/1 * * * *"
unraidos="11 0 * * 1"
version="10 0 1 * *"
docker_update="10 0 1 * *"
status="20 0 * * 1"
[display]
font=""
date="%c"
number=".,"
scale="-1"
tabs="1"
users="Tasks:3"
resize="0"
wwn="0"
total="1"
usage="0"
header=""
background=""
banner=""
dashapps="icons"
theme="white"
text="1"
unit="C"
 

 

Here is the dynamix.cfg (some entries removed email/password etc).

 

I did try to make sure that the Display settings were C.  When I viewed the settings, it was C but I toggled it to F and back to C.  Then resumed parity check.  It failed again.

Link to post
