rsbuc

Members
  • Posts

    26
Posts posted by rsbuc

  1. Hello! I was having an issue before where my incremental parity checks were not reading the disk temperatures correctly when the disks had spun down (they were reporting "=*").

     

    I have updated to the latest version of the Parity Tuning Script, and now the script doesn't appear to be collecting/detecting the disk temperature at all anymore.

     

    Here is a snippet from the syslog (with Testing logs enabled):

     

    ***

     

     

    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ----------- MONITOR begin ------
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR /boot/config/forcesync marker file present
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR manual marker file present
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR parityTuningActive=1, parityTuningPos=886346616
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR appears there is a running array operation but no Progress file yet created
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ... appears to be manual parity check
    Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR MANUAL record to be written
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR Current disks information saved to disks marker file
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR written header record to  progress marker file
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ... appears to be manual parity check
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR written MANUAL record to  progress marker file
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR Creating required cron entries
    Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   Created cron entry for scheduled pause and resume
    Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   Created cron entry for 6 minute interval monitoring
    Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   updated cron settings are in /boot/config/plugins/parity.check.tuning/parity.check.tuning.cron
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR CA Backup not running, array operation paused
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ... no action required
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR global temperature limits: Warning: 50, Critical: 55
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR plugin temperature settings: Pause 3, Resume 8
    Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   array drives=0, hot=0, warm=0, cool=0, spundown=0, idle=0
    Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   Array operation paused but not for temperature related reason
    Mar 11 13:30:22 219STORE Parity Check Tuning: TESTING:MONITOR ----------- MONITOR end ------
     

    ***

     

    The Parity Check Tuning log clearly shows "warm=0, cool=0, spundown=0", but there are several disks above 55°C.
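
For anyone checking their own syslog for the same symptom, here is a minimal sketch that pulls the counts out of the DEBUG summary line quoted above. The line format is copied from that excerpt; the names `parse_summary` and `no_drives_seen` are made up for illustration and are not part of the plugin.

```python
import re

# Format copied from the DEBUG summary line in the syslog excerpt above.
SUMMARY_RE = re.compile(
    r"array drives=(\d+), hot=(\d+), warm=(\d+), cool=(\d+), "
    r"spundown=(\d+), idle=(\d+)"
)
FIELDS = ("drives", "hot", "warm", "cool", "spundown", "idle")


def parse_summary(line):
    """Return the per-state drive counts from a summary line, or None."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return dict(zip(FIELDS, map(int, match.groups())))


def no_drives_seen(counts):
    """True when the monitor pass counted no array drives at all."""
    return counts["drives"] == 0


line = ("Mar 11 13:30:22 219STORE Parity Check Tuning: DEBUG:   "
        "array drives=0, hot=0, warm=0, cool=0, spundown=0, idle=0")
counts = parse_summary(line)
print(counts, no_drives_seen(counts))
```

In the excerpt above the drive count itself is 0, so this check would flag that pass as having seen no drives, rather than drives that were all cool.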

     

    And here's a screenshot of the disk temps in the web UI.

     

    (thanks again for reading this message)

    2023-03-11_diskTemps_Screenshot 2023-03-11 133428.png

  2. 2 hours ago, itimpi said:

    The plugin always treats the case where the temperature is returned as '*' due to spindown as 'cool' so there needs to be something else going on.   I will see if I can work out what it is from the syslog you provided.

    No worries, I appreciate the effort. If you'd like more info, let me know.

  3. On 12/16/2022 at 8:16 AM, itimpi said:

    If you think the plugin is not correctly resuming when drives cool down, then perhaps you can try turning on the Testing level of logging in the plugin and sending me the resulting logs as that will allow me to see the fine detail of what the plugin is doing under the covers.   Testing the temperature related stuff is extremely tricky as my systems do not suffer from heat issues so I have to artificially try to set up tests to simulate temperature issues.

    I've finally had a few mins to test this out with the TESTING log mode enabled. I think you were hinting at what I've seen.

     

    When the array goes into 'overheat mode' and the parity check pauses, the disks eventually spin down and the temperature value in the log goes to "Temp=*" instead of showing an actual Temperature value, so the Parity Check Tuning script doesn't see a valid numerical temperature value to resume the parity check process.

     

    After waiting ~12 minutes, I manually clicked 'Spin up disks', and 6 minutes later the parity check resumed, since the plugin could see the temperature values once the disks were spun up.
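
A rough sketch of the resume decision described here, not the plugin's actual code. It assumes a spun-down disk reports "*" and that such a reading should count as cool (which is how itimpi describes the intended behaviour in the March 2023 reply above). It also assumes the Pause and Resume settings are degrees below the warning temperature; the values 50, 3 and 8 are the ones shown in the syslog excerpt in the first post. The function names are hypothetical.

```python
def classify(temp, warning=50, pause_offset=3, resume_offset=8):
    """Classify one disk reading as 'hot', 'warm' or 'cool'.

    Assumption: both offsets are degrees below the warning temperature.
    """
    if temp == "*":
        # Spun-down disk: no numeric reading, so treat it as cool.
        return "cool"
    value = int(temp)
    if value >= warning - pause_offset:
        return "hot"
    if value <= warning - resume_offset:
        return "cool"
    return "warm"


def may_resume(readings):
    """Allow a resume only when every disk is classified as cool."""
    return all(classify(t) == "cool" for t in readings)


print(may_resume(["*", "*", "41"]))
print(may_resume(["*", "45"]))
```

Under these assumptions, an array of spun-down disks would be eligible to resume on the next monitoring pass without a manual spin-up, which is not what the log above shows.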

     

    I'm attaching my syslog.

    syslog.txt

  4. 2 hours ago, itimpi said:

    This is the basic behaviour as long as the time is within the overall time slot set for an increment.   How long a temperature related pause will last depends on how quickly your drives cool down to reach the resume temperature threshold.  The plugin will take into account if you have set specific temperature threshold settings at the Unraid level on a drive over-riding the global ones.  You may find the Debug logging level helps with a basic understanding of what the plugin is doing without having to know too much detail of the underlying mechanisms being used.

     

    Once you get outside the time slot for the overall increment then the plugin will pause the check and the temperature related pause/resume will stop happening (until the time comes around to start the next increment).

     

    If I can provide any further clarification then please ask.  As a new user if you can think of items I could add to the built-in help that would have helped you then please feel free to suggest them.

    Interesting, I've enabled Debug logging, and that totally demystifies a lot of what the plugin is doing (Thanks for that). Here is what I'm seeing (I'm sure I have a bad setting or something) -- I start the parity check, it runs for an hour or so, then the hard drives hit their temperature limit, and the parity check pauses. The drives spin down, and the drives cool off, but the plugin doesn't seem to resume the parity operations.

     

    If I "Spin up all disks" it will detect the drive temperatures as being cool again and resume the parity check.

     

    Are there special disk settings that I need to enable for this to work properly?

    (Also, thanks again for trying to help me out!)

    Screenshot 2022-12-15 152350.png

    Screenshot 2022-12-15 152505.png

  5. 17 hours ago, itimpi said:

    I think you are over-thinking this!     You only want to set the increment pause/resume times to define the maximum time period you want the parity check to potentially run.

     

    You then set the temperature related pause resume values and as long as you are within the increment period the plugin will pause/resume the check based on disk temperatures.    You may also want to have aggressive spin down times on the drives as experience has shown that simply keeping them spinning even if no I/O is taking place significantly extends the cool down time.

    Hello! Am I understanding this correctly? The plugin will pause the parity operation when the disks reach the temperature threshold and wait until the temps fall below the temperature threshold value -- then the script immediately resumes the parity operations? Or will it only attempt to resume after the 'Increment resume time' schedule?
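
Going by itimpi's description quoted above, the increment times only bound the window, and temperature pause/resume happens inside it. A minimal sketch of that decision for one monitoring pass follows. All names are hypothetical, the window times in the example are invented, and the sketch ignores manual pauses and other plugin conditions.

```python
from datetime import time


def in_window(now, resume_at, pause_at):
    """True if `now` is inside the increment window (may wrap past midnight)."""
    if resume_at <= pause_at:
        return resume_at <= now < pause_at
    return now >= resume_at or now < pause_at


def next_action(now, resume_at, pause_at, running, any_hot, all_cool):
    """Return 'pause', 'resume' or 'none' for one monitoring pass."""
    if not in_window(now, resume_at, pause_at):
        # Outside the increment window the check stays paused.
        return "pause" if running else "none"
    if running and any_hot:
        return "pause"
    if not running and all_cool:
        return "resume"
    return "none"


# Example window: 22:00 to 06:00 (invented values).
print(next_action(time(1, 0), time(22, 0), time(6, 0),
                  running=False, any_hot=False, all_cool=True))
```

In this reading, a temperature-related resume does not wait for the next 'Increment resume time'; it happens on the first monitoring pass inside the window that finds the disks cool.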

  6. Hey Everyone! I've been trying to get the "Increment Frequency/Custom" working for what I need, but I'm struggling.

     

    I have cooling issues with my Unraid server, and my goal is to let the Parity Check 'Pause when disks overheat', then have the Custom Increment frequency pause the parity operations for ~30 mins to let the disks cool down, and then resume them (or at least check whether the disks have cooled down enough before resuming).

     

    Clearly my cron skills are weak, is there an "Increment Resume Time" and "Increment Pause Time" that someone can suggest?

     

    (thanks again for all the awesome features in the Parity Check Tuning plugin!)

     
