SMART test spindown reschedule/delay



Please consider adding some form of spindown reschedule or delay while a SMART self-test is in progress.

One way to detect this would be to check the SMART status right before issuing hdparm -y.

 

root@rgclws:/home/rcotrone $ smartctl -c -lselftest /dev/sdb
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-573.3.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline 
data collection:                (  617) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 143) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1081) SCT Status supported.


SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress 90%      1403         -
# 2  Extended offline    Completed without error       00%        54         -
# 3  Short offline       Completed without error       00%        48         -
# 4  Conveyance offline  Completed without error       00%        48         -
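The number in parentheses on the "Self-test execution status" line is the raw status byte. As I understand the ATA self-test status layout, the upper four bits hold the status code (15 = test in progress, 0 = completed without error or never run) and the lower four bits hold the remaining work in 10% steps. That matches the output above: 249 is 0xF9, i.e. in progress with 90% remaining. A small decoding sketch; decode_selftest_status is a hypothetical name, not a smartctl or unRAID function:

```shell
#!/bin/bash
# Hypothetical helper: decode the self-test execution status byte that
# "smartctl -c" prints in parentheses, e.g. "( 249)".
# Upper nibble = status code (15 = in progress), lower nibble = remaining
# work in 10% steps (assumed ATA layout, see lead-in).
decode_selftest_status() {
  local byte=$1
  local status=$(( byte >> 4 ))
  local remaining=$(( (byte & 0x0F) * 10 ))
  if (( status == 15 )); then
    echo "in progress, ${remaining}% remaining"
  elif (( status == 0 )); then
    echo "completed without error (or never run)"
  else
    # Codes 1-8 describe how a finished or aborted test ended
    echo "not running (status code $status)"
  fi
}

decode_selftest_status 249   # 249 = 0xF9 -> "in progress, 90% remaining"
```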

 

Using these values, you can tell whether a test is in progress and then either reschedule the spindown or poll for it at some interval.

e.g.

Extended self-test routine recommended polling time:        ( 143) minutes.
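To poll at an interval derived from these values, the recommended polling time can be pulled out of the smartctl -c output. A minimal sketch, assuming the two-line layout smartctl 5.43 prints above (the one-line form quoted here is handled too); polling_minutes is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: read "smartctl -c" output on stdin and print the
# recommended polling time (in minutes) for one test type.
# $1 is the label smartctl prints: Short, Extended or Conveyance.
polling_minutes() {
  local kind=$1
  awk -v kind="$kind" '
    # The label line may be followed by the polling-time line,
    # or both may be on one line.
    index($0, kind " self-test routine") == 1 { want = 1 }
    want && /recommended polling time/ {
      # Take the number inside "( 143)" and strip the padding spaces
      if (match($0, /\( *[0-9]+\)/)) {
        s = substr($0, RSTART + 1, RLENGTH - 2)
        gsub(/ /, "", s)
        print s
      }
      exit
    }
  '
}

# Example (needs a real disk): wait the recommended time before the first poll
# sleep "$(( $(smartctl -c /dev/sdb | polling_minutes Extended) * 60 ))"
```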

 

$ smartctl -c -lselftest /dev/sdb | egrep -i 'in progress'

Self-test execution status:      ( 249) Self-test routine in progress...

# 1  Extended offline    Self-test routine in progress 90%      1403        -
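The same check can act as a guard in front of the spindown. A minimal sketch, assuming the spindown is issued with hdparm -y as described above; selftest_in_progress and safe_spindown are hypothetical names, not existing unRAID functions:

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if smartctl output on stdin reports
# a running self-test, e.g. "Self-test routine in progress..."
selftest_in_progress() {
  grep -qi 'self-test routine in progress'
}

# Hypothetical wrapper: only put the disk into standby with "hdparm -y" when
# no SMART self-test is running; otherwise return 1 so the caller can retry
# later (for example after the recommended polling time).
safe_spindown() {
  local dev=$1   # e.g. sdb
  if smartctl -c -l selftest "/dev/$dev" | selftest_in_progress; then
    echo "$dev: SMART self-test in progress, skipping spindown" >&2
    return 1
  fi
  hdparm -y "/dev/$dev"
}
```

Returning a failure instead of sleeping keeps the decision of when to retry with whatever already runs the spindown timer.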

 

Currently the only way to safely do a long surface test is to disable the spindown timer completely, trigger the test and re-enable it later.

This prevents a user from scheduling a test automatically with smartd or via cron jobs.

If a user forgets to disable the spindown timer, the test gets interrupted.

 

 

From what I've seen, SMART access does not update /proc/diskstats.

 

$ cat /proc/diskstats  | grep sdb

8      16 sdb 255 52 2456 38 0 0 0 0 0 38 38

8      17 sdb1 36 0 288 3 0 0 0 0 0 3 3

 

$ smartctl -c -lselftest /dev/sdb | egrep -i 'in progress'

Self-test execution status:      ( 249) Self-test routine in progress...

# 1  Extended offline    Self-test routine in progress 90%      1403        -

 

$ cat /proc/diskstats  | grep sdb                         

8      16 sdb 255 52 2456 38 0 0 0 0 0 38 38

8      17 sdb1 36 0 288 3 0 0 0 0 0 3 3
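The before/after comparison above can be scripted. A minimal sketch, assuming (as this post implies) that the spindown timer treats any change in a disk's /proc/diskstats counters as activity; diskstats_counters and touches_diskstats are hypothetical names, and DISKSTATS only exists so the file can be swapped out for testing. Matching the device name as an exact field also avoids "grep sdb" picking up the sdb1 line:

```shell
#!/bin/bash
# Hypothetical helper: print the I/O counters (fields 4 onward) for one
# device, e.g. "sdb", from a diskstats file (default /proc/diskstats).
# Matching field 3 exactly keeps "sdb" from also matching "sdb1".
diskstats_counters() {
  local dev=$1 file=${2:-${DISKSTATS:-/proc/diskstats}}
  awk -v dev="$dev" '$3 == dev { $1 = $2 = $3 = ""; sub(/^ +/, ""); print; exit }' "$file"
}

# Hypothetical helper: run a command and succeed (exit 0) only if the
# device's counters changed, i.e. the command looked like disk I/O.
touches_diskstats() {
  local dev=$1
  shift
  local before after
  before=$(diskstats_counters "$dev")
  "$@" >/dev/null 2>&1
  after=$(diskstats_counters "$dev")
  [[ $before != "$after" ]]
}

# Example (needs a real disk):
# touches_diskstats sdb smartctl -c /dev/sdb || echo "smartctl left the sdb counters unchanged"
```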

 

 

Therefore, the only other way would be to do periodic reads or writes to the device, which seems counterproductive.

While this might work for a data drive (e.g. periodically touching /mnt/disk#/), it would not work for the parity drive.

In addition, that would force two drives to stay spinning for the duration of the test.

 

 

Another potential test would be to run fdisk -l on the device, but from what I remember, this data is sometimes cached and does not update /proc/diskstats either.

 



unRAID 5:


root@unRAID ~ $cat /proc/diskstats | grep sde    
   8      64 sde 44403954 1801819464 1885201847 811529720 1725092 19086787 166616088 56446660 0 110102400 867974200
   8      65 sde1 44403934 1801819434 1885201447 811528780 1725092 19086787 166616088 56446660 0 110101440 867973250
root@unRAID ~ $fdisk -l /dev/sde >/dev/null 2>&1 
root@unRAID ~ $cat /proc/diskstats | grep sde    
   8      64 sde 44403954 1801819464 1885201847 811529720 1725092 19086787 166616088 56446660 0 110102400 867974200
   8      65 sde1 44403934 1801819434 1885201447 811528780 1725092 19086787 166616088 56446660 0 110101440 867973250


unRAID 6:

root@unRAIDm:~# cat /proc/diskstats | grep sdj
   8     144 sdj 87628334 1377506127 11721076108 30880980 208 2958 25344 133 0 13203439 30867744
   8     145 sdj1 87628277 1377506127 11721075316 30880818 208 2958 25344 133 0 13203260 30867494
root@unRAIDm:~# sfdisk -l /dev/sdj >/dev/null 2>&1 
root@unRAIDm:~# cat /proc/diskstats | grep sdj     
   8     144 sdj 87628334 1377506127 11721076108 30880980 208 2958 25344 133 0 13203439 30867744
   8     145 sdj1 87628277 1377506127 11721075316 30880818 208 2958 25344 133 0 13203260 30867494

 

 

If this isn't feasible, then at least give us one of the following: a setting to configure an alternate program that triggers the spindown, so an agent can be dropped in to handle the test logic; an emhttp API call that adds a configurable number of minutes of delay; or an external method to turn a specific drive's spindown timer off and on.

For example, if we know the recommended polling time, we could submit an emhttp HTTP API call to delay the spindown accordingly:

Short self-test routine recommended polling time:        (   1) minutes.
Extended self-test routine recommended polling time:        ( 143) minutes.
Conveyance self-test routine recommended polling time:        (   2) minutes.

 

 

Ideally, I want to schedule these tests automatically at some interval, without having to change the timer manually via the webGUI and without the test being interrupted.


... Ideally I want to schedule these tests automatically on some interval without having to alter the timer manually via the webgui and not having the test interrupted.

 

I have added an automatic disable/enable of the disk spindown delay when a SMART self-test is started from the GUI. This prevents a running self-test from being aborted unintentionally.

 

I have also created a script that changes a disk's spindown delay setting outside the GUI, so it can be used in a bash script or cron entry. This script is called "spindowndelay".

 

Its syntax is: spindowndelay <device> [<delay>]

 

To set a new spindown delay value, include the parameter <delay> (e.g. 0 = disable delay). To restore the original spindown delay value, omit the second parameter.

 

A rudimentary example script, test.sh:

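#!/bin/bash
# Usage: test.sh <device> <test type>, e.g. test.sh sdb short
dev=$1
type=$2
# Disable the spindown delay (0) so the disk is not spun down mid-test
spindowndelay "$dev" 0
# Restore the original spindown delay however the script exits
trap 'spindowndelay "$dev"' EXIT
# Start the self-test; give up (restoring the delay) if smartctl refuses
smartctl -t "$type" "/dev/$dev" || exit 1
while true; do
  sleep 30
  # While a self-test runs, 'smartctl -c' reports e.g. "90% of test remaining"
  smartctl -c "/dev/$dev" | grep -qP '\d+%' || break
done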

The script above is called as "test.sh sdb short" and performs a 'short' self-test on disk 'sdb'. Before the self-test it disables the disk spindown delay, and it restores the delay when the self-test finishes.

 

Is this helpful for your situation?

 

