Spin down SAS drives


doron

Awesome indeed.

I'm now running this script off of rsyslog, triggered whenever the spindown message shows up. I've only tested it briefly, so use at your own risk, but it seems to work like a charm.

 

Woo hoo!

 

EDIT: This indeed works, but it seems that the drive doesn't stay spun down for very long - "someone somewhere" appears to spin it back up a few moments later. Obviously the dot remains grey and not green, but the disk does start revolving again... Hmm. This does not seem to happen with SATA drives.

 

#!/bin/bash

#
# Spin down SAS drives - stopgap script until Unraid does organically.
#
# This script is initiated via syslog - when Unraid issues the "spindown n" message.
# If the drive is SAS, the script will issue the commands to spin down a SAS drive.
#
# Spin up is not implemented - assumed to "just happen" when i/o is directed at drive.
#
# @doron 2020-08-30

MDCMD=/usr/local/sbin/mdcmd
SG_MAP=/usr/bin/sg_map
SG_START=/usr/bin/sg_start
SMARTCTL=/usr/sbin/smartctl

grep -qe "mdcmd.*spindown" <<< "$1" || exit 0

# Get syslog line without line breaks and whatnot
LINE=$(paste -sd ' ' <<< "$1")

# Obtain Unraid slot number being spun down, from syslog message
SLOTNUM=$(sed -r 's/.*: *spindown ([[:digit:]]*).*/\1/' <<< "$LINE")

# Get the device name from the slot number.
# The escaped dot and trailing "=" keep slot 1 from also matching slots 10-19.
RDEVNAME=$($MDCMD status | grep "rdevName\.$SLOTNUM=" | sed 's/.*=//')

if [ "$($SMARTCTL -i /dev/$RDEVNAME |
        grep protocol |
        sed -r 's/.*protocol: *(.*) .*/\1/')" == "SAS" ]
        then

  # Figure out /dev/sgN type name from /dev/sdX name
  SGDEVNAME=$($SG_MAP | grep "/dev/$RDEVNAME" | sed -r 's/(.*)[[:space:]].*/\1/' )

  if [ "$SGDEVNAME" != "" ] ; then

        # Do the magic
        $SG_START --pc=3 $SGDEVNAME
        logger -t "SAS Assist" "spinning down slot $SLOTNUM, device $SGDEVNAME"

  fi

fi
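
The SAS check in the script above hinges on the "Transport protocol" line that smartctl -i prints for SCSI devices. Here is a minimal sketch of that test as a standalone function. It takes the smartctl output as text, so it can be tried without a drive attached. The name is_sas_output is made up for illustration and is not part of any tool.

```bash
# Hypothetical helper: succeeds if the given `smartctl -i` output
# contains a "Transport protocol: SAS ..." line.
is_sas_output() {
  printf '%s\n' "$1" | grep -Eiq 'transport protocol:[[:space:]]*SAS'
}

# Possible use on a live system (assumes smartctl is installed):
#   is_sas_output "$(smartctl -i /dev/sdX)" && echo "SAS drive"
```

Matching on the text rather than extracting it with sed avoids depending on what follows the word SAS on that line.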

 

To trigger it, place the script somewhere permanent and add something like this to a conf file in /etc/rsyslog.d:

:msg,contains,"spindown" ^PATH-TO-SCRIPT
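
Since rsyslog hands the whole message to the script as its first argument, the slot number has to be dug out of the text. Below is a small sketch of just that step, so it can be tried on its own. The function name and the sample line in the usage comment are assumptions. The only thing taken from the script above is that the message contains "mdcmd" followed by "spindown N".

```bash
# Hypothetical helper: print the slot number from a syslog line that
# contains "mdcmd ...: spindown N"; print nothing if the line does not match.
slot_from_syslog() {
  printf '%s\n' "$1" | sed -nE 's/.*mdcmd.*: *spindown ([0-9]+).*/\1/p'
}

# Example (the exact syslog line format here is assumed):
#   slot_from_syslog "Tower kernel: mdcmd (45): spindown 3"
```

Using sed -n with the p flag means a non-matching line yields an empty string rather than the unmodified line, which makes it easy to bail out early.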

 

Edited by doron

Am I doing something wrong (highly possible), or will this not affect unassigned devices?

 

root@Cube:~# sg_start -vvv --pc=3 /dev/sg13
open /dev/sg13 with flags=0x802
    start stop unit command: 1b 00 00 00 30 00
      duration=0 ms
root@Cube:~# sdparm --command=sense /dev/sg13
    /dev/sg13: SEAGATE   DKS2E-H4R0SS      7FA6
Additional sense: Standby condition activated by command

 

Looks fine. But if I touch the Main page and refresh unassigned devices, the dot stays green, and a re-run of sense gives me...

root@Cube:~# sdparm --command=sense /dev/sg13
    /dev/sg13: SEAGATE   DKS2E-H4R0SS      7FA6
root@Cube:~# <end>

 

I'm guessing the refresh "touched" the device?


I found that smartctl requests can spin up the drives. I have set my device poll interval to 18000 seconds, which means they wake up every 5 hours. I'm not sure about UD (Unassigned Devices), as I don't have a SAS drive as a UD, only SATA.

 

I have also found a way to use the sdX names: pass --readonly, so the device is opened read-only rather than read-write.

 

sg_start  --readonly --pc=3 /dev/sdd

 

sg_raw can also be used.

 

sg_raw -v -R  /dev/sdd 1b 00 00 00 10 00
    cdb to send: [1b 00 00 00 10 00]
SCSI Status: Good 


root@Tower:~# sg_raw -v -R  /dev/sdd 1b 00 00 00 30 00
    cdb to send: [1b 00 00 00 30 00]
SCSI Status: Good 
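
For anyone wondering where the 10 and 30 in those byte strings come from: this is the 6-byte START STOP UNIT command (opcode 1b), and the power condition sits in the upper four bits of byte 4. Power condition 1 (active) therefore gives 10, and power condition 3 (standby) gives 30, which matches what sg_start -vvv --pc=3 printed earlier in the thread. Here is a small sketch that builds the string. The function name is made up for illustration.

```bash
# Print the 6-byte START STOP UNIT CDB (opcode 0x1b) for a given
# power condition; the condition goes in the upper nibble of byte 4.
start_stop_cdb() {
  printf '1b 00 00 00 %02x 00\n' $(( $1 << 4 ))
}

# Possible use (left unquoted on purpose so each byte is a separate argument):
#   sg_raw -v -R /dev/sdd $(start_stop_cdb 3)
```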

 

Edited by SimonF
On 8/30/2020 at 6:25 AM, SimonF said:

I found that smartctl requests can spin up the drives. I have set my device poll interval to 18000 seconds, which means they wake up every 5 hours. I'm not sure about UD (Unassigned Devices), as I don't have a SAS drive as a UD, only SATA.

 

@SimonF how do you change your smart device poll time for just the SAS drive?

Edited by keshavdaboss

For users who have tried this: how many watts does your UPS reading drop per disk that you put in standby, and in total? I have 17 disks (15 data + 2 parity), of which 4 are SAS. When I put mine in standby using the newly found method, I don't really see any improvement in power consumption. Curious what others are seeing.


My array consists of 4 SAS drives and 2 SSDs. At idle with only my background processes and the SSDs it’s using 40W, at idle with the SAS array unnecessarily spinning it uses 80W, and with the drives active and working, it’s around 100W.

I have a little current meter on my plug to measure and track energy usage and cost.

On 8/30/2020 at 4:25 PM, SimonF said:

I found that smartctl requests can spin up the drives. I have set my device poll interval to 18000 seconds, which means they wake up every 5 hours. I'm not sure about UD (Unassigned Devices), as I don't have a SAS drive as a UD, only SATA.

I took some time to do a longer-running test, monitoring all my SAS drives every few seconds, to try and get a more comprehensive picture as to what's actually going on.

 

Bottom line: The power condition 3 (STANDBY) approach does work: it spins down the drives, and subsequent i/o wakes them up. However, these SAS drives tend to spin back up for various other reasons, not all of which I can map yet. Indeed, the periodic SMART queries spin them up, but these aren't the only events causing that.

 

I have a script that spins a SAS drive down when the syslog message about it is spewed by Unraid. I wrote an additional small script that looks for all SAS drives that should currently be spun down (aka greyed in the UI), and if they're not really spun down - spins them back down. I had it run immediately after the SMART query (you can do that with a "plugin EVENT"), to eliminate that cause, and kept the monitor running.

What I found is that some of these drives keep spinning back up - after a few minutes or seconds - with no apparent reason (i.e. Unraid did not spin them up, and I don't see i/o being done against them). 

 

At this time I don't have a good guess as to why this happens.


@doron I'm not sure whether background media scans happen when drives are in standby.

I disabled them on my drives back when I was using idle timers.

sdparm --clear=EN_BMS --save /dev/sdX

You can see whether background scans are active in the smartctl output.

Also, maybe some other function is accessing the drive, as I know smartctl's standby option doesn't work on SAS. Does the GUI check for standby?

Do you run the IPMI plugin with disk temperature checks? That may also cause spin-ups.

9 hours ago, doron said:

I have a script that spins a SAS drive down when the syslog message about it is spewed by Unraid. I wrote an additional small script that looks for all SAS drives that should currently be spun down (aka greyed in the UI), and if they're not really spun down - spins them back down. I had it run immediately after the SMART query (you can do that with a "plugin EVENT"), to eliminate that cause, and kept the monitor running.


@doron What is the best way to implement such scripts? This reaches the limits of my Slackware knowledge.

Thanks!

46 minutes ago, Cilusse said:

@doron What is the best way to implement such scripts? This reaches the limits of my Slackware knowledge.

Thanks!

Ah. Note that both scripts are currently temp hacks, so I haven't yet bothered to install them properly (or at least ensure that they survive a reboot).

 

Re the syslog script: you can add a file into /etc/rsyslog.d named 99-<something>.conf, containing the following:

:msg,contains,"spindown" ^/path/to/script

after which you need to restart rsyslogd:

/etc/rc.d/rc.rsyslogd restart

 

Re the event script: This is based on the Unraid plugin event system (plugins can ask for an upcall from Unraid in case of certain "events", one of which is the SMART collection). To do it "properly" you need to have a plugin. Since this is a hack, I just piggy-backed on an existing plugin. Any will do; I used "User Script", which originally does not make use of this event (called "poll_attributes"). 

So: in /usr/local/emhttp/plugins/<your-selected-plugin>/event/ , place your script under the name "poll_attributes" and automagically, it will run right after every SMART poll.

Note that this script blocks emhttp (i.e. emhttp waits for it to complete). I paste mine below (I shared my syslog script in a previous message).

 

#!/bin/bash

SG_MAP=/usr/bin/sg_map
SG_START=/usr/bin/sg_start
SMARTCTL=/usr/sbin/smartctl
SDPARM=/usr/sbin/sdparm

DISKS_INI=/var/local/emhttp/disks.ini

(

# Get a list of disks that are expected to be spun down right now
DISKS_SPUN_DOWN=$(cat $DISKS_INI |
        paste -sd '^' |
        sed  -e 's/\^\[/\n\[/g' |
        tr "^" " " |
        grep DISK_OK |
        grep "color=\"green-blink\"" |
        sed -r 's/.* device=\"([a-z0-9]*).*/\1/' )

for RDEVNAME in $DISKS_SPUN_DOWN ; do

  # If it's a SAS device
  if [ "$($SMARTCTL -i /dev/$RDEVNAME |
        grep protocol |
        sed -r 's/.*protocol: *(.*) .*/\1/')" == "SAS" ]
        then

    # Figure out /dev/sgN type name from /dev/sdX name
    SGDEVNAME=$($SG_MAP | grep "/dev/$RDEVNAME" | sed -r 's/(.*)[[:space:]].*/\1/' )

    if [ "$SGDEVNAME" != "" ] ; then

        # If it's not currently spun down...
        if ! grep -iq "standby condition activated" <<< $($SDPARM --command=sense $SGDEVNAME) ; then

                # ... Do the magic
                $SG_START --pc=3 $SGDEVNAME
                logger -t "SAS Assist" "spinning device $RDEVNAME back down"

        fi
    fi

  fi

done
) &
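
The disks.ini parsing above flattens each section onto one line and then greps it. As an alternative sketch (a different technique, not what the script above does), the same selection can be done in a single awk pass. This assumes disks.ini consists of [section] header lines followed by key="value" lines, and that the device and color keys look the way the sed and grep patterns above imply. The function name is made up.

```bash
# Hypothetical helper: print the device name of every section in the
# given ini file that mentions DISK_OK and has color="green-blink".
spun_down_devices() {
  awk -F'"' '
    /^\[/      { if (ok && blink && dev != "") print dev
                 ok = 0; blink = 0; dev = "" }
    /DISK_OK/  { ok = 1 }
    /^device=/ { dev = $2 }
    /^color=/  { blink = ($2 == "green-blink") }
    END        { if (ok && blink && dev != "") print dev }
  ' "$1"
}

# Possible use on an Unraid box:
#   for d in $(spun_down_devices /var/local/emhttp/disks.ini); do ...; done
```

Splitting fields on the double quote means the value of a key="value" line is always field 2, so no regex capture is needed.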

 

 

Edited by doron

Another upvote from me. I've been having this problem for a couple of years now, and with 15 4TB SAS drives spinning day and night, I've paid a few quid more than I needed or wanted to!

Currently run 15 14 Seagate ST4000NM0023 (SAS) and 2 ST4000DM005 (SATA) drives through 2 LSI 9207-8i in IT mode. Would love a simple way to put the drives into standby!

 

7 minutes ago, SimonF said:

You can use the sdX device name if you use -r; this would simplify your script.

Absolutely. I had the code done before you came up with the new and improved, so it's still there.

If / when I repack it as a plugin, I'll probably improve that aspect too.


Vote from me. Just built my new Unraid server to replace two old NAS boxes. 4x SAS drives and 4x SATA drives running on an LSI 9211-8i in IT mode. Unraid will spin down the SATA drives but not the SAS drives. Lots of unnecessary heat and power consumption with drives spinning away when they aren't in use 90% of the time!

Edited by absolute_badger

Yes they are the same, how did you check standby? I found device polling would spin them up.

 

Using sdX without -r or --readonly, I saw an entry in the syslog for the disk, but the drive did not spin down. With sgX you do not need to specify read-only.

 

Command I use to see spindown is as follows.

 

sdparm --command=sense /dev/sdg

    /dev/sdg: HGST      HUS724030ALS640   A1C4
Additional sense: Standby condition activated by command
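
In a script, that check boils down to looking for the "Standby condition activated" text in the sense output. Here is a minimal sketch that takes the sdparm output as text, so it can be tried without a drive. The function name is made up for illustration.

```bash
# Hypothetical helper: succeeds if the given `sdparm --command=sense`
# output reports that a standby condition is active.
reports_standby() {
  printf '%s\n' "$1" | grep -iq 'standby condition activated'
}

# Possible use on a live system:
#   reports_standby "$(sdparm --command=sense /dev/sdg)" && echo "spun down"
```

A drive that has spun back up prints only the device line with no "Additional sense" line, so the function fails in that case.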

7 hours ago, SimonF said:

Yes they are the same, how did you check standby? I found device polling would spin them up.

 

Using sdX without -r or --readonly, I saw an entry in the syslog for the disk, but the drive did not spin down. With sgX you do not need to specify read-only.

 

Command I use to see spindown is as follows.

 

sdparm --command=sense /dev/sdg

    /dev/sdg: HGST      HUS724030ALS640   A1C4
Additional sense: Standby condition activated by command

I only used --readonly with sdX. Yes, that is the command I used to see if the drive was spun down. I’ve followed your previous posts very closely. Sometime this week I will revisit and see if it’s still acting up.

On 9/8/2020 at 9:00 PM, SimonF said:

@doron I'm not sure whether background media scans happen when drives are in standby.

I disabled them on my drives back when I was using idle timers.

 

sdparm --clear=EN_BMS --save /dev/sdX

 


So it turns out that disabling this does further improve on the situation, but it's still happening: The SAS drives still do wake up from time to time, with no apparent reason or i/o done against them.

With my two scripts running constantly, the overall standby times are longer - but it's not perfect. Maybe we can figure out some more reasons for these drives to spin up and out of STANDBY state. @SimonF? 🙂

On 9/8/2020 at 7:13 AM, doron said:

I have a script that spins a SAS drive down when the syslog message about it is spewed by Unraid.

It doesn't seem wise to hitch your wagon to something that's been buggy for a very long time.  Especially when there's a very easy way to do this yourself -- just read /sys/block/sdX/stat directly, and when you notice that there's been no i/o activity for a certain period of time, then just go ahead and spin down the drive.  For example, I am attaching here my own little script that has been faithfully serving me for over five years now.  Just disable all spindown stuff in the UI, start my script from your "go" file, and forget about it.  (Note, the UI may also be buggy in the way it polls for smart data, thus spinning up your disks, so you may want to look into that too.)

 

#!/bin/bash
# spind: disk spin-down daemon
copy="Version 3.9 <c> 2020 by Pourko Balkanski"
prog=spind

####################################################################
MINUTES=${MINUTES:-60}  # the number of idle minutes before spindown
####################################################################

idleTimeout=$(($MINUTES*60)) # in seconds
loopDelay=61 # seconds

kill $(pidof -x -o $$ $0) 2>/dev/null # our previous instances, if any
[ "$1" = "-q" ] && exit 0 # Don't start a new daemon if called with -q

renice 5 -p $$ >/dev/null  # renice self
log () { logger -t $prog $@ ;}
log $copy

# Make a list of the disks that could be spun down
i=0
for device in /dev/[sh]d[aaa-zzz] ;do
   if proto=$(smartctl -i $device | grep -iE ' sas| sata| ide') ;then
      ((i++))
      devName[$i]=$device
      cmdStat[$i]="cat /sys/block/$(basename $device)/stat"
      devLastStat[$i]=$(${cmdStat[$i]})
      devSecondsIdle[$i]=0
      devError[$i]=0  # We'll use to flag disks that won't spin down
      cmdSpinStatus[$i]="hdparm -C $device"
      cmdStandby[$i]="hdparm -y $device"
      if grep -iq ' SAS' <<<$proto ;then
          # Switch from /dev/sdX to /dev/sgN
          devName[$i]=$(sg_map26 $device)
          cmdSpinStatus[$i]="sdparm --command=sense ${devName[$i]}"
          cmdStandby[$i]="sg_start --pc=3 ${devName[$i]}"
      fi
      theList+="${devName[$i]} "
   fi
done
devCount=$i

if [ "$theList" = "" ] ;then
  log 'No supported disks found. Exiting.'
  exit 1
fi
log "Will spin down disks after $MINUTES minutes of idling."
log "Monitoring: $theList"

while :;do
   sleep $loopDelay
   for i in $(seq $devCount) ;do
      [ ${devError[$i]} -gt 2 ] && continue  # this disk has previously failed to spin down.
      devNewStat[$i]=$(${cmdStat[$i]})
      if [ "${devNewStat[$i]}" != "${devLastStat[$i]}" ] ; then
         # Some i/o activity has occurred since the last time we checked.
         devSecondsIdle[$i]=0
         devLastStat[$i]=${devNewStat[$i]}
      else # No new activity since we last checked...
          # ...So, let's check its spin status
          if ${cmdSpinStatus[$i]} | grep -iq standby ; then
              devSecondsIdle[$i]=0
          else # it's currently spinning
              let "devSecondsIdle[$i] += $loopDelay"
              # Check if it's been idling for long enough...
              if [ ${devSecondsIdle[$i]} -gt $idleTimeout ] ; then
                  # It is time to spin this one down!
                  log "spinning down ${devName[$i]} "
                  ${cmdStandby[$i]} >/dev/null 2>&1
                  devSecondsIdle[$i]=0
                  sleep 1 # no need to worry about race conditions here.
                  # Check if the drive actually spun down as a result of our command
                  if ${cmdSpinStatus[$i]} | grep -iq standby ;then
                     devError[$i]=0
                  else
                     ((devError[$i]++))
                     [ ${devError[$i]} -gt 2 ] && log "${devName[$i]} fails to spin down."
                  fi
              fi
          fi
      fi
   done
done &
disown
exit 0
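
The heart of the loop above is the idle bookkeeping: if the contents of /sys/block/sdX/stat changed since the last pass, the idle counter resets, otherwise it grows by the loop delay. Here is that single step isolated as a sketch. The function name is made up, and it leaves out the extra reset the script does when the drive is already in standby.

```bash
# Hypothetical helper: given the previous and current stat snapshots,
# the idle seconds so far and the loop delay, print the new idle count.
next_idle_seconds() {
  if [ "$1" != "$2" ]; then
    echo 0                 # counters moved: there was i/o, start over
  else
    echo $(( $3 + $4 ))    # no change: one more loop delay of idling
  fi
}

# Possible use:
#   idle=$(next_idle_seconds "$last" "$(cat /sys/block/sdb/stat)" "$idle" 61)
```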

 

 

spind-3.9.zip

Edited by Pourko

I have written the standby check into smartctl for SCSI (SAS) devices, as it's currently only available for ATA, and submitted it to the maintainer for inclusion.

Example outputs below; not sure if they would be of use.

 

root@Tower:~# smartctl -i -n standby  /dev/sdg
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.7.8-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

Device is in STANDBY BY COMMAND mode, exit(2)
root@Tower:~# smartctl -i -n standby  /dev/sdh
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.7.8-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

Device is in STANDBY BY TIMER mode, exit(2)
root@Tower:~# smartctl -i -n never  /dev/sdh
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.7.8-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUS724030ALS640
Revision:             A1C4
Compliance:           SPC-4
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
LU is resource provisioned, LBPRZ=0
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca027baa9a8
Serial number:        
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Fri Sep 11 20:28:44 2020 BST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
Power mode was:       STANDBY BY TIMER
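
One possible use of that exit(2): a polling script could skip the SMART query when the drive is already in standby. The sketch below assumes a smartctl build that honours -n standby for the device in question (for SAS, that means a build with the change described in this post). The function name and the SMARTCTL override variable are made up. Note that smartctl's exit status is a bitmask, and the same bit is also set when the device cannot be opened, so a real script may want to tell those cases apart.

```bash
# Path to smartctl; overridable so the function can be tried with a stub.
SMARTCTL=${SMARTCTL:-smartctl}

# Hypothetical helper: query attributes only if the drive is not in
# standby. Prints "skipped" when smartctl exits with status 2.
poll_unless_standby() {
  "$SMARTCTL" -n standby -A "$1" >/dev/null 2>&1
  if [ $? -eq 2 ]; then
    echo "skipped"
  else
    echo "polled"
  fi
}

# Possible use:
#   poll_unless_standby /dev/sdg
```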

 

I have also provided some code to the devs for changes to mdcmd, for inclusion in Unraid.

1 hour ago, Pourko said:

It doesn't seem wise to hitch your wagon to something that's been buggy for a very long time. 

Hi. You are joining this thread after a long discussion; please read back through it. In short, hanging the action on the syslog message (and the other script/hack I posted) is by no means a "solution". It's a stopgap testing mechanism, to test our assumptions about the feasibility and efficacy of the STANDBY command approach.

 

So far, it proves to generate mixed results: The drives do spin down, but after a while they spin back up. You'd not see it unless you monitor the STANDBY status of the drive rather closely.

Looking at your script, it seems to be susceptible to the same issue.

 

By the way, to do what you are doing, you don't need to read i/o counters - you can tell the drive to automatically spin down after a certain amount of idle time. That, in turn, has two drawbacks:

(a) Unraid's spin up/down management is not aware of this, so no UI display and settings dialog control.

(b) Same problem as above - the drives do spin back up, with no apparent i/o, for reasons yet to be understood.

1 hour ago, Pourko said:

 (Note, the UI may also be buggy in the way it polls for smart data, thus spinning up your disks, so you may want to look into that too.)

See previously in this thread.

1 minute ago, doron said:

you don't need to read i/o counters - you can tell the drive to automatically spin down after a certain amount of idle time.

Right. But I have a bunch of disks that disregard that setting. Which was the main reason I wrote my script.

 

Anyway, I was only trying to help.  For myself, I have a solution that has been working flawlessly on my server for years.  If you don't like it -- forget I posted it.

 

Cheers.

 

Just now, Pourko said:

Anyway, I was only trying to help.  For myself, I have a solution that has been working flawlessly on my server for years.  If you don't like it -- forget I posted it.

On the contrary, it would in fact be good that you take part - you seem to have relevant experience - it'd just be good to get in sync with the discussion.

 


See, I have the feeling that you are not correctly identifying the problem.  The way I see it, the problem is not how to spin down disks, the problem is that some buggy scripts in the UI don't know how to properly query a disk without waking it up, and they don't know when to rightfully display a green ball (or whatever other color).  Personally, I rarely use the UI for anything, and on my server disks spin down when they are supposed to, and they stay spun down.  From reading the posts in this thread, I have the impression that you are trying to fix things kind of backwards, i.e., you take some info from the UI (that does not match reality) and try to make the disk status match that unreal info from the UI.  That is why I suggested that maybe you shouldn't bother doing it that way, and instead plead with the UI people to fix their UI, if the UI is that important to you.  I hope this explanation makes some sense. :-)

Edited by Pourko