Spin down SAS drives


doron


Makes sense and certainly a valid view. That being said, I think you're mixing a few things up.

 

The UI thing is not a goal here. The goal (as stated in the first post of this thread) is to enhance Unraid's existing spin up/down mechanism, which works quite reliably and effectively for SATA drives, to do the same for SAS drives. Once this is done, part of the result will be that the UI will also reflect the correct state of these SAS drives (which currently it certainly does not).

 

While this thread was happening, we zeroed in on a good way to do it, technically (sending the drive to state 3). In the process of assessing its viability, I hacked together (a) a script that piggybacks on Unraid's spindown mechanism (using a rather clumsy trick) to "extend" that functionality to SAS drives too; (b) another script that sends all quasi-sleeping SAS drives back to sleep after a SMART poll (because of the bug you mention, which btw seems to affect only SAS drives); in this script I am indeed reading the UI "color" to obtain the presumed state of the drive, which for SAS drives reflects not the real state but the desired state; and (c) a long-running monitor that closely tracks the true state of the drive.

 

All of that is just a study, if you will - the permanent solution must be in the Unraid code (not UI - unless you define UI as anything out of kernel). Part (c) above taught me some things - mainly, that the SCSI/SAS drives keep spinning up for reasons we don't yet understand (while SATA drives do not). My comment re your script was that even with it, you may think you're spinning down a SAS drive until next i/o, but it might spin up for no apparent reason and you won't know it.

 

Bottom line, the goal is to map the behavior properly, so that Limetech devs can "do the right thing" when they finally put the official fix in place.

 

Sounds good?

Link to comment

I understand what you are trying to do, and I applaud your efforts.

27 minutes ago, doron said:

part of the result will be that the UI will also reflect the correct state of these SAS drives (which currently it certainly does not).

My point is, a disk can spin down for different reasons, including it can decide to spin down all by itself, and it doesn't need to give any explanation about that to the UI.  So, if the UI can't make heads or tails of it, then the problem is in the UI. The UI should learn to properly sense the disk status and to display it correctly.

32 minutes ago, doron said:

My comment re your script was that even with it, you may think you're spinning down a SAS drive

I not only think I'm spinning down the disks, I know I am, as I'm watching their status refresh in real time in a ssh window -- basically the same thing that you are trying to watch inside a browser with colored balls, only I have extensively tested my "monitor" script to know that it reflects the real status correctly.

 

Anyway, I'll be watching the progress of this discussion, and I'll chime in if I think that I have something more to contribute.

 

Good luck!

 

Link to comment
1 minute ago, Pourko said:

My point is, a disk can spin down for different reasons, including it can decide to spin down all by itself, and it doesn't need to give any explanation about that to the UI.  So, if the UI can't make heads or tails of it, then the problem is in the UI. The UI should learn to properly sense the disk status and to display it correctly.

The UI does not pretend to know the real status of the drive. It shows the status of the drive as an Unraid array device - i.e. if the Unraid code has or has not spun the device down. No more, no less.

For SATA drives, this seems to work quite well. For SCSI drives, it does not. But still, that's all it does.

And again, my business is not with the UI. The UI is only what it is - a UI.

1 minute ago, Pourko said:

I not only think I'm spinning down the disks, I know I am, as I'm watching their status refresh in real time in a ssh window -- basically the same thing that you are trying to watch inside a browser with colored balls, only I have extensively tested my "monitor" script to know that it reflects the real status correctly.

Not sure why you'd think I'm watching colored balls. The monitoring I mentioned is done with script code that reads the real state of the drive, every 5 seconds, logs and sums up over time. 

We all spin the drives down (even using the same command); my point was that you may think you're spinning a drive down until the next i/o, while this is not always the case: it does spin down, and then, perhaps 10 minutes later, with no i/o happening, it may spin back up. The end game - saving energy and heat - is kind of missed.
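For anyone who wants to reproduce this kind of monitoring, here is a minimal sketch, not the actual script. It assumes that `sdparm --command=sense` reports "standby condition activated" for a spun-down SAS drive (the same string the wrapper later in this thread greps for). The function names, the 5-second default and the log format are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper names; only the sdparm call and the matched
# string come from this thread.

# Classify one block of `sdparm --command=sense` output.
# Anything that does not report standby (including errors) counts as "active".
classify_sense() {
    if grep -iq "standby condition activated" <<< "$1"; then
        echo standby
    else
        echo active
    fi
}

# Poll a device every INTERVAL seconds and print a line on each state change.
monitor_drive() {
    local dev=$1 interval=${2:-5} prev="" state
    while true; do
        state=$(classify_sense "$(sdparm --command=sense "$dev" 2>&1)")
        if [ "$state" != "$prev" ]; then
            echo "$(date '+%F %T') $dev: ${prev:-unknown} -> $state"
            prev=$state
        fi
        sleep "$interval"
    done
}

# Usage (runs until interrupted): monitor_drive /dev/sdg 5
```

Logging only the transitions keeps the output short enough to line up against syslog timestamps afterwards.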

 

Link to comment
2 minutes ago, doron said:

perhaps 10 minutes later, with no i/o happening, it may spin back up.

Now this is something interesting, and we should get to the bottom of it!  I don't see that on my server, but then again, my UI is stock vanilla with no plugins, and even custom-crippled a bit. :-)  You should definitely try to prove the "no i/o happening" claim.  I suggest you make your monitoring script read the device's stat file, and demonstrate that the reading before and after the disk spins up are exactly the same.  Now that would be really interesting!

Link to comment
3 minutes ago, Pourko said:

As I'm thinking more about that, proving or disproving that claim is really important, as it will lead our investigation in two completely different directions. (Borrow snippets from my code if you will, as I am doing something similar there.)

Yes. It's a good idea. I've already updated my monitoring script to include that data (actually, a single sum of all 11 fields of the kernel stat file). We'll see.

Link to comment

 

12 minutes ago, Pourko said:

As I'm thinking more about that, proving or disproving that claim is really important, as it will lead our investigation in two completely different directions. (Borrow snippets from my code if you will, as I am doing something similar there.)

.. and the jury's back in.

Indeed, the case is as I said above: The state of one of my drives has just changed (i.e. drive spun up) while the i/o number (sum of 11 fields in /sys/block/sdX/stat) hasn't changed.

 

(fun fact: The particular drive that has just spun up has no actual data (i.e. no files), just the Unraid dir structure. It's empty)

Link to comment
1 hour ago, doron said:

.. and the jury's back in.

Indeed, the case is as I said above: The state of one of my drives has just changed (i.e. drive spun up) while the i/o number (sum of 11 fields in /sys/block/sdX/stat) hasn't changed.

You didn't need to bother adding those up -- just read the whole thing as one string, and then compare the two strings from before and after.  Same thing though.
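Both variants are easy to sketch. The snippet below is only an illustration with made-up function names: `stat_sum` adds up whatever fields the stat file contains (it does not assume a particular field count), and `io_happened` does the plain string comparison of two snapshots.

```shell
#!/bin/bash
# Sum every numeric field of a block device stat file,
# e.g. /sys/block/sdg/stat (the device name is just an example).
stat_sum() {
    local -a fields
    local sum=0 f
    read -r -a fields < "$1"
    for f in "${fields[@]}"; do
        sum=$((sum + f))
    done
    echo "$sum"
}

# Succeeds (exit status 0) if two stat snapshots differ.
io_happened() {
    [ "$1" != "$2" ]
}

# Usage: take a snapshot, wait, compare.
#   before=$(cat /sys/block/sdg/stat)
#   sleep 600
#   after=$(cat /sys/block/sdg/stat)
#   io_happened "$before" "$after" && echo "i/o occurred" || echo "no i/o"
```

The string comparison is the stricter of the two, since two different sets of counters could in principle add up to the same sum.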

 

This thing keeps getting curiouser and curiouser.  :-)   It begs the question: why am I not seeing this on my server?  Can you maybe restart your server, but this time without starting emhttpd, and see if the spin-up happens again?  Could something (maybe from the UI?) be sending some weird request to the disk that wakes up SAS disks?... Like a SMART info request maybe?  I'm just speculating.

Edited by Pourko
Link to comment

I have by now traced all of the unsolicited spin-ups of my SAS drives to various invocations of smartctl. Turns out a few plugins issue assorted smartctl status queries (e.g., disklocation). After tracing smartctl and logging its callers, I can now account for 100% of the spin ups.

 

So - concur, it all boils down to smartctl 7.1 not dealing with "-n standby" for non-ATA drives.

This to me is great news.

On 9/11/2020 at 10:37 PM, SimonF said:

I have written the standby function into smartctl for SCSI(SAS) devices as its only currently available for ATA and submitted to owner for inclusion.

Fantastic! Thank you for that. BTW I looked for a ticket on smartmontools.org and couldn't find it - can you share?

I presume your code makes it honor "-n POWERMODE" for SCSI/SAS devices?

On 9/11/2020 at 10:37 PM, SimonF said:

I have also provided some code to devs for changes to mdcmd to include within Unraid

That's really great. Thank you.

 

As for the stopgap "solution", my current setup is as follows:

 

1. A script triggered by the syslog spindown message, as described previously in this thread, performing the spin-down if the device is SAS.

2. A wrapper I made around the smartctl command, which, if (a) "-n standby" was requested AND (b) the device is a SAS drive, skips the call to smartctl (just exits). 

 

This seems to work perfectly well now, with no unexpected spin ups.

 

If anyone is interested I can share the smartctl wrapper.

Link to comment
43 minutes ago, doron said:

BTW I looked for a ticket on smartmontools.org and couldn't find it - can you share?

The change has not been accepted by the moderators yet.

Attached is my version. Use at your own risk.

On my machine, spin-ups still happen if I have device polling enabled, but I cannot trace where or what is called to do those checks. It may be in emhttpd, but I don't have the source to review. It may also be that the check looks only for "standby", whereas I am outputting a different status depending on how the drive spun down.

smartctl is in my boot/extras.

Added to go (this is on my test server, not the live one; the test server is on beta25):

 

cp /boot/extras/smartctl /usr/local/sbin

chmod +x /usr/local/sbin/smartctl

 

 

root@Tower:~# smartctl -i -n standby  /dev/sdg
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.7.8-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

Device is in STANDBY BY COMMAND mode, exit(2)
root@Tower:~# smartctl -i -n standby  /dev/sdh
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.7.8-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

Device is in STANDBY BY TIMER mode, exit(2)
root@Tower:~# smartctl -i -n never  /dev/sdh
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.7.8-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUS724030ALS640
Revision:             A1C4
Compliance:           SPC-4
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
LU is resource provisioned, LBPRZ=0
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca027baa9a8
Serial number:        
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Fri Sep 11 20:28:44 2020 BST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
Power mode was:       STANDBY BY TIMER
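
A caller could branch on the exit status shown above rather than parse the text. The following is a sketch under assumptions, not part of the patch: `drive_power_state` and the `SMARTCTL` override variable are made-up names, and it presumes a smartctl build that honours "-n standby" for the device type in question. Note that status 2 is also what smartctl returns when a device cannot be opened, so "standby" here is an inference rather than a certainty.

```shell
#!/bin/bash
# Path or name of the smartctl binary; overridable for testing.
SMARTCTL=${SMARTCTL:-smartctl}

# Print "standby", "active" or "unknown" for the given device,
# based only on the exit status of `smartctl -n standby -i`.
drive_power_state() {
    local rc=0
    "$SMARTCTL" -n standby -i "$1" > /dev/null 2>&1 || rc=$?
    case $rc in
        0) echo active ;;    # smartctl ran normally, drive was not in standby
        2) echo standby ;;   # skipped because of low-power mode (or open failed)
        *) echo unknown ;;
    esac
}

# Usage: drive_power_state /dev/sdg
```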

smartctl

Edited by SimonF
Link to comment
48 minutes ago, SimonF said:

Attached is my version. Use at your own risk.

Thanks!

48 minutes ago, SimonF said:

On my machine spin ups still happen if I have device poll enabled. But cannot trace where or what is called to do those checks. It may be in emhttpd but I dont have source to review/check. Again it may be that check is looking just for standby but I am outputting different status depending on how its spins down.

For that you can use my wrapper (below). If you leave DEBUG set to true you will get a syslog message with both the parent and grandparent processes (callers). It helped me track all of those calls down.

 

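#!/bin/bash

SDPARM=/usr/sbin/sdparm
REALCMD=/usr/sbin/smartctl

# Set to true to log the calling processes to syslog on every invocation
DEBUG=false
DEBUG=true

# True if smartctl reports the device's transport protocol as SAS
IsSAS () {
        [ "$($REALCMD -i "$1" |
        grep protocol |
        sed -r 's/.*protocol: *(.*) .*/\1/')" == "SAS" ]
}

# True if the drive's sense data says it is in the standby condition
IsSBY () {
        grep -iq "standby condition activated" <<< "$($SDPARM --command=sense "$1")"
}

log () { logger -i -t "smartctl wrapper" -- "$@" ; }

# The device is taken to be the last argument
DEVICE="${@: -1}"

        $DEBUG && log "caller is $(cat /proc/$PPID/comm), grandpa is $(cat /proc/$(cut -d" " -f4 /proc/$PPID/stat)/comm), device $DEVICE, args \"$*\""


# Only intercept "-n standby" requests aimed at a SAS block device
if grep -iq -- "-n standby " <<< "$*" && [ -b "$DEVICE" ] && IsSAS "$DEVICE" ; then

        if IsSBY "$DEVICE" ; then

                log  "Device $DEVICE is spun down, smartctl evaded"
                $DEBUG && echo "Device $DEVICE is spun down, smartctl evaded"

                exit 0

        fi

fi

# Everything else goes to the real smartctl unchanged
$REALCMD "$@"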
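
The caller-tracing one-liner in the wrapper is compact but splits /proc/PID/stat on spaces, which can pick the wrong field if a process name contains a space. A slightly more defensive sketch of the same idea follows; `parent_pid` and `describe_callers` are illustrative names, and it assumes a Linux /proc filesystem.

```shell
#!/bin/bash
# Print the parent PID of a process, read from /proc/PID/stat.
# The second field (comm) is parenthesised and may contain spaces,
# so everything up to the last ") " is stripped before splitting.
parent_pid() {
    local stat rest
    local -a f
    read -r stat < "/proc/$1/stat" || return 1
    rest=${stat##*) }            # now: "state ppid pgrp ..."
    read -r -a f <<< "$rest"
    echo "${f[1]}"
}

# Print the names and PIDs of the parent and grandparent of a PID.
describe_callers() {
    local ppid gppid
    ppid=$(parent_pid "$1") || return 1
    gppid=$(parent_pid "$ppid") || return 1
    echo "parent $(< "/proc/$ppid/comm")($ppid), grandparent $(< "/proc/$gppid/comm")($gppid)"
}

# Usage inside a wrapper script: describe_callers $$
```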

 

Link to comment
14 minutes ago, SimonF said:

where did you put the wrapper

Basically, you can just place it as "smartctl" in /usr/local/sbin (just like you did with your binary version), taking care of the x permission.

For that to "take", you'll need to reboot Unraid(*).

 

If you want to try it without reboot, do this:

 

1. Put it in /usr/sbin as smartctl.wrapper, make sure it has x permission

2. Change the REALCMD def (line 4) to read: REALCMD=/usr/sbin/smartctl.real

3. cd /usr/sbin && mv smartctl smartctl.real && mv smartctl.wrapper smartctl

 

This will work immediately.

 

(*) Reason is that long-running shells do not rehash their search path and hence will not find the new command until restarted.
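
To check which copy a freshly started process would resolve, something along these lines can help. It is only a sketch, and `which_wins` is a made-up helper: it resolves a command name against a given search path in a subshell with an emptied hash table. A caller that invokes smartctl by absolute path bypasses the search path altogether, in which case only the in-place swap above will catch it.

```shell
#!/bin/bash
# Print the path that a fresh lookup of command $2 would find
# when the search path is $1. Runs in a subshell, so the calling
# shell's PATH and hash table are left untouched.
which_wins() {
    ( PATH=$1; hash -r; command -v "$2" )
}

# Usage on the server:
#   which_wins "$PATH" smartctl
# In an already running interactive bash, `hash -r` makes it forget
# remembered command locations without a restart.
```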

Link to comment
On 9/13/2020 at 10:43 PM, doron said:

Basically, you can just place it as "smartctl" in /usr/local/sbin (just like you did with your binary version), taking care of the x permission.

Found my issue: emhttpd wasn't picking up the one in /usr/local/sbin. So no spin-ups now with device poll 🙂

I have also made some additional changes to my revised smartctl.

The -n standby options work as before.

But I have now added -s standby, now so the same command can be used for both ATA and SCSI.

I also added a function, -s active, to spin up both ATA and SCSI drives.

Awaiting feedback from the smartctl team; as yet, no ticket is logged.

 

Link to comment

Does your patch also fix the green ball in the UI?

BTW, did you guys open the emhttpd binary? There we can see all the commands that are run under the hood.

Some things I found:

/usr/sbin/hdparm -S0 /dev/%s &> /dev/null
/usr/sbin/hdparm -y /dev/%s &> /dev/null
/usr/sbin/smartctl -n standby %s %s -AH /dev/%s

 

I really need this working :D, owner of 2 SAS 4kn disks

Edited by segator
Link to comment

I think hdparm is only used by emhttpd for pools. mdcmd spins down array drives; I have sent Limetech some sample code that could be used for mdcmd.

 

I have submitted changes to smartctl to enable standby for SAS drives (it already supports SATA/ATA), and I am looking to check whether it will work with USB-attached drives. If it does, I will ask whether the hdparm option could be changed to smartctl; once the changes have been incorporated by the smartctl team, it will then support both SAS and SATA.

Link to comment

We have indeed made a lot of progress in this thread.

 

I now have a temporary stopgap solution running on my system that seems to work very well (SAS drives spin down in sync with Unraid's schedule, no sporadic / unexpected spin-ups). Since quite a few people expressed interest in this, I thought I'd share this stopgap. So I packaged it into a single run-and-forget script.

We can use it until Limetech puts the permanent solution into standard Unraid code.

 

To use, simply place the attached script somewhere on your flash drive (e.g. /boot/extra) and run it like so:

bash /boot/extra/unraid-sas-spindown-pack

It should be effective immediately. Assuming it works well for you, you can add a line in your "go" script to run it upon system boot.

 

Essentially, it does the following:

1. Install a script that spins down a SAS drive. The script is triggered by the Unraid syslog message reporting this drive's (intended) spin down, and actually spins it down.

2. Install an rsyslog filter that mobilizes the script in #1.

3. Install a wrapper for "smartctl", which works around smartctl's deficiency of not supporting the "-n standby" flag for non-ATA devices. When this flag is detected and the target device is SAS, smartctl is bypassed.
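
For the curious, the heart of step 1 is a single SCSI START STOP UNIT request with a standby power condition (the "state 3" mentioned earlier in this thread). A rough sketch of that one step, not the packaged script: it assumes `sg_start` from sg3_utils is installed and that power condition 3 is what your drives expect. `spin_down_sas` and `DRY_RUN` are illustrative names.

```shell
#!/bin/bash
# Ask a SAS drive to enter the standby power condition.
# With DRY_RUN set, only print the command that would be run.
spin_down_sas() {
    local dev=$1
    local cmd=(sg_start --readonly --pc=3 "$dev")
    if [ -n "${DRY_RUN:-}" ]; then
        echo "${cmd[*]}"
        return 0
    fi
    [ -b "$dev" ] || { echo "$dev is not a block device" >&2; return 1; }
    "${cmd[@]}"
}

# Usage:
#   DRY_RUN=1 spin_down_sas /dev/sdg    # show the command only
#   spin_down_sas /dev/sdg              # actually send it
```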

 

As always, no warranty, use at your own risk. It works for me. 

With that said, please report any issue.

 

Thanks and credit points go to this great community, with special mention to @SimonF and @Cilusse.

 

EDIT: Just uploaded an updated version. Please use this one instead; previous one had a small but nasty bug that sneaked in during final packing. Apologies.

 

 

unraid-sas-spindown-pack

Edited by doron
Link to comment

I’ve been running @doron’s script for a few days and it has been perfect so far! Absolutely no unwanted spin ups, and my SAS drives are always sent to standby when needed.

 

Thank you so much to everyone who contributed to this thread, especially @SimonF and @doron (and thanks for crediting me in your script even if I really just suggested something and haven’t written a single line of code 😆).

 

Now let’s push this to the devs and get it included in Unraid!

Stay safe and keep up the positive vibes 😉

Link to comment
