[Plugin] Spin Down SAS Drives


doron


23 hours ago, doron said:

Thanks for reporting that.

Have you at any time tried the manual command to spin a drive down? Such as


sg_start -r --pc=3 /dev/sdX

I wonder whether it gets stuck the same way, and then gets a "task abort" a bit later. I haven't seen similar reports up until now.

killed one of my drives

Link to comment
23 hours ago, Titus said:

 

2020-10-30 09:34 Unraid Disk 10 error Alert [123456] - Disk 10 in error state (disk dsbl) (sdm) alert

 

will not come back online

When a drive is marked disabled in the array (gets the red x), it will not come back up automatically. It needs to be rebuilt; there are a few guides on how to do that. However, the drive is not physically disabled. It is probably in good shape and just needs to be reintroduced into the array and rebuilt.

Link to comment
On 11/3/2020 at 8:04 PM, odirneto said:

It is the same message as when trying to spin down, but many times in a row

 

 

This is weird - unless this reflects your pushing the green buttons for disk 2 and 3 repeatedly several times in quick succession. Is this what happens?

 

BTW these messages are generated because Unraid tries ATA spindown commands against SAS drives.

The next version of the plugin (I've been sitting on it for a while, hoping to collect more data on drive/controller combos with which there are failures) filters these messages out of your log.

Edited by doron
Link to comment
  • 2 weeks later...

My experience in my specific array is very good.

Running 0.6 without errors on any drive, and when I click on the drives I notice the slow spin-up to read the directories, so it is really working well.

I have been moving files and even deleting folders, with no errors on the drives.

My main issue is that even with that feature enabled my shelf still seems to be power hungry... but I will need to investigate more.

I'm using this PCI card:

 

[1000:0087] 01:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)

 

with an EMC shelf from a VNX 5200 and the original 15 Hitachi SAS drives that came with it:

 

[7:0:0:0]disk HITACHI HUS72303CLAR3000 C442 /dev/sdd 3.00TB

[7:0:1:0]disk HITACHI HUS72303CLAR3000 C442 /dev/sdf 3.00TB

[7:0:2:0]disk HITACHI HUS72303CLAR3000 C442 /dev/sdg 3.00TB

[7:0:3:0]disk HITACHI HUS72303CLAR3000 C442 /dev/sdh 3.00TB

[7:0:4:0]disk HITACHI HUS72303CLAR3000 C442 /dev/sdi 3.00TB

[7:0:5:0]disk HITACHI HUS72303CLAR3000 C442 /dev/sdk 3.00TB

[7:0:6:0]disk HITACHI HUS72303CLAR3000 C442 /dev/sdl 3.00TB

[7:0:7:0]disk HITACHI HUS72303CLAR3000 C442 /dev/sdm 3.00TB

[7:0:8:0]disk HITACHI HUS72303CLAR3000 C442 /dev/sdn 3.00TB

[7:0:9:0]disk HITACHI HUS72303CLAR3000 C442 /dev/sdo 3.00TB

[7:0:10:0]disk HITACHI HUS72303CLAR3000 C442 /dev/sdp 3.00TB

[7:0:11:0]disk HITACHI HUS72303CLAR3000 C442 /dev/sdq 3.00TB

[7:0:12:0]disk HITACHI HUS72303CLAR3000 C442 /dev/sdr 3.00TB

[7:0:13:0]disk HITACHI HUS72303CLAR3000 C442 /dev/sds 3.00TB

[7:0:14:0]disk HITACHI HUS72303CLAR3000 C442 /dev/sdt 3.00TB

 

plus 3 other SATA drives connected to the motherboard's integrated controller

 

The processor is a Xeon E3-1200 on an Intel motherboard

 

Let me know if I can help by posting any more details.

Edited by Golfonauta
Link to comment

After installing the plugin I tried spinning down drives, and nothing seemed to be different from before the plugin was installed, i.e. the usual errors and drives not actually spinning down. I uninstalled and reinstalled (and also rebooted), but the plugin still doesn't seem to be working at all.

 

There are no SAS Assist entries in syslog on either manual spin-down or Unraid's spin-down, only the normal barrage of errors.

One thing to note is that command line spin-down of the drives is working using:

sg_start -r --pc=3 /dev/sdX

and starting back up again with:

sg_start -r --pc=0 /dev/sdX

So I'm sure the issue is somewhere on the OS side and not in the communication with the disks or in the disks themselves. I cannot think of a reason why the plugin wouldn't show up in the logs at all, unlike all the other logs I've seen in this thread. In case it is relevant: I have done a couple of migrations over the life of my install, the latest about a week ago, after the server had been off for about 150 days (I had to update everything recently). I can confirm the plugin is installed.

 

Please let me know of any other information that might be helpful!


Link to comment
4 hours ago, stigs said:

There are no SAS Assist entries in syslog on either manual spin-down or Unraid's spin-down, only the normal barrage of errors.

Thanks for reporting this. The next version, 0.7, should(...) address your case. I'll hopefully push it within the next couple of days.

When you update to it, please report again.

Link to comment

Just pushed out version 0.7 of the plugin.

 

There are many changes, a few notable ones listed below. The main method of spinning a SAS drive down remains the same - meaning, that if you had issues (or worse, red x's) following spindown attempts in previous versions, there's a good chance this version will not improve this particular situation, so please test with care.

 

- Adapt the syslog hook to various Unraid configs (between Unraid and Dynamix there are several different forms of syslog config, and they differ depending on your syslog settings, so there's now a mechanism that reconfigures the hook for each situation and responds dynamically to changes in settings). @stigs, I'm guessing this might address your issue as well.

- Filter out syslog lines (aka "spam"...) from some SAS devices rejecting ATA standby op (e0)

- Introduce an exclusion list, which should gradually come to contain drive/controller combinations that are known not to respond favorably to the spindown command

- More consistent log messages and tags

- Add new debug and testing tools

- Many other changes, major code reorg

 

Enjoy! Please report issues (or success).

Link to comment
37 minutes ago, doron said:

Just pushed out version 0.7 of the plugin.

 

Woohoo!  Finally getting syslog hook messages and it's working.


 

Nov 18 09:22:25 teraserver kernel: mdcmd (34010): spindown 7
Nov 18 09:22:28 teraserver SAS Assist v0.7: spinning down slot 7, device /dev/sdh (/dev/sg7)

 

The green dot doesn't turn gray, but checking the disk status from the console, it's showing spun down.

 

root@teraserver:~# sdparm --command=sense /dev/sdh
    /dev/sdh: HGST      HUH721010AL4200   A21D
Additional sense: Standby condition activated by command
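
For anyone who wants to script the console check above rather than eyeball it, here is a minimal sketch. It assumes the sdparm sense output contains the "Standby condition activated" text shown in the paste above; the function names is_standby_sense and is_spun_down are made up for illustration and are not part of the plugin.

```shell
#!/bin/bash
# Hypothetical helper: succeed if sdparm sense output (read from stdin)
# reports that the standby power condition is active.
is_standby_sense() {
    grep -q 'Standby condition activated'
}

# Hypothetical wrapper: $1 is a device name such as "sdh".
# Runs the same sdparm command used in the console check.
is_spun_down() {
    sdparm --command=sense "/dev/$1" 2>/dev/null | is_standby_sense
}
```

Usage would be along the lines of "is_spun_down sdh && echo spun down". Other drives may word the sense text differently, so treat the match string as an assumption to verify on your own hardware.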

 

Link to comment
1 hour ago, doron said:

Just pushed out version 0.7 of the plugin.

1 hour ago, doron said:

@stigs, I'm guessing this might address your issue as well.

Good work!! I can report all is well after updating.

  • No more log spam: CHECK
  • SAS ASSIST log entries showing: CHECK
  • Drives spin down/up properly: CHECK

Thanks a lot, this plugin fixes all my issues and will help save me hundreds of dollars this year undoubtedly. I'll keep you updated if any more issues arise.

 

H7230AS60SUN3.0T Success
ST33000SSSUN3.0T Success

 

Edited by stigs
Link to comment

Really good work @doron! I just tried the commands against my SAS drives after upgrading the firmware of the SAS controller. But "sg_start --readonly --pc=3 /dev/sgX" disables the drive, and I need a reboot to get it back.

 

However, the plugin loops the "drive not supported" messages; see below.

 

 

 

Nov 18 20:13:45 Tower kernel: mdcmd (931): spindown 3
Nov 18 20:13:45 Tower kernel: mdcmd (932): spindown 4
Nov 18 20:13:45 Tower SAS Assist v0.7: disk 3 (/dev/sdh) not supported by SAS spindown plugin (excluded), not spun down
Nov 18 20:13:46 Tower SAS Assist v0.7: disk 4 (/dev/sdi) not supported by SAS spindown plugin (excluded), not spun down
Nov 18 20:13:46 Tower kernel: mdcmd (933): spindown 3
Nov 18 20:13:46 Tower kernel: mdcmd (934): spindown 4
Nov 18 20:13:47 Tower SAS Assist v0.7: disk 3 (/dev/sdh) not supported by SAS spindown plugin (excluded), not spun down
Nov 18 20:13:47 Tower SAS Assist v0.7: disk 4 (/dev/sdi) not supported by SAS spindown plugin (excluded), not spun down
Nov 18 20:13:47 Tower kernel: mdcmd (935): spindown 3
Nov 18 20:13:48 Tower kernel: mdcmd (936): spindown 4
Nov 18 20:13:48 Tower SAS Assist v0.7: disk 3 (/dev/sdh) not supported by SAS spindown plugin (excluded), not spun down
Nov 18 20:13:48 Tower SAS Assist v0.7: disk 4 (/dev/sdi) not supported by SAS spindown plugin (excluded), not spun down

 

Link to comment
6 minutes ago, jowe said:

However, the plugin loops the "drive not supported" messages; see below.

Okay that's actually a feature. Since your drives clearly can't be spun down properly, the plugin avoids them (they are on the exclude list). If it can't help, at least it avoids collateral damage...

 

Now, when you say "loops" - is there actually an endless loop of "spindown 3" - "spindown 4" - "spindown 3" - "spindown 4" or is this a result of your hitting the green button a few times in a row?

 

Actually, if all your SAS drives are on the exclude list, there's unfortunately little point for you to run this plugin at this time 😞

Edited by doron
Link to comment
10 minutes ago, doron said:

Okay that's actually a feature. Since your drives clearly can't be spun down properly, the plugin avoids them (they are on the exclude list). If it can't help, at least it avoids collateral damage...

Yes, I understand that the message should be displayed when the timer reaches, for example, 15 minutes, or when hitting the spin-down button.

12 minutes ago, doron said:

Now, when you say "loops" - is there actually an endless loop of "spindown 3" - "spindown 4" - "spindown 3" - "spindown 4" or is this a result of your hitting the green button a few times in a row?

I did not push the button at all, just waited 15 minutes, and the message loops every second. I could have provided a really long list with the same messages.

14 minutes ago, doron said:

Actually, if all your SAS drives are on the exclude list, there's unfortunately little point for you to run this plugin at this time 😞

Yes I know, just wanted to give it a go with the latest FW, but unfortunately it didn't work. Nevertheless it's a great project!

Link to comment
22 minutes ago, jowe said:

I did not push the button at all, just waited 15 minutes, and the message loops every second. I could have provided a really long list with the same messages.

So this happens with the plugin installed; and when you remove the plugin - those "spindown 3"/"spindown 4" messages do not appear (or at least not in quick succession)?!

If so, this is puzzling and I would like to try to get to the bottom of it.

 

EDIT: I seem to be unable to reproduce it, and also can't see right now how the plugin will cause repeated "mdcmd spindown" to happen.

Edited by doron
Link to comment
13 hours ago, doron said:

So this happens with the plugin installed; and when you remove the plugin - those "spindown 3"/"spindown 4" messages do not appear (or at least not in quick succession)?!

If so, this is puzzling and I would like to try to get to the bottom of it.

 

EDIT: I seem to be unable to reproduce it, and also can't see right now how the plugin will cause repeated "mdcmd spindown" to happen.

I get the errors instead if I remove the plugin, so it's more of an Unraid thing: it tries to spin down the drive again and again... Not a problem for me, as I have set the disks to never spin down now.


Link to comment
1 minute ago, jowe said:

I get the errors instead if I remove the plugin, so it's more of an Unraid thing: it tries to spin down the drive again and again... Not a problem for me, as I have set the disks to never spin down now.

Thanks for confirming.

(still a weird thing that I have never seen elsewhere but...)

Link to comment

Hi @doron - thank you for putting this plugin together.  I wanted to give you an outline of some changes in next 6.9-beta release that will make implementing SAS spin down a little easier, or at least make experimentation easier.

 

What I did was rip out all the spin up/down logic in the md/unraid driver.  Instead all spin up/down handling will be done by user-space emhttpd process.

 

When emhttpd needs to perform a spin up, spin down, or read SMART attributes, it will invoke a script named after the operation to perform and device transport.  These scripts are located in /usr/local/sbin and named as follows:

emhttp_device_<transport>_<operation>

These scripts are provided:

emhttp_device_ata_spinup
emhttp_device_ata_spindown
emhttp_device_ata_smart

The scripts are passed the device name, such as "sdb", and in the case of "smart" operation, an optional parameter (unused at the moment).

 

For example, suppose emhttpd decides to spin down disk1, where disk1 corresponds to device /dev/sdb.  In this case emhttpd will invoke:

/usr/local/sbin/emhttp_device_ata_spindown sdb

This command is actually executed in the background.  At present there is no code which actually verifies that the operation succeeded.

 

Here is the content of above script:

#!/bin/bash
# $1 device name, eg, "sdb"
/usr/sbin/hdparm -y /dev/$1

Pretty simple.

 

To use with SAS your plugin would install these three scripts:

emhttp_device_scsi_spinup
emhttp_device_scsi_spindown
emhttp_device_scsi_smart

I expect to be publishing 6.9.0-beta36 "soon".  Since the 'mdcmd' error messages are no longer present, your current plugin won't work.
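
To make the SAS side concrete, here is a rough sketch of what an emhttp_device_scsi_spindown script could look like. It is not the plugin's actual code: it simply reuses the sg_start command quoted earlier in this thread with the same calling convention as the ATA script. The SG_START variable and the "sdX" name check are assumptions added so the logic can be dry-run without touching a drive.

```shell
#!/bin/bash
# Sketch of a possible /usr/local/sbin/emhttp_device_scsi_spindown.
# $1 device name, e.g. "sdb" (same convention as the ATA script).
# SG_START is a hypothetical override, used only for dry runs.

scsi_spindown() {
    local dev="$1"
    # Assumption: devices are named sdX; refuse anything else.
    if ! [[ "$dev" =~ ^sd[a-z]+$ ]]; then
        echo "usage: $0 sdX" >&2
        return 2
    fi
    # -r opens the device read-only; --pc=3 requests the STANDBY
    # power condition (the manual command used in this thread).
    "${SG_START:-sg_start}" -r --pc=3 "/dev/$dev"
}

# Only act when called with an argument, as emhttpd would call it.
if [ "$#" -ge 1 ]; then
    scsi_spindown "$1"
fi
```

As reported earlier in the thread, some drive/controller combinations respond badly to this command, so a real script would likely want an exclusion check before issuing it.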

Link to comment

Hi @limetech - thanks for the heads up! (hehe I was just reading that other thread when your message came in).

This sounds cool and exactly the right approach (the plugin will shrink to a few lines but hey - it was supposed to be a temporary stopgap anyway).

Getting rid of the syslog dependency would be a blessing (btw I bumped into a few issues with Unraid's handling of rsyslog config but will deal with it in a separate thread - the plugin has an elaborate work around).

 

One question: Where exactly is the value of <transport> derived from for this exercise?

 

Thanks again for doing this.

Link to comment
9 minutes ago, doron said:

One question: Where exactly is the value of <transport> derived from for this exercise?

It looks at entries in /dev/disk/by-id which correspond to the device name.  Each entry has a prefix, e.g., "ata-" or "nvme-", etc.  For SAS it should be "scsi-".  That prefix, with trailing '-' removed, is <transport>.
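
The lookup described above can be approximated from the shell as follows. This is only an illustration of the idea, not emhttpd's code: the function name transport_of is hypothetical, the optional second argument exists purely so the function can be pointed at a test directory, and skipping "wwn-" entries is an assumption about how aliases for the same device should be handled.

```shell
#!/bin/bash
# Hypothetical helper: print the <transport> for a device name such as
# "sdb" by scanning the symlinks in a by-id directory.
# $1 device name, $2 optional directory (default /dev/disk/by-id).
transport_of() {
    local dev="$1" dir="${2:-/dev/disk/by-id}" link target name
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        target="$(readlink "$link")"
        # Entries point at e.g. ../../sdb; compare the last path
        # component exactly so partitions (sdb1) do not match.
        [ "${target##*/}" = "$dev" ] || continue
        name="${link##*/}"
        # Assumption: wwn-* aliases carry no transport information.
        case "$name" in wwn-*) continue ;; esac
        # Everything before the first '-' is the prefix.
        echo "${name%%-*}"
        return 0
    done
    return 1
}
```

With that, "emhttp_device_$(transport_of sdb)_spindown" would give the script name for a device, following the naming scheme described earlier.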

Link to comment
7 minutes ago, limetech said:

It looks at entries in /dev/disk/by-id which correspond to the device name.  Each entry has a prefix, e.g., "ata-" or "nvme-", etc.  For SAS it should be "scsi-".  That prefix, with trailing '-' removed, is <transport>.

Thanks. I can work with that.

Note btw that in this schema, all SAS drives will be "scsi" but not all "scsi" will be SAS.

The only dependable way I found to pinpoint a SAS drive is via smartctl -i, parsing out "Transport protocol" - a field which is returned only for SAS drives (an example is pasted below). I found nothing similar in either /sys or /dev.

I'll use that as a filter in the script, ergo "can work with that".

 

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUH721212AL4200
Revision:             A3D0
Compliance:           SPC-4
User Capacity:        12,000,138,625,024 bytes [12.0 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca2708b9bf8
Serial number:        xxxxxxxx
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sun Nov 22 00:44:17 2020 IST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
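
A filter along those lines might look like the sketch below. It assumes the "Transport protocol:" line appears as in the output above and matches "SAS" explicitly rather than relying on the field merely being present. The function names is_sas_info and is_sas_device are made up for illustration.

```shell
#!/bin/bash
# Hypothetical filter: succeed only if `smartctl -i` output (read from
# stdin) has a "Transport protocol:" line whose value starts with SAS.
is_sas_info() {
    grep -Eq '^Transport protocol:[[:space:]]+SAS'
}

# Hypothetical wrapper: $1 is a device name such as "sdb".
is_sas_device() {
    smartctl -i "/dev/$1" 2>/dev/null | is_sas_info
}
```

A scsi spindown script could then start with something like "is_sas_device "$1" || exit 0", so that non-SAS devices that happen to land under the "scsi" transport are left alone.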

 

Link to comment
