[Plugin] Spin Down SAS Drives

October 6, 2020

Author

13 minutes ago, dansonamission said:

The drive is a ST4000NM0023 Seagate, revision GS0D and the card HP H220 LSI 9205-8i 9207 with P20 firmware.

The manual for this drive (in fact, the entire Constellation ES.3 series) seems to indicate (sec 6.1) that an explicit NOTIFY needs to be sent to the device to recover from the spindown mode we're sending the device into (Standby_Z).

This is in contrast with other devices tested (e.g. WD/HGST) that automatically spin up when sent to this state.

Has anyone seen positive results with this drive and this plugin?

We might end up having to enumerate the drive types where this works well vs. those that fail, and build a white list in the plugin. Nasty 😞

Can I get brief messages here from anyone who's using (or tried using) the plugin, reporting success/failure? Just a one liner with:

<HDD Model> <Success/Failure> (<optional comment>)

would be great. Example:

HUH721212AL4200  Success

PM would also work if you don't want to post. Thanks!
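For anyone collecting these one-liners into a table, here is a minimal Python sketch of a parser for the format above. The names `Report` and `parse_report` are made up for illustration and are not part of the plugin; the model field is allowed to contain spaces because several reports in this thread include vendor and firmware revision.

```python
import re
from typing import NamedTuple, Optional


class Report(NamedTuple):
    model: str              # everything before the result word
    result: str             # "Success" or "Failure"
    comment: Optional[str]  # text inside the optional parentheses


# <HDD Model> <Success/Failure> (<optional comment>)
REPORT_RE = re.compile(
    r"^\s*(?P<model>.+?)\s+(?P<result>Success|Failure)\b"
    r"\s*(?:\((?P<comment>.*)\))?\s*$",
    re.IGNORECASE,
)


def parse_report(line: str) -> Optional[Report]:
    """Return a Report for a well-formed one-liner, or None otherwise."""
    m = REPORT_RE.match(line)
    if m is None:
        return None
    return Report(m["model"], m["result"].capitalize(), m["comment"])


if __name__ == "__main__":
    print(parse_report("HUH721212AL4200  Success"))
```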

October 6, 2020

My drives are as follows. They are not tested with the plugin, but they work with manual commands.

HGST HUS724030ALS640 A1C4 Success
HITACHI HMRSK2000GBAS07K 3P02 Success

Will get a Seagate drive, do some testing, and report back.

Edited October 6, 2020 by SimonF
Additional Info

October 7, 2020

HITACHI HUC106060CSS600 A430 Failure

Read errors when spun down and trying to wake back up.

They do spin back up, but I have had random drives (3 times) end up with a red x.

Array is 20 drives of the above type.

Edited October 7, 2020 by SuperDan

October 7, 2020

Author

39 minutes ago, SuperDan said:

HITACHI HUC106060CSS600 A430 Failure

Read errors when spun down and trying to wake back up.

They do spin back up, but I have had random drives (3 times) end up with a red x.

Array is 20 drives of the above type.

Thanks. So basically you're saying the result in your case is not consistent? Some (most?) of the time they do spin back up but at times they get the read errors?

Could you paste syslog lines from the time of such error that red-x-ed a drive?

October 7, 2020

1 hour ago, doron said:

Thanks. So basically you're saying the result in your case is not consistent? Some (most?) of the time they do spin back up but at times they get the read errors?

That is correct.

I had to re-enable the plugin to get the log entries.

This time 2 drives red x'd on me.

The only entries I saw related to those drives are:

Oct 7 09:32:46 unNAS kernel: blk_update_request: I/O error, dev sdz, sector 64 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Oct 7 09:32:46 unNAS kernel: md: disk17 read error, sector=0
Oct 7 09:32:51 unNAS kernel: blk_update_request: I/O error, dev sdz, sector 64 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Oct 7 09:32:51 unNAS kernel: md: disk17 write error, sector=0

Oct 7 09:41:37 unNAS kernel: blk_update_request: I/O error, dev sdu, sector 586549720 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Oct 7 09:41:37 unNAS kernel: md: disk15 read error, sector=586549656
Oct 7 09:41:37 unNAS kernel: blk_update_request: I/O error, dev sdu, sector 586549720 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Oct 7 09:41:37 unNAS kernel: md: disk15 write error, sector=586549656
Oct 7 09:41:37 unNAS kernel: blk_update_request: I/O error, dev sdu, sector 64 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Oct 7 09:41:37 unNAS kernel: md: disk15 write error, sector=0

October 7, 2020

1 hour ago, doron said:

Thanks. So basically you're saying the result in your case is not consistent? Some (most?) of the time they do spin back up but at times they get the read errors?

Could you paste syslog lines from the time of such error that red-x-ed a drive?

Actually I found more log entries for one of the drives that went red x:

Oct 7 09:31:30 unNAS SAS Assist v0.6[36184]: spinning down slot 17, device /dev/sdz (/dev/sg26)
Oct 7 09:32:46 unNAS kernel: sd 2:0:23:0: [sdz] tag#24 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Oct 7 09:32:46 unNAS kernel: sd 2:0:23:0: [sdz] tag#24 Sense Key : 0x2 [current] [descriptor]
Oct 7 09:32:46 unNAS kernel: sd 2:0:23:0: [sdz] tag#24 ASC=0x4 ASCQ=0x11
Oct 7 09:32:46 unNAS kernel: sd 2:0:23:0: [sdz] tag#24 CDB: opcode=0x28 28 00 00 00 00 40 00 00 08 00
Oct 7 09:32:46 unNAS kernel: blk_update_request: I/O error, dev sdz, sector 64 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Oct 7 09:32:46 unNAS kernel: md: disk17 read error, sector=0
Oct 7 09:32:46 unNAS kernel: sd 2:0:8:0: Power-on or device reset occurred
Oct 7 09:32:51 unNAS kernel: sd 2:0:23:0: [sdz] tag#8 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Oct 7 09:32:51 unNAS kernel: sd 2:0:23:0: [sdz] tag#8 Sense Key : 0x2 [current] [descriptor]
Oct 7 09:32:51 unNAS kernel: sd 2:0:23:0: [sdz] tag#8 ASC=0x4 ASCQ=0x11
Oct 7 09:32:51 unNAS kernel: sd 2:0:23:0: [sdz] tag#8 CDB: opcode=0x2a 2a 00 00 00 00 40 00 00 08 00
Oct 7 09:32:51 unNAS kernel: blk_update_request: I/O error, dev sdz, sector 64 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Oct 7 09:32:51 unNAS kernel: md: disk17 write error, sector=0
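As a side note on reading the `CDB:` lines above: opcodes 0x28 and 0x2a are READ(10) and WRITE(10), and the rest of the bytes carry the logical block address and transfer length. A small Python sketch of decoding them follows. The function name `decode_rw10` is just for illustration, and matching the LBA to the kernel's "sector 64" assumes the drive uses 512-byte logical blocks.

```python
# Opcodes for the two 10-byte commands seen in the log excerpt.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_rw10(cdb_hex: str) -> dict:
    """Decode a READ(10)/WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] not in OPCODES:
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    return {
        "op": OPCODES[cdb[0]],
        "lba": int.from_bytes(cdb[2:6], "big"),     # bytes 2-5: logical block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # bytes 7-8: transfer length
    }


if __name__ == "__main__":
    print(decode_rw10("28 00 00 00 00 40 00 00 08 00"))
    print(decode_rw10("2a 00 00 00 00 40 00 00 08 00"))
```

So both failing commands are ordinary 8-block transfers at LBA 64, which lines up with the `sector 64` I/O errors reported right after them.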

October 7, 2020

Author

Thanks very much for this.

57 minutes ago, SuperDan said:

Oct 7 09:31:30 unNAS SAS Assist v0.6[36184]: spinning down slot 17, device /dev/sdz (/dev/sg26)
Oct 7 09:32:46 unNAS kernel: sd 2:0:23:0: [sdz] tag#24 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Oct 7 09:32:46 unNAS kernel: sd 2:0:23:0: [sdz] tag#24 Sense Key : 0x2 [current] [descriptor]
Oct 7 09:32:46 unNAS kernel: sd 2:0:23:0: [sdz] tag#24 ASC=0x4 ASCQ=0x11

Aha. That's the infamous/dreaded 02-04-11, same as we've seen on the Seagate. Will have to exclude this series as well 😞

57 minutes ago, SuperDan said:

Oct 7 09:32:46 unNAS kernel: sd 2:0:8:0: Power-on or device reset occurred

A different drive? Was something else happening at the same time?
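For anyone who wants to scan their own syslog for the 02-04-11 case, here is a rough Python sketch that pairs up the "Sense Key" and "ASC/ASCQ" lines per device and tag, in the raw numeric form shown in the log above. The lookup tables only cover a handful of codes from the standard SCSI sense tables (04/11 is "LOGICAL UNIT NOT READY, NOTIFY (ENABLE SPINUP) REQUIRED"); the function names are invented for this example.

```python
import re

SENSE_KEYS = {0x2: "NOT READY"}
ASC_ASCQ = {
    (0x04, 0x01): "LOGICAL UNIT IS IN PROCESS OF BECOMING READY",
    (0x04, 0x02): "LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED",
    (0x04, 0x11): "LOGICAL UNIT NOT READY, NOTIFY (ENABLE SPINUP) REQUIRED",
}

KEY_RE = re.compile(
    r"\[(?P<dev>sd[a-z]+)\] tag#(?P<tag>\d+) Sense Key : (?P<key>0x[0-9a-fA-F]+)"
)
ASC_RE = re.compile(
    r"\[(?P<dev>sd[a-z]+)\] tag#(?P<tag>\d+) "
    r"ASC=(?P<asc>0x[0-9a-fA-F]+) ASCQ=(?P<ascq>0x[0-9a-fA-F]+)"
)


def sense_triples(lines):
    """Return (device, sense_key, asc, ascq) tuples found in syslog lines."""
    pending = {}  # (device, tag) -> sense key, waiting for its ASC/ASCQ line
    found = []
    for line in lines:
        m = KEY_RE.search(line)
        if m:
            pending[(m["dev"], m["tag"])] = int(m["key"], 16)
            continue
        m = ASC_RE.search(line)
        if m:
            key = pending.pop((m["dev"], m["tag"]), None)
            if key is not None:
                found.append((m["dev"], key, int(m["asc"], 16), int(m["ascq"], 16)))
    return found


def describe(key, asc, ascq):
    """Format a triple as kk-aa-qq plus whatever text the tables know."""
    return (
        f"{key:02x}-{asc:02x}-{ascq:02x}: "
        f"{SENSE_KEYS.get(key, 'unknown sense key')} / "
        f"{ASC_ASCQ.get((asc, ascq), 'unknown ASC/ASCQ')}"
    )
```

Feeding it the lines quoted above gives one entry for `sdz` per failed command, each describing the drive as not ready and waiting for a NOTIFY (ENABLE SPINUP).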

October 7, 2020

1 hour ago, doron said:

Thanks very much for this.

Aha. That's the infamous/dreaded 02-04-11, same as we've seen on the Seagate. Will have to exclude this series as well 😞

A different drive? Was something else happening at the same time?

Maybe a different drive(s), since my cache drives are SATA SSDs.

May as well add these drives too, since they are HITACHI HUC106060CSS600 drives rebranded as NetApp drives and still suffer from the above problem:

NETAPP X422_TAL13600A10

NETAPP X422_HCOBD600A10

Edited October 7, 2020 by SuperDan

October 7, 2020

Something I just noticed, the other user having this problem is using an HP controller

HP H220 LSI 9205-8i 9207

And I am using a DELL H310 LSI MegaRAID SAS 2008

Maybe proprietary firmware on the HBA may have something to do with it?

Just throwing it out there.

October 7, 2020

Author

1 minute ago, SuperDan said:

Something I just noticed, the other user having this problem is using an HP controller

HP H220 LSI 9205-8i 9207

And I am using a DELL H310 LSI MegaRAID SAS 2008

Maybe proprietary firmware on the HBA may have something to do with it?

Just throwing it out there.

Hmm. Certainly a possibility, yes.

Currently it's second on my list of potential causes, only because the OEM manual of the Seagate described the drive's behavior in the Standby_Z state (this is what we're using to spin the drive down), and it explicitly stated that the drive will require an init op to spin up again. This is in contrast with the HGST/WD SAS drives, which (explicitly) specify that in the same state the drive is ready to spin back up on next I/O.

The manual for your Hitachi is silent about this - it does not say either way - but the 04-11 sense makes me think it behaves like the Seagate.

You'd think that standards would be standards, and the drive behavior following a standard SAS/SCSI operation would be standard... go figure.

It could still be the controller, the above is just my current thinking.
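To make the Standby_Z discussion a bit more concrete, here is a hedged Python sketch of how the START STOP UNIT command encodes a power condition, as I understand the SBC field layout (power condition 3 is STANDBY, and modifier 0 selects standby_z). This thread does not show the exact command the plugin issues, so treat the helper name and the two example CDBs as an illustration of the encoding only, not as the plugin's implementation.

```python
START_STOP_UNIT = 0x1B  # SCSI opcode

# POWER CONDITION field values (high nibble of byte 4)
PC_START_VALID = 0x0    # honour the START bit instead of a power condition
PC_ACTIVE = 0x1
PC_IDLE = 0x2
PC_STANDBY = 0x3


def start_stop_unit_cdb(power_condition=PC_START_VALID, modifier=0,
                        start=False, immed=False) -> bytes:
    """Build a 6-byte START STOP UNIT CDB."""
    if not 0 <= power_condition <= 0xF or not 0 <= modifier <= 0xF:
        raise ValueError("power_condition and modifier are 4-bit fields")
    return bytes([
        START_STOP_UNIT,
        0x01 if immed else 0x00,                         # byte 1: IMMED bit
        0x00,                                            # byte 2: reserved
        modifier,                                        # byte 3: power condition modifier
        (power_condition << 4) | (0x01 if start else 0x00),  # byte 4: PC + START
        0x00,                                            # byte 5: control
    ])


if __name__ == "__main__":
    # Request standby_z (spin down)
    print(start_stop_unit_cdb(PC_STANDBY, modifier=0).hex(" "))
    # Explicit start (spin up) request
    print(start_stop_unit_cdb(PC_START_VALID, start=True).hex(" "))
```

Whether a drive in this state wakes on the next I/O by itself, or answers with 02-04-11 until it gets an explicit start or NOTIFY, is exactly the difference being discussed above.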

October 7, 2020

30 minutes ago, doron said:

Hmm. Certainly a possibility, yes.

Currently it's second on my list of potential causes, only because the OEM manual of the Seagate described the drive's behavior in the Standby_Z state (this is what we're using to spin the drive down), and it explicitly stated that the drive will require an init op to spin up again. This is in contrast with the HGST/WD SAS drives, which (explicitly) specify that in the same state the drive is ready to spin back up on next I/O.

The manual for your Hitachi is silent about this - it does not say either way - but the 04-11 sense makes me think it behaves like the Seagate.

You'd think that standards would be standards, and the drive behavior following a standard SAS/SCSI operation would be standard... go figure.

It could still be the controller, the above is just my current thinking.

Hmm, this has given me the motivation I needed to replace the DELL controller with one that is flashed with the LSI firmware. Been meaning to do it for a while now.

I just ordered it, and when I install it I'll report back if anything changed.

October 9, 2020

[1000:0064] 07:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] (rev 02)

[10:0:0:0] disk ATA ST3320310CS SC14 /dev/sdl 320GB
[10:0:1:0] disk SEAGATE ST4000NM0023 XMGJ /dev/sdm 4.00TB

Testing with the SEAGATE drive works as expected so far, but I have only added it as a standalone second pool drive. It spins up when accessed.

I will test it on the other controller in my system (below) over the weekend, and may add it into the pool as parity.

[1000:0086] 02:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)

[3:0:0:0] disk HGST HUS724030ALS640 A1C4 /dev/sdd 3.00TB
[3:0:1:0] disk HITACHI HMRSK2000GBAS07K 3P02 /dev/sde 2.00TB
[3:0:2:0] disk HGST HUS724030ALS640 A1C4 /dev/sdf 3.00TB
[3:0:3:0] disk HGST HUS724030ALS640 A1C4 /dev/sdh 3.00TB
[3:0:4:0] disk HGST HUS724030ALS640 A1C4 /dev/sdj 3.00TB
[3:0:5:0] disk HITACHI HMRSK2000GBAS07K 3P02 /dev/sdk 2.00TB

Both controllers are in IT Mode on P20.

October 9, 2020

Author

2 hours ago, SimonF said:

[10:0:1:0] disk SEAGATE ST4000NM0023 XMGJ /dev/sdm 4.00TB

Testing with the SEAGATE drive works as expected so far, but I have only added it as a standalone second pool drive. It spins up when accessed.

That's very interesting.

Could it be that it just takes very long to spin back up? I'm trying to find a plausible explanation as to why the same drive works as expected in your case but failed in the previous poster's case.

I guess the next test needs to be in the array.

Or, it might, after all, be the controller. Or the firmware version, although I'd really doubt that.

I was just about to push out a new version of the plugin with some changes and improvements, and with a mechanism to filter out certain HDDs by name/vendor. I'll hold off until we have some more insight.
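A filter by name/vendor could be as simple as a list of glob patterns checked before spinning a drive down. The Python sketch below is purely hypothetical: the rule list, the `is_excluded` name, and the matching scheme are assumptions for illustration, and the entries merely mirror failure reports from this thread, which may yet turn out to be controller related rather than drive related.

```python
from fnmatch import fnmatchcase

# (vendor pattern, model pattern) - illustrative entries only, taken from
# failure reports in this thread; not a verified exclusion list.
EXCLUDED = [
    ("SEAGATE", "ST4000NM0023"),
    ("HITACHI", "HUC106060CSS600"),
    ("NETAPP", "X422_*A10"),
]


def is_excluded(vendor: str, model: str, rules=EXCLUDED) -> bool:
    """Return True if the drive matches any (vendor, model) glob rule."""
    v = vendor.strip().upper()
    m = model.strip().upper()
    return any(fnmatchcase(v, rv) and fnmatchcase(m, rm) for rv, rm in rules)


if __name__ == "__main__":
    print(is_excluded("NETAPP", "X422_TAL13600A10"))
    print(is_excluded("HGST", "HUS724030ALS640"))
```

If the problem ends up depending on the HBA, the same idea would extend to a third pattern for the controller's PCI ID.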

Edited October 9, 2020 by doron

October 10, 2020

FYI not working for my 7 Seagate ST4000NM0023 on Dell H200 IT mode

hoping for a solution...

great job

thanks

October 10, 2020

Author

26 minutes ago, Spazhead said:

FYI not working for my 7 Seagate ST4000NM0023 on Dell H200 IT mode

Thanks for reporting.

Can you elaborate on "not working" in this case - are you seeing i/o errors? Did you get a red x? etc.

Can you share the drive f/w version?

We're seeing different reports wrt this drive so trying to dig further in (f/w, controller etc.).

October 10, 2020

Just a follow up.

I replaced the DELL H310 LSI MegaRAID SAS 2008 controller with a DELL H710 LSI 9207-8i P20 flashed with LSI IT mode firmware version P20 (20.00.07.00).

After 18 hours of running and doing some other tests the drives spin up/down as they should and no errors or red x drives.

So, at least in my case replacing the controller fixed it.

Edited October 10, 2020 by SuperDan

October 10, 2020

50 minutes ago, SuperDan said:

controller fixed it

Thanks for the update; please report back if any errors happen in the future. Are all the drives you are using HITACHI?

Testing with the SEAGATE as parity drive: it performs as expected with manual spin down via command on the 9201-16e. Will test the other controller ("Serial Attached SCSI controller" "Broadcom / LSI" "SAS2308 PCI-Express Fusion-MPT SAS-2" -r05 "Super Micro Computer Inc" "Onboard SAS2308 PCI-Express Fusion-MPT SAS-2") tomorrow.

Edited October 10, 2020 by SimonF
Additional Info

October 10, 2020

Author

51 minutes ago, SuperDan said:

Just a follow up.

I replaced the DELL H310 LSI MegaRAID SAS 2008 controller with a DELL H710 LSI 9207-8i P20 flashed with LSI IT mode firmware version P20 (20.00.07.00).

After 18 hours of running and doing some other tests the drives spin up/down as they should and no errors or red x drives.

So, at least in my case replacing the controller fixed it.

Thanks - great stuff!

Other reports also begin to confirm that this is probably controller dependent.

Do you happen to have the PCI IDs of these two?

For the running one, you can obtain it via "lspci -vnnmm".

For the one you removed - well, if it is connected to another machine...

October 10, 2020

8 minutes ago, SimonF said:

Thanks for the update; please report back if any errors happen in the future. Are all the drives you are using HITACHI?

Testing with the SEAGATE as parity drive: it performs as expected with manual spin down via command.

Will do.

Yes, all of the SAS drives are HITACHI.

October 10, 2020

6 minutes ago, doron said:

Thanks - great stuff!

Other reports also begin to confirm that this is probably controller dependent.

Do you happen to have the PCI IDs of these two?

For the running one, you can obtain it via "lspci -vnnmm".

For the one you removed - well, if it is connected to another machine...

Sorry, I did not record the ID for the one I removed.

Unfortunately these are the Mini Mono cards that require a proprietary PCI storage slot on Dell servers and I do not have another server I can plug the old one into.

output of lspci -vnnmm

Slot: 03:00.0
Class: Serial Attached SCSI controller [0107]
Vendor: Broadcom / LSI [1000]
Device: SAS2308 PCI-Express Fusion-MPT SAS-2 [0087]
SVendor: Dell [1028]
SDevice: SAS2308 PCI-Express Fusion-MPT SAS-2 [1f38]
Rev: 05
NUMANode: 0
IOMMUGroup: 18
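If you want to pull just the ID pairs out of `lspci -vnnmm` output like the above, something along these lines should do. This is a minimal Python sketch that assumes the "Key: value [xxxx]" record layout shown in this thread, with records separated by blank lines; the function names are made up for the example.

```python
import re

# A 4-hex-digit ID in square brackets at the very end of a field value.
ID_RE = re.compile(r"\[([0-9a-fA-F]{4})\]\s*$")


def parse_lspci_vnnmm(text: str) -> list:
    """Split `lspci -vnnmm` output into one dict per device record."""
    devices, current = [], {}
    for line in text.splitlines():
        if not line.strip():          # a blank line ends the current record
            if current:
                devices.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        devices.append(current)
    return devices


def pci_ids(device: dict) -> tuple:
    """Return ('vendor:device', 'svendor:sdevice') as lowercase hex strings."""
    def trailing_id(field):
        m = ID_RE.search(device.get(field, ""))
        return m.group(1).lower() if m else "????"
    return (
        f"{trailing_id('Vendor')}:{trailing_id('Device')}",
        f"{trailing_id('SVendor')}:{trailing_id('SDevice')}",
    )
```

For the record posted above, `pci_ids` yields the pair `1000:0087` / `1028:1f38`.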

October 10, 2020

9 hours ago, doron said:

Thanks for reporting.

Can you elaborate on "not working" in this case - are you seeing i/o errors? Did you get a red x? etc.

Can you share the drive f/w version?

We're seeing different reports wrt this drive so trying to dig further in (f/w, controller etc.).

I guess it just doesn't sleep.

Log errors:

Oct 10 04:29:19 TOWER kernel: md: do_drive_cmd: disk13: ATA_OP e0 ioctl error: -5
Oct 10 04:29:19 TOWER emhttpd: error: mdcmd, 2723: Input/output error (5): write

The drives still work.

October 10, 2020

Author

2 minutes ago, SuperDan said:

Sorry, I did not record the ID for the one I removed.

Unfortunately these are the Mini Mono cards that require a proprietary PCI storage slot on Dell servers and I do not have another server I can plug the old one into.

Yes, I know these Dell Minis all too well. No worries - you are providing really helpful data.

I'm going to assume the old one is 1000:0073 / 1028:1f51 . If anyone can corroborate or correct (Dell H310 Mini Mono) - please do.

October 11, 2020

The SEAGATE disk works as expected on both controllers:

Class: Serial Attached SCSI controller [0107]
Vendor: Broadcom / LSI [1000]
Device: SAS2308 PCI-Express Fusion-MPT SAS-2 [0086]
SVendor: Super Micro Computer Inc [15d9]
SDevice: Onboard SAS2308 PCI-Express Fusion-MPT SAS-2 [0691]
Rev: 05

and

Class: Serial Attached SCSI controller [0107]
Vendor: Broadcom / LSI [1000]
Device: SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] [0064]
SVendor: Broadcom / LSI [1000]
SDevice: 9201-16e 6Gb/s SAS/SATA PCIe x8 External HBA [30d0]
Rev: 02

October 12, 2020

Thanks for making this a plugin. Hope I can help with some diagnostics.

HUS724030ALS640 Fail with *Temp and drive not spun down. ~~Still operational, seemingly no effect.~~ Getting read errors after a reboot.

Oct 11 20:38:24 Beast kernel: mdcmd (47): spindown 7
Oct 11 20:38:24 Beast kernel: md: do_drive_cmd: disk7: ATA_OP e0 ioctl error: -5
Oct 11 20:38:24 Beast emhttpd: error: mdcmd, 2723: Input/output error (5): write
Oct 11 20:38:24 Beast SAS Assist v0.6[27532]: spinning down slot 7, device /dev/sde (/dev/sg4)

Slot:   06:00.0
Class:  Serial Attached SCSI controller [0107]
Vendor: Broadcom / LSI [1000]
Device: SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] [0064]
SVendor:        Broadcom / LSI [1000]
SDevice:        SAS 9201-16i [30c0]
Rev:    02
NUMANode:       0

Edited October 12, 2020 by jenga201

October 12, 2020

Author

Thanks for taking the time.

13 hours ago, jenga201 said:

Thanks for making this a plugin. Hope I can help with some diagnostics.

HUS724030ALS640 Fail with *Temp and drive not spun down. ~~Still operational, seemingly no effect.~~

Thanks. How did you determine that the drive is not spun down?

13 hours ago, jenga201 said:

Getting read errors after a reboot.


Oct 11 20:38:24 Beast kernel: mdcmd (47): spindown 7
Oct 11 20:38:24 Beast kernel: md: do_drive_cmd: disk7: ATA_OP e0 ioctl error: -5
Oct 11 20:38:24 Beast emhttpd: error: mdcmd, 2723: Input/output error (5): write
Oct 11 20:38:24 Beast SAS Assist v0.6[27532]: spinning down slot 7, device /dev/sde (/dev/sg4)


Slot:   06:00.0
Class:  Serial Attached SCSI controller [0107]
Vendor: Broadcom / LSI [1000]
Device: SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] [0064]
SVendor:        Broadcom / LSI [1000]
SDevice:        SAS 9201-16i [30c0]
Rev:    02
NUMANode:       0

These two messages (OP e0 error and mdcmd error (5)) are not really read errors. They are the drive/controller's response to the ATA ops sent by Unraid, and are not an indication of a problem. I'm assuming you have been receiving them prior to installing the plugin (and keep receiving them if you remove the plugin) - please correct me if I'm wrong.

The next version of the plugin (not pushed out yet) includes a feature that just filters out these syslog messages.
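Until then, if you are reading a saved syslog and want those two messages out of the way, a quick Python filter like the one below works. It is only a sketch for post-processing a log file and says nothing about how the plugin itself will do the filtering; the patterns are written against the exact lines quoted in this thread, and `drop_noise` is a made-up name.

```python
import re

# The two messages described above as harmless responses to ATA ops.
NOISE = [
    re.compile(r"md: do_drive_cmd: disk\d+: ATA_OP e0 ioctl error: -5"),
    re.compile(r"emhttpd: error: mdcmd, \d+: Input/output error \(5\): write"),
]


def drop_noise(lines):
    """Return the lines that do not match any of the known-noise patterns."""
    return [line for line in lines if not any(p.search(line) for p in NOISE)]


if __name__ == "__main__":
    import sys
    # Usage sketch: python drop_noise.py < syslog
    for kept in drop_noise(sys.stdin.read().splitlines()):
        print(kept)
```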
