Spin down SAS drives


doron


I hate to just say "+1", so I'll say a little more. I just got a bunch of SAS disks (cheap, and 8TB each) and was wondering why the heck they don't spin down. This makes sense now. I have tried the sg_start -S commands and they honestly don't seem to work. I also messed around with the settings Zack Reed mentions on his blog here, as mentioned in another post. That didn't work either. Think about all the power that could be saved if @limetech implemented a solution here :).

 

Edit: Adding the -r parameter to the sg_start command does make the drives "spin down". That's in quotes because I'm not sure exactly what is happening to the drives. Unraid does not show them in a spun-down state, but I am rebuilding part of my array right now because when Unraid tried to write to the drives, they errored out.

Edited by JimJamUrUnraid
Added more information

@JimJamUrUnraid, @Golfonauta - indeed, as you can see in this post here, the challenge appears to be spinning up rather than down: if we spin the drive down using one of these methods and Unraid is not made aware of it, the next time Unraid wants to write to that drive it will get a timeout (the drive takes time to wake up...) and will red-x it. The drive then needs to be rebuilt.

 

No damage to the drive other than that - it's just that Unraid will think the data is bad (out of sync) and will need to rebuild.

 

In short, we need Limetech (@bonienl?...) to come to the rescue.


You guys with SAS devices:  Please run some experiments for me.  These are the commands which may be used to spinup and spindown SAS devices:

sg_start -rs /dev/sdX  # spin up

sg_start -rS /dev/sdX # spin down

Here's what I need to know:

a) do those commands work reliably, i.e., no hanging, no strange messages in system log?

 

b) after typing each command, does the command return immediately or does it return after the action has completed?  Could be that spindown is immediate but how about spinup, does command finish only after device has spun up?

 

After spinning down a drive, type this command:

smartctl -n standby -AH /dev/sdX

c) does the command report device in standby mode and does the device stay spun down?

 

d) After spinning down a drive, type this command to spin it back up via I/O; does it work?

dd of=/dev/null if=/dev/sdX

 

One user has reported case of on-demand I/O failing because a timeout somewhere is too short:

On 6/26/2020 at 8:37 AM, doron said:

if we spin the drive down using one of these methods, and Unraid is not made aware of it, the next time it will want to write to that drive it will get a timeout (takes time to wake up...) and will red-x it. Then it needs to be rebuilt.

Anyone else see this?
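
For anyone who wants to run experiments (a) through (d) in one go, here is a minimal sketch, not a tested tool. The DRY_RUN variable and the function names run and sas_experiment are made up for illustration; the dd limits (bs, count) and iflag=direct are additions to the command above, meant to keep the read short and to bypass the page cache on GNU dd. By default it only prints the commands.

```shell
# Sketch of experiments (a)-(d) for a single SAS device.
# DRY_RUN, run and sas_experiment are illustrative names, not part of sg3_utils.
# With DRY_RUN unset or 1 the commands are only printed; DRY_RUN=0 runs them (as root).

run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        start=$(date +%s)
        "$@"
        status=$?
        end=$(date +%s)
        # Elapsed time helps answer (b): does the command block until the action is done?
        echo "  exit status $status after $((end - start))s"
    fi
}

sas_experiment() {
    dev="$1"
    run sg_start -rS "$dev"               # spin down: (a) reliable? (b) when does it return?
    run smartctl -n standby -AH "$dev"    # (c) is the device reported as in standby?
    # (d) plain read. count= is an arbitrary limit so dd does not read the whole disk;
    # iflag=direct (GNU dd) bypasses the page cache so the read really reaches the drive.
    run dd if="$dev" of=/dev/null bs=512 count=2048 iflag=direct
    run sg_start -rs "$dev"               # explicit spin up: (b) when does it return?
}
```

Calling sas_experiment /dev/sdX with DRY_RUN left at its default prints the four commands without touching the drive. Set DRY_RUN=0 only on a device you can afford to have disabled, and check the system log afterwards.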

On 7/12/2020 at 1:10 PM, limetech said:

 

You guys with SAS devices:  Please run some experiments for me.  These are the commands which may be used to spinup and spindown SAS devices:


sg_start -rs /dev/sdX  # spin up

sg_start -rS /dev/sdX # spin down

Here's what I need to know:

a) do those commands work reliably, i.e., no hanging, no strange messages in system log?

The commands are reliably spinning up/down my SAS devices.

 

On 7/12/2020 at 1:10 PM, limetech said:

 

b) after typing each command, does the command return immediately or does it return after the action has completed?  Could be that spindown is immediate but how about spinup, does command finish only after device has spun up?

 

Each command only finishes after the device is spun up/down.

 

On 7/12/2020 at 1:10 PM, limetech said:

 

After spinning down a drive, type this command:


smartctl -n standby -AH /dev/sdX

c) does the command report device in standby mode and does the device stay spun down?

 

I had to use smartctl -i /dev/sdX to get any spin-down status.  It reports the following:

 

root@xxxxxx:~# smartctl -i /dev/sdq

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUS724030ALS640
Revision:             A1C4
Compliance:           SPC-4
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
LU is resource provisioned, LBPRZ=0
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca0581ad3e8
Serial number:        P9GGSBXW
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Jul 13 16:49:00 2020 EDT
device is NOT READY (e.g. spun down, busy)

 

and the device stays spun down until manually spun up.
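
If you want to check this from a script, a rough sketch follows. It assumes the exact "device is NOT READY" wording shown in the output above, and the function name sas_state_from_smartctl is made up for illustration. Note that smartctl itself says NOT READY can also mean "busy", so "standby" here is a best guess, not a certainty.

```shell
# Reads `smartctl -i` output on stdin and prints "standby" or "active".
# Assumption: a spun-down SAS drive produces the "device is NOT READY" line
# seen in the output above; the same line may also appear when a drive is busy.
sas_state_from_smartctl() {
    if grep -q 'device is NOT READY'; then
        echo standby
    else
        echo active
    fi
}
```

Usage would be something like: smartctl -i /dev/sdq | sas_state_from_smartctl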

On 7/12/2020 at 1:10 PM, limetech said:

 

d) After spinning down a drive, type this command to spin it back up via I/O; does it work?


dd of=/dev/null if=/dev/sdX

 

spun down:

 

root@xxxxxx:~# dd of=/dev/null if=/dev/sdq
dd: error reading '/dev/sdq': Input/output error
40+0 records in
40+0 records out
20480 bytes (20 kB, 20 KiB) copied, 0.000502602 s, 40.7 MB/s

 

and the HD does not spin up.

 

spun up:

 

root@xxxxxxx:~# dd of=/dev/null if=/dev/sdq
^C3937545+0 records in
3937544+0 records out
2016022528 bytes (2.0 GB, 1.9 GiB) copied, 10.3798 s, 194 MB/s

 

On 7/12/2020 at 1:10 PM, limetech said:

One user has reported case of on-demand I/O failing because a timeout somewhere is too short:

Anyone else see this?

 Not tested.


Super stoked to see this issue get some attention.  I too use SAS disks and would love the ability to spin them down.  I tried the start/stop commands above on a 6.8.3 installation.

The spin down command seemed to work, as smartctl reported that the disk was not ready afterwards.  Unfortunately, the disk in question refused to come back up using the spin up command and was subsequently disabled by Unraid.  I suppose it's possible that I didn't wait long enough after issuing the command though.  

 

I'm currently doing a read check to try and re-enable the disabled disk.  

On 7/12/2020 at 8:10 PM, limetech said:

You guys with SAS devices:  Please run some experiments for me.  These are the commands which may be used to spinup and spindown SAS devices:


sg_start -rs /dev/sdX  # spin up

sg_start -rS /dev/sdX # spin down

 

Thanks for taking a shot at this!!

 

I'm away from my server until a bit later, so I will be able to test again when back; quick question though -- why is the -r flag there? According to the man page this seems to put the drive in read-only mode - is this what you intended?

When I tried this command previously (see first post in this thread), I used -s / -S, without the -r.

 

8 hours ago, Lhank said:

spun down:

 

root@xxxxxx:~# dd of=/dev/null if=/dev/sdq
dd: error reading '/dev/sdq': Input/output error
40+0 records in
40+0 records out
20480 bytes (20 kB, 20 KiB) copied, 0.000502602 s, 40.7 MB/s

 

and the HD does not spin up.

Sorry to say, this is a show-stopper.  When a SCSI device goes into standby, it has to be told explicitly to "wake up" (spin up), unlike ATA, which auto-spins up HDDs if a read/write command is received.  Dealing with this behavior would take a serious amount of coding.

 

Another experiment if someone wants to try:  Create a 2-device btrfs pool and copy some files to it.  Then reboot, use the sg_start -S command to spindown one of the devices and then write a new file to the pool - what happens?
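
To illustrate what "told explicitly to wake up" means in practice, here is a minimal sketch of a check-then-start helper. It assumes sg_turs (the TEST UNIT READY utility from the same sg3_utils package as sg_start) is installed and that any non-zero exit status can be treated as "not ready". The function name ensure_spun_up is made up. This only helps a script that knows it is about to touch the disk; it does nothing for I/O issued by the array or filesystem drivers, which is the hard part described above.

```shell
# ensure_spun_up DEV: send an explicit START only if the unit reports not ready.
# Assumptions: sg_turs and sg_start come from sg3_utils; any non-zero status
# from sg_turs is treated as "not ready", which may be too coarse in practice.
ensure_spun_up() {
    dev="$1"
    if sg_turs "$dev" >/dev/null 2>&1; then
        echo "$dev: already ready"
        return 0
    fi
    echo "$dev: not ready, sending START"
    sg_start -rs "$dev"
}
```

One report in this thread says sg_start returns only once the drive has spun up, so I/O issued right after ensure_spun_up /dev/sdX should find the drive ready. That is worth verifying on your own hardware before relying on it.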

20 minutes ago, doron said:

Thanks for taking a shot at this!!

 

I'm away from my server until a bit later, so I will be able to test again when back; quick question though -- why is the -r flag there? According to the man page this seems to put the drive in read-only mode - is this what you intended?

When I tried this command previously (see first post in this thread), I used -s / -S, without the -r.

 

It doesn't put the drive in read-only mode; it opens the device file descriptor in read-only mode for the purpose of sending the SCSI spin-up/down commands.  Probably not necessary, but it shouldn't hurt.

On 7/14/2020 at 1:45 AM, limetech said:

Sorry to say, this is a show-stopper.  When a SCSI device goes into standby, it has to be told explicitly to "wake up" (spin up), unlike ATA, which auto-spins up HDDs if a read/write command is received.  Dealing with this behavior would take a serious amount of coding.

 

Another experiment if someone wants to try:  Create a 2-device btrfs pool and copy some files to it.  Then reboot, use the sg_start -S command to spindown one of the devices and then write a new file to the pool - what happens?

Using 6.9.0-beta25

 

Pretty much as you would expect:



Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2692 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2692 ASC=0x4 ASCQ=0x2
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2692 CDB: opcode=0x8a 8a 00 00 00 00 00 00 6b 64 40 00 00 04 00 00 00
Jul 16 12:21:52 Test kernel: blk_update_request: I/O error, dev sdd, sector 7038016 op 0x1:(WRITE) flags 0x104000 phys_seg 127 prio class 0
Jul 16 12:21:52 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 5332, rd 0, flush 3, corrupt 0, gen 0
Jul 16 12:21:52 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 5333, rd 0, flush 3, corrupt 0, gen 0
Jul 16 12:21:52 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 5334, rd 0, flush 3, corrupt 0, gen 0
Jul 16 12:21:52 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 5335, rd 0, flush 3, corrupt 0, gen 0
Jul 16 12:21:52 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 5336, rd 0, flush 3, corrupt 0, gen 0
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2693 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2693 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2693 ASC=0x4 ASCQ=0x2
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2693 CDB: opcode=0x8a 8a 00 00 00 00 00 00 6b 68 40 00 00 04 00 00 00
Jul 16 12:21:52 Test kernel: blk_update_request: I/O error, dev sdd, sector 7039040 op 0x1:(WRITE) flags 0x104000 phys_seg 128 prio class 0
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2694 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2694 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2694 ASC=0x4 ASCQ=0x2
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2694 CDB: opcode=0x8a 8a 00 00 00 00 00 00 6b 6c 40 00 00 07 80 00 00
Jul 16 12:21:52 Test kernel: blk_update_request: I/O error, dev sdd, sector 7040064 op 0x1:(WRITE) flags 0x104000 phys_seg 126 prio class 0
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2695 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2695 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2695 ASC=0x4 ASCQ=0x2
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2695 CDB: opcode=0x8a 8a 00 00 00 00 00 00 6b 73 c0 00 00 08 00 00 00
Jul 16 12:21:52 Test kernel: blk_update_request: I/O error, dev sdd, sector 7041984 op 0x1:(WRITE) flags 0x100000 phys_seg 119 prio class 0
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2707 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2707 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2707 ASC=0x4 ASCQ=0x2
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2707 CDB: opcode=0x8a 8a 00 00 00 00 00 00 6b 7b c0 00 00 04 00 00 00
Jul 16 12:21:52 Test kernel: blk_update_request: I/O error, dev sdd, sector 7044032 op 0x1:(WRITE) flags 0x104000 phys_seg 118 prio class 0
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2709 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2709 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2709 ASC=0x4 ASCQ=0x2
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2709 CDB: opcode=0x8a 8a 00 00 00 00 00 00 6b 7f c0 00 00 0a 00 00 00
Jul 16 12:21:52 Test kernel: blk_update_request: I/O error, dev sdd, sector 7045056 op 0x1:(WRITE) flags 0x104000 phys_seg 44 prio class 0
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2710 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2710 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2710 ASC=0x4 ASCQ=0x2
Jul 16 12:21:52 Test kernel: sd 11:0:2:0: [sdd] tag#2710 CDB: opcode=0x8a 8a 00 00 00 00 00 00 6b 89 c0 00 00 0a 00 00 00
Jul 16 12:21:52 Test kernel: blk_update_request: I/O error, dev sdd, sector 7047616 op 0x1:(WRITE) flags 0x104000 phys_seg 92 prio class 0
Jul 16 12:21:57 Test kernel: scsi_io_completion_action: 1065 callbacks suppressed
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2921 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2921 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2921 ASC=0x4 ASCQ=0x2
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2921 CDB: opcode=0x8a 8a 00 00 00 00 00 00 70 62 60 00 00 00 60 00 00
Jul 16 12:21:57 Test kernel: print_req_error: 1065 callbacks suppressed
Jul 16 12:21:57 Test kernel: blk_update_request: I/O error, dev sdd, sector 7365216 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Jul 16 12:21:57 Test kernel: btrfs_dev_stat_print_on_error: 2558 callbacks suppressed
Jul 16 12:21:57 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 7893, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2924 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2924 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2924 ASC=0x4 ASCQ=0x2
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2924 CDB: opcode=0x8a 8a 00 00 00 00 00 00 70 62 c0 00 00 00 80 00 00
Jul 16 12:21:57 Test kernel: blk_update_request: I/O error, dev sdd, sector 7365312 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Jul 16 12:21:57 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 7894, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2926 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2926 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2926 ASC=0x4 ASCQ=0x2
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2926 CDB: opcode=0x8a 8a 00 00 00 00 00 00 70 63 40 00 00 00 80 00 00
Jul 16 12:21:57 Test kernel: blk_update_request: I/O error, dev sdd, sector 7365440 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Jul 16 12:21:57 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 7895, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2929 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2929 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2929 ASC=0x4 ASCQ=0x2
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2929 CDB: opcode=0x8a 8a 00 00 00 00 00 00 70 63 c0 00 00 00 80 00 00
Jul 16 12:21:57 Test kernel: blk_update_request: I/O error, dev sdd, sector 7365568 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Jul 16 12:21:57 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 7896, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#3120 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#3120 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#3120 ASC=0x4 ASCQ=0x2
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#3120 CDB: opcode=0x8a 8a 00 00 00 00 00 00 70 64 40 00 00 01 80 00 00
Jul 16 12:21:57 Test kernel: blk_update_request: I/O error, dev sdd, sector 7365696 op 0x1:(WRITE) flags 0x100000 phys_seg 3 prio class 0
Jul 16 12:21:57 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 7897, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:21:57 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 7898, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:21:57 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 7899, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2729 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2729 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2729 ASC=0x4 ASCQ=0x2
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2729 CDB: opcode=0x8a 8a 00 00 00 00 00 00 70 65 c0 00 00 00 80 00 00
Jul 16 12:21:57 Test kernel: blk_update_request: I/O error, dev sdd, sector 7366080 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Jul 16 12:21:57 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 7900, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2732 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2732 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2732 ASC=0x4 ASCQ=0x2
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2732 CDB: opcode=0x8a 8a 00 00 00 00 00 00 70 66 40 00 00 00 80 00 00
Jul 16 12:21:57 Test kernel: blk_update_request: I/O error, dev sdd, sector 7366208 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Jul 16 12:21:57 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 7901, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2735 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2735 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2735 ASC=0x4 ASCQ=0x2
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2735 CDB: opcode=0x8a 8a 00 00 00 00 00 00 70 66 c0 00 00 00 80 00 00
Jul 16 12:21:57 Test kernel: blk_update_request: I/O error, dev sdd, sector 7366336 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Jul 16 12:21:57 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 7902, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2737 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2737 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2737 ASC=0x4 ASCQ=0x2
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2737 CDB: opcode=0x8a 8a 00 00 00 00 00 00 70 67 40 00 00 00 80 00 00
Jul 16 12:21:57 Test kernel: blk_update_request: I/O error, dev sdd, sector 7366464 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2740 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2740 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2740 ASC=0x4 ASCQ=0x2
Jul 16 12:21:57 Test kernel: sd 11:0:2:0: [sdd] tag#2740 CDB: opcode=0x8a 8a 00 00 00 00 00 00 70 67 c0 00 00 00 80 00 00
Jul 16 12:21:57 Test kernel: blk_update_request: I/O error, dev sdd, sector 7366592 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Jul 16 12:22:17 Test kernel: scsi_io_completion_action: 6404 callbacks suppressed
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#1794 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#1794 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#1794 ASC=0x4 ASCQ=0x2
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#1794 CDB: opcode=0x8a 8a 00 00 00 00 00 00 00 08 40 00 00 01 00 00 00
Jul 16 12:22:17 Test kernel: print_req_error: 6404 callbacks suppressed
Jul 16 12:22:17 Test kernel: blk_update_request: I/O error, dev sdd, sector 2112 op 0x1:(WRITE) flags 0x1800 phys_seg 32 prio class 0
Jul 16 12:22:17 Test kernel: btrfs_dev_stat_print_on_error: 6539 callbacks suppressed
Jul 16 12:22:17 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 14442, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:22:17 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 14443, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#1796 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#1796 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#1796 ASC=0x4 ASCQ=0x2
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#1796 CDB: opcode=0x8a 8a 00 00 00 00 00 00 00 09 60 00 00 03 e0 00 00
Jul 16 12:22:17 Test kernel: blk_update_request: I/O error, dev sdd, sector 2400 op 0x1:(WRITE) flags 0x5800 phys_seg 115 prio class 0
Jul 16 12:22:17 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 14444, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:22:17 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 14445, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:22:17 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 14446, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:22:17 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 14447, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:22:17 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 14448, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:22:17 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 14449, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:22:17 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 14450, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:22:17 Test kernel: BTRFS error (device sdb1): bdev /dev/sdd1 errs: wr 14451, rd 2, flush 3, corrupt 0, gen 0
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#1798 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#1798 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#1798 ASC=0x4 ASCQ=0x2
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#1798 CDB: opcode=0x8a 8a 00 00 00 00 00 00 00 0d 40 00 00 00 a0 00 00
Jul 16 12:22:17 Test kernel: blk_update_request: I/O error, dev sdd, sector 3392 op 0x1:(WRITE) flags 0x1800 phys_seg 20 prio class 0
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#820 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#820 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#820 ASC=0x4 ASCQ=0x2
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#820 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
Jul 16 12:22:17 Test kernel: blk_update_request: I/O error, dev sdd, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#829 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#829 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#829 ASC=0x4 ASCQ=0x2
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#829 CDB: opcode=0x8a 8a 00 00 00 00 00 20 00 00 40 00 00 00 08 00 00
Jul 16 12:22:17 Test kernel: blk_update_request: I/O error, dev sdd, sector 536870976 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
Jul 16 12:22:17 Test kernel: BTRFS warning (device sdb1): lost page write due to IO error on /dev/sdd1 (-5)
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#828 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#828 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#828 ASC=0x4 ASCQ=0x2
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#828 CDB: opcode=0x8a 8a 00 00 00 00 00 00 02 00 40 00 00 00 08 00 00
Jul 16 12:22:17 Test kernel: blk_update_request: I/O error, dev sdd, sector 131136 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
Jul 16 12:22:17 Test kernel: BTRFS warning (device sdb1): lost page write due to IO error on /dev/sdd1 (-5)
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#826 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#826 Sense Key : 0x2 [current] [descriptor]
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#826 ASC=0x4 ASCQ=0x2
Jul 16 12:22:17 Test kernel: sd 11:0:2:0: [sdd] tag#826 CDB: opcode=0x8a 8a 08 00 00 00 00 00 00 00 c0 00 00 00 08 00 00
Jul 16 12:22:17 Test kernel: blk_update_request: I/O error, dev sdd, sector 192 op 0x1:(WRITE) flags 0x23800 phys_seg 1 prio class 0
Jul 16 12:22:17 Test kernel: BTRFS warning (device sdb1): lost page write due to IO error on /dev/sdd1 (-5)
Jul 16 12:22:17 Test kernel: BTRFS error (device sdb1): error writing primary super block to device 2

The drive does not spin up, but the pool is intact and functioning. 
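
For anyone reading the log above: Sense Key 0x2 is NOT READY, and ASC=0x4 ASCQ=0x2 is the SCSI "logical unit not ready, initializing command required" condition, which fits the explanation that the drive is waiting for an explicit start command. Opcode 0x8a is WRITE(16) and 0x35 is SYNCHRONIZE CACHE(10). Below is a small sketch that decodes these fields; the function names are made up, and only the codes that appear in this log are covered.

```shell
# decode_sense KEY ASC ASCQ: name the sense data seen in the log above.
# Only the codes from this thread are handled; everything else is left undecoded.
decode_sense() {
    case "$1/$2/$3" in
        0x2/0x4/0x2) echo "NOT READY: logical unit not ready, initializing command required" ;;
        0x2/*)       echo "NOT READY (other cause)" ;;
        *)           echo "not decoded by this sketch" ;;
    esac
}

# write16_fields B0 B1 ... B15: print the starting LBA and block count of a
# WRITE(16) CDB given as 16 hex bytes (as printed after "CDB: opcode=0x8a").
write16_fields() {
    # Bytes 2-9 hold the LBA and bytes 10-13 the transfer length, both big-endian.
    lba=$(printf '%d' "0x$3$4$5$6$7$8$9${10}")
    len=$(printf '%d' "0x${11}${12}${13}${14}")
    echo "lba=$lba blocks=$len"
}
```

Feeding it the first CDB from the log, write16_fields 8a 00 00 00 00 00 00 6b 64 40 00 00 04 00 00 00 prints lba=7038016 blocks=1024, matching the "sector 7038016" in the blk_update_request line that follows it.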

 

On 7/14/2020 at 8:45 AM, limetech said:

Sorry to say, this is a show-stopper.  When a SCSI device goes into standby, it has to be told explicitly to "wake up" (spin up), unlike ATA, which auto-spins up HDDs if a read/write command is received.  Dealing with this behavior would take a serious amount of coding.

 

Another experiment if someone wants to try:  Create a 2-device btrfs pool and copy some files to it.  Then reboot, use the sg_start -S command to spindown one of the devices and then write a new file to the pool - what happens?

Just got to do some experimenting. Findings --

 

1. Concur. When a SAS drive is actually spun down, I/O directed at it won't spin it up - it needs to be explicitly spun up.

2. On my system I see a difference I can't explain between

sg_start -S /dev/sdX

and

sg_start -rS /dev/sdX

In the first case, immediately after I issue the stop command, my log shows:

Jul 16 22:39:35 Tower kernel: sd 4:0:4:0: [sdk] Spinning up disk...
Jul 16 22:39:47 Tower kernel: ............ready
Jul 16 22:39:47 Tower kernel: sdk: sdk1

This happens every time, and immediately after the stop command. Obviously, after that, i/o succeeds with no issue (since the drive is spun up - not sure who does that).

 

Conversely, when I issue the second stop command (the one with the -r flag), the drive stays spun down. Then, as already reported, all i/o attempts indeed fail with i/o error, until the start command is issued. 

 

Not sure what is causing the difference.

31 minutes ago, doron said:

Not sure what is causing the difference.

The man page for sg_start includes this note:

 

      -r, --readonly
              open  the DEVICE in read-only mode. Maybe required in Linux to stop a nuisance spin-up if the DEVICE is an ATA disk. The nuisance spin-up may occur at the end of this command negating the effect of the --stop option.

 

Besides "DEVICE is an ATA disk", there may be other causes.  🤷‍♂️

 

But the no-spin-up on demand, as mentioned earlier, is a show-stopper.  Not sure how to deal with that.

On 7/16/2020 at 4:36 PM, limetech said:

The man page for sg_start includes this note:

 

      -r, --readonly
              open  the DEVICE in read-only mode. Maybe required in Linux to stop a nuisance spin-up if the DEVICE is an ATA disk. The nuisance spin-up may occur at the end of this command negating the effect of the --stop option.

 

Besides "DEVICE is an ATA disk", there may be other causes.  🤷‍♂️

 

But the no-spin-up on demand, as mentioned earlier, is a show-stopper.  Not sure how to deal with that.

One interesting thing I'll add is that spinning down SAS drives on FreeNAS does result in them spinning back up on access. This could be a difference in the way FreeBSD or ZFS handles the drives, but I thought it might be worth mentioning. 

