Two disks disabled, need clarification on proper procedure to get them back online.

April 21

Hi all,

So, briefly, I had people doing work in my basement, and someone slid a table over, hitting the latches on two Supermicro sleds. The drives slid forward far enough to break their connection.

Unraid 7.2.4, Supermicro 846, BPN-SAS2-846EL1, dual parity.

Last night I noticed two drives showing grey balls that would not spin up. The log showed the following:

Syslog

Apr 20 18:04:11 spock kernel: sd 11:0:1:0: device_block, handle(0x000b)

Apr 20 18:04:13 spock kernel: sd 11:0:1:0: device_unblock and setting to running, handle(0x000b)

Apr 20 18:04:13 spock kernel: sd 11:0:3:0: device_block, handle(0x000d)

Apr 20 18:04:13 spock kernel: sd 11:0:1:0: [sdg] tag#2754 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=DRIVER_OK cmd_age=1s

Apr 20 18:04:13 spock kernel: sd 11:0:1:0: [sdg] tag#2754 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00

Apr 20 18:04:13 spock kernel: sd 11:0:1:0: [sdg] tag#2755 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=DRIVER_OK cmd_age=0s

Apr 20 18:04:13 spock kernel: sd 11:0:1:0: [sdg] tag#2755 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00

Apr 20 18:04:13 spock kernel: sd 11:0:1:0: [sdg] Synchronizing SCSI cache

Apr 20 18:04:13 spock kernel: sd 11:0:1:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=DRIVER_OK

Apr 20 18:04:13 spock kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x500304800143178e)

Apr 20 18:04:13 spock kernel: mpt2sas_cm0: removing handle(0x000b), sas_addr(0x500304800143178e)

Apr 20 18:04:13 spock kernel: mpt2sas_cm0: enclosure logical id(0x50030480014317bf), slot(2)

Apr 20 18:04:15 spock kernel: sd 11:0:3:0: device_unblock and setting to running, handle(0x000d)

Apr 20 18:04:15 spock kernel: sd 11:0:3:0: [sdi] tag#2772 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=DRIVER_OK cmd_age=1s

Apr 20 18:04:15 spock kernel: sd 11:0:3:0: [sdi] tag#2772 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00

Apr 20 18:04:15 spock kernel: sd 11:0:3:0: [sdi] tag#2773 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=DRIVER_OK cmd_age=0s

Apr 20 18:04:15 spock kernel: sd 11:0:3:0: [sdi] tag#2773 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00

Apr 20 18:04:15 spock emhttpd: read SMART /dev/sdg

Apr 20 18:04:15 spock emhttpd: read SMART /dev/sdi

Apr 20 18:04:15 spock kernel: sd 11:0:3:0: [sdi] Synchronizing SCSI cache

Apr 20 18:04:15 spock kernel: sd 11:0:3:0: [sdi] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=DRIVER_OK

Apr 20 18:04:15 spock kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x5003048001431790)

Apr 20 18:04:15 spock kernel: mpt2sas_cm0: removing handle(0x000d), sas_addr(0x5003048001431790)

Apr 20 18:04:15 spock kernel: mpt2sas_cm0: enclosure logical id(0x50030480014317bf), slot(4)

Apr 20 18:34:13 spock emhttpd: spinning down /dev/sdg

Apr 20 18:34:13 spock emhttpd: spinning down /dev/sdi

Apr 20 18:34:13 spock emhttpd: sdspin /dev/sdg down: 2

Apr 20 18:34:13 spock emhttpd: sdspin /dev/sdi down: 2

Apr 21 01:26:39 spock webgui: Successful login user root from 192.168.1.20

Apr 21 01:27:25 spock emhttpd: offline: WDC_WD140EDGZ-11B1PA0_XXXXXXXX (sdg) 512 27344764928

Apr 21 01:29:14 spock emhttpd: offline: WDC_WUH721414ALE604_XXXXXXXX (sdi) 512 27344764928
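For anyone reading a similar log, the removal events can be summarized programmatically. The following is a minimal Python sketch, not an Unraid tool: it assumes the syslog line format shown above, and the names `SAMPLE_LOG`, `removed_slots`, and `offline_disks` are hypothetical. It pairs each `removing handle(...)` line with the `slot(...)` line that follows it, and multiplies the two trailing numbers on each `emhttpd: offline:` line, which are assumed here to be sector size and sector count.

```python
import re

# A few lines copied from the syslog above, used as sample input.
SAMPLE_LOG = """\
Apr 20 18:04:13 spock kernel: mpt2sas_cm0: removing handle(0x000b), sas_addr(0x500304800143178e)
Apr 20 18:04:13 spock kernel: mpt2sas_cm0: enclosure logical id(0x50030480014317bf), slot(2)
Apr 20 18:04:15 spock kernel: mpt2sas_cm0: removing handle(0x000d), sas_addr(0x5003048001431790)
Apr 20 18:04:15 spock kernel: mpt2sas_cm0: enclosure logical id(0x50030480014317bf), slot(4)
Apr 21 01:27:25 spock emhttpd: offline: WDC_WD140EDGZ-11B1PA0_XXXXXXXX (sdg) 512 27344764928
Apr 21 01:29:14 spock emhttpd: offline: WDC_WUH721414ALE604_XXXXXXXX (sdi) 512 27344764928
"""

REMOVE_RE = re.compile(r"removing handle\((0x[0-9a-f]+)\), sas_addr\((0x[0-9a-f]+)\)")
SLOT_RE = re.compile(r"enclosure logical id\(0x[0-9a-f]+\), slot\((\d+)\)")
OFFLINE_RE = re.compile(r"emhttpd: offline: (\S+) \((\w+)\) (\d+) (\d+)")


def removed_slots(log):
    """Return one dict per device removal, with handle, SAS address and slot."""
    events = []
    pending = None  # (handle, sas_addr) waiting for its slot line
    for line in log.splitlines():
        match = REMOVE_RE.search(line)
        if match:
            pending = match.groups()
            continue
        match = SLOT_RE.search(line)
        if match and pending:
            events.append(
                {"handle": pending[0], "sas_addr": pending[1], "slot": int(match.group(1))}
            )
            pending = None
    return events


def offline_disks(log):
    """Map device name to model string and size in bytes.

    Assumes the two trailing numbers are sector size and sector count.
    """
    disks = {}
    for match in OFFLINE_RE.finditer(log):
        model, dev, sector_size, sectors = match.groups()
        disks[dev] = {"model": model, "bytes": int(sector_size) * int(sectors)}
    return disks


if __name__ == "__main__":
    for event in removed_slots(SAMPLE_LOG):
        print(event)
    for dev, info in offline_disks(SAMPLE_LOG).items():
        print(dev, info["model"], f"{info['bytes'] / 1e12:.2f} TB")
```

The enclosure slot numbers printed here come straight from the backplane messages; they are not the same thing as Unraid's disk numbers, so they only help locate the physical bays.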

Once I discovered the physical issue, I shut the server down and reinserted the disks. After coming back online, the disks are disabled and being emulated. I am currently running a long S.M.A.R.T. test on both drives, but I expect them to be fine. There should not have been any write operations between when it happened and my discovery of the physical cause this morning, and I disabled Mover immediately when I noticed the situation. However, I cannot be 100% certain.

What is the proper procedure here to get the drives back online into the array? Do I stop and start the array again? Or do I unassign those drives first, start/stop, and then reassign? Will a data rebuild be necessary? I've never encountered this situation before.

Any help is appreciated. Thanks for reading.

April 22

Community Expert

Attach Diagnostics ZIP to your NEXT post in this thread.

April 22

Author

@trurl I didn't think it relevant, as the cause is known: the drives were severed from their SATA connections. I'll attach both diagnostics, one from before taking the server down to reinsert the drives and the other as it is now.

I'm trying to figure out the proper procedure to get the array back online. The long S.M.A.R.T. tests are about 40% through at this point. As I said, I expect the drives will come back fine.

Thanks!

spock-diagnostics-20260421-0146.zip spock-diagnostics-20260421-2248.zip

April 22

Community Expert
Solution

Since both emulated disks are mounting, and assuming the contents look correct, the recommendation would be to rebuild on top.

April 22

Community Expert

https://docs.unraid.net/unraid-os/using-unraid-to/manage-storage/array/replacing-disks-in-array/#re-enabling-a-disabled-disk-rebuilding-onto-itself

April 22

Community Expert

10 hours ago, spall said:
didn't think it relevant

We were missing this important info

6 hours ago, JorgeB said:
both emulated disks are mounting

Didn't want you to rebuild an unmountable filesystem.

April 22

Author

@trurl @JorgeB Great. I appreciate the help.

A few questions since I'm still waiting for the long S.M.A.R.T. to finish:

1) Do I unassign/reassign both drives at the same time? Or do I rebuild one and then the other?

2) Since I keep two spares, would it make more sense to swap in the spares for the rebuilds? I'm assuming I could mount the existing drives via Unassigned Devices or in another server and still get to the data. They could become my spares after rebuild.

3) I am in the unfortunate spot of having drain tile work in my basement starting Friday morning. I don't know that I can pull off a 14TB rebuild in that amount of time. Am I better off shutting the server down until the work finishes? Or can I stop/resume a rebuild?

EDIT: I should add that for question 3, the basement work is going to require me tarping over and wrapping my server rack, so they need to go offline.

Thanks again, guys.

Edited April 22 by spall

April 22

Community Expert

I didn't examine SMART for any other disks since you have so many. Do any show SMART warning (👎) on the DASHBOARD page?

Rebuild will start over if you restart, so probably better if you just shut down and wait.

You can rebuild both at the same time. Rebuilding to spares does give other options in case of problems.

My home office is undergoing some remodel currently, so I have moved my server to another room where I already have ethernet. Don't know if that is an option for you.

April 22

Author

@trurl Everything is thumbs up on the dashboard, including the two disabled disks. One of the disabled disks, disk5/sdi, has finished the extended test and is all good except for 1 UDMA CRC error, which I believe was a previous error. Disk13/sdg is 90% done with the test.

Unfortunately, moving is not an option for me. I have a spinal injury and cannot muscle the 4U upstairs even if I remove all the disks. I'm going to have to wrap and tarp the whole rack.

Ok. Sounds like I'll probably rebuild to my spares, but I'll shut everything down when this test finishes and wait until the work is done in the basement. I'm not sure how long it will take to rebuild 2x14TB drives.

Thank you.

April 22

Author

The second one just finished the extended test. All good on that drive, as well.

I have to wrap the rack in about 30 hours. I have a feeling that would be cutting it real close if I started rebuild now, unfortunately.
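A back-of-the-envelope estimate supports that caution. The sketch below is illustrative only: it takes the sector count reported in the syslog (27344764928 sectors of 512 bytes per drive), assumes both disks rebuild in a single simultaneous pass so the time is set by one disk's size, and tries a few hypothetical average throughputs. The function name `rebuild_hours` and the speed values are assumptions, not measurements from this server.

```python
SECTOR_SIZE = 512            # bytes, from the emhttpd "offline" lines
SECTORS = 27_344_764_928     # sector count per drive, from the same lines


def rebuild_hours(total_bytes, avg_mb_per_s):
    """Hours needed to write total_bytes at a constant average rate in MB/s."""
    seconds = total_bytes / (avg_mb_per_s * 1_000_000)
    return seconds / 3600


disk_bytes = SECTOR_SIZE * SECTORS

# Hypothetical average speeds; real rebuilds slow down toward the inner tracks.
for speed in (100, 150, 200):
    print(f"{speed} MB/s -> {rebuild_hours(disk_bytes, speed):.1f} h")
```

At an assumed 150 MB/s average this works out to roughly 26 hours, which leaves little margin inside a 30-hour window, and a slower average would overrun it.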

Edited April 22 by spall

May 8

Author

@JorgeB @trurl My server rack was powered down longer than I anticipated. Anyway, dual rebuild to two new drives worked like a charm. Thanks for the help! Marking as solved.


Solved by JorgeB
