Failed Controller, Failed Disk, New USB enclosure

dmangus · August 29, 2019

OK gurus, here is my dilemma. I have had disks failing left and right. I replaced disks, replaced SATA cables, and then finally the system would not boot because it hangs on the controller. Before that I had replaced the failed drive and started a rebuild, only for it to hang on constant read errors from disk 3. Disk 2 is the failed drive.

Anyway, I decided I needed a new controller and a better enclosure, so I went with a USB 3.0 eight-disk enclosure. The problem is I need to get the system to recognize the new configuration with the disks coming off USB. I know I can use the "New Config" tool, but from what I read, once that is done I will not be able to rebuild the failed disk 2. Is this so?

I read somewhere that I can keep the assignments, as well as flag the parity disk as good parity. If so, why would I not be able to rebuild disk 2?

Are there any workarounds for this, or am I just going to have to face the fact that disk 2 and its data are lost?

JonathanM · August 29, 2019

7 minutes ago, dmangus said:

so I went with a USB 3.0 eight-disk enclosure

USB, while it may work, is extremely problematic for array drives. It's just not a robust enough protocol for the demanding environment of multi-disk RAID, or Unraid.

Did you originally have all your drives directly connected to SATA ports?

Describe your hardware in much more detail, and maybe we can find a solution that will work.

dmangus · August 29, 2019

Yeah, I have a Dell OptiPlex 9020 as my server, and I can't remember the add-on SCSI controller. I believe it's a Marvell chipset though, 4 port. The server has been running for a good 5 years. 5 drives: parity, 3x data, and cache. The last 3 months have been hell with failed drives. I basically had to restore files once, as 2 drives failed at one point, all with high CRC errors. It started to become super coincidental that the same drive numbers would fail. It wasn't until the actual card went kaput that I knew for sure it was the PCIe SCSI controller.

I have a 4-bay disk enclosure, for which I have to run the power from the workstation out the back, as well as the SATA cables. So since I have to go with something different, I might as well go with an enclosure with its own power and 8 bays, so I can go double parity. Going through the whole process of restoring files from discovered chunks on the disk was hellish, and I didn't want to go through that again.

I figured I would give USB 3.0 a try, as otherwise I would have to find a good HBA/controller, go through flashing it, etc. I just wanted to keep it simple, I guess. Plus I would still need to find an enclosure with its own power that can support 5+ drives.

Nonetheless, it wouldn't change where I am now. I still need to get a new configuration and somehow rebuild disk 2 off the current parity if I can.

JorgeB · August 29, 2019

4 hours ago, dmangus said:

I still need to get a new configuration and somehow rebuild disk 2 off the current parity if I can.

You can do that with the invalid slot command, but I need more details, like which Unraid release you are running and whether you are using single or dual parity.

dmangus · August 29, 2019

Just now, johnnie.black said:

You can do that with the invalid slot command, but I need more details, like which Unraid release you are running and whether you are using single or dual parity.

Hey Ti-Ti, I'm running 6.6.7 with a single parity drive. I know exactly which drive is parity and which are data drives 1, 2, and 3. Drive 2 is a brand-new drive which needs to be rebuilt. Any help would be greatly appreciated.

😇

JorgeB · August 29, 2019

Assuming parity is valid you can do this:

-Tools -> New Config
-Assign all disks, including the new disk2
-Important - After checking the assignments leave the browser on that page, the "Main" page.

-Open an SSH session/use the console and type (don't copy/paste directly from the forum, as sometimes it can insert extra characters):

mdcmd set invalidslot 2 29

-Back on the GUI and without refreshing the page, just start the array; do not check the "parity is already valid" box. The GUI will still show that data on the parity disk(s) will be overwritten. This is normal, as it doesn't account for the invalid slot command, but they won't be overwritten as long as the procedure was done correctly. Disk2 will start rebuilding. The disk should mount immediately, but if it's unmountable don't format it; wait for the rebuild to finish and then run a filesystem check.
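
For anyone wondering why disk2 can be rebuilt from the parity disk plus the remaining data disks: with single parity, each parity byte is the XOR of the corresponding bytes on every data disk, so any one missing disk can be recomputed from the others. The following is only a toy Python sketch of that idea, with made-up four-byte "disks" and a hypothetical helper named `xor_blocks`; it is not how Unraid itself is implemented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Made-up contents for three tiny data disks (illustration only).
disk1 = bytes([0x11, 0x22, 0x33, 0x44])
disk2 = bytes([0xAA, 0xBB, 0xCC, 0xDD])
disk3 = bytes([0x01, 0x02, 0x03, 0x04])

# Single parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# "Lose" disk2, then recompute it from parity and the surviving disks.
rebuilt_disk2 = xor_blocks([parity, disk1, disk3])
print(rebuilt_disk2 == disk2)
```

The same arithmetic shows why the rebuild needs every other disk to read cleanly: a read error on disk 3 leaves the matching bytes of disk2 unrecoverable.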

dmangus · August 29, 2019

Hmm, sounds like a plan. I will let you know how it turns out. Thanks for the direction.

dmangus · August 31, 2019

Sigh... well, I did the new config and started to assign drives. When I assigned drive 3, USB device 2, it vanished from the list. I was like, wtf! OK, so I thought maybe it spun down or something. I rebooted and now no USB disks are found, only my cache drive, which is on the onboard SATA controller.

tower-syslog-20190831-0029.zip

Scratching my head here. The device I have is a Mediasonic-H82-SU3S2-ProBox.

JorgeB · August 31, 2019

USB enclosures are not recommended for Unraid. One of the many reasons is that they have a tendency to drop devices, though yours is showing a constant problem. Disks are detected correctly, e.g.:

Aug 30 23:55:10 Tower kernel: scsi 7:0:0:0: Direct-Access     ST4000DM 004-2CV104       0125 PQ: 0 ANSI: 6
Aug 30 23:55:10 Tower kernel: sd 7:0:0:0: Attached scsi generic sg2 type 0
Aug 30 23:55:10 Tower kernel: sd 7:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
Aug 30 23:55:10 Tower kernel: sd 7:0:0:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Aug 30 23:55:10 Tower kernel: sd 7:0:0:0: [sdc] 4096-byte physical blocks
Aug 30 23:55:10 Tower kernel: sd 7:0:0:0: [sdc] Write Protect is off
Aug 30 23:55:10 Tower kernel: sd 7:0:0:0: [sdc] Mode Sense: 67 00 10 08
Aug 30 23:55:10 Tower kernel: sd 7:0:0:0: [sdc] No Caching mode page found
Aug 30 23:55:10 Tower kernel: sd 7:0:0:0: [sdc] Assuming drive cache: write through

Then there's this error:

Aug 30 23:55:10 Tower kernel: usb 4-6: Device not responding to setup address.
Aug 30 23:55:10 Tower kernel: usb 4-6: Device not responding to setup address.
Aug 30 23:55:11 Tower kernel: usb 4-6: device not accepting address 4, error -71
Aug 30 23:55:15 Tower kernel: usb usb4-port6: Cannot enable. Maybe the USB cable is bad?
Aug 30 23:55:19 Tower kernel: usb usb4-port6: Cannot enable. Maybe the USB cable is bad?
Aug 30 23:55:23 Tower kernel: usb usb4-port6: Cannot enable. Maybe the USB cable is bad?
Aug 30 23:55:23 Tower kernel: usb 4-6: USB disconnect, device number 4

And all disks drop out and capacity can't be read:

Aug 30 23:55:23 Tower kernel: sd 7:0:0:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
Aug 30 23:55:23 Tower kernel: sd 7:0:0:0: [sdc] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 20 00 00 00 08 00 00
Aug 30 23:55:23 Tower kernel: print_req_error: I/O error, dev sdc, sector 32
Aug 30 23:55:23 Tower kernel: Buffer I/O error on dev sdc, logical block 4, async page read
Aug 30 23:55:23 Tower kernel: sd 7:0:0:0: [sdc] Read Capacity(16) failed: Result: hostbyte=0x01 driverbyte=0x00
Aug 30 23:55:23 Tower kernel: sd 7:0:0:0: [sdc] Sense not available.
Aug 30 23:55:23 Tower kernel: sd 7:0:0:0: [sdc] Read Capacity(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Aug 30 23:55:23 Tower kernel: sd 7:0:0:0: [sdc] Sense not available.
Aug 30 23:55:23 Tower kernel: sd 7:0:0:0: [sdc] 0 512-byte logical blocks: (0 B/0 B)
Aug 30 23:55:23 Tower kernel: sd 7:0:0:0: [sdc] 4096-byte physical blocks
Aug 30 23:55:23 Tower kernel: sd 7:0:0:0: [sdc] Attached SCSI disk

Then the process starts again and keeps repeating, for all disks.
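
To see how often a port goes through this cycle, the USB errors in a saved syslog can be tallied with a few lines of Python. This is a rough sketch, not an Unraid tool: the function name `count_usb_errors` and the regular expression are my own, the pattern only covers root ports written like `4-6` or `usb4-port6`, and the sample is just the error lines quoted above.

```python
import re
from collections import Counter

# Matches both "usb 4-6: ..." and "usb usb4-port6: ..." kernel messages.
USB_ERROR = re.compile(
    r"kernel: usb (?:usb)?(?P<bus>\d+)-(?:port)?(?P<port>\d+): "
    r"(?P<msg>Device not responding to setup address"
    r"|device not accepting address"
    r"|Cannot enable"
    r"|USB disconnect)"
)


def count_usb_errors(lines):
    """Count USB error messages per (bus-port, message) pair."""
    counts = Counter()
    for line in lines:
        match = USB_ERROR.search(line)
        if match:
            key = (f"{match['bus']}-{match['port']}", match["msg"])
            counts[key] += 1
    return counts


SAMPLE = """\
Aug 30 23:55:10 Tower kernel: usb 4-6: Device not responding to setup address.
Aug 30 23:55:10 Tower kernel: usb 4-6: Device not responding to setup address.
Aug 30 23:55:11 Tower kernel: usb 4-6: device not accepting address 4, error -71
Aug 30 23:55:15 Tower kernel: usb usb4-port6: Cannot enable. Maybe the USB cable is bad?
Aug 30 23:55:19 Tower kernel: usb usb4-port6: Cannot enable. Maybe the USB cable is bad?
Aug 30 23:55:23 Tower kernel: usb usb4-port6: Cannot enable. Maybe the USB cable is bad?
Aug 30 23:55:23 Tower kernel: usb 4-6: USB disconnect, device number 4
""".splitlines()

counts = count_usb_errors(SAMPLE)
for (port, msg), n in sorted(counts.items()):
    print(port, msg, n)
```

To run it against a real log, pass the lines of the extracted syslog file instead of `SAMPLE`.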

Edited August 31, 2019 by johnnie.black

dmangus · September 2, 2019

Great, story of my life, thanks. I guess I have no choice but to find a SATA controller and a decent enclosure.

My current USB enclosure also supports eSATA. How about this?

Edited September 2, 2019 by dmangus

itimpi · September 2, 2019

1 hour ago, dmangus said:

Great, story of my life, thanks. I guess I have no choice but to find a SATA controller and a decent enclosure.

My current USB enclosure also supports eSATA. How about this?

It is definitely worth trying eSATA as that is more likely to function reliably from a connection viewpoint than USB.

dmangus · September 2, 2019

OK, I just put in a quick purchase for a SYBA SI-PEX40076 2 eSATA III from Amazon. I know it's not on the list of compatible or incompatible devices, but it's low profile and relatively cheap. If anyone has any suggestions, please feel free to let me know too. I think I can always send this back. Not sure about my enclosure.

Kevek79 · September 2, 2019

Didn't you start this whole adventure because of a non-working Marvell controller?

So why replace it with another one of those?

JonathanM · September 2, 2019

That card uses a Marvell controller, NOT recommended. I think Amazon B006SF68OS would be a better pick.

dmangus · September 9, 2019

On 9/2/2019 at 3:49 PM, Kevek79 said:

Didn't you start this whole adventure because of a non-working Marvell controller?

So why replace it with another one of those?

Yeah, right. Well, it didn't give problems until it completely started to die, so I see that as a manufacturing issue, not a compatibility issue. Anyway, that controller didn't come through; Amazon ran out of it. I picked up an I/O Crest 2 Port SATA III and 2 Port eSATA card. Yes, another Marvell chipset.

New issue though. All the drives were picked up this time except one. I am getting the following error for this one drive.

Sep  9 11:50:06 Tower kernel: ata7.02: READ LOG DMA EXT failed, trying PIO
Sep  9 11:50:06 Tower kernel: ata7.02: failed to read log page 10h (errno=-5)
Sep  9 11:50:06 Tower kernel: ata7.02: exception Emask 0x1 SAct 0x800 SErr 0x0 action 0x0
Sep  9 11:50:06 Tower kernel: ata7.02: configured for UDMA/133 (device error ignored)

As soon as I assign the drive to disk 3, it vanishes. Any ideas?

tower-syslog-20190909-1151.zip

JorgeB · September 9, 2019

The enclosure uses a SATA port multiplier, and those are not recommended, though some combinations work. IMHO, if you need external, get SAS.
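
One hint of the port multiplier is already in the log lines quoted earlier: the kernel names the device `ata7.02`, and as far as I understand libata's naming, the part after the dot is the link number behind the port, which stays `00` for a drive attached directly to a SATA port. The Python sketch below just pulls those identifiers out of a log. The helper name `ata_devices` and the "non-zero suffix suggests a multiplier" reading are my assumptions, and on old PATA hosts a `.01` suffix simply means the slave device.

```python
import re

# Matches identifiers such as "ata7.02:" in kernel log lines.
ATA_ID = re.compile(r"\bata(?P<port>\d+)\.(?P<link>\d{2}):")


def ata_devices(lines):
    """Return the set of (port, link) pairs mentioned in the log lines."""
    found = set()
    for line in lines:
        match = ATA_ID.search(line)
        if match:
            found.add((int(match["port"]), int(match["link"])))
    return found


SAMPLE = """\
Sep  9 11:50:06 Tower kernel: ata7.02: READ LOG DMA EXT failed, trying PIO
Sep  9 11:50:06 Tower kernel: ata7.02: failed to read log page 10h (errno=-5)
Sep  9 11:50:06 Tower kernel: ata7.02: exception Emask 0x1 SAct 0x800 SErr 0x0 action 0x0
Sep  9 11:50:06 Tower kernel: ata7.02: configured for UDMA/133 (device error ignored)
""".splitlines()

devices = ata_devices(SAMPLE)
# Heuristic only: a non-zero link number suggests a port multiplier.
suspects = sorted(d for d in devices if d[1] != 0)
print(suspects)
```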

dmangus · December 14, 2019

Total update: replaced the controller, replaced cables, replaced hard drives, updated Unraid. I still get failed drives, even with brand-new hard drives right out of the wrapper. I am pulling out my hair now. Even after a successful preclear, drives then get disabled during parity sync. Here are my diagnostics. Any suggestions?

tower-diagnostics-20191214-0932.zip

JorgeB · December 15, 2019

Parity disk dropped offline so there's no SMART. Check connections and post new diags; you might as well replace/swap cables with another disk to rule them out.

dmangus · January 6, 2020

Do you think I need to flash the new controller? It's an LSI Logic SAS 9207-8i Storage Controller LSI00301. If so, can anyone point me to where I can get what I need to do so?

JorgeB · January 6, 2020

It would be recommended, you can get the package from Broadcom's support site.
