New drive won't stay connected

Luke787 · July 23, 2017

I have been running the trial version of UnRaid on my R710 for a few weeks and have been pretty happy so far. However, I bought a brand-new 2 TB drive on Prime Day (Seagate ST2000LX001). I was going to add it to my array of two 4 TB disks, but things aren't going as smoothly as they did with the previous drives. At first, when I tried to do anything with the drive, it disappeared from Unassigned Devices. I switched the slot it was in, just in case that was the issue, and it appeared.

I ran a preclear, but afterwards I was unable to use the drive. Unraid wouldn't let me run a SMART test because the drive had to be spun up, and it wouldn't spin up; everything was greyed out. After a reset, it wouldn't even show up. I ended up removing the drive, formatting it outside of the Unraid box, and reinserting it before it would appear. I then formatted it in Unraid and added it to the array. I thought everything was fine, but after a couple of hours the drive went missing and Unraid wouldn't recognize it again. I have tried drive slots in both SAS bays.

I have been fighting this battle for days. After a format, the drive shows up almost every time, only to disappear when I try to perform any action such as adding it to the array and starting a rebuild. The drive is recognized in the BIOS and shows up in UnRaid system devices. The SMART results seem fine, with no pending or reallocated sectors, and the drive has passed every test I have thrown at it when unraid would let me. I have also tested the drive multiple times on other computers without issue.

I looked at the system log and found this around the time that it disappeared after trying to do a rebuild on it:

Jul 23 13:52:22 R710 emhttp: err: ckmbr: read: Input/output error
Jul 23 13:52:22 R710 emhttp: ckmbr error: -1
Jul 23 13:52:22 R710 kernel: sd 2:0:5:0: Device offlined - not ready after error recovery
Jul 23 13:52:22 R710 kernel: blk_update_request: I/O error, dev sde, sector 0

Any help would be greatly appreciated.

r710-diagnostics-20170723-1432.zip

r710-syslog-20170723-1434.zip

JorgeB · July 23, 2017

Disk is not responding and it's being dropped:

Jul 23 13:33:40 R710 kernel: not responding...
Jul 23 13:33:40 R710 kernel: sd 2:0:2:0: attempting device reset! scmd(ffff8801a8c6a400)

...

Jul 23 13:33:40 R710 kernel: sd 2:0:2:0: Device offlined - not ready after error recovery
Jul 23 13:33:40 R710 kernel: sd 2:0:2:0: rejecting I/O to offline device

Try replacing the cables or backplane. If the problem persists, it's probably a bad disk.
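
For anyone who wants to check their own syslog for the same pattern, here is a minimal Python sketch that counts "Device offlined" kernel messages per SCSI address. The function name `count_offline_events` and the regular expression are illustrative assumptions based only on the log lines quoted in this thread; they are not part of Unraid, and other kernels or controllers may word these messages differently.

```python
import re
from collections import Counter

# Matches kernel lines of the form quoted in this thread, e.g.
#   "... kernel: sd 2:0:2:0: Device offlined - not ready after error recovery"
# The pattern is an assumption derived from those lines only.
OFFLINE_RE = re.compile(r"kernel: sd (\d+:\d+:\d+:\d+): Device offlined")


def count_offline_events(lines):
    """Count offline events per SCSI address (host:channel:target:lun)."""
    counts = Counter()
    for line in lines:
        match = OFFLINE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input copied from the log excerpts posted above.
sample = [
    "Jul 23 13:33:40 R710 kernel: sd 2:0:2:0: attempting device reset! scmd(ffff8801a8c6a400)",
    "Jul 23 13:33:40 R710 kernel: sd 2:0:2:0: Device offlined - not ready after error recovery",
    "Jul 23 13:33:40 R710 kernel: sd 2:0:2:0: rejecting I/O to offline device",
    "Jul 23 13:52:22 R710 kernel: sd 2:0:5:0: Device offlined - not ready after error recovery",
]

for address, n in sorted(count_offline_events(sample).items()):
    print(f"{address}: offlined {n} time(s)")
```

To run it against a real log, pass an open file object instead of `sample`. If the same drive is dropped at several addresses after being moved between slots, that points away from any single slot.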

Luke787 · July 23, 2017

The cables and backplane are shared with the other drives, which work fine. If the disk were bad, wouldn't it have similar issues with other OSes?

JorgeB · July 24, 2017

It's a hardware problem, either the cable/backplane or a bad disk. Only you can do the testing to confirm.

Edited July 24, 2017 by johnnie.black

SSD · July 24, 2017

These types of issues are usually caused by a bad or loose SATA cable. Even having one a tiny bit askew can cause intermittent problems. The OS will attempt to reset the link, which is sometimes successful and sometimes not. Rebooting often results in the drive being active again, only to succumb to the same series of events.

I'd carefully examine the cabling - both ends.

I very much recommend hot-swap style drive cages. Once the cabling is burned in, you don't have these types of issues every time you add or replace disks.

Luke787 · July 30, 2017

I have since tried switching the cables and swapping the FireCuda drive for another FireCuda drive. The same issue persisted. I switched to a different hard drive and had absolutely no issues. I'm not sure if it's a firmware or driver issue or what at this point...

ndemou · July 3, 2018

I dropped by to note that I have very similar issues with this HDD model on a Dell Server with a PERC R330 controller running CentOS 7. I've tried four different drives, and all of them get marked as faulty almost every time I power cycle the server. They were all working _mostly_ fine on a Gigabyte Mobo, and they've been extensively tested using good old badblocks, SMART long tests and Seagate's utilities. I say "mostly" because every few days I was getting "ata error...hard resetting link" in the logs, but the users of the server never complained.

My conclusions:

Some DELL controllers don't work with this model (ST2000LX001)
This model _may_ have one or more quirks.
