Luke787 Posted July 23, 2017

I have been running the trial version of UnRaid on my R710 for a few weeks and have been pretty happy so far. However, I bought a brand new 2 TB drive on Prime Day (Seagate ST2000LX001). I was going to add it to my array of two 4 TB disks, but things aren't working out as smoothly as they did with the previous drives.

At first, when I went to perform anything on the drive, it disappeared from Unassigned Devices. I switched the slot it was in, just in case that was the issue, and it appeared. I ran a preclear, then afterwards was unable to use it. Unraid wouldn't let me run a SMART test because the drive had to be spun up, but it wouldn't spin up. Everything was greyed out. After a reset, it wouldn't even show up. I ended up removing the drive, formatting it outside of the Unraid box, and reinserting it before it would show up again. I then formatted it in Unraid and added it to the array. I thought everything was fine, but after a couple of hours the drive went missing. Unraid won't recognize it again.

I have tried drive slots from both SAS bays. I have been fighting this battle for days. After a format the drive shows up almost every time, only to disappear when I try to perform any action such as adding it to the array and starting a rebuild. The drive is recognized in the BIOS and shows up in UnRaid's system devices. The SMART results seem fine, with no pending or reallocated sectors, and the drive passed every test I threw at it when Unraid would let me. I have also tested the drive multiple times on other computers without issue.

I looked at the system log and found this around the time the drive disappeared after I tried to start a rebuild on it:

Jul 23 13:52:22 R710 emhttp: err: ckmbr: read: Input/output error
Jul 23 13:52:22 R710 emhttp: ckmbr error: -1
Jul 23 13:52:22 R710 kernel: sd 2:0:5:0: Device offlined - not ready after error recovery
Jul 23 13:52:22 R710 kernel: blk_update_request: I/O error, dev sde, sector 0

Any help would be greatly appreciated.
r710-diagnostics-20170723-1432.zip r710-syslog-20170723-1434.zip
JorgeB Posted July 23, 2017

Disk is not responding and it's being dropped:

Jul 23 13:33:40 R710 kernel: not responding...
Jul 23 13:33:40 R710 kernel: sd 2:0:2:0: attempting device reset! scmd(ffff8801a8c6a400)
...
Jul 23 13:33:40 R710 kernel: sd 2:0:2:0: Device offlined - not ready after error recovery
Jul 23 13:33:40 R710 kernel: sd 2:0:2:0: rejecting I/O to offline device

Try replacing cables or backplane; if it's the same, then it's probably a bad disk.
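For anyone who wants to check their own syslog for the same pattern, here is a minimal Python sketch. It only looks for the two kernel messages quoted in this thread ("Device offlined" and "blk_update_request: I/O error"); the function name `summarize` and the regular expressions are illustrative assumptions, not part of Unraid or any tool, and other kernel versions may word these messages differently.

```python
import re
from collections import defaultdict

# Patterns modeled on the kernel messages quoted in this thread.
# They are assumptions about the log wording, not an exhaustive list.
OFFLINED = re.compile(r"kernel: sd (\d+:\d+:\d+:\d+): Device offlined")
IO_ERROR = re.compile(r"kernel: blk_update_request: I/O error, dev (\w+), sector (\d+)")


def summarize(lines):
    """Count offlining events per SCSI address and I/O errors per block device."""
    offlined = defaultdict(int)
    io_errors = defaultdict(int)
    for line in lines:
        match = OFFLINED.search(line)
        if match:
            offlined[match.group(1)] += 1
            continue
        match = IO_ERROR.search(line)
        if match:
            io_errors[match.group(1)] += 1
    return dict(offlined), dict(io_errors)


# Sample lines copied from the posts above.
sample = [
    "Jul 23 13:33:40 R710 kernel: sd 2:0:2:0: attempting device reset! scmd(ffff8801a8c6a400)",
    "Jul 23 13:33:40 R710 kernel: sd 2:0:2:0: Device offlined - not ready after error recovery",
    "Jul 23 13:33:40 R710 kernel: sd 2:0:2:0: rejecting I/O to offline device",
    "Jul 23 13:52:22 R710 kernel: sd 2:0:5:0: Device offlined - not ready after error recovery",
    "Jul 23 13:52:22 R710 kernel: blk_update_request: I/O error, dev sde, sector 0",
]

offlined, io_errors = summarize(sample)
print("Offlined devices:", offlined)
print("I/O errors:", io_errors)
```

To run it against a real log, pass an open file object (for example, the syslog extracted from a diagnostics zip) to `summarize` instead of the sample list. Seeing the same SCSI address offlined repeatedly while the other addresses stay clean would point at that one drive or its slot rather than at the whole backplane.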
Luke787 Posted July 23, 2017

The cables and backplane are shared with the other drives, which work fine. If the disk were bad, wouldn't it have similar issues under other OSes?
JorgeB Posted July 24, 2017

It's a hardware problem, either cable/backplane or a bad disk; only you can do the testing to confirm.

Edited July 24, 2017 by johnnie.black
SSD Posted July 24, 2017

These types of issues are usually caused by a bad or loose SATA cable. Even having one a tiny bit askew can cause intermittent problems. The OS will attempt to reset the link, which is sometimes successful and sometimes not. Rebooting often results in the drive being active again, only to succumb to the same series of events. I'd carefully examine the cabling at both ends.

I very much recommend hot-swap style drive cages. Once the cabling is burned in, you don't have these types of issues every time you add or replace disks.
Luke787 Posted July 30, 2017

I have since tried swapping the cables and replacing the FireCuda drive with another FireCuda drive. The same issue persisted. I then switched to a different hard drive and had absolutely no issues. I'm not sure at this point whether it's a firmware issue, a driver issue, or something else...
ndemou Posted July 3, 2018

I dropped by to note that I have very similar issues with this HDD model on a Dell server with a PERC R330 controller running CentOS 7. I've tried four different drives, and all of them get marked as faulty almost every time I power cycle the server. They were all working _mostly_ fine on a Gigabyte motherboard, and they've been extensively tested using good old badblocks, SMART long tests and Seagate's utilities. I say "mostly" because every few days I was getting "ata error...hard resetting link" in the logs, but the users of the server never complained.

My conclusions:

- Some Dell controllers don't work with this model (ST2000LX001).
- This model _may_ have one or more quirks.