HKR Posted December 2, 2015
Hello,
Lately I have been having a problem with one of my disks: it keeps getting disabled. A few hours ago it was disabled, so I stopped the array and restarted it, and the data rebuild completed just fine. A few hours later the same thing happened, and the disk is disabled again. The SMART status shows everything as fine for that disk. Any clues as to what could be causing this?
This is the disk log:
Dec 2 13:33:53 Tower kernel: sd 1:0:3:0: [sdi] 7814037168 512-byte logical blocks: (4.00 TB/3.63 TiB)
Dec 2 13:33:53 Tower kernel: sd 1:0:3:0: [sdi] 4096-byte physical blocks
Dec 2 13:33:53 Tower kernel: sd 1:0:3:0: [sdi] Write Protect is off
Dec 2 13:33:53 Tower kernel: sd 1:0:3:0: [sdi] Mode Sense: 00 3a 00 00
Dec 2 13:33:53 Tower kernel: sd 1:0:3:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 2 13:33:53 Tower kernel: sdi: sdi1
Dec 2 13:33:53 Tower kernel: sd 1:0:3:0: [sdi] Attached SCSI disk
Dec 2 13:34:07 Tower emhttp: WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT (sdi) 3907018584
Dec 2 13:34:07 Tower kernel: md: import disk5: [8,128] (sdi) WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT size: 3907018532
Dec 2 13:34:08 Tower emhttp: WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT (sdi) 3907018584
Dec 2 13:34:08 Tower kernel: md: import disk5: [8,128] (sdi) WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT size: 3907018532
Dec 2 13:34:08 Tower emhttp: WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT (sdi) 3907018584
Dec 2 13:34:08 Tower kernel: md: import disk5: [8,128] (sdi) WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT size: 3907018532
Dec 2 13:34:12 Tower emhttp: WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT (sdi) 3907018584
Dec 2 13:34:12 Tower kernel: md: import disk5: [8,128] (sdi) WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT size: 3907018532
Dec 2 13:34:12 Tower emhttp: shcmd (29): /usr/local/sbin/set_ncq sdi 1 &> /dev/null
Dec 2 18:13:29 Tower kernel: sd 1:0:3:0: [sdi] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x06
Dec 2 18:13:29 Tower kernel: sd 1:0:3:0: [sdi] tag#1 CDB: opcode=0x88 88 00 00 00 00 01 78 0f 67 a8 00 00 04 00 00 00
Dec 2 18:13:29 Tower kernel: blk_update_request: I/O error, dev sdi, sector 6309242792
Dec 2 18:13:29 Tower kernel: sd 1:0:3:0: [sdi] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Dec 2 18:13:29 Tower kernel: sd 1:0:3:0: [sdi] tag#0 CDB: opcode=0x88 88 00 00 00 00 01 78 0f 6b a8 00 00 04 00 00 00
Dec 2 18:13:29 Tower kernel: blk_update_request: I/O error, dev sdi, sector 6309243816
Dec 2 18:13:29 Tower kernel: sd 1:0:3:0: [sdi] Read Capacity(16) failed: Result: hostbyte=0x04 driverbyte=0x00
Dec 2 18:13:29 Tower kernel: sd 1:0:3:0: [sdi] Sense not available.
Dec 2 18:13:29 Tower kernel: sd 1:0:3:0: [sdi] Read Capacity(10) failed: Result: hostbyte=0x04 driverbyte=0x00
Dec 2 18:13:29 Tower kernel: sd 1:0:3:0: [sdi] Sense not available.
Dec 2 18:13:29 Tower kernel: sdi: detected capacity change from 4000787030016 to 0
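The two failed commands in the log can be decoded by hand. Below is a minimal Python sketch, assuming the standard READ(16) layout (opcode 0x88, an 8-byte big-endian LBA in bytes 2-9, a 4-byte transfer length in bytes 10-13). The function name decode_read16 is made up for illustration and is not part of any unRAID or kernel tool.

```python
def decode_read16(cdb_hex: str) -> dict:
    """Decode the LBA and block count from a READ(16) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    return {
        # Bytes 2..9: starting logical block address, big-endian.
        "lba": int.from_bytes(cdb[2:10], "big"),
        # Bytes 10..13: number of logical blocks to transfer, big-endian.
        "blocks": int.from_bytes(cdb[10:14], "big"),
    }


# CDBs copied from the two "tag#" lines in the log.
first = decode_read16("88 00 00 00 00 01 78 0f 67 a8 00 00 04 00 00 00")
second = decode_read16("88 00 00 00 00 01 78 0f 6b a8 00 00 04 00 00 00")

# Capacity the kernel reported at attach time: block count times block size.
capacity_bytes = 7814037168 * 512

print(first)
print(second)
print(capacity_bytes)
```

The decoded LBAs match the sector numbers in the two blk_update_request lines, and the computed capacity matches the value the kernel reports just before it drops to 0. In other words, two consecutive 1024-block reads failed and the device then stopped answering capacity queries. That pattern alone does not say whether the drive, the cable, the power feed, or the controller is at fault.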
Squid Posted December 2, 2015
You really should post the complete diagnostics rather than just snippets. But if SMART looks good, you are down to either an HBA, cabling, or power issue.
HKR Posted December 2, 2015
When I checked just now, there were actually 3 disks showing one SMART error or another. One of them says Command Timed out: 1; another says Reported Uncorrect: 1.
I am trying to get the complete log, but when I click the log button it opens a new window and nothing shows up in it. It just keeps saying "waiting for xxx.xxx.xxx.xxx" and the page does not load.
Squid Posted December 2, 2015
Tools / Diagnostics, then post the file.
HKR Posted December 2, 2015
Attached is the requested file.
tower-diagnostics-20151202-2001.zip
HKR Posted December 3, 2015
I connected the disk in question to my PC and ran a couple of HDD test apps, and they all showed the HDD as failed. I have attached the screenshots. I guess I have to RMA the drive now.
I have stopped the array and shut down the server. When I connect this disk to my PC, it is detected and shows up under the 'Disk Management' app, but no actual partition or data content is shown for it under 'My Computer', so I currently have no way to back up the data on this drive. (This would only be to be on the safe side; I know unRAID will rebuild the data when the new disk is installed.) My questions are:
- Is there any way I can still back up the data?
- If I want to continue using the unRAID server while I wait for the replacement disk to arrive, can I do so, or will it cause any data loss?
- If a parity check is run without this disk installed, will it cause any data loss?
- Should I just select 'No disk' and start the array?
Screenshots:
http://i67.tinypic.com/2j4ofu0.png
http://i67.tinypic.com/6y08qr.png
Thanks
trurl Posted December 3, 2015
It is impossible to run a parity check with a disabled or missing disk. See the wiki for how parity works; once you understand it, a lot of these sorts of questions answer themselves.
If you start the array, whether without the disk, with it disabled, or with it unselected, unRAID should emulate it. This means you can read the disk's data from the parity array even without the disk. That is how unRAID is able to rebuild it. However, none of the other disks have parity protection until that disk is rebuilt.
Do you have backups of anything you consider irreplaceable?
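The emulation described above can be illustrated with a toy model. This is a minimal Python sketch, assuming single parity computed as the byte-wise XOR of all data disks; the three tiny "disks" and the helper name xor_blocks are invented for illustration and are not unRAID code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR several equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy data disks, three bytes each (illustrative values only).
disk1 = bytes([0x12, 0x34, 0x56])
disk2 = bytes([0xAB, 0xCD, 0xEF])
disk3 = bytes([0x0F, 0xF0, 0x55])

# Parity is the XOR of every data disk.
parity = xor_blocks([disk1, disk2, disk3])

# Pretend disk2 is disabled: its contents are recovered from the
# surviving data disks plus parity.
emulated_disk2 = xor_blocks([disk1, disk3, parity])

print(emulated_disk2 == disk2)
```

Because recovering the missing disk uses every remaining disk plus parity, a second failure while one disk is already emulated leaves nothing to recover from. That is the sense in which the other disks have no parity protection until the rebuild finishes.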
HKR Posted December 4, 2015
I have about half the data from the damaged disk backed up. Since you mentioned that a parity check cannot be run on a damaged disk, does that mean the data I recently copied to that disk (around 1.5 TB) is unsafe? A parity check has not been run since I copied it.
Also, will the data get rebuilt after I insert the new disk or not?
HKR Posted December 4, 2015
*Update* I just bought a new disk to replace the damaged one. Should I just plug it into the slot of the old one and let the data rebuild?
Once again, to repeat my question from the previous post: will the complete data be rebuilt even if a parity check was not run after moving new data onto the array?
itimpi Posted December 4, 2015
Quote: "*Update* I just bought a new disk to replace the damaged one. Should I just plug it into the slot of the old one and let the data rebuild? Once again, to repeat my question from the previous post: will the complete data be rebuilt even if a parity check was not run after moving new data onto the array?"
Yes - as long as all the other disks plus parity are error free, the replacement disk will include the new data after the rebuild.
When you write data to the array with a disk disabled, the parity is updated to what it would be if the disk HAD been present and successfully written to. This is done by first reading all the other data disks and the parity disk to calculate what the contents of the disabled disk "should be", and then making the appropriate changes to the parity disk to reflect what should be on the disabled disk after the new change. The need to involve all the other disks in this process is why you can only have a single faulty disk at a time without losing data.
A side effect is that when you write to the array with a disk disabled, all the other drives are spun up, as they are now all involved in the steps required for a successful write.
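The write path described above can be sketched in the same toy XOR model. This is a hedged illustration in Python, assuming single XOR parity; the function names (xor_bytes, emulate, write_to_disabled_disk) and the sample bytes are hypothetical and do not come from unRAID itself.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def emulate(other_disks, parity: bytes) -> bytes:
    """Work out what the disabled disk 'should' contain right now."""
    result = parity
    for block in other_disks:
        result = xor_bytes(result, block)
    return result


def write_to_disabled_disk(other_disks, parity: bytes, new_data: bytes) -> bytes:
    """Return the parity as it would be had new_data reached the disabled disk."""
    old_data = emulate(other_disks, parity)
    # Remove the old contents from parity, then fold in the new contents.
    return xor_bytes(xor_bytes(parity, old_data), new_data)


# Two healthy toy disks and the (now disabled) third disk's original contents.
healthy = [bytes([0x12, 0x34]), bytes([0xAB, 0xCD])]
original_third = bytes([0x0F, 0xF0])
parity = emulate(healthy, original_third)  # same as XOR of all three disks

# Write new data "to" the disabled disk; only parity actually changes.
new_data = bytes([0xDE, 0xAD])
parity = write_to_disabled_disk(healthy, parity, new_data)

# A later rebuild onto a replacement disk recovers the new data.
rebuilt = emulate(healthy, parity)
print(rebuilt == new_data)
```

In this model the data written while the disk was disabled exists only in the relationship between parity and the healthy disks, which is why the rebuild depends on all of them being error free.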
HKR Posted December 4, 2015
Ok, thanks a lot. From the screenshots of the disk errors I attached 2 posts above, can you understand the nature of the error? Is it a bad sector?
A bit off-topic: I am one of those members who are facing the slow parity check issue with the SAS2LP controllers. I read somewhere on the forums about a plugin called tunables. Would this really help me as well?
itimpi Posted December 4, 2015
Quote: "Ok, thanks a lot. From the screenshots of the disk errors I attached 2 posts above, can you understand the nature of the error? Is it a bad sector?"
I do not think you can ever easily tell why a modern disk has problems. If it was just a bad sector, it should be remapped by the firmware to one of the spare sectors and marked as reallocated. I always just assume that if the manufacturer's diagnostic software reports problems then the disk needs replacing.
Quote: "A bit off-topic: I am one of those members who are facing the slow parity check issue with the SAS2LP controllers. I read somewhere on the forums about a plugin called tunables. Would this really help me as well?"
Have you done any speed tests with the latest releases? The speed issue is now thought to be largely resolved. There is a tunables utility available in the User Customizations section of the forum, but it is designed to be run from the command line and not as a plugin. Whether the tunable values it suggests give any useful performance gain seems to vary with each user's exact system.
skyhawk Posted December 4, 2015
Regarding the initial issue, I had the same problem last week. It ended up being a bad power connector to the drive. I had never had one fail before. I had swapped SATA connectors, SATA ports, etc., and then my backup drive had the same failure. So, if SMART is good but the disk keeps getting disabled, check your other hardware.
HKR Posted December 4, 2015
I just noticed that in the 'Dashboard', the SMART status of 4 disks shows a "Command Timeout: 1" error. What could be causing this?