Disk Disabled!


HKR


Hello,

 

Lately I have been having a problem with one of my disks: it keeps getting disabled. A few hours ago it was disabled, so I stopped the array, restarted it, and the data rebuild completed just fine. Then, after a few hours, the same thing happened and the disk is disabled again.

 

Upon checking the SMART status it shows everything as fine for that disk.

 

Any clues as to what could be causing this?

 

This is the disk log:

 

Dec 2 13:33:53 Tower kernel: sd 1:0:3:0: [sdi] 7814037168 512-byte logical blocks: (4.00 TB/3.63 TiB)
Dec 2 13:33:53 Tower kernel: sd 1:0:3:0: [sdi] 4096-byte physical blocks
Dec 2 13:33:53 Tower kernel: sd 1:0:3:0: [sdi] Write Protect is off
Dec 2 13:33:53 Tower kernel: sd 1:0:3:0: [sdi] Mode Sense: 00 3a 00 00
Dec 2 13:33:53 Tower kernel: sd 1:0:3:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 2 13:33:53 Tower kernel: sdi: sdi1
Dec 2 13:33:53 Tower kernel: sd 1:0:3:0: [sdi] Attached SCSI disk
Dec 2 13:34:07 Tower emhttp: WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT (sdi) 3907018584
Dec 2 13:34:07 Tower kernel: md: import disk5: [8,128] (sdi) WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT size: 3907018532
Dec 2 13:34:08 Tower emhttp: WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT (sdi) 3907018584
Dec 2 13:34:08 Tower kernel: md: import disk5: [8,128] (sdi) WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT size: 3907018532
Dec 2 13:34:08 Tower emhttp: WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT (sdi) 3907018584
Dec 2 13:34:08 Tower kernel: md: import disk5: [8,128] (sdi) WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT size: 3907018532
Dec 2 13:34:12 Tower emhttp: WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT (sdi) 3907018584
Dec 2 13:34:12 Tower kernel: md: import disk5: [8,128] (sdi) WDC_WD40EZRX-00SPEB0_WD-WCC4E4XV65XT size: 3907018532
Dec 2 13:34:12 Tower emhttp: shcmd (29): /usr/local/sbin/set_ncq sdi 1 &> /dev/null
Dec 2 18:13:29 Tower kernel: sd 1:0:3:0: [sdi] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x06
Dec 2 18:13:29 Tower kernel: sd 1:0:3:0: [sdi] tag#1 CDB: opcode=0x88 88 00 00 00 00 01 78 0f 67 a8 00 00 04 00 00 00
Dec 2 18:13:29 Tower kernel: blk_update_request: I/O error, dev sdi, sector 6309242792
Dec 2 18:13:29 Tower kernel: sd 1:0:3:0: [sdi] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Dec 2 18:13:29 Tower kernel: sd 1:0:3:0: [sdi] tag#0 CDB: opcode=0x88 88 00 00 00 00 01 78 0f 6b a8 00 00 04 00 00 00
Dec 2 18:13:29 Tower kernel: blk_update_request: I/O error, dev sdi, sector 6309243816
Dec 2 18:13:29 Tower kernel: sd 1:0:3:0: [sdi] Read Capacity(16) failed: Result: hostbyte=0x04 driverbyte=0x00
Dec 2 18:13:29 Tower kernel: sd 1:0:3:0: [sdi] Sense not available.
Dec 2 18:13:29 Tower kernel: sd 1:0:3:0: [sdi] Read Capacity(10) failed: Result: hostbyte=0x04 driverbyte=0x00
Dec 2 18:13:29 Tower kernel: sd 1:0:3:0: [sdi] Sense not available.
Dec 2 18:13:29 Tower kernel: sdi: detected capacity change from 4000787030016 to 0
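
For anyone reading the log above, the CDB lines can be decoded by hand. Opcode 0x88 is the SCSI READ(16) command, which carries an 8-byte big-endian starting LBA in bytes 2-9 and a 4-byte block count in bytes 10-13. Below is a minimal Python sketch (the function name `decode_read16` is made up for illustration) showing that the two failed commands line up with the sectors reported by blk_update_request, and that the capacity in the last line is just the block count from the first line times 512:

```python
def decode_read16(cdb_hex: str) -> tuple[int, int]:
    """Return (starting LBA, block count) from a 16-byte READ(16) CDB."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: starting LBA
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, blocks


# The two CDBs copied from the log lines above.
first = decode_read16("88 00 00 00 00 01 78 0f 67 a8 00 00 04 00 00 00")
second = decode_read16("88 00 00 00 00 01 78 0f 6b a8 00 00 04 00 00 00")

# Each read starts at the sector named in the matching I/O error line.
assert first == (6309242792, 1024)
assert second == (6309243816, 1024)

# 7814037168 logical blocks of 512 bytes is the capacity the kernel
# reported just before it changed to 0.
assert 7814037168 * 512 == 4000787030016
```

So both failures were ordinary 1024-block reads that never completed, after which the kernel could no longer read the drive's capacity at all. That pattern alone does not say whether the drive, the cable, or the controller is at fault.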

Link to comment

When I checked just now, there are actually 3 disks showing one SMART error or another. One of them says:

Command Timed out: 1

 

Another says:

Reported Uncorrect: 1

 

I am trying to get the complete log, but when I click the log button it opens a new window and nothing shows up in it. It just keeps saying "waiting for xxx.xxx.xxx.xxx".

 

The page does not load.

 

 

Link to comment

I connected the disk in question to my PC and ran a couple of HDD test apps, and they all showed the HDD as failed. I have attached screenshots of the results.

 

I guess I have to RMA the drive now.

 

My question now: I have stopped the array and shut down the server. When I connect this disk to my PC, it is detected and shows up in the 'Disk Management' app, but no partition or data is shown for it under 'My Computer', so I currently have no way to back up the data on this drive (just to be on the safe side; I know unRAID will rebuild the data when the new disk is installed). Is there any way I can still back up the data?

 

- If I want to keep using the unRAID server while I wait for the replacement disk to arrive, can I do so, or will it cause any data loss?

 

- If a parity check is run without this disk installed, will it cause any data loss?

 

- Should I just select 'No disk' and start the array?

 

Screenshots:

http://i67.tinypic.com/2j4ofu0.png

http://i67.tinypic.com/6y08qr.png

Thanks

Link to comment

It is impossible to run a parity check with a disabled or missing disk. See the wiki for how parity works; once you understand it, a lot of these sorts of questions answer themselves.

 

If you start the array, whether without the disk, with it disabled, or with it unselected, unRAID should emulate it. This means you can read the disk's data from the parity array even without the disk. That is how it is able to rebuild it.

 

However, none of the other disks have parity protection until that disk is rebuilt.

 

Do you have backups of anything you consider irreplaceable?

Link to comment

I have about half the data from the damaged disk backed up. Since you mentioned that a parity check cannot be run with a damaged disk, does that mean the data I recently copied to that disk (around 1.5TB) is unsafe? No parity check has been run since I copied it.

 

Also, will the data get rebuilt after inserting the new disk or not?

Link to comment

*Update*

 

I just bought a new disk to replace the old damaged one. Should I just plug it into the slot of the old one and let the data rebuild?

 

Once again, to my question from the previous post: will all the data be rebuilt even if a parity check was not run after moving new data onto the array?

Link to comment


Once again, to my question from the previous post: will all the data be rebuilt even if a parity check was not run after moving new data onto the array?

Yes - as long as all the other disks plus parity are error free then the replacement disk will include the new data after the rebuild. 

 

When you write data to the array with a disk disabled, the parity is updated to what it would be if the disk HAD been present and successfully written to. This is done by first reading all the other data disks and the parity disk to calculate what the contents of the disabled disk "should be", and then making the appropriate changes to the parity disk to reflect what should be on the disabled disk after the new change. The need to involve all the other disks in this process is why the array can tolerate only a single faulty disk at a time without losing data.

 

A side-effect of this is that when you write to the array with a disk disabled you will find that all the other drives are spun up as they are now all involved in the steps required for a successful write.
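
To make the steps above concrete, here is a rough Python sketch of how single-parity emulation works in principle. It assumes plain XOR parity across the data disks, and the function names (`xor_blocks`, `emulate_missing`, `parity_after_write`) are invented for illustration; this is not unRAID's actual code, just the arithmetic behind it.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def emulate_missing(parity, surviving):
    """Work out what the disabled disk "should" contain from parity plus
    every other data disk (which is why they all have to spin up)."""
    return xor_blocks([parity, *surviving])


def parity_after_write(parity, surviving, new_data):
    """Parity after a write to the disabled disk: remove the old emulated
    contents from parity and fold in the new contents."""
    old_data = emulate_missing(parity, surviving)
    return xor_blocks([parity, old_data, new_data])


# Toy array: two healthy disks, one disabled disk, two bytes per "disk".
disk1, disk2, disk3 = b"\x01\x02", b"\x10\x20", b"\xaa\xbb"
parity = xor_blocks([disk1, disk2, disk3])

# Reading the disabled disk gives back its original contents.
assert emulate_missing(parity, [disk1, disk2]) == disk3

# Writing to the disabled disk only changes parity, yet the new data
# can still be read back (and later rebuilt onto a replacement disk).
new_parity = parity_after_write(parity, [disk1, disk2], b"\x0f\xf0")
assert emulate_missing(new_parity, [disk1, disk2]) == b"\x0f\xf0"
```

If a second disk were unreadable at the same time, `emulate_missing` would have one unknown too many, which is the single-fault limit described above.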

Link to comment

OK, thanks a lot. From the screenshots of the disk errors I attached 2 posts above, can you tell the nature of the error? Is it a bad sector?

 

A bit off-topic: I am one of those members facing the slow parity check issue with the SAS2LP controllers. I read somewhere on the forums about a plugin called tunables. Would this really help me as well?

Link to comment

OK, thanks a lot. From the screenshots of the disk errors I attached 2 posts above, can you tell the nature of the error? Is it a bad sector?

I do not think you can ever easily tell why a modern disk has problems.  If it was just a bad sector, then it should be remapped by the firmware to one of the spare sectors and marked as reallocated. I always just assume that if the manufacturer's diagnostic software reports problems then a disk needs replacing.

 

A bit off-topic: I am one of those members facing the slow parity check issue with the SAS2LP controllers. I read somewhere on the forums about a plugin called tunables. Would this really help me as well?
Have you done any speed tests with the latest releases?  The speed issue is now thought to be largely resolved.

 

There is a tunables utility available in the User Customizations section of the forum, but it is designed to be run from the command line and not as a plugin. Whether the tunable values it suggests give any useful performance gain seems to vary with each user's exact system.

Link to comment

Regarding the initial issue, I had the same problem last week. It ended up being a bad electrical connector to the drive. I had never had one fail before. I had swapped SATA connectors, SATA ports, etc., and then my backup drive had the same failure. So, if SMART is good but the disk keeps getting disabled, check your other hardware.

Link to comment
