HELP: DISK ERROR - DRIVE DISABLED



Hi smart folks

 

I'm a dumb medical professional, requesting you people's help!

 

Unraid 6.9.2

 

Setup: 3 x Seagate 6 TB drives + 1 x WD 4 TB drive + a cache SSD. One 6 TB drive is the parity drive. The disabled drive is "Disk3", the most recent drive that I added to the server.

 

Woke up this morning and one 6 TB drive is "disabled" and in "emulation mode". I can still read the files on that drive, and I'm assuming "emulation mode" means that's thanks to the parity drive. I don't know enough about any of this, so please bear with my ignorance. I looked around some of the posts, and here's what I did.

 

The disk "error log" is posted below. The diagnostics are also posted below. This happened once before, months ago, and at that time I inadvertently rebooted the system BEFORE collecting the diagnostics file. This time I didn't (thanks to you guys reminding me). Back then the "parity backup" basically kicked in and the disk was reconstructed, and we could never figure out why it happened.

 

What do I do now? Has the disk gone bad? This is a fairly NEW drive, so I'm going to be pissed if it's a disk hardware error.

 

Many thanks in advance!  

 

Snag_1f300378.thumb.png.cd1d8ca4700d5fa9c311cfc002c40eec.png

Snag_1f304052.thumb.png.9b87b52176b175235fea385311ac851f.png

klingon-diagnostics-20210620-0924.zip

Link to comment
2 hours ago, ptcadoc said:

Thank you so much for replying.  What do you suggest I do?  Pull out the cables and reconnect?

Thanks in advance

There is not much I can think of other than powering off, checking the cable, rebooting to check that the drive comes back online, and then rebuilding the drive. You should post the diagnostics after the reboot so that we can check the SMART information for the drive. You might also want to consider running an extended SMART test on the drive.

I was hoping someone else would have a ‘flash of insight’ as to what went wrong, because not knowing that means it could easily happen again.

Link to comment

Thank you.  Will wait for other inputs and if nothing else is posted, do as you suggested.

 

Link to comment

It's a failure of the SATA link.

 

Jun 19 22:43:30 Klingon kernel: ata8: limiting SATA link speed to 3.0 Gbps
Jun 19 22:43:30 Klingon kernel: ata8: hard resetting link
Jun 19 22:43:35 Klingon kernel: ata8: softreset failed (1st FIS failed)
Jun 19 22:43:35 Klingon kernel: ata8: reset failed, giving up

 

It's most likely a cable/connector problem because plugs and sockets are the least reliable link in the controller - cable - drive electronics chain. The cable is fortunately also the cheapest part to replace.
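
If you want to check a saved copy of the syslog for the same pattern yourself, here is a minimal Python sketch. It only looks for the message fragments seen in the excerpt above; the names `ATA_LINE`, `LINK_TROUBLE` and `find_link_trouble` are made up for illustration, and a real log may contain other link-related messages that this does not catch.

```python
import re
from collections import defaultdict

# Matches kernel ATA messages such as "Klingon kernel: ata8: hard resetting link".
# The optional ".NN" part covers device-level lines like "ata8.00: ...".
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")

# Message fragments taken from the log excerpt in this thread (not an exhaustive list).
LINK_TROUBLE = (
    "limiting SATA link speed",
    "hard resetting link",
    "softreset failed",
    "reset failed, giving up",
)


def find_link_trouble(lines):
    """Group link-related kernel messages by ATA port name."""
    hits = defaultdict(list)
    for line in lines:
        match = ATA_LINE.search(line.rstrip())
        if match and any(marker in match.group(2) for marker in LINK_TROUBLE):
            hits[match.group(1)].append(match.group(2))
    return dict(hits)


# The four lines quoted above, used here as sample input.
sample = """\
Jun 19 22:43:30 Klingon kernel: ata8: limiting SATA link speed to 3.0 Gbps
Jun 19 22:43:30 Klingon kernel: ata8: hard resetting link
Jun 19 22:43:35 Klingon kernel: ata8: softreset failed (1st FIS failed)
Jun 19 22:43:35 Klingon kernel: ata8: reset failed, giving up
""".splitlines()

report = find_link_trouble(sample)
for port, messages in report.items():
    print(port, len(messages), "link-related messages")
```

A port that keeps showing up in this kind of report after a cable swap points at the port, the power feed or the drive rather than the cable, but the SMART data is still needed to judge the drive itself.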

 

Link to comment

OK, thank you. So you're suggesting the same thing: power down, reseat the cables and see what happens?

Link to comment

Power down, reseat (or preferably replace) the SATA cable, and check the power cable to the drive while you're there. Then power up and grab new diagnostics, which will give an indication of the health of the drive; as @itimpi pointed out, that is currently unknown. The disk will remain disabled and emulated until you intervene manually, but don't do that until the state of the drive is known.

Link to comment

OK, thank you. Will do and report back. Thank you very much for the input, guys.

Link to comment
On 6/21/2021 at 2:55 AM, John_M said:

Power down, reseat (or preferably replace) the SATA cable and check the power cable to the drive while you're there, power up and grab new diagnostics, which will give an indication of the health of the drive, which is currently unknown, as @itimpi pointed out. The disk will remain disabled and emulated and requires manual intervention but not until the state of the drive is known.

 

 

Dear Smart People

 

OK, I went out and bought a new SATA cable for the drive that seemed to be on holiday. Did a shutdown, replaced the cable and then turned it on. Got this message immediately (see below) and then, as suggested, ran a second diagnostic run (attached). The drive is still 'disabled'. Will be grateful for suggestions.

 

image.png.cb982781d443ef0e0434e1bca69b7108.png

klingon-diagnostics-20210623-1519.zip

Link to comment
5 hours ago, JorgeB said:

 

 

 

Thank you, that's great to know. Here's what happened:

 

1. I stopped the array, set Disk 3 to "unassigned", started the array, stopped it again, and then went back to Disk 3 and reassigned it to the disk that was there before. See image:

  image.png.ff4b29b0708941e2d72947c43d42b29d.png

 

The system gave out a few error messages (which I believe are expected) and the "rebuild" started. 

 

image.png.04f7f5be9eef2279ed7eb0e864c0a65b.png

image.png.7aacdc6a45bda4fc4e52801cdba5d8a1.png

 

Pretty soon after that, a series of error messages flashed up and the "rebuild" was PAUSED:

image.png.7f9fc7d9d3135b3c1f77d232fb5c783f.png

 

I stopped the array again and went to Disk 3, but that disk was no longer listed.

 

Then I turned the array off, powered down, moved the drive to a different SATA port and rebooted. Turned the array off and, lo and behold, the missing disk was back. Reassigned it to Disk 3, turned the array on and got some confirmatory messages (see below).

 

I took another set of diagnostics (attached). The array "rebuild" has now restarted and has been running for the last 4 hours.

image.png.dd9ee1d2c8e6dc4935b32f6c245ef19a.png

What is going on? Is my SATA card bad? We can be sure that the disk is not dying, right?

 

Really REALLY appreciate the input and help

 

image.png

klingon-diagnostics-20210623-1552.zip

Link to comment
41 minutes ago, JorgeB said:

Disk dropped offline again. This is usually a connection/power problem; if you already replaced the SATA cable, replace the power cable.

 

 

Thank you for the reply.  It's "rebuilding" right now (at 49%) - I'm guessing I should wait and see, let it complete (if it does) then change the power cable?  Or interrupt the process now?  

 

Thank you 

Link to comment

Meanwhile, can I ask another stupid question? Do recurrent "rebuild" events shorten the life of the hard drive? I'm referring (I guess) to repeated long writes to the disk since, as far as I understand, the rebuild process rewrites the entire drive from the backup copy (parity). This is the second time this has happened.
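
For context on what "emulated" and "rebuild" mean here: with single parity, each parity byte is the XOR of the corresponding bytes on all the data disks, so a missing disk's contents can be recomputed from parity plus the remaining data disks. The sketch below shows that idea on toy 4-byte "disks". It is an illustration of the XOR principle only, not Unraid's actual code, and `xor_blocks` and the disk variables are hypothetical names.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy contents of three data disks (real disks are just much longer).
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x01\x02\x03\x04"
disk3 = b"\xaa\xbb\xcc\xdd"

# The parity disk holds the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# If disk3 drops out, its contents follow from parity and the surviving disks.
# Emulation does this on the fly for reads; a rebuild does it for every
# position and writes the result back to the physical disk.
rebuilt_disk3 = xor_blocks([parity, disk1, disk2])

print(rebuilt_disk3 == disk3)
```

This is why a rebuild reads every other disk in full and writes the rebuilt disk from start to finish.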

Link to comment
