Failed Drive - check Logs Please


howiser1


Hi All,

 

Well, I joined the failed-drive club this week. I've read numerous posts on how to proceed, so I'll post my server logs / diagnostics and ask if someone with more experience could review them and provide any additional advice.

 

The failed drive is #3, ST4000DM004-2CV104_ZFN00H90 - 4 TB (sdf). Yes, it's a Seagate... I've read plenty of bad stuff about these already, BUT they have been working fine. Note the Raw_Read_Error_Rate is pretty high in the SMART logs... However, there doesn't seem to be a definitive answer on whether this value is "bad" or suggests pre-fail. SMART passes, and others say you can just ignore this??? It is interesting that all my other disks are ZERO for this value.

 

I've been using UNRAID for about a year now and it's been working great in this configuration.

 

Here's what I think happened: we had a power outage. I do have the server on a UPS, but maybe it didn't shut down in time and there was a write error to Drive 3??? Thus -- failed. This occurred right after the power outage. However, I don't have the old syslog to determine exactly what happened. I looked through the current syslog and can't find any clues. (I've since adjusted the shutdown timers so there will be more time to shut down.)

I don't think the drive was just going along and failed.... nor is it a bad cable, controller, etc.   Really thinking it was the power outage.  Just looking to confirm the logs don't point to something else.  

 

For now I've copied the data from the emulated drive.   Then I've removed the disk and put it back in service.... it's rebuilding; interested to see if it fails during this process or if any of the SMART "bad" values go up...   12+ hours to rebuild.  

 

 

blackbox-diagnostics-20180914-1733.zip

Link to comment

Unfortunately the diagnostics were taken after a reboot, so we can't see what happened, but the disk looks perfectly fine. Raw_Read_Error_Rate is a multibyte attribute on these drives and can't be read directly: the value you see represents the total number of reads, and actual errors are still at zero.
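For anyone who wants to check this themselves, here is a minimal Python sketch of how that raw value is commonly decoded. It assumes the community-documented Seagate layout (a 48-bit raw value with the error count in the upper 16 bits and the operation count in the lower 32 bits), which is not an official Seagate specification. The function name and the example number are made up for illustration and are not taken from the posted diagnostics.

```python
def split_seagate_raw(raw: int) -> tuple[int, int]:
    """Split a 48-bit raw value into (error_count, operation_count).

    Assumed layout (community-documented, not an official spec):
    upper 16 bits = errors, lower 32 bits = total operations.
    """
    if not 0 <= raw < (1 << 48):
        raise ValueError("raw value must fit in 48 bits")
    errors = raw >> 32                # upper 16 bits
    operations = raw & 0xFFFFFFFF     # lower 32 bits
    return errors, operations


# Hypothetical raw value; substitute the number from your own SMART report.
example_raw = 184_512_344
errors, operations = split_seagate_raw(example_raw)
print(f"errors={errors}, operations={operations}")
```

Under that assumption, any raw value below 2^32 (about 4.29 billion) decodes to zero errors, no matter how large it looks.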

 

Rebuild to the same disk, or to a spare if you have one, to play it safer in case something goes wrong.

Link to comment

Thanks @johnnie.black. I appreciate the explanation of Raw_Read_Error_Rate.

 

It is frustrating that unRAID hasn't come up with a better way to preserve the syslogs... like also writing them to a cache drive. Have you come across anything that does this other than a syslog server (which I'm thinking about)?

 

As I mentioned I copied the emulated data, just in case, to other drives on the array.   I've added the failed disk back to the array and it is currently rebuilding.

 

Thanks again for your prompt response.

Link to comment
2 hours ago, howiser1 said:

As I mentioned I copied the emulated data, just in case, to other drives on the array.

A common reaction, but this isn't necessarily the best approach. Rebuilding is the way to get your array protected again. Copying to other drives in the array is a lot of extra activity on an unprotected array since it has to emulate the failed disk by reading all drives in order to get the data to be copied. And then more activity to write the data to the array and update parity.
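To illustrate why every read from an emulated disk touches all the other drives, here is a rough Python sketch of single-parity reconstruction using XOR. The three tiny "disks" and the helper name are hypothetical, and this is a toy model of the idea rather than how the Unraid driver is actually implemented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three made-up data disks, three bytes each.
disk1 = bytes([0x12, 0x34, 0x56])
disk2 = bytes([0xAB, 0xCD, 0xEF])
disk3 = bytes([0x0F, 0xF0, 0x55])  # pretend this one has failed

# Parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# Emulating the failed disk means reading every surviving disk plus parity.
emulated = xor_blocks([disk1, disk2, parity])
print(emulated == disk3)
```

Every block read from the emulated disk needs the matching block from each remaining drive, and every write elsewhere in the array also has to update parity, which is the extra activity described above.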

 

If you absolutely must copy some data from the emulated disk, I recommend copying it to an Unassigned Device or, if not much data, another computer on the network. That way you only have to do the reads and no writing, which means parity also isn't touched before the rebuild.

 

But of course, the best approach is to have good backups of anything important and irreplaceable so you aren't overly concerned when something like this happens.

 

 

 
