Hard drive health issues - disabled and SMART error count



I've got two devices that are experiencing issues and I'm not familiar with how to read and evaluate SMART reports.  I've attached the diagnostics information.

 

 ST3000DM001-9YN166_W1F0WCE7-20200706-2337 disk3 (sdd) - Currently disabled

ST3000DM001-9YN166_W1F16VWS-20200706-2337 disk2 (sdc) - Currently showing a thumbs down for SMART column in Array dashboard table with a hover note of "UDMA CRC error count: 1"

tower-diagnostics-20200706-2337.zip


From the SMART information I would say:

  • Disk2 is probably fine. CRC errors are connection related and typically point to the power/SATA cabling to the drive rather than the drive itself. Occasional errors are not really of concern, but regular ones mean you should carefully check the cabling. The point to note is that the CRC error count never gets reset, so if you click on the orange icon for the drive on the Dashboard and select the Acknowledge option, Unraid will only tell you about it again if the count changes.
  • Disk3 looks as if it may be on the way out. It has lots of reallocated sectors and pending sectors. You could run the extended SMART test to confirm it is not well.
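
For anyone who would rather check these counters from a script than by reading the report by eye, here is a rough Python sketch of the triage described above. It assumes the usual column layout of the attribute table printed by `smartctl -A`; the function names are made up for illustration, and the sample rows are invented, not taken from the attached diagnostics.

```python
import re


def parse_raw_values(smartctl_output):
    """Return {attribute_id: raw_value} for rows of a `smartctl -A` table."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID;
        # the header row starts with "ID#" and is skipped by isdigit().
        if len(fields) >= 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])  # leading integer of RAW_VALUE
            if match:
                values[int(fields[0])] = int(match.group())
    return values


def assess(values):
    """Rough triage following the reasoning above; not a definitive verdict."""
    notes = []
    # 5 = Reallocated_Sector_Ct, 197 = Current_Pending_Sector
    if values.get(5, 0) > 0 or values.get(197, 0) > 0:
        notes.append("reallocated/pending sectors: drive may be failing")
    # 199 = UDMA_CRC_Error_Count
    if values.get(199, 0) > 0:
        notes.append("CRC errors: check power/SATA cabling")
    return notes or ["no watched attributes raised"]


# Invented sample rows in the usual `smartctl -A` layout (illustration only).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
"""

print(assess(parse_raw_values(SAMPLE)))
```

The thresholds here are deliberately crude (anything above zero is reported). Because the CRC count never resets, what matters in practice is whether it keeps climbing, not its absolute value.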

The syslog in the diagnostics was taken just after a reboot, so there is nothing to show what led up to the current situation.

13 minutes ago, itimpi said:

From the SMART information I would say:

  • Disk2 is probably fine. CRC errors are connection related and typically point to the power/SATA cabling to the drive rather than the drive itself. Occasional errors are not really of concern, but regular ones mean you should carefully check the cabling. The point to note is that the CRC error count never gets reset, so if you click on the orange icon for the drive on the Dashboard and select the Acknowledge option, Unraid will only tell you about it again if the count changes.
  • Disk3 looks as if it may be on the way out. It has lots of reallocated sectors and pending sectors. You could run the extended SMART test to confirm it is not well.

The syslog in the diagnostics was taken just after a reboot, so there is nothing to show what led up to the current situation.

 

Thanks for the reply and helpful info!

 

I'll check the cable on disk2 and/or replace it with another one.  I have acknowledged the notification and will watch for the count to change again, keeping track of how often it happens.

 

I tried running the short and long tests for disk3, but it won't get past "self-test in progress, 10% complete" before showing "Errors occurred - Check SMART report".  I was able to have it fully run through the short test but the same error message appeared afterwards.
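
For reference, the same self-test results can also be read outside the UI from the drive's self-test log (`smartctl -l selftest`). Below is a rough Python sketch of pulling the failed entries out of that log. It assumes the usual ATA self-test log layout; the function names are hypothetical and the sample log lines are invented for illustration, not copied from my diagnostics.

```python
import re

# One row of the self-test log: "# N  Description  Status  Remaining%  Hours  LBA"
ROW = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def parse_selftest_log(log_text):
    """Return a list of dicts, one per self-test log entry."""
    entries = []
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            num, desc, status, remaining, hours, lba = match.groups()
            entries.append({
                "num": int(num),
                "test": desc,
                "status": status,
                "remaining_pct": int(remaining),
                "lifetime_hours": int(hours),
                "first_error_lba": lba,  # "-" when no error was recorded
            })
    return entries


def failed_tests(entries):
    """Entries whose status is anything other than a clean completion."""
    return [e for e in entries if e["status"] != "Completed without error"]


# Invented sample in the usual `smartctl -l selftest` layout (illustration only).
SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     31337         1234567
# 2  Short offline       Completed without error       00%     31330         -
"""

for entry in failed_tests(parse_selftest_log(SAMPLE_LOG)):
    print(entry["num"], entry["test"], entry["status"], entry["first_error_lba"])
```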

 

I only found out about this stuff when I went to perform a reboot and it wouldn't boot into the OS.  After I finally got it up to the point I could get into the UI, I saw that the one disk was disabled and the investigation started.

1 hour ago, trurl said:

On mobile now so haven't looked at Diagnostics, but from what has already been said you need to replace that disk. Of course that is the whole point of parity, rebuilding the content of a failed disk to a new one. Do you know how to proceed? 

I have found the wiki article for "Replacing a Data Drive".  This is going to turn into a bigger project though as my parity drive is only 4TB so I'm going to want to swap in something much larger there so I can then swap in some larger data drives.  Is there a recommended order of operations on how to replace both a dead data drive as well as the parity drive?  I'm not sure if I should replace the data drive first (temporary), then the parity drive, then replace the data drive again with the larger one or if there's an easier way.

 

Edit:  It sounds like this wiki article will walk me through doing this "drive shuffle", correct?
https://wiki.unraid.net/The_parity_swap_procedure

Edited by snowborder714
5 hours ago, trurl said:

Could you post a new diagnostic with the array started? Just want to make sure everything is mountable, including the emulated disk3.

 

In the meantime and while waiting for further response, read this:

 

https://wiki.unraid.net/The_parity_swap_procedure

 

 

Here is a new diagnostic after starting the array.

 

Yup, that's the same link I had found - thanks for the confirmation!

tower-diagnostics-20200707-1448.zip

