
[SOLVED] Hard drive disabled after hard power shut off



I've recently run into an issue with a hard drive being disabled on my Unraid server, and I was hoping someone could help me out. A few days ago my server was shut down abruptly because the surge protector was unplugged from the wall, and I only noticed recently. After I turned the server back on, Unraid started a parity check. About 10% of the way through, I received errors saying that the UDMA CRC error count for one of my drives was now 6, and shortly afterwards the disk was put into an error state and taken offline.

  

Attached are the syslogs from after I received the errors. They did not contain the SMART report for the drive in question, so I rebooted and (thankfully) was able to obtain the SMART report for the drive, which is also attached.

 

My first impression, after doing some searching and reading the wiki, is that this may simply be a benign error and that I do not need to replace the drive. I've read that this particular error may be caused by a cabling issue. Unfortunately, I won't be able to check the cabling or replace the SATA cable for a few days because I am not near the server. Out of an abundance of caution, I have stopped the array. I am also worried about the errors that show up at the end of the SMART log, in the section for the most recent errors. I can't make heads or tails of what they mean or whether I should be worried.

 

My main questions then are:

1. After looking at the syslog and SMART files, how bad is the drive? Did the ungraceful shutdown damage it permanently?

2. Should I prepare for an impending hard drive failure by purchasing another drive right away to replace it?

3. Can I turn the array back on and spin the disk back up without endangering the data on the drive? Should I wait to do so until after I can check that the cables are okay?

 

---

 

I am running Unraid 6.8.3 with two 10 TB WD Easystores in the array, one as parity and one as the main data drive, along with an old Samsung 840 EVO SSD as a cache drive. I also have a WD Black NVMe drive that I pass through to a Windows VM, which was likely running at the time of the hard power-off. I run an assortment of dockers, with most of the array activity coming from qBittorrent, Sonarr, and Radarr. Judging by when the server went offline, my guess is that no new data was being actively written to the array.

tower-diagnostics-20200819-2342.zip tower-smart-20200819-2348.zip

Edited by sticke4
Marked topic solved

I finally got back around to this. I came back, turned off the server, and checked the SATA cables. They did look like they had come unseated, so I simply pushed them back into the hard drive. I decided against replacing the cables since I figured it was just a seating issue. After turning the server back on, I ran an extended SMART test (which took nearly 24 hours) and then ran the parity rebuild (which took another 24 hours). The server seems to be running fine now. I have kept an eye on the UDMA CRC error count for the drive in question, and it hasn't increased in the last few days of the server running.
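
For anyone who wants to keep an eye on the same counter from a script instead of the GUI, here is a rough sketch of how the value could be pulled out of `smartctl -A` output. It assumes the usual ATA attribute table layout, where attribute ID 199 is the UDMA CRC error count and the raw value is the tenth column. The function names, the sample text, and the `/dev/sdX` path are made up for illustration and are not taken from my diagnostics.

```python
import subprocess
from typing import Optional

# Assumption: attribute ID 199 is UDMA_CRC_Error_Count in smartctl's ATA table.
CRC_ATTRIBUTE_ID = "199"


def parse_crc_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of attribute 199, or None if it is not listed."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten or more columns; the raw value is the tenth.
        if len(fields) >= 10 and fields[0] == CRC_ATTRIBUTE_ID:
            return int(fields[9])
    return None


def read_crc_count(device: str) -> Optional[int]:
    """Run `smartctl -A` on a device (hypothetical path, usually needs root)."""
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes for warnings as well
    )
    return parse_crc_count(result.stdout)


# Illustrative sample only, not copied from a real report.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1234
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       6
"""

if __name__ == "__main__":
    print(parse_crc_count(SAMPLE))
    # On the server itself, something like: read_crc_count("/dev/sdX")
```

The raw count never resets, so what matters is whether the number keeps climbing between checks, not the number itself.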

 

Thanks for all the help.
