
Do I need to replace a drive?


tnorman


Posted

Hello, I'm fairly new to Unraid, so I wanted to check with those more knowledgeable about what might be going on. I think I may need to replace one of the drives in my Unraid system. The display is reporting disk errors for one of the disks: about 42 thousand right now. I checked the Unraid system after my backup software (GoodSync) kept losing track of the Unraid Tower during a backup. According to the software, the server kept going offline and then coming back online. (I realize that issue may not be hard drive related and could instead be a network problem, but it's what prompted me to open up the Unraid menu.)

 

I've included the syslog and the SMART report for the drive in question. The other two drives in the system don't seem to be reporting any disk errors or SMART errors.

 

Should I swap out the hard drive in question, or is there something else I should do? I'm guessing swapping it out is the route to take.

syslog-2011-11-27.zip

sdc.txt

Posted

The G-Sense errors relate to physical shock sustained by the drive.  I have no idea what the Program_Fail means.

 

However, the UDMA_CRC errors almost certainly relate to a bad connection/cabling.  Try re-seating the cables, swapping the cables, moving the drive onto another controller port, etc.

Posted

Thank you. I fixed the errors by moving the drive within the case and connecting it to a different power cable and SATA cable. I had been using a long SATA cable that was stretched to its limit; I must have been stretching it a bit too far.

Posted

Well, that lasted about an hour. I decided to run a parity check, and it ran smoothly for about an hour with no errors at all. Then I went to check on its progress, and it showed 576 errors on that drive and a little red circle. :( I'm checking the connections again. Maybe my stomping around vibrated something out of place.

 

Posted

And as a follow-up, I have no freakin' clue what is going on. I moved the SATA cables and the power cables around. As far as I can tell, nothing should have been causing a problem, and nothing appeared loose. When I plugged everything back in and started the system up, the drive that had been having issues was no longer listed as part of the array. I restarted a couple of times to see if it would fix itself, but with no luck. I had to put the drive back into the array, and it came up with a blue icon, meaning Unraid thinks it wasn't part of the original array. So now I've started a rebuild of the array. I don't understand what is going on, and I'm wondering whether there's a larger issue at play. Thankfully, everything important on the Unraid system is backed up elsewhere in case this all crashes and burns.

 

Posted

I looked over your syslog and SMART report. Unless PeterB knows something about the G-Sense and Program_Fail SMART attributes that I don't, I believe those values can be safely ignored. Neither one has budged from its default value, and neither is anywhere close to its threshold. You can ignore the RAW values; those are only meaningful to the drive manufacturers. To evaluate the health of the drive, look at the Value and Worst numbers and compare them to the Thresh number. If either Value or Worst starts approaching Thresh, the drive is likely going bad. I don't see any red flags on this drive; it looks healthy to me.
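The Value/Worst versus Thresh comparison described above can be sketched in Python. This is a minimal illustration, assuming the standard ten-column attribute table printed by `smartctl -A` (ID, name, flag, VALUE, WORST, THRESH, type, updated, when-failed, raw value). The sample rows are made up and are not taken from the attached sdc.txt, and the 10-point `margin` is a hypothetical cut-off for "approaching the threshold", not a vendor or smartmontools rule. Attributes with a threshold of 0 are treated here as informational only.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int   # current normalized value
    worst: int   # lowest normalized value ever recorded
    thresh: int  # vendor failure threshold
    raw: str     # raw value, left uninterpreted


def parse_attribute_line(line: str) -> SmartAttribute:
    """Parse one row of the `smartctl -A` table (assumed 10-column layout)."""
    parts = line.split()
    return SmartAttribute(
        attr_id=int(parts[0]),
        name=parts[1],
        value=int(parts[3]),
        worst=int(parts[4]),
        thresh=int(parts[5]),
        raw=" ".join(parts[9:]),  # some raw values contain spaces
    )


def is_concerning(attr: SmartAttribute, margin: int = 10) -> bool:
    """True if Value or Worst is within `margin` points of Thresh.

    `margin` is a hypothetical cut-off chosen for illustration.
    """
    if attr.thresh == 0:
        # Zero threshold: treated as informational, never flagged here.
        return False
    lowest = min(attr.value, attr.worst)
    return lowest - attr.thresh <= margin


# Illustrative rows only; not taken from the poster's SMART report.
SAMPLE_ROWS = [
    "  1 Raw_Read_Error_Rate     0x002f   058   058   051    Pre-fail  Always       -       1234",
    "  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0",
    "191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       12",
    "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       15",
]

attributes = [parse_attribute_line(row) for row in SAMPLE_ROWS]
flagged = [a.name for a in attributes if is_concerning(a)]
print(flagged)  # ['Raw_Read_Error_Rate']
```

Only the made-up Raw_Read_Error_Rate row is flagged, because its normalized value sits 7 points above its threshold; the other rows either have plenty of headroom or a zero threshold, regardless of how large their raw values are.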

 

The CRC errors that your syslog is full of are a definite issue.  As PeterB already said, these errors are caused by bad SATA cabling.  A stretched SATA cable could certainly be the culprit.  Since you've already moved the drive to a new location and are using a new SATA cable, hopefully you've eliminated that problem.

 

Everything you described in the previous post is normal unRAID behavior. While the server was powered down, you moved things around so that the drives were connected to different ports and using different cables. When you powered back up, unRAID attempted to assign each drive to the disk slot it thought was correct (parity, disk1, disk2, etc.) based on the drive's serial number. In some cases it doesn't get every one right, which is likely what happened to you. Before starting the array, you should have manually assigned each drive to the correct slot on the Devices page. Instead, you restarted several times, which made unRAID forget about any unassigned drives, in this case your Samsung drive. When you finally did assign the Samsung to its correct slot again, unRAID thought it was a new drive (hence the blue dot). You have now started the array, allowing unRAID to reconstruct the drive's data onto itself (unRAID has no way of knowing that the data it is reconstructing is already on the drive). All of this is completely normal given the steps you took. The data rebuild could have been avoided by manually assigning the drive to the correct slot instead of restarting several times. However, since it is already underway, just let it finish and don't worry about it. Once the rebuild finishes, run a parity check to make sure no new errors have cropped up.
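Rebuilding a drive onto itself is harmless, assuming parity was in sync, because of how unRAID's single parity works. The parity disk holds the byte-wise XOR of all data disks at each position. Any one disk can therefore be recomputed from parity plus the others, and a parity check recomputes that XOR and compares it with what is stored. The sketch below is a toy, in-memory model of that arithmetic using made-up four-byte "disks". The function names are hypothetical, and this is not unRAID's actual code, which operates on whole devices.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_disks):
    """Each parity byte is the XOR of every data disk's byte at that offset."""
    return xor_blocks(data_disks)


def rebuild_disk(parity, surviving_disks):
    """XOR is its own inverse, so parity ^ (all other disks) yields the missing disk."""
    return xor_blocks([parity, *surviving_disks])


def parity_check(parity, data_disks):
    """Return the byte offsets where stored parity disagrees with the data disks."""
    expected = compute_parity(data_disks)
    return [i for i, (p, e) in enumerate(zip(parity, expected)) if p != e]


# Hypothetical four-byte "disks" standing in for disk1, disk2 and disk3.
disk1 = bytes([0x12, 0x34, 0x56, 0x78])
disk2 = bytes([0xAB, 0xCD, 0xEF, 0x01])
disk3 = bytes([0x0F, 0xF0, 0x00, 0xFF])
parity = compute_parity([disk1, disk2, disk3])

# Rebuild disk2 from parity and the other data disks.
rebuilt = rebuild_disk(parity, [disk1, disk3])
print(rebuilt == disk2)                               # True
print(parity_check(parity, [disk1, rebuilt, disk3]))  # []

# Flip one bit at offset 2 of disk1: the parity check pinpoints that offset.
corrupted = bytes([0x12, 0x34, 0x57, 0x78])
print(parity_check(parity, [corrupted, disk2, disk3]))  # [2]
```

Because the rebuilt contents are identical to what was already on the drive, the rebuild simply rewrites the same data. The follow-up parity check is what would reveal any mismatch introduced by flaky cabling during the process.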

 

If you continue to see CRC errors in your syslog, then there's a good chance you have a bad SATA cable somewhere in the mix.  Buy replacement cables (monoprice.com is a good source) and continue testing.
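One way to tell whether CRC errors are still turning up after swapping cables is to count them per ATA port in the syslog. Below is a minimal sketch. It assumes the wording typically used by the Linux libata driver, where interface CRC problems show up as `ICRC` in a device `error:` line or as `BadCRC` in an `SError:` line. Exact messages vary between kernel versions, and the sample lines are illustrative rather than copied from the attached syslog.

```python
import re
from collections import Counter

# Assumed libata wording: "ICRC" in device error lines, "BadCRC" in SError lines.
CRC_MARKERS = ("ICRC", "BadCRC")
# Captures the port name, e.g. "ata5" from "ata5:" or "ata5.00:".
PORT_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?:")


def count_crc_errors(lines):
    """Count CRC-related syslog lines per ATA port."""
    counts = Counter()
    for line in lines:
        if any(marker in line for marker in CRC_MARKERS):
            match = PORT_RE.search(line)
            counts[match.group(1) if match else "unknown"] += 1
    return counts


# Illustrative lines in libata style; not taken from the attached syslog.
sample = [
    "kernel: ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen",
    "kernel: ata5: SError: { UnrecovData 10B8B BadCRC }",
    "kernel: ata5.00: error: { ICRC ABRT }",
    "kernel: ata6.00: configured for UDMA/133",
]

print(count_crc_errors(sample))  # Counter({'ata5': 2})
```

On a running server you could pass an open file handle instead of the list, for example `open("/var/log/syslog")`; that path is an assumption about where your system writes its log. If the count keeps climbing for the same port after a cable swap, the cable, connector, or controller port is the likely suspect.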

Posted

Thank you, Rajahal. Everything seems to be working fine now, with no CRC errors to report. I moved the drive lower in the case and used a shorter SATA cable, and everything seems to be running fine. I will run a parity check just to be sure.

 
