
SMART Warning


johnje


Hello,

 

Just got a notification email this morning indicating the following:

 

Event: unRAID Disk 2 SMART health [187]

Subject: Warning [TOWER] - reported uncorrect is 1

Description: ST3000DM001-1CH166_W1F2VFMM (sdc)

Importance: warning

 

How concerned should I be at this point? Should I go purchase a new disk in case of failure?

Are there any checks / troubleshooting steps I can perform?

 

Thanks in advance!

Link to comment

I have attached a screenshot of the attribute screen for that disk.

I imagine Raw Read Error Rate & Seek Error Rate being so high is not a good thing.  :P

Curious why these 2 things would not get reported; both of these are 0 for all my other disks in this setup.

[Attachment: unraid.PNG]

Link to comment

I imagine Raw Read Error Rate & Seek Error Rate being so high is not a good thing.  :P

Curious why these 2 things would not get reported; both of these are 0 for all my other disks in this setup.

 

Seagate drives report raw values for those two attributes that are not plain error counts, so unRAID ignores them when deciding what to warn about.

Link to comment

If attribute 187 continues to increase I would question the drive's integrity.

I would suggest capturing md5 hashes of all files with md5sum or md5deep so you can validate the integrity going forward.

There are a number of tools on the forum (bitrot and bunker come to mind).

I would also suggest implementing a backup procedure for this drive and/or running the manufacturer's validation software.
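
For anyone who wants to see what "capturing md5 hashes so you can validate later" amounts to, here is a minimal Python sketch. It is not how md5sum, md5deep, bitrot or bunker are implemented; the function names are made up for illustration, and the mount point in the usage note is an assumption about where the disk appears on your system.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(old, new):
    """Return (changed, missing, added) relative paths between two manifests."""
    changed = sorted(k for k in old if k in new and old[k] != new[k])
    missing = sorted(k for k in old if k not in new)
    added = sorted(k for k in new if k not in old)
    return changed, missing, added
```

Typical use would be to call build_manifest("/mnt/disk2") once (path assumed), save the result (for example with json.dump), and later build a fresh manifest and pass both to compare_manifests. A file that shows up as changed even though you never modified it is the thing to worry about.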

 

 

http://en.wikipedia.org/wiki/S.M.A.R.T.

http://www.extremetech.com/computing/194059-using-smart-to-accurately-predict-when-a-hard-drive-is-about-to-die

https://www.backblaze.com/blog/hard-drive-smart-stats/

Link to comment

Thank you for the quick & concise replies!

All mission critical files were moved the moment I noticed this flag pop up.

I will begin capturing / comparing checksums of files on this disk.

I am replacing a few smaller disks in my setup anyway, so I may pick up an extra drive in case this one dies.

Link to comment

If you have a GOOD extra disk of the same size (or bigger), I would immediately let unRAID rebuild that disk onto the extra. Then you can run tests on the original disk to validate its health. The warning could be an indication of a bad sector, which is a bad thing to have. If the drive checks out OK, you can put it back into the array where the new one was going to go.

 

Link to comment

According to Wikipedia, the meaning of that attribute is ...

 

The count of errors that could not be recovered using hardware ECC

 

I can only suspect that a second read was attempted and was successful. If not, it would have generated a pending sector, because the drive would not just pass a known flawed read to the host OS without an error.

 

I would not freak out over this incident. Comparing MD5 is a good idea. So is running a NON-CORRECTING parity check or two to see if there are signs of the condition getting worse. None of the ideas presented are bad ideas, but I personally would not pull it from the array based on a single reported uncorrect with no apparent data corruption or further problems. I have a Seagate drive with a runtime_bad_block of 1 that has been rock solid ever since. I would put that in a similar category.
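
If you want to watch for "signs of the condition getting worse" between checks, one rough approach is to save the text output of `smartctl -A /dev/sdc` now and compare it with a later run. The Python sketch below assumes the usual ten-column attribute table that smartctl prints for ATA drives. The sample text is illustrative only: the raw value of 1 for attribute 187 comes from the notification in this thread, every other number is made up, and the function names and the watch list are my own choices rather than anything unRAID uses internally.

```python
# Illustrative sample in the usual `smartctl -A` layout (values are NOT from the real drive,
# except the raw value 1 for attribute 187, which is what the warning email reported).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   099   099   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

# Attribute IDs to watch; an assumed short list of sector-health counters.
WATCHED = (5, 187, 197, 198)


def parse_smart_attributes(report):
    """Parse the attribute table from `smartctl -A` text into {id: (name, raw)}."""
    attributes = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        attributes[int(fields[0])] = (fields[1], int(raw) if raw.isdigit() else raw)
    return attributes


def increased(previous, current, watched=WATCHED):
    """Return {id: (old_raw, new_raw)} for watched attributes whose raw count rose."""
    rising = {}
    for attr_id in watched:
        old = previous.get(attr_id, (None, 0))[1]
        new = current.get(attr_id, (None, 0))[1]
        if isinstance(old, int) and isinstance(new, int) and new > old:
            rising[attr_id] = (old, new)
    return rising
```

Parse the saved report and the new one with parse_smart_attributes, then pass both results to increased. An empty result means none of the watched counters went up; a rising 187, 197 or 5 would be the cue to take the drive more seriously.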

 

If you don't like these onesy-twosy SMART attribute issues, don't buy Seagate. I have never seen these kinds of things with HGST, and they outnumber Seagate drives in my array 10 to 1.

 

YMMV

Link to comment

