SMART Warning

johnje · March 12, 2015

Hello,

Just got a notification email this morning indicating the following:

Event: unRAID Disk 2 SMART health [187]

Subject: Warning [TOWER] - reported uncorrect is 1

Description: ST3000DM001-1CH166_W1F2VFMM (sdc)

Importance: warning

How concerned should I be at this point? Should I go purchase a new disk in case of failure?

Are there any checks / troubleshooting steps I can perform?

Thanks in advance!

Squid · March 12, 2015

Post the output of the disk attributes screen (click on the disk in Main, then click Disk Attributes).

johnje · March 12, 2015

I have attached a screenshot of the attribute screen for that disk.

I imagine Raw Read Error Rate & Seek Error Rate being so high is not a good thing.

I'm curious why these two would not get reported; both are 0 for all my other disks in this setup.

bubbaQ · March 12, 2015

Quoting johnje:

"I imagine Raw Read Error Rate & Seek Error Rate being so high is not a good thing. Curious why these 2 things would not get reported, both of these are 0 for all my other disks in this setup."

Seagate drives report garbage raw values for those attributes, so they are ignored.
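One commonly repeated explanation for why those numbers look like garbage (community-reported, not an official Seagate specification, so treat it as an assumption) is that on many Seagate models the 48-bit raw value of attributes 1 and 7 packs two counters: the upper 16 bits hold the error count and the lower 32 bits hold a running total of operations. Under that assumption a huge raw value can still mean zero errors. A minimal sketch that splits such a value; the function name is hypothetical:

```python
def split_seagate_raw(raw: int) -> tuple[int, int]:
    """Split a 48-bit Seagate-style raw value into (errors, operations).

    ASSUMPTION: upper 16 bits = error count, lower 32 bits = operation count.
    This layout is community-reported, not an official Seagate specification.
    """
    if not 0 <= raw < (1 << 48):
        raise ValueError("expected a 48-bit raw value")
    errors = raw >> 32                # upper 16 bits
    operations = raw & 0xFFFFFFFF     # lower 32 bits
    return errors, operations


if __name__ == "__main__":
    # Hypothetical raw value, not taken from the screenshot in this thread.
    errors, ops = split_seagate_raw(123_456_789)
    print(f"errors={errors} operations={ops}")
```

Any raw value below 2^32 decodes to zero errors under this reading, which would be consistent with the other disks showing plain zeros.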

WeeboTech · March 12, 2015

If attribute 187 continues to increase, I would question the drive's integrity.
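A minimal sketch for watching attribute 187 over time, assuming smartmontools' `smartctl` is installed and the script runs as root. It parses the attribute table printed by `smartctl -A`, where RAW_VALUE is the tenth column, and compares two readings. The function names are hypothetical:

```python
import re
import subprocess

# Leading digits of the RAW_VALUE column (e.g. "35 (Min/Max 20/45)" -> 35).
_LEADING_INT = re.compile(r"\d+")


def parse_smart_attributes(text: str) -> dict[int, int]:
    """Map attribute ID -> raw value from the table printed by `smartctl -A`.

    Assumes the usual column order:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    attrs: dict[int, int] = {}
    in_table = False
    for line in text.splitlines():
        if line.startswith("ID#"):
            in_table = True  # header row; attribute rows follow
            continue
        fields = line.split()
        if not in_table or len(fields) < 10 or not fields[0].isdigit():
            continue
        match = _LEADING_INT.match(fields[9])
        if match:
            attrs[int(fields[0])] = int(match.group())
    return attrs


def read_smart_attributes(device: str) -> dict[int, int]:
    """Run `smartctl -A` on a device such as /dev/sdc (needs root)."""
    # smartctl uses non-zero exit bits for drive warnings, so don't treat them as failures.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return parse_smart_attributes(result.stdout)


def attribute_increased(previous: dict[int, int], current: dict[int, int],
                        attr_id: int = 187) -> bool:
    """True if the raw value of attr_id (default 187, Reported_Uncorrect) went up."""
    return current.get(attr_id, 0) > previous.get(attr_id, 0)
```

For example, save the result of `read_smart_attributes("/dev/sdc")` (sdc being the device named in the notification) and compare it with a later reading via `attribute_increased(old, new)`; a `True` result is the kind of growth described above.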

I would suggest capturing md5 hashes of all files with md5sum or md5deep so you can validate the integrity going forward.
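For anyone without md5deep at hand, a minimal Python sketch of the same idea: record an MD5 for every file under a directory, then re-hash later and list anything missing or changed. The manifest is written in md5sum's `<hash>  <path>` layout with relative paths, so running `md5sum -c` from inside that directory should also be able to check it. The function names are hypothetical, and the example path `/mnt/disk2` assumes the usual unRAID mount point for disk 2:

```python
import hashlib
import os


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large media files don't fill RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its MD5 hex digest."""
    manifest: dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = md5_of_file(full)
    return manifest


def write_manifest(manifest: dict[str, str], out_path: str) -> None:
    """Write "<hash>  <path>" lines (md5sum layout).

    Assumes file names contain no newlines or backslashes, which md5sum escapes.
    """
    with open(out_path, "w", encoding="utf-8") as f:
        for rel in sorted(manifest):
            f.write(f"{manifest[rel]}  {rel}\n")


def verify(root: str, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose MD5 no longer matches."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or md5_of_file(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

Typical use: `m = build_manifest("/mnt/disk2")`, save it with `write_manifest`, and later call `verify("/mnt/disk2", m)`; an empty list means every recorded file still hashes the same.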

There are a number of tools on the forum (bitrot and bunker come to mind).

I would suggest implementing a backup procedure for this drive and/or running the manufacturer's validation software.

http://en.wikipedia.org/wiki/S.M.A.R.T.

http://www.extremetech.com/computing/194059-using-smart-to-accurately-predict-when-a-hard-drive-is-about-to-die

https://www.backblaze.com/blog/hard-drive-smart-stats/

johnje · March 12, 2015

Thank you for the quick & concise replies!

All mission-critical files were moved the moment I noticed this flag pop up.

I will begin capturing / comparing checksums of files on this disk.

I am replacing a few smaller disks in my setup anyway, so I may pick up an extra drive in case this one dies.

lionelhutz · March 13, 2015

If you have a GOOD extra disk of the same size (or bigger), I would immediately let unRAID rebuild that disk onto the extra. Then you can run tests on the old disk to validate its health. That attribute could be an indication of a bad sector, which is a bad thing to have. If the drive checks out OK, you can put it back into the array where the new one was going to go.
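The rebuild works because unRAID's single parity disk holds, at each byte offset, the XOR of the corresponding bytes on all data disks, so any one data disk can be recomputed from parity plus the others. A toy sketch of that reconstruction, using small in-memory byte strings in place of disks; it illustrates the idea and is not unRAID's actual code (the function names are hypothetical, and real arrays treat smaller disks as zero beyond their end, which this equal-length toy skips):

```python
from functools import reduce


def xor_bytes(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_disks: list[bytes]) -> bytes:
    """Parity byte i = XOR of byte i on every data disk."""
    return xor_bytes(data_disks)


def rebuild_disk(parity: bytes, surviving_disks: list[bytes]) -> bytes:
    """Recover the missing disk: XOR parity with every surviving data disk."""
    return xor_bytes([parity, *surviving_disks])


if __name__ == "__main__":
    disk1, disk2, disk3 = b"\x01\x02\x03", b"\xff\x00\x10", b"\x0f\xf0\xaa"
    parity = compute_parity([disk1, disk2, disk3])
    print(rebuild_disk(parity, [disk1, disk3]) == disk2)
```

This is also why the rebuild needs every other disk to read cleanly: each surviving disk's bytes feed directly into the reconstructed ones.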

SSD · March 13, 2015

According to Wikipedia, the meaning of that attribute is:

"The count of errors that could not be recovered using hardware ECC."

I suspect a second read was attempted and succeeded. If not, it would have generated a pending sector, because the SMART system would not just pass a known-flawed read to the host OS without an error.
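The reasoning above can be written as a rough triage rule: a non-zero 187 with no Current_Pending_Sector (attribute 197) points to a read that was recovered on retry, while pending sectors mean the drive still has sectors it could not read. A minimal sketch, assuming the input is a mapping of attribute ID to raw value; this is a heuristic following the post, not a rule from the SMART standard, and the function name is hypothetical:

```python
REPORTED_UNCORRECT = 187      # Reported_Uncorrect
CURRENT_PENDING_SECTOR = 197  # Current_Pending_Sector


def classify_uncorrect(raw: dict[int, int]) -> str:
    """Rough triage of attribute 187 (heuristic, not a SMART rule).

    Returns "pending", "recovered" or "clean".
    """
    uncorrect = raw.get(REPORTED_UNCORRECT, 0)
    pending = raw.get(CURRENT_PENDING_SECTOR, 0)
    if pending > 0:
        # Unstable sectors the drive could not read and is waiting to remap.
        return "pending"
    if uncorrect > 0:
        # Error logged, but no sector left pending: likely a successful retry.
        return "recovered"
    return "clean"
```

For the disk in this thread, `classify_uncorrect({187: 1, 197: 0})` would fall in the "recovered" bucket, assuming its pending-sector count is indeed zero.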

I would not freak out over this incident. Comparing MD5s is a good idea, and so is running a NON-CORRECTING parity check or two to see whether there are signs of the condition getting worse. None of the ideas presented are bad, but I personally would not pull the drive from the array based on a single reported uncorrect with no apparent data corruption or further problems. I have a Seagate drive with a runtime_bad_block of 1 that has been rock solid ever since, and I would put this in a similar category.
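The difference between the two kinds of parity check: a NON-CORRECTING check reads every disk, recomputes parity and only reports mismatches, while a correcting check also rewrites parity to match the data. A toy sketch of that distinction with in-memory byte strings (not unRAID's implementation; the function name and `correct` flag are hypothetical):

```python
from functools import reduce


def parity_check(data_disks: list[bytes], parity: bytes,
                 correct: bool = False) -> tuple[list[int], bytes]:
    """Return (offsets where parity disagrees with the data, parity after the check).

    With correct=False the stored parity is returned untouched, so the check
    only reports; with correct=True the recomputed parity replaces it.
    """
    expected = bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_disks))
    if len(expected) != len(parity):
        raise ValueError("parity length must match the data disks")
    mismatches = [i for i, (e, p) in enumerate(zip(expected, parity)) if e != p]
    return mismatches, (expected if correct else parity)
```

Running the non-correcting form repeatedly and watching whether the mismatch list stays empty is the "signs of the condition getting worse" test described above, without risking a parity rewrite based on a bad read.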

If you don't like these one-off SMART attribute issues, don't buy Seagate. I have never seen this kind of thing with HGST drives, and they outnumber the Seagate drives in my array 10 to 1.

YMMV
