
SMART attributes warning message



Hi. I got a warning message from Unraid about one of my drives having a Current_Pending_Sector count of 2. I logged into the server and ran an extended SMART test, and as soon as I started it I received another warning message telling me that the Current_Pending_Sector count on that drive was now 9. The test has finished and the count has remained at 9. I don't fully understand SMART reports but I think that Offline_Uncorrectable also needs to be looked at? This is 0. The extended SMART test did not come back as a fail - the info on that drive in Unraid says "Last SMART test result: completed without error". 

 

Do I need to worry? Start planning to replace this drive maybe? Learn more about SMART reports? (I know the answer to the 3rd question already and I plan to do this.) 

Link to comment
23 hours ago, JonathanM said:

yes, sector error counts increasing rapidly is bad, if another drive fails...

if you can get the pending count back to zero and no increasing reallocated, it may be ok, but you need to be ready to replace

Thanks, how would I try to get the count back to zero? What steps would I need to take? 

Link to comment

A pending sector should be reallocated or removed from the pending list when new data is written to that spot. It's difficult to determine which files occupy which sectors, so the easiest way is to run a non-correcting parity check, which reads from all the disks. When the pending sector fails to read, it should (hopefully) be overwritten with good data calculated from the rest of the disks + parity. That causes the Unraid error count to increase and moves that sector from pending to reallocated.
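
To make "good data calculated from the rest of the disks + parity" concrete, here is a minimal Python sketch of single-parity reconstruction. It assumes plain XOR parity across the data disks, and the byte values and the `xor_blocks` helper are made up for illustration; it is not Unraid's actual code.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical contents of the same sector on three data disks.
disk1 = bytes([0x12, 0x34, 0x56, 0x78])
disk2 = bytes([0xAB, 0xCD, 0xEF, 0x01])
disk3 = bytes([0x0F, 0xF0, 0x55, 0xAA])

# Single parity is the XOR of that sector across all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# If disk2 cannot read its sector, XOR-ing the surviving disks with parity
# gives back the missing bytes, which can then be written to disk2.
rebuilt = xor_blocks([disk1, disk3, parity])
assert rebuilt == disk2
```

The same arithmetic shows why a second unreadable sector at the same position is a problem with single parity: with two unknowns, the XOR no longer has a unique answer.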

 

The issue is that if another drive happens to fail before this disk is healthy, the pending sectors will likely fail to read. Any other disk being rebuilt would then have errors at those sectors too, on top of the data on this disk probably being corrupt there.

 

Poor power or other environmental conditions can also cause pending sectors, so the drive isn't always at fault when these show up, but if more sectors keep showing up as pending or reallocated, chances are the drive is dying.

 

Because drives can fail without warning, it's always prudent to replace drives that you can't trust as soon as possible, in case one of the "good" drives suddenly decides to die unexpectedly. Having dual parity can reduce the stress a little as it can tolerate 2 drive failures, but it's not bulletproof.

 

Unraid's ability to rebuild drives is NOT backup, it's high availability. Always keep a current backup of any files you can't afford to lose.

Link to comment
On 7/30/2023 at 2:05 PM, JonathanM said:

A pending sector should be reallocated or removed from the pending list when new data is written to that spot. It's difficult to determine which files occupy which sectors, so the easiest way is to run a non-correcting parity check, which reads from all the disks. When the pending sector fails to read, it should (hopefully) be overwritten with good data calculated from the rest of the disks + parity. That causes the Unraid error count to increase and moves that sector from pending to reallocated.

 

I don't fully understand everything you've said here but I've just started a parity check with the "write corrections" box unchecked. Funnily enough a scheduled parity check had just finished, but of course the "write corrections" box was checked. When this has finished, presumably I need to run an extended SMART test on the drive in question again and then see if anything has changed? 

 

I think I'll be replacing the drive regardless: it's one of the oldest in my array, it's a WD Green, and it's only 2TB. It's showing 66,900 hours of run time so I feel like I've had decent value out of it, to be fair. Since it's a WD Green I can only assume that it lived a previous life in one of my other machines before I added it to my server, but I can't be sure, as I don't tend to record the history of my drives. 

Link to comment
13 minutes ago, jj0076 said:

When this has finished, presumably I need to run an extended SMART test on the drive in question again and then see if anything has changed? 

Keep an eye on the pending and reallocated counts as the check progresses. An extended SMART test isn't a bad idea, but it could still pass even if the drive is steadily getting worse. What you are looking for is stability in the SMART attributes that indicate health. Increasing pending and reallocated counts mean the drive is dying. If the progression stops, you can assume the current bad spot has been fully taken care of, and the drive may possibly stay ok for a while.
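
If you'd rather not compare the numbers by eye, something like this rough Python sketch can diff two snapshots of the attribute table. It assumes the usual `smartctl -A` column layout with the raw value in the tenth column; `parse_counts` and `growing` are names invented here, and the two snapshots are made-up text that only reuses the 2 and 9 pending counts from this thread.

```python
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def parse_counts(smartctl_output):
    """Pull the raw values of the watched attributes out of attribute-table text."""
    counts = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns: ID, name, ..., raw value.
        if len(fields) >= 10 and fields[1] in WATCHED:
            counts[fields[1]] = int(fields[9])
    return counts

def growing(before, after):
    """Return the watched attributes whose raw value increased."""
    return [name for name in WATCHED if after.get(name, 0) > before.get(name, 0)]

# Made-up snapshots for illustration only (not taken from a real drive).
BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""
AFTER = BEFORE.replace("-       2", "-       9")

print(growing(parse_counts(BEFORE), parse_counts(AFTER)))
```

On a real server the text would come from running `smartctl -A` against the drive and saving the output before and after the check. An empty list between snapshots is the stability you are hoping for.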

18 minutes ago, jj0076 said:

it's only 2TB. It's showing 66,900 hours of run time

Probably smart to replace it with a much higher-capacity, more efficient model.

20 minutes ago, jj0076 said:

it's one of the oldest in my array,

Perhaps you can copy the data from the other old drives to this new replacement and reduce your spindle count. Fewer drives = fewer failure points.

 

How healthy are the rest of your drives?
