Any way to permanently acknowledge SMART errors?



I pulled two drives from a system that had some bad/cheap SATA cables in it and added them to my unRAID array. When I pulled the drives, the SATA cables were obviously coming apart where they attach to the drive, so I was not surprised when unRAID alerted me that these drives had non-zero values in "UDMA CRC error count" (12 on one drive, 4 on the other).

 

As I'm pretty confident this was due to the bad connection they previously had, I'd like to just acknowledge this error unless those values start increasing. But it seems that every time I stop/start the array, I have to re-acknowledge those errors. Is there any way to permanently let the system know "Yep - 12 and 4 are acceptable values for those two drives"?

Link to comment

Click on the SMART status entry on the dashboard for the disk in question and select the Acknowledge option.
 

CRC entries are never actually reset back to zero, but if the value changes again you will be warned.

 

If you are already doing this, then you may have an issue with the acknowledged state being saved back to the flash drive.
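
To make the intended behaviour concrete, here is a minimal Python sketch of the rule described above: warn on a non-zero counter unless it matches the value that was last acknowledged. This is only an illustration, not unRAID's actual code; the function name `should_warn` and its parameters are hypothetical.

```python
from typing import Optional


def should_warn(current: int, acknowledged: Optional[int]) -> bool:
    """Decide whether a SMART counter such as UDMA CRC error count should alert.

    `current` is the raw value reported by the drive now.
    `acknowledged` is the raw value the user last acknowledged, or None
    if nothing was ever acknowledged (hypothetical representation).
    """
    if current == 0:
        # A zero counter never needs attention.
        return False
    # The counter itself is never reset, so an acknowledged value stays quiet
    # only for as long as the drive keeps reporting exactly that value.
    return acknowledged is None or current != acknowledged
```

With the numbers from this thread, `should_warn(12, 12)` stays quiet, while `should_warn(13, 12)` warns again. If the stored acknowledgement is lost, the second argument is effectively `None` and the old value of 12 triggers a fresh warning, which would match the symptom reported here.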

Edited by remotevisitor
Link to comment
14 minutes ago, technorati said:

that's what I'm doing

21 minutes ago, remotevisitor said:

then you may have some issue saving the saved state back to the flash drive.

Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post.

Link to comment

Based on that FAQ, I believe this is because there was another computer on the network that still had the unRAID web UI open across multiple reboots. It wasn't clear to me, though, whether you're saying this is the reason that it's not saving the SMART acks?

 

Thanks for looking into this!

Link to comment
On 5/9/2020 at 7:23 PM, trurl said:

So have you tried it again after fixing that problem?

Yes, I am still seeing the SMART status come back to error again intermittently:

[Screenshot attached: image.png]

 

(I just confirmed there have been no csrf_token errors in my logs since my last reboot)

 

Edited by technorati
add more detail
Link to comment

I have one disk with 2 CRC errors. I acknowledged them long ago by clicking on the "thumbs down" on the Dashboard. Since they haven't increased I don't get any further warnings.

 

Is that what you have done to acknowledge them? Have they increased?

Link to comment
15 minutes ago, trurl said:

I have one disk with 2 CRC errors. I acknowledged them long ago by clicking on the "thumbs down" on the Dashboard. Since they haven't increased I don't get any further warnings.

 

Is that what you have done to acknowledge them? Have they increased?

Yes, I clicked the "Thumbs down" icon and chose "Acknowledge" from the menu. The values (UDMA CRC Count) have not increased on the drives, but certain events (stopping the array, rebooting the server) sometimes cause unRAID to alert me again on the same value in the same attribute. No other attribute on either drive shows any indicators of an alert.

Edited by technorati
Link to comment
9 minutes ago, technorati said:

No other attribute on either drive shows any indicators of an alert.

From Main, click on a drive to get to its page, then go to the Attributes section. Do any of the other attributes have yellow highlight?

 

I am sure the acknowledged count must be stored on flash somewhere but I don't know the details. Since it isn't happening all the time I wonder if you don't have an intermittent flash problem. When it happens again post new diagnostics.

Link to comment
3 minutes ago, trurl said:

From Main, click on a drive to get to its page, then go to the Attributes section. Do any of the other attributes have yellow highlight?

No - that's exactly what I meant by "no other attribute on either drive shows any indicators of an alert" - the only field in yellow on either drive is that UDMA CRC Count.

juggernaut-diagnostics-20200511-1001.zip

Attached is an updated diagnostics.zip from when I had the error this morning.

Link to comment
2 hours ago, technorati said:

No - that's exactly what I meant by "no other attribute on either drive shows any indicators of an alert" - the only field in yellow on either drive is that UDMA CRC Count.

The yellow highlight will not go away when you view the drive's Attributes. It is the Dashboard SMART warning icon and the associated Notifications that are supposed to stop once you have acknowledged, until the count increases. Are you still getting those?

 

Nothing obvious in diagnostics that makes me think there is a Flash problem. Has Fix Common Problems ever told you there was a Flash problem?

 

Not ideal, but you can configure each disk regarding which SMART attributes get monitored by clicking on the disk to get to its page, then go to SMART Settings and uncheck the box for the attribute. Of course that means it will never check that attribute again, though you can always take a look at it yourself in the Attributes section as before.

 

 

Link to comment
3 hours ago, trurl said:

The yellow highlight will not go away when you view the drive's Attributes. It is the Dashboard SMART warning icon and the associated Notifications that are supposed to stop once you have acknowledged, until the count increases. Are you still getting those?

 

Yes, I'm still getting them - there's a screenshot a few posts above here showing that it came back again this morning.

 

Quote

Nothing obvious in diagnostics that makes me think there is a Flash problem. Has Fix Common Problems ever told you there was a Flash problem?

No, I just ran "Fix Common Problems" and it doesn't report any issues with flash; nor have I ever seen anything in the logs or other notifications suggesting there was.

 

Quote

Not ideal, but you can configure each disk regarding which SMART attributes get monitored by clicking on the disk to get to its page, then go to SMART Settings and uncheck the box for the attribute. Of course that means it will never check that attribute again, though you can always take a look at it yourself in the Attributes section as before.

I'd prefer to live with it, so at least I get notified if the count does increase, indicating that a larger problem exists. Mostly, I was just curious why it would keep notifying me of something I'd already acknowledged, but it sounds like there's not a ready/obvious answer to that.

Link to comment
  • 2 years later...

I was not sure about the state of my disk because I could not remember whether I had hit Acknowledge half a year ago and the disk had degraded since then, or whether I had never acknowledged the errors at all.

Because the UI does not show the values stored at the last acknowledgement, I searched for the config file.

This topic came up in my Google search, so I will provide the answer here:
/boot/config/plugins/dynamix/monitor.ini is the file that holds the previously acknowledged values. It helps you decide whether your disk has degraded significantly or just had a hiccup.
I recommend making a copy of this file with the current timestamp, because the file does not tell you when these values were recorded, and the extended SMART error log only saves errors.
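
A timestamped copy like that can be scripted. Below is a minimal Python sketch, assuming Python is available on the machine where you run it. Only the monitor.ini path comes from this post; the helper names `snapshot` and `changed_lines` and the backup directory are hypothetical. The comparison is plain line-by-line text, so it makes no assumptions about the file's internal format.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Path quoted in the post above.
MONITOR_INI = Path("/boot/config/plugins/dynamix/monitor.ini")


def snapshot(source: Path, backup_dir: Path) -> Path:
    """Copy `source` into `backup_dir`, adding a timestamp to the file name."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves the modification time
    return target


def changed_lines(old: Path, new: Path) -> list:
    """Return the lines that appear in `new` but not in `old`."""
    old_lines = set(old.read_text().splitlines())
    return [line for line in new.read_text().splitlines() if line not in old_lines]


# Example usage (the backup directory is a hypothetical choice):
#   copy = snapshot(MONITOR_INI, Path("/boot/config/monitor-backups"))
#   ...months later...
#   print(changed_lines(copy, MONITOR_INI))
```

Comparing an old snapshot against the current file shows which stored values have changed since the date in the snapshot's file name.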

Edited by Falcosc
Link to comment
