How to respond to new S.M.A.R.T. Warnings or Errors

June 3, 2022

Why SMART?

Why do I care about SMART? I don't like waiting for data loss to get noticed, and I don't want to run a parity check every day. So I like to monitor SMART, because it may detect something before the next parity check runs.
And for me as a private person, restoring backups is an inconvenient and slow process with a lot of manual tasks. It doesn't make sense to spend effort on making the backup restore more convenient. Or maybe it is just less interesting than spending time with SMART.
So I like to close the gap between parity checks with SMART monitoring, to notice issues before the bi-monthly parity check.

Feedback needed

Has anybody already documented or talked about this topic?
I didn't find a procedure, so I thought about it and, based on my limited knowledge, made some assumptions and created the following process.

Could you give me feedback on these steps and, in the best case, point out where I have made wrong assumptions? That would be great.
Maybe we can use the result of the discussion to share documentation for people who want to add SMART data to their monitoring process.

My Process

Check the SMART notification (is a count change in some of the key fields part of the email notifications?)
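If the notification does not include the counters, they could be read directly. The following is a minimal sketch, assuming smartmontools 7.0 or newer (which offers JSON output via `smartctl -A -j`) and an ATA disk whose attributes appear under `ata_smart_attributes.table`. The function name `parse_smart_counters` and the short key names are hypothetical, chosen only for this illustration.

```python
import json

# Standard ATA SMART attribute IDs for the two counters used in this process.
# The short names on the right are made up for this sketch.
WATCHED_IDS = {5: "reallocated", 197: "pending"}


def parse_smart_counters(smartctl_json: str) -> dict:
    """Extract the raw values of the watched attributes from `smartctl -A -j` output."""
    data = json.loads(smartctl_json)
    table = data.get("ata_smart_attributes", {}).get("table", [])
    counters = {}
    for attr in table:
        name = WATCHED_IDS.get(attr.get("id"))
        if name is not None:
            counters[name] = attr["raw"]["value"]
    return counters
```

The JSON text would come from something like `subprocess.run(["smartctl", "-A", "-j", "/dev/sdX"], capture_output=True, text=True).stdout`, where `/dev/sdX` is a placeholder for the disk in question. Storing the result after each run gives a baseline to compare the next reading against.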

SMART counter change is critical: new pending sectors or other critical indicators

Replace the disk immediately (the replacement disk was precleared in the past; that is good enough for this case, as there is no time left for an additional preclear validation)
Meanwhile, preclear the broken disk to see whether the counter recovers, or whether the count stays stable so that the disk can be reused (I have never had a disk with pending sectors that passed the stress test)

SMART counter change is a warning: new reallocated sectors

Start a non-correcting parity check as a stress test
While the check is running, run a preclear stress test on the replacement disk again to check whether it is still healthy since the last preclear
If the reallocated sector count does not change, repeat the parity check, up to 3 runs in total, to get 3 read operations on all sectors
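The two severity levels described so far can be sketched as a comparison between the previous and the current counter readings. This is only an illustration of the rule as I described it: the function name `classify_change` and the dictionary keys are hypothetical, and only pending sectors are treated as critical here because the other critical indicators are not named above.

```python
# Counters whose increase is treated as critical or as a warning in this process.
# Further critical indicators could be added to CRITICAL_KEYS.
CRITICAL_KEYS = ("pending",)
WARNING_KEYS = ("reallocated",)


def classify_change(previous: dict, current: dict) -> str:
    """Return 'critical', 'warning' or 'ok' for a pair of counter readings."""

    def increased(keys):
        # A counter missing from a reading is treated as 0.
        return any(current.get(k, 0) > previous.get(k, 0) for k in keys)

    if increased(CRITICAL_KEYS):
        return "critical"  # replace the disk immediately
    if increased(WARNING_KEYS):
        return "warning"   # start the read-only parity check stress test
    return "ok"
```

Critical is checked first, so a disk that gains both pending and reallocated sectors in the same interval is handled as critical.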

Parity check result: reallocated sector count did not change during any of the 3 parity checks (read-only tests)

Don't replace the disk if this was the first time
If the disk had the same issue in the past, replace it so that a preclear write stress test can be run on the problematic data disk

Parity check result: reallocated sector count rose during any of the parity check runs

The disk has a persistent issue; replace it with a precleared replacement disk
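The two parity check outcomes above can be written down as one small decision function. This is a sketch of my process as described, not a tested tool: the function name, the parameter names and the returned strings are all hypothetical.

```python
REQUIRED_RUNS = 3  # read-only parity checks needed before the disk counts as stable


def decide_after_parity_checks(start_count: int,
                               counts_after_runs: list,
                               had_issue_before: bool) -> str:
    """Decide what to do with a disk that showed new reallocated sectors.

    start_count: reallocated sector count when the warning was noticed.
    counts_after_runs: the count read after each completed parity check.
    had_issue_before: True if this disk already went through this procedure.
    """
    previous = start_count
    for count in counts_after_runs:
        if count > previous:
            # The count rose during a run: persistent issue.
            return "replace"
        previous = count
    if len(counts_after_runs) < REQUIRED_RUNS:
        # Stable so far, but not every sector has been read 3 times yet.
        return "repeat_check"
    if had_issue_before:
        # Stable, but a repeat offender: swap it out and stress test it with writes.
        return "replace_and_stress_test"
    return "keep"
```

For example, a disk that started at 8 reallocated sectors and still shows 8 after all three runs is kept the first time, but replaced and stress tested if it has been through this before.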

Preclear of the replacement disk created new reallocated sectors

Select a different replacement disk and make one preclear run to validate it (even if it was already precleared in the past)
Meanwhile, stress test the suspicious replacement disk with 5 preclear runs to check whether it needs to be thrown away (because this takes a long time, you should use a different replacement disk to fix the array)
