How bad is my drive?

May 28, 2018

I saw this on my unRAID server today.

Here are my diagnostics.

tower-diagnostics-20180528-2227.zip

Please advise.

Thanks.


May 28, 2018

Community Expert

Run an extended SMART test to confirm, but it looks like it's failing.

You should also improve your cooling: disks should stay below 40C, 45C at most.


May 31, 2018

Author

SMART extended self test result : Completed without error.

So I guess I still have a few months left on the drive.


May 31, 2018

7 minutes ago, publicENEMY said:

So I guess I still have a few months left on the drive.

Years, months, weeks, days, hours.... how lucky do you feel?

All drives fail eventually, the trick is predicting when.

Keep in mind if you have another drive fail suddenly without warning, you will be relying on this drive to rebuild it. Do you feel confident enough in this drive to trust the rest of your array data to it?


May 31, 2018

When you see a low count of pending or uncorrectable sectors, it's more or less impossible to know whether the sectors have problems because wear has made the drive bad at reading the data, or because there was some disturbance (vibration, power surge or similar) when the drive wrote the data.

All you know is that the drive can read these four sectors, but with too many bit errors for it to recompute the correct content and rewrite the sector. That's why the sectors are offline uncorrectable. The only way the drive might be able to flag the sectors as good again is if you happen to overwrite them by updating the disk contents that are stored on these four sectors. If the surface itself is good, then the rewrite can produce perfectly readable data. Alas, unRAID doesn't have any logic to do a full surface read to try to locate these four sectors and then use the parity and the other drives to try to rewrite these sectors.

Anyway - in the end, it's a gamble to try to guess exactly what was the reason the drive got four offline uncorrectable sectors.

If a standalone drive got this issue, I might make sure I have a fresh backup and then continue to use the drive while keeping track of the counters to see if they climb. If they climb, then you really have to assume there is an issue. Or possibly I would change the disk to a backup/archival disk and replace with a new disk for online use.
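Keeping track of the counters can be as simple as saving the SMART attribute table now and then and comparing the raw values. The sketch below assumes the text has the column layout printed by `smartctl -A` (ID, name, flag, value, worst, thresh, type, updated, when failed, raw value). The two sample snapshots are made up for illustration and are not taken from the diagnostics in this thread. The choice of watched attribute IDs is also an assumption.

```python
# Sketch: compare two snapshots of SMART raw values and report counters that climbed.
# The sample tables mimic `smartctl -A` output; the numbers are invented.

# Assumed set of interest: reallocated, reported uncorrectable,
# current pending, offline uncorrectable.
WATCHED = {5, 187, 197, 198}

def parse_attributes(text):
    """Return {attribute_id: (name, raw_value)} from smartctl -A style table text."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) >= 10 and parts[0].isdigit() and parts[9].isdigit():
            attrs[int(parts[0])] = (parts[1], int(parts[9]))
    return attrs

def climbing(before, after, watched=WATCHED):
    """Return {name: (old, new)} for watched attributes whose raw value increased."""
    old, new = parse_attributes(before), parse_attributes(after)
    result = {}
    for attr_id in watched:
        if attr_id in old and attr_id in new and new[attr_id][1] > old[attr_id][1]:
            result[new[attr_id][0]] = (old[attr_id][1], new[attr_id][1])
    return result

MORNING = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       4
"""

EVENING = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       6
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       4
"""

if __name__ == "__main__":
    print(climbing(MORNING, EVENING))
```

A stable count says little either way. A count that keeps climbing between snapshots is the signal to assume the drive really has an issue.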

But in your case, it's a disk in a protected array with only a single parity. That means that this single disk also affects the ability to repair any other disk in case of a second disk issue. Do you really want to continue to run your array in a partially imperfect state? Remember that the four sectors can't be read. So a rebuild of a different disk would result in four sectors with wrong content on the rebuilt disk. And if this drive is marginal, then it could add 10 or 100 or 1000 new uncorrectable sectors during that rebuild. So definitely think twice about keeping the drive in the array.
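Why an unreadable sector on one disk ends up as wrong content on a different, rebuilt disk can be shown with a toy XOR parity model. This is only a sketch of the single-parity idea: the disk contents are invented 4-byte "sectors", and the unreadable sector is modelled as zeros, which is an assumption and not a description of how unRAID actually handles a read error during a rebuild.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data disks, each holding one toy 4-byte sector.
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x01\x02\x03\x04"
disk3 = b"\xaa\xbb\xcc\xdd"

# Single parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# disk3 dies. With healthy survivors, the rebuild recovers it exactly.
rebuilt_ok = xor_blocks([parity, disk1, disk2])

# Same rebuild, but disk2 cannot deliver its sector (modelled here as zeros).
disk2_unreadable = b"\x00\x00\x00\x00"
rebuilt_bad = xor_blocks([parity, disk1, disk2_unreadable])

if __name__ == "__main__":
    print("clean rebuild matches:", rebuilt_ok == disk3)
    print("rebuild with bad sector matches:", rebuilt_bad == disk3)
```

The rebuilt sector is only as good as every surviving sector that feeds into it, so each unreadable sector on the marginal disk becomes a corrupt sector on the disk being rebuilt.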


May 31, 2018

Community Expert

You might actually find that your problem with this disk might be 'fixed' by getting the temperatures DOWN!!! You need to address the air flow through the case or relocate the server to a spot where the ambient temperatures are lower.

As I recall, a rebuild of this disk at more reasonable temperatures could fix those errors. OR you could just replace it and then run a couple of preclear cycles on it to see what its health condition is.


June 1, 2018

Mine started failing (?) 2-3 days ago, but the email settings are out of date and I didn't get notified in the beginning.

I see that the value is increasing

Quote

187 Reported uncorrect

Below 30 in the morning and now 34

I have attached smart report.

It's my (only) cache drive and I already backed up everything important (appdata, vdisks for VMs...)

I will shut it down now, replace cables and try another extended test.

Any ideas?


June 1, 2018

1 hour ago, karateo said:

Mine started failing (?) 2-3 days ago, but the email settings are out of date and I didn't get notified in the beginning.

The goal is to constantly supervise your email functionality by having unRAID send you a mail of the type "Notice [N54L-3] - array health report [PASS]" every night.

If the mail stops coming, you know something is wrong.

Or if the mails stop saying "[PASS]".

Any supervision worth having must avoid a happy-path configuration where everything is silent until something goes wrong, because then you have no way of noticing that the supervision has stopped working.
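The same idea can be written down as a small check on the receiving side: treat silence as an alarm, not as good news. The sketch below is an illustration only. The function name `supervision_status`, the 36-hour grace period and the "[FAIL]" subject are assumptions, and how the subject and timestamp of the latest report are fetched from a mailbox is left out.

```python
from datetime import datetime, timedelta

def supervision_status(last_subject, last_received, now, max_age=timedelta(hours=36)):
    """Classify the nightly health report: 'STALE', 'FAIL' or 'OK'.

    last_subject  -- subject line of the most recent report, or None
    last_received -- datetime it arrived, or None if no report was ever seen
    max_age       -- assumed grace period for a report that is sent every night
    """
    # No mail, or the last one is too old: the supervision itself has stopped working.
    if last_received is None or now - last_received > max_age:
        return "STALE"
    # Mail arrived but no longer says [PASS].
    if last_subject is None or "[PASS]" not in last_subject:
        return "FAIL"
    return "OK"

if __name__ == "__main__":
    now = datetime(2018, 6, 1, 12, 0)
    subject = "Notice [N54L-3] - array health report [PASS]"
    print(supervision_status(subject, datetime(2018, 6, 1, 0, 20), now))   # fresh report
    print(supervision_status(subject, datetime(2018, 5, 29, 0, 20), now))  # mails stopped
```

The point of the "STALE" branch is that a broken email setup gets noticed within a day or two instead of only when a drive is already failing.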
