Smart health failure = imminent disk failure?

Nodiaque · June 17, 2022

Hello everyone,

I just received a SMART health check error on one of my disks.

Does it mean the disk is about to fail and I must replace it ASAP?

Thank you

JonathanM · June 17, 2022

Not all SMART errors are equal, but that particular one looks pretty bad. You could run a SMART extended test to get more info, but I would plan on replacing it.
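For anyone doing this from a shell rather than the GUI, here is a minimal sketch using smartctl from smartmontools. The device name /dev/sdX is a placeholder, and the function names and the SMARTCTL override variable are made up for illustration; only the `-t long`, `-l selftest` and `-H` options are real smartctl flags.

```shell
# Sketch: start and review a SMART extended (long) self-test with smartctl.
# Assumptions: smartmontools is installed; /dev/sdX is a placeholder device.
# SMARTCTL can be overridden for a dry run, e.g. SMARTCTL=echo (hypothetical knob).
SMARTCTL="${SMARTCTL:-smartctl}"

start_extended_test() {
    "$SMARTCTL" -t long "$1"       # kick off the extended self-test
}

show_selftest_log() {
    "$SMARTCTL" -l selftest "$1"   # list finished and aborted self-tests
}

show_health() {
    "$SMARTCTL" -H "$1"            # overall health self-assessment
}

# Usage (replace sdX with the real device):
#   start_extended_test /dev/sdX
#   show_selftest_log /dev/sdX
#   show_health /dev/sdX
```

The test runs inside the drive itself, so the first command returns right away and the result only shows up in the self-test log once the drive is done.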

Nodiaque · June 17, 2022

Ok. I've already started the extended test, waiting for the result.

Thanks for the input

Nodiaque · June 18, 2022

Do you know how long it can take for an 8 TB HDD? It's been running for the past hour and it's still at 10%.

trurl · June 18, 2022

2 hours per TB

Nodiaque · June 18, 2022

HOLY! ok, I'll check in 2 days... Thanks!

trurl · June 18, 2022

With updates at each 10%
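As a rough sketch of that arithmetic, assuming the 2 hours per TB rule of thumb (an estimate, not a drive spec), plus a way to read the progress from a shell. The function names are made up for illustration; smartctl itself reports the percentage of the test remaining, not the percentage completed.

```shell
# Rule of thumb from this thread: about 2 hours per TB (an estimate only).
extended_test_hours() {
    echo $(( $1 * 2 ))
}

# Pull the number out of "NN% of test remaining" in `smartctl -c` output (stdin).
selftest_remaining() {
    grep -o '[0-9][0-9]*% of test remaining' | head -n 1 | cut -d'%' -f1
}

# Usage:
#   extended_test_hours 8                       # estimate for an 8 TB drive
#   smartctl -c /dev/sdX | selftest_remaining   # /dev/sdX is a placeholder
```

By that rule an 8 TB drive comes out at around 16 hours if the drive is healthy and otherwise idle; a failing drive can take much longer or never finish.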

trurl · June 18, 2022

If that disk is in your array, you should attach diagnostics to your NEXT post in this thread. I would replace it and then decide whether it is good enough to use for something else.

Nodiaque · June 18, 2022

I'm ordering two new 16 TB drives. The two 8 TB drives in the array are Seagate Archive drives, at least 4 years old, that were running 24/7 before they were put in this server. They are slow drives that are supposed to be used as cold storage. I'll probably mirror them and keep them as a cold backup, synced once a month.

It may explain the weird things happening with my Unraid server that nobody has ever found the cause of, although those should come from RAM since they're in the OS, but I don't know.

I just need to find an external SATA enclosure now so they can be better cooled.

trurl · June 18, 2022

11 hours ago, Nodiaque said:

It may explain the weird things happening with my Unraid server that nobody has ever found the cause of, although those should come from RAM since they're in the OS, but I don't know.

Don't know what you are referring to, but if you suspect a RAM problem you shouldn't even be running your server until you verify RAM is OK. Everything goes through RAM, the OS and any application code, your data, everything. The CPU can't do anything with anything until it is loaded into RAM.

Have you done memtest?

Nodiaque · June 18, 2022

I have done memtest. You can check my post history: I have 3 other threads with investigations that led to nothing. RAM was tested last week with the latest version of memtest86+, no errors. The GPU was also swapped.

The problems I have (yes, a reboot fixes them, but that's not ideal when you're away):

- ini file in /usr/local/emhttp/state disappears out of nowhere, which breaks the webgui

- docker tab not working, dashboard loading without docker info. I had to force a shutdown with the power button because I couldn't reboot even from PuTTY

- Unraid stops working: webgui not responsive, no docker containers working, cannot reboot from the shell (have to force a shutdown with the power button)

The SMART test is still only at 10% after more than 12 hours... I guess it will fail.

Edited June 18, 2022 by Nodiaque

Nodiaque · June 18, 2022

Here are the requested diagnostics:

servraid-diagnostics-20220618-0855.zip

trurl · June 18, 2022

Flash drive problems can lead to UI problems. Another thing that can happen is filling rootfs somehow. rootfs is the RAM the OS is in. If you fill rootfs the OS has no space to work with its own files and all sorts of odd things can happen. A common reason for filling rootfs is a docker mapping to some host path that isn't actual storage.

I didn't notice either of those in your diagnostics.

Do the problems begin soon after booting, or does it run OK for a while?

You can see how much of rootfs is used in the df output. This is in diagnostics, and you can get the same result with this command:

df -h
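If you want to watch just the root filesystem instead of reading the whole table, something like this sketch could work. It assumes the OS root is mounted at / and uses the POSIX `df -P` output format; the function names and the 90% threshold are arbitrary choices for illustration.

```shell
# Print the Use% (without the % sign) of the filesystem mounted at /,
# reading `df -P` output on stdin.
root_usage_pct() {
    awk '$NF == "/" { sub(/%/, "", $(NF-1)); print $(NF-1) }'
}

# Report whether rootfs is getting full (90 is an arbitrary threshold).
check_rootfs() {
    pct=$(df -P / | root_usage_pct)
    if [ "${pct:-0}" -ge 90 ]; then
        echo "rootfs is ${pct}% full"
    else
        echo "rootfs usage OK (${pct}%)"
    fi
}

# Usage:
#   check_rootfs
```

Running it on a schedule and logging the output would show whether usage creeps up before the problems start, which is what a bad docker host path mapping would look like.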

Nodiaque · June 18, 2022

It happens randomly. rootfs was at the same % the last time I checked as it is right now, so not full. The first time, it took about 3 months; then it happened again within 24 hours. It then ran fine for a while before another problem appeared, and about a month later I got the latest bug, the one with the ini file. We thought it might be the backup plugin, since the problem seemed to start when the backup ended, but it recurred 12 hours after the last crash, which was nowhere near the backup schedule. Because of all that, I haven't tried upgrading to 6.10.x; I want to be sure my base is stable first.

I was just wondering, how can the flash drive be a problem if it's not used once the server has booted (since everything is in RAM)?

trurl · June 18, 2022

Did you follow this suggestion from one of those other threads?

On 4/28/2022 at 7:33 AM, JorgeB said:

see if it works without the Nvidia GPU, or without loading the Nvidia driver.

Nodiaque · June 18, 2022

I cannot remove it entirely: there's no onboard GPU and the board won't boot without one. But I did run it without the GPU driver and with the GPU not passed to any docker container, and had the same problem. I switched cards recently to see if that helps.

Nodiaque · June 18, 2022

Is that the remaining time for the smart scan?

Nodiaque · June 19, 2022

The SMART test finished and it's not pretty. New HDD arriving tomorrow.

servraid-diagnostics-20220619-0847.zip

Edited June 19, 2022 by Nodiaque
