Nodiaque Posted June 17, 2022
Hello everyone, I just received a SMART health check error on one of my disks. Does it mean the disk is facing imminent failure and I must replace it ASAP? Thank you
JonathanM Posted June 17, 2022
Not all SMART errors are equal, but that particular one looks pretty bad. You could run a SMART extended test to get more info, but I would plan on replacing it.
Nodiaque Posted June 17, 2022
OK. I've already started the extended test and am waiting for the result. Thanks for the input.
Nodiaque Posted June 18, 2022
Do you know how long it can take for an 8TB HDD? It's been running for the past hour and it's still at 10%.
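One way to answer the duration question is to ask the drive itself. The following is a minimal sketch, assuming smartmontools' `smartctl` is available on the server; `/dev/sdX` is a placeholder device name and `smart_progress` is a hypothetical helper, neither comes from the thread. It only reads status and does not start or stop a test.

```shell
#!/bin/sh
# Report the drive's own duration estimate and the progress of a running
# extended self-test. Read-only: this does not start or abort a test.
# /dev/sdX is a placeholder; substitute the real device node.
smart_progress() {
    dev=$1
    if ! command -v smartctl >/dev/null 2>&1 || [ ! -b "$dev" ]; then
        echo "smartctl or $dev not available; nothing queried"
        return 0
    fi
    # Capabilities section: the drive's recommended polling time, in minutes,
    # for the extended self-test (printed on the line after the match).
    smartctl -c "$dev" | grep -A1 'Extended self-test routine' || true
    # Current status, including the percentage of the test remaining
    # while a self-test is in progress.
    smartctl -c "$dev" | grep -A1 'Self-test execution status' || true
    # Log of completed self-tests and their results.
    smartctl -l selftest "$dev" || true
}

smart_progress "${DEV:-/dev/sdX}"
```

The polling time is the drive's estimate for an idle disk; a disk that is busy in the array, or one that is retrying bad sectors, can take considerably longer.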
Nodiaque Posted June 18, 2022
HOLY! OK, I'll check in 2 days... Thanks!
trurl Posted June 18, 2022
If that disk is in your array, you should attach diagnostics to your NEXT post in this thread. I would replace it and then decide whether it is good enough to use for something else.
Nodiaque Posted June 18, 2022
I'm ordering 2 new 16TB drives. I have 2x 8TB in the array that are at least 4 years old: Seagate Archive drives that were on 24/7 before they were put in that server. These are slow drives that are supposed to be used as cold storage. I'll probably put them in a mirror and keep them as a cold backup, synced once a month. It may explain the weird things happening with my Unraid server that nobody ever found the cause of, although it should be from RAM since the OS runs from it, but I don't know. I just need to find an external SATA enclosure now so they can be better cooled.
trurl Posted June 18, 2022
11 hours ago, Nodiaque said: "It may explain the weird things happening with my Unraid server that nobody ever found the cause of, although it should be from RAM since the OS runs from it, but I don't know."
I don't know what you are referring to, but if you suspect a RAM problem you shouldn't even be running your server until you verify the RAM is OK. Everything goes through RAM: the OS and any application code, your data, everything. The CPU can't do anything with anything until it is loaded into RAM. Have you done memtest?
Nodiaque Posted June 18, 2022
I have done memtest. You can check my post history: I have 3 other threads with investigations that led to nothing. RAM was tested with the latest version of memtest86+ as of last week, no errors. The GPU was also swapped.
The problems I have (yes, a reboot fixes them, but when you're away that's not the best thing):
- ini files in /usr/local/emhttp/state disappear out of nowhere, which breaks the webgui
- Docker tab not working, dashboard loading without Docker info. I had to force a shutdown with the power button because I couldn't reboot even from PuTTY
- Unraid stops working: webgui not responsive, all Docker containers not working, cannot reboot from the shell (have to force a shutdown with the power button)
The SMART test is still only at 10% after more than 12 hours... I guess it will fail.
Nodiaque Posted June 18, 2022
Here are the requested diagnostics: servraid-diagnostics-20220618-0855.zip
trurl Posted June 18, 2022
Flash drive problems can lead to UI problems. Another thing that can happen is filling rootfs somehow. rootfs is the RAM the OS is in. If you fill rootfs, the OS has no space to work with its own files and all sorts of odd things can happen. A common reason for filling rootfs is a docker mapping to some host path that isn't actual storage. I didn't notice either of those in your diagnostics. Do the problems begin soon after booting, or does it run OK for a while? You can see how much of rootfs is used in the df output. This is in the diagnostics, and you can get the same results with this command line: df -h
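The `df` check above can be turned into a quick threshold test. This is a minimal sketch, assuming rootfs is the filesystem mounted at `/` on the server; `fs_used_pct` and `check_fs` are hypothetical helper names and the 90% limit is an arbitrary example value, none of which come from the thread.

```shell
#!/bin/sh
# Print the used percentage of the filesystem holding a path, as a bare integer.
# df -P forces the POSIX one-line-per-filesystem format, so field 5 is Capacity.
fs_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print a WARNING line when usage is at or above the limit, otherwise an OK line.
check_fs() {
    path=$1
    limit=${2:-90}
    used=$(fs_used_pct "$path")
    # Some pseudo-filesystems report "-" instead of a number.
    case $used in
        ''|*[!0-9]*) echo "UNKNOWN: could not read usage for $path"; return 0 ;;
    esac
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full"
    else
        echo "OK: $path is ${used}% full"
    fi
}

check_fs / 90
```

Running it now and again after one of the failures would show whether rootfs usage has crept up between the two points.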
Nodiaque Posted June 18, 2022
It happens randomly. When I last checked, rootfs was at the same percentage as right now, so it is not full. The first time, it took about 3 months; then it happened again within 24 hours. It then ran fine for a while before another problem appeared, and about a month later I got the latest bug, the one with the ini files. We thought it might be the backup plugin, since the problem seemed to start when the backup ended, but it recurred 12 hours after the last crash, which was far from the backup schedule. Because of that, I haven't tried to upgrade to 6.10.x; I want to be sure my base is stable first. I was just wondering, how can the flash drive be a problem if it's not used once the server has booted (since everything is in RAM)?
trurl Posted June 18, 2022
Did you follow this suggestion from one of those other threads?
On 4/28/2022 at 7:33 AM, JorgeB said: "see if it works without the Nvidia GPU, or without loading the Nvidia driver."
Nodiaque Posted June 18, 2022
I cannot remove it entirely: there's no onboard GPU and the board won't boot without one. But I did run it without the GPU driver and with the GPU not loaded in any Docker container, and had the same problem. I switched cards recently to see if it helps.
Nodiaque Posted June 18, 2022
Is that the remaining time for the SMART scan?
Nodiaque Posted June 19, 2022
The SMART test finished and it's not pretty. New HDD arriving tomorrow. servraid-diagnostics-20220619-0847.zip