mattcoughlin Posted September 16, 2023
So my Unraid system keeps locking up. Under normal use it works for a while, then all the drives, and even the device list, go blank and the system won't shut down. The next time it boots, it runs a parity check, which also locks up, and the point in the check where it locks up keeps getting earlier. I have upgraded or replaced ALL the hardware in my system, including moving it to a new chassis, a new HBA, a new USB drive and new NVMe cache drives, and the problem persists. I have even reset my Unraid OS back to a stock install, keeping only the array drives and parity, and I still can't get it to stop. I can't complete a long health test on the drives before it locks up.
I'm at a loss at this point. The only thing I can think to do is transfer the data from the drives to new drives a few at a time, run a health check, and stress test the old drives to see if something is wrong with them. But that is a F-ton of work. I would really appreciate some help.
20230915_195433.mp4
unraid-diagnostics-20230915-2015.zip
JorgeB Posted September 16, 2023
There's a segfault in the log. To rule out any plugin issues, reboot in safe mode; if it's the same, post new diags.
mattcoughlin Posted September 16, 2023
unraid-diagnostics-20230916-0711.zip
mattcoughlin Posted September 16, 2023
It still locked up in safe mode. Here is the diagnostics file from after the array locked up.
unraid-diagnostics-20230916-0942.zip
JorgeB Posted September 17, 2023
The Unraid driver is crashing, which suggests a hardware problem. Start by using just two RAM sticks; if it's the same, try the other two. That would basically rule out a RAM issue.
mattcoughlin Posted September 17, 2023
I tried that. I tried one RAM stick, each of the other three sticks individually, as well as pairs of them. The problem is that I have replaced all the hardware since this started happening: all new Intel 13th gen. I've even tried going back to the old hardware (6800K) and still no luck. I replaced the HBA and the cables, moved everything to a new, larger rack-mount enclosure, and put in a new power supply, new NVMe cache drives and a new Unraid USB stick. I also removed the Nvidia GPU and the SFP+ network card from the system. The only things I haven't tried replacing are the hard drives and Unraid itself, although I did reinstall Unraid fresh.
As of yesterday I took the parity drives out (didn't format them, just removed them), and the system has been running the longest so far (19 hours) without crashing. The problem is that I can't make any changes or my parity will be invalid. I'm actually not even sure it is still valid, since the system could have changed something without my realizing it. All my critical data is backed up to other locations, so it's just movies, music and TV shows here that I could lose. I'm currently trying to transfer the bulk of the data off the server onto new drives so I can try more aggressive tactics without having to worry about losing 100+ TB of data.
JorgeB Posted September 18, 2023
The errors I'm seeing still suggest a hardware issue to me, though of course I can't be certain.
mattcoughlin Posted September 19, 2023
I had an issue with the motherboard and had to RMA it, but that seems to have solved the problem. The RAM came back fine in Memtest, but I'm trying one stick at a time at the moment. The only other hardware left would be the CPU. I guess that will be my next thing to check if the RAM doesn't solve the problem.