March 4, 2023

I've been struggling with recurring kernel panics on unRAID a few times already, mostly seeming to be connected to defective hardware. The last time, I was able to trace it to the RAM, and after that I had no issues for about two months. Recently, unRAID has again started to lock up randomly after a few hours of uptime (I never seem to get a full 24 hours before it happens).

The one thing I noticed was that it started happening after an update of the Nvidia-Drivers plugin. At first, lock-ups would occur after a few hours of use every time I utilized the GPU, either directly or via a Docker container. I rolled back to different driver versions with no effect. Even when I intentionally did not use the GPU at all, the lock-ups happened. So I'm no longer sure whether they are tied to the GPU or whether there is a different problem.

I'd much appreciate a nod in the right direction. Thanks in advance!

nrc-diagnostics-20230304-1054.zip
March 4, 2023, Community Expert

Start here: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173

If that doesn't help, or was already done, enable the syslog server and post the resulting log after a crash.
March 20, 2023, Author, Solution

Over the last week the lock-ups occurred more frequently, so in addition to the things mentioned above I did a lot of testing, even swapping the CPU. After a lot of digging I found that either of two things seems to coincide with the lock-ups. So for anyone reading this, maybe this helps you too:

SSH-Connections using the SSH-Plugin

For a while I have been using this SSH-Plugin to have more granular control. In the logs I found messages regarding SSH connections occurring shortly before a crash/lock-up. So I deactivated and eventually removed the plugin and moved to this docker-solution, which has been working flawlessly and is even a bit better for what I need.

PCIe-Confusions

I'm using a GTX 1050 Ti and an LSI Host Bus Adapter as PCI devices, with the lanes split as 2x8. After the lock-ups the display sometimes showed some cryptic error messages containing PCI-specific values. So I swapped the cards around and allocated a full 16 PCIe lanes to the graphics card, with the HBA using the remaining 4 lanes. This seems to have done the trick.

I have since gotten a few days without any hiccups, so I hope this has solved the problem for good.
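
For anyone who wants to check how many PCIe lanes each card actually negotiated after moving cards between slots, here is a minimal sketch in Python. It is not what was used in this thread; it assumes the usual `lspci -vv` text layout, where each device block contains a `LnkCap:` line (what the card supports) and a `LnkSta:` line (what was negotiated), each with a `Width xN` field. The sample text and the function names `parse_link_widths` and `narrower_than_capable` are made up for illustration.

```python
import re

# Device header line of `lspci -vv`, e.g. "01:00.0 VGA compatible controller: ..."
# (an optional "0000:" domain prefix is accepted as well).
DEVICE_RE = re.compile(
    r"^((?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s+(.*)$"
)
# Width field inside the LnkCap / LnkSta lines, e.g. "Width x16".
WIDTH_RE = re.compile(r"Width x(\d+)")

# Invented sample, shaped like `lspci -vv` output (NOT taken from the diagnostics).
SAMPLE = """\
01:00.0 VGA compatible controller: Example GPU
    LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1
    LnkSta: Speed 8GT/s (ok), Width x16 (ok)
02:00.0 Serial Attached SCSI controller: Example HBA
    LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s
    LnkSta: Speed 8GT/s (ok), Width x4 (downgraded)
"""


def parse_link_widths(lspci_text):
    """Map each PCI slot to its name, capable width and negotiated width."""
    devices = {}
    current = None
    for line in lspci_text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            current = header.group(1)
            devices[current] = {"name": header.group(2), "cap": None, "sta": None}
            continue
        if current is None:
            continue
        stripped = line.strip()
        width = WIDTH_RE.search(stripped)
        if width is None:
            continue
        # "LnkCap2:" / "LnkSta2:" lines do not match these prefixes and are skipped.
        if stripped.startswith("LnkCap:"):
            devices[current]["cap"] = int(width.group(1))
        elif stripped.startswith("LnkSta:"):
            devices[current]["sta"] = int(width.group(1))
    return devices


def narrower_than_capable(devices):
    """Return the slots whose negotiated width is below what the card supports."""
    return [
        slot
        for slot, info in devices.items()
        if info["cap"] is not None
        and info["sta"] is not None
        and info["sta"] < info["cap"]
    ]
```

To use it on a real system, save the output of `lspci -vv` (run as root so the capability blocks are shown) to a file and pass its contents to `parse_link_widths`. A card running at a narrower width than it supports is not necessarily a fault, but it shows which slot is getting which lanes.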