NichollsGlen Posted July 30, 2023 Seemingly every 15 days, my unRAID server has a hiccup in the middle of the night and the GUI and terminal become inaccessible. When this happens, I have to do a hard reboot, and I get no information about what exactly the problem is. I mirrored my syslog to flash, and it shows a kernel panic from a couple of days ago (I'm on vacation, so I didn't notice until today). However, I can't determine what the problem actually is, so hopefully someone here understands the logs better than I do. I have attached the logs from July 28th, when this last happened. It looks like there's an issue with nvidia-smi, but that's about as far as I've gotten. Let me know if any other information is needed. Attachment: syslog.txt
JorgeB Posted July 31, 2023 There are Nvidia call traces in the log. Can you test without the GPU, or without the Nvidia driver installed?
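For anyone else trying to spot entries like these in a mirrored syslog, here is a minimal sketch of a filter. The search terms ("kernel panic", "call trace", "nvidia") are my own assumptions about what is worth flagging, not strings confirmed from the attached log, and `find_suspect_lines` and `print_suspect_lines` are hypothetical helper names.

```python
import re
from typing import Iterable, List, Tuple

# Assumed markers of interest; adjust them to whatever your own log contains.
PATTERN = re.compile(r"kernel panic|call trace|nvidia", re.IGNORECASE)


def find_suspect_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that matches PATTERN."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def print_suspect_lines(path: str) -> None:
    """Print matching lines from the syslog file at `path`."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, errors="replace") as handle:
        for number, text in find_suspect_lines(handle):
            print(f"{number}: {text}")
```

Calling `print_suspect_lines("syslog.txt")` on a copy of the mirrored log lists the matching lines with their line numbers, which makes it easier to see what was logged just before the crash.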
NichollsGlen Posted July 31, 2023 (Edited July 31, 2023 by NichollsGlen) Feel free to close this. I found some similar posts on Nvidia's support site, so I'm going to pursue support there, as this doesn't seem to be an issue with unRAID.
Solution NichollsGlen Posted September 1, 2023 I removed the CA plugin "Prometheus nvidia-smi Exporter" and it appears to have solved the issue, as my server has now been up for 18 days. I'm not certain this plugin is the culprit, but the server has stayed up longer than it ever has since I installed that plugin. I'll post an update if I see the issues described above again.