Cody Peters Posted December 24, 2023 (edited)

My server is struggling. It died overnight after I saw a bunch of high-latency events on my network. I have tested the USB stick, and it seems fine. I initially thought it was my 10Gb Mellanox NIC, so I pulled it, but I am still having issues getting the system online. I can't get a web interface at this point. I can access the terminal from a directly attached monitor and keyboard, but I usually only get a few minutes before the call trace starts and locks me out.

Update 1: Safe mode appears to be working OK. NIC reinstalled. Everything is running. So that means it's likely a plugin? Is there any way to re-enable them one at a time? I've seen some mentions of nvidia in the errors. I suspect it's an issue with the nvidia driver/plugin, or my video card itself.

Update 2: Spoke too soon, still crashing while in safe mode. Picture attached. The console stops working when this happens.

Update 3: Decided to run memtest86. Failed. Removed 2 of 4 sticks, passed. Removed the other 2 sticks, put the original 2 back in, passed. Put all 4 back in, passed... Ummm. This was a head scratcher.

Attachment: server-diagnostics-20231224-1401.zip

Edited December 26, 2023 by Cody Peters: update
itimpi Posted December 24, 2023

You can rename any of the .plg files in the config folder on the flash drive to have a different extension and then reboot to stop a particular plugin from loading.
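The renaming step described above could be scripted roughly as follows. This is only a sketch under assumptions: the directory passed in (something like `/boot/config/plugins`) is a guess, since the thread only says "the config folder on the flash drive", and the function names, the example plugin name, and the `.disabled` suffix are hypothetical choices for illustration. Any extension other than `.plg` should have the same effect.

```python
from pathlib import Path

# Hypothetical suffix; the thread only says "a different extension".
DISABLED_SUFFIX = ".disabled"


def disable_plugin(plugin_dir, name):
    """Rename <name>.plg to <name>.plg.disabled and return the new path."""
    src = Path(plugin_dir) / f"{name}.plg"
    if not src.is_file():
        raise FileNotFoundError(f"no such plugin file: {src}")
    dst = src.with_name(src.name + DISABLED_SUFFIX)
    src.rename(dst)
    return dst


def enable_plugin(plugin_dir, name):
    """Reverse disable_plugin: rename <name>.plg.disabled back to <name>.plg."""
    src = Path(plugin_dir) / f"{name}.plg{DISABLED_SUFFIX}"
    if not src.is_file():
        raise FileNotFoundError(f"no disabled plugin file: {src}")
    dst = src.with_name(f"{name}.plg")
    src.rename(dst)
    return dst


def list_disabled(plugin_dir):
    """Return the names of plugins currently renamed out of the way."""
    pattern = f"*.plg{DISABLED_SUFFIX}"
    return sorted(p.name[: -len(".plg" + DISABLED_SUFFIX)]
                  for p in Path(plugin_dir).glob(pattern))
```

To test plugins one at a time, one approach would be to disable all of them, then call `enable_plugin` for a single name, reboot, and watch for a crash before enabling the next. A reboot is still needed after each rename, as the post above notes.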
Cody Peters (Author) Posted December 25, 2023

17 minutes ago, itimpi said: "You can rename any of the .plg files in the config folder on the flash drive to have a different extension and then reboot to stop a particular plugin from loading."

Awesome, thanks for this information. However, I left it running in safe mode for a bit and it started crashing again. I updated my post.
JorgeB Posted December 25, 2023

Enable the syslog server and post that after a crash.
Cody Peters (Author) Posted December 26, 2023

23 hours ago, JorgeB said: "Enable the syslog server and post that after a crash."

Thanks for the info, but it turned out to be a random, but not actual, RAM failure. It's running again and doing a parity check. Seems okay.