eroc1990 Posted July 8, 2021 (edited)

Hey all, I've been running into a somewhat odd issue lately. I'm running Unraid 6.9.2 mainly as a Docker container host, but I also use it for some file shares on my LAN. I have a QNAP TS-251+ acting as a storage backend for most of the larger data within my Docker containers, mounted as an SMB share through Unassigned Devices. Both machines are connected to my network via gigabit connections.

Without fail, at least once or twice a day, my server completely powers itself off without my instruction. There are no processes I'm running that would automate a shutdown of the server. I've attached a snippet of my syslog. I'm seeing a number of kernel trap messages formatted similarly to the following:

kernel: traps: lsof[32635] general protection fault ip:14f154227a9e sp:c1118279394e5048 error:0 in libc-2.30.so[14f154208000+16b000]

I recently swapped a 2GB RAM stick for an 8GB one to give myself 16GB total, so I initially thought that might have been the issue, but no matter what combination of RAM I try, the server never stays stable for longer than a day. Has anyone seen this issue before and can point me in the right troubleshooting direction? Thanks!!

Attachments: new 1.txt, ericserverpc-diagnostics-20210708-1418.zip

Edited July 8, 2021 by eroc1990: Adding diagnostics
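For anyone sifting through a long syslog for lines like the trap message quoted in the first post, here is a minimal Python sketch that pulls the fields out of such a line. The regular expression is only inferred from that single sample line, and the `parse_trap` helper and its field names are hypothetical, so treat it as a starting point rather than a complete parser. The computed offset is simply the `ip` value minus the base address shown in the brackets after the library name.

```python
import re

# Pattern inferred from the one sample "traps:" line in the post above;
# other kernel versions or fault types may format the line differently.
TRAP_RE = re.compile(
    r"traps: (?P<proc>[^\[\s]+)\[(?P<pid>\d+)\] (?P<kind>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+)"
    r"(?: in (?P<module>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_trap(line):
    """Return a dict of fields from a kernel trap line, or None if no match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    if info["base"] is not None:
        # Offset of the faulting instruction inside the mapped module.
        info["offset"] = hex(int(info["ip"], 16) - int(info["base"], 16))
    return info


# The sample line from the original post.
sample = (
    "kernel: traps: lsof[32635] general protection fault "
    "ip:14f154227a9e sp:c1118279394e5048 error:0 "
    "in libc-2.30.so[14f154208000+16b000]"
)

if __name__ == "__main__":
    print(parse_trap(sample))
```

Running `parse_trap` over every line of a saved syslog and counting results per process name would show whether the faults are confined to `lsof` or spread across unrelated programs.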
eroc1990 (Author) Posted July 24, 2021 (edited)

Updating with a more recent diagnostic report: ericserverpc-diagnostics-20210723-2233.zip

I suspect the lsof errors less than I did before, because I no longer see them leading up to the crashes, but I still can't pin down the root cause.

Edited July 24, 2021 by eroc1990: Adding a bit more context
JorgeB Posted July 24, 2021

A server shutting down, instead of crashing or locking up, is almost always a hardware problem. Start by checking cooling and fans; it could also be a PSU or board issue.
eroc1990 (Author) Posted July 24, 2021

Temps are fine and all fans are working. I haven’t tried the PSU and am hoping I don’t have to, but we will see. I might have to give it a shot at this rate.
eroc1990 (Author) Posted July 27, 2021

Well, I tried changing out the PSU. I swapped the OEM 300W unit for a 450W EVGA PSU, and that didn't fix it: the server remained powered on for about an hour and then crashed. I'll have to wait a bit before I can swap out the mobo, but that might be necessary at this point.
wokemup Posted July 27, 2021

This might be a silly suggestion, but check whether you have any miscellaneous USB devices or other peripherals hooked up to the system. I recently had a tiny Bluetooth USB adapter in a system and it was conflicting with the motherboard's Bluetooth. I would have thought this was no big deal, but it kept randomly causing reboots until I took out the Bluetooth adapter. In general, when troubleshooting it's good to simplify the system as much as you can and see if the reboots keep happening. I hate these types of problems. I would have suggested the PSU as well, but I see you confirmed that was not the problem.
eroc1990 (Author) Posted July 28, 2021

The only physical connections to my server are network, power, and my boot USB. So unfortunately that’s not it.