How to diagnose crash

jeffreywhunter · July 22, 2017

I've got a 6.3.5 server that runs fine for a couple days, then stops communicating. No login from web or console. Only recourse is to do a hard reset, which is not a happy situation. I'm looking for advice on how to diagnose this.

I check the log regularly and don't see anything that looks ominous, except that I am having a problem with user inotify watches being exceeded (even though the limit is set to 720,000). See that discussion here: https://forums.lime-technology.com/topic/58894-still-not-enough-inotify-watches/

I've disabled dockers, with no change. Is there a way to dump the log when the system becomes unresponsive? Or is there a way to have the log written to a file so we can get an inkling of what is going on? Maybe we'd see something in the last bit of the log that would point to the cause...

Thanks in advance...

Squid · July 22, 2017

In no particular order,

Check for BIOS updates
Run memtest
Disable any VMs that utilize hardware passthrough
Power Supply Issues
Mains voltage dipping with the UPS not kicking in, or no UPS present (i.e., do the lights in your house dim when your fridge kicks in?). Ideally servers should be on a separate isolated electric line, and if that is not feasible, then definitely not on the same line as anything that has a motor in it
CPU overheating
Hook up a monitor to the server and see what's on its display at the time of non-responsiveness

To save the log up to the point of the crash: either run Fix Common Problems in troubleshooting mode (it will also log a ton of other info that may or may not be of interest), or run the following from a command line (ideally via screen or the actual local monitor / keyboard):

tail -f /var/log/syslog > /boot/syslog.txt
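
For anyone who wants the capture to keep running after the terminal session closes, the same idea can be wrapped in a small function. This is only a rough sketch, not something from the posts above: the function name `start_capture` and the variable `CAPTURE_PID` are made up for illustration, and the default paths are assumed from the command above.

```shell
#!/bin/bash
# Sketch only: copy the live syslog to a file that survives a hard reset.
# start_capture and CAPTURE_PID are hypothetical names chosen for this example.
# Default paths are assumptions taken from the command in the post above.

start_capture() {
    local src="${1:-/var/log/syslog}"   # log file to follow
    local dest="${2:-/boot/syslog.txt}" # copy kept on the flash drive

    # -n +1 copies everything already in the log first,
    # -F keeps following even if the log file is rotated or recreated.
    # nohup lets the copy keep running after the shell that started it exits.
    nohup tail -n +1 -F "$src" >> "$dest" 2>/dev/null &

    # Remember the background PID so the capture can be stopped later.
    CAPTURE_PID=$!
}

# Example (uncomment to use with the default paths):
# start_capture
# ...after the next crash and reboot, read /boot/syslog.txt...
# kill "$CAPTURE_PID"   # stop the capture when it is no longer needed
```

Because this appends to the flash drive for as long as it runs, it is probably best to stop it once a crash has been captured. After the hard reset, the last lines of the copied file are the ones to look at.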

JonathanM · July 23, 2017

13 hours ago, Squid said:

definitely not on the same line as anything that has a motor in it

or laser printer.
