
How to diagnose crash


jeffreywhunter


I've got a 6.3.5 server that runs fine for a couple of days, then stops communicating. No login from web or console. The only recourse is a hard reset, which is not a happy situation. I'm looking for advice on how to diagnose this.

 

I check the log all the time and don't see anything that looks ominous, except that I am having a problem with user inotify watches being exceeded (even though the limit is set to 720,000). See that discussion here: https://forums.lime-technology.com/topic/58894-still-not-enough-inotify-watches/

 

I've disabled dockers and there is no change. Is there a way to dump the log when the system becomes unresponsive? Or is there a way to have the log written to a file so we can get an inkling of what is going on? Maybe the last bit of the log would show something useful...

 

Thanks in advance...

Link to comment

In no particular order, 

 

  • Check for BIOS updates
  • Run memtest
  • Disable any VMs that utilize hardware passthrough
  • Power Supply Issues
  • Mains voltage dipping and UPS not kicking in, or no UPS present (for example, do the lights in your house dim when the fridge kicks in?). Ideally servers should be on a separate, isolated electric line; if that is not feasible, then definitely not on the same line as anything that has a motor in it
  • CPU overheating
  • Hook up a monitor to the server and see what's on its display at the time of non-responsiveness

To save the log up to the crash: either run Fix Common Problems in troubleshooting mode (which will also log a ton of other info that may or may not be of interest) or run this from a command line (ideally via screen or the actual local monitor / keyboard):

tail -f /var/log/syslog > /boot/syslog.txt
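
If screen isn't available, a rough alternative is to background the same kind of tail so it keeps running after the terminal closes. This is only a sketch: `mirror_log` is a made-up helper name, not an Unraid command, and it assumes GNU `tail` and `nohup` are present. It appends to the destination instead of overwriting it, so starting it again after a reboot should not wipe out what was captured before the crash.

```shell
#!/bin/bash
# Sketch only: mirror a growing log file to persistent storage in the background.
# mirror_log is a hypothetical helper name, not part of Unraid.
mirror_log() {
    local src="$1" dest="$2"
    # nohup keeps tail running after the terminal closes.
    # -n +1 copies the log from its first line; -F keeps following the file
    # by name, even if it gets rotated.
    # >> appends, so an earlier capture in dest is not truncated.
    nohup tail -n +1 -F "$src" >> "$dest" 2>/dev/null &
    echo $!   # PID of the background tail, so it can be stopped later
}

# Usage on the server (paths taken from the post above):
#   pid=$(mirror_log /var/log/syslog /boot/syslog.txt)
#   kill "$pid"    # stop mirroring
```

Because it appends, the file on the flash drive will keep growing and may contain repeated sections if the helper is started more than once, so copy it off and delete it once you have what you need.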

 


Archived

This topic is now archived and is closed to further replies.
