July 22, 2017

I've got a 6.3.5 server that runs fine for a couple of days, then stops communicating. No login from web or console. The only recourse is a hard reset, which is not a happy situation. I'm looking for advice on how to diagnose this.

I check the log all the time and don't see anything that looks ominous, except that I am having a problem with user inotify watches being exceeded (even though the limit is set to 720,000). See that discussion here: https://forums.lime-technology.com/topic/58894-still-not-enough-inotify-watches/

I've disabled the dockers, with no change.

Is there a way to dump the log when the system becomes unresponsive? Or is there a way to have the log written to a file so we can get an inkling of what is going on? Maybe we'd see something in the last bit of the log that would indicate something...

Thanks in advance...
July 22, 2017

In no particular order:

Check for BIOS updates
Run memtest
Disable any VMs that utilize hardware passthrough
Power supply issues
Mains voltage dipping and UPS not kicking in, or no UPS present (i.e., do the lights in your house dim when your fridge kicks in?). Ideally servers should be on a separate, isolated electrical line, and if that's not feasible, then definitely not on the same line as anything that has a motor in it
CPU overheating
Hook up a monitor to the server and see what's on its display at the time of non-responsiveness

Log saved up to crash: either use Fix Common Problems in troubleshooting mode (which will also log a ton of other info that may or may not be of interest), or run this from a command line (ideally via screen or the actual local monitor / keyboard):

tail -f /var/log/syslog > /boot/syslog.txt
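If you would rather not keep a terminal open for days, the same idea can be wrapped so it keeps running in the background. This is only a rough sketch, not an official tool: `mirror_log` is a made-up helper name, and it assumes the live log is /var/log/syslog and that /boot is the flash drive, as in the tail command above. It appends rather than overwrites, so earlier captures survive a reboot.

```shell
#!/bin/bash
# Sketch only: mirror the live syslog to persistent storage so the last
# lines before a lockup survive a hard reset.
# Assumptions: /var/log/syslog is the live log and /boot is the flash drive.
# mirror_log is a hypothetical helper name, not an existing command.

mirror_log() {
    local src="${1:-/var/log/syslog}"
    local dest="${2:-/boot/syslog.txt}"
    # -n +1 copies what is already in the log, then -F keeps following it
    # and reopens the file by name if it gets rotated.
    # nohup and & keep it running after the terminal session closes.
    nohup tail -n +1 -F "$src" >> "$dest" 2>/dev/null &
    # Print the background PID so it can be stopped later with kill.
    echo $!
}
```

Run `mirror_log` with no arguments to use the default paths, note the PID it prints, and `kill` that PID once you have caught the crash, since writing to the flash drive around the clock is not something you want to leave on forever.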
July 23, 2017

Squid said: "definitely not on the same line as anything that has a motor in it"

or laser printer.