March 30, 2017 I have started having an issue with my test build spontaneously restarting. It's running several dockers and VMs. I have the Fix Common Problems plugin logging the syslog to flash, but nothing in there suggests why it rebooted. The dockers I run are plexmediaserver (from the official repo) and the linuxserver.io versions of sonarr, radarr and ruTorrent, along with jackett. FCPsyslog_tail.txt tower-diagnostics-20170330-1622.zip
March 30, 2017 Spontaneous restarts (with no logging) are usually one of a couple of things:
- If the VMs utilize passthrough, then BIOS updates
- Power supply is iffy
- Memory issues
- Power dip at the mains (or at the UPS) sufficient to bugger the system up
And at least one user here had a cat who was ultimately discovered to be playing with the restart switch. (Evil creatures) You can try disabling the VMs and see if any improvement happens...
March 30, 2017 Author It has been working fine with VMs running for several days, but annoyingly I changed a few things all at once. I changed from AHCI to SW RAID (but just running as JBOD) in the BIOS, thinking it may improve performance. I've also changed the CPUs for more powerful ones and enabled HT, and I've installed several dockers since. If one of the CPUs is defective, would it get a chance to write the syslog to flash? Edited March 30, 2017 by Spies
March 30, 2017 1 hour ago, Spies said: It has been working fine with VMs running for several days, but annoyingly I changed a few things all at once. I changed from AHCI to SW RAID (but just running as JBOD) in the BIOS, thinking it may improve performance. I've also changed the CPUs for more powerful ones and enabled HT, and I've installed several dockers since. If one of the CPUs is defective, would it get a chance to write the syslog to flash?
The only thing that isn't logged up to the moment of a crash / restart is what gets output to stderr (i.e. the local monitor) by the base OS. I investigated capturing that a long time ago, and it's impossible to do from a different session. If you're lucky and the system just plain crashes, then take a photo of what's on the attached monitor. But since you changed the CPUs, I'd go with either the power supply not being up to snuff or memory issues. Run memtest off the boot menu and see what happens.
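As a rough illustration of checking what was logged right up to a restart, the sketch below looks for the most recent kernel boot banner in a saved syslog and returns the few lines written just before it. It assumes the file is a plain-text syslog that spans the reboot and that each boot begins with a kernel line containing "Linux version". The function name `lines_before_last_boot` is made up for this example and is not part of any plugin mentioned in the thread.

```python
from typing import List

# Assumption: the kernel writes a banner containing this text once per boot.
BOOT_MARKER = "Linux version"


def lines_before_last_boot(log_lines: List[str], count: int = 5) -> List[str]:
    """Return up to `count` lines logged just before the most recent boot banner.

    Returns an empty list when no banner is found or nothing precedes it.
    """
    boot_indexes = [i for i, line in enumerate(log_lines) if BOOT_MARKER in line]
    if not boot_indexes:
        return []
    last_boot = boot_indexes[-1]
    # Slice the window that ends immediately before the boot banner.
    return log_lines[max(0, last_boot - count):last_boot]
```

To try it, read the saved file with `open(path).read().splitlines()` and pass the result in. If the last lines before the banner are routine messages with no error, that is consistent with a sudden reset rather than a crash the OS had time to log.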
March 30, 2017 Author I think Intel server boards have a BIOS-level log of sorts. If it happens again, I will send what that says, and I will check the BIOS revision to make sure it's on the latest.
March 30, 2017 Author I have allocated all 16 cores and 12 GB of RAM to my server VM and am running Prime95 on it, and it's been fine. Memory usage in the dashboard is 98% and CPU is at 100%, as you would expect. The only change is that the ruTorrent, sonarr, radarr and jackett dockers are not running. I'm a little surprised that a docker could possibly bring down an entire system. Could ruTorrent cause an exception over a bridged interface with the VMs somehow?
March 31, 2017 Author The VM crashed overnight. There is a bit more info this time. FCPsyslog_tail.txt tower-diagnostics-20170331-0853.zip
March 31, 2017 Author Ran memtest in SMT mode and it found bad blocks above 16 GB. How do I work out which module is at fault, given that I have to install them in pairs?
April 8, 2017 Author Still getting random reboots, usually when there is high I/O (torrents downloading). The RAM has been upgraded to 8x4 GB RDIMM and the flash stick has been replaced with a brand new 16 GB one. Any clever ways to capture the console output on the VGA?
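One approach not tried in this thread is the Linux kernel's netconsole facility, which sends kernel log messages over UDP to another machine, so the last messages can survive a hang. The sketch below is only the receiving side: a small UDP listener to run on a second machine. It assumes netconsole is available and configured on the server (which this thread does not confirm) and that it targets port 6666. The function names `open_listener` and `receive_messages` are made up for this example.

```python
import socket
from typing import List


def open_listener(host: str = "0.0.0.0", port: int = 6666) -> socket.socket:
    """Open a UDP socket to receive log datagrams (port 6666 is an assumption)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_messages(sock: socket.socket, max_messages: int,
                     timeout: float = 5.0) -> List[str]:
    """Collect up to `max_messages` datagrams, stopping early on a timeout."""
    sock.settimeout(timeout)
    messages: List[str] = []
    while len(messages) < max_messages:
        try:
            data, _sender = sock.recvfrom(65535)
        except socket.timeout:
            break  # nothing arrived within the timeout window
        # Kernel text is normally ASCII; replace anything undecodable.
        messages.append(data.decode("utf-8", errors="replace").rstrip("\n"))
    return messages
```

In practice the listener would be left running in a loop that appends each received message to a file and flushes after every write, so whatever the kernel managed to send before the reboot is already on disk on the second machine. A hard reset caused by failing hardware may still produce no message at all.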
April 10, 2017 Author Changed the cache drive to XFS because I run my VMs from it and it's a spinner. Turned off PCIe ACS Override as I no longer need it. Recording the console screen with a webcam to see if I can catch the panic.
April 14, 2017 Author Turning off PCIe ACS Override made no difference, and replacing the power supply changed nothing either. I can't capture the kernel panic (if there is one) from the console; the webcam just showed the screen going black. All the hardware checks out fine when tested independently. It's really frustrating, as Unraid works perfectly on my HP Microserver.
April 15, 2017 Have you tried limiting the number of VMs and dockers running to see if the system will stabilize, and so narrow down which item may be the root cause?
April 19, 2017 Author I have done this, and it ran for 3 days. I then started the ones I had disabled, and a day later it is still running. The only difference is that I have disconnected a USB hard drive I was using to back up to through my Server 2016 VM.
April 22, 2017 Author I've now exhausted all possibilities of it being software. I'm pretty sure it's a motherboard issue because of the way the system just dies. 1_2017-04-22_15-50-49.mp4
April 29, 2017 Author Just to bring this to a conclusion: I switched out the motherboard, RAM and CPUs for an old LGA775 system and it's been working fine, so I'm on the lookout for a replacement LGA1366 board now.