System crashing need assistance

March 30, 2017

I have started having an issue with my test build spontaneously restarting. It's running several dockers and VMs. I have the Fix Common Problems plugin logging the syslog to flash, but nothing in there gives any suggestion as to why it rebooted.

I run the dockers plexmediaserver (from the official repo) and the linuxserver.io versions of sonarr, radarr and ruTorrent, along with jackett.

FCPsyslog_tail.txt

tower-diagnostics-20170330-1622.zip
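A saved syslog tail like the one attached can be checked quickly for the usual signs of trouble. The following is only a rough sketch: the marker strings are common Linux kernel messages chosen as an assumption about what might appear, and the names `PATTERNS`, `scan_lines` and `scan_file` are made up for illustration. A hard reset can easily leave none of these behind, so an empty result does not rule out hardware.

```python
import re

# Assumed markers: typical Linux kernel messages, not a complete list.
PATTERNS = {
    "kernel panic": re.compile(r"Kernel panic", re.IGNORECASE),
    "machine check": re.compile(r"Machine check|mce:|Hardware Error", re.IGNORECASE),
    "out of memory": re.compile(r"Out of memory|oom-killer", re.IGNORECASE),
    "call trace": re.compile(r"Call Trace", re.IGNORECASE),
}

def scan_lines(lines):
    """Return (line_number, label, text) for every line that matches a marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line.rstrip("\n")))
                break  # report each line once, under the first matching label
    return hits

def scan_file(path):
    """Scan a saved syslog file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return scan_lines(handle)
```

Calling `scan_file("FCPsyslog_tail.txt")` on a copy of the attachment and printing the result shows which lines, if any, are worth a closer look.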

March 30, 2017

Spontaneous restarts (with no logging) are usually one of a couple of things...

If the VMs utilize passthrough, then BIOS updates
Power Supply is iffy
Memory issues
Power dip at the mains (or at the UPS) sufficient to bugger the system up
And at least one user here had a cat who was ultimately discovered to be playing with the restart switch. (Evil creatures)

You can try disabling the VMs and see if any improvement happens...

March 30, 2017

Author

It has been working fine with VMs running for several days, but annoyingly I changed a few things all at once.

I changed from AHCI to software RAID (but just running as JBOD) in the BIOS, thinking it might improve performance. I've also swapped the CPUs for more powerful ones and enabled HT, and I've installed several dockers since.

If one of the CPUs is defective, would it get chance to write the syslog to flash?

Edited March 30, 2017 by Spies

March 30, 2017

1 hour ago, Spies said:

It has been working fine with VMs running for several days, but annoyingly I changed a few things all at once.

I changed from AHCI to software RAID (but just running as JBOD) in the BIOS, thinking it might improve performance. I've also swapped the CPUs for more powerful ones and enabled HT, and I've installed several dockers since.

If one of the CPUs is defective, would it get chance to write the syslog to flash?

The only thing that isn't logged up to the moment of a crash / restart is what the base OS outputs to stderr (i.e. the local monitor). I investigated capturing that a long time ago, and it's impossible to do from a different session. If you're lucky and the system just plain crashes, take a photo of what's on the attached monitor.

But since you changed the CPUs, I'd go with either the power supply not being up to snuff or memory issues. Run memtest off the boot menu and see what happens.

March 30, 2017

Author

I think Intel server boards have a BIOS-level log of sorts. So if it happens again, I will send what that says, and I will check the BIOS revision to make sure it's on the latest.

March 30, 2017

Author

I have allocated all 16 cores and 12gb of ram to my server VM, I'm running Prime95 on it and it's been fine. Memory usage in the dashboard is 98% and CPU is at 100% as you would expect.

The only changes are that the dockers rutorrent, sonarr, radarr and jackett are not running.

I'm a little surprised that a docker could possibly bring down an entire system. Could rutorrent somehow cause an exception over a bridged interface with the VMs?

March 31, 2017

Author

The VM crashed overnight; a bit more info this time.

FCPsyslog_tail.txt

tower-diagnostics-20170331-0853.zip

March 31, 2017

Author

Ran memtest in SMT mode and it found bad blocks above 16gb. How do I work out which module is at fault, given that I have to install them in pairs?

April 8, 2017

Author

Still getting random reboots, usually when there is high i/o (torrents downloading).

RAM has been upgraded to 8x4gb Rdimm, flash stick has been replaced with a brand new 16gb one.

Any clever ways to capture the console output on the VGA?
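One option that might work here is the Linux kernel's netconsole feature, which sends kernel messages over UDP to another machine so they survive even when the local screen goes black. Whether the kernel on this build has netconsole available is an assumption and is not confirmed anywhere in this thread. Below is a minimal sketch of the receiving side only, to run on a second machine. Port 6666 is netconsole's conventional default target port; the function names and the log file name are hypothetical.

```python
import socket
import time

def format_record(payload: bytes, sender: str, now: float) -> str:
    """Turn one UDP datagram into a single timestamped log line."""
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"{stamp} {sender} {text}"

def listen(port: int = 6666, log_path: str = "netconsole.log") -> None:
    """Append every datagram received on `port` to `log_path` (runs until killed)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    # Line-buffered so the last messages before a crash reach the disk promptly.
    with open(log_path, "a", buffering=1) as log:
        while True:
            payload, (host, _sender_port) = sock.recvfrom(65535)
            log.write(format_record(payload, host, time.time()) + "\n")
```

Calling `listen()` on the second machine starts the receiver; the crashing machine still has to be configured separately to send its kernel messages to that address and port.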

April 10, 2017

Author

Changed the cache drive to xfs because I run my VMs from it and it's a spinner.

Turned off PCIe ACS Override as I no longer need it.

Recording the console screen with a webcam to see if I can catch the panic.

April 14, 2017

Author

Turning off PCIe ACS made no difference.

I've replaced the power supply with no change.

I can't capture the kernel panic (if there is one) from the console, the webcam just showed the screen going black.

All the hardware checks out fine when tested independently, which is really frustrating as Unraid works perfectly on my HP microserver.

April 15, 2017

Have you tried limiting the number of VMs and dockers running to see if the system stabilizes, and so narrow down which item may be the root cause?

April 19, 2017

Author

I have done this, and it ran for 3 days. I then started the ones I had disabled, and a day later it is still running.

The only difference is that I have disconnected a USB hard drive I was using to back up to through my Server 2016 VM.

April 22, 2017

Author

Exhausted all possibilities of it being software now. I'm pretty sure it's a motherboard issue because of the way the system just dies.

1_2017-04-22_15-50-49.mp4

April 29, 2017

Author

Just to bring this to a conclusion: I switched out the motherboard, RAM and CPUs for an old LGA775 system and it's been working fine, so I'm on the lookout for a replacement LGA1366 board now.
