Hi All.
Hope someone can help as I seem to be out of my depth here on this one.
I was running a successful Unraid server on an HP MicroServer Gen8 with no problems. I decided to upgrade, so I purchased the following:
Ryzen 1700
16GB RAM
ASUS Prime X370 mobo (BIOS up to date)
256GB NVMe (for cache, which I have never had before)
Nvidia GTX 760 (Temporary GPU)
So I proceeded to build the machine, then plugged in the USB key and the 4 drives my array was on in the old machine. I booted everything up and all seemed to be fine. I then set up the cache drive, moved some folders onto it and left it at that.
This is where the problems started.
The server started to randomly freeze, and only a hard reboot would bring it back up. I checked the logs and found some errors:
Error with CPU thread 11. I thought there was a hardware issue, so I put a freshly formatted HD in and installed Windows 10 to check for hardware problems. I spent 5 hours in Windows 10 benchmarking the CPU/GPU and found no issues whatsoever. I also ran a memtest and again found no issues.
Next I started Unraid in safe mode with no plugins installed and, what do you know, after 2 hours of tinkering there were no crashes, even running 8 dockers.
So next I decided to reboot into normal mode and delete any plugins, which I managed to do, but before I could reboot the server crashed again.
So far I have only been able to run the server in safe mode without crashes.
I have just rebooted the machine without any plugins to see how long it lasts this time around.
Some of the other errors I have found in the log:
Nov 17 13:33:40 Tower ntpd[1957]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
and something about an upstream timeout on certain plugins (before I removed them).
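In case it helps anyone going through the attached logs, here is a rough sketch of how the syslog could be filtered for lines like the ones above. The keyword list and the function name `suspicious_lines` are my own assumptions, not anything provided by Unraid, and the second sample line is a made-up placeholder rather than a real log entry.

```python
import re

# Hypothetical keyword list (an assumption, adjust as needed).
KEYWORDS = ("error", "timeout", "mce", "unsynchronized")


def suspicious_lines(lines, keywords=KEYWORDS):
    """Return the lines that contain any keyword, ignoring case."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


sample = [
    # Real line quoted from my log:
    "Nov 17 13:33:40 Tower ntpd[1957]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized\n",
    # Made-up placeholder line, only here to show that it gets filtered out:
    "Nov 17 13:34:00 Tower example: placeholder line with nothing of interest\n",
]

for line in suspicious_lines(sample):
    print(line)
```

To run it against a real file, the list could be replaced with an open file handle, for example `suspicious_lines(open("syslog.txt"))`, where `syslog.txt` stands in for whichever syslog file is inside the diagnostics zip.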
I also noticed this morning, before I used safe mode and removed the plugins, that a couple of cores were stuck at 100% with overall usage at 26%, which made loading GUI pages really slow.
The only older part in the system is the power supply, which came from a previous machine. It's a Corsair TX650W. I know a lot of people say that could be the issue, but there were no problems running in a Windows environment, and I would have thought that would have stressed the system more than Unraid would.
Any ideas on how I should start diagnosing this issue? The logs don't seem to show much around the time of the crashes.
I have attached some log files from last night.
tower-diagnostics-20191117-1101.zip
tower-diagnostics-20191116-1812.zip