Need help with a crashing server


enigma27

Hi All.

 

Hope someone can help as I seem to be out of my depth here on this one.

 

So I was running an Unraid server on an HP MicroServer Gen8 with no problems. I decided to upgrade the server, so I purchased the following:

 

Ryzen 1700

16GB RAM

ASUS Prime X370 motherboard (BIOS up to date)

256GB NVMe (for cache, which I have never had before)

Nvidia GTX 760 (Temporary GPU)

 

So I proceeded to build the machine, plugged in the USB key and the 4 drives my array was on in the old machine, booted everything up, and all seemed to be fine. I then set up the cache drive, moved some folders onto it and left it at that.

 

This is where the problems started.

 

The server started to freeze randomly, and only a hard reboot would bring it back up. I checked the logs and found some errors:

 

Error with CPU thread 11. I thought there was a hardware issue, so I put a freshly formatted HD in and installed Windows 10 to check for any hardware problems. I spent 5 hours in Windows 10 benchmarking the CPU/GPU and found no issues whatsoever. I also ran a memtest and again found no issues.

 

Next I started Unraid in safe mode with no plugins installed and, what do you know, after 2 hours of tinkering there were no crashes, even running 8 dockers.

 

So next I decided to reboot into normal mode and delete the plugins, which I managed to do, but before I could reboot the server crashed again.

 

So far I have only been able to run the server in safe mode without crashes.

 

I have just rebooted the machine without any plugins to see how long it lasts this time around.

 

Some of the other errors I have found in the log:

Nov 17 13:33:40 Tower ntpd[1957]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

and something about an upstream timeout on certain plugins (before I removed them).

 

I also noticed this morning, before I used safe mode and removed the plugins, that a couple of cores were stuck at 100% with overall usage at 26%, which made loading GUI pages really slow.

 

The only part that isn't new is the power supply. I wouldn't say it's old, but it came from an old machine. It's a Corsair TX650W. Now I know a lot of people say that could be the issue, but I had no problems running in a Windows environment, and I would have thought that would have stressed the system more than Unraid would.

 

Any ideas on how I start to diagnose this issue? The logs don't seem to show much around the time of the crashes.
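
For anyone wanting to check the same thing, here is a minimal Python sketch for pulling out the syslog lines from the minutes leading up to a crash. It assumes the standard timestamp prefix seen in the ntpd line above (`Nov 17 13:33:40`), English month names, and that the year is supplied separately because syslog lines do not carry one. The function names and the second sample line are made up for illustration, not taken from the attached diagnostics.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year):
    """Parse the 15-character 'Nov 17 13:33:40' prefix; return None if it does not match."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def lines_before(lines, crash_time, minutes=10):
    """Return the log lines stamped within `minutes` before `crash_time` (inclusive)."""
    window_start = crash_time - timedelta(minutes=minutes)
    selected = []
    for line in lines:
        stamp = parse_syslog_time(line, crash_time.year)
        if stamp is not None and window_start <= stamp <= crash_time:
            selected.append(line.rstrip("\n"))
    return selected


if __name__ == "__main__":
    sample = [
        # Hypothetical earlier line, only here to show that it gets filtered out.
        "Nov 17 12:00:00 Tower example[1]: hypothetical earlier message",
        # Real line quoted in the post.
        "Nov 17 13:33:40 Tower ntpd[1957]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized",
    ]
    # Assumed crash time, chosen only for the example.
    for entry in lines_before(sample, datetime(2019, 11, 17, 13, 35, 0)):
        print(entry)
```

To run it against a real log, pass the lines of the syslog file from the diagnostics zip instead of `sample`, and set the crash time to roughly when the freeze was noticed.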

 

I have attached some log files from last night.

tower-diagnostics-20191117-1101.zip tower-diagnostics-20191116-1812.zip

Edited by enigma27
1 hour ago, John_M said:

In the BIOS make sure the Power Supply Idle Control setting is changed to Typical Current Idle, not the default Low Current Idle. It can be tricky to find, so look for Advanced -> AMD CBS -> Power Supply Idle Control.

Thanks, I can see that setting and it's set to Auto.

 

Also, 2 hrs and 52 mins now without a crash with no plugins installed.
