Server crashing randomly after hardware upgrade.


GWEST


I recently switched from an Intel i5 to a Ryzen 7 2700X on an MSI X470 Gaming Pro motherboard. I also added an M.2 drive for the cache. The server ran fine for about 10 days, handling Plex server duties and a few HandBrake conversions, but not much else. This morning the server was unresponsive: I couldn't ping it or SSH into it. After a reboot, it came up for a few minutes and then crashed again. I have tried several times, with several different steps, and I always get the same result: it comes up for a few minutes or even an hour, and then it goes down again.

 

I have disabled C-states on the motherboard and updated to the newest BIOS.

I added rcu_nocbs=0-15 to the syslinux configuration.

I tried starting it in safe mode.

I tried booting it and leaving the array offline.

 

No matter what I try, it does the same thing: up for a short time, then down.

 

I'm about to put the old Intel CPU and motherboard back, but I wanted to check here first to see if anyone has any suggestions.

syslog (3)


Double-check that all power connectors are fully and securely inserted, and do the same for the RAM modules. Unless you have ECC memory, run memtest for a minimum of 24 hours (it is an option in the boot menu). Connect a monitor to the console so that you can see whether there are any clues there, and take a picture of the screen, making sure the photo is sharp and clear. Check the motherboard and CPU temperatures with the Dynamix System Temperature plugin.


I did not mean to take a picture of the memtest results, but rather of the console after the system has crashed. Hopefully there will be something on the screen. You can also install the Fix Common Problems plugin and turn on its 'Troubleshooting' mode. That will write files to the logs directory of your flash drive; upload those logs after the crash. With your system being new, it is possible that one of the new components is failing.

 

Running memtest was the first step in trying to isolate the problem. If the crashes are getting closer together, it might be worth running it again. Plus, memory is one of the easiest components to swap out if it is defective (or suspect).

Edited by Frank1940

What is the current status? If the problem is still happening, and it still happens in Safe Mode, you probably have a hardware problem. Hopefully a hardware guru (like @johnnie.black) will have a look at the photo of the console monitor that you posted.

 

Beyond that, since the memory does not appear to be the issue, I would be looking at the remaining hardware. Since the Ryzen 7 2700X is a relatively high-power chip, I would take a good look at the PSU. (PSUs have been the culprit in several similar problems recently.) You may have one lying around, or you could borrow one from a friend or see if you can 'borrow' one from a vendor with a liberal return policy.


Johnnie,

 

That was my thought as well. I reverted to my Intel i5 CPU and motherboard setup last night, and my rig has now been running for 24 hours with no issues.

 

I'm going to return the Ryzen gear and try again in a few months.

 

Thanks to everyone for all the support. 
