Investigating random reboot after upgrade


Roang


Hey guys, 

 

I'm trying to figure out why my server is randomly restarting after an upgrade. 

 

I first noticed this about a month ago, when I replaced my existing cache drive, a 950 EVO 500GB SSD, with a 980 PRO 1TB.

It might be worth noting that the only NVMe slot on the motherboard (AX370-K7) was already occupied, so I moved that drive (an SX8200 PRO 1TB) to a PCIe NVMe adapter:

https://www.pbtech.co.nz/product/ENCOEM0010/kingshare-KS-NVX401-M-KEY-Nvme-M2-SSD-to-PCI-e-X4

The issue happened 2 days after the upgrade; however, the server was then fine for a month.

 

Coming back to now, I've made a further upgrade to the server by adding one more data drive.

It is also probably worth noting that I used this opportunity to clean the build with compressed air, such as the product below:
https://www.pbtech.co.nz/product/TOLSTO1040/Dynamix-CK-AD400-400ml-Air-Duster-Non-Flammable-hi

 

The first strange symptom I saw was that the server refused to start (nothing happened when the power button was pressed).

I unplugged the power cable a few times and reseated some of the connectors inside more firmly in case something had come loose.

Eventually the power button worked.

Now, within a day of the upgrade, the server has already restarted twice.

As per the recommendations, I've gathered the logs and attached them.

From a quick glance, there unfortunately doesn't seem to be anything useful logged at the moment of failure.
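For anyone wanting to look at the moment of failure more systematically, here is a minimal sketch that prints the last few log lines written before each reboot. It assumes the syslog contains a kernel boot message with the text "Linux version" at every startup and that the log is passed as a plain-text file (for example the attached FCPsyslog_tail.txt); the function name and the context size are made up for illustration.

```python
import sys
from typing import List, Tuple

# Assumption: the kernel's first message at every boot contains this text.
BOOT_MARKER = "Linux version"


def lines_before_boots(lines: List[str], context: int = 5) -> List[Tuple[int, List[str]]]:
    """Return (index, preceding lines) for every boot marker that is not the first line."""
    results = []
    for i, line in enumerate(lines):
        if BOOT_MARKER in line and i > 0:
            start = max(0, i - context)
            # The lines just before a boot marker are the last ones logged before the restart.
            results.append((i, lines[start:i]))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_reboots.py FCPsyslog_tail.txt
    with open(sys.argv[1], errors="replace") as fh:
        log = fh.read().splitlines()
    for index, before in lines_before_boots(log):
        print(f"--- boot marker at line {index + 1}; last lines before it:")
        for entry in before:
            print("    " + entry)
```

If the lines before each marker show normal activity and no shutdown sequence, that would be consistent with an abrupt power loss or hard reset rather than a clean software restart, although it does not prove the cause.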

 

I'm wondering whether my PSU is slowly dying or simply can't supply enough power, but based on my calculations I don't think that's it.

The PSU I'm using is a 550W Corsair.

The build currently has 9 HDDs, of which 1 is used as parity and the rest as data drives.

The 980 PRO is used as cache, and there is one SX8200 1TB used as an unassigned device.

The build also has an LSI SATA SAS 9211-8i for additional SATA ports.

The build is on a 1st-gen Ryzen 1700, which has no integrated graphics, so it also has a random graphics card attached, possibly a GT 720.

The motherboard is an AX370 K7.

Memory is an odd mix of one 16GB stick and two 8GB sticks.
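A rough way to sanity-check the PSU sizing for the parts listed above is a simple peak power budget. The sketch below is only an illustration: the 550W rating and the component counts come from this post, but every per-component wattage is a placeholder assumption, not a measurement or a datasheet value, and should be replaced with real figures.

```python
# PSU rating from the post.
PSU_WATTS = 550

# name: (count, assumed peak watts each). All wattages are rough assumptions.
components = {
    "HDD (spin-up peak)": (9, 25.0),
    "NVMe SSD": (2, 8.0),
    "CPU (Ryzen 1700 under load)": (1, 90.0),
    "GPU (GT 720 class)": (1, 20.0),
    "HBA (LSI 9211-8i)": (1, 10.0),
    "Motherboard, RAM, fans": (1, 50.0),
}


def peak_watts(parts):
    """Sum count * watts over all components (worst case: everything peaks at once)."""
    return sum(count * watts for count, watts in parts.values())


total = peak_watts(components)
headroom = PSU_WATTS - total

for name, (count, watts) in components.items():
    print(f"{name:30s} {count} x {watts:5.1f} W = {count * watts:6.1f} W")
print(f"Estimated peak: {total:.1f} W, headroom on a {PSU_WATTS} W PSU: {headroom:.1f} W")
```

A total like this only compares against the overall rating. It says nothing about the 12V rail limit during simultaneous drive spin-up or about an ageing PSU delivering less than its label, so a comfortable-looking headroom figure does not rule the PSU out.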

 

Unraid is currently on 6.5.3.

 

I would appreciate any feedback, guys.

 

Thanks

FCPsyslog_tail.txt tower-diagnostics-20211108-2241.zip

3 minutes ago, JorgeB said:

Start by upgrading to latest, also check this if you haven't yet.

 

Thanks for the info, @jorgeB.

I'm aware of the C6 state issue with 1st-gen Ryzen, so I've already taken the necessary measures to prevent it.

With those measures in place, the system was very stable, reaching up to 6 months of uptime, and was only shut down manually to add an HDD.

 
