
Random Reboots - no information in logs


Edge9028

Hello everyone,

I've been encountering a perplexing issue with my Ryzen 5600 build, which, until recently, had been performing flawlessly. Over the past few weeks, I've been plagued by random restarts, and despite my best efforts, I can't seem to pin down the cause.

Here's what I've tried so far:

 

·         Replacing RAM: I ran memtest, but it returned no errors.

·         Replacing the PSU: Thought it might be a power issue, but the problem persists.

·         Reseating the CPU and reapplying thermal compound: No change in the restart behavior.

·         Disabling Docker and VMs: I thought these might be contributing factors, but disabling them had no effect.

·         BIOS Tweaks: Following Ryzen-specific forum guides, I disabled C-states in the BIOS, but this didn't resolve the issue. XMP is also disabled and the RAM is running at its default speed.

·         SMART checks: All drives pass.

 

The frustrating part is that the system crashes have become more frequent. Sometimes, it restarts just minutes after booting, leading to a visible "unclean shutdown" message next to the button in the GUI to start the array.

 

I've combed through the logs, adjusted numerous settings, but nothing seems to point to a clear culprit. When the server reboots, nothing is written to the logs, which would make me inclined to believe this is a hardware issue, but I am not convinced.
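One way to narrow down when a reboot happens, even if the system log shows nothing, is a small heartbeat file on storage that survives a restart. The following is a minimal sketch only: the function names and the idea of calling `write_heartbeat` once a minute are assumptions for illustration, not an Unraid feature, and the file path is left to the caller.

```python
import os
from datetime import datetime, timedelta
from typing import List, Tuple


def write_heartbeat(path: str) -> None:
    """Append the current local time to `path` and force it to disk."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(datetime.now().isoformat(timespec="seconds") + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # make sure the line survives a sudden reset


def read_heartbeats(path: str) -> List[datetime]:
    """Load all timestamps previously written by write_heartbeat."""
    with open(path, encoding="utf-8") as fh:
        return [datetime.fromisoformat(line.strip()) for line in fh if line.strip()]


def find_gaps(stamps: List[datetime], max_gap: timedelta) -> List[Tuple[datetime, datetime]]:
    """Return (last_seen, next_seen) pairs where the heartbeat paused longer than max_gap."""
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_gap]
```

If `write_heartbeat` is run on a fixed schedule, each pair returned by `find_gaps` brackets a window in which the machine was down, which gives a rough time of each crash and the uptime before it.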

 

At this point, the only things I haven't tried are replacing the SATA cables and the flash drive, though these seem like long shots.

For additional context, I upgraded from Version 6.12.4 to 6.12.6, hoping for a fix, but to no avail.

 

Most recently, I have run it in safe mode with the GUI enabled, and reboots happen shortly after the array is started.

These issues do suggest to me some kind of hardware issue, but given the hardware changes I have made, it’s difficult to conclude.

 

Perhaps this is also worth mentioning: this started to happen after I ran CA Appdata Backup for the first time - I am not suggesting this is the cause (given that this is happening in safe mode now, too), likely just coincidence.

 

Has anyone else experienced similar issues with their Ryzen-based machines? Any advice or suggestions would be greatly appreciated!

 

Link to comment

Thanks,

 

I have already replaced both ram sticks that are in the system - sorry, that was not clear in the information I provided.

 

Second power supply is good - less than a month old and has been reliable in another system.

 

I think I am just going to go ahead and buy a new mobo + CPU. I've spent a lot of time on this, and in the end it would likely have been more cost-efficient to simply replace them.

 

Before I do, can you advise on whether it's worth restoring the flash drive or trying a new one? My feeling is that it's not the issue, given the lack of any mention of it in the logs that are available. I just want to be reasonably sure I have ruled out everything before dropping the cash on a new motherboard and CPU.

 

Thanks for your help,

 

Link to comment

I backed up the existing drive and moved it to another flash drive. The restarts are still an issue.

 

As a last-ditch attempt to diagnose this, I would like to boot the server with a fresh USB drive, just to rule out any configuration/corruption being loaded into the RAM. The plan is to start it up and see whether the reboots still happen.

 

If it reboots - which I am almost certain it will - I will drop the cash on the new parts, and then update this thread.

 

If I boot the server with a fresh drive, I understand that this will have no impact on the data stored on the array. Is this correct?

 

Thanks for your help

 

Link to comment

Ok,

 

So, the issues persist with a fresh, new USB drive and no hard drives connected. I think we can safely say that this is a hardware issue that relates to either the CPU or the motherboard. It's just unfortunate that, without buying replacements, it's going to be almost impossible to conclude which one is the culprit.

Link to comment

Update:

 

I have now replaced the motherboard. The issue persists with the same pattern of consistent restarts at approximately the two hour mark.

 

Before going ahead and replacing the cpu, I had a thought to try a different OS: I have been running Ubuntu from a live usb for many hours now, without a single restart.

 

This would lead me to believe that there is a chance that this is not hardware related and may be related to a recent update to the Unraid OS, considering I still get restarts with a fresh Unraid USB.

 

Interested to hear your thoughts.

Link to comment

Update:

 

Ok, so this is interesting:

 

I have never had to mess around with the syslinux.cfg file, but after adding rcu_nocbs=0-11 into the syslinux.cfg file, I've not seen any restarts for more than double the maximum time I usually get.
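For anyone wanting to try the same change, here is a minimal sketch of appending a kernel parameter to the `append` lines of a syslinux.cfg. The sample config text is an assumed excerpt for illustration, not a copy of the actual file from this server, and `add_kernel_param` is a hypothetical helper name. The range `0-11` covers 12 logical CPUs, which matches a 6-core, 12-thread Ryzen.

```python
# Assumed excerpt of a syslinux.cfg boot entry, for illustration only.
SAMPLE_CFG = """label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
"""


def add_kernel_param(cfg_text: str, param: str) -> str:
    """Add `param` to every `append` line that does not already contain it."""
    out = []
    for line in cfg_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].lower() == "append" and param not in tokens[1:]:
            # Keep the original indentation and put the parameter at the end.
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"
```

Calling `add_kernel_param(SAMPLE_CFG, "rcu_nocbs=0-11")` turns the `append initrd=/bzroot` line into `append initrd=/bzroot rcu_nocbs=0-11` and leaves the other lines alone. Running it a second time changes nothing, so it is safe to reapply. Back up the flash drive before editing the real file.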

 

I had two months of stability on Unraid 6.12.4. The issue began shortly after I upgraded to 6.12.5. I missed this part out in my initial message: I actually went from 6.12.5 > 6.12.6 in a bid to address the issue at hand.

 

I will monitor closely, but if stability resumes after having made this change, it might suggest that a previous patch/fix has been missed during a merge - as this is the first time I've needed to do this.

 

It's early days, and it could fail again, soon :)

 

Regardless, I will update the thread when a more significant amount of time has passed.

 

Thanks

 

Link to comment
Good to hear that.

 

By the way, is rcu_nocbs=0-11 for AMD CPUs?

Link to comment

Hello,

 

Not long after I posted this message, the server rebooted again. I am determined to get to the bottom of this, so I will flash a clean install of 6.12.4 to a USB drive and try it. Downloads from the Unraid website are exceptionally slow right now, but I will persist.

 

And, yes, @JackieWu this is a Ryzen processor (Ryzen 5 5600G).

 

Thanks,

Link to comment
  • 2 weeks later...
  • Solution

UPDATE:

 

After much pain and troubleshooting, this seems to have been a case of a dying CPU.

 

Once the CPU was replaced, stability has been restored to the server.

 

This is literally the first time I have ever experienced a dying CPU - I'm not even sure what causes this to happen 😑.

 

Thanks very much for your support. I will mark this matter as resolved now.

 

 
