(Solved) Unraid keeps freezing or restarting and I can't figure out why.



For the past couple of weeks I've occasionally found that my server has rebooted or become unresponsive. (For the past week it's been basically unusable.) As far as I can tell, no single thing causes this to happen. It just happens anywhere from 10 minutes to 5 hours in. I have tried many different things in an attempt to narrow down the issue, but nothing fixes the problem. Before I list everything I've done so far, I want to ask about something I found in a syslog file. (For some reason it only just crossed my mind to search for "error" in the log files.)

.... node  #0, CPUs:        #1  #2  #3
mce: [Hardware Error]: Machine check events logged
mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 5: bea0000000000108
mce: [Hardware Error]: TSC 0 ADDR 1ffff816f2e4a MISC d012000100000000 SYND 4d000000 IPID 500b000000000 
mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1615731921 SOCKET 0 APIC 6 microcode 8001138
 #4  #5  #6  #7  #8  #9 #10 #11 #12 #13 #14 #15
smp: Brought up 1 node, 16 CPUs

And another time:

.... node  #0, CPUs:        #1  #2  #3  #4  #5  #6  #7  #8  #9 #10 #11 #12 #13 #14
mce: [Hardware Error]: Machine check events logged
mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 5: bea0000000000108
mce: [Hardware Error]: TSC 0 ADDR 1ffff8152f996 MISC d012000100000000 SYND 4d000000 IPID 500b000000000 
mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1615699503 SOCKET 0 APIC d microcode 8001138
#15
smp: Brought up 1 node, 16 CPUs

 

Is this a good indicator that my CPU is failing? I do have a CPU in my main desktop that will work in the server (the motherboard was in my desktop until recently). Is it worth the effort to install it in my server for now? Or am I somehow misinterpreting this message?

Extra note: Some variation of these same messages has appeared in all of the log files I just searched.
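
For anyone reading along: the TIME field in those mce lines is a Unix timestamp, so it can be lined up against when the freezes happened. Here is a minimal Python sketch that pulls the CPU, bank, status and time out of lines like the ones quoted above. The regexes and the `parse_mce` helper are my own and only written against these two excerpts, so other kernels may format the lines differently.

```python
import re
from datetime import datetime, timezone

# Patterns written against the two excerpts quoted above (an assumption;
# not taken from any official tool).
CPU_RE = re.compile(r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-f]+)")
TIME_RE = re.compile(r"TIME (\d+)")


def parse_mce(lines):
    """Return one dict per MCE record found in `lines` (hypothetical helper)."""
    events = []
    current = None
    for line in lines:
        m = CPU_RE.search(line)
        if m:
            # A "CPU n: Machine Check" line starts a new record.
            current = {
                "cpu": int(m.group(1)),
                "bank": int(m.group(2)),
                "status": m.group(3),
            }
            events.append(current)
            continue
        m = TIME_RE.search(line)
        if m and current is not None:
            # TIME is seconds since the Unix epoch; show it as UTC.
            current["time_utc"] = datetime.fromtimestamp(
                int(m.group(1)), tz=timezone.utc
            )
    return events


sample = [
    "mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 5: bea0000000000108",
    "mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1615731921 SOCKET 0 APIC 6 microcode 8001138",
    "mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 5: bea0000000000108",
    "mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1615699503 SOCKET 0 APIC d microcode 8001138",
]

for event in parse_mce(sample):
    print(event["cpu"], event["bank"], event["status"], event["time_utc"].isoformat())
```

Both records carry the same bank and status value and only differ in the CPU number and the time, which fits them being logged during core bring-up at two separate boots.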

 

Solution was found here: 

 

Edited by RBoots
Issue was solved
6 minutes ago, Squid said:

During initialization of the cores, certain combinations of hardware issue an MCE.  Can be safe to ignore.

 

What you really want to do is configure the syslog server (mirror to flash), and after your next crash post the syslog generated and a set of diagnostics.

I already have that turned on. I don't really know what I'm looking for, but it doesn't seem to show any errors before it freezes. There's just a gap between the normal logs and the next boot.

If it doesn't appear to be related to a CPU issue, should I go on with listing the things I've tried to fix the problem?
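
One way to check what was logged right before each freeze: the kernel prints a line containing "Linux version" early in every boot, so the few lines just above each of those in the mirrored syslog are the last things written before the crash. A rough Python sketch of that idea follows. `last_lines_before_boot` is a made-up helper, and the sample lines are invented placeholders, not real log output.

```python
from collections import deque


def last_lines_before_boot(lines, keep=5, marker="Linux version"):
    """Yield the last `keep` lines seen before each boot banner.

    `marker` is the text assumed to identify the first kernel line of a boot.
    """
    recent = deque(maxlen=keep)
    for line in lines:
        if marker in line:
            if recent:
                yield list(recent)
            # Start collecting afresh for the boot that just began.
            recent.clear()
        else:
            recent.append(line.rstrip("\n"))


# Invented placeholder lines, only to show the shape of the result.
demo = [
    "Linux version (first boot)",
    "normal line 1",
    "normal line 2",
    "normal line 3",
    "Linux version (boot after freeze)",
    "normal line 4",
]

for tail in last_lines_before_boot(demo, keep=2):
    print(tail)
```

To run it on a real file, pass an open file object instead of `demo`. If the tail before each boot banner contains nothing but routine messages, that matches the "gap" described above and points away from anything the OS managed to log.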

On 3/15/2021 at 4:56 AM, JorgeB said:

That appears to have fixed my issues. My server has been on for over 17 hours now, which is much longer than it has managed at any point in the past week. (Maybe I should give it another day before I declare it officially fixed.) At least it was an easy fix; I just wish I had come across it much earlier. Thanks for your help.

  • RBoots changed the title to (Solved) Unraid keeps freezing or restarting and I can't figure out why.
