
Unraid crashing and taking down network



I recently updated to Unraid 6.9.1, and since then my server becomes unresponsive at random intervals and my entire network stops working. A monitor attached directly to the server shows only a black screen. I end up having to power-cycle both the router and the server.

The logs show machine check events, but I have had those for the life of the server and never had this issue until after the update.

I ran memtest, and it reported no errors.

I found these related threads, but the only fix suggested in them was possibly replacing the NIC:

https://forums.unraid.net/topic/59142-unraid-crashing-and-taking-down-network/

https://forums.unraid.net/topic/56574-loss-of-network-crashes-server/page/2/

https://forums.unraid.net/topic/49700-riddle-me-this-unraid-kills-my-home-network/

Any thoughts or help on this? My logs are attached below.

Edited by Lancebro
  • 1 month later...

I rolled back the update, and the server continued to have the same issue.

I switched my Ethernet cable to a different NIC, and the problem went away for almost two months.

Then I updated to 6.9.2, and the machine started crashing again. This time it looks like the whole network didn't go down, just the wired devices (including the WAPs).

Any help?

Edited by Lancebro
  • 4 weeks later...

There's a RAM problem:

Jul 14 05:53:52 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jul 14 05:53:52 Tower kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Jul 14 05:53:52 Tower kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010092
Jul 14 05:53:52 Tower kernel: EDAC sbridge MC0: TSC 26fa9a273d20a
Jul 14 05:53:52 Tower kernel: EDAC sbridge MC0: ADDR 11da50f40
Jul 14 05:53:52 Tower kernel: EDAC sbridge MC0: MISC 4405c1a86
Jul 14 05:53:52 Tower kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1626263632 SOCKET 0 APIC 2
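For reference, the raw status value in that log (`Bank 7: 8c00004000010092`) can be decoded by hand using the IA32_MCi_STATUS bit layout from Intel's Software Developer's Manual. Below is a minimal Python sketch of that decoding; it is not an Unraid or EDAC tool, and `decode_status`, `describe_mca_code` and `MciStatus` are hypothetical names for illustration. It only covers the architectural bits and the memory-controller form of the MCA error code, and it assumes the corrected-error count field (bits 52:38) is meaningful, which is only true when the CPU supports CMCI.

```python
from dataclasses import dataclass

# Memory transaction types (MMM field) for memory-controller MCA error codes,
# per the Intel SDM compound error code table; values 5-7 are reserved.
TRANSACTIONS = {0: "generic", 1: "read", 2: "write",
                3: "address/command", 4: "scrubbing"}


@dataclass
class MciStatus:
    valid: bool            # bit 63, VAL
    overflow: bool         # bit 62, OVER
    uncorrected: bool      # bit 61, UC
    enabled: bool          # bit 60, EN
    misc_valid: bool       # bit 59, MISCV (MISC register holds data)
    addr_valid: bool       # bit 58, ADDRV (ADDR register holds data)
    context_corrupt: bool  # bit 57, PCC
    corrected_count: int   # bits 52:38, assumed valid (requires CMCI support)
    model_specific: int    # bits 31:16
    mca_code: int          # bits 15:0


def decode_status(raw: int) -> MciStatus:
    """Split a raw IA32_MCi_STATUS value into its architectural fields."""
    def bit(n: int) -> bool:
        return bool((raw >> n) & 1)

    return MciStatus(
        valid=bit(63),
        overflow=bit(62),
        uncorrected=bit(61),
        enabled=bit(60),
        misc_valid=bit(59),
        addr_valid=bit(58),
        context_corrupt=bit(57),
        corrected_count=(raw >> 38) & 0x7FFF,  # 15-bit field
        model_specific=(raw >> 16) & 0xFFFF,
        mca_code=raw & 0xFFFF,
    )


def describe_mca_code(code: int) -> str:
    """Describe memory-controller codes (format 0000 000F 1MMM CCCC); others are left raw."""
    # Mask ignores the filter bit F (bit 12) and the MMM/CCCC fields.
    if code & 0xEF80 != 0x0080:
        return f"non-memory-controller error (code {code:#06x})"
    transaction = TRANSACTIONS.get((code >> 4) & 0x7, "reserved")
    channel = code & 0xF
    where = "unspecified channel" if channel == 0xF else f"channel {channel}"
    return f"memory controller {transaction} error, {where}"


status = decode_status(0x8C00004000010092)  # value reported for Bank 7 in the log
print(status)
print(describe_mca_code(status.mca_code))  # prints: memory controller read error, channel 2
```

Under these assumptions, the logged value reads as a valid, corrected (UC clear) memory read error with ADDR and MISC valid and a corrected-error count of 1. The MCA channel number does not by itself identify a physical slot; that mapping still has to come from the board's event log or documentation.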

 

Check the board's system/IPMI event log; it should have more information on the affected DIMM. Then remove or replace that DIMM and see whether the errors go away.
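One way to check whether the errors stop after a DIMM is removed or replaced is to count the EDAC memory-error events in the syslog before and after the change. The sketch below is a hypothetical helper, not part of Unraid. It assumes the `EDAC sbridge MCn:` line format shown in the log above (other EDAC drivers word their messages differently) and takes the log path as an argument; on Unraid the live log is typically `/var/log/syslog`, but treat that path as an assumption.

```python
import re
import sys
from collections import Counter

# Assumed line format, taken from the log excerpt in this thread:
#   "... kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR"
#   "... kernel: EDAC sbridge MC0: ADDR 11da50f40"
EVENT_RE = re.compile(r"EDAC \S+ (MC\d+): HANDLING MCE MEMORY ERROR")
ADDR_RE = re.compile(r"EDAC \S+ (MC\d+): ADDR ([0-9a-fA-F]+)")


def summarize_edac(lines):
    """Count memory-error events per memory controller and per reported address."""
    events = Counter()  # memory controller name -> number of error events
    addrs = Counter()   # physical address -> number of times it was reported
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            events[m.group(1)] += 1
            continue
        m = ADDR_RE.search(line)
        if m:
            addrs[int(m.group(2), 16)] += 1
    return events, addrs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is an assumption): python edac_summary.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as f:
        events, addrs = summarize_edac(f)
    for mc, n in sorted(events.items()):
        print(f"{mc}: {n} memory error event(s)")
    for addr, n in addrs.most_common():
        print(f"  ADDR {addr:#x}: {n}")
```

If the same address keeps reappearing, that points at one location rather than random noise; if the counts drop to zero after swapping a DIMM, that supports the RAM diagnosis. Note that the syslog is cleared on reboot unless it is mirrored or saved elsewhere.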
