Lancebro, posted April 24, 2021 (edited July 14, 2021: clarifications in first paragraph)

I recently updated to 6.9.1, and since then my server becomes unresponsive at random intervals and my entire network stops working. A monitor attached directly to the server shows only a black screen, and I end up having to power cycle both the router and the server.

I do have machine check events, but I have had them for the life of the server and never had this issue until after the update. I ran memtest and it reported no errors.

I found other related threads but didn't see any solutions, other than perhaps a new NIC:
https://forums.unraid.net/topic/59142-unraid-crashing-and-taking-down-network/
https://forums.unraid.net/topic/56574-loss-of-network-crashes-server/page/2/
https://forums.unraid.net/topic/49700-riddle-me-this-unraid-kills-my-home-network/

Any thoughts or help on this? My logs are attached below.
Lancebro (thread author), posted June 16, 2021 (edited June 16, 2021)

I rolled back the update and the server continued to have the same issue. I then switched my Ethernet cable to a different NIC, and the problem went away for almost two months. Then I updated to 6.9.2 and the machine started crashing again. It looks like the whole network didn't go down, just the wired devices (including the WAPs). Any help?
JorgeB, posted June 16, 2021

Enable the syslog mirror to flash, then post that log after a crash.
Lancebro (thread author), posted July 14, 2021 (edited July 14, 2021)

Thanks! Here is the syslog. I see a bunch of machine check events that look similar, but the server passes memtest fine. Any advice?
JorgeB, posted July 14, 2021

There's a RAM problem:

Jul 14 05:53:52 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jul 14 05:53:52 Tower kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Jul 14 05:53:52 Tower kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010092
Jul 14 05:53:52 Tower kernel: EDAC sbridge MC0: TSC 26fa9a273d20a
Jul 14 05:53:52 Tower kernel: EDAC sbridge MC0: ADDR 11da50f40
Jul 14 05:53:52 Tower kernel: EDAC sbridge MC0: MISC 4405c1a86
Jul 14 05:53:52 Tower kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1626263632 SOCKET 0 APIC 2

Check the board's system/IPMI event log; it should have more info on the affected DIMM. Then remove or replace that DIMM and see if the errors go away.
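For anyone landing here with similar log lines: the 16-digit hex value after "Bank 7:" is the raw IA32_MCi_STATUS register, and its low 16 bits encode the error type. Below is a minimal Python sketch, assuming the architectural machine-check layout described in Intel's Software Developer's Manual (Vol. 3, machine-check architecture chapter). The helpers decode_status and scan_syslog are hypothetical, not part of Unraid or the EDAC driver, and the regex only matches the "EDAC sbridge" line format shown above. It reports a memory channel number, not a physical slot: the mapping from channel to slot name (such as P1-DIMMC1) is board-specific, which is why the IPMI event log is still the better place to identify the DIMM.

```python
import re
from collections import Counter

# Assumed line format, taken from the log above:
# "EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010092"
MCE_LINE = re.compile(
    r"CPU (\d+): Machine Check Event: \d+ Bank (\d+): ([0-9a-fA-F]+)"
)

# Transaction types (MMM field) of Intel's memory controller compound error code.
MEM_TXN = {0: "generic", 1: "read", 2: "write", 3: "address/command", 4: "scrub"}


def decode_status(status: int) -> dict:
    """Decode the architectural fields of an IA32_MCi_STATUS value."""
    code = status & 0xFFFF  # MCA error code, bits 15:0
    info = {
        "valid": bool((status >> 63) & 1),        # VAL
        "overflow": bool((status >> 62) & 1),     # OVER
        "uncorrected": bool((status >> 61) & 1),  # UC
        "addr_valid": bool((status >> 58) & 1),   # ADDRV
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "mca_code": code,
    }
    # Memory controller errors have the form 000F 0000 1MMM CCCC (F = bit 12).
    if (code & 0xEF80) == 0x0080:
        channel = code & 0xF
        info["memory_txn"] = MEM_TXN.get((code >> 4) & 0x7, "reserved")
        info["channel"] = None if channel == 0xF else channel  # 0xF = unspecified
    return info


def scan_syslog(text: str) -> Counter:
    """Count memory controller errors per (CPU, bank, channel) in syslog text."""
    counts = Counter()
    for cpu, bank, status_hex in MCE_LINE.findall(text):
        info = decode_status(int(status_hex, 16))
        if "channel" in info:
            counts[(int(cpu), int(bank), info["channel"])] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "Jul 14 05:53:52 Tower kernel: EDAC sbridge MC0: CPU 0: "
        "Machine Check Event: 0 Bank 7: 8c00004000010092"
    )
    print(decode_status(0x8C00004000010092))
    print(scan_syslog(sample))
```

Applied to the Bank 7 value from the log above, decode_status reports a valid, corrected (UC clear) memory read error on channel 2, with the address-valid bit set and a corrected-error count of 1. Running scan_syslog over a full mirrored syslog would show whether the errors keep landing on the same channel.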
Lancebro (thread author), posted July 14, 2021

Wow! I had a bunch for P1-DIMMC1, so I moved some RAM around and took out two sticks. I'll watch it for a while. Thank you so much for your help!!!