
Server goes unresponsive: Kernel panic - not syncing: Fatal exception in interrupt



A few weeks ago I upgraded from 6.9 to 6.13. No other changes were made before this issue began. Since then my server has become unresponsive at irregular intervals: sometimes a few hours pass, sometimes many days, before it locks up and requires a reset/hard boot. I've set up remote syslog and captured a number of instances, but so far I have been unable to determine a root cause.

 

I've read a number of similar posts from over the past year, but many don't document a final fix or resolution.

I've closely followed the steps outlined here regarding AMD setups:

 

While C-states were set to auto and caused no ill effects running on 6.9 for years, I have completely disabled them.

 

Memtest found that one of my DIMMs was in fact bad. I've removed that DIMM, but the issue persists on the single remaining DIMM, which has passed multiple tests over a period of days.

 

As with any troubleshooting scenario, I've been slow to make changes and only change one thing at a time to determine its true impact. My next step might be to update the BIOS, but again, that's one of those things you don't really do without good reason (a security patch, new functionality, or a bug fix). Rolling back to an older version of Unraid is also an option.

 

Assistance resolving this issue is appreciated.

 

Board, CPU, and RAM are all certified compatible.

https://pcpartpicker.com/b/qx7WGX

SyslogCatchAll-2023-08-30.txt tower-diagnostics-20230824-1625.zip

2023-08-30 03:59:36    Kernel.Warning    10.0.0.8    Aug 30 03:59:37 Tower kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
2023-08-30 03:59:36    Kernel.Warning    10.0.0.8    Aug 30 03:59:37 Tower kernel: ? _raw_spin_unlock+0x14/0x29
2023-08-30 03:59:36    Kernel.Warning    10.0.0.8    Aug 30 03:59:37 Tower kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

Macvlan call traces will eventually crash the server; switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; Advanced View must be enabled, top right).
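Since remote syslog is already capturing each lockup, one way to check whether macvlan traces precede a given crash is to count them per capture file. The following is a minimal Python sketch, not an Unraid tool. It assumes the column layout of the lines quoted above (received time, facility.level, source IP, original kernel line) and that macvlan trace frames look like `macvlan_broadcast+0x10a/0x150 [macvlan]`; `scan_syslog` is a hypothetical helper name.

```python
import re
import sys
from collections import Counter

# Assumption: macvlan frames in the call trace look like
# "kernel: macvlan_broadcast+0x10a/0x150 [macvlan]", optionally prefixed by "? ".
MACVLAN_FRAME = re.compile(
    r"kernel: (?:\? )?(macvlan_\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[macvlan\]"
)
# Matches the panic line from the thread title.
PANIC = re.compile(r"Kernel panic - not syncing")


def scan_syslog(lines):
    """Count macvlan call-trace frames by function name, plus kernel panic lines."""
    frames = Counter()
    panics = 0
    for line in lines:
        match = MACVLAN_FRAME.search(line)
        if match:
            frames[match.group(1)] += 1
        if PANIC.search(line):
            panics += 1
    return frames, panics


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read the exported syslog file; replace undecodable bytes rather than failing.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        frames, panics = scan_syslog(fh)
    for name, count in frames.most_common():
        print(f"{count:6d}  {name}")
    print(f"kernel panic lines: {panics}")
```

Usage, with a hypothetical script name: `python3 scan_macvlan.py SyslogCatchAll-2023-08-30.txt`. A capture with zero macvlan frames before a lockup would suggest the macvlan path is not the whole story.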


Well, that didn't last very long. There's nothing useful in the syslog, but I attached it anyway. I've included diagnostics too; hopefully something stands out, because I'm at a loss. Until the upgrade this thing was rock solid (even, apparently, with a failing DIMM).

SyslogCatchAll-2023-09-02.txt SyslogCatchAll-2023-08-31.txt SyslogCatchAll-2023-09-01.txt tower-diagnostics-20230903-2005.zip

  • 2 weeks later...

Yes, I did address that thread in my initial post. Although they weren't an issue previously, I have completely disabled C-states. I will try flashing a newer BIOS, but as you've pointed out, nothing indicates a hardware issue, so I'm not putting much weight on that solving my problem. I'm less than thrilled to see nothing useful in the syslog before the system becomes unresponsive. That being said, it has been running since your post, apart from the downtime when I reinstalled the RMA'd RAM.
