navigat0 Posted August 31, 2023

A few weeks ago I upgraded from 6.9 to 6.13. No other changes were made prior to this issue beginning. Since then my server has become unresponsive at irregular intervals. Sometimes a few hours pass, sometimes many days, before the system goes unresponsive and requires a reset/hard boot. I've set up remote syslog and captured a number of instances, but thus far I have been unable to determine a root cause.

I've read a number of similar posts from over the years, but many don't list any final actions taken or a resolution. I've closely followed the steps outlined here regarding AMD setups: While c-states were set to auto and had no ill effects running on 6.9 for years, I have completely disabled them. Memtest determined that one of my DIMMs was in fact bad, but I've removed that DIMM and am currently experiencing the issue running on the single remaining DIMM, which has passed multiple tests over a period of days.

As with any troubleshooting scenario, I've been slow to make changes and only change one thing at a time to determine the true impact. My next step might be to update the BIOS, but again, that's one of those things that you don't really do without good reason: a security patch, new functionality, or a bug fix. Stepping back to an older version of Unraid is always a possibility as well.

Assistance resolving this issue is appreciated. Board, CPU and RAM are all certified compatible.
https://pcpartpicker.com/b/qx7WGX
SyslogCatchAll-2023-08-30.txt
tower-diagnostics-20230824-1625.zip
JorgeB Posted August 31, 2023

2023-08-30 03:59:36 Kernel.Warning 10.0.0.8 Aug 30 03:59:37 Tower kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
2023-08-30 03:59:36 Kernel.Warning 10.0.0.8 Aug 30 03:59:37 Tower kernel: ? _raw_spin_unlock+0x14/0x29
2023-08-30 03:59:36 Kernel.Warning 10.0.0.8 Aug 30 03:59:37 Tower kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

Macvlan call traces will end up crashing the server; switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right).
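For anyone checking their own remote syslog capture for the same kind of trace, here is a minimal Python sketch that picks out stack-frame lines tagged with the macvlan module. It assumes the log lines look like the three quoted above; the function name `find_macvlan_frames` and the regular expression are illustrative, not part of Unraid or any syslog tool, and other kernel versions may format these lines differently.

```python
import re

# Assumed line shape, taken from the excerpt quoted in this thread:
#   "... Tower kernel: macvlan_broadcast+0x10a/0x150 [macvlan]"
# The optional "? " covers frames the kernel marks as unreliable.
MACVLAN_FRAME = re.compile(
    r"kernel:\s+(?:\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[macvlan\]"
)


def find_macvlan_frames(lines):
    """Return (line_number, function_name) for every macvlan stack frame."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = MACVLAN_FRAME.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


# The three lines quoted in the post above, used as sample input.
SAMPLE = [
    "2023-08-30 03:59:36 Kernel.Warning 10.0.0.8 Aug 30 03:59:37 Tower kernel: macvlan_broadcast+0x10a/0x150 [macvlan]",
    "2023-08-30 03:59:36 Kernel.Warning 10.0.0.8 Aug 30 03:59:37 Tower kernel: ? _raw_spin_unlock+0x14/0x29",
    "2023-08-30 03:59:36 Kernel.Warning 10.0.0.8 Aug 30 03:59:37 Tower kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]",
]

if __name__ == "__main__":
    for number, function in find_macvlan_frames(SAMPLE):
        print(f"line {number}: {function}")
```

To scan a real capture, pass an open file object instead of `SAMPLE`, for example `find_macvlan_frames(open("SyslogCatchAll-2023-08-30.txt", errors="replace"))`. An empty result only means no lines matched this pattern; it does not rule out other causes.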
navigat0 Posted August 31, 2023

I did try this, but maybe I wasn't properly prepared for the change, because after changing the setting the Docker service would not start. I can try again. Are there any specific steps other than changing the Docker option in Settings with the Docker service stopped?
JorgeB Posted August 31, 2023

Usually that's it, and it should not affect the service start, but post the diags if it does.
navigat0 Posted August 31, 2023

Odd. Made the change again. S L O W E R this time, revisiting the pages after each change, and the service did start. I'll start the clock on uptime again and report if another failure occurs.
navigat0 Posted September 4, 2023

Well, that didn't last very long. Nothing useful in the syslog, but I attached it anyway. I've included diagnostics too, and hopefully something stands out, because I'm at a loss, and until the upgrade this thing was rock solid (even, apparently, with a failing DIMM).
SyslogCatchAll-2023-08-31.txt
SyslogCatchAll-2023-09-01.txt
SyslogCatchAll-2023-09-02.txt
tower-diagnostics-20230903-2005.zip
JorgeB Posted September 4, 2023

Nothing logged suggests a hardware issue, has this been taken care of?
https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173
navigat0 Posted September 12, 2023

Yes, I did address that thread in my initial post. Although c-states were not an issue previously, I have completely disabled them. I will try to flash a newer BIOS, but as you've suggested nothing points to a hardware issue, so I'm not putting a lot of weight on that solving my problem here. I'm less than thrilled to see nothing useful in the syslog prior to the system becoming unresponsive. That being said, it has been running since your post, with downtime associated with my re-installing the RMA'd RAM.
Richy1989 Posted February 23

Hey navigat0, did you find a solution? Do you know what your problem was back then? I seem to have the same problem right now. Thanks for the help! Best Regards