October 14, 2024

Hi. I had an Unraid server that ran stably for several years, 7 days a week, 365 days a year. I recently upgraded to new, more powerful hardware based on an AMD CPU. Since then I keep getting weekly crashes in which Unraid becomes totally unresponsive on the network. I have already made a few changes suggested on forums (disabled Global C-state, and reduced the RAM frequency just in case, although I am NOT overclocking anyway), but the issues are still occurring. This is driving me nuts. That said, when the server becomes totally unresponsive it still responds to the physical power button and then does a graceful shutdown. Attached is an extract of the syslog from the moment the unresponsiveness is triggered. Any help would be appreciated.

Attachment: log.txt
October 14, 2024 (Community Expert)

The problem is the NIC getting dropped. I assume this is onboard?

Oct 14 15:28:29 Tower kernel: igc 0000:0b:00.0 eth0: PCIe link lost, device now detached
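For anyone who wants to check their own syslog for the same symptom, here is a minimal Python sketch that looks for igc "PCIe link lost" lines like the one quoted above. The log path `/var/log/syslog` and the helper name `find_link_lost` are assumptions for illustration; point it at wherever your syslog is actually stored.

```python
import re
from pathlib import Path

# Matches kernel lines of the form quoted in this thread:
#   Oct 14 15:28:29 Tower kernel: igc 0000:0b:00.0 eth0: PCIe link lost, device now detached
LINK_LOST = re.compile(
    r"kernel: igc (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<iface>\S+): PCIe link lost"
)


def find_link_lost(lines):
    """Return (pci_address, interface, full_line) for each igc link-loss line."""
    hits = []
    for line in lines:
        match = LINK_LOST.search(line)
        if match:
            hits.append((match.group("pci"), match.group("iface"), line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumption: the syslog lives here; adjust the path for your setup.
    log_path = Path("/var/log/syslog")
    if log_path.exists():
        with log_path.open(errors="replace") as fh:
            for pci, iface, line in find_link_lost(fh):
                print(f"{iface} ({pci}): {line}")
```

If this prints anything, the timestamps show when the NIC dropped off the PCIe bus, which can then be compared with the time the server became unreachable.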
October 14, 2024 (Solution)

I am dealing with the same issue, in my case on an Intel-based CPU (Gigabyte motherboard with an onboard 2.5G Intel 225 NIC). Same occasional crashes, same syslog error message. About 14 days ago I deactivated all ASPM settings and all C-state settings in the BIOS, as well as everything related to powertop tweaking in Unraid. Since then, no crash. You may want to give this a try yourself.

Nevertheless, this should not be the final solution for me, as the server consumes more energy than necessary, but to track things down I started "from scratch". I am abroad at the moment, so I cannot tweak anything in the BIOS. I will try to re-activate things as soon as I come back.
October 14, 2024

I'm having similar issues with server instability. I tried replacing a 5-year-old flash drive, but no joy. The server has to be hard reset at least once a week. This is lowering the WAF, and of course I will be away from home for the next few months and unable to troubleshoot. Syslog attached if anyone could offer any insights.

Attachment: syslog-192.168.0.150.log
October 14, 2024 (Community Expert)

There appears to be a container constantly restarting. Check the uptimes to see if you can find out which one it is.
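One way to check container uptimes from the command line is sketched below in Python. It assumes the `docker` CLI is available on the server and uses `docker ps --format` with the `.Names` and `.Status` fields; the helper name `suspicious_containers` and the "seconds or minutes" threshold are illustrative choices, not anything prescribed in this thread.

```python
import shutil
import subprocess

# Status words that suggest a container started only very recently.
SHORT_UNITS = ("second", "seconds", "minute", "minutes")


def suspicious_containers(ps_output):
    """Given 'name<TAB>status' lines, return names that are restarting or only just up."""
    flagged = []
    for line in ps_output.splitlines():
        if "\t" not in line:
            continue
        name, status = line.split("\t", 1)
        words = status.lower().split()
        if not words:
            continue
        if words[0] == "restarting":
            flagged.append(name)
        elif words[0] == "up" and any(w in SHORT_UNITS for w in words[1:]):
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Only runs if the docker CLI is actually installed on this machine.
    if shutil.which("docker"):
        result = subprocess.run(
            ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
            capture_output=True, text=True, check=False,
        )
        for name in suspicious_containers(result.stdout):
            print(f"check this container: {name}")
```

A container that was started on purpose a minute ago will also be flagged, so run it a few times over several minutes: a name that keeps reappearing with a short uptime is the likely culprit.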
October 14, 2024 (Author)

14 hours ago, JorgeB said:
"Problem with the NIC getting dropped, I assume this is onboard? Oct 14 15:28:29 Tower kernel: igc 0000:0b:00.0 eth0: PCIe link lost, device now detached"

Yes, this is the Intel 2.5 Gb Ethernet NIC onboard the ROG STRIX X670E-E motherboard.
October 14, 2024 (Author)

12 hours ago, JayDee73 said:
"I am dealing with the same issue. In my case running an Intel-based CPU (Gigabyte Mobo with 2.5G Intel 225 NIC onboard). Same occasional crashes, same syslog error message. About 14 days ago I deactivated all ASPM BIOS settings and all C-State settings. Also everything related to powertop tweaking in Unraid. Since then, no crash. You may give this a try yourself… Nevertheless this should not be the final solution for me, as the server consumes more energy than necessary…but to track things down I started 'from scratch'. I am abroad at the moment, so I cannot tweak anything in BIOS. I will try to re-activate things as soon as I come back."

Thank you very much. I have just deactivated ASPM in the BIOS (C-state was already disabled in the BIOS, and I have not installed powertop in Unraid). I will continue to monitor the server and report back to this forum with updates.
October 18, 2024

Crashes continue, and now I'm getting out-of-memory errors? Diagnostics and post-crash syslog attached.

Attachments: tower-diagnostics-20241018-1148.zip, syslog-192.168.0.150.log
October 18, 2024 (Community Expert)

1 hour ago, wewantrice said:
"Crashes continue"

Please start your own thread. The OP's issue may still not be resolved, and it can be confusing trying to help different users at the same time.
November 28, 2024 (Author)

On 10/15/2024 at 8:16 AM, Jerem said:
"Thank you very much. I have just deactivated the ASPM in the BIOS (C-state was already disabled in BIOS and I have not installed powertop in Unraid). I will continue to monitor the server and revert to this forum with updates."

Unraid has not crashed for 1 month and 10 days since I deactivated ASPM in the BIOS! Thank you @JayDee73! Deactivating ASPM was the key BIOS change that solved the issue. For reference, the attached screenshot shows all the BIOS parameter values that I ended up changing from their defaults.
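As a small cross-check from the running system, the Linux kernel exposes its own PCIe ASPM policy in sysfs at `/sys/module/pcie_aspm/parameters/policy`, with the active entry shown in square brackets. The Python sketch below reads it; note that this shows only the kernel-side policy and cannot read the BIOS setting changed in this thread (the `default` policy means the kernel leaves the firmware's configuration as it is). The helper name `active_aspm_policy` is made up for illustration.

```python
from pathlib import Path

# sysfs file holding the kernel's ASPM policy, e.g.
#   [default] performance powersave powersupersave
ASPM_POLICY_FILE = Path("/sys/module/pcie_aspm/parameters/policy")


def active_aspm_policy(text):
    """Return the bracketed (active) policy name, or None if none is marked."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


if __name__ == "__main__":
    # The file is absent on kernels built without PCIe ASPM support.
    if ASPM_POLICY_FILE.exists():
        raw = ASPM_POLICY_FILE.read_text()
        print("kernel ASPM policy:", active_aspm_policy(raw))
    else:
        print("no ASPM policy file found on this system")
```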