October 4, 2023: Hello, this morning when I woke up, Unraid was unresponsive until I rebooted it, so I was not able to collect logs beforehand. This is the second time this has happened. I checked the system logs but can't find any clue as to what could be causing it. Both times I was asleep, so I don't know exactly when it happened. Can someone point me in the right direction to troubleshoot this? Thanks!! Attached: server1a-diagnostics-20231004-0946.zip
October 4, 2023 (Author): Ohh, okay, that explains why the logs I had were useless. I will enable the syslog server and see. The last time it hung was about a month ago, so it may be a long time before I get useful logs... Thanks!
October 9, 2023 (Author): It just happened again. This time I was not asleep, and I was not doing anything special with Unraid when it happened. Attached is the diagnostics file, this time with the syslog server enabled: server1a-diagnostics-20231009-1050_latest.zip
October 9, 2023 (Community Expert): 1 hour ago, SHALcL said: "this time with syslog server enabled." You need to post the syslog separately; it does not come with the diags.
October 9, 2023 (Author): Damn, I don't have the syslogs. If you check my previous post history you will see that I have some weird network behaviour on my Unraid server, and that kept it from capturing logs since the last reboot... Now I'm capturing logs again, so let's wait for another crash... Sorry.
October 9, 2023 (Community Expert): 4 minutes ago, SHALcL said: "Damn, I don't have the syslogs. [...] Now I'm capturing logs again, let's wait for another crash... Sorry." If you have the Mirror to Flash option set for the syslog server, it does not need the network to be working to capture a log to the 'logs' folder on the flash drive.
October 9, 2023 (Author): 1 minute ago, itimpi said: "If you have the Mirror to Flash option set for the syslog server then it does not need the network working [...]" Yeah, but I did not enable it because I can't trigger the crash (or don't know how yet), and I have to leave the server running for weeks to months before it happens again, and I don't want to wear out my flash drive.
October 16, 2023: I see this in the logs:

Oct 17 00:00:13 Server1A kernel: r8169 0000:0b:00.0 eth1: RTL8168h/8111h, 22:09:5c:07:20:4f, XID 541, IRQ 81
Oct 17 00:00:13 Server1A kernel: r8169 0000:0b:00.0 eth1: jumbo features [frames: 9194 bytes, tx checksumming: ko]

You have a Realtek NIC (eth0). Realtek NICs are troublesome on Linux because the drivers are not well maintained. You are also using Jumbo Frames. This is not a good combination. Jumbo frames are discouraged because it is hard to set up a network to handle them properly. Do the following:

1. Set the MTUs on all networking back to default.
2. Reconfigure your network setup to either use eth1 as a backup to eth0 (bond with both NICs), or use eth1 only.
3. Get your system stable, then work on network improvements a little at a time and watch for issues.
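To confirm the first step, it helps to check the MTU each interface is actually configured with. As far as I can tell from the r8169 driver, the `jumbo features [frames: 9194 bytes ...]` line is printed when the driver probes the NIC at boot and describes what the hardware supports, not the MTU currently in use. The minimal sketch below, assuming a Python 3 interpreter is available on the server, reads `/sys/class/net/<iface>/mtu` (the same value `ip link` reports) and flags anything above 1500. The helper names and the choice to skip `lo` are my own, not anything Unraid provides.

```python
from pathlib import Path
from typing import Dict

DEFAULT_MTU = 1500
# Loopback normally runs with a very large MTU (65536), so it is not a jumbo-frame concern.
IGNORED_INTERFACES = {"lo"}


def read_mtus(sys_net: Path = Path("/sys/class/net")) -> Dict[str, int]:
    """Return {interface: configured MTU} as reported by sysfs."""
    mtus: Dict[str, int] = {}
    if not sys_net.is_dir():  # not a Linux host, nothing to read
        return mtus
    for iface_dir in sorted(sys_net.iterdir()):
        mtu_file = iface_dir / "mtu"
        # Plain files such as bonding_masters have no mtu entry and are skipped.
        if mtu_file.is_file():
            mtus[iface_dir.name] = int(mtu_file.read_text().strip())
    return mtus


def jumbo_interfaces(mtus: Dict[str, int], default: int = DEFAULT_MTU) -> Dict[str, int]:
    """Keep only interfaces whose MTU exceeds the standard Ethernet default."""
    return {
        name: mtu
        for name, mtu in mtus.items()
        if name not in IGNORED_INTERFACES and mtu > default
    }


if __name__ == "__main__":
    flagged = jumbo_interfaces(read_mtus())
    if flagged:
        for name, mtu in sorted(flagged.items()):
            print(f"{name}: MTU {mtu} (larger than {DEFAULT_MTU})")
    else:
        print(f"No interface is configured above MTU {DEFAULT_MTU}.")
```

The other end of each link (the switch, or the Windows PC on the direct 10G connection) still has to be checked separately.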
October 17, 2023 (Author): Thanks for the reply! I have the 2.5G NIC connected to the LAN, and the 10G NIC, with jumbo packets, connected directly to a PC. I will try disabling jumbo packets between the server and the PC and see if it stops crashing.
October 18, 2023 (Author): It happened again, and this time without jumbo packets enabled. EDIT: I think it's OOM this time... server1a-diagnostics-20231019-0012.zip (Edited October 18, 2023 by SHALcL)
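One way to check an OOM suspicion against the saved syslog is to look for the kernel's OOM-killer messages, which on recent kernels read `Out of memory: Killed process <pid> (<name>)` (older kernels print `Kill process` instead). The sketch below is a hypothetical helper, not an Unraid tool: it counts which process names were killed, assuming the log files are passed on the command line.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Matches e.g. "kernel: Out of memory: Killed process 1234 (name) total-vm:..."
# and the older "Out of memory: Kill process 1234 (name) ..." wording.
OOM_KILL = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def oom_victims(lines: Iterable[str]) -> Counter:
    """Count how often each process name was killed by the OOM killer."""
    victims: Counter = Counter()
    for line in lines:
        match = OOM_KILL.search(line)
        if match:
            victims[match.group(2)] += 1
    return victims


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            counts = oom_victims(fh)
        if not counts:
            print(f"{path}: no OOM kills found")
        for name, n in counts.most_common():
            print(f"{path}: {name} killed {n} time(s)")
```

If nothing turns up, the hang was probably not an OOM event, and a call trace or driver error is the next thing to look for.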
November 2, 2023 (Author): It happened again, this time for no apparent reason. server1a-diagnostics-20231102-1150_latest.zip
November 3, 2023: You are still using Jumbo frames:

Nov 2 11:48:57 Server1A kernel: r8169 0000:0b:00.0 eth0: RTL8168h/8111h, 22:09:5c:07:20:4f, XID 541, IRQ 47
Nov 2 11:48:57 Server1A kernel: r8169 0000:0b:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]

Recommendations:

1. Remove Jumbo frames. You have to be sure they are not enabled anywhere on your network. IMHO, Jumbo frames offer little improvement and are not worth the headaches.
2. Update your gpustat plugin.
3. Try setting up a bridge with both eth0 and eth1 in the bridge and use the backup configuration. This will allow eth1 to take over if eth0 fails.
4. Get an Intel NIC.
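After setting up the backup (active-backup) configuration, the Linux bonding driver reports its state in `/proc/net/bonding/<bond>`, including the bonding mode, the currently active NIC and the link status of each member. The sketch below parses that file; the bond name `bond0` is an assumption about how the bond is named on this server, and the helper itself is hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_bonding(text: str) -> dict:
    """Extract mode, active member and per-member MII status from /proc/net/bonding output."""
    info: dict = {"mode": None, "active": None, "slaves": {}}
    slaves: Dict[str, Optional[str]] = info["slaves"]
    current: Optional[str] = None
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank lines and section separators
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            info["mode"] = value
        elif key == "Currently Active Slave":
            info["active"] = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = None
        elif key == "MII Status" and current is not None:
            # The bond-wide "MII Status" line appears before any member section and is skipped.
            slaves[current] = value
    return info


if __name__ == "__main__":
    bond = Path("/proc/net/bonding/bond0")  # assumed bond name
    if bond.is_file():
        info = parse_bonding(bond.read_text())
        print(f"mode: {info['mode']}, active: {info['active']}")
        for name, status in info["slaves"].items():
            print(f"  {name}: MII {status}")
    else:
        print(f"{bond} not found (no bond configured under that name?)")
```

In active-backup mode the mode line should read `fault-tolerance (active-backup)`, and the active member should switch to eth1 if eth0 goes down.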
November 3, 2023 (Author): Both interfaces are using the default MTU of 1500. At the other end, on my Windows computer, I also have jumbo packets disabled (the 10G NIC is a direct connection between the server and the workstation).
November 9, 2023 (Author): Hello, it happened once again... I don't have any MTU above the default 1500, and I updated everything. This random freezing is starting to get old... server1a-diagnostics-20231102-1150_latest1.zip
November 9, 2023 (Author): Just now, JorgeB said: "Did you enable the syslog server as mentioned?" Yes, I did.
November 9, 2023 (Author): Just now, JorgeB said: "Then please post it as well." Sorry, I assumed the diagnostics would already include them. syslog-127.0.0.1.log syslog-127.0.0.1_1.log syslog-127.0.0.1_2.log syslog-127.0.0.1_3.log syslog-127.0.0.1_4.log
November 9, 2023 (Community Expert, Solution): On 10/9/2023 at 10:53 AM, JorgeB said: "On 10/9/2023 at 9:52 AM, SHALcL said: this time with syslog server enabled. You need to post the syslog separately; it does not come with the diags."

Nov 7 23:45:46 Server1A kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Nov 7 23:45:46 Server1A kernel: ? _raw_spin_unlock+0x14/0x29
Nov 7 23:45:46 Server1A kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

Macvlan call traces will usually end up crashing the server. Switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; Advanced View must be enabled, top right).
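Spotting these frames by hand across several syslog files is tedious. The sketch below, a hypothetical helper rather than anything built into Unraid, scans the files given on the command line for kernel stack frames from the macvlan module (the `macvlan_broadcast` / `macvlan_process_broadcast` lines quoted above) and prints their timestamps. It assumes the standard syslog line prefix (`Nov  7 23:45:46 host ...`).

```python
import re
import sys
from typing import Iterable, List, Tuple

# A kernel stack frame from the macvlan module, optionally marked "? " as unreliable, e.g.
#   "kernel: macvlan_broadcast+0x10a/0x150 [macvlan]"
MACVLAN_FRAME = re.compile(
    r"kernel: (?:\? )?(macvlan_\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[macvlan\]"
)
# Standard syslog timestamp prefix, e.g. "Nov  7 23:45:46".
TIMESTAMP = re.compile(r"^(\w{3}\s+\d+\s[\d:]{8})")


def macvlan_frames(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, function) for every macvlan stack frame found."""
    hits = []
    for line in lines:
        frame = MACVLAN_FRAME.search(line)
        if frame:
            stamp = TIMESTAMP.match(line)
            hits.append((stamp.group(1) if stamp else "?", frame.group(1)))
    return hits


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            for stamp, func in macvlan_frames(fh):
                print(f"{path}: {stamp} {func}")
```

Run against the posted files (for example `syslog-127.0.0.1*.log`), any output means the macvlan issue showed up in that log.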
November 9, 2023 (Author): Changing this setting right away. Fingers crossed. Thanks for looking into it! If it does not crash in a couple of weeks, I will mark this as the solution. (Edited November 9, 2023 by SHALcL)
November 9, 2023 (Community Expert): Make sure you reboot after changing the setting, in case there has already been a call trace.
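After the reboot, one way to confirm the change took effect is to list the Docker networks and their drivers with `docker network ls --format`. My assumption is that Unraid's custom networks (e.g. one named after `br0`) should then report `ipvlan` rather than `macvlan`; the sketch below just flags any network still using the macvlan driver. The helper names are hypothetical.

```python
import subprocess
from typing import Dict, List


def parse_network_ls(output: str) -> Dict[str, str]:
    """Parse 'name<TAB>driver' lines into {network name: driver}."""
    drivers: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, driver = line.partition("\t")
        drivers[name] = driver.strip()
    return drivers


def macvlan_networks(drivers: Dict[str, str]) -> List[str]:
    """Names of networks that still use the macvlan driver."""
    return sorted(name for name, driver in drivers.items() if driver == "macvlan")


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["docker", "network", "ls", "--format", "{{.Name}}\t{{.Driver}}"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"could not run docker: {err}")
    else:
        leftover = macvlan_networks(parse_network_ls(out))
        print("still macvlan: " + ", ".join(leftover) if leftover else "no macvlan networks")
```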
November 9, 2023 (Author): 41 minutes ago, JorgeB said: "Make sure you reboot after changing the setting, in case there's already been a call trace." I will do it tonight. Many thanks.