Mathervius Posted April 13, 2020

Hi Guys,

Server stats:
Dell T610
Dual Xeon E5530 CPUs
44GB RAM
UNRAID 6.8.3

Plugins:
CA Auto Update Applications
CA Backup / Restore
CA Config Editor
Custom Tab
Local Master
SSD TRIM
System Statistics
Fix common problems
Nerd Tools
NUT
Server Layout
Speed Test
Theme engine
Tips and Tweaks
Unassigned Drives (both)
User scripts

I've been running UNRAID for about a year now with no issues and it has been solid, until now! About every two days my server crashes. I've searched the web and the only things I can really find have to do with a kernel issue that affects the NIC. I honestly have no idea, because when this happens I can't access anything on the server. The monitor plugged into the server just keeps repeating the same error message over and over, and the only thing I can do is hold down the power button and force a reboot.

The only similar issues I've been able to find are pretty dated and involve other operating systems. Those issues seem to have something to do with the kernel. I'm attaching a picture of the error. Apologies, because I can't access any other diagnostics when this happens.

Syslog and image of the error attached...

tower-syslog-20200413-0241.zip
Mathervius (Author) Posted April 13, 2020

Just bumping this in the hopes that someone has some input or experience with issues like this...
Dissones4U Posted April 14, 2020 (edited)

5 hours ago, Mathervius said:
    bumping this in the hopes that someone has some input or experience

Unfortunately I don't have experience with this, and there really isn't much information provided, but here are some things for you to consider until someone else chimes in...

On 4/12/2020 at 10:49 PM, Mathervius said:
    Apologies because I can't access any other diagnostics

Enable the syslog server. The attached log is only a few minutes long and doesn't really show anything other than:

Quote
    Apr 12 18:47:30 Tower kernel: traps: ffdetect[12663] general protection ip:4042af sp:7ffd6b77a880 error:0 in ffdetect[403000+c000]
    Apr 12 18:47:30 Tower kernel: traps: ffdetect[12664] general protection ip:4042af sp:7ffec7e115e0 error:0 in ffdetect[403000+c000]

All I could find on this error (↑) was that it may be related to emby?

On 4/12/2020 at 10:49 PM, Mathervius said:
    I've been running UNRAID for about a year now and have no issues and it has been solid, until now!

It is absolutely possible that hardware is failing, but I have to ask whether something has changed recently: hardware, software (new or updated), or you may have moved the rig and knocked something loose, etc. If not, then a piece of hardware is failing (possibly the NIC).

This post suggests kernel crashing may be related to setting MTU greater than 1500 (Jumbo frames). I've always used the default 1500 and nothing more, so I'm not familiar with making changes to this setting. Again, if the setting has been that way for a year then I'd question whether something else has changed recently.

Things to try:
Start in safe mode, no dockers/vms/plugins etc.
Enable the syslog server.
Pull diagnostics frequently until the crash and upload the most recent.
If your RAM is not ECC, run memtest for 24 hours and add the results along with your diagnostics.
You may want to consider tailing the log as well. This lets you get a screenshot of things that could not be written to the log before the crash.

To tail the log, attach your monitor and keyboard and enter this at the command line:

Quote
    tail -n 30 /var/log/syslog -f

This will show the last 30 lines and keep following the file. You can change -n or remove it (the default is 10). To quit tail...

Quote
    Ctrl + C

Hopefully this will get you going in the right direction.

Edited April 14, 2020 by Dissones4U
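For anyone who ends up with a longer log from the syslog server, it can help to tally which process is hitting these traps rather than reading them by eye. Below is a minimal Python sketch that pulls the process name, PID, and fault type out of lines shaped like the two ffdetect lines quoted above. The regular expression is inferred from those two lines only, and the names TRAP_RE, parse_trap, and count_traps are made up for this example, so treat it as a starting point rather than a complete parser for kernel trap messages.

```python
import re
from collections import Counter

# Pattern inferred from the two "traps:" lines quoted in this thread.
# Other kernel versions may format the message differently (assumption).
TRAP_RE = re.compile(
    r"kernel: traps: (?P<proc>\S+)\[(?P<pid>\d+)\] (?P<fault>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+)"
)


def parse_trap(line):
    """Return process, PID and fault type for a trap line, or None."""
    match = TRAP_RE.search(line)
    if match is None:
        return None
    return {
        "process": match["proc"],
        "pid": int(match["pid"]),
        "fault": match["fault"],
    }


def count_traps(lines):
    """Count trap lines per (process, fault) pair."""
    counts = Counter()
    for line in lines:
        trap = parse_trap(line)
        if trap is not None:
            counts[(trap["process"], trap["fault"])] += 1
    return counts


sample = [
    "Apr 12 18:47:30 Tower kernel: traps: ffdetect[12663] general protection "
    "ip:4042af sp:7ffd6b77a880 error:0 in ffdetect[403000+c000]",
    "Apr 12 18:47:30 Tower kernel: traps: ffdetect[12664] general protection "
    "ip:4042af sp:7ffec7e115e0 error:0 in ffdetect[403000+c000]",
]

for key, count in count_traps(sample).items():
    print(key, count)
```

To run it against a real file, pass the open file object to count_traps instead of the sample list. A single process dominating the tally would point at one application rather than at the hardware in general.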
Mathervius (Author) Posted April 14, 2020

Thank you for getting back to me!

The only recent changes are adding syncthing and tdarr_aio. They had both been running without any issues until this all started about two weeks ago.

I believe you are correct about the ffdetect error being emby. That was my impression as well.

My MTU is the default of 1500.

I'm going to set up the syslog server now. Thank you for that idea!

I have tdarr_aio off for now as it isn't essential, and I'm going to wait a few days and see if it happens again. It seems to happen about every 48 hours, give or take a little.
Mathervius (Author) Posted April 21, 2020

I just wanted to post a quick update about this issue. Since turning off the tdarr_aio docker there have been no crashes for 8 days, which is more like my normal experience with UNRAID. I really liked that container but for now I'm going back to my own solution.
Mathervius (Author) Posted May 27, 2020

Well, the crashes are back....

I was finally able to set up Graylog, since my other syslog server (unraid) wasn't capturing the issue. I've attached the logs. The other issue is that Graylog doesn't seem to have exported everything in the exact order that it came in. Sorry about that...

This came in after I rebooted the server and had it running for about an hour:

May 26 19:17:08 Tower kernel: RIP: 0010:__nf_conntrack_confirm+0xa0/0x69e
May 26 19:17:08 Tower kernel: Code: 04 e8 56 fb ff ff 44 89 f2 44 89 ff 89 c6 41 89 c4 e8 7f f9 ff ff 48 8b 4c 24 08 84 c0 75 af 48 8b 85 80 00 00 00 a8 08 74 26 <0f> 0b 44 89 e6 44 89 ff 45 31 f6 e8 95 f1 ff ff be 00 02 00 00 48
May 26 19:17:08 Tower kernel: RSP: 0018:ffff8885a99c3d58 EFLAGS: 00010202
May 26 19:17:08 Tower kernel: RAX: 0000000000000188 RBX: ffff888574348500 RCX: ffff888ad98fce18
May 26 19:17:08 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff81e091c4
May 26 19:17:08 Tower kernel: RBP: ffff888ad98fcdc0 R08: 00000000e48f2dcb R09: ffffffff81c8aa80
May 26 19:17:08 Tower kernel: R10: 0000000000000158 R11: ffffffff81e91080 R12: 000000000000baf1
May 26 19:17:08 Tower kernel: R13: ffffffff81e91080 R14: 0000000000000000 R15: 000000000000eaf0
May 26 19:17:08 Tower kernel: FS: 0000000000000000(0000) GS:ffff8885a99c0000(0000) knlGS:0000000000000000
May 26 19:17:08 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 26 19:17:08 Tower kernel: CR2: 0000146fb1114000 CR3: 0000000001e0a000 CR4: 00000000000006e0
May 26 19:17:08 Tower kernel: Call Trace:
May 26 19:17:08 Tower kernel: <IRQ>
May 26 19:17:08 Tower kernel: ipv4_confirm+0xaf/0xb9
May 26 19:17:08 Tower kernel: nf_hook_slow+0x3a/0x90
May 26 19:17:08 Tower kernel: ip_local_deliver+0xad/0xdc
May 26 19:17:08 Tower kernel: ? ip_sublist_rcv_finish+0x54/0x54
May 26 19:17:08 Tower kernel: ip_sabotage_in+0x38/0x3e
May 26 19:17:08 Tower kernel: nf_hook_slow+0x3a/0x90
May 26 19:17:08 Tower kernel: ip_rcv+0x8e/0xbe
May 26 19:17:08 Tower kernel: ? ip_rcv_finish_core.isra.0+0x2e1/0x2e1
May 26 19:17:08 Tower kernel: __netif_receive_skb_one_core+0x53/0x6f
May 26 19:17:08 Tower kernel: process_backlog+0x77/0x10e
May 26 19:17:08 Tower kernel: net_rx_action+0x107/0x26c
May 26 19:17:08 Tower kernel: __do_softirq+0xc9/0x1d7
May 26 19:17:08 Tower kernel: do_softirq_own_stack+0x2a/0x40
May 26 19:17:08 Tower kernel: </IRQ>
May 26 19:17:08 Tower kernel: do_softirq+0x4d/0x5a
May 26 19:17:08 Tower kernel: netif_rx_ni+0x1c/0x22
May 26 19:17:08 Tower kernel: macvlan_broadcast+0x111/0x156 [macvlan]
May 26 19:17:08 Tower kernel: ? __switch_to_asm+0x41/0x70
May 26 19:17:08 Tower kernel: macvlan_process_broadcast+0xea/0x128 [macvlan]
May 26 19:17:08 Tower kernel: process_one_work+0x16e/0x24f
May 26 19:17:08 Tower kernel: worker_thread+0x1e2/0x2b8
May 26 19:17:08 Tower kernel: ? rescuer_thread+0x2a7/0x2a7
May 26 19:17:08 Tower kernel: kthread+0x10c/0x114
May 26 19:17:08 Tower kernel: ? kthread_park+0x89/0x89
May 26 19:17:08 Tower kernel: ret_from_fork+0x35/0x40
May 26 19:17:08 Tower kernel: ---[ end trace b58796bea918bc16 ]---

It didn't crash the server this time though. Sorry, I just don't have experience with this kind of issue...

graylog-search-result-relative-0.txt
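A quick way to see which loadable kernel module a trace like this passes through is to look for the bracketed tag at the end of a stack frame, such as [macvlan]. The Python sketch below counts those tags. It assumes each frame looks like symbol+0xOFFSET/0xSIZE with an optional trailing [module], which is how the frames in the trace above are printed; FRAME_RE and modules_in_trace are names invented for this example.

```python
import re
from collections import Counter

# One stack frame as it appears in the trace above, for example
#   "macvlan_broadcast+0x111/0x156 [macvlan]"
# The trailing "[module]" tag is only present on some frames.
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def modules_in_trace(lines):
    """Count stack frames per module tag in a kernel call trace."""
    counts = Counter()
    for line in lines:
        match = FRAME_RE.search(line)
        if match and match.group("module"):
            counts[match.group("module")] += 1
    return counts


sample = [
    "May 26 19:17:08 Tower kernel: netif_rx_ni+0x1c/0x22",
    "May 26 19:17:08 Tower kernel: macvlan_broadcast+0x111/0x156 [macvlan]",
    "May 26 19:17:08 Tower kernel: ? __switch_to_asm+0x41/0x70",
    "May 26 19:17:08 Tower kernel: macvlan_process_broadcast+0xea/0x128 [macvlan]",
]

for module, frames in modules_in_trace(sample).items():
    print(module, frames)
```

In the trace above, the only module-tagged frames are the two macvlan ones, so that is the module worth searching the forum for.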
JorgeB Posted May 27, 2020

Macvlan call traces are usually related to having dockers with a custom IP address:
Mathervius (Author) Posted May 27, 2020

6 hours ago, johnnie.black said:
    Macvlan call traces are usually related to having dockers with a custom IP address:

OK, I read through that post, but my dockers don't have an IP address assigned to them. Mine are Host, Bridge, Proxynet (letsencrypt), and a VPN container. Could one of those networks cause the macvlan issue?

Maybe it's because I have docker set to be able to communicate with the host network (Host access to custom networks)?
Mathervius (Author) Posted May 27, 2020

After reading through a bunch of forum posts I tried putting the docker network onto its own NIC. Any time I change the network settings I am no longer able to reach the machine over LAN.

I then deleted network.cfg and rebooted into GUI mode. I made the suggested adjustments to put docker on its own NIC and once again I lost all network connectivity.

I have now adjusted eth0 to: Bonding = no, Enable bridge = yes, Bridging members of br0 = eth0. If I try to set eth0 to a static IP I lose network connectivity again. It already has a static IP assigned from pfSense, but previously I had it set as static in UNRAID as well and it worked with no problem.

eth1, eth2, and eth3 all show as not configured now. If I make any adjustments to them I lose network connectivity and have to delete the network.cfg file and reboot in order to get connected again.

The dashboard still shows that I'm using bond0, which is what it always showed before. Shouldn't it match the network settings page, though?

Sorry for the long post but I am genuinely stuck here.
Mathervius (Author) Posted May 27, 2020

I see this in the log a couple of times:

Tower kernel: bond0: the permanent HWaddr of eth0 - MAC:ADDRESS - is still in use by bond0 - set the HWaddr of eth0 to a different address to avoid conflicts