Hoopster

[6.5.0]+ Call Traces when assigning IP to Dockers


Posted (edited)

Description:

This has now happened three times, each time after assigning an IP address to a docker, which is fairly good evidence that, at least on my system, I cannot assign a separate IP address to any docker without triggering call traces.

 

I have tried assigning an IP address to the following dockers:

 

UniFi

OpenVPN-AS

Pi-Hole

 

Every time, call traces appeared in the syslog hours after the IP address assignment and continued until I either uninstalled the docker or removed the IP address assignment and let it go back to using the unRAID host IP.
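For reference, each assignment was made through the unRAID docker template GUI; in docker run terms it amounts to roughly the following (the container name and address here are examples, not my exact values):

# unRAID's "br0" is a macvlan-backed docker network built on the host bridge;
# the "Fixed IP address" field in the template becomes the --ip flag
docker run -d --name=pihole --network=br0 --ip=192.168.1.100 pihole/pihole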

 

How to Reproduce:  Assign a static IP address to a docker

 

Expected Result:  No call traces and the system functions normally

 

Actual Result: ip/macvlan call traces appear in the syslog

 

Other Information: The latest call traces started after installing Pi-Hole on the evening of March 25. They began appearing on the 26th and several have been generated every day since then. In every case, all call traces disappear once the IP address assigned to the docker is removed and the docker goes back to using the host IP address.
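A quick way to watch for new traces as they happen, since unRAID writes them to the syslog:

# follow the syslog and print a few lines of context around each new call trace
tail -f /var/log/syslog | grep -i -A 3 'call trace'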

 

Perhaps I will try one of the 6.5.1 RCs and see if any changes in kernel, etc. impact this.

 

Diagnostics attached.

 

medianas-diagnostics-20180328-0747.zip

Posted

I updated my main server (where the call traces are occurring) to 6.5.1 RC2 on the off chance that a kernel change would resolve the call trace issue. Although it took a little longer for them to reappear, the call traces are back and, once again, are related to IP addressing, macvlan, etc.

 

My next step will be to remove Pi-Hole and see if the call traces disappear.  I am fairly confident that they will.

 

Latest diagnostics attached.

 

 

medianas-diagnostics-20180330-0242.zip

Posted (edited)

@limetech @bonienl Since this is obviously not a general defect with unRAID/docker networking, I don't expect any resolution from you. However, I will continue to experiment and look for a resolution on my own, and document it here in case it is helpful to anyone now or in the future.

 

I updated the server to 6.5.1 RC3 three days ago and, so far, have seen only one call trace. It is still the usual suspect (IP/macvlan), but the call traces appear to have lessened in frequency - or my recent usage patterns simply have not triggered them.

 

Since it appears my experience with call traces when assigning IP addresses to dockers is not shared by other users, perhaps it is a hardware issue unique to my server. The unRAID/docker LAN is currently on eth0. My MB has two NICs, so perhaps I will try assigning the dockers to eth1 and see how that affects the issue. I could try bonding as well. I don't know if any of that will help, but I suppose it is worth a try. I believe the GUI/unRAID is tied to eth0, correct?

 

Quote

Apr  1 01:16:52 MediaNAS kernel: CPU: 0 PID: 15111 Comm: kworker/0:2 Not tainted 4.14.31-unRAID #1
Apr  1 01:16:52 MediaNAS kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C236 WSI, BIOS P2.50 12/12/2017
Apr  1 01:16:52 MediaNAS kernel: Workqueue: events macvlan_process_broadcast [macvlan]
Apr  1 01:16:52 MediaNAS kernel: task: ffff880759889d00 task.stack: ffffc90008cc8000
Apr  1 01:16:52 MediaNAS kernel: RIP: 0010:__nf_conntrack_confirm+0x97/0x4d6
Apr  1 01:16:52 MediaNAS kernel: RSP: 0018:ffff88086dc03d30 EFLAGS: 00010202
Apr  1 01:16:52 MediaNAS kernel: RAX: 0000000000000188 RBX: 000000000000c318 RCX: 0000000000000001
Apr  1 01:16:52 MediaNAS kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff81c09260
Apr  1 01:16:52 MediaNAS kernel: RBP: ffff880721881700 R08: 0000000000000101 R09: ffff8807daca1700
Apr  1 01:16:52 MediaNAS kernel: R10: 0000000000000098 R11: 0000000000000000 R12: ffffffff81c8b080
Apr  1 01:16:52 MediaNAS kernel: R13: 0000000000007caa R14: ffff88037f170a00 R15: ffff88037f170a58
Apr  1 01:16:52 MediaNAS kernel: FS:  0000000000000000(0000) GS:ffff88086dc00000(0000) knlGS:0000000000000000
Apr  1 01:16:52 MediaNAS kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr  1 01:16:52 MediaNAS kernel: CR2: 00003bcfb3cd2000 CR3: 0000000001c0a005 CR4: 00000000003606f0
Apr  1 01:16:52 MediaNAS kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr  1 01:16:52 MediaNAS kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr  1 01:16:52 MediaNAS kernel: Call Trace:
Apr  1 01:16:52 MediaNAS kernel: <IRQ>
Apr  1 01:16:52 MediaNAS kernel: ipv4_confirm+0xac/0xb4 [nf_conntrack_ipv4]
Apr  1 01:16:52 MediaNAS kernel: nf_hook_slow+0x37/0x96
Apr  1 01:16:52 MediaNAS kernel: ip_local_deliver+0xab/0xd3
Apr  1 01:16:52 MediaNAS kernel: ? inet_del_offload+0x3e/0x3e
Apr  1 01:16:52 MediaNAS kernel: ip_rcv+0x311/0x346
Apr  1 01:16:52 MediaNAS kernel: ? ip_local_deliver_finish+0x1b8/0x1b8
Apr  1 01:16:52 MediaNAS kernel: __netif_receive_skb_core+0x6ba/0x733
Apr  1 01:16:52 MediaNAS kernel: ? enqueue_task_fair+0x94/0x42c
Apr  1 01:16:52 MediaNAS kernel: process_backlog+0x8c/0x12d
Apr  1 01:16:52 MediaNAS kernel: net_rx_action+0xfb/0x24f
Apr  1 01:16:52 MediaNAS kernel: __do_softirq+0xcd/0x1c2
Apr  1 01:16:52 MediaNAS kernel: do_softirq_own_stack+0x2a/0x40
Apr  1 01:16:52 MediaNAS kernel: </IRQ>
Apr  1 01:16:52 MediaNAS kernel: do_softirq+0x46/0x52
Apr  1 01:16:52 MediaNAS kernel: netif_rx_ni+0x21/0x35
Apr  1 01:16:52 MediaNAS kernel: macvlan_broadcast+0x117/0x14f [macvlan]
Apr  1 01:16:52 MediaNAS kernel: ? __switch_to_asm+0x24/0x60
Apr  1 01:16:52 MediaNAS kernel: macvlan_process_broadcast+0xe4/0x114 [macvlan]
Apr  1 01:16:52 MediaNAS kernel: process_one_work+0x14c/0x23f
Apr  1 01:16:52 MediaNAS kernel: ? rescuer_thread+0x258/0x258
Apr  1 01:16:52 MediaNAS kernel: worker_thread+0x1c3/0x292
Apr  1 01:16:52 MediaNAS kernel: kthread+0x111/0x119
Apr  1 01:16:52 MediaNAS kernel: ? kthread_create_on_node+0x3a/0x3a
Apr  1 01:16:52 MediaNAS kernel: ? SyS_exit_group+0xb/0xb
Apr  1 01:16:52 MediaNAS kernel: ret_from_fork+0x35/0x40
Apr  1 01:16:52 MediaNAS kernel: Code: 48 c1 eb 20 89 1c 24 e8 24 f9 ff ff 8b 54 24 04 89 df 89 c6 41 89 c5 e8 a9 fa ff ff 84 c0 75 b9 49 8b 86 80 00 00 00 a8 08 74 02 <0f> 0b 4c 89 f7 e8 03 ff ff ff 49 8b 86 80 00 00 00 0f ba e0 09 
Apr  1 01:16:52 MediaNAS kernel: ---[ end trace f8aa7c492ea55664 ]---

 

Posted (edited)

I am now running 6.5.1 RC5 on main server and backup.

 

At least one other user is experiencing the same call traces with an IP address assigned to the Pihole docker. He has completely different hardware from mine (Supermicro server, Xeon E5), so I am less certain this is a hardware issue unique to my system.

 

I had to disable Pihole this morning. Overnight it had completely locked up my unRAID server (perhaps due to the ever-increasing number of call traces being generated). Since Pihole was my DNS, the whole network was inaccessible. I had to hard reset the unRAID server since even the GUI was locked up.

 

There was a Pihole update last night, which I applied after rebooting everything, as the previous update was causing many issues for many users - perhaps it was the cause of my problems as well. Still, ip/macvlan call traces were being generated regularly in the syslog. I have now disabled the Pihole docker and reset my router DNS back to what it was prior to installing Pihole. I am sure the call traces will go away as well. It is not the solution I want, but it is the only one available to me now.

 

EDIT: Disabling Pihole, as expected, put an end to the call traces.

Posted (edited)

On to the next attempt to track down the cause and eliminate these ip/macvlan call traces.  My server has two integrated NICs (Intel i210 and Intel i219-LM).  LAN1 is the i210 and LAN2 is the i219-LM.  I have both enabled in the BIOS but I only had a LAN cable connected to LAN1.  In unRAID Network Settings, LAN1 showed up as eth0 and LAN2 as eth1.  LAN2/eth1 had no settings configured as nothing was connected to it.
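The driver name is an easy way to confirm which physical NIC sits behind each ethX assignment (the i210 uses the igb driver and the i219-LM uses e1000e):

# confirm which onboard NIC backs each interface by its driver
ethtool -i eth0 | grep ^driver   # driver: igb    -> Intel i210 (LAN1)
ethtool -i eth1 | grep ^driver   # driver: e1000e -> Intel i219-LM (LAN2)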

 

On the assumption this may be hardware related, I have done the following:

 

Step 1 (as a control step) - disable Pihole docker for 24 hours and reset DNS in router to not use Pihole; result - NO call traces.  This confirms the call traces are coming from Pihole with its IP address assignment.

 

Step 2 - disable LAN2 and re-enable Pihole as DNS in router; result - call traces occurred after an hour or two using Pihole (I again disabled Pihole for 16 hours and call traces ceased)

 

Step 3 - disable LAN1 in BIOS and enable LAN2 (now eth0 in unRAID), re-enable Pihole as DNS; results pending as server has currently been running in this configuration for ~1 hour

 

 I also disabled all unused MB and security features in the BIOS.

 

UPDATE: Now at 28 hours without a call trace. I am not yet ready to declare victory, but it looks promising; at least more so than anything else I have tried.

Posted

After changing the NIC from LAN1 to LAN2, the system ran without ip/macvlan call traces for over 3 1/2 days. That's a new record; however, the old familiar call trace returned just after updating to 6.5.1 RC6. I have since rebooted the server and will monitor for the return of ip/macvlan call traces.

Posted (edited)

It's been about 2 1/2 days since the last reboot and the server just experienced another macvlan call trace. It looks like the change of NIC and/or unRAID 6.5.1 RC6 is not a cure, although the frequency of call traces has diminished.

 

UPDATE: Apr. 19: Another identical call trace occurred less than 24 hours after the last one. It almost seems that once a call trace is generated, subsequent traces come with increasing frequency.

Posted (edited)

@limetech @bonienl

 

It is now clear that I am not the only one getting macvlan call traces on my server and that these are not just related to my specific hardware. Below are three reports of the same issue from the last couple of weeks; we all have very different hardware configurations, and I am sure there are others. On one occasion, the call traces came every hour and I eventually had to reboot the server. These only occur when an IP address is assigned to a docker container.

 

This entire thread documents my efforts to isolate and resolve it on my own. The best I have been able to do is reduce the frequency of macvlan call traces by changing the MB NIC which unRAID/br0 uses. I am now running 6.5.1 RC6 on my server.

 

Since you cannot reproduce it, I am happy to keep digging around on my own, but if you have any guidance regarding things I can test or information I can provide, that would be helpful.

 

Overall, the incidence of call traces for a variety of reasons seems to be increasing among the general unRAID user community.

 

Additional reports of macvlan call traces when assigning an IP address to a docker container:

[links to three other forum threads reporting the same issue]

Posted

It seems most (all?) of your call traces come from pi-hole. Let's start with comparing the settings of pi-hole and see if there are noticeable differences.

 

I am running pi-hole on a separate interface (br1) and have given it a fixed IP address (10.0.101.99). This IP address is used both as management and DNS address.

My router is the DHCP server, and it is configured to tell clients on my network to use 10.0.101.99 as their DNS server.

Since pi-hole internally only knows the interface "eth0", it is told to use and listen only on that interface.

pi-hole information is stored on my cache pool (/mnt/cache/appdata/pihole)

 

[screenshot: pi-hole docker container settings]
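In docker run terms this setup looks roughly like the following (container name, image and volume mapping are assumptions based on the description above):

docker run -d \
  --name=pihole \
  --network=br1 \
  --ip=10.0.101.99 \
  -e INTERFACE=eth0 \
  -e DNSMASQ_LISTENING=eth0 \
  -v /mnt/cache/appdata/pihole:/etc/pihole \
  pihole/pihole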

 

 

Posted (edited)
1 hour ago, bonienl said:

It seems most (all?) of your call traces come from pi-hole. Let's start with comparing the settings of pi-hole and see if there are noticeable differences.

 

I am running pi-hole on a separate interface (br1) and have given it a fixed IP address (10.0.101.99). This IP address is used both as management and DNS address.

My router is the DHCP server, and it is configured to tell clients on my network to use 10.0.101.99 as their DNS server.

Since pi-hole internally only knows the interface "eth0", it is told to use and listen only on that interface.

pi-hole information is stored on my cache pool (/mnt/cache/appdata/pihole)

 

[screenshot: pi-hole docker container settings]

 

 

 

I am getting call traces only from Pihole because it is the only docker to which I currently have a separate IP address assigned.  I got the same call traces when I assigned an IP address to UniFi and OpenVPN-AS.  I have since removed the IP address assignments on those dockers and only Pihole has its own IP address:

 

I am using the br0 network and have assigned IP address 192.168.1.100 to Pihole, which is both the admin and DNS address. My router (Ubiquiti USG) is the DHCP server and it is configured to tell LAN clients that Pihole is the DNS.

[screenshot: router DNS configuration]

 

 

Here is my Pihole config:

[screenshot: Pihole docker configuration]

 

The differences I see are the specified interface, which is br0 (not eth0, although they physically share the same NIC), and that Pihole is listening on all interfaces. Should I change both of those to eth0? I followed the Spaceinvader One video guide for setting up Pihole on unRAID (as I suspect many have) and I do not believe changing those variables was mentioned; however, perhaps it is necessary?

 

Again, some tweaks here may eliminate the Pihole-generated macvlan call traces, but I did experience them with other dockers as well. Admittedly, that was with unRAID 6.4.0/1 and perhaps they would be less prevalent in 6.5.0/1.

 

Thanks for your assistance.

Posted (edited)
5 hours ago, bonienl said:

Since pi-hole internally only knows the interface "eth0", it is told to use and listen only on that interface.

 

I see the macvlan call traces seem to be associated with a macvlan broadcast.  macvlan_process_broadcast seems to come just before each call trace and the broadcast is referenced in the trace. Is this because the docker INTERFACE variable is br0 instead of eth0 and it is listening on all interfaces instead of being restricted to eth0?

 

UPDATE: I have gone ahead and changed INTERFACE and DNSMASQ_LISTENING variables from br0 to eth0.  We'll see if that changes anything.
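To double-check that the change actually reached the container, the values can be read back out of the running instance (container name assumed):

docker exec pihole printenv INTERFACE DNSMASQ_LISTENING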

Posted

Have you tried plugging in another cable to eth1 (without assigning an IP to it) and migrating the docker custom networking to br1 instead of br0?

I don't have pi-hole, but after the kernel bug regarding TCP resets was patched, I don't get call traces at all, and I'm the guy who started the whole assigning-IPs-to-dockers thing back in 6.3 :D
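If you want to test that without recreating the container, something like this should work, assuming br1 has been defined with your LAN subnet in the Docker settings (container name and address are examples):

# detach the container from br0 and attach it to br1 with a static address
docker network disconnect br0 pihole
docker network connect --ip 192.168.1.222 br1 pihole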

 

Posted (edited)
22 minutes ago, ken-ji said:

Have you tried plugging in another cable to eth1 (without assigning an IP to it) and migrating the docker custom networking to br1 instead of br0?

I don't have pi-hole, but after the kernel bug regarding TCP resets was patched, I don't get call traces at all, and I'm the guy who started the whole assigning-IPs-to-dockers thing back in 6.3 :D

 

No, I have not tried that, but it's easy enough to do. I will give it a shot and report the results.

Posted (edited)
On 4/19/2018 at 12:30 PM, bonienl said:

It seems most (all?) of your call traces come from pi-hole. Let's start with comparing the settings of pi-hole and see if there are noticeable differences.

 

I am running pi-hole on a separate interface (br1) and have given it a fixed IP address (10.0.101.99). This IP address is used both as management and DNS address.

My router is the DHCP server, and it is configured to tell clients on my network to use 10.0.101.99 as their DNS server.

Since pi-hole internally only knows the interface "eth0", it is told to use and listen only on that interface.

pi-hole information is stored on my cache pool (/mnt/cache/appdata/pihole)

 

@bonienl I changed the configuration of my Pihole docker to match yours by changing the INTERFACE and DNSMASQ_LISTENING variables from br0 to eth0. The server ran for 4 1/2 days with no call trace. Yesterday, I updated the server to unRAID 6.5.1 final. It ran an additional 24 hours until another call trace appeared.

 

Next I'll try what ken-ji suggested and move the pihole docker to br1. I have enabled eth1 and cabled it to my switch without assigning a static IP address to eth1.

 

Apr 24 16:25:44 MediaNAS kernel: ------------[ cut here ]------------
Apr 24 16:25:44 MediaNAS kernel: WARNING: CPU: 0 PID: 3316 at net/netfilter/nf_conntrack_core.c:769 __nf_conntrack_confirm+0x97/0x4d6
Apr 24 16:25:44 MediaNAS kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables vhost_net tun vhost tap macvlan xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs md_mod i915 iosf_mbi drm_kms_helper drm intel_gtt agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops nct6775 hwmon_vid igb i2c_algo_bit e1000e ptp pps_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate ie31200_edac mxm_wmi intel_uncore i2c_i801 i2c_core intel_rapl_perf video ahci wmi libahci backlight acpi_pad button [last unloaded: pps_core]
Apr 24 16:25:44 MediaNAS kernel: CPU: 0 PID: 3316 Comm: kworker/0:0 Not tainted 4.14.35-unRAID #1
Apr 24 16:25:44 MediaNAS kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C236 WSI, BIOS P2.50 12/12/2017
Apr 24 16:25:44 MediaNAS kernel: Workqueue: events macvlan_process_broadcast [macvlan]
Apr 24 16:25:44 MediaNAS kernel: task: ffff880757fa4880 task.stack: ffffc90008570000
Apr 24 16:25:44 MediaNAS kernel: RIP: 0010:__nf_conntrack_confirm+0x97/0x4d6
Apr 24 16:25:44 MediaNAS kernel: RSP: 0018:ffff88086dc03d30 EFLAGS: 00010202
Apr 24 16:25:44 MediaNAS kernel: RAX: 0000000000000188 RBX: 0000000000000d46 RCX: 0000000000000001
Apr 24 16:25:44 MediaNAS kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff81c08b68
Apr 24 16:25:44 MediaNAS kernel: RBP: ffff88030cde6100 R08: 0000000000000101 R09: ffff8806f4a8cb00
Apr 24 16:25:44 MediaNAS kernel: R10: 0000000000000098 R11: 0000000000000000 R12: ffffffff81c8b0c0
Apr 24 16:25:44 MediaNAS kernel: R13: 000000000000c15a R14: ffff88068f5d57c0 R15: ffff88068f5d5818
Apr 24 16:25:44 MediaNAS kernel: FS:  0000000000000000(0000) GS:ffff88086dc00000(0000) knlGS:0000000000000000
Apr 24 16:25:44 MediaNAS kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 24 16:25:44 MediaNAS kernel: CR2: 00002104472f0000 CR3: 0000000001c0a002 CR4: 00000000003606f0
Apr 24 16:25:44 MediaNAS kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 24 16:25:44 MediaNAS kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr 24 16:25:44 MediaNAS kernel: Call Trace:
Apr 24 16:25:44 MediaNAS kernel: <IRQ>
Apr 24 16:25:44 MediaNAS kernel: ipv4_confirm+0xac/0xb4 [nf_conntrack_ipv4]
Apr 24 16:25:44 MediaNAS kernel: nf_hook_slow+0x37/0x96
Apr 24 16:25:44 MediaNAS kernel: ip_local_deliver+0xab/0xd3
Apr 24 16:25:44 MediaNAS kernel: ? inet_del_offload+0x3e/0x3e
Apr 24 16:25:44 MediaNAS kernel: ip_rcv+0x311/0x346
Apr 24 16:25:44 MediaNAS kernel: ? ip_local_deliver_finish+0x1b8/0x1b8
Apr 24 16:25:44 MediaNAS kernel: __netif_receive_skb_core+0x6ba/0x733
Apr 24 16:25:44 MediaNAS kernel: ? enqueue_task_fair+0x94/0x42c
Apr 24 16:25:44 MediaNAS kernel: process_backlog+0x8c/0x12d
Apr 24 16:25:44 MediaNAS kernel: net_rx_action+0xfb/0x24f
Apr 24 16:25:44 MediaNAS kernel: __do_softirq+0xcd/0x1c2
Apr 24 16:25:44 MediaNAS kernel: do_softirq_own_stack+0x2a/0x40
Apr 24 16:25:44 MediaNAS kernel: </IRQ>
Apr 24 16:25:44 MediaNAS kernel: do_softirq+0x46/0x52
Apr 24 16:25:44 MediaNAS kernel: netif_rx_ni+0x21/0x35
Apr 24 16:25:44 MediaNAS kernel: macvlan_broadcast+0x117/0x14f [macvlan]
Apr 24 16:25:44 MediaNAS kernel: ? __switch_to_asm+0x24/0x60
Apr 24 16:25:44 MediaNAS kernel: macvlan_process_broadcast+0xe4/0x114 [macvlan]
Apr 24 16:25:44 MediaNAS kernel: process_one_work+0x14c/0x23f
Apr 24 16:25:44 MediaNAS kernel: ? rescuer_thread+0x258/0x258
Apr 24 16:25:44 MediaNAS kernel: worker_thread+0x1c3/0x292
Apr 24 16:25:44 MediaNAS kernel: kthread+0x111/0x119
Apr 24 16:25:44 MediaNAS kernel: ? kthread_create_on_node+0x3a/0x3a
Apr 24 16:25:44 MediaNAS kernel: ? SyS_exit_group+0xb/0xb
Apr 24 16:25:44 MediaNAS kernel: ret_from_fork+0x35/0x40
Apr 24 16:25:44 MediaNAS kernel: Code: 48 c1 eb 20 89 1c 24 e8 24 f9 ff ff 8b 54 24 04 89 df 89 c6 41 89 c5 e8 a9 fa ff ff 84 c0 75 b9 49 8b 86 80 00 00 00 a8 08 74 02 <0f> 0b 4c 89 f7 e8 03 ff ff ff 49 8b 86 80 00 00 00 0f ba e0 09 
Apr 24 16:25:44 MediaNAS kernel: ---[ end trace e7a3347acd2d207f ]---

 

Posted

Does it make a difference when you limit pi-hole to listen on a single interface only (see Settings -> DNS)?

 

[screenshot: pi-hole interface listening settings]

8 hours ago, bonienl said:

Does it make a difference when you limit pi-hole to listen on a single interface only (see Settings -> DNS)?

 

[screenshot: pi-hole interface listening settings]

 

That is the current setting. I assume it has always been set to eth0 only, as I have never changed it.

 

I will likely assign an IP address to a couple of other dockers to continue testing. I don't think this is a Pihole-specific issue, as I got the call traces with other dockers as well.

 

It's been 36 hours since my last reboot (not associated with call traces), so I will probably have to wait a few days to know whether any of the changes I have made to docker networking make a difference.

Posted

@bonienl I assigned some other dockers (DelugeVPN, Handbrake, Dolphin) their own IP addresses on br0 as a test. Although it took a couple of days, the server again experienced several macvlan call traces over a period of several hours.

 

I have now created a docker VLAN and have the DelugeVPN, Handbrake, Dolphin and Heimdall dockers running on br0.3 (192.168.3.x) with static IP addresses. Pihole is still running on br0 with a static IP of 192.168.1.100. I suppose for a true test I should move Pihole to br0.3 as well, but since it is the DNS for the 192.168.1.x LAN, I'll first have to create some routing and firewall rules in UniFi. That will be the next step: move Pihole to br0.3.
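Under the hood, a docker VLAN like br0.3 is essentially a VLAN sub-interface plus a macvlan docker network on top of it. unRAID's Network and Docker settings pages do this automatically; done by hand it would look roughly like this (the gateway address is an assumption based on my router setup):

# VLAN 3 sub-interface on the bridge, then a macvlan docker network on it
ip link add link br0 name br0.3 type vlan id 3
ip link set br0.3 up
docker network create -d macvlan \
  --subnet=192.168.3.0/24 \
  --gateway=192.168.3.1 \
  -o parent=br0.3 br0.3
# containers then join it with static addresses, e.g.
docker run -d --name=heimdall --network=br0.3 --ip=192.168.3.10 linuxserver/heimdall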

 

I seriously doubt it makes any difference, but I have always assigned static IP addresses to the dockers as opposed to letting Docker assign them from a docker VLAN DHCP pool.

Posted (edited)

I have now taken Pi-hole on br0 out of the picture.  I am running it on a Raspberry Pi instead of as a docker.  It was too inconvenient that when my unRAID server was down, the Internet, for all intents and purposes, was also inaccessible.  Putting Pihole on the Pi took care of that.  I have Heimdall running on br0 and the rest as either bridge or host for the moment.  We'll see if just Heimdall on br0 with a static IP results in macvlan call traces as other dockers have.

Posted

After running Heimdall as the only docker on br0 with the others assigned to br0.3, call traces again appeared after a day or so. I now have no dockers at all running on br0 and have assigned just Heimdall and Handbrake to br0.3. Hopefully a docker VLAN, as opposed to the host bridge, will yield different results with static docker IP addresses.

Posted

I am having a similar event. I use a Plex docker which is set to DHCP for IP assignment on the br0 interface.

 

Quote

------------[ cut here ]------------
WARNING: CPU: 1 PID: 7475 at net/netfilter/nf_conntrack_core.c:769 __nf_conntrack_confirm+0x97/0x4d6
Modules linked in: xt_nat macvlan ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs md_mod i915 iosf_mbi i2c_algo_bit drm_kms_helper drm intel_gtt agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops mlx4_en mlx4_core ptp pps_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_uncore intel_rapl_perf ahci libahci i2c_i801 i2c_core nvme video mxm_wmi wmi_bmof nvme_core wmi backlight acpi_pad button [last unloaded: mlx4_core]
CPU: 1 PID: 7475 Comm: kworker/1:2 Tainted: G        W       4.14.35-unRAID #1
Hardware name: System manufacturer System Product Name/STRIX Z270G GAMING, BIOS 1203 12/25/2017
Workqueue: events macvlan_process_broadcast [macvlan]
task: ffff8805ccb30e80 task.stack: ffffc90003d54000
RIP: 0010:__nf_conntrack_confirm+0x97/0x4d6
RSP: 0018:ffff88086ec43d30 EFLAGS: 00010202
RAX: 0000000000000188 RBX: 000000000000dccb RCX: 0000000000000001
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff81c0892c
RBP: ffff8806e31a0600 R08: 0000000000000101 R09: ffff880157ba7400
R10: 0000000000000098 R11: 0000000000000000 R12: ffffffff81c8b0c0
R13: 000000000000405b R14: ffff8806ec86a780 R15: ffff8806ec86a7d8
FS:  0000000000000000(0000) GS:ffff88086ec40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000154e463cc000 CR3: 0000000001c0a001 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
ipv4_confirm+0xac/0xb4 [nf_conntrack_ipv4]
nf_hook_slow+0x37/0x96
ip_local_deliver+0xab/0xd3
? inet_del_offload+0x3e/0x3e
ip_rcv+0x311/0x346
? ip_local_deliver_finish+0x1b8/0x1b8
__netif_receive_skb_core+0x6ba/0x733
? mlx4_en_rx_irq+0x23/0x3e [mlx4_en]
process_backlog+0x8c/0x12d
net_rx_action+0xfb/0x24f
__do_softirq+0xcd/0x1c2
do_softirq_own_stack+0x2a/0x40
</IRQ>
do_softirq+0x46/0x52
netif_rx_ni+0x21/0x35
macvlan_broadcast+0x117/0x14f [macvlan]
macvlan_process_broadcast+0xe4/0x114 [macvlan]
process_one_work+0x14c/0x23f
? rescuer_thread+0x258/0x258
worker_thread+0x1c3/0x292
kthread+0x111/0x119
? kthread_create_on_node+0x3a/0x3a
? SyS_exit_group+0xb/0xb
ret_from_fork+0x35/0x40
Code: 48 c1 eb 20 89 1c 24 e8 24 f9 ff ff 8b 54 24 04 89 df 89 c6 41 89 c5 e8 a9 fa ff ff 84 c0 75 b9 49 8b 86 80 00 00 00 a8 08 74 02 <0f> 0b 4c 89 f7 e8 03 ff ff ff 49 8b 86 80 00 00 00 0f ba e0 09 
---[ end trace 5e99d938594ea448 ]---

No plugins are installed; I only modprobe the i915 module for hardware transcoding. Plex is the only Docker installed and running.
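A typical way to do that on unRAID is from the go file (/boot/config/go) at boot; a sketch, with the chmod being the companion step commonly suggested for Plex hardware transcoding:

# in /boot/config/go
modprobe i915
chmod -R 777 /dev/dri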

1 hour ago, bcjenkins said:

I am having a similar event. I use a Plex docker which is set to DHCP for IP assignment on the br0 interface.

 

No plugins are installed; I only modprobe the i915 module for hardware transcoding. Plex is the only Docker installed and running.

 

@bonienl

In my case, I think I have isolated the problem to br0. If I have even one docker with an IP address on br0, I will get call traces. As an experiment, I set up a docker VLAN (br0.3) and assigned a couple of dockers to the 192.168.3.x subnet while removing everything from br0. It has been almost 12 days now with zero call traces. I thought that might be because those dockers generate little network traffic, so I restarted my Pihole docker and assigned it to br0.3 as well. So far, it is two days in with zero call traces on br0.3 (it needs a bit more time to be sure).

 

When I had dockers (any docker) assigned to br0, the longest the system went without a call trace was four days.  After running for a few more days with all dockers on br0.3, I will move one or two back to br0 as a final test.  If I get call traces on that interface within a few days, that will solidify my conclusion that br0 is the source of call traces on my system.

 

I think the other users (at least a half dozen or so) reporting call traces with docker IPs also had them on br0.

 

I would prefer to have dockers on br0 and on the same subnet as the rest of my LAN, but if that is not possible, br0.3 appears to work without generating call traces, even though it isolates those dockers completely from the host unRAID system.

Posted

Macvlan call traces are definitely caused by assigning IP addresses to dockers on br0 on my system.  With several dockers assigned IP addresses on br0.3 (docker VLAN), unRAID ran for over two weeks without a single call trace.  Within two days of assigning a couple back to br0 as a control test, call traces again appeared.

 

I think it is safe to say that I only get call traces with dockers (any docker) assigned IP addresses on br0.

Posted

As a test I have now moved a number of my containers from br1 (separate interface) to br0 (shared interface). All these containers have a fixed IP address.

 

[screenshot: containers with fixed IP addresses on br0]

 

See how that goes...

Posted (edited)
29 minutes ago, bonienl said:

As a test I have now moved a number of my containers from br1 (separate interface) to br0 (shared interface). All these containers have a fixed IP address.

 

[screenshot: containers with fixed IP addresses on br0]

 

See how that goes...

 

It will work fine for you and you will never see call traces. That's always the way it goes, right?

 

Both br0 and br0.3, of course, are shared interfaces. I get call traces on br0 but not on br0.3. I have never tried assigning any dockers to br1; I have a second NIC, so perhaps I should try that. My guess is the system would be call trace free on br1 as well, but I will verify.

7 minutes ago, Hoopster said:

Both br0 and br0.3, of course, are shared interfaces

 

They share the same physical interface, but in terms of networking they are logically separated.

Initially I was thinking it had something to do with your ethernet driver, but obviously both br0 and br0.3 use the same driver, which makes the mystery bigger...

 

10 minutes ago, Hoopster said:

It will work fine for you and you will never see call traces

 

I never had call traces before, and initially I started with br0, but I have more containers running now, including the "problematic" pi-hole container. 

So far so good (= no traces), but I will let it run a couple of days longer to be more conclusive.

 
