Hoopster

[6.5.0]+ Call Traces when assigning IP to Dockers


Currently using a mix of everything:

bridge

host

br1 (custom IP)

br1.3 (VLAN)

but not using br0 or br0.x

and so far no call traces...

 

Current NICs are:

Intel I211 (igb) - eth0

Intel I219-V (e1000e) - eth1

 


So I finally decided that using br0 (which Unraid runs on) with VLANs is not a good idea for me.

 

Since I have more ethernet adapters on my server, I decided to use eth4 and put all my docker VLANs on that.

 

It's fairly early, but I'm pretty sure that I have gotten rid of my call traces.

 

One note: you have to set the interface's IPv4 address assignment to None. If you don't, Unraid will assign an arbitrary IP address. The bigger problem, at least on my network, is that the main VLAN interface will constantly lose and regain an IP address from the DHCP server; this would occur randomly anywhere from every 15 minutes to every two hours.

 

So my final setup for the VLAN is as follows:

 

[screenshot: VLAN interface settings]

 

And I have a docker pool setup as:

 

[screenshot: Docker custom network and IP pool settings]
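
For reference, expressed as plain Docker CLI commands this kind of setup amounts to roughly the following. This is only a sketch; the VLAN ID, subnet and addresses below are examples rather than my actual values, and Unraid normally creates these networks for you from the Docker settings page:

# Illustrative only - a user-defined macvlan network on the VLAN sub-interface
docker network create -d macvlan \
  --subnet=192.168.34.0/24 \
  --gateway=192.168.34.1 \
  -o parent=br4.34 \
  vlan34

# Attach a container to it with a fixed address
docker run -d --network=vlan34 --ip=192.168.34.50 nginx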

 

Hope this helps anyone with VLAN problems.

 

I will report my results back after a few days.


You probably want to set the VLAN IPv4 address to none too.

Why?

* if you don't want unRAID to be accessible directly on that VLAN

* dockers won't be able to access unRAID on that IP, while unRAID will try to talk/respond to the dockers using the directly attached IP
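
To make that second point concrete: with macvlan, the host cannot talk to its own macvlan children over the parent interface. A made-up illustration, assuming a docker network named vlan34 and unRAID holding 192.168.34.2 on that VLAN:

# Other hosts on the VLAN are reachable from a container on the macvlan network...
docker run --rm --network=vlan34 busybox ping -c 1 192.168.34.10
# ...but the unRAID host's own IP on the parent interface is not
docker run --rm --network=vlan34 busybox ping -c 1 192.168.34.2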

 


Okay, I think I will try that.

 

I decided to update to 6.5.3 and 2 hours later I got another call trace.

 

Not having that set could be the reason for the macvlan broadcast call traces.


1 hour ago, ken-ji said:

You probably want to set the VLAN IPv4 address to none too.

Why?

* if you don't want unRAID to be accessible directly on that VLAN

* dockers won't be able to access unRAID on that IP, while unRAID will try to talk/respond to the dockers using the directly attached IP

 

Hi ken-ji,

 

Can you be a bit more specific? When I set IPv4 address assignment to None on my example VLAN 34, I can no longer start the dockers and br4.34 is no longer available.

 

Thanks.


Hi ken-ji,

 

Your post helped a lot and I was able to figure out my issue. Apparently I needed to assign a pool to br4 as well in order for br4.34 to be available. So my Docker settings now look as follows:

 

[screenshot: updated Docker settings with pools assigned to br4 and br4.34]

 

Just out of curiosity, are you actually using 192.168.2.XXX for anything in your system? Or did you just set this up for isolation?

 

Thanks for your help. I will see if this rids me of the call traces.


Odd. You shouldn't need to, since from the Docker networking point of view br4 should be independent of br4.34.

I'll look into stopping my array to help check that out or maybe @bonienl can confirm.

 

192.168.2.xxx is my main network.

I've assigned it to br1 so that any dockers running on br1 can access unRAID (nginx reverse proxy, etc.) and vice versa; otherwise the macvlan security limitation kicks in.

 

You can just nuke the docker network on br0 and place it on br1, which might be why I never see call traces :D
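
For reference, on a plain Linux host the usual way around that macvlan limitation is a host-side macvlan "shim" interface. This is just a generic sketch with made-up names and addresses, not something you have to do by hand on unRAID:

# Host-side macvlan interface on the same parent as the docker network
ip link add macvlan-shim link br1 type macvlan mode bridge
ip addr add 192.168.2.250/32 dev macvlan-shim
ip link set macvlan-shim up
# Route traffic for a specific container through the shim instead of br1
ip route add 192.168.2.50/32 dev macvlan-shim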


The main interface (eth4) and the VLAN interface (br4.34) work completely independently of each other, and each may have different settings.

 

As @ken-ji suggests, the easiest approach is to uncheck br0 under Docker settings and assign the same network settings to either a VLAN interface or another physical interface.

If you don't specify a DHCP pool, then Docker will do its own address assignment, which may interfere with your router's DHCP server.

 


Thanks @ken-ji and @bonienl.

 

Well, after reading what you wrote, I'm wondering if there is some sort of bug or glitch that caused br4.34 to be available only after I set up a pool on br4.

 

At any rate, on my system br0 (192.168.19.xxx) is my main Unraid network. I have some dockers running on that which are operating in bridge mode.

 

So @bonienl, are you implying that in the Docker settings a pool should always be assigned, otherwise Docker is running a DHCP server and competing with my router's DHCP server?

 

By the way, ever since I set up br4 as shown in my earlier post, my system has totally settled down. I'm not actually using 192.168.2.xxx in my system / router at all.

 

I'm going to leave my system running for a couple more days to see if the call traces are resolved. If so, I can do some more experimenting.

6 hours ago, Limy said:

in the Docker Settings a pool should always be assigned otherwise Docker is running a DHCP server and is competing with my routers DHCP server

 

Docker itself is unaware of the IP addresses handed out by your regular DHCP server. If no explicit DHCP pool is configured, Docker will simply start assigning from the first IP address in the range (usually .1). This may or may not interfere, depending on your current IP assignments.

 

It is safer to set up a Docker DHCP pool in a range not used by your router or other devices.
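
As an illustration of what such a pool looks like at the Docker CLI level (the values here are examples only), the pool corresponds to an --ip-range that restricts Docker's own assignments to a slice of the subnet your router's DHCP server never hands out:

# Illustrative only - keep the router's DHCP scope outside 192.168.3.128/28
docker network create -d macvlan \
  --subnet=192.168.3.0/24 \
  --gateway=192.168.3.1 \
  --ip-range=192.168.3.128/28 \
  -o parent=br0.3 \
  vlan3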

 

1 hour ago, bonienl said:

 

Docker itself is unaware of the IP addresses handed out by your regular DHCP server. If no explicit DHCP pool is configured, Docker will simply start assigning from the first IP address in the range (usually .1). This may or may not interfere, depending on your current IP assignments.

 

It is safer to set up a Docker DHCP pool in a range not used by your router or other devices.

 

Okay, thanks for clarifying. BTW, with my current setup there have been no call traces so far from broadcast messages and macvlan.

 

I should note one other change I adopted from @ken-ji's setup: I also set the network protocol to IPv4 + IPv6 for both the interface and the VLAN, though I can't really see that resolving any issue. I just thought I should mention it.


Yeah, that shouldn't have any impact either way.

I left mine enabled because I'm studying IPv6 deployments, but since my ISP doesn't have IPv6 yet, I need to be sure I understand what I'm doing, as it will break the entire internet from my network's point of view.


So I have been running my system since Wednesday of last week and I no longer have any call traces on macvlan.

 

I may try the next suggestion and place 192.168.19.xxx with a DHCP pool on br4.

 

Right now, I'm very happy that I was able to eliminate the call traces.

 

*** Just an update on this.  No more call traces in my system as of July 4th.  I would say my problems have been resolved. *** :)



This issue still seems to exist in 6.7.0, as I get the same call trace quite regularly with br0:

May 25 05:34:30 SERVER kernel: WARNING: CPU: 8 PID: 53 at net/netfilter/nf_nat_core.c:420 nf_nat_setup_info+0x6b/0x5fb [nf_nat]
May 25 05:34:30 SERVER kernel: Modules linked in: macvlan xt_CHECKSUM ipt_REJECT xt_nat ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap veth ipt_MASQUERADE iptable_nat nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs nfsd lockd grace sunrpc md_mod nct7904 mlx4_en mlx4_core igb i2c_algo_bit sr_mod cdrom nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel mpt3sas aes_x86_64 crypto_simd cryptd drm isci libsas ahci glue_helper raid_class mxm_wmi ftdi_sio agpgart i2c_i801 libahci scsi_transport_sas intel_cstate i2c_core usbserial intel_uncore syscopyarea sysfillrect sysimgblt fb_sys_fops intel_rapl_perf wmi pcc_cpufreq
May 25 05:34:30 SERVER kernel: button [last unloaded: mlx4_core]
May 25 05:34:30 SERVER kernel: CPU: 8 PID: 53 Comm: kworker/8:0 Tainted: P        W  O      4.19.41-Unraid #1
May 25 05:34:30 SERVER kernel: Hardware name: Supermicro X9DA7/E/X9DA7/E, BIOS 3.2 09/23/2016
May 25 05:34:30 SERVER kernel: Workqueue: events macvlan_process_broadcast [macvlan]
May 25 05:34:30 SERVER kernel: RIP: 0010:nf_nat_setup_info+0x6b/0x5fb [nf_nat]
May 25 05:34:30 SERVER kernel: Code: 48 89 fb 48 8b 87 80 00 00 00 49 89 f7 41 89 d6 76 04 0f 0b eb 0b 85 d2 75 07 25 80 00 00 00 eb 05 25 00 01 00 00 85 c0 74 07 <0f> 0b e9 ac 04 00 00 48 8b 83 90 00 00 00 4c 8d 64 24 30 48 8d 73
May 25 05:34:30 SERVER kernel: RSP: 0018:ffff88a03f803c80 EFLAGS: 00010202
May 25 05:34:30 SERVER kernel: RAX: 0000000000000080 RBX: ffff889fec261e00 RCX: 0000000000000000
May 25 05:34:30 SERVER kernel: RDX: 0000000000000000 RSI: ffff88a03f803d6c RDI: ffff889fec261e00
May 25 05:34:30 SERVER kernel: RBP: ffff88a03f803d58 R08: ffff889fec261e00 R09: ffff889fea44c000
May 25 05:34:30 SERVER kernel: R10: 0000000000000098 R11: ffff888fdb108000 R12: ffff88a01fbaaf00
May 25 05:34:30 SERVER kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff88a03f803d6c
May 25 05:34:30 SERVER kernel: FS:  0000000000000000(0000) GS:ffff88a03f800000(0000) knlGS:0000000000000000
May 25 05:34:30 SERVER kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 25 05:34:30 SERVER kernel: CR2: 000015226c473de0 CR3: 0000000001e0a001 CR4: 00000000000606e0
May 25 05:34:30 SERVER kernel: Call Trace:
May 25 05:34:30 SERVER kernel: <IRQ>
May 25 05:34:30 SERVER kernel: ? fib_validate_source+0xaf/0xd6
May 25 05:34:30 SERVER kernel: ? ipt_do_table+0x58e/0x5db [ip_tables]
May 25 05:34:30 SERVER kernel: ? ip_route_input_slow+0x616/0x7cb
May 25 05:34:30 SERVER kernel: nf_nat_alloc_null_binding+0x6f/0x86 [nf_nat]
May 25 05:34:30 SERVER kernel: nf_nat_inet_fn+0xa0/0x192 [nf_nat]
May 25 05:34:30 SERVER kernel: nf_hook_slow+0x37/0x96
May 25 05:34:30 SERVER kernel: ip_local_deliver+0xa7/0xd5
May 25 05:34:30 SERVER kernel: ? ip_sublist_rcv_finish+0x53/0x53
May 25 05:34:30 SERVER kernel: ip_rcv+0x9e/0xbc
May 25 05:34:30 SERVER kernel: ? ip_rcv_finish_core.isra.0+0x2e2/0x2e2
May 25 05:34:30 SERVER kernel: __netif_receive_skb_one_core+0x4d/0x69
May 25 05:34:30 SERVER kernel: process_backlog+0x7e/0x116
May 25 05:34:30 SERVER kernel: net_rx_action+0x10b/0x274
May 25 05:34:30 SERVER kernel: __do_softirq+0xce/0x1e2
May 25 05:34:30 SERVER kernel: do_softirq_own_stack+0x2a/0x40
May 25 05:34:30 SERVER kernel: </IRQ>
May 25 05:34:30 SERVER kernel: do_softirq+0x4d/0x59
May 25 05:34:30 SERVER kernel: netif_rx_ni+0x1c/0x22
May 25 05:34:30 SERVER kernel: macvlan_broadcast+0x10f/0x153 [macvlan]
May 25 05:34:30 SERVER kernel: ? __switch_to_asm+0x34/0x70
May 25 05:34:30 SERVER kernel: macvlan_process_broadcast+0xd5/0x131 [macvlan]
May 25 05:34:30 SERVER kernel: process_one_work+0x16e/0x24f
May 25 05:34:30 SERVER kernel: ? pwq_unbound_release_workfn+0xb7/0xb7
May 25 05:34:30 SERVER kernel: worker_thread+0x1dc/0x2ac
May 25 05:34:30 SERVER kernel: kthread+0x10b/0x113
May 25 05:34:30 SERVER kernel: ? kthread_park+0x71/0x71
May 25 05:34:30 SERVER kernel: ret_from_fork+0x35/0x40
May 25 05:34:30 SERVER kernel: ---[ end trace 13d4ebd815b46078 ]---

At least the server has stopped freezing completely (though I'm not sure whether that's due to 6.7.0 or to the fact that I installed mcelog).

 

On 5/26/2019 at 4:06 PM, Auxilium said:

This issue still seems to exist in 6.7.0, as I get the same call trace quite regularly with br0

This is not really an unRAID issue in the sense that Limetech could release a version that "solves" it. This appears to be an issue with the macvlan implementation in Docker, which only shows up with certain server and/or LAN hardware and configurations.

 

It is really hard to tell what causes the issue.  Perhaps it is not macvlan/Docker at fault at all.  Maybe it is the hardware not properly handling network broadcast messages in certain LAN/VLAN configurations.

 

There are just too many variables. For me the fix is to avoid br0 and set up a VLAN (br0.3) instead, with which the problem disappears.
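
For anyone wondering what that VLAN amounts to underneath (Unraid sets it up for you from its network settings page when you enable VLANs), it is just an 802.1Q sub-interface; the interface name and VLAN ID below are examples:

# Tagged VLAN 3 sub-interface on top of the existing bridge
ip link add link br0 name br0.3 type vlan id 3
ip link set br0.3 up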

 

