thecode

Community Developer
Report Comments posted by thecode

  1. 4 minutes ago, bonienl said:

    I see what is happening

    When you create a static IP address + gateway without explicitly setting a metric value, a default route with metric 0 is created for br0. The GUI wrongly displays this as value 1 (a display error).

     

    When the shim-br0 interface is created, another default route with metric 0 is added, but this fails because two default routes with the same metric are not allowed. As a result, shim-br0 has no default route, which causes loss of communication to the outside world whenever the shim interface is used.

     

    The solution is indeed to set a metric value other than zero for the br0 gateway when a static IP assignment is used and "host access" is enabled.

     

    Need to think of a way to make this clear to the user and avoid this situation.

     

    Thanks for all the testing.

     


    Many thanks for helping me debug this issue. I think a first step (but not urgent) would be to fix the GUI to reduce confusion. As a fix, maybe if no metric is explicitly set for br0, it could be automatically set to 1 when the script creates "shim-br0".

     

    About making it clear to the user: since the 6.12 Known Issues page points to your excellent post, I would suggest editing the post to recommend first trying ipvlan with the metric set to 1 for br0.

     

     

    And thanks again 👍

  2. To test my assumption about the interface metric, I manually set the interface metric to 1.

    image.thumb.png.6580d0e480530835aab12bd8cbf9d31b.png

     

    With this set to 1 my routing table in the GUI now looks like this:

    image.thumb.png.855787f4e7cafc870f18927fbd65bacd.png

     

    output from route:

    image.png.91703e099b2ca5a7898e805607062c95.png

     

    There may be one small GUI bug: the GUI shows the metric as "1" while route shows 0 or 1.

    With this setting I think the incomplete ARP issue is fixed for me. I will let it run for 1-2 days to test.
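
    For anyone who wants to check the same thing from the command line, here is a minimal Python sketch. It assumes the routing table was captured as text with `ip -4 route show`, and it treats a default route with no `metric` field as metric 0. The sample outputs, the addresses and the helper names (`default_routes`, `missing_default`) are made up for illustration and are not taken from the screenshots above.

```python
def default_routes(ip_route_output):
    """Map each interface that has a default route to that route's metric."""
    routes = {}
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "default" or "dev" not in fields:
            continue
        dev = fields[fields.index("dev") + 1]
        # Assumption: no "metric" field on the line means metric 0.
        metric = int(fields[fields.index("metric") + 1]) if "metric" in fields else 0
        routes[dev] = metric
    return routes


def missing_default(ip_route_output, interfaces):
    """Return the interfaces from `interfaces` that have no default route."""
    present = default_routes(ip_route_output)
    return [name for name in interfaces if name not in present]


# Hypothetical output: br0 holds the only default route (metric 0).
BROKEN = """\
default via 192.168.1.1 dev br0
192.168.1.0/24 dev br0 proto kernel scope link src 192.168.1.10
"""

# Hypothetical output after giving br0 a non-zero metric.
FIXED = """\
default via 192.168.1.1 dev shim-br0
default via 192.168.1.1 dev br0 metric 1
192.168.1.0/24 dev br0 proto kernel scope link src 192.168.1.10
"""

if __name__ == "__main__":
    for label, text in (("broken", BROKEN), ("fixed", FIXED)):
        print(label, default_routes(text), missing_default(text, ["br0", "shim-br0"]))
```

    If `missing_default` reports shim-br0, that would match the situation described above, where the second default route could not be added.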

  3. 2 minutes ago, bonienl said:

     

    To me this confirms your router is the source of the problem.

     


    This doesn't make any sense. How would setting a static IP in Unraid (with the same IP) be related to the router? If the problem occurred only when DHCP is used, that could explain it, but with a static IP the router should not have any effect.

     

    I have another Unraid server set to "macvlan" (with a static IP) which doesn't suffer from this, and out of more than 100 network devices only this server suffers from this problem. I have also tried setting up a Linux machine with Docker and manually creating a shim interface (using https://blog.oddbit.com/post/2018-03-12-using-docker-macvlan-networks/) and it did not have any problem. Only this server has the incomplete ARP problem, and only with ipvlan.

    I also can't understand why a switch/router that has both servers connected to it would be missing the ARP entry for one server. I can capture traffic directly on the router to check.

    I am not the only one having this. Not sure if this helps, but there is a thread about it on Reddit:

     

  4. Since @bonienl mentioned my config is missing "SYSNICS", I renamed the network.cfg file and rebooted the server. This doesn't cause any problem, since the server's IPv4 address is also reserved in the router's DHCP.

     

    After reboot now I noticed two changes:

    1. br0 interface metric changed from 1 to 1006

    2. broadcast address for br0 changed from 0.0.0.0 to 192.168.x.255

    image.thumb.png.5bac74c2795d6ac85dcf88c3dfc3d81c.png

    I already suspected that having both shim-br0 and br0 use the same metric can cause a routing problem (thus maybe producing the incomplete ARP table after the server accesses an internet resource).

     

    Checking my other server (which uses macvlan for now), the metrics of both br0 and shim-br0 are set to 1. I can't change settings on this server since it runs the house now, but I assume that even with macvlan br0 should have a different metric.

     

    So far I have not managed to reproduce the incomplete ARP; before this change it happened within a few minutes. I still have the problematic config file, so I can switch back to it and find the problematic setting if it can help others. I also wonder if it is worth switching to macvlan to check whether it makes any difference there now.

     

    Note: My system doesn't have a network.cfg file now, since I deleted it and have not made any changes from the GUI.
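
    To check whether an existing network.cfg contains the SYSNICS entry, a small Python sketch like the one below could be used. It assumes the file consists of simple `KEY="value"` lines; the sample contents and the helper names (`parse_cfg`, `has_sysnics`) are illustrative assumptions, not a copy of my real file.

```python
import shlex


def parse_cfg(text):
    """Parse simple KEY="value" lines into a dict, skipping comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()  # also drops a trailing carriage return
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # removes the surrounding quotes
        settings[key.strip()] = parts[0] if parts else ""
    return settings


def has_sysnics(text):
    """True when the configuration text defines SYSNICS."""
    return "SYSNICS" in parse_cfg(text)


# Illustrative contents only; key names other than SYSNICS are assumptions.
WITHOUT_SYSNICS = """\
# Generated settings:
IFNAME[0]="br0"
USE_DHCP[0]="no"
"""

WITH_SYSNICS = WITHOUT_SYSNICS + 'SYSNICS="1"\n'

if __name__ == "__main__":
    print(has_sysnics(WITHOUT_SYSNICS), has_sysnics(WITH_SYSNICS))
```

    The function takes the file contents as a string, so it can be tried on a copy of the file without touching the server.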

  5. 53 minutes ago, bonienl said:

    Looks like your router gets confused by the shim-br0 interface.

    Test again with host access to custom networks disabled.

     

    Disabling host access to custom networks does solve the issue, but it creates another issue for which I haven't found a solution yet (I have 30 Docker containers on a custom network and 2 on br0).

     

    37 minutes ago, bonienl said:

    Did you modify your "network.cfg" file manually?

    It is missing the last entry, which should be

    SYSNICS="1"

    No, I did not modify it. This is interesting, since I have 3 servers and all of them are missing it. What is the meaning of this parameter?


     

    40 minutes ago, bonienl said:

    Tip: since you are using a single interface, you can disable bonding.

    The 2nd interface was used to make a direct connection between 2 Unraid servers. My setup has two servers which act as Active/Passive, so if one fails I can switch to the other one. Due to the network issues I physically disconnected the link between them and let the passive server take control, since it doesn't suffer from the macvlan issue.
    I will try to disable bonding.


     

    22 minutes ago, bonienl said:

    One other thing.

    Do you change interface settings using the "Tips and Tweaks" plugin?

    If yes, disable this (or temporarily uninstall the plugin) and retest.

     

    That was my first guess when I started debugging this issue. To isolate it, I uninstalled the plugin, verified that this had no effect, and installed it again, changing only the CPU governor to powersave. Looking at the config file, everything else there is set to "default".

     

  6. The array stops correctly with this release and I did not detect any problem. I have also enabled IPv6 just to test it (although I don't enable IPv6 on my servers by default) and the webGUI loads correctly.

     

    Two issues (which are not mentioned to be fixed) are still there:

     

    So currently at least for me Docker is not useable yet.

  7. 16 minutes ago, thecode said:

    I am having the same issue with 6.12.1. I was using macvlan but changed to ipvlan, since I had kernel errors and it was recommended to switch Docker to ipvlan.

     

    The server has a static IPv4 address and IPv6 is disabled. After a reboot everything works well for a while, then the server can't access the network (pings from a console to the internal network fail), although routing looks valid. The server is still accessible from the network.

     

    Diagnostics attached

     

    tower-diagnostics-20230626-0049.zip


    Update: Just for testing, I stopped the array to change the Docker network type to macvlan. Stopping the array (which also stops the Docker service) is enough to get network access back without a reboot.

    With macvlan the kernel errors are back:

    Jun 25 14:59:06 Tower kernel: ------------[ cut here ]------------
    Jun 25 14:59:06 Tower kernel: WARNING: CPU: 15 PID: 0 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
    Jun 25 14:59:06 Tower kernel: Modules linked in: vhost_net tun vhost tap kvm_amd ccp kvm macvlan md_mod xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb xt_nat xt_tcpudp veth ipvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag ipmi_devintf nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls igb wmi_bmof amd64_edac edac_mce_amd edac_core ast drm_vram_helper drm_ttm_helper ttm drm_kms_helper drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ipmi_ssif nvme sha512_ssse3 backlight aesni_intel crypto_simd cryptd rapl agpgart k10temp i2c_piix4 nvme_core i2c_algo_bit syscopyarea joydev i2c_core input_leds ahci
    Jun 25 14:59:06 Tower kernel: sysfillrect sysimgblt led_class fb_sys_fops libahci acpi_ipmi wmi ipmi_si button acpi_cpufreq unix [last unloaded: md_mod]
    Jun 25 14:59:06 Tower kernel: CPU: 15 PID: 0 Comm: swapper/15 Tainted: P           O       6.1.34-Unraid #1
    Jun 25 14:59:06 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470D4U, BIOS P4.20 04/14/2021
    Jun 25 14:59:06 Tower kernel: RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
    Jun 25 14:59:06 Tower kernel: Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01
    Jun 25 14:59:06 Tower kernel: RSP: 0018:ffffc900004ec838 EFLAGS: 00010202
    Jun 25 14:59:06 Tower kernel: RAX: 0000000000000001 RBX: ffff888151b68400 RCX: 1fd18afb63311d0b
    Jun 25 14:59:06 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff888151b68400
    Jun 25 14:59:06 Tower kernel: RBP: 0000000000000001 R08: e8344bd8ece47fd3 R09: 3d43667ad226f28c
    Jun 25 14:59:06 Tower kernel: R10: 00afe721eab120e0 R11: ffffc900004ec800 R12: ffffffff82a11440
    Jun 25 14:59:06 Tower kernel: R13: 000000000000b05e R14: ffff88817f02cf00 R15: 0000000000000000
    Jun 25 14:59:06 Tower kernel: FS:  0000000000000000(0000) GS:ffff889fbebc0000(0000) knlGS:0000000000000000
    Jun 25 14:59:06 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jun 25 14:59:06 Tower kernel: CR2: 0000000000537d30 CR3: 0000000151966000 CR4: 0000000000350ee0
    Jun 25 14:59:06 Tower kernel: Call Trace:
    Jun 25 14:59:06 Tower kernel: <IRQ>
    Jun 25 14:59:06 Tower kernel: ? __warn+0xab/0x122
    Jun 25 14:59:06 Tower kernel: ? report_bug+0x109/0x17e
    Jun 25 14:59:06 Tower kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
    Jun 25 14:59:06 Tower kernel: ? handle_bug+0x41/0x6f
    Jun 25 14:59:06 Tower kernel: ? exc_invalid_op+0x13/0x60
    Jun 25 14:59:06 Tower kernel: ? asm_exc_invalid_op+0x16/0x20
    Jun 25 14:59:06 Tower kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
    Jun 25 14:59:06 Tower kernel: ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]
    Jun 25 14:59:06 Tower kernel: ? nf_nat_inet_fn+0xc0/0x1a8 [nf_nat]
    Jun 25 14:59:06 Tower kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
    Jun 25 14:59:06 Tower kernel: nf_hook_slow+0x3d/0x96
    Jun 25 14:59:06 Tower kernel: ? ip_protocol_deliver_rcu+0x164/0x164
    Jun 25 14:59:06 Tower kernel: NF_HOOK.constprop.0+0x79/0xd9
    Jun 25 14:59:06 Tower kernel: ? ip_protocol_deliver_rcu+0x164/0x164
    Jun 25 14:59:06 Tower kernel: ip_sabotage_in+0x52/0x60 [br_netfilter]
    Jun 25 14:59:06 Tower kernel: nf_hook_slow+0x3d/0x96
    Jun 25 14:59:06 Tower kernel: ? ip_rcv_finish_core.constprop.0+0x3e8/0x3e8
    Jun 25 14:59:06 Tower kernel: NF_HOOK.constprop.0+0x79/0xd9
    Jun 25 14:59:06 Tower kernel: ? ip_rcv_finish_core.constprop.0+0x3e8/0x3e8
    Jun 25 14:59:06 Tower kernel: __netif_receive_skb_one_core+0x77/0x9c
    Jun 25 14:59:06 Tower kernel: netif_receive_skb+0xbf/0x127
    Jun 25 14:59:06 Tower kernel: br_handle_frame_finish+0x438/0x472 [bridge]
    Jun 25 14:59:06 Tower kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
    Jun 25 14:59:06 Tower kernel: br_nf_hook_thresh+0xe5/0x109 [br_netfilter]
    Jun 25 14:59:06 Tower kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
    Jun 25 14:59:06 Tower kernel: br_nf_pre_routing_finish+0x2c1/0x2ec [br_netfilter]
    Jun 25 14:59:06 Tower kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
    Jun 25 14:59:06 Tower kernel: ? NF_HOOK.isra.0+0xe4/0x140 [br_netfilter]
    Jun 25 14:59:06 Tower kernel: ? br_nf_hook_thresh+0x109/0x109 [br_netfilter]
    Jun 25 14:59:06 Tower kernel: br_nf_pre_routing+0x236/0x24a [br_netfilter]
    Jun 25 14:59:06 Tower kernel: ? br_nf_hook_thresh+0x109/0x109 [br_netfilter]
    Jun 25 14:59:06 Tower kernel: br_handle_frame+0x27a/0x2e0 [bridge]
    Jun 25 14:59:06 Tower kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
    Jun 25 14:59:06 Tower kernel: __netif_receive_skb_core.constprop.0+0x4fd/0x6e9
    Jun 25 14:59:06 Tower kernel: __netif_receive_skb_list_core+0x8a/0x11e
    Jun 25 14:59:06 Tower kernel: netif_receive_skb_list_internal+0x1d2/0x20b
    Jun 25 14:59:06 Tower kernel: gro_normal_list+0x1d/0x3f
    Jun 25 14:59:06 Tower kernel: napi_complete_done+0x7b/0x11a
    Jun 25 14:59:06 Tower kernel: igb_poll+0xd88/0xf8e [igb]
    Jun 25 14:59:06 Tower kernel: __napi_poll.constprop.0+0x2b/0x124
    Jun 25 14:59:06 Tower kernel: net_rx_action+0x159/0x24f
    Jun 25 14:59:06 Tower kernel: __do_softirq+0x129/0x288
    Jun 25 14:59:06 Tower kernel: __irq_exit_rcu+0x5e/0xb8
    Jun 25 14:59:06 Tower kernel: common_interrupt+0x9b/0xc1
    Jun 25 14:59:06 Tower kernel: </IRQ>
    Jun 25 14:59:06 Tower kernel: <TASK>
    Jun 25 14:59:06 Tower kernel: asm_common_interrupt+0x22/0x40
    Jun 25 14:59:06 Tower kernel: RIP: 0010:cpuidle_enter_state+0x11d/0x202
    Jun 25 14:59:06 Tower kernel: Code: 17 39 a0 ff 45 84 ff 74 1b 9c 58 0f 1f 40 00 0f ba e0 09 73 08 0f 0b fa 0f 1f 44 00 00 31 ff e8 39 f8 a4 ff fb 0f 1f 44 00 00 <45> 85 e4 0f 88 ba 00 00 00 48 8b 04 24 49 63 cc 48 6b d1 68 49 29
    Jun 25 14:59:06 Tower kernel: RSP: 0018:ffffc900001cfe98 EFLAGS: 00000246
    Jun 25 14:59:06 Tower kernel: RAX: ffff889fbebc0000 RBX: ffff8881086d4c00 RCX: 0000000000000000
    Jun 25 14:59:06 Tower kernel: RDX: 00004f3fbe4d29c7 RSI: ffffffff8209093c RDI: ffffffff82090e45
    Jun 25 14:59:06 Tower kernel: RBP: 0000000000000002 R08: 0000000000000002 R09: 0000000000000002
    Jun 25 14:59:06 Tower kernel: R10: 0000000000000020 R11: 000000000000afc8 R12: 0000000000000002
    Jun 25 14:59:06 Tower kernel: R13: ffffffff823235a0 R14: 00004f3fbe4d29c7 R15: 0000000000000000
    Jun 25 14:59:06 Tower kernel: ? cpuidle_enter_state+0xf7/0x202
    Jun 25 14:59:06 Tower kernel: cpuidle_enter+0x2a/0x38
    Jun 25 14:59:06 Tower kernel: do_idle+0x18d/0x1fb
    Jun 25 14:59:06 Tower kernel: cpu_startup_entry+0x1d/0x1f
    Jun 25 14:59:06 Tower kernel: start_secondary+0xeb/0xeb
    Jun 25 14:59:06 Tower kernel: secondary_startup_64_no_verify+0xce/0xdb
    Jun 25 14:59:06 Tower kernel: </TASK>
    Jun 25 14:59:06 Tower kernel: ---[ end trace 0000000000000000 ]---

     

  8. I am having the same issue with 6.12.1. I was using macvlan but changed to ipvlan, since I had kernel errors and it was recommended to switch Docker to ipvlan.

     

    The server has a static IPv4 address and IPv6 is disabled. After a reboot everything works well for a while, then the server can't access the network (pings from a console to the internal network fail), although routing looks valid. The server is still accessible from the network.

     

    Diagnostics attached

     

    tower-diagnostics-20230626-0049.zip

  9. 4 hours ago, Eugene D said:

    I don't know who can fix this but I'm also having this issue with a TrueNAS VM (The only one I have at this time). I noticed that when the VM is active in the vm tab under graphics it says "vnc:5900" (when off it says "vnc:auto") and when i click on open remote vnc in the address bar it has a spot that looks like it's trying to connect to 5700. I have no clue if this is the issue or not or how/where to change this.

     

    It's not that I use this too often, since TrueNAS has its own web UI, but it would be nice to have it working in case I have to access it, or if I try another VM and this isn't an isolated issue.

     

    Edit: Unraid Pro 6.10.3 now on a Supermicro X10DRI-T4+ board in my SC846 Chassis. Previously in a Chenbro NR12000 Chassis with a Tyan board, same issue before and after chassis/system swap.

    This is not related to the issue here; you will get better support by creating a new post.

  10. So I added a fake cookie (named "test"), deleted ca_data, and kept increasing the fake cookie's payload until the VNC session no longer connected.

    When total of all cookies is 3159 bytes it still works, increasing the cookie by one byte to 3160 breaks VNC and reducing it again by 1 byte fixes VNC. 

    So the limit, at least on my system, is 3159 bytes. Now I need to figure out where this limit comes from.

     

    EDIT: I also checked without SSL; the largest total that still works without SSL is 3522 bytes.
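
    I found the limit by changing the payload by hand, but the same search can be done with fewer attempts by bisecting. Below is a minimal Python sketch of that idea. It assumes the connection works for every total size up to some threshold and fails above it; the `works` callback is a stand-in for "set the cookie to this size and try to connect", and here it is only simulated with the numbers from my tests.

```python
def largest_working_size(works, low, high):
    """Return the largest size in [low, high) for which works(size) is True.

    Assumes works(low) is True, works(high) is False, and that the result
    flips from True to False exactly once in between.
    """
    if not works(low) or works(high):
        raise ValueError("need works(low) == True and works(high) == False")
    while high - low > 1:
        mid = (low + high) // 2
        if works(mid):
            low = mid   # mid still connects, the limit is at or above mid
        else:
            high = mid  # mid fails, the limit is below mid
    return low


def simulated_with_ssl(total_bytes):
    # Stand-in for a real connection attempt, using the observed limit.
    return total_bytes <= 3159


def simulated_without_ssl(total_bytes):
    # Stand-in for a real connection attempt without SSL.
    return total_bytes <= 3522


if __name__ == "__main__":
    print(largest_working_size(simulated_with_ssl, 0, 8192))
    print(largest_working_size(simulated_without_ssl, 0, 8192))
```

    With an upper bound of 8192 bytes this needs about 13 connection attempts instead of stepping one byte at a time.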

  11. 56 minutes ago, Squid said:

    I can't see it being CA's cookies which cause this.

     

    image.png

     

    And that biggie from CA should always be <500 bytes

    It is not the cookie itself; there is no problem with its size. My ca_data cookie is 438 bytes, while the rxd-init cookie, which I first suggested removing, is 541 bytes. The total size of all cookies on my system right now, which already breaks VNC, is 2816 bytes, so all of them together do not even reach the limit for a single cookie.

     

    However the combination of the size of the cookies breaks something. I will try to find the limit by editing a cookie and reducing the size until it starts working.

     

    @Squid just to be clear, this is not CA related. There are other big cookies as well. I used this as an example because it is easy to delete the CA data and let CA create it again, whereas after deleting rxd-init it is not easy to reproduce the problem, since that cookie starts small and gets larger with time.

     

  12. The only problem is that it requires doing it every few days. I have also identified that it is not specifically the `rxd-init` but probably a combination of big cookies.

    I know now how to reproduce it every time without clearing cache, but still not how to solve it.

    For example, Community Apps also adds a big cookie to store its settings, so deleting it (ca_data) also fixes the problem; going to the CA settings brings the cookie back, which again breaks VNC.

    The error even hints that something happens during read:

    recv() failed (104: Connection reset by peer) while reading upstream

    So I guess it would be possible to increase the read buffer somewhere, but this would have to be looked at by someone from @limetech.

  13. 8 minutes ago, itimpi said:

    Have you tried clearing your browser caches?  This seems to be needed after the upgrade to fix this issue.

    Yes I did, see my explanation at 

    I linked this so we have one discussion instead of two.

    This one should be closed IMO.

     

    Clearing the browser cache solves this for a period of time (I can't pin down the exact period; hours to days). I did identify that the `txd-init` or `rxd-init` cookies are causing the issue; when you clear the cache you also clear these cookies.

     

     

  14. I tried to find out what is stored in the cache that is causing this problem, so instead of clearing all application cache I looked at the stored data. It looks like the `rxd-init` and `txd-init` cookies are too long and break something.

     

    I would appreciate it if someone could confirm my findings (as it takes hours to days for this to happen). For Chrome users:

    - When the problem is active and NoVNC does not connect

    - From the VNC window with the error message press F12

    - Select Application, then Cookies, in the DevTools window

    - Delete the `rxd-init` cookie

    - Press Connect on the NoVNC window

     

  15. On 5/11/2022 at 12:12 AM, bonienl said:

    The new font is a bit more compact, and may appear smaller, but the font size is still the same. You can experiment with the font size setting to see what suits you best.

     

    Interesting that on the 6.10 release I had to put the size back to normal for the font size to match the old size 🙂

    Thanks for the great release, highly appreciated

  16. 34 minutes ago, bonienl said:

     

    I went with "source code pro" and "source sans pro" as the new fonts, have a look at rc8.

    Thx.

    I actually liked the previous font, but it's not that critical (and hey, you can't satisfy everyone).

     

    However, it looks like there is a size difference between the fonts. Setting the font size to Large gives a similar font size as Normal 6.9.2, is that expected?

     

  17. 5 minutes ago, Merson said:

    Looks like I've lost the ability to go to unraid via the machine IP. I need to use the full myunraid.net domain otherwise I get a 404. Usually, I would be automatically forwarded to the myunraid.net domain.

    This has also impacted my ability to locally reverse proxy unraid through swag regardless if I use the IP or the full myunraid.net domain. 

    Resetting SSL from auto > off > auto did nothing. 

    Note that I have the unraid UI on non standard ports (1443) 

    I also use non-standard ports and a reverse proxy, SSL is set to off, and I can access via IP.

    I guess the common answer would be to first post diagnostics, preferably in a new thread.