0perat0r Posted May 12, 2018

Hello unRAID. First, let me say I am sorry for the wall of text below, but this issue has not been easy to diagnose.

I have been having an issue with my unRAID Pro server for as long as I can remember. I have always used VMs with this specific server, and that has always required using the "br0" virtual switch to bridge connections to the VMs. Long story short: if I just disable bridging/br0, all of my issues go away. Re-enable it, and all of my issues return.

Testing: a PowerShell script running on machines connecting to unRAID, timing how long it takes to open a shared folder:

$i = 1
Do {
    Start-Sleep -s 3
    Measure-Command { cmd /c dir \\10.0.10.3\testshare\testfolder }
    $i
    $i++
} While ($i -le 150)

Output (with br0 disabled):

9
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 277
Ticks             : 2770585
TotalDays         : 3.20669560185185E-06
TotalHours        : 7.69606944444444E-05
TotalMinutes      : 0.00461764166666667
TotalSeconds      : 0.2770585
TotalMilliseconds : 277.0585

10
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 254
Ticks             : 2549549
TotalDays         : 2.95086689814815E-06
TotalHours        : 7.08208055555556E-05
TotalMinutes      : 0.00424924833333333
TotalSeconds      : 0.2549549
TotalMilliseconds : 254.9549

11
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 218
Ticks             : 2185821
TotalDays         : 2.52988541666667E-06
TotalHours        : 6.071725E-05
TotalMinutes      : 0.003643035
TotalSeconds      : 0.2185821
TotalMilliseconds : 218.5821

Output (with br0 enabled):

6
Days              : 0
Hours             : 0
Minutes           : 1
Seconds           : 2
Milliseconds      : 747
Ticks             : 627476154
TotalDays         : 0.000726245548611111
TotalHours        : 0.0174298931666667
TotalMinutes      : 1.04579359
TotalSeconds      : 62.7476154
TotalMilliseconds : 62747.6154

7
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 475
Ticks             : 4752386
TotalDays         : 5.50044675925926E-06
TotalHours        : 0.000132010722222222
TotalMinutes      : 0.00792064333333333
TotalSeconds      : 0.4752386
TotalMilliseconds : 475.2386

8
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 26
Milliseconds      : 348
Ticks             : 263486462
TotalDays         : 0.00030496118287037
TotalHours        : 0.00731906838888889
TotalMinutes      : 0.439144103333333
TotalSeconds      : 26.3486462
TotalMilliseconds : 26348.6462

You can see that with bridging enabled, accessing folders on the unRAID server can literally take minutes. General browsing performance is abysmal, and very inconsistent: sometimes it works quickly, other times it takes ages. Again, upon disabling bridging, all lag in browsing disappears.

Pinging the server works perfectly (only after toggling br0 upon a fresh boot, more on that later):

(ping)
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64

I have made sure DNS resolution/pathing works through Windows Server 2012:

(tracert)
1    <1ms    <1ms    <1ms    HYDRA (10.0.10.3)
Trace complete.

I even added the IP and hostname of the unRAID server to the hosts file of all machines.

A few other bits:
- This happens with the new 10Gb X540-T2 NIC, an older 10Gb X520-DA2, and the old on-board 1Gb NIC.
- This happens with 802.3ad (LACP) bonding on OR off.
- This happens with a standard MTU of 1500 or a jumbo MTU of 9014.
- The VLAN this server sits on is set up completely with MTU 9000/9014. I have also tested with a completely MTU-1500 network.
- There is only one VLAN exposed to this server, for simplicity.
- The unRAID server is "joined" to a Windows domain, and files are read/write protected via user security groups.
- That Windows Server 2012 R2 Standard system acts as the DNS and DHCP server on the network.
- Routing is done with pfSense.
- The switch is a late-model (brand new) Ubiquiti ES-16-XG; copper ports ARE being used, though the issue happens using SFP+/X520-DA2 too.
- I see this issue from all other machines on the network that connect to the unRAID system, and at the same exact time. Whatever is happening is happening on unRAID's side.
- This same network has two ESXi servers running, both of which "bridge" fine.

Here is the log from enabling br0 (with the X540-T2, 802.3ad LACP, and 9014 MTU):

May 12 09:19:32 HYDRA ool www[18903]: /usr/local/emhttp/plugins/dynamix/scripts/netconfig 'eth0'
May 12 09:19:32 HYDRA rc.inet1: ip -4 addr flush dev bond0
May 12 09:19:32 HYDRA avahi-daemon[4170]: Withdrawing address record for 10.0.10.3 on bond0.
May 12 09:19:32 HYDRA avahi-daemon[4170]: Leaving mDNS multicast group on interface bond0.IPv4 with address 10.0.10.3.
May 12 09:19:32 HYDRA avahi-daemon[4170]: Interface bond0.IPv4 no longer relevant for mDNS.
May 12 09:19:32 HYDRA rc.inet1: ip link set bond0 down
May 12 09:19:32 HYDRA avahi-daemon[4170]: Interface bond0.IPv6 no longer relevant for mDNS.
May 12 09:19:32 HYDRA avahi-daemon[4170]: Leaving mDNS multicast group on interface bond0.IPv6 with address fe80::a236:9fff:fee5:34d4.
May 12 09:19:32 HYDRA avahi-daemon[4170]: Withdrawing address record for fe80::a236:9fff:fee5:34d4 on bond0.
May 12 09:19:32 HYDRA rc.inet1: ip link set eth0 nomaster
May 12 09:19:32 HYDRA kernel: bond0: Removing an active aggregator
May 12 09:19:32 HYDRA kernel: bond0: Releasing backup interface eth0
May 12 09:19:32 HYDRA kernel: bond0: the permanent HWaddr of eth0 - a0:36:9f:e5:34:d4 - is still in use by bond0 - set the HWaddr of eth0 to a different address to avoid conflicts
May 12 09:19:32 HYDRA kernel: bond0: first active interface up!
May 12 09:19:32 HYDRA kernel: ixgbe 0000:06:00.0: removed PHC on eth0
May 12 09:19:32 HYDRA kernel: ixgbe 0000:06:00.0 eth0: changing MTU from 9014 to 1500
May 12 09:19:32 HYDRA rc.inet1: ip link set eth1 nomaster
May 12 09:19:32 HYDRA kernel: bond0: Removing an active aggregator
May 12 09:19:32 HYDRA kernel: bond0: Releasing backup interface eth1
May 12 09:19:32 HYDRA kernel: ixgbe 0000:06:00.1: removed PHC on eth1
May 12 09:19:32 HYDRA kernel: ixgbe 0000:06:00.1 eth1: changing MTU from 9014 to 1500
May 12 09:19:32 HYDRA rc.inet1: ip link set bond0 down
May 12 09:19:32 HYDRA rc.inet1: ip link del bond0
May 12 09:19:32 HYDRA kernel: bond0 (unregistering): Released all slaves
May 12 09:19:32 HYDRA rc.inet1: ip link add name bond0 type bond mode 4 miimon 100
May 12 09:19:32 HYDRA rc.inet1: ip link set bond0 up
May 12 09:19:32 HYDRA kernel: IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
May 12 09:19:32 HYDRA kernel: 8021q: adding VLAN 0 to HW filter on device bond0
May 12 09:19:32 HYDRA rc.inet1: ip link set eth0 master bond0 down
May 12 09:19:32 HYDRA kernel: pps pps2: new PPS source ptp2
May 12 09:19:32 HYDRA kernel: ixgbe 0000:06:00.0: registered PHC device on eth0
May 12 09:19:32 HYDRA kernel: 8021q: adding VLAN 0 to HW filter on device eth0
May 12 09:19:32 HYDRA kernel: bond0: Enslaving eth0 as a backup interface with a down link
May 12 09:19:32 HYDRA rc.inet1: ip link set eth1 master bond0 down
May 12 09:19:33 HYDRA kernel: pps pps3: new PPS source ptp3
May 12 09:19:33 HYDRA kernel: ixgbe 0000:06:00.1: registered PHC device on eth1
May 12 09:19:33 HYDRA kernel: 8021q: adding VLAN 0 to HW filter on device eth1
May 12 09:19:33 HYDRA kernel: bond0: Enslaving eth1 as a backup interface with a down link
May 12 09:19:33 HYDRA rc.inet1: ip link add name br0 type bridge stp_state 0 forward_delay 0
May 12 09:19:33 HYDRA rc.inet1: ip link set br0 up
May 12 09:19:33 HYDRA rc.inet1: ip link set bond0 down
May 12 09:19:33 HYDRA rc.inet1: ip link set bond0 mtu 9014
May 12 09:19:33 HYDRA kernel: ixgbe 0000:06:00.0 eth0: changing MTU from 1500 to 9014
May 12 09:19:33 HYDRA kernel: ixgbe 0000:06:00.1 eth1: changing MTU from 1500 to 9014
May 12 09:19:33 HYDRA rc.inet1: ip -4 addr flush dev bond0
May 12 09:19:33 HYDRA rc.inet1: ip link set bond0 promisc on master br0 up
May 12 09:19:33 HYDRA kernel: device bond0 entered promiscuous mode
May 12 09:19:33 HYDRA kernel: device eth0 entered promiscuous mode
May 12 09:19:33 HYDRA kernel: device eth1 entered promiscuous mode
May 12 09:19:33 HYDRA kernel: IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
May 12 09:19:33 HYDRA kernel: 8021q: adding VLAN 0 to HW filter on device bond0
May 12 09:19:33 HYDRA kernel: br0: port 1(bond0) entered blocking state
May 12 09:19:33 HYDRA kernel: br0: port 1(bond0) entered disabled state
May 12 09:19:33 HYDRA rc.inet1: ip link set br0 mtu 9014
May 12 09:19:33 HYDRA rc.inet1: ip -4 addr add 10.0.10.3/255.255.255.0 dev br0
May 12 09:19:33 HYDRA avahi-daemon[4170]: Joining mDNS multicast group on interface br0.IPv4 with address 10.0.10.3.
May 12 09:19:33 HYDRA avahi-daemon[4170]: New relevant interface br0.IPv4 for mDNS.
May 12 09:19:33 HYDRA avahi-daemon[4170]: Registering new address record for 10.0.10.3 on br0.IPv4.
May 12 09:19:33 HYDRA rc.inet1: ip link set br0 up
May 12 09:19:33 HYDRA rc.inet1: ip -4 route add default via 10.0.10.1 dev br0
May 12 09:19:34 HYDRA avahi-daemon[4170]: Joining mDNS multicast group on interface br0.IPv6 with address fe80::cc9b:5eff:fe34:d3c4.
May 12 09:19:34 HYDRA avahi-daemon[4170]: New relevant interface br0.IPv6 for mDNS.
May 12 09:19:34 HYDRA avahi-daemon[4170]: Registering new address record for fe80::cc9b:5eff:fe34:d3c4 on br0.*.
May 12 09:19:36 HYDRA ntpd[2594]: Deleting interface #6 bond0, 10.0.10.3#123, interface stats: received=129, sent=170, dropped=0, active_time=101831 secs
May 12 09:19:36 HYDRA ntpd[2594]: 166.70.136.35 local addr 10.0.10.3 -> <null>
May 12 09:19:38 HYDRA kernel: ixgbe 0000:06:00.0 eth0: NIC Link is Up 10 Gbps, Flow Control: None
May 12 09:19:38 HYDRA kernel: bond0: link status definitely up for interface eth0, 10000 Mbps full duplex
May 12 09:19:38 HYDRA kernel: bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
May 12 09:19:38 HYDRA kernel: bond0: first active interface up!
May 12 09:19:38 HYDRA kernel: IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
May 12 09:19:38 HYDRA kernel: br0: port 1(bond0) entered blocking state
May 12 09:19:38 HYDRA kernel: br0: port 1(bond0) entered forwarding state
May 12 09:19:38 HYDRA kernel: ixgbe 0000:06:00.1 eth1: NIC Link is Up 10 Gbps, Flow Control: None
May 12 09:19:38 HYDRA kernel: bond0: link status definitely up for interface eth1, 10000 Mbps full duplex
May 12 09:19:40 HYDRA avahi-daemon[4170]: Joining mDNS multicast group on interface bond0.IPv6 with address fe80::a236:9fff:fee5:34d4.
May 12 09:19:40 HYDRA avahi-daemon[4170]: New relevant interface bond0.IPv6 for mDNS.
May 12 09:19:40 HYDRA avahi-daemon[4170]: Registering new address record for fe80::a236:9fff:fee5:34d4 on bond0.*.
May 12 09:19:41 HYDRA ntpd[2594]: Listen normally on 7 br0 10.0.10.3:123
May 12 09:19:41 HYDRA ntpd[2594]: new interface(s) found: waking up resolver
May 12 09:19:49 HYDRA ool www[19815]: /usr/local/emhttp/plugins/dynamix/scripts/emhttpd_update
May 12 09:19:49 HYDRA emhttpd: req (5): cmdStatus=apply&csrf_token=****************
May 12 09:19:49 HYDRA emhttpd: Starting services...
May 12 09:19:49 HYDRA emhttpd: shcmd (5810): /usr/local/sbin/mount_image '/mnt/user/system/libvirt/libvirt.img' /etc/libvirt 1
May 12 09:19:49 HYDRA kernel: BTRFS info (device loop2): disk space caching is enabled
May 12 09:19:49 HYDRA kernel: BTRFS info (device loop2): has skinny extents
May 12 09:19:49 HYDRA root: Resize '/etc/libvirt' of 'max'
May 12 09:19:49 HYDRA kernel: BTRFS info (device loop2): new size for /dev/loop2 is 1073741824
May 12 09:19:49 HYDRA emhttpd: shcmd (5812): /etc/rc.d/rc.libvirt start
May 12 09:19:49 HYDRA root: Starting virtlockd...
May 12 09:19:49 HYDRA root: Starting virtlogd...
May 12 09:19:49 HYDRA root: Starting libvirtd...
May 12 09:19:49 HYDRA kernel: tun: Universal TUN/TAP device driver, 1.6
May 12 09:19:49 HYDRA kernel: virbr0: port 1(virbr0-nic) entered blocking state
May 12 09:19:49 HYDRA kernel: virbr0: port 1(virbr0-nic) entered disabled state
May 12 09:19:49 HYDRA kernel: device virbr0-nic entered promiscuous mode
May 12 09:19:50 HYDRA kernel: virbr0: port 1(virbr0-nic) entered blocking state
May 12 09:19:50 HYDRA kernel: virbr0: port 1(virbr0-nic) entered listening state
May 12 09:19:50 HYDRA avahi-daemon[4170]: Joining mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
May 12 09:19:50 HYDRA avahi-daemon[4170]: New relevant interface virbr0.IPv4 for mDNS.
May 12 09:19:50 HYDRA avahi-daemon[4170]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
May 12 09:19:50 HYDRA dnsmasq[20823]: started, version 2.79 cachesize 150
May 12 09:19:50 HYDRA dnsmasq[20823]: compile time options: IPv6 GNU-getopt no-DBus i18n IDN2 DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
May 12 09:19:50 HYDRA dnsmasq-dhcp[20823]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
May 12 09:19:50 HYDRA dnsmasq-dhcp[20823]: DHCP, sockets bound exclusively to interface virbr0
May 12 09:19:50 HYDRA dnsmasq[20823]: reading /etc/resolv.conf
May 12 09:19:50 HYDRA dnsmasq[20823]: using nameserver 10.0.10.2#53
May 12 09:19:50 HYDRA dnsmasq[20823]: using nameserver 10.0.10.1#53
May 12 09:19:50 HYDRA dnsmasq[20823]: using nameserver 8.8.8.8#53
May 12 09:19:50 HYDRA dnsmasq[20823]: read /etc/hosts - 2 addresses
May 12 09:19:50 HYDRA dnsmasq[20823]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
May 12 09:19:50 HYDRA dnsmasq-dhcp[20823]: read /var/lib/libvirt/dnsmasq/default.hostsfile
May 12 09:19:50 HYDRA kernel: virbr0: port 1(virbr0-nic) entered disabled state
May 12 09:19:53 HYDRA kernel: vfio-pci 0000:05:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
May 12 09:19:53 HYDRA kernel: br0: port 2(vnet0) entered blocking state
May 12 09:19:53 HYDRA kernel: br0: port 2(vnet0) entered disabled state
May 12 09:19:53 HYDRA kernel: device vnet0 entered promiscuous mode
May 12 09:19:53 HYDRA kernel: br0: port 2(vnet0) entered blocking state
May 12 09:19:53 HYDRA kernel: br0: port 2(vnet0) entered forwarding state
May 12 09:19:55 HYDRA kernel: vfio-pci 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
May 12 09:19:55 HYDRA kernel: br0: port 3(vnet1) entered blocking state
May 12 09:19:55 HYDRA kernel: br0: port 3(vnet1) entered disabled state
May 12 09:19:55 HYDRA kernel: device vnet1 entered promiscuous mode
May 12 09:19:55 HYDRA kernel: br0: port 3(vnet1) entered blocking state
May 12 09:19:55 HYDRA kernel: br0: port 3(vnet1) entered forwarding state
May 12 09:19:55 HYDRA avahi-daemon[4170]: Joining mDNS multicast group on interface vnet0.IPv6 with address fe80::fc54:ff:fe5a:4500.
May 12 09:19:55 HYDRA avahi-daemon[4170]: New relevant interface vnet0.IPv6 for mDNS.
May 12 09:19:55 HYDRA avahi-daemon[4170]: Registering new address record for fe80::fc54:ff:fe5a:4500 on vnet0.*.
May 12 09:19:56 HYDRA avahi-daemon[4170]: Joining mDNS multicast group on interface vnet1.IPv6 with address fe80::fc54:ff:fe4c:8dc1.
May 12 09:19:56 HYDRA avahi-daemon[4170]: New relevant interface vnet1.IPv6 for mDNS.
May 12 09:19:56 HYDRA avahi-daemon[4170]: Registering new address record for fe80::fc54:ff:fe4c:8dc1 on vnet1.*.

The "warning" lines:

May 12 09:19:32 HYDRA kernel: bond0: the permanent HWaddr of eth0 - a0:36:9f:e5:34:d4 - is still in use by bond0 - set the HWaddr of eth0 to a different address to avoid conflicts
May 12 09:19:38 HYDRA kernel: bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond

I am not sure whether these warnings/errors are normal when cycling a bridge like this, or whether they indeed point to the issue. Could enabling the bridge be causing ARP problems because of a MAC conflict? The LACP response warning appears right after only one interface is up, so that error may be very temporary.

On the point above where I mentioned "only after toggling br0 upon a fresh boot": the unRAID server experiences severe network lag "spikes" if rebooted with bridging enabled. You can see this just by continually pinging the server:

(ping)
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Request timed out.
Request timed out.
Request timed out.
Reply from 10.0.10.3: bytes=32 time=8ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Request timed out.
Request timed out.
Reply from 10.0.10.3: bytes=32 time=13ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64

This "lag" is absolutely felt when using the server: whatever you are doing at the time has to wait until the server comes back. Toggling br0 after every reboot fixes this:

(ping)
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64

This issue is weird because everything WORKS, just inconsistently. I have used Wireshark to see what I could glean from connections to the unRAID server, only to find that there is sometimes simply a large delay between packets being received from unRAID, as if it were busy doing something else. That should not be the case with more than 12GB of RAM free and 0-8% CPU usage on a hyper-threaded 6-core.
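For anyone wanting to quantify these spikes rather than eyeball them, the timeouts are easy to tally from a saved capture. A minimal sketch, assuming the Windows ping output has been saved to a file (the `ping.log` name and the sample lines below are stand-ins for a real capture of `ping -t 10.0.10.3 > ping.log`):

```shell
# Summarize a saved Windows ping capture: count timeouts vs. total probes.
# The heredoc below is sample data standing in for a real capture file.
cat > ping.log <<'EOF'
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Reply from 10.0.10.3: bytes=32 time<1ms TTL=64
Request timed out.
Request timed out.
Reply from 10.0.10.3: bytes=32 time=8ms TTL=64
Request timed out.
EOF

total=$(wc -l < ping.log)                      # every probe produces one line
lost=$(grep -c 'Request timed out' ping.log)   # timeouts only
echo "lost ${lost} of ${total} pings ($((100 * lost / total))% loss)"
```

Running the same tally during a lag spike and during normal operation would give a concrete loss percentage to compare between the bridged and unbridged configurations.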
unRAID Server Specs:
- Xeon E5-1650 V3
- Supermicro X10SRL-FO
- 32GB (4x8GB) RDIMM ECC DDR4
- X540-T2
- H310 in IT mode
- CSE846 with expander backplane, dual linked
- Dual 8TB parity, 7200RPM
- 12x 8TB data, 5400RPM
- 512GB RAID0 (data) x2
- 256GB 850 Pro for cache
- 512GB 850 Pro, Unassigned Devices, for VMs
- 480GB Intel 730, Unassigned Devices, for VMs
- USB is a Swissbit 2GB SLC

As stated above, I have tested this with LACP and jumbo frames disabled, only to see the same issue. I have been having this issue since I built the machine in early 2016 with only a 1Gb NIC, and I just always assumed unRAID was a "slow" product. Only recently, with the move to 10Gb, did I disable bridging and VMs completely and find unRAID to NOT be a slow product. I have had this issue with every version I have upgraded to since 2016.

Any help or insight you can provide would be of use to me. I am debating moving my virtualization needs back to ESXi and building a much more basic core system for unRAID, but if I can figure out why bridging is so problematic on this system, I can keep it as-is. Thank you.
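In case anyone wants to reproduce the timing test from a Linux client as well, here is a rough shell equivalent of the PowerShell loop above. The share path is a placeholder: in practice it would point at a CIFS mount of the unRAID share (e.g. mounted with `mount -t cifs //10.0.10.3/testshare /mnt/hydra`), and the iteration count and sleep are shortened from the original 150 runs with 3-second pauses.

```shell
# Rough Linux-side equivalent of the PowerShell timing loop.
# SHARE is a placeholder; substitute a CIFS mount of the unRAID share.
SHARE=./testfolder
mkdir -p "$SHARE"

for i in $(seq 1 5); do            # original test runs 150 iterations
    start=$(date +%s%N)            # nanoseconds since epoch (GNU date)
    ls "$SHARE" > /dev/null        # stand-in for "dir" over the share
    end=$(date +%s%N)
    echo "run $i: $(( (end - start) / 1000000 )) ms"
    sleep 0.1                      # original pauses 3 s between runs
done
```

Seeing the same multi-second outliers from a Linux client would help rule out anything specific to the Windows SMB stack.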
0perat0r (Author) Posted August 9, 2018

Hello. I have since completely moved away from using VMs with unRAID, though I have a feeling bridging and VMs were only part of the problem. Thank you.
bonienl Posted August 10, 2018

I can't really tell what the cause of your problem is, but bridging is an essential part of how VMs work. It is a must-have requirement for letting VMs talk to the outside world. I have not seen any reports similar to yours, and *if* it were a widespread thing, surely a lot of users would complain, given that bridging is a basic requirement. From my own experience I can only say it works fine, without any delays, on the various VMs I am using.