I have recently started experiencing a new issue where Docker containers on my custom VLANs (br0, br0.4, br0.5, br0.6) become unresponsive. I can still ping IPs on any of the networks, but I cannot reach the web services running on any of them while the issue is occurring, including the Unraid webUI on br0.
No network changes have been made, and my Unraid server has been up for 3.5 months so far.
The issue occurs when there is heavier network load on these networks. For example, when the Plex mobile app is downloading content locally for offline viewing while another container is performing downloads, or when multiple people are watching Plex.
The Uptime-Kuma container's webUI also becomes inaccessible during the issue, but once it resolves, Uptime-Kuma shows Docker monitor events stating:
'Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?'
Additionally, external PRTG monitors show the HTTPS checks for Plex and other containers timing out.
I also cannot reach the Unraid management webUI on br0 while the issue is occurring.
***While the issue is occurring, Unraid CPU, RAM, and network utilization are all low.
The issue resolves itself after approximately 3-5 minutes.
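To pin down exactly when each outage starts and ends, a small TCP probe run from another machine on the LAN might help. This is only a minimal sketch: the function names `tcp_probe` and `watch` are made up for illustration, and the addresses in the usage comment are placeholders, not my real IPs. It attempts a plain TCP connect (which is what fails for me while ping still works) and prints a timestamped line whenever a target changes state.

```python
import socket
import time


def tcp_probe(host, port, timeout=3.0):
    """Attempt a TCP connect; return (ok, seconds_elapsed)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        ok = False
    return ok, time.monotonic() - start


def watch(targets, interval=5.0, rounds=None):
    """Print a timestamped line each time a target flips between UP and DOWN.

    targets: dict mapping a label to a (host, port) tuple.
    rounds:  number of passes to make; None means loop until interrupted.
    """
    last = {}
    done = 0
    while rounds is None or done < rounds:
        for name, (host, port) in targets.items():
            ok, elapsed = tcp_probe(host, port)
            if last.get(name) != ok:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                state = "UP" if ok else "DOWN"
                print(f"{stamp} {name} {host}:{port} {state} ({elapsed:.2f}s)")
                last[name] = ok
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)


# Example usage (placeholder addresses, replace with real ones):
# watch({
#     "unraid-webui": ("192.168.1.10", 443),
#     "plex": ("192.168.5.20", 32400),
# })
```

The resulting timestamps could then be lined up against the syslog and the switch logs to see what else happens in the same 3-5 minute window.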
I'm on Unraid version 6.12.8.
My network config (unchanged in over a year):
Two physical eth interfaces (eth0, eth1) with bonding and bridging enabled.
bond0 (eth0, eth1) is connected to a Cisco switch using a LAG port config.
All VLANs use bond0 as the parent interface.
Docker VLANs br0, br0.5, and br0.6 use the upstream OPNsense firewall for their DHCP pool.
Docker custom network type: macvlan
I do not see any kernel 'call trace' in my enhanced syslog plugin output.
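In case I am missing something by eye, the syslog could also be scanned for more than just call traces. The sketch below is an assumption-heavy example: the pattern list is my guess at what might be relevant (none of these strings are confirmed to appear in my logs), `find_suspect_lines` is a made-up helper name, and the `/var/log/syslog` path in the usage comment is assumed rather than verified for my setup.

```python
import re

# Guessed markers worth a second look; adjust as needed.
PATTERNS = [r"call trace", r"macvlan", r"nf_conntrack", r"bond0"]


def find_suspect_lines(lines, patterns=PATTERNS):
    """Return (line_number, text) pairs for lines matching any pattern.

    Matching is case-insensitive; line numbers start at 1.
    """
    rx = re.compile("|".join(patterns), re.IGNORECASE)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, 1)
        if rx.search(line)
    ]


# Example usage (path is an assumption):
# with open("/var/log/syslog", errors="replace") as fh:
#     for number, text in find_suspect_lines(fh):
#         print(number, text)
```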
Can someone help me narrow this down and/or recommend whether I should try switching the Docker custom network type to ipvlan?