Problem with 10GbE, unstable web connection??



Hi,

 

Struggling to upgrade my unRaid server with a 10GbE NIC. Everything was working fine before: ASRock X470D4U with 4x NAS drives, 2x NVMe M.2 cache, a few dockers, and two 1Gb NICs. I installed the 10GbE NIC (Intel chipset), it was recognised OK, and I reconfigured it as eth0 so that unRaid would use it. Then I was getting asymmetric transfer speeds: 1GB/s writing to unRaid but only 100MB/s reading from it (iPerf gave similar numbers, so it's not storage bound). I then disabled the dockers, disabled the onboard NICs, and started to get a very flaky web connection (hangs / partial page loads). I can still SSH into the box, no problem.
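
In case it helps with diagnosis, the negotiated link speed and the NIC's error counters can be checked over SSH with something like the following (assuming the 10GbE card is eth0):

ethtool eth0 | grep -E 'Speed|Duplex'      # confirms the link negotiated at 10000Mb/s full duplex
ethtool -S eth0 | grep -iE 'err|drop'      # driver statistics - look for climbing error/drop counters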

 

 

iPerf gives the following results - 42% packet loss with -R from the client (a Win10 box).

(192.168.2.177 is the server's 10GbE NIC address)

 

iperf3 -c 192.168.2.177 -u -b 1000M -t 3 -P 20

[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  4]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.047 ms  0/13723 (0%)
[  4] Sent 13723 datagrams
[  6]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.050 ms  0/13723 (0%)
[  6] Sent 13723 datagrams
[  8]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.053 ms  0/13723 (0%)
[  8] Sent 13723 datagrams
[ 10]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.054 ms  0/13723 (0%)
[ 10] Sent 13723 datagrams
[ 12]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.054 ms  0/13723 (0%)
[ 12] Sent 13723 datagrams
[ 14]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.053 ms  0/13723 (0%)
[ 14] Sent 13723 datagrams
[ 16]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.054 ms  0/13723 (0%)
[ 16] Sent 13723 datagrams
[ 18]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.056 ms  0/13723 (0%)
[ 18] Sent 13723 datagrams
[ 20]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.052 ms  0/13723 (0%)
[ 20] Sent 13723 datagrams
[ 22]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.052 ms  0/13723 (0%)
[ 22] Sent 13723 datagrams
[ 24]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.051 ms  0/13723 (0%)
[ 24] Sent 13723 datagrams
[ 26]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.050 ms  0/13723 (0%)
[ 26] Sent 13723 datagrams
[ 28]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.049 ms  0/13723 (0%)
[ 28] Sent 13723 datagrams
[ 30]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.049 ms  0/13723 (0%)
[ 30] Sent 13723 datagrams
[ 32]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.048 ms  0/13723 (0%)
[ 32] Sent 13723 datagrams
[ 34]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.047 ms  0/13723 (0%)
[ 34] Sent 13723 datagrams
[ 36]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.043 ms  0/13723 (0%)
[ 36] Sent 13723 datagrams
[ 38]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.041 ms  0/13723 (0%)
[ 38] Sent 13723 datagrams
[ 40]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.041 ms  0/13723 (0%)
[ 40] Sent 13723 datagrams
[ 42]   0.00-3.00   sec   107 MBytes   300 Mbits/sec  0.040 ms  0/13723 (0%)
[ 42] Sent 13723 datagrams
[SUM]   0.00-3.00   sec  2.09 GBytes  6.00 Gbits/sec  0.049 ms  0/274460 (0%)

 

iperf-3.1.3-win64>iperf3 -c 192.168.2.177 -R -u -b 1000M -t 3 -P 20

[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  4]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.124 ms  9136/21988 (42%)
[  4] Sent 21988 datagrams
[  6]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.130 ms  9136/21988 (42%)
[  6] Sent 21988 datagrams
[  8]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.134 ms  9136/21988 (42%)
[  8] Sent 21988 datagrams
[ 10]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.140 ms  9136/21988 (42%)
[ 10] Sent 21988 datagrams
[ 12]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.137 ms  9136/21988 (42%)
[ 12] Sent 21988 datagrams
[ 14]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.136 ms  9136/21988 (42%)
[ 14] Sent 21988 datagrams
[ 16]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.142 ms  9134/21986 (42%)
[ 16] Sent 21986 datagrams
[ 18]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.133 ms  9134/21986 (42%)
[ 18] Sent 21986 datagrams
[ 20]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.138 ms  9135/21987 (42%)
[ 20] Sent 21987 datagrams
[ 22]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.140 ms  9136/21987 (42%)
[ 22] Sent 21987 datagrams
[ 24]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.137 ms  9137/21989 (42%)
[ 24] Sent 21989 datagrams
[ 26]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.148 ms  9136/21988 (42%)
[ 26] Sent 21988 datagrams
[ 28]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.142 ms  9136/21988 (42%)
[ 28] Sent 21988 datagrams
[ 30]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.137 ms  9136/21988 (42%)
[ 30] Sent 21988 datagrams
[ 32]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.144 ms  9136/21988 (42%)
[ 32] Sent 21988 datagrams
[ 34]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.136 ms  9136/21988 (42%)
[ 34] Sent 21988 datagrams
[ 36]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.120 ms  9136/21987 (42%)
[ 36] Sent 21987 datagrams
[ 38]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.146 ms  9136/21988 (42%)
[ 38] Sent 21988 datagrams
[ 40]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.113 ms  9136/21987 (42%)
[ 40] Sent 21987 datagrams
[ 42]   0.00-3.00   sec   172 MBytes   481 Mbits/sec  0.142 ms  9136/21988 (42%)
[ 42] Sent 21988 datagrams
[SUM]   0.00-3.00   sec  3.36 GBytes  9.63 Gbits/sec  0.136 ms  182716/439753 (42%)

 

Diagnostics attached....

 

butterflyserver-diagnostics-20200502-1221.zip


So I saw lots of dropped packets on the switch due to oversized frames, and reset everything to MTU 1500.
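
(For reference, the MTU on the unRaid side can be checked and changed from the console with something like the below, assuming the 10GbE card is eth0 - though the Network Settings page is where it needs to be set to survive a reboot.)

ip link show eth0           # the current mtu is shown in the output
ip link set eth0 mtu 1500   # temporary change, lost on reboot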

 

I'm now getting write speeds to unRaid of 1.1GB/s down to 600MB/s, and 400-500MB/s reads from it. And the web interface seems stable. Still not sure what's going on here...


Ok, so one issue was that I was setting the MTU to 9014 on Win10 and unRaid, which I then set on the switch as well - but on the switch I was setting 'maximum frame size' to 9014, which apparently is wrong, because the on-wire frame is larger than the MTU. Fixing that means I can have jumbo frames re-enabled now without packet loss. But it hasn't improved the transfer speeds from the middling results I was getting: 1GB/s writes but 400-500MB/s reads.
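
Rough arithmetic, assuming the 9014 I set on the NICs is the MTU (i.e. the IP payload size): the on-wire frame adds a 14-byte Ethernet header and a 4-byte FCS, so 9014 + 14 + 4 = 9032 bytes (9036 with a VLAN tag). The switch's 'maximum frame size' therefore needs to be at least that, not 9014.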


Well, this is strange. Installing the 10GbE card seemed to mess up half my dockers, so I spent some time fixing those and upgrading to the latest unRaid release (I was previously on 6.7.x). Everything is working OK, so I thought I'd check the file transfer speed again. Now reads are up to 1.1GB/s (great!) but writes have dropped to 110MB/s (what?!?). I've run iPerf and I'm getting the full 10Gbit/s in both directions. I watched the Dashboard during the transfer (a 7GB file) and the traffic is definitely coming in on eth0, the 10GbE card. The source drive is an SSD - I can get 2GB/s off it to another local SSD, so that's not the issue. Both the PC and the server are connected to the same D-Link DGS-1510-28X switch, each in an SFP+ port (one via DAC, one via a fibre link). Occasionally I see the writes go up to 250MB/s, but they're mostly stuck at the 100-ish mark. I double-checked the switch: no dropped packets.
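
One thing I still want to rule out (just a guess): whether the incoming file is actually landing on the NVMe cache or going straight to the array - 110MB/s is roughly what a single spinning disk manages. While a copy is running, something like this over SSH should show where the data is being written (assuming the share is called 'media' - substitute the real share and file names):

ls -lh /mnt/cache/media/    # file growing here = it's hitting the cache
ls -lh /mnt/disk1/media/    # file growing here = it's bypassing the cache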

 

What on earth is going on now? (Diagnostics attached)

butterflyserver-diagnostics-20200503-2120.zip

On 5/2/2020 at 4:19 PM, robest said:

Ok, so one issue was that I was setting the MTU to 9014 on Win10 and unRaid, which I then set on the switch as well - but on the switch I was setting 'maximum frame size' to 9014, which apparently is wrong, because the on-wire frame is larger than the MTU. Fixing that means I can have jumbo frames re-enabled now without packet loss. But it hasn't improved the transfer speeds from the middling results I was getting: 1GB/s writes but 400-500MB/s reads.

 

When I saw the title, the first thing I thought of was MTU. Did you check they were actually working at 9000?
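
One quick way to check end to end (just a suggestion): send don't-fragment pings near the jumbo size from the Win10 box, e.g.

ping -f -l 8972 192.168.2.177

(-f sets the don't-fragment flag, -l the payload size; 8972 assumes a 9000-byte MTU minus 20 bytes of IP and 8 bytes of ICMP header.) If those fail while small pings get through, something in the path isn't actually passing jumbo frames.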

 

If you're getting 10G with iperf, it must be a physical read/write speed issue.
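
You could also take the network out of the picture with a quick local test on the server, something like the following (assumes the cache pool is mounted at /mnt/cache and has ~8GB free; delete the test file afterwards):

dd if=/dev/zero of=/mnt/cache/ddtest bs=1M count=8192 oflag=direct   # raw write speed to the cache
dd if=/mnt/cache/ddtest of=/dev/null bs=1M iflag=direct              # raw read speed back
rm /mnt/cache/ddtest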

Edited by cinereus

When I changed to a more recent unRaid version it reordered my ethX ports. The 10Gb NIC was eth0 and became eth1 after the update.

 

eth0 is connected to br0, and possibly to virbr0, which the dockers and VMs use, or something along those lines.

So the old settings from when the dockers and VMs were created no longer match (the 1Gb and 10Gb ports are not on the same brX / virbrX).

My solution was to reorder the NICs (eth0, eth1) back in Network Settings.

 

If this is your problem (I do not know, just speculating), then my fix is as follows.

 

To change Network Settings you must first go to Settings and disable the VM Engine and Docker. Then you can edit Network Settings.

Once you have changed the MAC addresses in "Interface Rules" so the 10Gb NIC is back on eth0 (or whatever you had there), you have to apply the settings and REBOOT.

Only after the reboot will the interfaces change to what they were set to.
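
You can sanity-check which physical NIC ended up on which ethX from a console before and after the reboot, e.g.:

ip -br link show    # lists eth0, eth1, ... with their MAC addresses
ethtool -i eth0     # driver (e.g. igb for onboard ports vs ixgbe for an Intel 10Gb card) and PCI bus address

Match the MACs and drivers against what you set in Interface Rules.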

 

I'm not sure this is the cause of your problems. Just trying to help.

If you, like me, have several NICs (Ethernet ports) and your 10Gb port has been reordered in the ethX numbering, this might fix it.

 

A wild guess is that the new NIC order is a side effect of the newer Linux kernel used in more recent unRaid versions.

Maybe this change fixed problems for Linux users with multiple-NIC boards, but it messed up old unRaid configs when updating to the newer version.

Edited by Alexander

A bit more info: originally, when I was setting this up, I had disabled all the dockers etc. Then, when I thought I'd got it sorted (i.e. fixed the MTU issue), I re-enabled everything - including motionEye recording 4 video streams.

 

If I stop the motionEye docker, I get better performance (but "only" 700-800MB/s rather than the 1GB/s I was getting before).

 

However - the motionEye docker has its own NIC and its own share with the cache disabled, and there's plenty of horsepower in the CPU, so I'm still a bit surprised at the performance drop. I'd have thought that video recording to the hard drives wouldn't affect a write to the cache (NVMe SSD).
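
If anyone wants to dig further, per-device I/O during a transfer would show whether the motionEye recordings and the SMB write are actually contending for the same devices. Something like the below - iostat isn't part of stock unRaid as far as I know, so this assumes it has been added (e.g. via a plugin):

iostat -dxm 2    # per-device utilisation and MB/s every 2 seconds

If the NVMe cache sits mostly idle while the array disks are busy, the write isn't going where I expect.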

 

 
