
10gbit network


Hello, I have moved my unRAID server to new hardware but ran into a strange issue with the 10 Gbit LAN:

- iperf3 shows only ~2.66 Gbit/s in the test;

- copying files over the LAN to an NVMe share is also slow, at ~170-210 MB/s.

 

At first, before moving the HDDs, I installed Windows 10 Pro on the new server hardware, on the internal M.2 NVMe SSD (Samsung 990 Pro 2TB), and checked everything, including network and SSD speed. It all worked just fine:

- file copy speed was ~1+ GB/s (10 Gbit);

- for some reason iperf3 showed only half of my connection speed, no idea why (maybe bugs?), while copying files over the LAN was fine.

 

After these tests I plugged in my unRAID HDDs and booted from the unRAID flash drive. The network still connects at 10 Gbit full duplex, but the actual speed is low. From these tests I assume the hardware is fine and there is some unRAID misconfiguration. Is there anything I can do to find the root of the issue?
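To put the figures above on one scale: iperf3 reports gigabits per second, while file managers report megabytes per second. A minimal conversion sketch, assuming decimal units (1 Gbit = 1000 Mbit, 1 byte = 8 bits) and ignoring protocol overhead; the helper names are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helpers, not part of unRAID or iperf3.
# Assumes decimal units and ignores Ethernet/TCP/SMB overhead.

# Gbit/s -> MB/s
gbit_to_mbyte() {
    LC_ALL=C awk -v g="$1" 'BEGIN { printf "%.1f\n", g * 1000 / 8 }'
}

# MB/s -> Gbit/s
mbyte_to_gbit() {
    LC_ALL=C awk -v m="$1" 'BEGIN { printf "%.2f\n", m * 8 / 1000 }'
}

gbit_to_mbyte 10      # line rate of the link
gbit_to_mbyte 2.66    # the iperf3 result under unRAID
mbyte_to_gbit 210     # the fastest observed file copy
```

By this arithmetic the 170-210 MB/s copies correspond to roughly 1.4-1.7 Gbit/s, so the file copy is slower still than the single-stream iperf3 result.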

 

Hardware: it's some custom server in the Supermicro 846 4U case:

lan: Intel X520-DA2 10G (Intel 82599 10 Gigabit Dual Port)

switch: Mikrotik CRS309-1G-8S IN (hardcoded to force connection as 10gb full duplex)

mobo: Asus Proart B650-Creator (supports 2x8 pci)

cpu: Ryzen 9 7900

ssd: Samsung 990 pro 2tb, 1tb, 1tb

...

 

I am using multimode fiber optic cables.

 

2024-06-11 21_05_37-Task Manager.jpg

2024-06-15 13_33_26-root@Tower_ ~ _ bash --login (Tower).jpg

2024-06-15 13_34_00-root@Tower_ ~ _ bash --login (Tower).jpg

Edited by Ariloum
Added some info
Link to comment

root@Tower:~# lspci | grep -i 'net'
03:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

 

 

root@Tower:~# ethtool -i eth0
driver: ixgbe
version: 6.1.79-Unraid
firmware-version: 0x800003df
expansion-rom-version: 
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Link to comment

The same network card also works fine on Debian:

root@ISP001:~# ethtool -i enp1s0f0
driver: ixgbe
version: 6.5.11-8-pve
firmware-version: 0x80000868
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Link to comment

iperf is just a standalone piece of software that I expect could have many bugs.

Do you mean that I need to file a bug report on the iperf GitHub first and wait for a resolution before you accept direct file copying on Windows and Debian as proof that the hardware works?

Actually, I was thinking about filing a bug report with them, but I have already deleted Windows, so I will be unable to run the additional tests they will surely require.

Link to comment

Do not force the switch to 10G!

 

If autonegotiation does not work correctly, there is something wrong with your hardware. 

Go and investigate!

Also, there may be a problem with the Intel driver (my faith in Intel cards is limited, I prefer Mellanox); see if you can find a special one in the App section.

 

Iperf is usually ok, it does not lie much.

Link to comment
6 minutes ago, Vr2Io said:

You should check the iperf result at client side and not on server side, your windows show TX near 10Gbps.

On my side iperf3 shows the same speed no matter which machine is the client and which is the server. I thought it should work like that by its nature?

 

1 hour ago, MAM59 said:

Do not force the switch to 10G!

 

If autonegotiation does not work correctly, there is something wrong with your hardware. 

Go and investigate!

Also, there may be a problem with the Intel driver (my faith in Intel cards is limited, I prefer Mellanox); see if you can find a special one in the App section.

 

Iperf is usually ok, it does not lie much.

 

I tried turning on auto-negotiation on the MikroTik switch and restarted both the switch and the unRAID server, but it still says there is no negotiation and connects at 10 Gbit full duplex.

 

2024-06-15 18_32_58-NA - Interface _sfp-sfpplus5_ at admin@192.168.1.44 - Webfig v6.49.7 (stable) on.jpg

2024-06-15 18_25_58-Total Commander.jpg

2024-06-15 18_34_46-cmd (Admin).jpg

2024-06-15 18_36_09-Settings.jpg

Link to comment
12 minutes ago, MAM59 said:

You need to allow flow control in both directions (btw, where is that screenshot coming from? It's neither Windows, nor unRAID, nor MikroTik ???)

Do you run the switch with SWOS or RouterOS ?

 

Yep, it's RouterOS. At first I tried a direct connection between the LAN adapters, Windows <-> Windows without the switch, and it copied files at 10 Gbps; then I added the switch and it worked fine too.

 

I changed flow control to auto and restarted the switch, but it is still the same picture on the unRAID side:

root@Tower:~# ethtool eth0
Settings for eth0:
        Supported ports: [ FIBRE ]
        Supported link modes:   10000baseT/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  10000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 10000Mb/s
        Duplex: Full
        Auto-negotiation: off
        Port: FIBRE
        PHYAD: 0
        Transceiver: internal
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes
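
The `ethtool eth0` output above lists only what the port supports and advertises for pause frames. The state actually in effect is reported by `ethtool -a`. A small sketch, assuming the usual `Autonegotiate:` / `RX:` / `TX:` layout of that output; the `pause_state` helper is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: condense `ethtool -a <iface>` output into one line.
# Assumes the usual "RX: on" / "TX: off" layout printed by ethtool.
pause_state() {
    awk '
        $1 == "Autonegotiate:" { an = $2 }
        $1 == "RX:"            { rx = $2 }
        $1 == "TX:"            { tx = $2 }
        END { printf "autoneg=%s rx=%s tx=%s\n", an, rx, tx }
    '
}
```

Usage on the server would be `ethtool -a eth0 | pause_state`. If RX or TX shows `off`, `ethtool -A eth0 rx on tx on` requests pause frames in both directions; whether the ixgbe driver accepts the change with this card and module is not confirmed anywhere in this thread.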

2024-06-15 19_11_32-NA - Interface _sfp-sfpplus5_ at admin@192.168.1.44 - Webfig v6.49.7 (stable) on.jpg

2024-06-15 19_14_16-Total Commander.jpg

Link to comment

With RouterOS I am out, too complicated for me. I only use SWOS.

 

Maybe your card can only run at 10G with this SFP+ module?

It only announces 10G and negotiation is done at 10G, nothing to worry about.

You do not want less anyway :-)))


And flow control is essential for 10G operation because there needs to be a way to stop a 10G sender when it overruns slower ports. Also, the now increasingly common 2.5G is only "10G with pauses". It depends on flow control. Without it, packets use the wrong time slot and are lost at the (non-)receiving side.

 

Link to comment
3 minutes ago, MAM59 said:

With RouterOS I am out, too complicated for me. I only use SWOS.

 

Maybe your card only can run at 10G with this SFP+ module?

 

 

It's already working at 10 Gbit on that hardware: if I unplug unRAID and boot Windows from the internal M.2 SSD, the file copy speed is 10 Gbit, as in the screenshot above.

It doesn't work if I run unRAID on that hardware.

Edited by Ariloum
Link to comment
Just now, MAM59 said:

That's why I said "look for a driver". The stock one does not seem to work well.

 

Is there any manual for changing the unRAID network driver? I tried to google it but didn't find anything usable (some people suggest compiling custom kernels).

Link to comment
30 minutes ago, MAM59 said:

Also, the now becoming common 2.5G is only "10G with pauses". It depends on flow control. Without it packets use the wrong time slot and are lost at the (non)receiving side.

Not true. Flow control is an old technique for telling the other side to pause. It usually does not come into play in today's networks, and it has a negative effect because it causes unnecessary pauses for all other clients. For communication between 10G and 2.5G (high speed to low speed), the protocol stack handles this rather than flow control.

Edited by Vr2Io
Link to comment
11 hours ago, Vr2Io said:

It usually does not come into play in today's networks, and it has a negative effect because it causes unnecessary pauses for all other clients

Not true. But let's not start a fight. It happens all the time where speed adjustments are needed, and it does not block everything, just this port.

grafik.thumb.png.50c68c3a273300eaaef7e055dc4fbe62.png

(OK, "all the time" is a bit too much (the packet counts in these stats are over a billion), but you see, it CAN happen. "Garage 2" has a cheap unmanaged switch with one 10G SFP+ port and 5 x 2.5G ports, and without flow control the communication is poor to unstable. "Downlink1G" is the management port connected to another 24 x 1G switch.)
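
Whether pause frames are actually being sent or received on a given Linux port can be read from the NIC statistics. A sketch, assuming the driver exposes counters with `flow_control`, `pause`, `xon` or `xoff` in their names (exact names differ between drivers); `nonzero_pause_counters` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: keep only non-zero pause/flow-control counters
# from `ethtool -S <iface>` output ("name: value" per line).
# Counter names are driver-specific; the pattern is a guess at common ones.
nonzero_pause_counters() {
    awk -F: '
        tolower($1) ~ /flow_control|pause|xon|xoff/ && $2 + 0 > 0 {
            name = $1; value = $2
            gsub(/[ \t]/, "", name)     # strip indentation
            gsub(/[ \t]/, "", value)    # strip padding around the number
            print name "=" value
        }
    '
}
```

Usage would be `ethtool -S eth0 | nonzero_pause_counters` before and after a slow transfer. If nothing is printed, or the values do not grow, pause frames are probably not what is throttling the link.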

Edited by MAM59
Link to comment

Taking a look at your syslog file shows hundreds of errors of the kind "nginx running amok" (these "authlimit" lines). For security reasons the GUI shuts everything off for a time and pauses because too many requests are coming in. This can also affect network speed.

I had this too a long time ago, but I cannot remember how I solved it. I think I needed to switch to a different browser on the client that connected to unraid, but to be sure, search the forum for "authlimit" (or maybe I increased the limit, but as I said, I cannot remember).

Jun 15 19:45:02 Tower nginx: 2024/06/15 19:45:02 [error] 9966#9966: *39745 limiting requests, excess: 20.468 by zone "authlimit", client: 192.168.1.111, server: , request: "GET /login HTTP/2.0", host: "192.168.1.121", referrer: "https://192.168.1.121/Main"
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Jun 15 19:45:02 Tower nginx: 2024/06/15 19:45:02 [error] 9966#9966: *39745 limiting requests, excess: 20.467 by zone "authlimit", client: 192.168.1.111, server: , request: "GET /login HTTP/2.0", host: "192.168.1.121", referrer: "https://192.168.1.121/Main"
### [PREVIOUS LINE REPEATED 4 TIMES] ###
Jun 15 19:45:02 Tower nginx: 2024/06/15 19:45:02 [error] 9966#9966: *39745 limiting requests, excess: 20.466 by zone "authlimit", client: 192.168.1.111, server: , request: "GET /login HTTP/2.0", host: "192.168.1.121", referrer: "https://192.168.1.121/Main"
### [PREVIOUS LINE REPEATED 2 TIMES] ###
Jun 15 19:45:02 Tower nginx: 2024/06/15 19:45:02 [error] 9966#9966: *39745 limiting requests, excess: 20.465 by zone "authlimit", client: 192.168.1.111, server: , request: "GET /login HTTP/2.0", host: "192.168.1.121", referrer: "https://192.168.1.121/Main"
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Jun 15 19:45:02 Tower nginx: 2024/06/15 19:45:02 [error] 9966#9966: *39745 limiting requests, excess: 20.464 by zone "authlimit", client: 192.168.1.111, server: , request: "GET /login HTTP/2.0", host: "192.168.1.121", referrer: "https://192.168.1.121/Main"
Jun 15 19:45:10 Tower nginx: 2024/06/15 19:45:10 [error] 9966#9966: *39745 limiting requests, excess: 20.834 by zone "authlimit", client: 192.168.1.111, server: , request: "GET /login HTTP/2.0", host: "192.168.1.121", referrer: "https://192.168.1.121/Settings"
Jun 15 19:45:10 Tower nginx: 2024/06/15 19:45:10 [error] 9966#9966: *39745 limiting requests, excess: 20.833 by zone "authlimit", client: 192.168.1.111, server: , request: "GET /login HTTP/2.0", host: "192.168.1.121", referrer: 
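
To gauge how heavy these bursts are and which client causes them, the matching lines can be tallied per client address. A sketch that assumes the nginx log format shown above; `authlimit_by_client` is a hypothetical helper, and the collapsed `PREVIOUS LINE REPEATED` markers are not expanded, so the counts are a lower bound:

```shell
#!/bin/sh
# Hypothetical helper: count nginx "authlimit" rejections per client IP.
# Reads syslog text on stdin; assumes the "client: <ip>," field seen above.
authlimit_by_client() {
    awk '/by zone "authlimit"/ {
            if (match($0, /client: [^,]*/))
                print substr($0, RSTART + 8, RLENGTH - 8)   # skip "client: "
        }' | sort | uniq -c | sort -rn
}
```

Usage, assuming the log lives at `/var/log/syslog` on the server: `authlimit_by_client < /var/log/syslog`.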

aah! found it: follow

 

 

Edited by MAM59
Link to comment
2 hours ago, MAM59 said:

Taking a look at your syslog file shows hundreds of errors of the kind "nginx running amok" (these "authlimit" lines). For security reasons the GUI shuts everything off for a time and pauses because too many requests are coming in. This can also affect network speed.

I had this too a long time ago, but I cannot remember how I solved it. I think I needed to switch to a different browser on the client that connected to unraid, but to be sure, search the forum for "authlimit" (or maybe I increased the limit, but as I said, I cannot remember).

 

 

Yeah, that "authlimit" error is kind of annoying if you just want to read the logs. My PC is on 24/7 and Chrome has one open page with the unRAID UI. I googled this problem before and have seen the link you attached. There I saw a post by admin ljm42 with one significant part: "If an open page becomes unauthenticated". At that point I started to treat this error as an unRAID bug and stopped digging into it. My thinking was that the unRAID webUI must not lose auth that easily (why does it happen, by the way? Is the token expiration timer too short, or ...? I have all Chrome caching turned off, so it should not try to load any cache/history attached to that page).

 

Also, that error is not filling up my memory much, and I get the network speed bug right after a fresh reboot of the unRAID server, so I think the network speed issue has some other root cause.

Link to comment
29 minutes ago, Ariloum said:

If an open page becomes unauthenticated". At that point I started to treat this error as an unRAID bug and stopped digging into it. My thinking was that the unRAID webUI must not lose auth that easily

That's the wrong impression. If there is an "unauthenticated" reply, the client should stop sending requests and ask the user for new credentials. There is nothing UNRAID could fix here.

If you read the thread fully, there are various ways people got rid of it. No clear "solution", just "try this" and "I did this"... You have to try them all to see what helps you.

(For me it was disabling IPv6 for a while, which is normally totally stupid. After some updates it worked again, no idea why.) But I remember that while I had this problem, I already noticed "LAN hiccups" where file transfers halted for short periods without reason.

 

The other thing is that the error is logged rather late. Thousands of denied requests have already hammered poor UNRAID before the first error line is logged.

 

But, of course, this may be unrelated to your speed problem. Or maybe not.

 

One other thing I noticed in your diagnostics is that you use a very recent AMD 7900X processor. UNRAID is not the fastest OS when it comes to adapting to AMD. The current version may still be a bit incompatible with this guy. But again, it's a MAY, not a WILL. I guess most of the development crew runs Intel machines.

(Also, I think the 7900X is a bit too much for a file server... but your mileage may vary, of course. I use a 7950X for picture and video editing on Windows; for Unraid even a 5700G is enough.)

 

Else I did not find any real hint that may be related to LAN speed, sorry.

 

Link to comment
2 hours ago, MAM59 said:

That's the wrong impression. If there is an "unauthenticated" reply, the client should stop sending requests and ask the user for new credentials. There is nothing UNRAID could fix here.

If you read the thread fully, there are various ways people got rid of it. No clear "solution", just "try this" and "I did this"... You have to try them all to see what helps you.

(For me it was disabling IPv6 for a while, which is normally totally stupid. After some updates it worked again, no idea why.) But I remember that while I had this problem, I already noticed "LAN hiccups" where file transfers halted for short periods without reason.

 

That "unauthenticated" error comes from the unRAID webUI page itself. If we have to do the guesswork ourselves and trick the error handling, that mostly means poorly error-handled server-side code, poorly written client-side code, or both (I have a few years of experience in enterprise Java backend development).

 

As for IPv6, I have had it turned off for ages...

 

The other solutions from that link don't look good. Why would I raise the nginx limits to infinity? It's not secure: if someone tries to brute-force passwords, I will not see it...

 

And switching the browser to Firefox, what kind of solution is that? Laughable :P

 

I agree there could be browser bugs, but they (Google, in the case of Chrome) fix them pretty fast. Imagine having bugs only with the unRAID UI and blaming the browser... The only thing that comes to mind that could cause issues for some web UIs is browser add-ons. I will check whether this error still happens in a Chrome incognito window.

 

2 hours ago, MAM59 said:

One other thing I noticed in your diagnostics is that you use a very recent AMD 7900X processor. UNRAID is not the fastest OS when it comes to adapting to AMD. The current version may still be a bit incompatible with this guy.

 

I upgraded this unRAID box from a Ryzen 5 5600G, and the 7900X seems to work well overall except for the network speed issue. And it's not only a file server; I use it for many things: DBs, VMs, neural network stuff and other heavy tasks... I have a 7950X in my main PC too, and both are nice CPUs (except for the fact that they can run only 2 of 4 memory sticks at full speed, otherwise the speed is halved).

 

Tbh the 7900X is not that new, it was released 8 months ago. Also there are many AMD CPUs for servers, like Threadripper and EPYC. Check the top of the CPU benchmark list, do you see Intel there? https://www.cpubenchmark.net/high_end_cpus.html

 

I personally do not like Intel, since their CPUs are still on too large a process node and thus have too high a TDP (too hot), which means more fan noise (I have not been a fan of their fanbase for a long time). And, at the moment, I don't like their efficiency cores, which do not cover many software cases (the same goes for the new Mac M2/M3 ARM CPUs, whose "e-cores" do not handle plugin processing in many audio DAWs, including Apple's own Logic Pro; it looks like it's hard to keep the hardware and software layers in sync even for Apple..). I still prefer flat, stable, cool and clean systems/cores that just work for everything.

 

2 hours ago, MAM59 said:

Else I did not find any real hint that may be related to LAN speed, sorry.

 

Thanks for trying to help. Most likely this network speed problem comes down to the drivers. I need another hand here, please: someone who knows how to tweak the drivers.

 

 

Edited by Ariloum
Link to comment

I did some investigation, here is an update:

 

1. I reinstalled Win10 on the unRAID server machine and my network easily hits 10GbE with direct SMB file copying. But iperf3 still shows half the speed, so I ended up reporting this at https://github.com/esnet/iperf/issues/1718

 

2. I tested the authlimit log error idea with the Chrome add-ons by keeping the unRAID webUI page open 24/7 in a Chrome incognito window (no add-ons): there have been no auth errors in the log for the last ~40 hours. So it must be something related to add-on functionality... I have ~23 add-ons in Chrome, but they all work well with the many web UIs I run locally (4-5 Java ones, a few REST APIs with different types of auth and frameworks, Stability UIs like Stable Diffusion/Forge/Swarm etc., a handwritten media server with auth, and so on). I don't see cascades of connections in the logs of my web UIs with these add-ons like the unRAID webUI gets with its nginx...

 

And the main problem is still unresolved: unRAID refuses to work at 10GbE speed with the Intel X520-DA2 network adapter and barely hits 2 Gbps...

I purchased this network adapter because of the many confirmations/recommendations for it on other forums, like these. It is very sad that I can't get it working with unRAID:

 

and here: 

 

Link to comment
9 hours ago, Ariloum said:

the main problem is still unresolved - unraid declines to work at 10gbe speed with that Intel X520-DA2 network adapter and barely hits 2gbps... 

You may try testing with parallel streams via the "-P" option. With two streams my X520 reaches 9.37 Gbps. All my Unraid boxes use the X520, and Windows uses a ConnectX-3; so far so good.

I like the OpenSpeedTest docker more than iperf3; btw both are available if needed.

 

As for Unraid network shares not reaching 10 Gbps, that is a separate issue, and you may not be able to fix it by changing to a different type of NIC.

 

image.png.dfe81f3a86875d2e8d737828bc48e0c3.png

 

[  4]   6.00-7.00   sec   686 MBytes  5.75 Gbits/sec
[  4]   7.00-8.00   sec   691 MBytes  5.80 Gbits/sec
[  4]   8.00-9.00   sec   691 MBytes  5.79 Gbits/sec
[  4]   9.00-10.00  sec   689 MBytes  5.78 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  6.71 GBytes  5.77 Gbits/sec                  sender
[  4]   0.00-10.00  sec  6.71 GBytes  5.76 Gbits/sec                  receiver

 

[  4]   8.00-9.00   sec   583 MBytes  4.89 Gbits/sec
[  6]   8.00-9.00   sec   538 MBytes  4.51 Gbits/sec
[SUM]   8.00-9.00   sec  1.09 GBytes  9.40 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4]   9.00-10.00  sec   582 MBytes  4.89 Gbits/sec
[  6]   9.00-10.00  sec   534 MBytes  4.48 Gbits/sec
[SUM]   9.00-10.00  sec  1.09 GBytes  9.37 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  5.68 GBytes  4.88 Gbits/sec                  sender
[  4]   0.00-10.00  sec  5.68 GBytes  4.88 Gbits/sec                  receiver
[  6]   0.00-10.00  sec  5.24 GBytes  4.50 Gbits/sec                  sender
[  6]   0.00-10.00  sec  5.24 GBytes  4.50 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  10.9 GBytes  9.38 Gbits/sec                  sender
[SUM]   0.00-10.00  sec  10.9 GBytes  9.38 Gbits/sec                  receiver
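
The aggregate figure can also be pulled out of the output programmatically, which helps when comparing several runs. A sketch, assuming iperf3's plain-text summary format as pasted above; `iperf_sum_rate` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: print the aggregate sender rate from iperf3 text output.
# With -P the total is on the final "[SUM] ... sender" line.
iperf_sum_rate() {
    awk '/^\[SUM\]/ && /sender/ {
        for (i = 2; i <= NF; i++)
            if ($i ~ /bits\/sec$/) { print $(i - 1), $i; exit }
    }'
}
```

Usage, assuming an iperf3 server is already running on the unRAID box at 192.168.1.121 (the host address seen in the nginx log earlier in the thread): `iperf3 -c 192.168.1.121 -P 2 -t 10 | iperf_sum_rate`. With a single stream there is no `[SUM]` line, so the helper prints nothing.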

 

 

Edited by Vr2Io
Link to comment
