Slow 10GbE speeds



This time I got one of the most extreme results while downloading a file from the cache (NVMe SSD). Just for fun I started a parallel iperf3 transfer from disk (Unraid, same file) to memory (Windows). What is happening here?!

 

[screenshot]

 

htop while the transfer is running:

[screenshot]

 

 

EDIT: Ok, it seems to be related to "Tunable (enable Direct IO)". I reset it to "Auto" (it was set to "Yes") and now the transfer is faster again:

[screenshot]

 

But I'm still disappointed. As you can see, iperf3 reached up to 900 MB/s. I made an additional iperf3 memory-to-memory test from my 10G Synology NAS to my Windows PC:

marc@diskstation:/$ iperf3 -c 192.168.178.21
Connecting to host 192.168.178.21, port 5201
[  5] local 192.168.178.11 port 42952 connected to 192.168.178.21 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.02 GBytes  8.79 Gbits/sec    0    218 KBytes
[  5]   1.00-2.00   sec  1.04 GBytes  8.89 Gbits/sec    0    218 KBytes
[  5]   2.00-3.00   sec  1.04 GBytes  8.98 Gbits/sec    0    218 KBytes
[  5]   3.00-4.00   sec  1.06 GBytes  9.07 Gbits/sec    0    218 KBytes
[  5]   4.00-5.00   sec  1.06 GBytes  9.08 Gbits/sec    0    218 KBytes
[  5]   5.00-6.00   sec  1.05 GBytes  9.02 Gbits/sec    0    218 KBytes
[  5]   6.00-7.00   sec  1.05 GBytes  9.04 Gbits/sec    0    218 KBytes
[  5]   7.00-8.00   sec  1.05 GBytes  9.03 Gbits/sec    0    218 KBytes
[  5]   8.00-9.00   sec  1.05 GBytes  8.99 Gbits/sec    0    218 KBytes
[  5]   9.00-10.00  sec  1.05 GBytes  9.05 Gbits/sec    0    218 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.5 GBytes  8.99 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  10.5 GBytes  8.99 Gbits/sec                  receiver

And again a memory-to-memory test, this time from the Unraid server to my Windows PC:

root@Thoth:/boot/config# iperf3 -c 192.168.178.21
Connecting to host 192.168.178.21, port 5201
[  5] local 192.168.178.9 port 51080 connected to 192.168.178.21 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   969 MBytes  8.13 Gbits/sec    0    317 KBytes       
[  5]   1.00-2.00   sec   975 MBytes  8.18 Gbits/sec    0    317 KBytes       
[  5]   2.00-3.00   sec  1002 MBytes  8.41 Gbits/sec    0    320 KBytes       
[  5]   3.00-4.00   sec  1012 MBytes  8.50 Gbits/sec    0    325 KBytes       
[  5]   4.00-5.00   sec   974 MBytes  8.17 Gbits/sec    0    334 KBytes       
[  5]   5.00-6.00   sec   980 MBytes  8.22 Gbits/sec    0    311 KBytes       
[  5]   6.00-7.00   sec  1014 MBytes  8.50 Gbits/sec    0    328 KBytes       
[  5]   7.00-8.00   sec   972 MBytes  8.15 Gbits/sec    0    308 KBytes       
[  5]   8.00-9.00   sec  1.00 GBytes  8.62 Gbits/sec    0    322 KBytes       
[  5]   9.00-10.00  sec   999 MBytes  8.38 Gbits/sec    0    328 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  9.69 GBytes  8.33 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  9.68 GBytes  8.32 Gbits/sec                  receiver
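
To rule out a single TCP stream as the limit, these numbers could be compared against a run with multiple parallel streams. An untested sketch, using standard iperf3 options from the Unraid console against the same Windows iperf3 server:

# 4 parallel streams from the Unraid server to the Windows PC
iperf3 -c 192.168.178.21 -P 4
# same test in the reverse direction (Windows PC sends, Unraid receives)
iperf3 -c 192.168.178.21 -P 4 -R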

P.S. I tried to change my MTU setting, but that failed.
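
For reference, this is roughly how jumbo frames would be checked and set from the console. An untested sketch only: the interface name br0 is an assumption, Unraid normally manages the MTU under Settings > Network Settings, and the switch plus the Windows NIC have to use the same value:

ip link show                          # current MTU per interface
ip link set dev br0 mtu 9000          # assumption: the 10G port is bridged as br0
ping -M do -s 8972 192.168.178.21     # 8972 bytes payload + 28 bytes headers = 9000, must pass unfragmented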

Edited by mgutt

Today I found this interesting thread where a user wrote directly to the cache disk as a share.

 

I stopped the array, enabled disk shares in the global share settings, and as a first test started a transfer to the Music share (which has a persistent cache setting):

[screenshot]

 

No joke.

 

Then I started a transfer to \\tower\cache\Music:

[screenshot]

 

Then I realized that the shfs processes were still producing load:

[screenshot]

 

Could it be that enabling the disk share triggers an "indexing" process or something similar? I mean, what is this shfs process doing? Nobody is accessing the shares and there is no I/O on the disks:

[screenshot]

 

EDIT: Ok, now it's getting more interesting. After rebooting my Windows PC the shfs processes are gone:

[screenshot]

 

Now I can re-test direct access to the cache share. Transfer to the user share \\tower\Music:

[screenshot]

 

Transfer to the disk share \\tower\Cache\Music:

[screenshot]

 

So it's the same as @falconexe has proven: writing to a user share causes an extreme slowdown. Sadly I cannot use this trick, as enabling disk shares is a huge security hole because no user login is required.

 

EDIT: Ok, I created the file /boot/config/smb-extra.conf with the following content:

[cache]
        path = /mnt/cache
        comment = 
        browseable = yes
        # Private
        writeable = no
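        # users on "write list" below still get write access despite "writeable = no"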
        read list = 
        write list = marc
        valid users = marc
        case sensitive = auto
        preserve case = yes
        short preserve case = yes
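
To check that Samba parses the new section cleanly, testparm can be used. Untested sketch here; it assumes the stock Unraid smb.conf includes /boot/config/smb-extra.conf:

# dump the effective config and report syntax errors
testparm -s /etc/samba/smb.conf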

Then I stopped the array, disabled disk shares in the global share settings, and started the array again. Now "cache" shows up as the only disk share, and it accepts only the user "marc":

 

[screenshot]

Edited by mgutt
1 hour ago, mgutt said:

Sadly I cannot use this trick, as enabling disk shares is a huge security hole because no user login is required.

Disk shares can have their security set just like user shares. What you cannot do is then separately control access to the files/folders on the share, which could be a security issue.

1 hour ago, itimpi said:

Click on the Shares tab

Ok 😳

 

I deleted "/boot/config/smb-extra.conf" again 😉

 

Now I need to find out why one Windows PC is able to produce such a high load on the shfs processes. At the moment my guess is the Windows File History backup, or it was the mounted NFS share. Today I removed the NFS features from my Windows PC. We will see if this helps.

 

EDIT: Ok, it must be Windows File History. I started the backup, and a parallel transfer to \\tower\cache\music (to bypass the shfs process) is extremely slow:

[screenshot]

 

After stopping the backup:

[screenshot]

 

Edited by mgutt
2 hours ago, mgutt said:

So it's the same as @falconexe has proven: writing to a user share causes an extreme slowdown.

Thanks for confirming, @mgutt. I am still having this issue without my custom workaround. I appreciate all the testing you did. Interesting that two computers give different results with regard to load/processes.

 

I am going to build a new Windows 10 rig next month. Once it is up, I'll try again and see if I get any different results. I am also almost finished building a new house with Cat 7 cabling throughout and a dedicated server room. I am curious whether the new network environment will afford me higher speeds. Currently, I lose 66% of my throughput due to this SHFS overhead.

 

I'll subscribe to this thread and continue to watch. Cheers!

7 minutes ago, falconexe said:

Interesting that two computers give different results with regard to load/processes.

I'm using only one PC while testing. And I have new test results: transferring to \\tower\cache\music (again bypassing shfs) is slow while the backup is running, but a transfer to \\192.168.178.9\cache\music with the same user credentials and the same Ethernet ports is fast?!

[screenshot]

 

 

Maybe you can test this scenario as well? To me it looks like Unraid is not good at multi-threading a single SMB user.

 

Edited by mgutt
8 minutes ago, mgutt said:

I'm using only one PC while testing. And I have new test results: transferring to \\tower\cache\music (again bypassing shfs) is slow while the backup is running, but a transfer to \\192.168.178.9\cache\music with the same user credentials and the same Ethernet ports is fast?!

 

 

Maybe you can test this scenario as well? To me it looks like Unraid is not good at multi-threading a single SMB user.

 

Hmm, that is VERY ODD. Almost sounds like a DNS issue within your network. When I get some time, I'll run side-by-side comparisons as follows and report back.

 

  • \\HostName\Cache\Share
  • \\IPAddress\Cache\Share

 

 

Also, did you edit your HOSTS file in Windows 10? This is how I resolve the 10GbE peer-to-peer connection alongside the standard 1GbE network connection. Both are resolvable by my Unraid host name (or IP).

 

https://www.groovypost.com/howto/edit-hosts-file-windows-10/

Edited by falconexe
On 9/3/2020 at 11:32 AM, falconexe said:

Also, did you edit your HOSTS file in Windows 10?

Nope. The server and the PC are both connected to a 10G switch, and on each side that is the only connection.

 

Regarding the performance problems and DNS: I changed the File History backup target to \\192.168.178.9\marc\backup and started it. A parallel transfer returned the following speeds:

to \\192.168.178.9\cache\music: 400 MB/s (this time the impact was not as huge as in my last tests)

to \\tower\cache\music: 600 MB/s (same server, same cable, same SMB credentials)

 

Then I reverted the File History backup to \\tower\marc\backup and got exactly the same results as with the IP, just swapped, as expected:

\\tower\cache\music: 400 MB/s and

\\192.168.178.9\cache\music: 600 MB/s

 

So it's not DNS related.

 

And the really bad thing is that Windows File History stays active for a very long time (in my case). After copying all the files it seems to compare everything, as "inotifywait -mr /mnt/disk8/Marc/Backup/PC" returned a seemingly endless stream of accessed files:

[screenshot]
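
To turn that endless scroll into a number, the events could be counted over a fixed interval. An untested sketch, using coreutils timeout on the same path:

# count file events under the File History target for 60 seconds
timeout 60 inotifywait -mr -e open,access,modify /mnt/disk8/Marc/Backup/PC 2>/dev/null | wc -l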

 

This results in very low transfer rates on the disk:

[screenshot]

 

But it produces high CPU load on the shfs processes:

[screenshot]

 

It seems Unraid has a problem with many SMB open files (NumOpens) in parallel per SMB connection, as this is the only difference between the DNS name and the IP of the server while the backup is running:

PS C:\WINDOWS\system32> Get-smbconnection

ServerName    ShareName UserName             Credential                    Dialect NumOpens
----------    --------- --------             ----------                    ------- --------
192.168.178.9 cache     DESKTOP-123456\John MicrosoftAccount\john         3.1.1   4
192.168.178.9 IPC$      DESKTOP-123456\John MicrosoftAccount\john         3.1.1   0
THOTH         cache     DESKTOP-123456\John MicrosoftAccount\[email protected] 3.1.1   4
THOTH         manictime DESKTOP-123456\John MicrosoftAccount\[email protected] 3.1.1   0
THOTH         Marc      DESKTOP-123456\John MicrosoftAccount\[email protected] 3.1.1   495
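
The same picture should be visible from the server side via the standard Samba tools on the Unraid console (untested here):

smbstatus -b     # brief list of connected SMB sessions
smbstatus -L     # open/locked files per session, roughly matching NumOpens above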

 

Edited by mgutt

At the moment I'm trying to find Unraid's SMB limits. To test this, I created 10,000 random files on the Unraid server as follows:

# create random files on the Unraid server
mkdir /mnt/cache/Marc/tower3
for n in {1..10000}; do
    dd status=none if=/dev/urandom of=/mnt/cache/Marc/tower3/$( printf %05d "$n" ).bin bs=1 count=$(( RANDOM + 1024 ))
done
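
As a quick sanity check that all test files were created (trivial, but untested here):

ls /mnt/cache/Marc/tower3 | wc -l    # should print 10000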

To exclude DNS issues in my network and to be able to target the server over two different SMB connections, I added the following to the Windows hosts file:

192.168.178.9 tower3
192.168.178.9 tower4

Then I copied a 10GB file to \\tower3\marc\tower3 and to \\tower4\marc\tower4 at the same time. Both parallel transfers reached a stable 300 MB/s. After 50% of the file was uploaded, I started robocopy with 128 threads (/MT:128) as follows:

start robocopy \\tower3\marc\tower3\ C:\Users\Marc\Documents\Benchmark\ /E /ZB /R:10 /W:5 /TBD /IS /IT /NP /MT:128

And while the transfer to \\tower4 even rises (370-450 MB/s), the transfer to \\tower3 is nearly dead:

[screenshot]

 

 

Can anyone verify my results?

 

Edited by mgutt
