SMB Performance Tuning


mgutt


I've noticed some fun things (without any in-depth research), just simple anecdata: when things are "clean" (I haven't done anything to lock up the Samba connection), I can get 100-200MB/sec between my Windows system and Unraid (2.5G onboard ethernet on both, connected to the same 2.5G switch), and that's great. What sucks is when Samba locks up, which seems to happen frequently enough to send me to Google once again, and everything stalls out for what feels like an eternity.

 

Just minutes ago I tried to move one folder to another inside the same share (/mnt/user/foo), same mapped drive and all, not even that much data (~5 gig), and my entire Windows Explorer process wound up locking up for well over 5 minutes. It never timed out or gave up; it just sat there. I can't figure out a discernible pattern so far, other than that the shfs processes do seem busier at the moment (I'm usually doing some other stuff on the array, but nothing that should completely freeze up simple Samba operations).

 


Disk shares are really night and day; that shfs overhead is a killer. It seems like my system gets bogged down when I/O has to pass through the shfs layer, and that locks up the SMB server reading from it, because right now a directly mounted disk share behaves as if the disk were attached to my Windows system.

 

At least for now, so I don't have to worry about data corruption weirdness, I'm only going to work inside the specific disk share itself and not move things in and out of it. That at least lets me do a lot of cleanup on the specific disk, stuff that was sometimes super painful when going through the user share.

  • 4 weeks later...
  • 1 month later...
On 3/13/2023 at 9:47 AM, meganie said:

The non-pro variants support RDMA just fine: https://network.nvidia.com/pdf/user_manuals/ConnectX-3_VPI_Single_and_Dual_QSFP_Port_Adapter_Card_User_Manual.pdf

 

"Client RDMA Capable: False" is just listed because Windows 10 Pro doesn't support RDMA. That's why I would have to upgrade my client to Windows 10 Pro for Workstations to support it.

But as far as I know, Unraid/Samba doesn't support it; it's still listed as a prototype: https://wiki.samba.org/index.php/Roadmap#SMB2.2FSMB3

From what I understand, RDMA is no longer prototype/experimental, since it is part of the server multi channel feature (out of experimental as of the Samba 4.15 release), and it is somewhat detailed under the interfaces option in the documentation: https://www.samba.org/samba/docs/current/man-html/smb.conf.5.html . This gives me some confidence that I won't run into bugs using RDMA.

 

I tried this on Unraid 6.12.2 with a Windows 11 Education client, both machines using MCX314A-BCCT NICs. The machines are directly connected since I don't have a 40Gbps network switch, and I don't think I can get multi-NIC SMB to work when the two ports on each machine are connected to different networks (my home network and the direct connection between the two machines).

 

On the Windows side, the NIC does show as RSS and RDMA capable:

PS C:\Windows\system32> Get-SmbClientNetworkInterface

Interface Index RSS Capable RDMA Capable Speed   IpAddresses                               Friendly Name
--------------- ----------- ------------ -----   -----------                               -------------
16              True        True         40 Gbps {10.6.13.18}                              mlx_direct_40g

For the SMB Extras configuration, I have this:

interfaces = "10.6.13.17;capability=RSS,capability=RDMA,speed=40000000000" "10.32.0.46;capability=RSS,speed=10000000000"

I only set up RDMA on the 40Gbps direct connection since no other client on my home network can use it. Maybe someday, since I'm considering building a test bench PC; I'd just get another Mellanox card to pair with it.

 

The Unraid machine runs a 6-core i5-8600K and the Windows machine a 12-core i7-12700K, so I'm not sure why there are only 4-5 TCP connections between the two machines...

root@Alagaesia:~# netstat -tnp | grep smb
tcp      234      0 10.6.13.17:445          10.6.13.18:50885        ESTABLISHED 3983/smbd           
tcp        0      0 10.6.13.17:445          10.6.13.18:58921        ESTABLISHED 28874/smbd          
tcp        0      0 10.32.0.46:445          10.32.1.32:42472        ESTABLISHED 26795/smbd          
tcp      117   1528 10.6.13.17:445          10.6.13.18:50883        ESTABLISHED 3983/smbd           
tcp      234      0 10.6.13.17:445          10.6.13.18:50577        ESTABLISHED 3983/smbd           
tcp      234      0 10.6.13.17:445          10.6.13.18:50884        ESTABLISHED 3983/smbd 

I'm assuming 4 of those connections are for RSS and one of them is for RDMA.
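One quick way to sanity-check that guess is to tally the ESTABLISHED smbd connections per client IP. A small sketch (the sample below is inlined from the netstat output above; in practice, pipe `netstat -tnp | grep smbd` into it instead):

```shell
# Tally established smbd sessions per client IP. In netstat's output,
# column 5 is the foreign (client) address as ip:port.
count_smb_sessions() {
  awk '$6 == "ESTABLISHED" { split($5, a, ":"); n[a[1]]++ }
       END { for (ip in n) print ip, n[ip] }'
}

# Sample lines copied from the netstat output above:
count_smb_sessions <<'EOF'
tcp      234      0 10.6.13.17:445   10.6.13.18:50885  ESTABLISHED 3983/smbd
tcp        0      0 10.6.13.17:445   10.6.13.18:58921  ESTABLISHED 28874/smbd
tcp        0      0 10.32.0.46:445   10.32.1.32:42472  ESTABLISHED 26795/smbd
tcp      117   1528 10.6.13.17:445   10.6.13.18:50883  ESTABLISHED 3983/smbd
tcp      234      0 10.6.13.17:445   10.6.13.18:50577  ESTABLISHED 3983/smbd
tcp      234      0 10.6.13.17:445   10.6.13.18:50884  ESTABLISHED 3983/smbd
EOF
```

Here the five sessions from 10.6.13.18 line up with the 4-5 connections noted above; one connection from the home network (10.32.1.32) is separate.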

 

 

This next command on the Windows machine is a little interesting though:

PS C:\Windows\system32> Get-SmbMultichannelConnection -IncludeNotSelected

Server Name Selected Client IP   Server IP  Client Interface Index Server Interface Index Client RSS Capable Client RDMA Capable
----------- -------- ---------   ---------  ---------------------- ---------------------- ------------------ -------------------
10.6.13.17  False    10.6.13.18  10.6.13.17 16                     7                      False              True
10.6.13.17  True     10.6.13.18  10.6.13.17 16                     7                      True               False
10.6.13.17  False    172.20.0.46 10.32.0.46 4                      10                     True               False
10.6.13.17  False    172.20.0.46 10.6.13.17 4                      7                      False              True
10.6.13.17  False    172.20.0.46 10.6.13.17 4                      7                      True               False

The connections that are probably useful here are those between 10.6.13.18 and 10.6.13.17. For some reason I have two connections, one where RSS is enabled but RDMA is disabled, and vice versa. I don't know if this is by design or if there is some setting I missed that would let a single connection be both RSS and RDMA capable.

 

I don't have the time to do further testing with this for a while, but it seems like the RSS-capable connection is selected most of the time; I only saw the RDMA-capable connection get selected once in a random test.

Edited by Percutio

From the charts I've looked at, SMB Direct is available on the Workstation/Enterprise/Education editions, excluding the server editions. I'm using Windows 11 Education in this test, but Education features have been hard to track over the years: when I first got Windows 10 Education it was basically Enterprise without Cortana plus a little extra, but Windows 11 Education differs a lot more...

 

EDIT:

This is what I see in the Windows Features window, so hopefully this means I can get RDMA working someday when I have more time again :D

 

Screenshot 2023-07-10 020819.png

Edited by Percutio
  • 2 months later...

 

On 2/8/2023 at 5:37 PM, meganie said:

But in unraid I get zero lines with "egrep 'CPU|eth1' /proc/interrupts":

1197314073_UnraidRSS.thumb.JPG.e4b12e87d4fe769ee10d3abbb8354274.JPG

(eth0 is the non RSS capable onboard NIC)

 


I'm using the same card, and I'm pretty sure my configuration is working for RSS. The "this command must show..." check just doesn't account for how the Mellanox driver reports its interrupts (maybe only in certain card/driver/firmware configurations?). Try:

# egrep 'CPU|eth*|mlx' /proc/interrupts

image.thumb.png.5407c8a6084eff83cf181a3a4ce33355.png

 

On the Windows side, you may also want to run something like the following, in an admin Powershell:

> Set-NetAdapterRss  -Name "Ethernet_10Gbe" -MaxProcessors 6 -NumberOfReceiveQueues 6 -BaseProcessorNumber 4 -MaxProcessorNumber 16 -Profile Closest

image.thumb.png.d4dc10ec25ce3e52c7ddb1adbdd3934a.png
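If the egrep matches lines, a follow-up check is whether the interrupt counts are actually spread across CPUs, since one queue per core is what RSS should buy you. A sketch using made-up per-CPU counts in the /proc/interrupts layout (on the real server, feed it `egrep 'CPU|mlx' /proc/interrupts` instead of the inlined sample):

```shell
# Sum interrupt counts per CPU column for the mlx queue vectors; a healthy
# RSS setup shows the load spread over several CPUs, not piled on one.
awk 'NR == 1 { next }                       # skip the CPU0 CPU1 ... header
     /mlx/   { for (i = 2; i <= 5; i++) sum[i-2] += $i }
     END     { for (c = 0; c < 4; c++) printf "CPU%d %d\n", c, sum[c] }' <<'EOF'
           CPU0   CPU1   CPU2   CPU3
 120:      1000      0      0      0   IR-PCI-MSI  mlx4-1@pci:0000:01:00.0
 121:         0   2000      0      0   IR-PCI-MSI  mlx4-2@pci:0000:01:00.0
 122:         0      0   1500      0   IR-PCI-MSI  mlx4-3@pci:0000:01:00.0
 123:         0      0      0   1200   IR-PCI-MSI  mlx4-4@pci:0000:01:00.0
EOF
```

This only sums the first four CPU columns for brevity; widen the loop bounds to match your core count.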

 

I'm not sure why Windows is reporting 'Client RDMA Capable = False' for me; I thought it showed 'True' at one point during my setup of the RSS feature, but I might be misremembering. Docs seem to indicate that you need the Windows Workstation license for RDMA as a *server*, but not as an RDMA *client*, so I'd hoped to be able to get that flipped on as well.

 

OTOH, I think the ConnectX-3 [non-Pro] cards require correctly configured PFC or Global Pause in order to function over ethernet/fiber (i.e., non-InfiniBand). This requires support in your switch, too.

 

RoCEv2 (in the ConnectX-3 Pro and beyond) might be easier to manage in that regard.

Edited by nick5429
  • 2 weeks later...

Unraid 6.12 added a feature called "exclusive shares" that essentially avoids FUSE / shfs for shares where:

  1. The share is pool-only with no secondary storage.
  2. All of the share's files are on the pool (i.e., no files left on the array from any past use of secondary storage for that share).
  3. It's not shared over NFS.

You just need to enable it in the Global Share Settings.
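A quick way to confirm a share actually went exclusive: if I understand the implementation right, Unraid exposes an exclusive share as a symlink from /mnt/user/<share> straight to the pool path, instead of a FUSE directory. Simulated below with a temp directory and a hypothetical share name, since the /mnt paths only exist on the server; there, just run `ls -l /mnt/user` and look for the arrow:

```shell
# Stand-ins for /mnt/cache/myshare and /mnt/user/myshare (hypothetical names).
tmp=$(mktemp -d)
mkdir -p "$tmp/cache/myshare"
ln -s "$tmp/cache/myshare" "$tmp/user_myshare"

# An exclusive share shows up as a symlink; a share still going through
# shfs is a plain (FUSE-mounted) directory.
if [ -L "$tmp/user_myshare" ]; then
  echo "exclusive -> $(readlink "$tmp/user_myshare")"
else
  echo "still going through shfs"
fi

rm -rf "$tmp"
```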

 

Using this feature, I was finally able to achieve fast speeds from my NVMe SSD pool (two NVMe drives in a ZFS mirror) over a 10Gbps network using SMB:

 

1328046850_37_complete_03-09_25_27.png.4aa454f56df9de71a5e15242c3749bb3.png

 

Transfers from regular spinning drives are faster too, but obviously not as fast as this.

Edited by Daniel15

I was able to drastically improve my transfer speed from my Windows machine to the Unraid cache drive by following the RSS configuration for my Intel X540-T 10gig cards; I have two 970 EVOs mirrored in ZFS as my cache drive. I used one of the suggested comments to access a disk "share" directly, bypassing the overhead of Unraid's file handling. I'm getting close to the full 800MB/sec copying from my Windows VM to Unraid, but far from it the other way around: 280MB/sec max. Looks like RSS is not working the other way? C is the Windows drive, Z is the share on Unraid.

Screenshot 2023-10-08 at 17.13.21.png

Edited by ArthurM
18 hours ago, ArthurM said:

I was able to drastically improve my transfer speed from my windows machine to unraid cache drive following the RSS configuration for my intel X540-t 10gig cards. […]

I've been able to get 10gig speeds one way from my windows machine to unraid. However when copying from my unraid nvme cache drive to my windows machine, I'm only getting 300MB/sec transfer speeds.

 

I've turned all offloading OFF on the Windows and Unraid machines.

Buffer is at 4096, MTU 9014 on both sides.

The Windows machine is a 5950X with 32GB of RAM; the Unraid machine has an 1800X with 32GB.

The drive on Unraid is a pair of NVMe 970 EVOs mirrored in ZFS.

 

Iperf benchmarks:

 

With iperf3 and a single connection:

  • Windows running iperf3 -s with Unraid as the client: 9.05 Gbit/sec
  • Unraid as the server and Windows as the client: 7.05 Gbit/sec

With iperf3 -P 4:

9.90 Gbit/sec both ways.
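For context, a rough conversion of those line rates into file-copy terms (plain shell arithmetic, nothing Unraid-specific; divide Gbit/s by 8 for GB/s):

```shell
# Convert an iperf3 line rate in Gbit/s into an approximate ceiling in MB/s,
# to compare against what Explorer reports during a copy.
to_mbps() {
  awk -v g="$1" 'BEGIN { printf "%.0f\n", g * 1000 / 8 }'
}

to_mbps 9.05   # the single-stream rate measured above
to_mbps 9.90   # the -P4 rate
```

So even the slower single-stream direction should allow well over 1000 MB/s; a 300MB/sec copy points at something above the network layer (SMB, the disk, or shfs) rather than the link itself.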

 

When switching configuration and rebooting I'm seeing this error in unraid logs:

Oct 9 11:06:12 Tower smbd[2441]: [2023/10/09 11:06:12.276350, 0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)

Oct 9 11:06:12 Tower smbd[2441]: smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 4 (sequence id 4) (granted = 1, low = 1, range = 1)

Oct 9 11:06:12 Tower smbd[2441]: [2023/10/09 11:06:12.278272, 0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)

Oct 9 11:06:12 Tower smbd[2441]: smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 3 (sequence id 3) (granted = 1, low = 1, range = 1)

 

Feels like a negotiation error; it only happens when I restart my machine and then initiate a transfer (the Windows File Explorer GUI seems stuck for 20-30 seconds, then the transfer starts)...

 

Here are my latest crystal disk benchmarks:

I had misconfigured my PCIe lanes on my Windows machine; its NVMe is now getting enough bandwidth to saturate 10gig both ways (C is the Windows machine, Z is Unraid). The benchmarks look good, but when copying files from Z to C I'm still only getting 300MB/sec...

 

diskmark.JPG

1 hour ago, ArthurM said:

I've been able to get 10gig speeds one way from my windows machine to unraid. However when copying from my unraid nvme cache drive to my windows machine, I'm only getting 300MB/sec transfer speeds. […]

Adding the two bolded lines, and changing the MTU on Unraid to 9000 rather than the 9014 my Windows client uses (I read that it can be an issue with Linux), gets me 10gig speeds for a few gigabytes, then it drops for a bit. The 960 EVO NVMe cache running out?

I think this should be good enough now :) It's much more stable on the other end...

 

server multi channel support = yes
aio read size = 1
aio write size = 1

interfaces = "192.168.5.20;capability=RSS,speed=10000000000" "192.168.0.199;speed=1000000000"

 

The screenshots are of two large video files totalling 20GB.

 

nice.JPG

mhhhh.JPG

Edited by ArthurM
cleanup
  • 2 months later...

Hi all, popping in here as I have a thread where I was troubleshooting over here, which led me to discover that SMB multichannel doesn't work with a single NIC unless you make the smb-extras changes.

Unfortunately, with both of those changes, and now with the new transceivers arrived and installed, I still don't get a multichannel SMB connection to Unraid; the server doesn't even show up in PowerShell under that command at all. I mapped it using \\servername\sharename through This PC, but I can't imagine that would cause it not to work such that I'd have to map it through the CLI to get a multichannel connection, right?

I've verified that I get a multichannel connection on the server if I fire up Windows Server; I've mapped and unmapped the share, rebooted, and turned SMB multichannel on and off on the Unraid box, but it just seems to refuse to establish a multichannel link for some reason.

So, as the start of this thread is fairly old now, I'm wondering what the "SMB multichannel for dummies" steps are to break it down to basics, as I still have a whole mess of troubleshooting stuck in my head, and a fresh starting point from a known-good setup would be helpful. Perhaps I should just redo my USB and start completely fresh.
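As a bare-minimum starting point, earlier posts in the thread settled on three ingredients: the global multichannel switch, an interfaces line, and an RSS capability tag. A small script to check a config fragment for them (the sample file path is made up; on Unraid, point it at the file your SMB extras settings are written to, which I believe is /boot/config/smb-extra.conf):

```shell
# Check an smb config fragment for the three pieces multichannel needs:
# the global switch, an interfaces line, and at least one RSS capability tag.
check_multichannel() {
  ok=0
  grep -q 'server multi channel support[[:space:]]*=[[:space:]]*yes' "$1" \
    || { echo "missing: server multi channel support = yes"; ok=1; }
  grep -q 'interfaces[[:space:]]*=' "$1" \
    || { echo "missing: interfaces = ..."; ok=1; }
  grep -q 'capability=RSS' "$1" \
    || { echo "missing: capability=RSS on an interface"; ok=1; }
  [ "$ok" -eq 0 ] && echo "config looks multichannel-ready"
  return "$ok"
}

# Sample fragment (IP and speed are placeholders):
cat > /tmp/smb-extra.sample <<'EOF'
server multi channel support = yes
interfaces = "192.168.1.10;capability=RSS,speed=10000000000"
EOF

check_multichannel /tmp/smb-extra.sample
```

If all three are present and the Windows side still shows nothing under Get-SmbMultichannelConnection, the problem is more likely the NIC's RSS reporting or the client edition than the Samba config.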

