Raw Samba performance


bubbaQ

I have been trying to tune raw Samba performance, independent of unRAID or parity protection, so as to evaluate the benefits of a faster cache disk subsystem.

 

I set up 2GB ramdisks on my test Linux box and on an XP SP3 workstation, so as to eliminate any disk I/O issues.  I implemented all the registry tweaks for XP to increase TCP windowing, etc., and did the same on Linux.  No LIP, MTU left at 1500.  The test was a single 2GB file made from /dev/random (i.e. a single incompressible file).  Latency is 0ms.  The systems are connected with a crossover cable and no switch.

 

I was able to sustain well over 800Mb/sec (megabits per second) via HTTP with Apache, downloading files from Linux (ramdisk) to XP (ramdisk).  I was able to get similar numbers using FTP.

 

With Samba, the best I could get was 480Mb/sec (ramdisk to ramdisk).  The graph below is for HTTP transfer, followed by the Samba copy.  That says it all.
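To put those two rates in terms of wall-clock time, here is a rough bit of arithmetic. It is only a sketch: it assumes the 2GB file is 2 x 1024^3 bytes and that "Mb" means 10^6 bits, neither of which is stated above, and the function name is just for illustration.

```python
# Rough transfer-time arithmetic for the rates quoted above.
# Assumptions (not from the thread): 2GB = 2 * 1024**3 bytes, 1 Mb = 10**6 bits.
FILE_BYTES = 2 * 1024**3


def transfer_seconds(file_bytes: int, megabits_per_sec: float) -> float:
    """Seconds needed to move file_bytes at a sustained rate given in Mb/s."""
    return file_bytes * 8 / (megabits_per_sec * 1_000_000)


if __name__ == "__main__":
    for label, rate in (("HTTP/FTP", 800), ("Samba", 480)):
        secs = transfer_seconds(FILE_BYTES, rate)
        print(f"{label}: about {secs:.1f} s at {rate} Mb/s")
```

At these rates the same file takes roughly two thirds longer over Samba than over HTTP.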

 

It ain't the disk, and it ain't the network.  It ain't CPU, as I profiled both, and CPU was a flat 10 to 12% for all runs. 

 

So has anyone done better with raw Samba?  Suggestions?

[Image: network throughput graph, HTTP transfer followed by the Samba copy]

Link to comment

Here ya go....

 

[global]
socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=65536 SO_SNDBUF=65536
strict sync=off
read raw=yes
write raw=yes

workgroup = WORKGROUP
server string = SambaTest

security = SHARE
guest account = root
guest ok = Yes
guest only = Yes

load printers = no
log file = /var/log/samba.%m

max log size = 50
log level=1
dns proxy = no

[root]
path = /
read only = No
force user = root
map archive = no
map system = no
map hidden = no
create mask = 0644
directory mask = 0755

Link to comment

Gah, just saw this but I just got a call.

 

Try dropping the buffers way down. 8192 ought to be safe. I know 64K is the common tip for performance but beyond a few K it has diminishing effects and could be causing issues elsewhere. On that note what nic/driver are you using?

 

Disable logging, too.

Link to comment
Okay, so what kernel is this?

 

2.6.33.4

 

 

What's the client to server rtt?

 

64 bytes from 192.168.0.44: icmp_req=1 ttl=64 time=0.043 ms

64 bytes from 192.168.0.44: icmp_req=2 ttl=64 time=0.042 ms

64 bytes from 192.168.0.44: icmp_req=3 ttl=64 time=0.042 ms

64 bytes from 192.168.0.44: icmp_req=4 ttl=64 time=0.042 ms

64 bytes from 192.168.0.44: icmp_req=5 ttl=64 time=0.042 ms

64 bytes from 192.168.0.44: icmp_req=6 ttl=64 time=0.043 ms

 

 

Try dropping the buffers way down. 8192 ought to be safe.

 

Much worse performance.... under 200Mbps.

 

Disabling logging has no effect.
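For context on the buffer experiment, here is a back-of-envelope bandwidth-delay product using the 1Gbit/s link speed and the roughly 0.043 ms ping time shown above. Treat it as a sketch: ping RTT is only an approximation of TCP round-trip time, and the constants are copied from this thread, not measured by the code.

```python
# Back-of-envelope bandwidth-delay product (BDP).
# Both constants are taken from figures quoted in the thread, not measured here.
LINK_BITS_PER_SEC = 1_000_000_000   # 1Gbit/s link
RTT_SECONDS = 0.043e-3              # ~0.043 ms from the ping output


def bdp_bytes(link_bits_per_sec: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep the link full for one round trip."""
    return link_bits_per_sec * rtt_seconds / 8


if __name__ == "__main__":
    bdp = bdp_bytes(LINK_BITS_PER_SEC, RTT_SECONDS)
    print(f"BDP is about {bdp:.0f} bytes")
    for buf in (8192, 65536):
        print(f"SO_RCVBUF/SO_SNDBUF={buf}: {buf / bdp:.1f}x the BDP")
```

By this estimate both 8192 and 65536 are larger than the raw bandwidth-delay product, so the drop to under 200Mbps probably is not simple window starvation. The arithmetic alone cannot say what it is.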

Link to comment

NIC:

*-network
               description: Ethernet interface
               product: RTL8111/8168B PCI Express Gigabit Ethernet controller
               vendor: Realtek Semiconductor Co., Ltd.
               physical id: 0
               bus info: pci@0000:02:00.0
               logical name: eth0
               version: 02
               serial: 00:24:1d:1e:e5:c8
               size: 1Gbit/s
               capacity: 1Gbit/s
               width: 64 bits
               clock: 33MHz
               capabilities: pm msi pciexpress msix vpd bus_master cap_list rom ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation
               configuration: autonegotiation=on broadcast=yes driver=r8169 driverversion=2.3LK-NAPI duplex=full ip=192.168.0.53 latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
               resources: irq:25 ioport:de00(size=256) memory:fdaff000-fdafffff(prefetchable) memory:fdae0000-fdaeffff(prefetchable) memory:fda00000-fda0ffff(prefetchable)

 

 

But don't forget, I am able to sustain well over 800Mbps with http and ftp on this same system and same NIC.

Link to comment

Disable the buffer settings, leave nodelay & lowdelay. See what autotuning can manage. Make sure it's enabled:

 

 cat /proc/sys/net/ipv4/tcp_moderate_rcvbuf

 

1=on
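If you would rather script that check than cat the file by hand, something like this should do. It is a minimal sketch: the helper names are made up for illustration, and it only reads the same /proc/sys file mentioned above.

```python
from pathlib import Path


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Return the raw value of a sysctl, e.g. 'net.ipv4.tcp_moderate_rcvbuf'."""
    return Path(root, *name.split(".")).read_text().strip()


def autotuning_enabled(root: str = "/proc/sys") -> bool:
    """True when receive-buffer autotuning is switched on (value '1')."""
    return read_sysctl("net.ipv4.tcp_moderate_rcvbuf", root) == "1"


if __name__ == "__main__":
    # Only meaningful on Linux; skip quietly elsewhere.
    if Path("/proc/sys/net/ipv4/tcp_moderate_rcvbuf").exists():
        print("autotuning enabled:", autotuning_enabled())
    else:
        print("tcp_moderate_rcvbuf not found on this system")
```

The `root` parameter is only there so the function can be pointed at a test directory instead of the real /proc/sys.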

 

Yea, I didn't notice your logging was already down at 1. Not much room for improvement there.

Link to comment

You mentioned profiling but I couldn't tell if you compared disk utilization between http & Samba transfers? Anything interesting there? The problem should show itself as cpu, net, or disk. e.g. If the Samba transfer is slower but disk utilization similar or higher than with http...

Link to comment

Here is mpstat output.  There is no iowait since the file is on ramdisk. 

 

root@dev4:~# mpstat 2 300
Linux 2.6.33.4-smp (dev4)       08/11/2011      _i686_  (2 CPU)

12:19:02 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
12:19:04 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
12:19:06 AM  all    0.00    0.00    0.25    0.00    0.00    0.00    0.00    0.00   99.75
12:19:08 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
12:19:10 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
12:19:12 AM  all    0.00    0.00    0.25    0.00    0.00    0.00    0.00    0.00   99.75
12:19:14 AM  all    0.00    0.00    0.00    0.00    0.00    0.25    0.00    0.00   99.75
12:19:16 AM  all    1.18    0.00    4.50    0.00    0.24    2.37    0.00    0.00   91.71
12:19:18 AM  all    1.69    0.00    9.16    0.00    0.00    6.02    0.00    0.00   83.13
12:19:20 AM  all    1.91    0.00    9.33    0.00    0.00    5.98    0.00    0.00   82.78
12:19:22 AM  all    1.72    0.00    9.36    0.00    0.00    5.67    0.00    0.00   83.25
12:19:24 AM  all    1.56    0.00    8.89    0.00    0.00    5.33    0.00    0.00   84.22
12:19:26 AM  all    1.66    0.00    8.79    0.00    0.00    5.46    0.00    0.00   84.09
12:19:28 AM  all    2.45    0.00    8.46    0.00    0.00    4.90    0.00    0.00   84.19
12:19:30 AM  all    2.01    0.00    8.26    0.00    0.22    5.58    0.00    0.00   83.93
12:19:32 AM  all    2.00    0.00    9.13    0.00    0.00    4.90    0.00    0.00   83.96
12:19:34 AM  all    1.58    0.00    8.33    0.00    0.00    4.73    0.00    0.00   85.36
12:19:36 AM  all    1.88    0.00    8.47    0.00    0.00    5.88    0.00    0.00   83.76
12:19:38 AM  all    1.83    0.00    8.47    0.00    0.00    5.26    0.00    0.00   84.44
12:19:40 AM  all    2.03    0.00    7.88    0.00    0.23    4.73    0.00    0.00   85.14
12:19:42 AM  all    1.44    0.00    8.65    0.00    0.00    6.49    0.00    0.00   83.41
12:19:44 AM  all    1.70    0.00    8.50    0.00    0.00    5.83    0.00    0.00   83.98
12:19:46 AM  all    1.78    0.00    8.67    0.00    0.00    5.11    0.00    0.00   84.44
12:19:48 AM  all    1.69    0.00    8.70    0.00    0.00    6.28    0.00    0.00   83.33
12:19:50 AM  all    1.81    0.00    8.62    0.00    0.00    5.44    0.00    0.00   84.13
12:19:52 AM  all    1.58    0.00    8.14    0.00    0.00    4.75    0.00    0.00   85.52
12:19:54 AM  all    2.01    0.00    8.50    0.00    0.00    5.37    0.00    0.00   84.12
12:19:56 AM  all    1.42    0.00    7.78    0.00    0.24    4.72    0.00    0.00   85.85
12:19:58 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
12:20:00 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
12:20:02 AM  all    0.00    0.00    0.25    0.00    0.00    0.00    0.00    0.00   99.75
12:20:04 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

Copy same file via HTTP:

12:24:04 AM  all    0.00    0.00    0.28    0.00    0.00    0.00    0.00    0.00   99.72
12:24:06 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
12:24:08 AM  all    0.00    0.00    0.24    0.00    0.00    0.00    0.00    0.00   99.76
12:24:10 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
12:24:12 AM  all    0.23    0.00    2.25    0.00    0.00    2.93    0.00    0.00   94.59
12:24:14 AM  all    0.23    0.00    9.17    0.00    0.00   12.39    0.00    0.00   78.21
12:24:16 AM  all    0.00    0.00    8.62    0.00    0.00   13.55    0.00    0.00   77.83
12:24:18 AM  all    0.00    0.00    6.68    0.00    0.00    6.93    0.00    0.00   86.39
12:24:20 AM  all    0.00    0.00    7.67    0.00    0.00    8.91    0.00    0.00   83.42
12:24:22 AM  all    0.25    0.00    5.96    0.00    0.00    8.68    0.00    0.00   85.11
12:24:24 AM  all    0.00    0.00    3.98    0.00    0.00    4.73    0.00    0.00   91.29
12:24:26 AM  all    0.00    0.00    3.74    0.00    0.00    4.49    0.00    0.00   91.77
12:24:28 AM  all    0.00    0.00    3.51    0.00    0.00    3.76    0.00    0.00   92.73
12:24:30 AM  all    0.00    0.00    3.99    0.00    0.00    5.24    0.00    0.00   90.77
12:24:32 AM  all    0.00    0.00    3.74    0.00    0.25    4.99    0.00    0.00   91.02
12:24:34 AM  all    0.00    0.00    4.22    0.00    0.00    5.21    0.00    0.00   90.57
12:24:36 AM  all    0.00    0.00    4.27    0.00    0.00    4.52    0.00    0.00   91.21
12:24:38 AM  all    0.00    0.00    1.10    0.00    0.00    1.37    0.00    0.00   97.53
12:24:40 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
12:24:42 AM  all    0.00    0.00    0.25    0.00    0.00    0.00    0.00    0.00   99.75
12:24:44 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
12:24:46 AM  all    0.00    0.00    0.25    0.00    0.00    0.00    0.00    0.00   99.75
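To compare the two runs without eyeballing columns, the busy rows can be averaged. This is a sketch only: it uses a handful of rows copied from the output above rather than the full capture, and the 95% idle cutoff for "busy" is an arbitrary assumption.

```python
# A few rows copied from the mpstat output above (one idle row, three busy rows per run).
SAMBA = """\
12:19:14 AM  all    0.00    0.00    0.00    0.00    0.00    0.25    0.00    0.00   99.75
12:19:18 AM  all    1.69    0.00    9.16    0.00    0.00    6.02    0.00    0.00   83.13
12:19:20 AM  all    1.91    0.00    9.33    0.00    0.00    5.98    0.00    0.00   82.78
12:19:22 AM  all    1.72    0.00    9.36    0.00    0.00    5.67    0.00    0.00   83.25
"""
HTTP = """\
12:24:10 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
12:24:14 AM  all    0.23    0.00    9.17    0.00    0.00   12.39    0.00    0.00   78.21
12:24:16 AM  all    0.00    0.00    8.62    0.00    0.00   13.55    0.00    0.00   77.83
12:24:18 AM  all    0.00    0.00    6.68    0.00    0.00    6.93    0.00    0.00   86.39
"""

COLUMNS = ("usr", "nice", "sys", "iowait", "irq", "soft", "steal", "guest", "idle")


def active_means(text: str, idle_below: float = 95.0) -> dict:
    """Average each mpstat column over rows where %idle is under idle_below."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Expect: time, AM/PM, "all", then nine percentage columns.
        if len(parts) != 12 or parts[2] != "all":
            continue
        values = dict(zip(COLUMNS, map(float, parts[3:])))
        if values["idle"] < idle_below:
            rows.append(values)
    if not rows:
        return {}
    return {c: sum(r[c] for r in rows) / len(rows) for c in COLUMNS}


if __name__ == "__main__":
    for label, text in (("Samba", SAMBA), ("HTTP", HTTP)):
        m = active_means(text)
        print(f"{label}: usr={m['usr']:.2f} sys={m['sys']:.2f} soft={m['soft']:.2f}")
```

Feeding in the full capture instead of these sample rows would give a fairer comparison of where each protocol spends its CPU time.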

Link to comment

FWIW, offloading was disabled (only rx-checksumming was on).

 

root@dev4:~# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off

Link to comment

mpstat *is* part of sysstat.

 

No, the file could not have been entirely in cache.  I created the ramdisk to leave under 1GB of RAM, disabled swap, and the file being copied is 2GB.

 

But whether it was all in cache is not the issue.  The whole purpose of this is to eliminate any hardware latency, so as to isolate application-layer performance.  Once that is tuned, it gives a solid top-end benchmark to shoot for when tuning for real-world conditions (i.e. reading from spinning disk and not ramdisk).

Link to comment

My cache comment was off hand. More about things working against us than a flaw in testing.

 

Granted, I've been out of this for a few years but I think my approach is getting back on track. Your Samba numbers are far enough off that I'd want to find the cause first, because if it's a generic install the same is likely to happen again. To me the problem could still be almost anywhere. I better understand your tests now but nothing so far suggests an answer for the poor performance.

 

What I was hoping for was network utilization between transfer methods. mpstat doesn't do that. sar does, which is where the "other sysstat tools" came in. Really, start/end reads from ethtool, netstat, whatever. I'm trying to strip out side-effects (logging, offloading), then look for what's markedly different to expose the problem subsystem. In this case, how much is moving across the wire for each method. If more is transferred with Samba, what is it? If it's the same, just stretched out, then packet dumps should tell more.
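One way to take those start/end reads is to snapshot /proc/net/dev before and after each transfer and diff the counters. The sketch below assumes the usual Linux layout of that file (receive bytes and packets first, transmit bytes and packets in the ninth and tenth columns). The sample numbers are made up purely to show the mechanics, and the function names are hypothetical.

```python
def read_counters(proc_net_dev_text: str, iface: str) -> dict:
    """Pull byte and packet counters for one interface out of /proc/net/dev text."""
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # header lines carry no interface name
        name, rest = line.split(":", 1)
        if name.strip() != iface:
            continue
        f = [int(x) for x in rest.split()]
        return {
            "rx_bytes": f[0], "rx_packets": f[1],
            "tx_bytes": f[8], "tx_packets": f[9],
        }
    raise KeyError(iface)


def delta(before: dict, after: dict) -> dict:
    """Counter growth between two snapshots."""
    return {k: after[k] - before[k] for k in before}


# Made-up snapshots, only to illustrate the calculation.
BEFORE = "  eth0: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0"
AFTER = "  eth0: 1600 16 0 0 0 0 0 0 9000 70 0 0 0 0 0 0"

if __name__ == "__main__":
    d = delta(read_counters(BEFORE, "eth0"), read_counters(AFTER, "eth0"))
    print(d)
    # On a live box, read the real file instead, once before and once after the copy:
    #   text = open("/proc/net/dev").read()
```

Running it once around the HTTP copy and once around the Samba copy gives bytes and packets on the wire for each method.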

Link to comment

I did check data size for each method, and it was nearly identical.  So the next check will be packet rate.  If the number of packets is higher with Samba, that's a Samba protocol inefficiency.  If the inter-packet delay is higher with Samba than with other protocols, that's a Samba core inefficiency.

Link to comment

I checked the packet counts, and Samba and HTTP are within 1% of each other.

 

I wonder if Samba is somehow limiting the efficiency of TCP windowing by demanding replies before proceeding, in a way that hampers windowing.  That will take a session with Ethereal to determine.

Link to comment

very old, but possibly interesting: http://lists.samba.org/archive/samba/2003-December/077198.html
Link to comment

I've been forcing myself to not suggest the nic. Now google is turning up reports of Realteks affecting Samba serving performance, even when http & ftp are working fine. Don't know if there's a common driver or chipset but it's hard to nail down with that brand anyway. Have anything else you could try? At least it'd be a quick test.

 

Link to comment

I was planning on trying a different NIC, but I dug through my parts box and I've used all the good ones in various boxen, and all that is left is crap, except for an old Intel Pro 1000MT, but it is PCI, not PCI-e.

 

Any suggestions pro/con on an Intel PRO/1000 PT?

Link to comment

Archived

This topic is now archived and is closed to further replies.
