MatzeHali

SMB & ZFS speeds I don't understand ...


Hi there,

 

I'm trying to find the sweet spot for a ZFS configuration with 15 disks and potentially an L2ARC, so I'm doing a bit of benchmarking.

The configuration is a 16-core Xeon with 144GB of RAM, 15 WD Red 14TB drives, a 1TB 970 EVO NVMe and 10GbE Ethernet connectivity.

I access the server via SMB from macOS.

While trying different RAID-Z configurations, I'm testing a 16GB write-once, read-multiple-times scenario:

 

Writing directly to the NVMe as a single-drive pool (for testing only) I can reach 880MB/s.

Reading from that same place multiple times I get 574MB/s.

Writing to a pool of two 7-disk raidz1 vdevs I'm getting write speeds of about 760MB/s, but the exact same read speed of 574MB/s.

 

Since the file is easily small enough to fit in the ARC, I would assume that, at the latest on the second read, it is served from RAM, so I doubt the read speed on the box itself is the problem.

 

So, my question is, why is my read speed somehow capped at 574MB/s?

 

The MTU is set to 9000 and SMB signing is switched off; I don't have any other ideas to try.
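
In case it helps to reproduce the checks, something like this should work from the Mac side (the server IP is just an example, and smbutil only reports on shares that are currently mounted):

# verify jumbo frames end-to-end: 8972 bytes of payload + 28 bytes of headers = 9000, with don't-fragment set
ping -D -s 8972 192.168.0.111

# show the negotiated SMB version and signing state for mounted shares
smbutil statshares -a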

 

Thanks,

 

M

 


Unraid doesn't support ZFS, not yet anyway. If you're using the plugin you should use the existing thread for support.

Posted (edited)

Hi Johnnie,

 

since this is not a ZFS-specific question but rather an SMB-specific one, I thought I'd go with the general forum. From the box itself, the read and write speeds are much higher than that, so it seems to be a network rather than a ZFS problem.

 

Thanks,

 

M

 

edit: I have now also tested with a share on the UnRAID array, and it's the same pattern: write speeds of about 840MB/s, reads at 570-580MB/s.

Edited by MatzeHali


Hi Johnnie,

 

I have not.

I'll try to find out how to do that from a Mac to an UnRAID box and report back. ;)

 

Thanks for pointing me in that direction. While I'm reasonably comfortable with this technical stuff and can learn quickly, I'm more the creative type and lack a lot of the background, so forgive me if it takes me a day or two to get this going.

 

Cheers,

 

M


So, here's the result of the iperf run, which was done with the -d switch, so this speed applies in both directions:

 

Accepted connection from 192.168.0.51, port 55664
[  5] local 192.168.0.111 port 5201 connected to 192.168.0.51 port 55665
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  1.15 GBytes  1174 MBytes/sec                  
[  5]   1.00-2.00   sec  1.15 GBytes  1176 MBytes/sec                  
[  5]   2.00-3.00   sec  1.15 GBytes  1177 MBytes/sec                  
[  5]   3.00-4.00   sec  1.15 GBytes  1176 MBytes/sec                  
[  5]   4.00-5.00   sec  1.15 GBytes  1177 MBytes/sec                  
[  5]   5.00-6.00   sec  1.15 GBytes  1176 MBytes/sec                  
[  5]   6.00-7.00   sec  1.15 GBytes  1176 MBytes/sec                  
[  5]   7.00-8.00   sec  1.15 GBytes  1176 MBytes/sec                  
[  5]   8.00-9.00   sec  1.15 GBytes  1176 MBytes/sec                  
[  5]   9.00-10.00  sec  1.15 GBytes  1176 MBytes/sec                  
[  5]  10.00-10.00  sec  1.83 MBytes  1255 MBytes/sec
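
For reference, the commands were along these lines (a sketch; this assumes iperf3, where the bidirectional test is --bidir from version 3.7 on, while classic iperf2 uses -d):

# on the UnRAID box (server side)
iperf3 -s

# on the Mac (client side), reporting in MBytes/sec for 10 seconds
iperf3 -c 192.168.0.111 -f M -t 10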

 

I understand that I have to live with some SMB overhead, but consistently reading at less than half of that seems odd.

 

Thanks for any ideas.


If the NVMe device is assigned to cache, try enabling disk shares and see if read speeds are better that way. It won't help with ZFS pools though, and in my experience ZFS almost always has faster writes than reads.

Posted (edited)

Do you mean assign it as a cache for the UnRAID array?

Just to make sure I'm understanding correctly. ;)

 

At the moment I'm running some fio benchmarks on the box and getting these read speeds from the ZFS array, doing sequential reads larger than the available RAM, without any L2ARC cache drive attached to the pool:

seqread: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
...
fio-3.15
Starting 32 processes
seqread: Laying out IO file (1 file / 262144MiB)
Jobs: 32 (f=32): [R(32)][100.0%][r=16.4GiB/s][r=16.8k IOPS][eta 00m:00s]
seqread: (groupid=0, jobs=32): err= 0: pid=29109: Mon Jun 29 15:55:39 2020
  read: IOPS=17.2k, BW=16.8GiB/s (18.1GB/s)(8192GiB/487301msec)
    slat (usec): min=58, max=139996, avg=1848.09, stdev=1446.29
    clat (nsec): min=348, max=4571.0k, avg=3394.21, stdev=7243.75
     lat (usec): min=59, max=140000, avg=1853.75, stdev=1446.70
    clat percentiles (nsec):
     |  1.00th=[   956],  5.00th=[  1336], 10.00th=[  1608], 20.00th=[  2024],
     | 30.00th=[  2352], 40.00th=[  2640], 50.00th=[  2928], 60.00th=[  3184],
     | 70.00th=[  3440], 80.00th=[  3792], 90.00th=[  4448], 95.00th=[  5856],
     | 99.00th=[ 17280], 99.50th=[ 20352], 99.90th=[ 26752], 99.95th=[ 37120],
     | 99.99th=[111104]
   bw (  MiB/s): min= 3007, max=18110, per=99.90%, avg=17196.46, stdev=33.55, samples=31168
   iops        : min= 3005, max=18110, avg=17183.94, stdev=33.48, samples=31168
  lat (nsec)   : 500=0.01%, 750=0.19%, 1000=1.09%
  lat (usec)   : 2=17.99%, 4=64.96%, 10=12.76%, 20=2.45%, 50=0.51%
  lat (usec)   : 100=0.02%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%
  cpu          : usr=0.66%, sys=80.82%, ctx=9014952, majf=0, minf=8636
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=8388608,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=16.8GiB/s (18.1GB/s), 16.8GiB/s-16.8GiB/s (18.1GB/s-18.1GB/s), io=8192GiB (8796GB), run=487301-487301msec

And here is the same size and config as sequential writes:

 

seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
...
fio-3.15
Starting 32 processes
seqwrite: Laying out IO file (1 file / 262144MiB)
(the "Laying out IO file" line is printed once per job, 32 times in total)
Jobs: 32 (f=32): [W(32)][100.0%][w=11.9GiB/s][w=12.2k IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=32): err= 0: pid=22114: Mon Jun 29 16:14:14 2020
  write: IOPS=8024, BW=8025MiB/s (8414MB/s)(4702GiB/600003msec); 0 zone resets
    slat (usec): min=77, max=158140, avg=3949.10, stdev=4442.43
    clat (nsec): min=558, max=63118k, avg=14186.33, stdev=71735.56
     lat (usec): min=78, max=158155, avg=3970.60, stdev=4445.31
    clat percentiles (usec):
     |  1.00th=[    3],  5.00th=[    4], 10.00th=[    5], 20.00th=[    7],
     | 30.00th=[    8], 40.00th=[   10], 50.00th=[   14], 60.00th=[   16],
     | 70.00th=[   18], 80.00th=[   20], 90.00th=[   23], 95.00th=[   25],
     | 99.00th=[   31], 99.50th=[   36], 99.90th=[   56], 99.95th=[  141],
     | 99.99th=[ 2737]
   bw (  MiB/s): min= 2346, max=14115, per=99.87%, avg=8014.13, stdev=101.56, samples=38383
   iops        : min= 2339, max=14114, avg=8008.87, stdev=101.61, samples=38383
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.23%, 4=6.22%, 10=33.92%, 20=41.43%, 50=18.09%
  lat (usec)   : 100=0.05%, 250=0.03%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=1.07%, sys=55.61%, ctx=4682973, majf=0, minf=442
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4814770,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=8025MiB/s (8414MB/s), 8025MiB/s-8025MiB/s (8414MB/s-8414MB/s), io=4702GiB (5049GB), run=600003-600003msec
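
For reference, the jobs were set up roughly like this (a sketch, not the exact job file; the mount point is a placeholder):

# mount point is a placeholder; adjust to the pool's path
DIR=/mnt/zfs/bench

# sequential read: 32 jobs, 1MiB blocks, libaio, queue depth 1, 256GiB per job
fio --name=seqread --directory=$DIR --rw=read --bs=1M --ioengine=libaio \
    --iodepth=1 --numjobs=32 --size=256g --group_reporting

# sequential write: same geometry, capped at 10 minutes of runtime
fio --name=seqwrite --directory=$DIR --rw=write --bs=1M --ioengine=libaio \
    --iodepth=1 --numjobs=32 --size=256g --runtime=600 --group_reporting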

 

Edited by MatzeHali

1 minute ago, MatzeHali said:

Do you mean assign it as a cache for the UnRAID array?

I meant: in the tests you did, was the NVMe assigned as a cache device? But reading again, I guess you meant a single-device ZFS pool; in that case:

8 minutes ago, johnnie.black said:

but in my experience ZFS almost always has faster writes than reads.

You could try formatting the NVMe device with another filesystem, just for testing.


I'll try some stuff with the NVMe tomorrow.

Seeing those fio results directly on the server with the spinning rust gives me hope that there's probably no need for any cache disks or similar. I can use the NVMe as passthrough for a VM and instead try NFS, to see whether that sets me up better on the networking side, since SMB apparently is the limiting factor here, be it on the UnRAID or on the macOS side.

 

The problem will be getting ZFS to share over NFS consistently, but that's one step further. First I'll check whether it's actually faster at all.
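
From what I've read so far it would be something along these lines, though I haven't tested it yet (dataset name and subnet are placeholders):

# on the server: export the dataset to the local subnet via the sharenfs property
zfs set sharenfs="rw=@192.168.0.0/24" tank/video
zfs get sharenfs tank/video

# on the Mac: mount the export (resvport is usually needed on macOS)
sudo mkdir -p /Volumes/video
sudo mount -t nfs -o resvport,vers=3 192.168.0.111:/tank/video /Volumes/video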

 

Cheers and thanks for your input so far,

 

M


Hey hey,

 

so, after some decent tuning of the ZFS parameters and adding a cache drive to the UnRAID array, I'm quite happy with the performance of the pool and the UnRAID array on the box. Running fio there, I'm getting anywhere from 1200MiB/s to 1900MiB/s sequential writes and up to 2200MiB/s sequential reads on the ZFS pool, and between 800MiB/s and 1200MiB/s reads on the UnRAID cache drive.
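
To give an idea of the kind of tuning I mean, it's mostly the usual per-dataset properties for large sequential media files (the dataset name is a placeholder and the values are illustrative, not necessarily exactly what I changed):

# large sequential video/VFX workload
zfs set recordsize=1M tank/video
zfs set compression=lz4 tank/video
zfs set atime=off tank/video
zfs set xattr=sa tank/video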

Since the box is mainly for video editing and VFX work and this more than saturates a 10GbE connection, now on to the main problem: Samba.

 

I'm playing around with sysctl.conf tunings on macOS at the moment, but since /etc/sysctl.conf is officially not supported anymore, I'm not even sure it picks up and uses those values, even after deactivating SIP on Catalina.
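
What I mean is roughly this (runtime sysctl changes don't survive a reboot, and the values are just the commonly suggested ones, not a recommendation):

# check and set a TCP tuning value at runtime instead of relying on /etc/sysctl.conf
sysctl net.inet.tcp.delayed_ack
sudo sysctl -w net.inet.tcp.delayed_ack=0

# client-side SMB signing on macOS is controlled via /etc/nsmb.conf
printf '[default]\nsigning_required=no\n' | sudo tee /etc/nsmb.conf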

 

So, should I open a thread addressing the Samba/macOS problem solely, or is someone here still reading this who would be able to share advice?

 

Thx,

 

M

37 minutes ago, MatzeHali said:

So, should I open a thread addressing the Samba/macOS problem solely

Probably best, since there are far fewer Mac users here; you're more likely to get their attention that way.
