SMB & ZFS speeds I don't understand ...



Hi there,

 

I'm trying to find the sweet spot for a ZFS configuration with 15 disks and potentially an L2ARC, so I'm doing a bit of benchmarking.

The configuration is a 16-core Xeon with 144GB of RAM, 15 WD Red 14TB drives, a 1TB EVO 970 NVMe and 10GbE network connectivity.

I access the server via Samba from macOS.

While testing different RAID-Z configurations with a 16GB write-once, read-multiple-times scenario:

 

Writing directly to the NVMe as a single-drive pool (for testing only) I can reach 880MB/s.

Reading the same file back from there multiple times I get 574MB/s.

Writing to a 2x7 RAID-Z1 pool I get write speeds of about 760MB/s, but exactly the same read speed of 574MB/s.
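
For context, the 2x7 RAID-Z1 test pool is laid out roughly like this (just a sketch, the device names are placeholders rather than my actual disks):

zpool create tank \
  raidz1 sdb sdc sdd sde sdf sdg sdh \
  raidz1 sdi sdj sdk sdl sdm sdn sdo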

 

Since the file easily fits into the ARC, I would assume that at the latest with the second read it is served from RAM, so I doubt that the read speed on the box itself is the problem.

 

So, my question is, why is my read speed somehow capped at 574MB/s?

 

The MTU is set to 9000 and SMB signing is switched off; I don't have any other ideas to try.
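
For reference, on the Mac side I switched signing off via /etc/nsmb.conf, roughly like this (a sketch only; the exact keys can differ between macOS versions):

[default]
signing_required=no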

 

Thanks,

 

M

 

Link to comment

Hi Johnnie,

 

since this is not a ZFS-specific question but rather an SMB-specific one, I thought I'd go with the general forum. From the box itself, read and write speeds are much higher than that, so it seems to be a network problem rather than a ZFS problem.

 

Thanks,

 

M

 

edit: I have now also tested with a share on the UnRAID array, and it's the same pattern: writes at about 840MB/s, reads at 570-580MB/s.

Edited by MatzeHali
Link to comment

Hi Johnnie,

 

I have not.

I'll try to find out how to do that from a Mac to an UnRAID box and report back. ;)
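
From what I've gathered so far, it should boil down to something like this (the server address is a placeholder):

# On the UnRAID box:
iperf3 -s
# On the Mac:
iperf3 -c <unraid-ip>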

 

Thanks for pointing me in that direction. While I'm quite comfortable with the technical side and can learn quickly, I'm more of a creative mind and lack a lot of background, so forgive me if it takes me a day or two to get this going.

 

Cheers,

 

M

Link to comment

So, here's the result of the iperf run, which was done with the -d switch, so this speed applies in both directions:

 

Accepted connection from 192.168.0.51, port 55664
[  5] local 192.168.0.111 port 5201 connected to 192.168.0.51 port 55665
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  1.15 GBytes  1174 MBytes/sec                  
[  5]   1.00-2.00   sec  1.15 GBytes  1176 MBytes/sec                  
[  5]   2.00-3.00   sec  1.15 GBytes  1177 MBytes/sec                  
[  5]   3.00-4.00   sec  1.15 GBytes  1176 MBytes/sec                  
[  5]   4.00-5.00   sec  1.15 GBytes  1177 MBytes/sec                  
[  5]   5.00-6.00   sec  1.15 GBytes  1176 MBytes/sec                  
[  5]   6.00-7.00   sec  1.15 GBytes  1176 MBytes/sec                  
[  5]   7.00-8.00   sec  1.15 GBytes  1176 MBytes/sec                  
[  5]   8.00-9.00   sec  1.15 GBytes  1176 MBytes/sec                  
[  5]   9.00-10.00  sec  1.15 GBytes  1176 MBytes/sec                  
[  5]  10.00-10.00  sec  1.83 MBytes  1255 MBytes/sec

 

I understand that I have to live with some SMB overhead, but consistently reading at less than half the line rate seems odd.

 

Thanks for any ideas.

Link to comment

Do you mean assign it as a cache drive for the UnRAID array?

Just to make sure I'm understanding correctly. ;)

 

At the moment I'm running some fio benchmarks on the box, and these are the read speeds I get from the ZFS pool for sequential reads larger than the available RAM, without any L2ARC cache device attached to the pool:

seqread: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
...
fio-3.15
Starting 32 processes
seqread: Laying out IO file (1 file / 262144MiB)
Jobs: 32 (f=32): [R(32)][100.0%][r=16.4GiB/s][r=16.8k IOPS][eta 00m:00s]
seqread: (groupid=0, jobs=32): err= 0: pid=29109: Mon Jun 29 15:55:39 2020
  read: IOPS=17.2k, BW=16.8GiB/s (18.1GB/s)(8192GiB/487301msec)
    slat (usec): min=58, max=139996, avg=1848.09, stdev=1446.29
    clat (nsec): min=348, max=4571.0k, avg=3394.21, stdev=7243.75
     lat (usec): min=59, max=140000, avg=1853.75, stdev=1446.70
    clat percentiles (nsec):
     |  1.00th=[   956],  5.00th=[  1336], 10.00th=[  1608], 20.00th=[  2024],
     | 30.00th=[  2352], 40.00th=[  2640], 50.00th=[  2928], 60.00th=[  3184],
     | 70.00th=[  3440], 80.00th=[  3792], 90.00th=[  4448], 95.00th=[  5856],
     | 99.00th=[ 17280], 99.50th=[ 20352], 99.90th=[ 26752], 99.95th=[ 37120],
     | 99.99th=[111104]
   bw (  MiB/s): min= 3007, max=18110, per=99.90%, avg=17196.46, stdev=33.55, samples=31168
   iops        : min= 3005, max=18110, avg=17183.94, stdev=33.48, samples=31168
  lat (nsec)   : 500=0.01%, 750=0.19%, 1000=1.09%
  lat (usec)   : 2=17.99%, 4=64.96%, 10=12.76%, 20=2.45%, 50=0.51%
  lat (usec)   : 100=0.02%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%
  cpu          : usr=0.66%, sys=80.82%, ctx=9014952, majf=0, minf=8636
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=8388608,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=16.8GiB/s (18.1GB/s), 16.8GiB/s-16.8GiB/s (18.1GB/s-18.1GB/s), io=8192GiB (8796GB), run=487301-487301msec

And here is the same size and configuration as sequential writes:

 

seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
...
fio-3.15
Starting 32 processes
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
seqwrite: Laying out IO file (1 file / 262144MiB)
Jobs: 32 (f=32): [W(32)][100.0%][w=11.9GiB/s][w=12.2k IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=32): err= 0: pid=22114: Mon Jun 29 16:14:14 2020
  write: IOPS=8024, BW=8025MiB/s (8414MB/s)(4702GiB/600003msec); 0 zone resets
    slat (usec): min=77, max=158140, avg=3949.10, stdev=4442.43
    clat (nsec): min=558, max=63118k, avg=14186.33, stdev=71735.56
     lat (usec): min=78, max=158155, avg=3970.60, stdev=4445.31
    clat percentiles (usec):
     |  1.00th=[    3],  5.00th=[    4], 10.00th=[    5], 20.00th=[    7],
     | 30.00th=[    8], 40.00th=[   10], 50.00th=[   14], 60.00th=[   16],
     | 70.00th=[   18], 80.00th=[   20], 90.00th=[   23], 95.00th=[   25],
     | 99.00th=[   31], 99.50th=[   36], 99.90th=[   56], 99.95th=[  141],
     | 99.99th=[ 2737]
   bw (  MiB/s): min= 2346, max=14115, per=99.87%, avg=8014.13, stdev=101.56, samples=38383
   iops        : min= 2339, max=14114, avg=8008.87, stdev=101.61, samples=38383
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.23%, 4=6.22%, 10=33.92%, 20=41.43%, 50=18.09%
  lat (usec)   : 100=0.05%, 250=0.03%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=1.07%, sys=55.61%, ctx=4682973, majf=0, minf=442
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4814770,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=8025MiB/s (8414MB/s), 8025MiB/s-8025MiB/s (8414MB/s-8414MB/s), io=4702GiB (5049GB), run=600003-600003msec
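
For reference, the two runs above come from fio invocations along these lines (the target path is a placeholder, and the exact options may differ slightly from what the output implies):

fio --name=seqread --rw=read --bs=1M --ioengine=libaio --iodepth=1 \
    --numjobs=32 --size=256G --directory=/mnt/zfspool --group_reporting

fio --name=seqwrite --rw=write --bs=1M --ioengine=libaio --iodepth=1 \
    --numjobs=32 --size=256G --runtime=600 --time_based \
    --directory=/mnt/zfspool --group_reporting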

 

Edited by MatzeHali
Link to comment
1 minute ago, MatzeHali said:

Do you mean, assign it as a Cache for the UnRAID-array?

I meant to ask whether in your tests the NVMe was used as a cache device, but reading again I guess you meant a single-drive ZFS pool, in that case:

8 minutes ago, johnnie.black said:

but in my experience ZFS almost always has faster writes than reads.

You could try formatting the NVMe device with another filesystem, just for testing.

Link to comment

I'll try some stuff with the NVMe tomorrow.

Seeing those fio results directly on the server with the spinning rust gives me hope that there's probably no need for any cache disks or similar. I can instead use the NVMe as passthrough for a VM, and try NFS to see whether that sets me up better on the networking side, since SMB is apparently the limiting factor here, be it on the UnRAID or on the macOS side.

 

The problem will be getting ZFS to share over NFS consistently, but that's one step further. First I'll check whether it's actually faster at all.
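
If I get that far, sharing the ZFS datasets over NFS should, as far as I understand it, just be a property on the dataset (dataset name and network below are placeholders):

zfs set sharenfs=on tank/projects
# or with explicit export options:
zfs set sharenfs="rw=@192.168.0.0/24" tank/projects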

 

Cheers and thanks for your input so far,

 

M

Link to comment

Hey hey,

 

so, after some decent tuning of the ZFS parameters and adding a cache drive to the UnRAID array, I'm quite happy with the performance of the pool and the UnRAID array on the box. Running fio there, I'm getting anywhere from 1200MiB/s to 1900MiB/s sequential writes and up to 2200MiB/s sequential reads on the ZFS pool, and between 800MiB/s and 1200MiB/s reads on the UnRAID cache drive.

Since the box is mainly for video editing and VFX work and this would fully saturate a 10GbE connection, now on to the main problem: Samba.

 

I'm playing around with sysctl.conf tunings on macOS at the moment, but since /etc/sysctl.conf is officially not supported anymore, I'm not even sure the system actually picks up and uses those values, even after deactivating SIP on Catalina.
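
For completeness, the kind of values I'm experimenting with look like this (taken from commonly circulated 10GbE tuning guides, so treat them as assumptions rather than a recipe):

# /etc/sysctl.conf, or applied at runtime with: sudo sysctl -w key=value
net.inet.tcp.delayed_ack=0
kern.ipc.maxsockbuf=8388608
net.inet.tcp.sendspace=4194304
net.inet.tcp.recvspace=4194304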

 

So, should I open a separate thread that addresses the Samba/macOS problem alone, or is someone here still reading this who could share advice?

 

Thx,

 

M

Link to comment
  • 1 month later...

Were you running those fio tests on Unraid or on another system? I ask because I'm testing a scenario running Unraid in a VM on Proxmox with an all-NVMe ZFS pool added as a single "drive" used as cache in Unraid. However, I'm unable to run fio even after installing it via NerdTools; I just get "illegal instruction" no matter which switches I use.

Link to comment
On 7/3/2020 at 7:04 AM, MatzeHali said:

...now on to the main problem: Samba.

Do you want to try disabling ACLs in the Samba config on the server?

Add this line to the [global] section of /boot/config/smb-extra.conf:

nt acl support = No
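
So the [global] section of /boot/config/smb-extra.conf ends up looking something like this (any other global settings you already have stay alongside it):

[global]
    nt acl support = No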

Restart Samba.  Re-run your speed tests.  Let us know.

Link to comment

Hi Pourko,

 

thanks for the suggestion. I did just that and it sadly changed nothing.

I'm still on the same speed limitations via SMB.

Not only that: if I, for example, render a JPEG 2000 sequence from the Mac to a share on UnRAID via SMB, where the render speed should stay constant because essentially the same calculation is involved for every frame, it starts out fine at the expected speed of roughly 5fps in my case and then gradually degrades to under 1fps.

Finder takes ages to display the files in the render folders, and everything is painfully slow.

If I then stop the render, rename the render folder in Krusader on UnRAID and resume, which recreates the original render folder, speed is back up again, until half an hour or so later it's down to about an eighth of that again.

So, I'm still trying to wrap my head around how to access UnRAID at 10GbE speeds with lots of files from macOS. So far, I'm really not sure what to do.

 

Thx,

 

M

Link to comment
  • 2 months later...

I'm interested in this -- very similar problem and have confirmed with Windows & Ubuntu clients -- also using ZFS.

Don't assume ZFS is the problem -- have confirmed ZFS speeds. 

 

My config: Unraid with ZFS on a 20-core Xeon server with 128GB RAM, 16x 6TB NAS drives in a RAID10-style configuration (pool of mirrors), Optane 905p SLOG, NVMe L2ARC.

Local testing: with this configuration I'm getting multi-GB/s reads and writes locally, as expected (fio and dd).

Network testing: iperf3 (server on the Unraid box, client on various machines) shows 9+ Gbit/s in both directions, confirmed with Windows 10 and Ubuntu clients.

 

Real testing using SMB with 10GbE clients:

- Windows 10: copying large movie files from a RAM disk to the Unraid ZFS Samba share: write speed 1.1GB/s (perfect)

  -- copying the same movies back to the Windows 10 RAM disk: ~500MB/s peak

- Ubuntu 20.04: copying large movie files from a RAM disk to the Unraid ZFS Samba share: write speed 1.1GB/s

  -- copying the same movies back to the Ubuntu RAM disk: ~500MB/s peak

 

Something is limiting my read-speeds from the Unraid server over SMB.

 

Edited by spankaroo
Link to comment

RAMDisk to RAMDisk: same problem.

Confirmed with an SMB-shared RAM disk from Unraid: /dev/shm shared over SMB via smb-extra.conf:

 

[ramdisk]
      path = /dev/shm
      comment =
      browseable = yes
      write list = myuser
      valid users = myuser
      vfs objects =
      writeable = yes
      read only = no
      create mask = 0775
      directory mask = 0775
 

Copying from the Windows 10 or Ubuntu box over 10GbE, large video files go from client to Unraid at 1.1GB/s. When copying from the Unraid RAM disk back to the Windows or Ubuntu RAM disk, I get at most ~512MB/s.

 

 

Link to comment
  • 3 months later...
On 7/3/2020 at 7:04 AM, MatzeHali said:

...now on to the main problem: Samba.

MatzeHali, 

 

I'd be very interested in seeing any details of your ZFS parameter tuning and UnRAID setup. I'm moving from Synology to UnRAID and thought ZFS would be an interesting project for learning a new filesystem before making the move.

 

I've got one NVMe drive (currently as an unassigned device and its own zpool) and four 16TB Exos drives across two mirrored vdevs. While everything is set up fine, with a "throwaway" Samsung SSD 870 as the array drive, I'm thinking there's a better way to set this up.
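
For reference, the two mirrored vdevs were created roughly like this (device names here are placeholders):

zpool create tank \
  mirror sda sdb \
  mirror sdc sdd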

Link to comment
  • 2 years later...

NVME CACHE
I switched from btrfs to ZFS and have had a very noticeable decrease in write speed, both before and after the memory cache fills up. Not only that, but the RAM being used as a cache makes me nervous, even though I have a UPS and ECC memory. I've also noticed that my dual-NVMe RAID1 ZFS cache pool reaches the full 7000MB/s read but only 1500MB/s write at most, which is a far cry from what it should be with ZFS. I will be switching my appdata pools back to btrfs, as it has nearly all the same features as ZFS but is much faster in my tests.
The only thing missing to take full advantage of btrfs is the nice GUI plugin and the scripts that have been written for handling snapshots, which I'm sure someone could put together pretty quickly based on the existing ZFS scripts and plugins.
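
As a rough illustration, the core snapshot step such a script would wrap could look like this (paths are assumptions, and appdata would need to live on its own subvolume):

btrfs subvolume snapshot -r /mnt/cache/appdata /mnt/cache/.snapshots/appdata-$(date +%Y%m%d)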

It's important to note that my main NVMe cache pool was RAID1 both under btrfs and under ZFS, using each filesystem's native form of RAID1, obviously.


ARRAY
I also started converting some of my array drives to single-disk ZFS as per SpaceInvader One's videos, since parity and expansion are handled by Unraid. That's where I noticed the biggest downside for me personally: a single ZFS disk, unlike a zpool, is obviously missing a lot of features, but more importantly its write performance is very heavily impacted, and you still only get single-disk read speed once the RAM cache is exhausted. I saw a 65% degradation in write speed on the single ZFS drive.

I did a lot of research into btrfs vs ZFS and have decided to migrate all my drives and cache to btrfs and let Unraid handle parity, much the same way SpaceInvader One is doing it with ZFS. This way I don't see the performance impact I was seeing with ZFS, and I should still be able to do the same snapshot shipping and replication that ZFS does. Doing it this way I also avoid the dreaded, unstable btrfs native RAID 5/6, and I get nearly all the same features as ZFS but without the speed issues in Unraid.


DISCLAIMER
I'm sure ZFS is very fast when it comes to an actual zpool rather than a single-disk situation, but it very much feels like ZFS is a deep-storage filesystem and not really built for an active, so to speak, array.

Given my testing, all my cache pools and the drives within them will be btrfs RAID 1 or 0 (RAID 1 giving you active bit-rot protection), and my array will be Unraid-handled parity with individual single-disk btrfs filesystems.

Hope this helps others avoid days of data transfer only to realise the pitfalls afterwards.

 

Link to comment

I'm not sure that what you see has anything to do with SMB.

My ZFS pool has been running for years now, and when used internally on the server there are no speed problems at all.

My problem is still that the SMB protocol completely bottlenecks any really fast network activity in combination with macOS clients. What I did to mitigate this is that the software I mainly work with now caches files on internal NVMe drives under macOS, so the data needed while working on something is quickly available, while ZFS in the background holds all the important data and keeps it safe.

Link to comment
