
Help Needed Identifying Bottleneck (ZFS Cache)



I have a system whose reads/writes I just can't get to where they should be. I can usually saturate my 25Gbps NIC, but with that MikroTik 100Gb switch tempting me daily...I want to make sure that if I get a deal on it, I'm prepared hardware-wise.

The cache is a zpool of two 4-drive raidz1 vdevs, built from 8 x 1TB PCIe 3.0 NVMe drives in two ASUS Hyper M.2 carriers. The system is an EPYC 7302P with 256GB of DDR4-2133 memory and a 25Gbps NIC. LUKS encryption is enabled through Unraid's implementation.

 

root@UNRAID:/# zpool status
  pool: cache
 state: ONLINE
  scan: scrub repaired 0B in 00:03:48 with 0 errors on Sun Jul  9 04:03:49 2023
config:

        NAME           STATE     READ WRITE CKSUM
        cache          ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            nvme2n1p1  ONLINE       0     0     0
            nvme3n1p1  ONLINE       0     0     0
            nvme1n1p1  ONLINE       0     0     0
            nvme0n1p1  ONLINE       0     0     0
          raidz1-1     ONLINE       0     0     0
            nvme4n1p1  ONLINE       0     0     0
            nvme5n1p1  ONLINE       0     0     0
            nvme6n1p1  ONLINE       0     0     0
            nvme7n1p1  ONLINE       0     0     0

errors: No known data errors
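
Alongside the pool status, the dataset properties that affect sequential throughput may be worth a look (pool/dataset name "cache" assumed from the status output above):

```shell
# compression makes /dev/zero benchmarks look faster than real data,
# sync=always would slow dsync writes, and recordsize matters for
# large sequential I/O. Run on the Unraid box itself.
zfs get compression,recordsize,sync,primarycache cache
```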

 

Writing to the cache pool:

root@UNRAID:/mnt/cache/appdata# dd if=/dev/zero of=test.img bs=1G count=10 oflag=dsync && rm test.img
10+0 records in
10+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 3.78222 s, 2.8 GB/s
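
One caveat on this number (an assumption on my part, worth checking with `zfs get compression cache`): /dev/zero compresses almost perfectly, so if the pool has compression enabled the dd figure can be inflated. A sketch of the same test with incompressible data:

```shell
# Stage incompressible data in RAM first (512 MiB here; scale up as
# needed) so the pool write isn't gated on /dev/urandom's speed.
dd if=/dev/urandom of=/tmp/rand.img bs=1M count=512

# Then repeat the dsync write to the pool with that data.
dd if=/tmp/rand.img of=/mnt/cache/appdata/test.img bs=1M oflag=dsync
rm -f /mnt/cache/appdata/test.img /tmp/rand.img
```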

 

Writing to RAM:

root@UNRAID:/tmp# dd if=/dev/zero of=test.img bs=1G count=10 oflag=dsync && rm test.img
10+0 records in
10+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 6.7859 s, 1.6 GB/s

 

Here are the possible culprits I suspect, though I don't know which rock to turn over to find the additional bandwidth:

1) Slow memory

2) PCIe 3.0 bandwidth, though each x4 drive should theoretically be capable of ~4 GB/s

3) A slow NVMe dragging down the whole pool

4) LUKS overhead in some form
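
For culprits 3 and 4 specifically, this is roughly how I'd try to isolate them (device names taken from the zpool status above; adjust as needed):

```shell
# Culprit 3: read each member directly with O_DIRECT, bypassing ZFS and
# the page cache. raidz runs at roughly the pace of its slowest member,
# so one outlier here would explain a lot. (bash brace expansion)
for dev in /dev/nvme{0..7}n1; do
    printf '== %s ==\n' "$dev"
    dd if="$dev" of=/dev/null bs=1M count=4096 iflag=direct 2>&1 | tail -n 1
done

# Culprit 4: benchmark the LUKS cipher in RAM, independent of the disks.
# A low aes-xts number here would point at encryption overhead.
cryptsetup benchmark --cipher aes-xts --key-size 512 2>&1 | tail -n 3
```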

Edited by DiscoverIt

I usually don't put much stock in these kinds of benchmarks, since it's difficult to get real-world numbers from them, but for comparison, here are my results with 7 x NVMe in raidz1 on an EPYC 7232P with DDR4-3200MT/s RAM:

 

root@Tower7:/mnt/nvmeraid/TV# dd if=/dev/zero of=test.img bs=1G count=10 oflag=dsync && rm test.img
10+0 records in
10+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 3.92476 s, 2.7 GB/s

root@Tower7:/tmp# dd if=/dev/zero of=test.img bs=1G count=10 oflag=dsync && rm test.img
10+0 records in
10+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 5.17479 s, 2.1 GB/s

 

Now, I only have 10GbE, so I cannot test real-world network speeds above ~1GB/s. Pool-to-pool copies seem to be limited to around 1.5GB/s, but it's not a bandwidth issue, since, for example, during a scrub I can see much higher pool speeds:
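
If you want numbers less synthetic than dd, something like fio gets closer to what a multi-stream network copy does (a sketch only, assuming fio is installed; the directory path is a placeholder for a dataset on your pool):

```shell
# Four parallel 1M sequential write streams with an fsync at the end,
# so the result isn't just dirty data sitting in ARC.
fio --name=seqwrite --directory=/mnt/nvmeraid/fio-test \
    --rw=write --bs=1M --size=4G --numjobs=4 --iodepth=16 \
    --ioengine=libaio --end_fsync=1 --group_reporting \
    | tail -n 25  # just the summary lines
```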

 

[screenshot: pool speed during a scrub]

