
Help Needed Identifying Bottleneck (ZFS Cache)



I have a system whose reads/writes I just can't get to where they should be. I can usually saturate my 25Gbps NIC, but with that MikroTik 100Gb switch tempting me daily...I want to make sure that if I get a deal on it, I'm prepared hardware-wise.

The cache is a zpool of two 4-drive raidz1 vdevs, built from 8 x 1TB PCIe 3.0 NVMe drives in two ASUS Hyper M.2 carriers. The system is an EPYC 7302P with 256GB of DDR4-2133 memory and a 25Gbps NIC. LUKS encryption is enabled through Unraid's implementation.

 

root@UNRAID:/# zpool status
  pool: cache
 state: ONLINE
  scan: scrub repaired 0B in 00:03:48 with 0 errors on Sun Jul  9 04:03:49 2023
config:

        NAME           STATE     READ WRITE CKSUM
        cache          ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            nvme2n1p1  ONLINE       0     0     0
            nvme3n1p1  ONLINE       0     0     0
            nvme1n1p1  ONLINE       0     0     0
            nvme0n1p1  ONLINE       0     0     0
          raidz1-1     ONLINE       0     0     0
            nvme4n1p1  ONLINE       0     0     0
            nvme5n1p1  ONLINE       0     0     0
            nvme6n1p1  ONLINE       0     0     0
            nvme7n1p1  ONLINE       0     0     0

errors: No known data errors
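
Alongside the pool status, the dataset properties that affect sequential throughput may be worth a look (pool/dataset name "cache" assumed from the status output above):

```shell
# compression makes /dev/zero benchmarks look faster than real data,
# sync=always would slow dsync writes, and recordsize matters for
# large sequential I/O. Run on the Unraid box itself.
zfs get compression,recordsize,sync,primarycache cache
```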

 

Writing to the cache pool:

root@UNRAID:/mnt/cache/appdata# dd if=/dev/zero of=test.img bs=1G count=10 oflag=dsync && rm test.img
10+0 records in
10+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 3.78222 s, 2.8 GB/s
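
One caveat on this number (an assumption on my part, worth checking with `zfs get compression cache`): /dev/zero compresses almost perfectly, so if the pool has compression enabled the dd figure can be inflated. A sketch of the same test with incompressible data:

```shell
# Stage incompressible data in RAM first (512 MiB here; scale up as
# needed) so the pool write isn't gated on /dev/urandom's speed.
dd if=/dev/urandom of=/tmp/rand.img bs=1M count=512

# Then repeat the dsync write to the pool with that data.
dd if=/tmp/rand.img of=/mnt/cache/appdata/test.img bs=1M oflag=dsync
rm -f /mnt/cache/appdata/test.img /tmp/rand.img
```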

 

Writing to RAM:

root@UNRAID:/tmp# dd if=/dev/zero of=test.img bs=1G count=10 oflag=dsync && rm test.img
10+0 records in
10+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 6.7859 s, 1.6 GB/s

 

Here are the possible culprits I suspect, though I don't know which rock to turn over to find the additional bandwidth:

1) Slow memory

2) PCIe 3.0 bandwidth, though each x4 drive should theoretically be capable of ~4 GB/s

3) A slow NVMe dragging down the whole pool

4) LUKS overhead in some form
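
For culprits 3 and 4 specifically, this is roughly how I'd try to isolate them (device names taken from the zpool status above; adjust as needed):

```shell
# Culprit 3: read each member directly with O_DIRECT, bypassing ZFS and
# the page cache. raidz runs at roughly the pace of its slowest member,
# so one outlier here would explain a lot. (bash brace expansion)
for dev in /dev/nvme{0..7}n1; do
    printf '== %s ==\n' "$dev"
    dd if="$dev" of=/dev/null bs=1M count=4096 iflag=direct 2>&1 | tail -n 1
done

# Culprit 4: benchmark the LUKS cipher in RAM, independent of the disks.
# A low aes-xts number here would point at encryption overhead.
cryptsetup benchmark --cipher aes-xts --key-size 512 2>&1 | tail -n 3
```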

Edited by DiscoverIt

I usually don't put much stock in these kinds of benchmarks, since it's difficult to get real-world numbers from them, but for comparison, here are my results with 7 x NVMe in raidz1 on an EPYC 7232P with DDR4-3200MT/s RAM:

 

root@Tower7:/mnt/nvmeraid/TV# dd if=/dev/zero of=test.img bs=1G count=10 oflag=dsync && rm test.img
10+0 records in
10+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 3.92476 s, 2.7 GB/s

root@Tower7:/tmp# dd if=/dev/zero of=test.img bs=1G count=10 oflag=dsync && rm test.img
10+0 records in
10+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 5.17479 s, 2.1 GB/s

 

Now, I only have 10GbE, so I cannot test real-world network speeds above ~1GB/s. Pool-to-pool copies seem to be limited to around 1.5GB/s, but it's not a bandwidth issue, since, for example, during a scrub I can see much higher pool speeds:
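
If you want numbers less synthetic than dd, something like fio gets closer to what a multi-stream network copy does (a sketch only, assuming fio is installed; the directory path is a placeholder for a dataset on your pool):

```shell
# Four parallel 1M sequential write streams with an fsync at the end,
# so the result isn't just dirty data sitting in ARC.
fio --name=seqwrite --directory=/mnt/nvmeraid/fio-test \
    --rw=write --bs=1M --size=4G --numjobs=4 --iodepth=16 \
    --ioengine=libaio --end_fsync=1 --group_reporting \
    | tail -n 25  # just the summary lines
```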

 

[screenshot: pool speed during a scrub]

