
IOwait exponentially worse as btrfs pool device number increases



I've been aware of btrfs pools in Unraid causing high iowait for a while, and up to this point I've avoided the issue by using a single NVMe drive formatted with XFS.  But circumstances have changed and I'm now exploring a cache pool of Intel S4600 480GB SSDs.  I've been doing extensive testing with both the number of drives in the pool and the raid balance.  It seems that as the number of drives in the pool increases, so does the amount of iowait; there appears to be a specific amount of iowait attached to each drive.  The raid balance doesn't seem to have any effect other than shortening or prolonging the period of high iowait (e.g. raid1/raid10 has a longer period of high iowait than raid0, obviously, since the write takes longer).
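
For anyone wanting to reproduce the measurement, the per-device stats and the overall iowait figure can be watched while a transfer runs with iostat from the sysstat package, e.g.:

iostat -xm 2   # %iowait in the avg-cpu line, w_await and %util per device

Running that in a second terminal during the write makes the per-drive pattern easy to see.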

 

I have tested these drives connected both to my onboard SATA controller (SuperMicro X11SCH-F motherboard with C246 chipset) and to my LSI 9300-8e SAS controller.  There is zero difference.

 

I'm curious if anyone has any insight on how to mitigate these iowait issues.  My only solution at the moment appears to be using a raid0 balance so that writes are very fast (i.e. 2.0GB/s with four S4600s) and the iowait only lasts 10-15 seconds for a 20GB write.  But that is obviously not sustainable unless I can ensure I never do large transfers to cache-enabled shares, which is kind of the whole point of having a cache.
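
For reference, the pool's profile can be switched from the command line with a balance; something along these lines converts data to raid0 while keeping metadata at raid1, assuming the pool is mounted at /mnt/cache as on a stock Unraid setup:

btrfs balance start -dconvert=raid0 -mconvert=raid1 /mnt/cache   # data raid0, metadata raid1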

 

 

EDIT:  I should note these tests were done using dd.  Writes from an unassigned pool to the cache show much less iowait.  I guess that would make sense, given that RAM is so much faster than storage.
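
One way to see this is to watch the page cache drain during the dd run; the Dirty and Writeback counters in /proc/meminfo balloon while dd sources zeros at RAM speed, and iowait stays high until they flush out to the pool:

watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'

A copy whose source is a real disk can't fill the cache anywhere near as fast, which would fit the lower iowait seen there.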

Edited by IamSpartacus
9 minutes ago, testdasi said:

Details on test methodology please.

 

Initially testing with the following command:

 

Quote

dd bs=1M count=32068 if=/dev/zero of=test conv=fdatasync

 

Then, testing from an unassigned disk to the cache using rsync, iowait was much lower.  I'm trying to test writes from one server to the other using a cache-enabled share, but I seem to be getting only 200-250MB/s across my network right now, which doesn't make much sense since it's a 40GbE connection and iperf3 shows it connected as such.
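
To take the page cache out of the disk-to-disk case, a single large file can also be copied with direct I/O and timed; the paths below are only examples of where Unassigned Devices and the pool typically mount:

dd if=/mnt/disks/source/bigfile of=/mnt/cache/bigfile bs=1M oflag=direct status=progress

That isolates the pool's raw write speed from both the network and writeback caching.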

5 minutes ago, bonienl said:

I am not sure how you are testing, but I have a cache pool of 4 devices in RAID10 mode, and can reach near 10Gb/s transfer speeds when copying files to or from the cache pool over SMB.

 

These servers are directly connected, so I don't really have any way of testing other than rsync or some other internal transfer tool.  I get poor speeds whether I use NFS or SMB.

7 minutes ago, IamSpartacus said:

Initially testing with the following command:

And what is your current directory when executing the dd command?

I have found dd + dsync gives unrealistically low results. I think it's because dsync is sequential, so your 32068 blocks are written one by one, and the high iowait is the ~32k times dd had to wait for confirmation that the data had been written.

In reality, and particularly with NVMe, things are done in parallel and/or in aggregation so it's a lot faster.
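
Something along these lines shows the difference between the two sync modes (the path is just a placeholder for wherever the pool is mounted):

dd if=/dev/zero of=/mnt/cache/ddtest bs=1M count=4096 oflag=dsync       # waits for every block to hit the disk
dd if=/dev/zero of=/mnt/cache/ddtest bs=1M count=4096 conv=fdatasync    # buffers everything, one sync at the end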

 

Have you also done a test on your other server to see if it is capable of more than 250MB/s read?

2 minutes ago, testdasi said:

And what is your current directory when executing the dd command?

I have found dd + dsync gives unrealistically low results. I think it's because dsync is sequential, so your 32068 blocks are written one by one, and the high iowait is the ~32k times dd had to wait for confirmation that the data had been written.

In reality, and particularly with NVMe, things are done in parallel and/or in aggregation so it's a lot faster.

 

Have you also done a test on your other server to see if it is capable of more than 250MB/s read?

 

Do you recommend a different test such as FIO?  Yes, internal dd tests on each side of the storage (the cache pool on one server, the NVMe cache on the other) show each side is capable of 1.8-2.0GB/s reads and writes to the drive.
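
Something like this is what I had in mind if fio is the better tool (a plain 20GB sequential write straight to the pool, path as an example):

fio --name=seqwrite --filename=/mnt/cache/fio-test --rw=write --bs=1M --size=20G --ioengine=libaio --iodepth=8 --direct=1

--direct=1 keeps the page cache out of it, which should make the iowait picture easier to interpret.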

7 minutes ago, IamSpartacus said:

 

Do you recommend a different test such as FIO?  Yes, internal dd tests on each side of the storage (the cache pool on one server, the NVMe cache on the other) show each side is capable of 1.8-2.0GB/s reads and writes to the drive.

I'm a simpleton, so rsync --progress or rsync --info=progress2.

And I use real data. It's not too hard to find a 40-50GB Linux ISO nowadays (or just use dd to create a really big test file and rsync it).
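
For example, something like this (the paths are only placeholders for wherever the source disk and the pool are mounted):

dd if=/dev/urandom of=/mnt/disks/source/bigfile bs=1M count=40960   # roughly 40GB of incompressible data
rsync --progress /mnt/disks/source/bigfile /mnt/cache/

Using urandom rather than zeros avoids btrfs compression flattering the numbers if it happens to be enabled on the pool.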

5 minutes ago, testdasi said:

I'm a simpleton, so rsync --progress or rsync --info=progress2.

And I use real data. It's not too hard to find a 40-50GB Linux ISO nowadays (or just use dd to create a really big test file and rsync it).

 

(screenshot of the rsync --progress output showing the transfer speed)

 

Nowhere near the speed the storage is capable of, but that may be an rsync limitation.  And iowait was low during this.  Seems I need to find a better transfer method than rsync, and then figure out why my network transfers are still slow, assuming they still are with that method.
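
Two quick checks that would separate the link from rsync itself (the IP is just a placeholder): run iperf3 with several parallel streams to rule out a single-stream limit, and retry the copy with rsync's delta algorithm disabled:

iperf3 -c 10.10.10.2 -P 4 -t 30
rsync -aW --progress /mnt/cache/bigfile root@10.10.10.2:/mnt/cache/

-W (--whole-file) skips the delta transfer, though rsync over ssh is still bound by how fast a single core can encrypt.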
