
IOwait exponentially worse as btrfs pool device number increases



I've been aware of btrfs pools in Unraid causing high iowait for a while, and up to this point I've avoided the issue by using a single NVMe drive formatted with XFS.  But circumstances have changed and I'm now exploring a cache pool of Intel S4600 480GB SSDs.  I've been doing extensive testing with both the number of drives in the pool and the raid balance.  It seems that as the number of drives in the pool increases, so does the amount of iowait; there appears to be a specific amount of iowait attached to each drive.  The raid balance doesn't seem to have any effect other than shortening or prolonging the period of high iowait (e.g. raid1/raid10 has a longer period of high iowait than raid0, obviously, since the write takes longer).
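
For anyone wanting to reproduce the measurement, the per-device stats and the overall iowait figure can be watched while a transfer runs with iostat from the sysstat package, e.g.:

iostat -xm 2   # %iowait in the avg-cpu line, w_await and %util per device

Running that in a second terminal during the write makes the per-drive pattern easy to see.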

 

I have tested these drives connected both to my onboard SATA controller (SuperMicro X11SCH-F motherboard with C246 chipset) and to my LSI 9300-8e SAS controller.  There is zero difference.

 

I'm curious if anyone has any insight on how to mitigate these iowait issues.  My only solution at the moment appears to be using a raid0 balance so that writes are very fast (i.e. 2.0GB/s with four S4600s) and the iowait only lasts 10-15 seconds for a 20GB write.  But that is obviously not sustainable unless I can ensure I never do large transfers to cache-enabled shares, which is kind of the whole point of having a cache.
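
For reference, the pool's profile can be switched from the command line with a balance; something along these lines converts data to raid0 while keeping metadata at raid1, assuming the pool is mounted at /mnt/cache as on a stock Unraid setup:

btrfs balance start -dconvert=raid0 -mconvert=raid1 /mnt/cache   # data raid0, metadata raid1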

 

 

EDIT:  I should note these tests were done using dd.  Writes from an unassigned pool to the cache show much less iowait.  I guess that would make sense, given that RAM is so much faster than storage.
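
One way to see this is to watch the page cache drain during the dd run; the Dirty and Writeback counters in /proc/meminfo balloon while dd sources zeros at RAM speed, and iowait stays high until they flush out to the pool:

watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'

A copy whose source is a real disk can't fill the cache anywhere near as fast, which would fit the lower iowait seen there.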

Edited by IamSpartacus
9 minutes ago, testdasi said:

Details on test methodology please.

 

Initially testing with the following command:

 

Quote

dd bs=1M count=32068 if=/dev/zero of=test conv=fdatasync

 

Then, testing from an unassigned disk to the cache using rsync, iowait was much lower.  I'm trying to test writes from one server to the other using a cache-enabled share, but I seem to be getting only 200-250MB/s across my network right now, which doesn't make much sense since it's a 40GbE connection and iperf3 shows it connected as such.
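
To take the page cache out of the disk-to-disk case, a single large file can also be copied with direct I/O and timed; the paths below are only examples of where Unassigned Devices and the pool typically mount:

dd if=/mnt/disks/source/bigfile of=/mnt/cache/bigfile bs=1M oflag=direct status=progress

That isolates the pool's raw write speed from both the network and writeback caching.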

5 minutes ago, bonienl said:

I am not sure how you are testing, but I have a cache pool of 4 devices in RAID10 mode, and can reach near 10Gb/s transfer speeds when copying files to or from the cache pool over SMB.

 

These servers are directly connected, so I don't really have any way of testing other than rsync or some other internal transfer tool.  I get poor speeds whether I use NFS or SMB.

7 minutes ago, IamSpartacus said:

Initially testing with the following command:

And what is your current directory when executing the dd command?

I have found dd + dsync gives unrealistically low results. I think it's because dsync is sequential, so your 32068 blocks are written one by one, and the high iowait is the ~32k times dd had to wait for confirmation that the data had been written.

In reality, and particularly with NVMe, things are done in parallel and/or in aggregation so it's a lot faster.
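
Something along these lines shows the difference between the two sync modes (the path is just a placeholder for wherever the pool is mounted):

dd if=/dev/zero of=/mnt/cache/ddtest bs=1M count=4096 oflag=dsync       # waits for every block to hit the disk
dd if=/dev/zero of=/mnt/cache/ddtest bs=1M count=4096 conv=fdatasync    # buffers everything, one sync at the end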

 

Have you also done a test on your other server to see if it is capable of more than 250MB/s read?

2 minutes ago, testdasi said:

And what is your current directory when executing the dd command?

I have found dd + dsync gives unrealistically low results. I think it's because dsync is sequential, so your 32068 blocks are written one by one, and the high iowait is the ~32k times dd had to wait for confirmation that the data had been written.

In reality, and particularly with NVMe, things are done in parallel and/or in aggregation so it's a lot faster.

 

Have you also done a test on your other server to see if it is capable of more than 250MB/s read?

 

Do you recommend a different test such as FIO?  Yes, internal dd tests on each side of the storage (the cache pool on one server, the NVMe cache on the other) show each side is capable of 1.8-2.0GB/s reads and writes to the drive.
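
Something like this is what I had in mind if fio is the better tool (a plain 20GB sequential write straight to the pool, path as an example):

fio --name=seqwrite --filename=/mnt/cache/fio-test --rw=write --bs=1M --size=20G --ioengine=libaio --iodepth=8 --direct=1

--direct=1 keeps the page cache out of it, which should make the iowait picture easier to interpret.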

7 minutes ago, IamSpartacus said:

 

Do you recommend a different test such as FIO?  Yes, internal dd tests on each side of the storage (the cache pool on one server, the NVMe cache on the other) show each side is capable of 1.8-2.0GB/s reads and writes to the drive.

I'm a simpleton, so rsync --progress or rsync --info=progress2.

And I use real data. It's not too hard to find a 40-50GB Linux ISO nowadays (or just use dd to create a really big test file and rsync it).
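
For example, something like this (the paths are only placeholders for wherever the source disk and the pool are mounted):

dd if=/dev/urandom of=/mnt/disks/source/bigfile bs=1M count=40960   # roughly 40GB of incompressible data
rsync --progress /mnt/disks/source/bigfile /mnt/cache/

Using urandom rather than zeros avoids btrfs compression flattering the numbers if it happens to be enabled on the pool.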

5 minutes ago, testdasi said:

I'm a simpleton, so rsync --progress or rsync --info=progress2.

And I use real data. It's not too hard to find a 40-50GB Linux ISO nowadays (or just use dd to create a really big test file and rsync it).

 

(screenshot of the rsync --progress output showing the transfer speed)

 

Nowhere near the speed the storage is capable of, but that may be an rsync limitation.  And iowait was low during this.  Seems I need to find a better transfer method than rsync, and then figure out why my network transfers are still slow, assuming they still are with that method.
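
Two quick checks that would separate the link from rsync itself (the IP is just a placeholder): run iperf3 with several parallel streams to rule out a single-stream limit, and retry the copy with rsync's delta algorithm disabled:

iperf3 -c 10.10.10.2 -P 4 -t 30
rsync -aW --progress /mnt/cache/bigfile root@10.10.10.2:/mnt/cache/

-W (--whole-file) skips the delta transfer, though rsync over ssh is still bound by how fast a single core can encrypt.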
