Large copy/write on btrfs cache pool locking up server temporarily



I'm not saying btrfs doesn't have something to do with this issue, but I very much doubt it's only that. I've never had this problem, and I use btrfs on all my servers, for array disks and cache, both as single devices and at one point as an 8-device raid10 pool. I just now did a test with:

 

- the mover running (25GB from cache to the array)

- a manual copy on the console, using cp, of another 25GB from cache to the array

- copying another 25GB over LAN from my desktop to the cache disk

 

Load average peaked at about 4 on a dual-core Pentium G620, and the webGUI remained normally usable during all the operations.

 

Link to comment
 
Try copying a 25GB file from a btrfs drive (or pool) to the same drive; that's when I had issues. Also during unrar and repair operations, where there are simultaneous reads and writes on the same disk.
Link to comment

Did another test; this pool is 3 x 128GB SSDs in raid0:

 

- cp of 25GB in ISOs from one folder on the pool to another

- a second simultaneous cp of 25GB in ISOs from one folder on the pool to another

- transfer of 25GB in ISOs from my desktop to the pool over LAN

 

Again the webGUI was always responsive, and the load average topped out at about 3.

test.png

Link to comment
5 minutes ago, thomast_88 said:

I have a btrfs raid1 pool (2 x 250GB Samsung EVO), and I've been having the same issues as @aptalca for months. The raid is unusable when copying/moving stuff. If anybody has an idea how to trace this down, let me know. I'm willing to invest my time to get this fixed.

 

 

Two things come to mind. First, make sure you're regularly trimming your pool. Second, it once happened to me that a btrfs filesystem became very slow when writing for no apparent reason; it was on a single NVMe device, and re-formatting it fixed the problem. You can see if that helps by using the replace-cache procedure, but formatting instead of replacing.
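For the first point, a trim can be run manually from the console; a minimal example, assuming the pool is mounted at /mnt/cache:

fstrim -v /mnt/cache

The -v flag makes it report how much space was trimmed.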

 

 

Link to comment
  • 2 weeks later...

I have a theory on why some users may be having this issue; if anyone wants to try it and post whether there was an improvement, please do.

 

Currently fstrim on btrfs only trims the unallocated space. This is apparently a bug, but it's been like this for some time. For users with a lot of slack on the filesystem, the unallocated area can be a very small part of the SSD, leaving all the unused but allocated space untrimmed, which can lead to very poor performance. So first check the slack on the filesystem, i.e., the difference between allocated and used space: on the main page click on the cache device and look at the "btrfs filesystem show" section, e.g.:

 

Label: none  uuid: cea535d2-33f9-4cf2-9ff0-0b51826d48a1
	Total devices 1 FS bytes used 265.61GiB
	devid    1 size 476.94GiB used 427.03GiB path /dev/nvme0n1p1

In this case there's about 161GiB of slack: 476.94GiB is the total device size, 427.03GiB is allocated, but only 265.61GiB is in use. Since only unallocated space is trimmed, fstrim will trim just 49.9GiB (476.94 - 427.03), so most of the free space will remain untrimmed. To fix this, run a full balance to reclaim all the allocated but unused space; on the console type:

btrfs balance start --full-balance /mnt/cache
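If you want to check on it from another console while it runs, the balance status command should show its progress:

btrfs balance status /mnt/cache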

This will take some time; when it finishes, it should look like this:

Label: none  uuid: cea535d2-33f9-4cf2-9ff0-0b51826d48a1
	Total devices 1 FS bytes used 265.68GiB
	devid    1 size 476.94GiB used 266.03GiB path /dev/nvme0n1p1

Now the slack is less than 1GiB, so fstrim will work on practically all the unused space. Trim your pool:

fstrim -v /mnt/cache

And check if performance improves.
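If you'd rather not do the arithmetic by hand, here's a minimal sketch that pulls the same numbers out of "btrfs filesystem show" and prints the slack. The /mnt/cache path is an assumption, it expects all sizes reported in GiB, and on multi-device raid pools the allocated total is summed per device, so interpret it accordingly:

btrfs filesystem show /mnt/cache | awk '
    /FS bytes used/ { used = $NF + 0 }     # data actually in use, e.g. 265.61GiB
    /devid/         { alloc += $6 + 0 }    # space allocated to chunks, summed over devices
    END { printf "allocated: %.2f GiB  used: %.2f GiB  slack: %.2f GiB\n", alloc, used, alloc - used }'

On the example above this would print a slack of about 161.42 GiB (427.03 - 265.61), matching the rough figure quoted earlier.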

 

Link to comment

I think you are onto something there...

My second system, still running btrfs on the cache, is where downloads temporarily go for decompression before moving into the array.

The Unraid GUI shows about 14G used on the filesystem, but btrfs show reports:

 

Label: none  uuid: 8e82d82d-f6f6-45d7-9e5a-d389fb0e0bb3
        Total devices 1 FS bytes used 17.36GiB
        devid    1 size 238.47GiB used 161.03GiB path /dev/sdl1
 

Running a balance now.

This might explain why, when I fill 160GB of my 256GB SSD, the wheels fall off.

I tend to find docker crashes happen when cache usage is around the 160-165GB area and the system slows to a crawl.

If I thrash the IO when the cache is at 160GB, that's when things go wrong.

 

Link to comment

Just finished...

Label: none  uuid: 8e82d82d-f6f6-45d7-9e5a-d389fb0e0bb3
        Total devices 1 FS bytes used 17.31GiB
        devid    1 size 238.47GiB used 18.03GiB path /dev/sdl1
 

root@Vault:~# fstrim -v /mnt/cache
/mnt/cache: 220.5 GiB (236698525696 bytes) trimmed
 

So far that is faster than my cache drive has ever run. I'll load it up with data and see if it slows down.

 

So should we cron a btrfs balance daily as well?

 

 

chart.png

Link to comment
8 minutes ago, Maticks said:

So should we cron a btrfs balance daily as well?

 

Typical cache usage, i.e., constantly filling up and emptying the cache, exacerbates the large-slack issue. This is supposed to improve once we get to kernel 4.14, as it includes some changes to deal with this, but until then it's a good idea to monitor the slack and/or do a periodic balance. That's not only because of the trim issue but also because, in extreme cases, you can run into another problem: btrfs reporting the device full when it's not, because the space is fully allocated and no new chunks can be created.

 

If doing a periodic balance, a partial balance should be enough; it will recover most of the allocated but free space while being much faster and causing much less wear on the SSD, e.g.:

 

btrfs balance start -dusage=75 /mnt/cache

This will only re-allocate chunks that are at most 75% used, skipping the fuller chunks that would return little space.
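If you want to automate it, here's a sketch of what a periodic job could look like; the weekly schedule and the /mnt/cache path are assumptions to adapt to your setup (e.g. run it from cron or the User Scripts plugin):

#!/bin/bash
# weekly cache maintenance: reclaim mostly-empty chunks, then trim the freed space
btrfs balance start -dusage=75 /mnt/cache
fstrim -v /mnt/cache

The partial balance converts allocated-but-unused chunks back into unallocated space, which the fstrim pass can then discard.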

Link to comment

Before:

Total devices 2 FS bytes used 186.99GiB
devid    1 size 232.89GiB used 232.88GiB path /dev/sdc1
devid    2 size 232.89GiB used 232.88GiB path /dev/sde1

After:

Total devices 2 FS bytes used 189.18GiB
devid    1 size 232.89GiB used 192.03GiB path /dev/sdc1
devid    2 size 232.89GiB used 192.03GiB path /dev/sde1

Trim:

fstrim -v /mnt/cache
/mnt/cache: 81.7 GiB (87732568064 bytes) trimmed

 

Not sure exactly what those numbers mean, but so far I feel a performance improvement; that is promising! I will try with some large files tomorrow :-)

Link to comment
  • 3 weeks later...

I'm still having issues. I copied an 11GB file from my primary array to the cache pool (raid 1), and the server load went to 25-ish before it finished, making my dockers/VMs crash. @johnnie.black I saw your tests earlier and noticed you are using raid 0. Have you had any issues with raid 1, or do you have any idea why this is happening?

 

@aptalca, did you get all the issues fixed, and are you running raid 0 or raid 1?

Link to comment
7 hours ago, thomast_88 said:

I'm still having issues. I copied an 11GB file from my primary array to the cache pool (raid 1), and the server load went to 25-ish before it finished, making my dockers/VMs crash. @johnnie.black I saw your tests earlier and noticed you are using raid 0. Have you had any issues with raid 1, or do you have any idea why this is happening?

 

@aptalca, did you get all the issues fixed, and are you running raid 0 or raid 1?

 

I switched to a single XFS disk. No more issues.

Link to comment
20 hours ago, jonp said:

Just had to chime in and thank @johnnie.black for all his work on this topic. I am marking this thread for future review so we can see if there are further ways to use the knowledge in here to make things better for everyone.

 

If this greatly improves performance in general for people using SSDs (whether in a single-device or cache-pool configuration), then I'm assuming the above commands could be enabled/disabled as options in the webUI? Or, possibly better: detect whether the cache drive is an SSD with trim capability and, if so, enable partial balance and trim by default on a configurable schedule. Is that the sort of thing you're considering, @jonp?

Link to comment
 
Possibly. Tom and I had a long conversation about proper trim support at one point. The real trick is when you have an SSD assigned to the array: depending on the method of discard/trim the SSD supports, it could potentially violate the integrity of parity (changing values on the device without updating the corresponding blocks on the parity disk). I realize the context of this thread is the cache, but if we're implementing proper trim support, we'll want to address this at the same time.


Link to comment
1 minute ago, jonp said:

Possibly. Tom and I had a long conversation about proper trim support at one point. The real trick is when you have an SSD assigned to the array: depending on the method of discard/trim the SSD supports, it could potentially violate the integrity of parity (changing values on the device without updating the corresponding blocks on the parity disk). I realize the context of this thread is the cache, but if we're implementing proper trim support, we'll want to address this at the same time.

 

 

Ahh, OK, I wasn't thinking about this in relation to array disks; I see your point. Maybe, though, a first step addressing the issue with btrfs and cache drives would be welcome: go for the trendy 'agile' approach rather than waterfall ;-)

Link to comment
