JonathanM Posted August 24, 2017

6 hours ago, Maticks said:
"should this be raised as a bug to be looked at?"

Good luck. In the past, BTRFS problems have been dismissed as hardware faults; according to the devs it's never the file system that's the issue.
Maticks Posted August 24, 2017

It sounds very much like the Apple iPhone antenna issue: "you are holding it wrong".
JorgeB Posted August 24, 2017

I'm not saying btrfs doesn't have something to do with this issue, but I very much doubt it's only that. I've never had this problem, and I use btrfs on all my servers, for array disks and cache, both single devices and, at one point, an 8-device raid10 pool. I just now did a test with:

- mover running (25GB from cache to the array)
- a manual copy on the console using cp of another 25GB from cache to the array
- another 25GB copied over LAN from my desktop to the cache disk

Load average peaked at about 4 on a dual-core Pentium G620, and the webGUI was normally usable during all the operations.
aptalca Posted August 24, 2017 (thread author)

Try copying a 25GB file from a btrfs drive (or pool) to the same drive. That's when I had issues. Also during unrar and repair, where there are simultaneous read and write operations on the same disk.
JorgeB Posted August 24, 2017

22 minutes ago, aptalca said:
"Try copying a 25GB file from a btrfs drive (or pool) to the same drive. That's when I had issues. Also during unrar and repair, where there are simultaneous read and write operations on the same disk."

Will try that tomorrow.
JorgeB Posted August 25, 2017

Did another test; the pool is 3 x 128GB SSDs in raid0:

- cp of 25GB in ISOs from one folder on the pool to another
- a second simultaneous cp of 25GB in ISOs from one folder on the pool to another
- transfer of 25GB in ISOs from my desktop to the pool over LAN

Again the webGUI was always responsive, and load average topped out at about 3.
thomast_88 Posted August 25, 2017

I have a BTRFS raid1 pool (2 x 250GB Samsung EVO), and I've been having the same issues as @aptalca for months. The raid is unusable when copying/moving stuff. If anybody has an idea how to trace this down, let me know. I'm willing to invest my time to get this fixed.
JorgeB Posted August 25, 2017

5 minutes ago, thomast_88 said:
"I have a BTRFS raid1 pool (2 x 250GB Samsung EVO), and I've been having the same issues as @aptalca for months. The raid is unusable when copying/moving stuff."

Two things come to mind. First, make sure you're regularly trimming your pool. Second, it once happened to me that a btrfs filesystem became very slow at writing for no apparent reason; it was on a single NVMe device, and re-formatting it fixed the problem. You can see if that helps by following the replace-cache procedure, but formatting instead of replacing.
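For the regular trim part, a minimal sketch of a scheduled job, assuming the pool is mounted at /mnt/cache and the SSDs/controller actually support TRIM (run it once by hand first and check it reports space trimmed); it could run weekly via cron or a user script:

    #!/bin/bash
    # weekly TRIM of the btrfs cache pool; adjust the mount point if yours differs
    fstrim -v /mnt/cache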
Maticks Posted September 6, 2017

I'm thinking it might be this copy-on-write function in the shares section; it's on by default. Maybe that is what's breaking things; it does mention btrfs should be set to nocow. I'd have to change my pool back to btrfs to test again though...
JorgeB Posted September 6, 2017

43 minutes ago, Maticks said:
"btrfs should be set to nocow."

That's mostly for spinners storing highly fragmentable data, like VM images. SSDs can usually tolerate the high fragmentation, though they can still slow down with heavily modified VM images or large databases. Keep in mind that with nodatacow you also lose checksumming.
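If anyone wants to try nocow on just the directories holding VM images or databases, rather than changing the share setting, one way is the NOCOW file attribute; a rough sketch, assuming the images live under /mnt/cache/domains, and keeping in mind that the attribute only applies to files created after it is set (existing images need to be copied back in) and that those files lose btrfs checksumming:

    mkdir -p /mnt/cache/domains
    chattr +C /mnt/cache/domains    # new files created in this directory will be NOCOW
    lsattr -d /mnt/cache/domains    # the 'C' flag should now show for the directory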
JorgeB Posted September 13, 2017

I have a theory on why some users may be having this issue; if anyone wants to try it, please post whether there was an improvement.

Currently fstrim on btrfs only trims the unallocated space. This is apparently a bug, but it's been like this for some time. For users with a lot of slack on the filesystem, the unallocated area can be a very small part of the SSD, leaving all the unused-but-allocated space untrimmed, which can lead to very poor performance.

So first check for slack on the filesystem, i.e., the difference between the allocated and used space. On the main page click on the cache device and look at the "btrfs filesystem show" section, e.g.:

    Label: none  uuid: cea535d2-33f9-4cf2-9ff0-0b51826d48a1
        Total devices 1 FS bytes used 265.61GiB
        devid    1 size 476.94GiB used 427.03GiB path /dev/nvme0n1p1

In this case there's about 161GiB of slack: 476.94GiB is the total device size, 427.03GiB is allocated, but only 265.61GiB is in use. Since only unallocated space is trimmed, fstrim will only trim 49.9GiB (476.94 - 427.03), so most of the free space will remain untrimmed.

To fix this, run a full balance to reclaim all allocated but unused space. On the console type:

    btrfs balance start --full-balance /mnt/cache

This will take some time; in the end it should look like this:

    Label: none  uuid: cea535d2-33f9-4cf2-9ff0-0b51826d48a1
        Total devices 1 FS bytes used 265.68GiB
        devid    1 size 476.94GiB used 266.03GiB path /dev/nvme0n1p1

Now the slack is less than 1GiB, so fstrim will work on practically all unused space. Trim your pool:

    fstrim -v /mnt/cache

And check if performance improves.
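For those who prefer checking from the console instead of the GUI, something like this should show the same allocation numbers; a sketch assuming the pool is mounted at /mnt/cache:

    btrfs filesystem show /mnt/cache     # per device: size vs. allocated ("used")
    btrfs filesystem usage /mnt/cache    # overall: the gap between "Device allocated" and "Used" is the slack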
Maticks Posted September 13, 2017

I think you are onto something there. My second system still runs btrfs on cache; it's where downloads go temporarily for decompression before moving into the array. Unraid shows about 14GB used on the filesystem, but btrfs filesystem show reports:

    Label: none  uuid: 8e82d82d-f6f6-45d7-9e5a-d389fb0e0bb3
        Total devices 1 FS bytes used 17.36GiB
        devid    1 size 238.47GiB used 161.03GiB path /dev/sdl1

Running a balance now. This might explain why the wheels fall off when I fill 160GB of my 256GB SSD. I tend to find docker crashes happen when the cache is around the 160-165GB area and the system slows to a crawl; if I thrash the IO when the cache is at 160GB, that's when things go wrong.
Maticks Posted September 13, 2017

Just finished:

    Label: none  uuid: 8e82d82d-f6f6-45d7-9e5a-d389fb0e0bb3
        Total devices 1 FS bytes used 17.31GiB
        devid    1 size 238.47GiB used 18.03GiB path /dev/sdl1

    root@Vault:~# fstrim -v /mnt/cache
    /mnt/cache: 220.5 GiB (236698525696 bytes) trimmed

That is faster than my cache drive has ever run. I'll load it up with data and see if it slows down. So should we also cron a btrfs balance daily?
thomast_88 Posted September 13, 2017

@johnnie.black I will test this straight away when I get home. Thanks for putting your findings up!
JorgeB Posted September 13, 2017

8 minutes ago, Maticks said:
"So should we also cron a btrfs balance daily?"

Typical cache usage, i.e., constantly filling up and emptying the cache, exacerbates the large-slack issue. This is supposed to improve once we get to kernel 4.14, as there are some changes to deal with it, but until then it's a good idea to monitor the slack and/or do a periodic balance. Not only because of the trim issue, but also because in extreme cases you can run into another problem: btrfs reporting the device full when it isn't, because the space is fully allocated and no new chunks can be created.

If doing a periodic balance, a partial balance should be enough; it will recover most of the free allocated space while being much faster and causing much less wear on the SSD, e.g.:

    btrfs balance start -dusage=75 /mnt/cache

This will only re-allocate data chunks that are at most 75% used.
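To tie the balance and the trim together on a schedule, here is a rough sketch of a maintenance script (run weekly or monthly via cron or a user script); the mount point and the 75% threshold are assumptions to adjust for your setup:

    #!/bin/bash
    # periodic btrfs cache pool maintenance: reclaim mostly-empty chunks, then trim
    MOUNT=/mnt/cache

    # re-allocate data chunks that are at most 75% used, returning the freed space
    # to the unallocated pool where fstrim can reach it
    btrfs balance start -dusage=75 "$MOUNT"

    # trim the now-unallocated space on the SSD(s)
    fstrim -v "$MOUNT"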
Maticks Posted September 13, 2017

I might do this once a month. I have had those disk-full messages before, even when I had 80GB free on the cache.
thomast_88 Posted September 13, 2017

Before:

    Total devices 2 FS bytes used 186.99GiB
        devid    1 size 232.89GiB used 232.88GiB path /dev/sdc1
        devid    2 size 232.89GiB used 232.88GiB path /dev/sde1

After:

    Total devices 2 FS bytes used 189.18GiB
        devid    1 size 232.89GiB used 192.03GiB path /dev/sdc1
        devid    2 size 232.89GiB used 192.03GiB path /dev/sde1

Trim:

    fstrim -v /mnt/cache
    /mnt/cache: 81.7 GiB (87732568064 bytes) trimmed

Not sure what those numbers mean exactly, but so far it feels like a performance improvement; that is promising! I will try with some large files tomorrow :-)
JorgeB Posted September 13, 2017

7 minutes ago, thomast_88 said:
"Not sure what those numbers mean exactly, but so far it feels like a performance improvement."

You should; your filesystem was practically 100% allocated, 232.88GiB out of 232.89GiB, so only 0.01GiB was being trimmed.
Tuftuf Posted September 13, 2017

This has been an issue for me too, useful info!
thomast_88 Posted October 5, 2017

I'm still having issues. I copied an 11GB file from my primary array to the cache pool (raid1), and the server load went to 25-ish before it finished, making my dockers/VMs crash.

@johnnie.black I saw your tests earlier and noticed you are using raid0. Have you had any issues with raid1, or any idea why this is happening?

@aptalca did you get all the issues fixed, and are you running raid0 or raid1?
jonp Posted October 5, 2017

Just had to chime in and thank @johnnie.black for all his work on this topic. I am marking this thread for future review so we can see if there are further ways to use the knowledge in here to make things better for everyone.
aptalca Posted October 5, 2017 (thread author)

7 hours ago, thomast_88 said:
"@aptalca did you get all the issues fixed, and are you running raid0 or raid1?"

I switched to a single-disk xfs cache. No more issues.
binhex Posted October 6, 2017

20 hours ago, jonp said:
"I am marking this thread for future review so we can see if there are further ways to use the knowledge in here to make things better for everyone."

If this greatly improves performance in general for people using SSDs (whether as a single cache or a cache pool), then I'm assuming the above commands could be enabled/disabled as options in the webUI? Or, possibly better, detect whether the cache drive is an SSD with trim capability and, if so, enable partial balance and trim by default on a configurable schedule. Is that the sort of thing you're considering, @jonp?
jonp Posted October 6, 2017

binhex said:
"...detect whether the cache drive is an SSD with trim capability and, if so, enable partial balance and trim by default on a configurable schedule. Is that the sort of thing you're considering, @jonp?"

Possibly. Tom and I had a long conversation about proper trim support at one point. The real trick is when you have an SSD assigned to the array. Depending on the method of discard/trim the SSD supports, it could potentially violate the integrity of parity (changing values on the device without updating the corresponding blocks on the parity disk). I realize the context of this thread is the cache, but if we are implementing proper support for trim, we will want to address this at the same time.

Sent from my SM-G930P using Tapatalk
binhex Posted October 6, 2017

1 minute ago, jonp said:
"The real trick is when you have an SSD assigned to the array. Depending on the method of discard/trim the SSD supports, it could potentially violate the integrity of parity."

Ah, OK, I wasn't thinking about this in relation to array disks; I see your point. Maybe, though, a first step addressing the issue with btrfs and cache drives would be welcome; go for the trendy "agile" approach rather than waterfall.