Large copy/write on btrfs cache pool locking up server temporarily



Not sure if it's related, but it seems suspiciously like it could be a similar issue. I am just getting into unRAID, built a server within the past couple of months, and have set up my cache pool as BTRFS RAID 1 using 2x 1TB Samsung 850 EVOs. I bought these large (expensive!) SSDs specifically because I wanted a big cache pool to hold torrents and other downloads so my array disks can stay spun down, and of course for containers and VMs, all while keeping a minimal footprint and leaving drive bays open for my array data disks. Also, since I'm using such large cache disks where data could sit for a while before being moved to the main array, I wanted to mirror them to avoid data loss and downtime if a drive fails. The cache pool is pretty underutilized right now: I have a handful of containers and a few VMs (one active, a couple usually shut off) on there and it's only about 20% used.

 

I tried running binhex's rtorrentvpn Docker container for torrent downloading and immediately ran into problems. The container runs out of appdata, which is of course on the mirrored cache pool, and I also set up a cache-only "downloads" share to put the torrents in. Running the container by itself was fine, but as soon as I loaded some torrents I would start getting timeouts from the ruTorrent web GUI and, even though the torrents initially seemed OK (hitting 300-400+ kbps), within an hour or less they would basically die off and barely be able to maintain 5-10 kbps. The initial torrents I put on for testing were a few at ~10 GB each, so around 25-30 GB of data. The first time it happened it hosed up the unRAID web GUI and SSH as well, making them extremely slow, but I was eventually able to get the server to reboot. After that I continued to run into the issue of torrents slowing to nearly zero, but at least it never caused a complete hang of unRAID itself like the first time.

 

Anyway, I posted to binhex's support thread for the container and we weren't able to figure out much. He uses it in a similar way without problems, with Docker and the download location both on cache, but the difference is that he does not have his cache in RAID 1 / mirrored (not sure what filesystem). Playing around, I eventually figured out that if I put in another disk, mounted it with Unassigned Devices, and moved my torrent download location there instead of the cache pool, all the problems were solved: torrents run great and fast (a steady, constant 500+ kbps on multiple torrents with no slowdown) and the ruTorrent web GUI never shows the timeout messages now. Unfortunately that means a drive bay is now taken up by a non-array disk just for torrent downloading, which I'd rather avoid.

 

So I'm not sure how much else I can do, though if you want me to try something non-destructive I might be able to. I at least wanted to +1 that there definitely seems to be something afoot with running a BTRFS mirrored SSD cache pool, at least for me and others in this thread.

Link to comment
On 10/6/2017 at 12:00 PM, thomast_88 said:

@deusxanime can you try to copy a large file (> 10 GB) from another disk to the cache array? While you do it, SSH in and check "top", and monitor the Load Average.

 

Got a chance to try this, though I'm not sure the conditions were clean. Something seems to be bogging down my system, or causing slow write speeds (both to array and cache), but I gave it a try anyway. I copied a ~15 GB file from my array to the cache disk(s). Before I started, my Load Average (the first number; I didn't realize there were three, which are apparently the 1-, 5-, and 15-minute averages) was hovering anywhere from 1.5 to 5. While the copy was going, here's around where it topped out and hovered:

 

load average: 22.64, 19.19, 16.37

 

I accidentally copied from array to array as well and saw similar Load Average numbers while that was going on, so not sure if that helps or hinders you, but FYI.

 

Also, not sure if something is going on with my system, but it seemed to be doing slow copies in general. Copying from array to cache I was only getting 35-40 MBps, and array to array was 20 MBps if I was lucky. I know parity calculations can slow down writes to the array, but I still usually got 50-55 MBps previously, and I would think cache should be even faster since there is no parity involved; it should only be limited by the read speed of the spinning array disk.
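In case anyone wants to repeat the same test, here's roughly what I did; the paths are just examples from my setup, so adjust the share and file names for yours:

rsync -ah --progress /mnt/disk1/Media/bigfile.mkv /mnt/cache/downloads/

# in a second SSH session, print the load averages and dirty-page counters every 5 seconds
watch -n 5 'uptime; grep -E "^(Dirty|Writeback):" /proc/meminfo'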

 

Edit: BTW, here's my btrfs filesystem show output:

 

Label: none  uuid: 5130d84d-e43f-45ed-9fa1-5a50be7ab49c
	Total devices 2 FS bytes used 224.38GiB
	devid    1 size 931.51GiB used 284.03GiB path /dev/sdc1
	devid    2 size 931.51GiB used 284.03GiB path /dev/sdb1

 

I see there is a Balance Status section and a Balance button below that. Any reason not to use that rather than the command from the previous posts in this thread?
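For what it's worth, the allocated vs. actually used space (the numbers a balance acts on) can also be checked from the command line; I assume the GUI balance ends up running essentially the same thing as the command posted earlier:

btrfs filesystem df /mnt/cache
btrfs filesystem usage /mnt/cache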

 

Edited by deusxanime
Link to comment

Hmm, so yesterday my server was acting up again. Seems like it's because of this:

root@unRAID:~# btrfs fi show /mnt/cache
Label: none  uuid: 4ad605bd-2713-453f-916b-699068fd9790
        Total devices 2 FS bytes used 203.20GiB
        devid    1 size 232.89GiB used 232.88GiB path /dev/sdc1
        devid    2 size 232.89GiB used 232.88GiB path /dev/sde1

I did as per @johnnie.black's advice:

btrfs balance start -dusage=75 /mnt/cache

And it became:

root@unRAID:~# btrfs fi show /mnt/cache
Label: none  uuid: 4ad605bd-2713-453f-916b-699068fd9790
        Total devices 2 FS bytes used 175.13GiB
        devid    1 size 232.89GiB used 195.05GiB path /dev/sdc1
        devid    2 size 232.89GiB used 195.05GiB path /dev/sde1

This morning I'm back to:

root@unRAID:~# btrfs fi show /mnt/cache
Label: none  uuid: 4ad605bd-2713-453f-916b-699068fd9790
        Total devices 2 FS bytes used 203.20GiB
        devid    1 size 232.89GiB used 232.88GiB path /dev/sdc1
        devid    2 size 232.89GiB used 232.88GiB path /dev/sde1

Why is this happening? I did write around 30 GB to the cache during the night, but why is it showing as full again? Maybe you have an idea, @johnnie.black? You seem to be the expert on this topic B|

Link to comment
8 minutes ago, thomast_88 said:

Why is this happening? I did write around 30 GB to the cache during the night, but why is it showing as full again?

 

This behavior is currently normal with btrfs, especially on a cache-type filesystem where data is constantly being added and deleted. This should improve once we get on kernel 4.14. Until then, if you like, you can experiment with the following; you'll still need to run a balance to bring it down (and use a higher usage value like -dusage=95), but it should keep the allocated space closer to the used space. Create this file on the flash:

 

config/extra.cfg

and put this line in it:

cacheExtra="nossd"

Stop and re-start the array.

 

Note: this only works for cache pools, but you can still have a single-device "pool" as long as the number of defined cache slots is >= 2.
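If you want to confirm the option actually took effect after the restart, checking the mount options should be enough (I'm assuming the extra option is passed straight through to the cache mount; depending on the kernel you'll either see "nossd" listed or just no "ssd" option anymore):

grep /mnt/cache /proc/mounts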

 

 

Link to comment
15 minutes ago, thomast_88 said:

What exactly will this setting do? 

 

Best explained here:

 

https://btrfs.wiki.kernel.org/index.php/Gotchas#The_ssd_mount_option

 

Well, it has since been edited; this is what it said at the time:


 

Quote

 

=== The ssd mount option ===

 


This option results in allocation, and thus writing, of new nodes/extents to be spread across the chunks of the volume, in an attempt to wear-level writes. This results in a few problems:

 


* It might result in many chunks being allocated but only partially filled ([https://www.mail-archive.com/[email protected]/msg63056.html -o ssd] vs. [https://www.mail-archive.com/[email protected]/msg63076.html -o nossd]), requiring much more frequent balance operations to avoid running out of free chunks, causing even more writes to the ssd to happen, defeating the purpose.


* In some cases the ssd option is set wrongly anyhow as /sys/block/$DEV/queue/rotational is set to a value that does not reflect the actual type of the physical device. I.e. this flag is not a reliable way of determining if the device is a locally attached SSD. For example, iSCSI or NBD displays as non-rotational, while a loop device on an SSD shows up as rotational.


* The attempts to make assumptions about the actual layout of data on the SSD are invalid since the introduction of the [https://www.micron.com/~/media/documents/products/technical-marketing-brief/brief_ssd_effect_data_placement_writes.pdf Flash Translation Layer (FTL)] many years ago.

 


Currently using mount option nossd explicitly in nearly all cases [https://www.mail-archive.com/[email protected]/msg64041.html is probably best].

 

 

Edited by johnnie.black
Link to comment

Hi,

 

I wanted to give some info as well. I have seen this issue starting with 6.3.x and a BTRFS cache pool. What I can tell you is that the cause comes down to the kernel in the system.

 

An easy way to test the behaviour is to use the dd command to simulate the write operations.

 

"dd bs=1M count=8096 if=/dev/zero of=test conv=fdatasync; rm test" 

 

This command creates an ~8 GB test file and deletes it afterwards. You can execute the command in /mnt/cache/ on the system. You will see high IO waits and the load going up; so far nothing weird. This will also work on SSDs mounted via Unassigned Devices or on any other disk, btw.

 

If you use a program like netdata (you can find an easy-to-use Docker container to run it on your unRAID server), you should watch for three things (there's a command-line alternative sketched right after the list):

 

Interrupts 

Network

Memory->Kernel (especially the dirty pages).
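If you'd rather not install netdata, something like the following, each in its own SSH session while the dd is running, gives a rough equivalent (these are standard /proc interfaces, nothing unRAID-specific):

watch -n 1 'cat /proc/loadavg'                              # load averages
watch -n 1 -d 'cat /proc/interrupts'                        # interrupt counters, -d highlights changes
watch -n 1 'cat /proc/net/dev'                              # network byte/packet counters
watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'    # dirty pages and writeback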

 

On a "good" (4.9.10 Unraid 6.3.2 ) kernel, you will see that Interrupts & Network are still flowing along while you perform the DD on the SSD. 

On a "bad" (4.12.12 Unraid 6.4rc9f)  kernel, you will see that interrupts & networking break down while the DD is running. 

 

Especially interesting is that, in this case, the dirty pages on the Memory tab either keep pace with writeback, OR there is a gap and the writeback is delayed.

 

IMHO: I believe the behaviour has to do with the amount of available memory in the system and the kernel's handling of dirty pages and writeback. Why certain kernels fall into this blocking behaviour and others less so, or not at all, I cannot easily explain.
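If someone wants to poke at this theory, the relevant knobs are the kernel's dirty-page sysctls. This is only a sketch of what I would look at, not something I have verified as a fix on unRAID; with 96 GB of RAM the default vm.dirty_ratio of 20% lets many gigabytes of writes pile up in memory before writers get blocked, so the whole 8 GB dd can sit in dirty pages:

sysctl vm.dirty_background_ratio vm.dirty_ratio vm.dirty_expire_centisecs   # show current values
sysctl -w vm.dirty_background_ratio=2   # example value: start background writeback earlier
sysctl -w vm.dirty_ratio=5              # example value: block writers sooner instead of letting dirty pages pile up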

 

I believe that BTRFS RAID 0/1 modifies some of these behaviours, which is why I switched to a single XFS-based cache drive. I have not had any lockups since I moved to XFS.

 

My system:

 

Model: Custom
M/B: Supermicro - X9SRE/X9SRE-3F/X9SRi/X9SRi-3F
CPU: Intel® Xeon® CPU E5-2650 0 @ 2.00GHz
HVM: Enabled
IOMMU: Enabled
Cache: 512 kB, 2048 kB, 20480 kB
Memory: 96 GB (max. installable capacity 512 GB)

 

 

Edited by half
Link to comment
On 10/8/2017 at 9:49 AM, johnnie.black said:

 

This behavior is currently normal with btrfs, especially on a cache-type filesystem where data is constantly being added and deleted. This should improve once we get on kernel 4.14

 

 

I too am experiencing these issues (btrfs cache pool of 2x 500GB EVO SSDs).

 

Just wondering what the ETA of the 4.14 kernel is.

 

Should I wait until that appears, drop to a single XFS-based cache drive, or keep running balances every few days?

 

I'd appreciate any advice. It seems to me a lot of people are going the XFS route and keeping the spare SSD as an unassigned device?

Edited by Zangief
Link to comment
3 hours ago, Zangief said:

I too am experiencing these issues (btrfs cache pool of 2x 500GB EVO SSDs).

 

Just wondering what the ETA of the 4.14 kernel is.

 

Should I wait until that appears, drop to a single XFS-based cache drive, or keep running balances every few days?

 

I'd appreciate any advice. It seems to me a lot of people are going the XFS route and keeping the spare SSD as an unassigned device?

 

I'm converting down to a single cache drive myself. Multiple drives are just too unstable for me (and for many others, it seems), even when balancing each day (!!).
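For reference, roughly what I mean by balancing each day is just a scheduled balance, e.g. a root cron entry or the User Scripts plugin; the schedule, -dusage value, and binary path below are only examples, adjust for your system:

0 5 * * * /sbin/btrfs balance start -dusage=75 /mnt/cache > /dev/null 2>&1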

 

Is it possible to run a RAID 1 cache with XFS on unRAID? Or does it even make sense?

Link to comment
17 minutes ago, thomast_88 said:

 

I'm converting down to a single cache drive myself. Multiple drives are just too unstable for me (and for many others, it seems), even when balancing each day (!!).

 

Is it possible to run a RAID 1 cache with XFS on unRAID? Or does it even make sense?

 

What's your backup plan for your cache drive? I'm trying to figure out the best way to proceed with a single drive. Thanks.
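The simplest thing I can think of is a scheduled rsync of appdata from the cache to a share on the array; a rough sketch with example paths, and I assume you'd want Docker stopped while it runs so the copy is consistent (I think there is also a CA appdata backup plugin that does essentially this):

rsync -ah --delete /mnt/cache/appdata/ /mnt/user/backups/appdata/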

 
Link to comment
