deusxanime Posted October 6, 2017

Not sure if related, but it seems suspect that it could be a similar issue. I am just getting into unRAID, built a server within the past couple of months, and set up my cache pool as BTRFS RAID 1 using 2x 1TB Samsung EVO 850s. I bought these large (expensive!) SSDs specifically because I wanted a large cache pool to put torrents and other downloads on so my array disks can stay spun down, and of course for containers and VMs, all while maintaining a minimum footprint and keeping drive bays open for my array data disks. Also, since I'm using such large cache disks where data could potentially reside for a while before being moved to the main array, I wanted to mirror them to avoid potential data loss and downtime if there is a failure.

The cache pool is pretty underutilized right now. I have a handful of containers and a few VMs on there (one active and a couple usually shut off) and am only using about 20%. I tried running binhex's rtorrentvpn Docker container for torrent downloading and immediately ran into problems. I have the container running in appdata, which is of course on the mirrored cache pool, and I also set up a cache-only "downloads" share to put the torrents in. Running the container by itself was fine, but as soon as I loaded some torrents I would start getting timeouts from the ruTorrent web GUI and, even though the torrents initially seemed OK (hitting 300-400+ kbps), within an hour or less they would basically die off and barely maintain 5-10 kbps. The initial torrents I put on for testing were a few at ~10 GB each, so around 25-30 GB of data. The first time it happened it hosed up the unRAID web GUI and SSH as well, making them extremely slow, but I was eventually able to get the server to reboot. After that I continued to run into the torrents slowing down to nearly 0, but at least it never caused a complete hang of unRAID itself like the first time.
Anyway, I posted to binhex's support thread for the container and we weren't able to figure out much. He uses it in a similar way without problems - Docker and download location both on cache - but the difference is that he does not have it in RAID 1 / mirrored (not sure of his filesystem). Playing around, I eventually figured out that if I put in another disk, mounted it with Unassigned Devices, and moved my torrent download location there instead of the cache pool, all problems were solved: torrents run great and fast (steady, constant 500+ kbps on multiple torrents with no slowdown) and the ruTorrent web GUI never gets the timeout messages now. Unfortunately, that means I now have a drive bay taken up by a non-array disk just for torrent downloading, which I'd rather avoid. So I'm not sure how much else I can do, though if you want me to try something non-destructive I might be able to. I wanted to at least +1 that there definitely seems to be something afoot with running a BTRFS mirrored SSD cache pool, at least for me and others in this thread.
thomast_88 Posted October 6, 2017

@deusxanime can you try to copy a large file (> 10 GB) from another disk to the cache array? While you do it, SSH in, run "top", and monitor the load average.
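The test being requested here can be scripted. The sketch below is an editor's illustration, not from the thread: it writes a file with dd while sampling the 1-minute load average from /proc/loadavg. TARGET and SIZE_MB are placeholder values; to reproduce the actual test, point TARGET at a file under /mnt/cache and raise SIZE_MB above 10000.

```shell
# Placeholders: a small file in /tmp so the sketch is safe to run anywhere.
TARGET="${TARGET:-/tmp/loadtest.bin}"
SIZE_MB="${SIZE_MB:-64}"

# Baseline sample; field 1 of /proc/loadavg is the 1-minute load average
# (fields 2 and 3 are the 5- and 15-minute averages).
awk '{print "load (1m):", $1}' /proc/loadavg

# Write the file in the background, forcing data to disk at the end.
dd bs=1M count="$SIZE_MB" if=/dev/zero of="$TARGET" conv=fdatasync 2>/dev/null &
DD_PID=$!

# Sample the load average once a second while dd is still running.
while kill -0 "$DD_PID" 2>/dev/null; do
    awk '{print "load (1m):", $1}' /proc/loadavg
    sleep 1
done

rm -f "$TARGET"
```

On an affected system the reports in this thread suggest the 1-minute figure climbs well past the CPU count during the write and stays there.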
thomast_88 Posted October 6, 2017

Another guy on reddit seems to have problems with RAID 1 as well:
deusxanime Posted October 8, 2017

On 10/6/2017 at 12:00 PM, thomast_88 said:
@deusxanime can you try to copy a large file (> 10 GB) from another disk to the cache array? While you do it, SSH in, run "top", and monitor the load average.

Got a chance to try this, though I'm not sure the conditions were clean. Something seems to be bogging down my system, or causing slow write speeds (both to array and cache). But I gave it a try anyway. I copied a ~15 GB file from my array to the cache disk(s). Before I started, my load average (the first number, since I didn't realize there were three; not sure what they are for) was hovering anywhere from 1.5 to 5. While the copy was going, here's around where it topped out and hovered:

load average: 22.64, 19.19, 16.37

I accidentally copied from array to array as well and saw similar load average numbers while that was going on, so not sure if that helps or hinders you, but FYI. Also not sure if something is going on with my system, but it seemed to be doing slow copies in general. Copying from array to cache I was only getting 35-40 MBps, and array to array was 20 MBps if I was lucky. I know parity calculations can slow down writes to the array, but I still usually got 50-55 MBps previously, and cache should be even faster since there is no parity, I would think. It would only be limited by the read speed of the spinning disk in the array.

edit: Btw, here's my btrfs filesystem show/stats:

Label: none  uuid: 5130d84d-e43f-45ed-9fa1-5a50be7ab49c
	Total devices 2 FS bytes used 224.38GiB
	devid 1 size 931.51GiB used 284.03GiB path /dev/sdc1
	devid 2 size 931.51GiB used 284.03GiB path /dev/sdb1

I see there is a Balance Status section and a Balance button below that. Any reason not to use that rather than the command from the previous posts in this thread?
thomast_88 Posted October 8, 2017

Hmm, so yesterday my server was acting up again. Seems like it's because of this:

root@unRAID:~# btrfs fi show /mnt/cache
Label: none  uuid: 4ad605bd-2713-453f-916b-699068fd9790
	Total devices 2 FS bytes used 203.20GiB
	devid 1 size 232.89GiB used 232.88GiB path /dev/sdc1
	devid 2 size 232.89GiB used 232.88GiB path /dev/sde1

I did as per @johnnie.black's advice:

btrfs balance start -dusage=75 /mnt/cache

And it became:

root@unRAID:~# btrfs fi show /mnt/cache
Label: none  uuid: 4ad605bd-2713-453f-916b-699068fd9790
	Total devices 2 FS bytes used 175.13GiB
	devid 1 size 232.89GiB used 195.05GiB path /dev/sdc1
	devid 2 size 232.89GiB used 195.05GiB path /dev/sde1

This morning I'm back to:

root@unRAID:~# btrfs fi show /mnt/cache
Label: none  uuid: 4ad605bd-2713-453f-916b-699068fd9790
	Total devices 2 FS bytes used 203.20GiB
	devid 1 size 232.89GiB used 232.88GiB path /dev/sdc1
	devid 2 size 232.89GiB used 232.88GiB path /dev/sde1

Why is this happening? I did write around 30 GB to the cache during the night, but why is it showing as full again? Maybe you have an idea, @johnnie.black? You seem to be the expert on this topic.
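The symptom above - "used" on each devid creeping back up to "size" even though FS bytes used stays much lower - is chunk allocation, not actual data. An editor's sketch of a hypothetical helper that flags when allocation approaches capacity (the function name and 90% threshold are made up for illustration; the parsed output is stubbed with the figures from this post - on a live system you would pipe in `btrfs fi show /mnt/cache` instead):

```shell
# Flag any device whose allocated chunks exceed 90% of its size.
# Reads `btrfs fi show` output on stdin; on devid lines the fields are:
#   devid N size <size>GiB used <used>GiB path <dev>
check_allocation() {
    awk '/devid/ {
        gsub("GiB", "", $4); gsub("GiB", "", $6)
        pct = $6 / $4 * 100
        if (pct > 90)
            printf "%s is %d%% allocated - balance recommended\n", $8, pct
    }'
}

# Stubbed input: the exact figures reported in the post above.
check_allocation <<'EOF'
Label: none  uuid: 4ad605bd-2713-453f-916b-699068fd9790
Total devices 2 FS bytes used 203.20GiB
devid 1 size 232.89GiB used 232.88GiB path /dev/sdc1
devid 2 size 232.89GiB used 232.88GiB path /dev/sde1
EOF
```

Something like this could drive a scheduled balance instead of running one by hand every few days, though as the rest of the thread shows, the underlying churn is a kernel-side issue.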
JorgeB Posted October 8, 2017

8 minutes ago, thomast_88 said:
Why is this happening? I did write around 30 GB to the cache during the night, but why is it showing as full again?

This behavior is currently normal with btrfs, especially on a cache-type filesystem where data is constantly being added and deleted. It should improve once we get on kernel 4.14. Until then, if you like, you can experiment with this - you'll still need to run a balance to bring allocation down (and use a higher usage value, like -dusage=95), but it should keep the allocated space closer to the used space. Create this file on the flash:

config/extra.cfg

and put this line in it:

cacheExtra="nossd"

Stop and re-start the array. Note: this only works for cache pools, but you can still have a single-device "pool" if the number of defined cache slots is >= 2.
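As a sketch of the steps above: on unRAID the flash drive is mounted at /boot, so the file described would live at /boot/config/extra.cfg. The snippet below uses a stand-in directory so it is safe to run anywhere; replace FLASH with /boot on a live server (an assumption based on the post - the array stop/start is still required for the option to take effect).

```shell
# Stand-in for the flash mount point; use FLASH=/boot on an actual server.
FLASH="${FLASH:-/tmp/flash-demo}"

# Create config/extra.cfg with the single line from the post.
mkdir -p "$FLASH/config"
echo 'cacheExtra="nossd"' > "$FLASH/config/extra.cfg"

# Show what was written.
cat "$FLASH/config/extra.cfg"
```

After stopping and restarting the array, the effect can be checked by looking for nossd among the cache mount options, e.g. `grep cache /proc/mounts`.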
thomast_88 Posted October 8, 2017

Thanks - I will try that. What exactly will this setting do?

cacheExtra="nossd"
JorgeB Posted October 8, 2017

15 minutes ago, thomast_88 said:
What exactly will this setting do?

Best explained here: https://btrfs.wiki.kernel.org/index.php/Gotchas#The_ssd_mount_option

Well, it was edited; this is what it said:

=== The ssd mount option ===

This option results in allocation, and thus writing, of new nodes/extents to be spread across the chunks of the volume, in an attempt to wear-level writes. This results in a few problems:

* It might result in many chunks being allocated but only partially filled ([https://www.mail-archive.com/[email protected]/msg63056.html -o ssd] vs. [https://www.mail-archive.com/[email protected]/msg63076.html -o nossd]), requiring much more frequent balance operations to avoid running out of free chunks, causing even more writes to the ssd to happen, defeating the purpose.
* In some cases the ssd option is set wrongly anyhow as /sys/block/$DEV/queue/rotational is set to a value that does not reflect the actual type of the physical device. I.e. this flag is not a reliable way of determining if the device is a locally attached SSD. For example, iSCSI or NBD displays as non-rotational, while a loop device on an SSD shows up as rotational.
* The attempts to make assumptions about the actual layout of data on the SSD are invalid since the introduction of the [https://www.micron.com/~/media/documents/products/technical-marketing-brief/brief_ssd_effect_data_placement_writes.pdf Flash Translation Layer (FTL)] many years ago.

Currently using mount option nossd explicitly in nearly all cases [https://www.mail-archive.com/[email protected]/msg64041.html is probably best].
Tuftuf Posted October 8, 2017

Just a thank you to @johnnie.black - this is causing me issues as well, and it's good that someone has kept tracking it. Will take a look at the new suggestions.
half Posted October 11, 2017

Hi, I wanted to give some info as well. I have seen this issue starting with 6.3.x and a BTRFS cache pool. What I can tell you is that the cause depends on the kernel in the system.

An easy way to test the behaviour is to use the dd command to simulate the write operations:

dd bs=1M count=8096 if=/dev/zero of=test conv=fdatasync; rm test

This command creates an 8 GB test file and deletes it afterwards. You can execute the command in /mnt/cache/ on the system. You will see high IO waits and the load going up. So far nothing weird. This will also work on mounted SSDs from Unassigned Devices or on any other disk, btw.

If you use a program like netdata (you can find an easy-to-use Docker container to start it on your unRAID server), you should watch for three things:

Interrupts
Network
Memory -> Kernel (especially the dirty pages)

On a "good" kernel (4.9.10, unRAID 6.3.2), you will see that interrupts and network keep flowing along while you perform the dd on the SSD. On a "bad" kernel (4.12.12, unRAID 6.4rc9f), you will see that interrupts and networking break down while the dd is running. Especially interesting is that in this case, the dirty pages on the Memory tab either track the writeback well OR there will be a gap and the writeback is delayed.

IMHO: I believe that the behaviour has to do with the amount of available memory in your system and the kernel's handling of dirty pages and writebacks. Why certain kernels go into this blocking behaviour and others not so much, or not at all, I cannot explain easily. I believe that BTRFS RAID 0/1 modifies some of the behaviours, which is why I switched to an XFS-based single cache drive. I have not had any lockups since I moved to XFS.

My system:
Model: Custom
M/B: Supermicro - X9SRE/X9SRE-3F/X9SRi/X9SRi-3F
CPU: Intel® Xeon® CPU E5-2650 0 @ 2.00GHz
HVM: Enabled
IOMMU: Enabled
Cache: 512 kB, 2048 kB, 20480 kB
Memory: 96 GB (max. installable capacity 512 GB)
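The dirty-page behaviour described above can also be observed without netdata, straight from /proc/meminfo. This is an editor's sketch (small file in /tmp so it is harmless anywhere; point the output path at /mnt/cache and use the 8 GB count from the post to reproduce the real test):

```shell
# Baseline: current dirty and writeback page counts in kB.
grep -E '^(Dirty|Writeback):' /proc/meminfo

# Run a scaled-down version of the dd test in the background.
dd bs=1M count=128 if=/dev/zero of=/tmp/ddtest conv=fdatasync 2>/dev/null &
DD_PID=$!

# Sample once a second while dd runs. On an affected kernel the post
# describes Dirty growing while Writeback lags behind it.
while kill -0 "$DD_PID" 2>/dev/null; do
    grep -E '^(Dirty|Writeback):' /proc/meminfo
    sleep 1
done

rm -f /tmp/ddtest
```

A steadily widening gap between Dirty and Writeback during the write is the "delayed writeback" pattern half describes on the problem kernels.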
-Daedalus Posted October 13, 2017

Just to chime in here: I'm experiencing the same issue, also using 2x 850 Evos. Not a fan. Haven't tried any of the troubleshooting steps mentioned in this thread yet, but just to say that it is also affecting me.
thomast_88 Posted October 16, 2017

On 10/13/2017 at 10:26 PM, -Daedalus said:
Just to chime in here: I'm experiencing the same issue, also using 2x 850 Evos. Not a fan. Haven't tried any of the troubleshooting steps mentioned in this thread yet, but just to say that it is also affecting me.

RAID 0 or RAID 1?
thomast_88 Posted October 16, 2017

1 hour ago, -Daedalus said:
1.

Same here. Seems most of us have the problem with RAID 1. @johnnie.black is using RAID 0 and he doesn't seem to have this problem. For now, RAID 1 seems pretty useless on a BTRFS cache array.
JorgeB Posted October 16, 2017

2 minutes ago, thomast_88 said:
Same here. Seems most of us have the problem with RAID 1. @johnnie.black is using RAID 0 and he doesn't seem to have this problem.

I'm currently using a single NVMe device, but in the past I used RAID 0, 1, and 10 and never had that issue.
Zangief Posted October 19, 2017

On 10/8/2017 at 9:49 AM, johnnie.black said:
This behavior is currently normal with btrfs, especially on a cache-type filesystem where data is constantly being added and deleted. It should improve once we get on kernel 4.14.

I too am experiencing these issues (btrfs cache pool of 2x 500 GB EVO SSDs). Just wondering about the ETA of the 4.14 kernel. Wait until that appears, or drop to a single XFS-based cache drive? Or keep running balances every few days? Appreciate any advice. Seems to me a lot of people are going the XFS route and keeping the spare SSD as an unassigned device?
JorgeB Posted October 19, 2017

9 minutes ago, Zangief said:
Just wondering about the ETA of the 4.14 kernel.

3 to 4 weeks, but that is the ETA for the Linux kernel itself, not for unRAID using that kernel.
thomast_88 Posted October 19, 2017

3 hours ago, Zangief said:
I too am experiencing these issues (btrfs cache pool of 2x 500 GB EVO SSDs). Just wondering about the ETA of the 4.14 kernel. Wait until that appears, or drop to a single XFS-based cache drive? Or keep running balances every few days? Appreciate any advice. Seems to me a lot of people are going the XFS route and keeping the spare SSD as an unassigned device?

I'm converting down to a single cache drive myself. Multiple drives are just too unstable for me (and for many others, it seems), even when balancing each day (!!). Is it possible to run a RAID 1 cache with XFS on unRAID? Or does it even make any sense?
itimpi Posted October 19, 2017

8 minutes ago, thomast_88 said:
Is it possible to run a RAID 1 cache with XFS on unRAID? Or does it even make any sense?

unRAID has no support for this. The closest you would get is to use a hardware RAID controller to provide the RAID 1 capability and let unRAID treat it as a single drive.
puma1824 Posted October 19, 2017

17 minutes ago, thomast_88 said:
I'm converting down to a single cache drive myself. Multiple drives are just too unstable for me (and for many others, it seems), even when balancing each day (!!). Is it possible to run a RAID 1 cache with XFS on unRAID? Or does it even make any sense?

What's your backup plan for your cache drive? I'm trying to figure out the best way to proceed with a single drive. Thanks.
thomast_88 Posted October 19, 2017

I'll use the appdata backup plugin for the appdata share (which is on the cache array). For VM data and docker.img, I'm not sure yet. Crossing my fingers my disk won't die.
the_larizzo Posted December 12, 2017

Has anyone tested since moving to 6.4.0_rc15e (kernel: Linux 4.14.4-unRAID x86_64)? Wondering if the issue is resolved.
JorgeB Posted December 12, 2017

6 minutes ago, the_larizzo said:
Has anyone tested since moving to 6.4.0_rc15e (kernel: Linux 4.14.4-unRAID x86_64)? Wondering if the issue is resolved.

It looks to be resolved. I've disabled my weekly balance and so far so good, but I'd like to wait a few more weeks before saying it's fixed for sure.
Peanutman85 Posted December 17, 2017

I'm still having issues on rc15e.
thomast_88 Posted February 7, 2018

@johnnie.black can you provide a status update after running for a few months?