• [6.8.3] docker image huge amount of unnecessary writes on cache


    S1dney
    • Urgent

    Hey Guys,

     

First of all, I know that you're all very busy getting version 6.8 out there, something I'm very much waiting on as well. I'm seeing great progress, so thanks so much for that! I'm not expecting this to be at the top of the priority list, but I'm hoping someone on the development team is willing to invest some time (perhaps after the release).

     

    Hardware and software involved:

    2 x 1TB Samsung EVO 860, setup with LUKS encryption in BTRFS RAID1 pool.

     

    ###

TL;DR (but I'd suggest reading on anyway 😀)

The image file mounted as a loop device is causing massive writes on the cache, potentially wearing out SSDs quite rapidly.

    This appears to be only happening on encrypted caches formatted with BTRFS (maybe only in RAID1 setup, but not sure).

    Hosting the Docker files directory on /mnt/cache instead of using the loopdevice seems to fix this problem.

A possible idea for implementation is proposed at the bottom.

     

    Grateful for any help provided!

    ###

     

I originally posted a topic in the general support section (see link below), but I have since done a lot of research and think I have gathered enough evidence pointing to a bug. I was also able to build a (kind of) workaround for my situation. More details below.

     

So to see what was actually hammering on the cache, I started with the obvious: running a lot of find commands to trace files that were being written every few minutes, and using the File Activity plugin. Neither was able to trace down any writes that would explain 400 GB worth of writes a day for just a few containers that aren't even that active.
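For reference, that kind of tracing boils down to something like the following (the cache path is my setup's, and the time window is an example):

```shell
# Cumulative I/O per process/thread; let it run a few minutes,
# then look at the DISK WRITE column:
iotop -ao

# Files on the cache modified in the last 10 minutes:
find /mnt/cache -type f -mmin -10 2>/dev/null
```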

     

Digging further, I moved the docker.img to /mnt/cache/system/docker/docker.img, so directly on the BTRFS RAID1 mountpoint. I wanted to check whether the unRAID FS layer was causing the loop2 device to write this heavily. No luck either.

This gave me a situation I was able to reproduce in a virtual machine, so I started with a recent Debian install (I know, it's not Slackware, but I had to start somewhere ☺️). I created some vDisks, encrypted them with LUKS, bundled them in a BTRFS RAID1 setup, created the loop device on the BTRFS mountpoint (same as on /mnt/cache) and mounted it on /var/lib/docker. I made sure I had the NoCoW flag set on the IMG file, like unRAID does. Strangely this did not show any excessive writes; iotop showed really healthy values for the same workload (I migrated the docker content over to the VM).
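For anyone wanting to reproduce the Debian-side setup, a rough sketch (device names, sizes and the /mnt/pool mountpoint are examples, not the exact ones I used):

```shell
# Encrypt two virtual disks with LUKS and open them
cryptsetup luksFormat /dev/vdb
cryptsetup luksFormat /dev/vdc
cryptsetup open /dev/vdb crypt1
cryptsetup open /dev/vdc crypt2

# BTRFS RAID1 across both mapped devices, mounted like the cache
mkfs.btrfs -d raid1 -m raid1 /dev/mapper/crypt1 /dev/mapper/crypt2
mount /dev/mapper/crypt1 /mnt/pool

# Image file with the NoCoW flag set before it contains data
# (chattr +C only takes effect on empty files), then loop-mounted
touch /mnt/pool/docker.img
chattr +C /mnt/pool/docker.img
truncate -s 20G /mnt/pool/docker.img
mkfs.btrfs /mnt/pool/docker.img
mount -o loop /mnt/pool/docker.img /var/lib/docker
```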

     

After my Debian troubleshooting I went back to the unRAID server, wondering whether the loop device was being created weirdly, so I took the exact same steps to create a new image and pointed the settings from the GUI there. Still the same write issues.

     

    Finally I decided to put the whole image out of the equation and took the following steps:

    - Stopped docker from the WebGUI so unRAID would properly unmount the loop device.

    - Modified /etc/rc.d/rc.docker to not check whether /var/lib/docker was a mountpoint

    - Created a share on the cache for the docker files

    - Created a softlink from /mnt/cache/docker to /var/lib/docker

- Started docker using "/etc/rc.d/rc.docker start"

- Started my Bitwarden containers.
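In shell terms, the steps above amount to something like this (after patching /etc/rc.d/rc.docker to skip its mountpoint check; the share name is mine):

```shell
/etc/rc.d/rc.docker stop            # let unRAID unmount the loop device cleanly
mkdir -p /mnt/cache/docker          # share created on the cache
ln -s /mnt/cache/docker /var/lib/docker
/etc/rc.d/rc.docker start
```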

     

Looking into the stats with "iotop -ao", I did not see any excessive writing taking place anymore.

I had the containers running for about 3 hours and got maybe 1 GB of writes total (note that with the loop device this was 2.5 GB every 10 minutes!).

     

Now don't get me wrong, I understand why the loop device was implemented. Dockerd is started with options that make it run with the BTRFS storage driver, and since the image file is formatted with the BTRFS filesystem this works on every setup; it doesn't matter whether the cache runs XFS, EXT4 or BTRFS, it will just work. In my case I had to point the softlink to /mnt/cache, because pointing it to /mnt/user would not let me use the BTRFS driver (obviously the unRAID user filesystem isn't BTRFS). Also, the WebGUI has commands to scrub the filesystem inside the image; everything is based on the assumption that everyone is running docker on BTRFS (which of course they are, because of the image 😁).
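One quick way to confirm dockerd really picked up the BTRFS storage driver after a change like this:

```shell
# Prints the active storage driver, e.g. "btrfs" when
# /var/lib/docker sits on a BTRFS filesystem
docker info --format '{{.Driver}}'
```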

I must say that my approach also broke when I changed something in the shares: certain services get restarted, causing docker to be turned off for some reason. No big issue, since it wasn't meant to be a long-term solution, just a way to see whether the loop device was causing the issue, which I think my tests did point out.

     

Now I'm at the point where I definitely need some developer help. I'm currently keeping nearly all docker containers off all day, because 300-400 GB worth of writes a day is just a BIG waste of expensive flash storage, especially since I've shown that it isn't needed at all. It does defeat the purpose of my NAS and SSD cache though, since its main purpose was hosting docker containers while allowing the HDs to spin down.

     

Again, I'm hoping someone on the dev team acknowledges this problem and is willing to investigate. I got quite a few hits on the forums and Reddit, but no one had actually pointed out the root cause of the issue.

     

I'm missing the technical know-how to troubleshoot the loop device issues at a lower level, but I have been thinking of possible ways to implement a workaround, like adjusting the Docker settings page with an option to switch off the use of a vDisk: if all requirements are met (pointing to /mnt/cache and BTRFS formatted), start docker on a share on the /mnt/cache partition instead of using the vDisk.

This way you would still keep all the advantages of the docker.img file (works across filesystem types), and users who don't care about the writes could still use it, but you'd be massively helping out others who are concerned about them.

     

I'm not attaching diagnostics files since they would probably not show anything relevant.

Also, if this should have been in feature requests, I'm sorry, but I feel that, since the current solution is misbehaving in terms of writes, it can also be placed in the bug report section.

     

Thanks for this great product though, I have been using it with a lot of joy so far!

I'm just hoping we can solve this one so I can keep all my dockers running without the cache wearing out quickly.

     

    Cheers!

     



    User Feedback

    Recommended Comments



    13 minutes ago, Dephcon said:

    So that's a 2.5x difference.

    But negligible given absolute amount of data written.

     

    A loopback is always going to incur more overhead because there is the overhead of the file system within the loopback and then there is the overhead of the file system hosting the loopback.  In most cases the benefit of the loopback far outweighs the extra overhead.

    23 minutes ago, limetech said:

    But negligible given absolute amount of data written.

     

    A loopback is always going to incur more overhead because there is the overhead of the file system within the loopback and then there is the overhead of the file system hosting the loopback.  In most cases the benefit of the loopback far outweighs the extra overhead.

In this case, yes; however, I purposely removed some of my higher-IO loads from this test to limit the variability of writes so I could have shorter test periods. This test is purely container appdata; excluded are:

     

    • transcoding
    • download/extract
    • folder caching
    • array backup staging

     

In @johnnie.black's case a huge amount of SSD wear can be avoided, which is on the opposite end of the spectrum from my test case. I still might end up using BTRFS RAID for one or more pool devices; I just wanted to provide a reasonably solid number that other users could apply to their own loads, to decide for themselves whether X times fewer writes is worth switching to XFS.

     

    Either way it was fun to investigate!

    Edited by Dephcon

    I think the beta allows the loopback image to be formatted as XFS?

     

    Might be interesting to test an XFS docker image on a btrfs cache to see the difference.


    Just wanted to circle back to this now that my testing is over and I've finalized my caching config (for now).

     

Previously I was using a 4-SSD BTRFS RAID10 "cache" with 4K-aligned partitions.

     

Now I have a 2-SSD BTRFS RAID1 pool, 1MiB-aligned, for array cache and docker-xfs.img, plus an XFS-formatted pool device to use as scratch space. Currently this includes Plex transcoding and the duplicacy cache. I might move my Usenet download/extract over to this scratch pool as well, but I want to get extended performance data before changing anything further.

     

I'm pretty happy with the reduction in IO from space_cache=v2 and 1MiB partition alignment. All-XFS would have been "better" for disk longevity, but I really like the extra level of protection from BTRFS RAID.

     

last 48hrs: (iotop screenshot attached)

    Edited by Dephcon

I did my own testing as well; still waiting on final overnight numbers for the setup I think I have settled on, but overall it seems the lowest I could get writes on a BTRFS RAID5 cache pool was ~1.5GB/hour or so, give or take depending on what was running at the time (I have more dockers now than when I did my prior testing). It seems that total writes to the pool scale somewhat with how many drives are in the pool.

     

Does the 1MiB partition alignment also affect XFS partitions? My writes on the XFS cache seem lower even with more dockers.

     

When tested with fewer drives the writes to the pool were less; with more drives it would get upwards of 2GB/hour+ total to the pool (I had up to 9 drives in the pool during testing).

     

Versus using an XFS cache pool, where I am consistently getting ~200MB/hour writes or less, and the btrfs pool gets zero writes unless something is actively using it.

     

Overall I think I have settled on moving my docker/appdata and a few other shares over to the XFS drive (all things that could easily be recreated from the regular backups made to the array if it died), then using the BTRFS cache for normal caching duties.

    Edited by TexasUnraid
    On 8/4/2020 at 3:41 PM, Dephcon said:

    currently this includes plex transcoding and duplicacy cache. 

     

Shift your Plex transcoding to memory by putting it in /tmp! I did this not long ago: create a RAM disk in /tmp at boot with 4 GB of space and let Plex use it for transcoding. You can allocate more if you want. There is a post on the forum about it somewhere. Great for reducing wear and tear on the disk.
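A minimal version of that RAM disk (size and path are examples; on unRAID this sort of thing typically goes in /boot/config/go so it runs at boot):

```shell
# Create a tmpfs-backed directory for Plex transcoding
mkdir -p /tmp/plex-transcode
mount -t tmpfs -o size=4g tmpfs /tmp/plex-transcode
```

Then point the container's transcode directory at /tmp/plex-transcode.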

    1 minute ago, nickp85 said:

Shift your Plex transcoding to memory by putting it in /tmp! I did this not long ago: create a RAM disk in /tmp at boot with 4 GB of space and let Plex use it for transcoding. You can allocate more if you want. There is a post on the forum about it somewhere. Great for reducing wear and tear on the disk.

The only caveat is that if you use Plex to record TV, it uses the temp directory while recording. I discovered this myself when recordings were failing due to running out of space. It does this to produce a streamable version of the recording, in case you want to watch it while it's being recorded.

    2 hours ago, nickp85 said:

Shift your Plex transcoding to memory by putting it in /tmp! I did this not long ago: create a RAM disk in /tmp at boot with 4 GB of space and let Plex use it for transcoding. You can allocate more if you want. There is a post on the forum about it somewhere. Great for reducing wear and tear on the disk.

I used to do it in RAM when I had 32GB; when I upgraded I only had 16GB of DDR4 available, so it's a bit tight now.


Damn! My server seems to be affected too...
I had an unencrypted BTRFS RAID1 with two SanDisk Plus 480 GB drives.
Both died in quick succession (more or less 2 weeks apart) after 2 years of use!

So I bought two 1 TB Crucial MX500s.
As I didn't know about the problem, I again made an unencrypted BTRFS RAID1 (01 July 2020).
As I found it strange that they died in quick succession, I did some research and found all those threads about massive writes on BTRFS cache disks.
I made some tests and here are the results.

     

    ### Test 1:

     

running "iotop -ao" for 60 min: 2.54 GB [loop2] (see pic1, attached)

     

    Docker Container running:

The docker containers running during this test are the most important ones for me.
I stopped Pydio and mariadb even though they're also important; see the other tests for the reason...

      - ts-dnsserver
      - letsencrypt
      - BitwardenRS
      - Deconz
      - MQTT
      - MotionEye
      - Homeassistant
      - Duplicacy

     

    shfs writes:

- Look at pic1: are the shfs writes OK? I don't know...

     

    VMs running (all on Unassigned disk):
      - Linux Mint (my primary Client)
      - Win10
      - Debian with SOGo Mail Server

     

    /usr/sbin/smartctl -A /dev/sdg | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }' => TBW 10.9
    /usr/sbin/smartctl -A /dev/sdh | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }' => TBW 10.9
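The one-liner converts the SMART raw LBA counter (field 10 of the `smartctl -A` table, assuming 512-byte LBAs) into TiB written. The awk arithmetic can be checked in isolation against a fabricated sample line (the raw value below is made up to land near my reading):

```shell
echo '241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 23438675968' \
  | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }'
# -> TBW 10.9
```

Taking the difference between two readings an hour apart gives the actual writes hitting the flash during a test, independent of what iotop attributes to loop2.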



    ### Test 2:


running "iotop -ao" for 60 min: 3.29 GB [loop2] (see pic2, attached)

     

    Docker Container running (almost all of my dockers):
      - ts-dnsserver
      - letsencrypt
      - BitwardenRS
      - Deconz
      - MQTT
      - MotionEye
      - Homeassistant
      - Duplicacy
      ----------------
      - mariadb
      - Appdeamon
      - Xeoma
      - NodeRed-OfficialDocker
      - hacc
      - binhex-emby
      - embystat
      - pydio
      - picapport
      - portainer

     

    shfs writes:

- Look at pic2: there are massive shfs writes too!

     

    VMs running (all on Unassigned disk)
      - Linux Mint (my primary Client)
      - Win10
      - Debian with SOGo Mail Server

     

    /usr/sbin/smartctl -A /dev/sdg | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }' => TBW 11 
    /usr/sbin/smartctl -A /dev/sdh | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }' => TBW 11 

     

     

    ### Test 3:


running "iotop -ao" for 60 min: 3.04 GB [loop2] (see pic3, attached)

     

    Docker Container running (almost all my dockers except mariadb/pydio!):
      - ts-dnsserver
      - letsencrypt
      - BitwardenRS
      - Deconz
      - MQTT
      - MotionEye
      - Homeassistant
      - Duplicacy
      ----------------
      - Appdeamon
      - Xeoma
      - NodeRed-OfficialDocker
      - hacc
      - binhex-emby
      - embystat
      - picapport
      - portainer

     

    shfs writes:

      - Look at pic3, the shfs writes are clearly less without mariadb!
        (I also stopped pydio as it needs mariadb...)

     

    VMs running (all on Unassigned disk)
      - Linux Mint (my primary Client)
      - Win10
      - Debian with SOGo Mail Server

     

    /usr/sbin/smartctl -A /dev/sdg | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }' => TBW 11
    /usr/sbin/smartctl -A /dev/sdh | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }' => TBW 11


     

    ### Test 4:


running "iotop -ao" for 60 min: 6.23 MB [loop2] (see pic4, attached)

     

    Docker Container running:

      - none, but docker service is started

     

    shfs writes:

      - none

     

    VMs running (all on Unassigned disk)
      - Linux Mint (my primary Client)
      - Win10
      - Debian with SOGo Mail Server

     

/usr/sbin/smartctl -A /dev/sdg | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }'

PLEASE resolve this problem in the next stable release!

Next weekend I will remove the BTRFS RAID1 cache and go with a single XFS cache disk.
     

If I can do more analysis and research, please let me know. I'll do my best!

    Edited by vakilando

Perhaps I should mention that I had my VMs on the cache pool before, but the performance was terrible.

Since moving them to an unassigned disk, their performance is really fine!

Perhaps the poor performance was due to the massive writes on the cache pool...?


Oh... sorry... I did not read the whole thread.

Now I have!

I'll try the fix now and do this:

    mount -o remount -o space_cache=v2 /mnt/cache
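To verify the remount actually took effect, one way is to pull the mount options apart (findmnt here is an assumption about what's available; grepping /proc/mounts works too):

```shell
# List the btrfs mount options one per line and pick out space_cache
findmnt -no OPTIONS /mnt/cache | tr ',' '\n' | grep space_cache
# should show: space_cache=v2
```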

     


    The 6.9 beta fixes all the issues with BTRFS and just leaves the inherent BTRFS write amplification.

     

    In my case I decided to move appdata and docker to an XFS cache pool on 6.9 and leave everything else on the BTRFS pool.

     

This dropped writes down to ~200MB an hour vs 2GB, and should the XFS drive die, I can easily rebuild it from backups / by re-downloading the dockers.

    5 minutes ago, TexasUnraid said:

    The 6.9 beta fixes all the issues with BTRFS and just leaves the inherent BTRFS write amplification.

(Unless I've misread) the BTRFS issue persisting just because it's BTRFS-on-BTRFS is a shame, but for cache redundancy it's kind of a necessity.


I would not say btrfs is a shame; in fact I am really liking it overall. The only issue I have had is the write amplification, which is known and kind of a perfect storm with docker/appdata (a lot of small writes, most to an image file).

     

    During normal cache use I see negligible write amplification and have had no issues. Since switching docker to XFS everything is working great just using btrfs for cache / scratch drive.


    The XFS docker conversion is interesting.

Has anyone done any in-depth comparisons on speed, writes, etc.?


    I was referring to moving the btrfs docker image to an XFS formatted cache pool.

     

    In 6.9 beta you can have multiple cache pools.

    2 minutes ago, TexasUnraid said:

    I was referring to moving the btrfs docker image to an XFS formatted cache pool.

     

    In 6.9 beta you can have multiple cache pools.

    I know ;-)

But I'm asking if anyone has done some proper testing of it, compared to BTRFS.


Yes, I did, and so did others. I was sitting at ~2GB/hour with everything on the BTRFS cache pool (although the actual writes would be much higher with more devices in the pool).

     

Moving docker and appdata to the XFS cache, I am seeing ~200-250MB/hour on the XFS pool and basically zero to the BTRFS pool unless I do something.


Oh, someone posted some results a page or two back for that. I tried it, but the writes were still quite a bit higher than with the XFS option, and the writes were limited to a single SSD I don't care about instead of being spread out over all of them.


OK, after executing the recommended command:

    mount -o remount -o space_cache=v2 /mnt/cache

this is the result after 7 hours of

    iotop -ao

The running dockers were the same as in my "Test 2" (all my dockers, including mariadb and pydio).

     

See the picture (pic5, attached).

     

It's better than before (fewer writes for loop2 and shfs), but it should be even less, or what do you think?

    11 hours ago, TexasUnraid said:

    In my case I decided to move appdata and docker to an XFS cache pool on 6.9 and leave everything else on the BTRFS pool.

Only the docker image would need to be on the XFS cache; appdata isn't subjected to the same loop2 overhead. Which is great, because the docker image doesn't really need protection.

     

    5 hours ago, vakilando said:

    It's better than before (less writes for loop2 and shfs) but it should be even less or what do you think?

Are you using 6.9.0? Did you also align the partition to 1MiB? That requires wiping the pool, so I would assume quite few people would do it.

    Edited by testdasi

    This is my quick test.

    • Unraid 6.9.0-beta25
    • 2x Intel 750 1.2TB
    • BTRFS RAID-0 for data chunks, RAID-1 for metadata + system chunks
    • Both partitions aligned to 1MiB
    • 35 dockers running but mostly idle
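For anyone wanting to check their own alignment, a sketch (replace /dev/sdX with your device; a start sector of 2048 on 512-byte sectors equals 1MiB):

```shell
# Look at the Start column of the partition table
fdisk -l /dev/sdX

# Or have parted judge partition 1 directly
parted /dev/sdX align-check optimal 1
```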

     

    403.41 MB / 70 minutes or 345.78 MB/hr.

About 100MB/hr worse than @TexasUnraid's XFS image, but only about 1/3 of @vakilando's test.

     

    Maybe I'll do an overnight run or something to see if there's any diff.

     

(iotop screenshot attached)

    3 hours ago, testdasi said:

Only the docker image would need to be on the XFS cache; appdata isn't subjected to the same loop2 overhead. Which is great, because the docker image doesn't really need protection.

     

Are you using 6.9.0? Did you also align the partition to 1MiB? That requires wiping the pool, so I would assume quite few people would do it.

    If you go back a ways in this thread, you will find a few pages of me testing every possible scenario.

     

While the docker image is the main culprit for sure, appdata was not far behind. With just appdata on the BTRFS pool I was still seeing around 800MB/hour IIRC, vs ~200MB/hour combined with both on the XFS pool.

     

The issue is that small writes have a very large write amplification on btrfs, and appdata sees a lot of these small writes as well (logs, etc.).

     

This write amplification seems to go up in proportion to the number of drives in the pool as well (the small writes get spread over the drives), so total writes for a 5-drive pool were much higher than for a 2-drive pool. With a 9-device pool, at one point I was seeing 1GB/hour PER DRIVE.

     

I was able to reduce the writes a fair amount by increasing the dirty writeback interval to around 4 minutes, but that is not a practical solution.
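For reference, the tweak mentioned is the kernel's dirty-page flusher settings; something like this (24000 centisecs = 4 minutes; the defaults are much lower, the change is not persistent across reboots, and a crash can lose up to that much buffered data):

```shell
# Wake the flusher threads every 4 minutes instead of every 5 seconds
sysctl -w vm.dirty_writeback_centisecs=24000
# Let dirty pages age up to 4 minutes before they must be written out
sysctl -w vm.dirty_expire_centisecs=24000
```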


    Here's my test with 1 hour of iotop -ao

    • Standard 6.8.3 with no customization
    • 2x 500GB WD Blue SSD

    • BTRFS RAID-1 default

    • Cache, docker.img and libvirt.img with a dozen dockers and 2 VMs, mostly idle

     

    I'd like to know what's responsible for the shfs /mnt/user -disks 7 entries

(iotop screenshot attached)

     





