• [6.8.3] docker image huge amount of unnecessary writes on cache


    S1dney
    • Urgent

    Hey Guys,

     

    First of all, I know that you're all very busy getting version 6.8 out there, something I'm very much waiting on as well. I'm seeing great progress, so thanks so much for that! I don't expect this to be at the top of the priority list, but I'm hoping someone on the development team is willing to investigate (perhaps after the release).

     

    Hardware and software involved:

    2 x 1TB Samsung 860 EVO, set up with LUKS encryption in a BTRFS RAID1 pool.

     

    ###

    TLDR (but I'd suggest to read on anyway 😀)

    The image file mounted as a loop device is causing massive writes on the cache, potentially wearing out SSDs quite rapidly.

    This appears to be only happening on encrypted caches formatted with BTRFS (maybe only in RAID1 setup, but not sure).

    Hosting the Docker files directory on /mnt/cache instead of using the loopdevice seems to fix this problem.

    Possible idea for implementation proposed at the bottom.

     

    Grateful for any help provided!

    ###

     

    I have written a topic in the general support section (see link below), but I have done a lot of research lately and think I have gathered enough evidence pointing to a bug. I was also able to build a (kind of) workaround for my situation. More details below.

     

    So to see what was actually hammering on the cache I started doing all the obvious things, like using a lot of find commands to trace files that were written to every few minutes, and I also used the fileactivity plugin. Neither was able to trace down any writes that would explain 400 GB worth of writes a day for just a few containers that aren't even that active.
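    A minimal sketch of that kind of search, assuming the cache is mounted at /mnt/cache (the CACHE_DIR default of the current directory is mine, so the command is safe to try anywhere):

    ```shell
    # List up to 20 files modified in the last 10 minutes under a directory.
    # CACHE_DIR is an assumption: point it at /mnt/cache on a live server.
    CACHE_DIR="${CACHE_DIR:-.}"
    find "$CACHE_DIR" -type f -mmin -10 2>/dev/null | head -n 20
    ```

    As the post notes, a search like this only catches writes that land in visible files, which is exactly why it came up empty here.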

     

    Digging further, I moved the docker.img to /mnt/cache/system/docker/docker.img, so directly on the BTRFS RAID1 mountpoint. I wanted to check whether the unRAID FS layer was causing the loop2 device to write this heavily. No luck either.

    This gave me a situation I was able to reproduce on a virtual machine though, so I started with a recent Debian install (I know, it's not Slackware, but I had to start somewhere ☺️). I created some vDisks, encrypted them with LUKS, bundled them in a BTRFS RAID1 setup, created the loopdevice on the BTRFS mountpoint (same as /mnt/cache on unRAID) and mounted it on /var/lib/docker. I made sure I had the NoCoW attribute set on the IMG file, like unRAID does. Strangely this did not show any excessive writes; iotop shows really healthy values for the same workload (I migrated the docker content over to the VM).
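    For reference, the VM setup roughly followed these steps. The device names (/dev/vdb, /dev/vdc), the 20G image size, and the paths are hypothetical stand-ins; the run() wrapper only prints each command so the sketch can be reviewed before anything is executed as root:

    ```shell
    # Each step is echoed rather than executed: drop the wrapper to run for real.
    run() { echo "+ $*"; }

    run cryptsetup luksFormat /dev/vdb                  # encrypt both vDisks
    run cryptsetup luksFormat /dev/vdc
    run cryptsetup open /dev/vdb crypt1
    run cryptsetup open /dev/vdc crypt2
    run mkfs.btrfs -d raid1 -m raid1 /dev/mapper/crypt1 /dev/mapper/crypt2
    run mount /dev/mapper/crypt1 /mnt/cache             # BTRFS RAID1 mountpoint
    run touch /mnt/cache/docker.img
    run chattr +C /mnt/cache/docker.img                 # NoCoW while the file is still empty
    run truncate -s 20G /mnt/cache/docker.img
    run mkfs.btrfs /mnt/cache/docker.img
    run mount -o loop /mnt/cache/docker.img /var/lib/docker
    ```

    Note the chattr +C is applied before the file gets any data blocks, since NoCoW only takes effect on empty files.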

     

    After my Debian troubleshooting I went back to the unRAID server, wondering whether the loopdevice was created weirdly, so I took the exact same steps to create a new image and pointed the settings from the GUI there. Still the same write issues.

     

    Finally I decided to put the whole image out of the equation and took the following steps:

    - Stopped docker from the WebGUI so unRAID would properly unmount the loop device.

    - Modified /etc/rc.d/rc.docker to not check whether /var/lib/docker was a mountpoint

    - Created a share on the cache for the docker files

    - Created a softlink from /mnt/cache/docker to /var/lib/docker

    - Started docker using "/etc/rc.d/rc.docker start"

    - Started my Bitwarden containers.
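    After the rc.docker modification described above, the remaining steps might look like this in a shell. Paths follow the post; the sketch() wrapper prints each command instead of executing it, since these need root on a live unRAID box:

    ```shell
    # Printed, not executed: for review only.
    sketch() { echo "# $*"; }

    sketch /etc/rc.d/rc.docker stop                   # or stop docker from the WebGUI
    sketch mkdir -p /mnt/cache/docker                 # share on the cache for the docker files
    sketch ln -s /mnt/cache/docker /var/lib/docker    # softlink in place of the loop mount
    sketch /etc/rc.d/rc.docker start
    ```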

     

    Looking into the stats with "iotop -ao" I did not see any excessive writing taking place anymore.

    I had the containers running for about 3 hours and got maybe 1GB of writes total (note that on the loopdevice this gave me 2.5GB every 10 minutes!).

     

    Now don't get me wrong, I understand why the loopdevice was implemented. Dockerd is started with options to make it run with the BTRFS driver, and since the image file is formatted with the BTRFS filesystem this works on every setup; it doesn't even matter whether it runs on XFS, EXT4 or BTRFS, it will just work. In my case I had to point the softlink to /mnt/cache, because pointing it to /mnt/user would not allow me to start using the BTRFS driver (obviously the unRAID user filesystem isn't BTRFS). Also, the WebGUI has commands to scrub the filesystem inside the image; everything is based on the assumption that everyone is running docker on BTRFS (which of course they are, because of the image 😁)

    I must say that my approach also broke when I changed something in the shares: certain services get restarted, causing docker to be turned off for some reason. No big issue, since it wasn't meant as a long-term solution, just a way to see whether the loopdevice was causing the issue, which I think my tests did point out.

     

    Now I'm at the point where I would definitely need some developer help. I'm currently keeping nearly all docker containers off all day, because 300/400GB worth of writes a day is just a BIG waste of expensive flash storage, especially since I've pointed out that it's not needed at all. It does defeat the purpose of my NAS and SSD cache though, since its main purpose was hosting docker containers while allowing the HDs to spin down.

     

    Again, I'm hoping someone on the dev team acknowledges this problem and is willing to investigate. I did get quite a few hits on the forums and reddit, without anyone actually pointing out the root cause of the issue.

     

    I'm missing the technical know-how to troubleshoot the loopdevice issues on a lower level, but I have been thinking about possible ways to implement a workaround, like adjusting the Docker Settings page to allow switching off the use of a vDisk and, if all requirements are met (pointing to /mnt/cache and BTRFS formatted), starting docker on a share on the /mnt/cache partition instead of using the vDisk.

    In this way you would still keep all the advantages of the docker.img file (cross-filesystem compatibility), and users who don't care about writes could still use it, but you'd be massively helping out others who are concerned about these writes.

     

    I'm not attaching diagnostic files since they would probably not show anything relevant.

    Also, if this should have been in feature requests, I'm sorry. But I feel that, since the current solution is misbehaving in terms of writes, this could also be placed in the bug report section.

     

    Thanks though for this great product, I have been using it with a lot of joy so far!

    I'm just hoping we can solve this one so I can keep all my dockers running without the cache wearing out quickly.

     

    Cheers!

     



    User Feedback

    Recommended Comments



    11 minutes ago, TexasUnraid said:

    Good to know, interesting use case as well. How would the script know that an attack is taking place?

     

    So no gotchas with symlinks on unraid? works just like any other linux system (aka, I can look up generic symlink tutorials online)?

    Very simple really. I took inspiration from the protect-against-cryptovirus plugin I saw in the app store. I put a few traps in various SMB locations and have a script run periodically to check whether those traps have changed. It is just part of my overall strategy.

     

    No gotcha. Have been using symlinks for years.

     

    I think you don't need to wait for 6.9 RC. This is beta 25, not beta 1. It's pretty rock solid for me.

    5 minutes ago, testdasi said:

    I think you don't need to wait for 6.9 RC. This is beta 25, not beta 1. It's pretty rock solid for me.

    Yeah, while a few bugs are still normal, there should be no showstoppers, and nothing that would put the data at risk. I updated one of my main servers today, and based on these past few hours, the NVMe device that was writing about 3TB per day on v6.8 (without any optimizations) is now on course to write less than 200GB per day.

     

    11 minutes ago, boomam said:

    Do we know if there will be a definitive guide created for this issue once 6.9 drops, to help people convert over/transfer data/etc?

    The release notes mention how to re-partition the device(s), backup/restore would depend on your current backups, but you can always move the cache data to the array and then restore, like mentioned in the FAQ for the cache replacement procedure.

    2 minutes ago, johnnie.black said:

    The release notes mention how to re-partition the device(s), backup/restore would depend on your current backups, but you can always move the cache data to the array and then restore, like mentioned in the FAQ for the cache replacement procedure.

    It needs to be clearer than that - let's not encourage the usual 'linux/FOSS' views of "it's easy, here's 10 articles and 3 different blogs that when mushed together easily fix the issue, oh and most of it is out-of-date as a recent kernel change makes 50% of the commands invalid now" ;-).

     

    Joking aside, whilst for us following this thread will be fine, for anyone coming into this new, there needs to be a clear blog/guide somewhere saying what the issue is, listing exact resources and steps to fix it, etc. 

     

    Speaking for myself, I know that if I hadn't found a spare bit of time to dive into it a few months back, posted on here/Reddit and got linked to this thread, I wouldn't have a clue about this.

    So by being more proactive about it, and having it as one article/guide, pinned in the forum, perhaps even included in the monthly news, we will ensure that no member of the current or future community can miss this issue or its fix.


    So there are no outstanding changes from beta 25 to be implemented in the next release? I have not been following the beta thread very closely.

    17 hours ago, boomam said:

    It needs to be clearer than that - let's not encourage the usual 'linux/FOSS' views of "it's easy, here's 10 articles and 3 different blogs that when mushed together easily fix the issue, oh and most of it is out-of-date as a recent kernel change makes 50% of the commands invalid now" ;-).

     

    Joking aside, whilst for us following this thread will be fine, for anyone coming into this new, there needs to be a clear blog/guide somewhere saying what the issue is, listing exact resources and steps to fix it, etc. 

     

    Speaking for myself, I know that if I hadn't found a spare bit of time to dive into it a few months back, posted on here/Reddit and got linked to this thread, I wouldn't have a clue about this.

    So by being more proactive about it, and having it as one article/guide, pinned in the forum, perhaps even included in the monthly news, we will ensure that no member of the current or future community can miss this issue or its fix.

    We can even hope for a nice Spaceinvaderone video on this? Alongside the guide it would be great.


    Yes, agreed on a proper guide being really handy for this, even though I can figure it out the hard way.

     

    Well, I tried to use the symlink for a UD device but ran into an issue. I put the symlinks on the cache drive pointing towards the UD drive. This works fine for everything except that unraid does not see these as valid shares and thus does not let me edit the settings for them.

     

    So I have no idea how it will treat these shares and if it will break docker / appdata since they will not be set to cache only?

    9 minutes ago, TexasUnraid said:

    Yes, agreed on a proper guide being really handy for this, even though I can figure it out the hard way.

     

    Well, I tried to use the symlink for a UD device but ran into an issue. I put the symlinks on the cache drive pointing towards the UD drive. This works fine for everything except that unraid does not see these as valid shares and thus does not let me edit the settings for them.

     

    So I have no idea how it will treat these shares and if it will break docker / appdata since they will not be set to cache only?

    With due respect, please don't talk about hacks in this topic.


    Sure thing, although with all due respect, until a few weeks ago when this was officially acknowledged, this entire topic was nothing but "hacks" to work around the issue. 😉

     

    So if hacks are not allowed to be discussed, what is an official option to fix the issue? I really would love one, I really hate going outside officially supported channels.

     

    Thus far the only official option I have heard is to wait for 6.9, which is ? months away from an RC and 6+ months away from official release?

     

    6.9 does sound like the fix, I am just uncomfortable using betas on an active server; I will always question whether an issue is due to the beta or something else. An RC is not ideal but I would consider it.

    38 minutes ago, TexasUnraid said:

    Sure thing, although with all due respect, until a few weeks ago when this was officially acknowledged, this entire topic was nothing but "hacks" to work around the issue. 😉

     

    So if hacks are not allowed to be discussed, what is an official option to fix the issue? I really would love one, I really hate going outside officially supported channels.

     

    Thus far the only official option I have heard is to wait for 6.9, which is ? months away from an RC and 6+ months away from official release?

     

    6.9 does sound like the fix, I am just uncomfortable using betas on an active server; I will always question whether an issue is due to the beta or something else. An RC is not ideal but I would consider it.

    Applying space_cache=v2 to your btrfs mounts makes a significant difference in writes, I got a reduction of about 65%, and you can do it live on whatever version you're currently on.
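    A sketch of how one might go about this; the mount point follows the thread, the device name is an example, and note that on some kernels switching the space cache version needs a clean unmount and mount rather than a plain remount, so the state-changing commands are left commented out:

    ```shell
    # Show the current mount options for the cache, to see whether
    # space_cache=v2 is already listed:
    grep ' /mnt/cache ' /proc/mounts || echo "no /mnt/cache mount found"

    # Then, as root (device name /dev/mapper/crypt1 is an assumption):
    # umount /mnt/cache
    # mount -o space_cache=v2 /dev/mapper/crypt1 /mnt/cache
    ```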

     

    on another note - I did end up installing the beta, bailed on my raid10 btrfs cache and now have three pool devices:

    cache pool(xfs):

    • regular share caching
    • downloads
    • docker-xfs.img
    • plex transcoding

     

    xfs pool:

    • caching for only /mnt/user/appdata

     

    btrfs pool(single disk):

    • nothing currently

     

    I'm going to give it another 5 days with appdata on the XFS pool, then move it to the BTRFS pool for a week, then add a second disk to BTRFS pool and run that for a week.  With transcoding removed from the equation it should only be IO from normal container operations, so it should be a pretty fair comparison.

     


     


    Yeah, I am thinking I will just use the space_cache=v2 and move things over to the cache for now and see what kind of writes I get.

     

    If they are tolerable then I will wait for 6.9 RC. If they are still too high I will consider the beta. The multiple cache pools would be really handy for me as well.

     

    Keep us posted on how things go and if you notice any bugs with the beta 🙂

    18 hours ago, TexasUnraid said:

    Yeah, I am thinking I will just use the space_cache=v2 and move things over to the cache for now and see what kind of writes I get.

     

    If they are tolerable then I will wait for 6.9 RC. If they are still too high I will consider the beta. The multiple cache pools would be really handy for me as well.

     

    Keep us posted on how things go and if you notice any bugs with the beta 🙂

    Just to give you some additional info based on my friend's use case who had pretty much identical cache load to me on 6.8.3:

     

    2MB/s btrfs cache, btrfs docker.img

    650kB/s btrfs cache (w/space_cache=v2), btrfs docker.img

    250kB/s xfs cache, btrfs docker.img

     

    So if you're not using or don't need BTRFS RAID, re-formatting your cache disk to XFS makes a huge difference. That's a change from 60TB/yr to 7.5TB/yr.
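    The yearly figures follow from straightforward arithmetic (1 year = 31,536,000 seconds, decimal TB; the exact results land slightly above the rounded numbers quoted):

    ```shell
    # Convert a sustained write rate in bytes/s to decimal TB/year.
    tb_per_year() { awk -v rate="$1" 'BEGIN { printf "%.1f\n", rate * 31536000 / 1e12 }'; }

    tb_per_year 2000000   # 2 MB/s   -> prints 63.1
    tb_per_year 250000    # 250 kB/s -> prints 7.9
    ```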


    Upgrading to 6.9.0-beta25, and wiping and rebuilding cache seems to have fixed the excessive drive writes.  I updated at 1PM yesterday.

     

    Thanks @limetech

     

    Sun Jul 19 00:00:01 CDT 2020	52,349,318 [26.8 TB]
    Sun Jul 19 01:00:01 CDT 2020	52,423,388 [26.8 TB]
    Sun Jul 19 02:00:01 CDT 2020	52,489,648 [26.8 TB]
    Sun Jul 19 03:00:01 CDT 2020	52,555,542 [26.9 TB]
    Sun Jul 19 04:00:01 CDT 2020	52,620,891 [26.9 TB]
    Sun Jul 19 05:00:02 CDT 2020	52,704,944 [26.9 TB]
    Sun Jul 19 06:00:02 CDT 2020	52,781,371 [27.0 TB]
    Sun Jul 19 07:00:01 CDT 2020	52,857,676 [27.0 TB]
    Sun Jul 19 08:00:01 CDT 2020	52,969,998 [27.1 TB]
    Sun Jul 19 09:00:01 CDT 2020	53,060,428 [27.1 TB]
    Sun Jul 19 10:00:02 CDT 2020	53,143,267 [27.2 TB]
    Sun Jul 19 11:00:01 CDT 2020	53,226,597 [27.2 TB]
    Sun Jul 19 12:00:01 CDT 2020	53,302,735 [27.2 TB]
    Sun Jul 19 13:00:02 CDT 2020	53,370,136 [27.3 TB]
    Sun Jul 19 14:00:01 CDT 2020	53,497,045 [27.3 TB]
    Sun Jul 19 15:00:01 CDT 2020	53,570,280 [27.4 TB]
    Sun Jul 19 16:00:02 CDT 2020	53,660,287 [27.4 TB]
    Sun Jul 19 17:00:01 CDT 2020	53,757,767 [27.5 TB]
    Sun Jul 19 18:00:01 CDT 2020	53,843,113 [27.5 TB]
    Sun Jul 19 19:00:01 CDT 2020	54,494,403 [27.9 TB]
    Sun Jul 19 20:00:01 CDT 2020	54,591,716 [27.9 TB]
    Sun Jul 19 21:00:01 CDT 2020	54,684,939 [27.9 TB]
    Sun Jul 19 22:00:01 CDT 2020	54,769,497 [28.0 TB]
    Sun Jul 19 23:00:01 CDT 2020	54,881,700 [28.0 TB]
    Mon Jul 20 00:00:01 CDT 2020	54,962,156 [28.1 TB]
    Mon Jul 20 01:00:01 CDT 2020	55,012,101 [28.1 TB]
    Mon Jul 20 02:00:01 CDT 2020	55,114,507 [28.2 TB]
    Mon Jul 20 03:00:01 CDT 2020	55,199,643 [28.2 TB]
    Mon Jul 20 04:00:01 CDT 2020	55,285,523 [28.3 TB]
    Mon Jul 20 05:00:01 CDT 2020	55,390,072 [28.3 TB]
    Mon Jul 20 06:00:01 CDT 2020	55,492,177 [28.4 TB]
    Mon Jul 20 07:00:01 CDT 2020	55,562,868 [28.4 TB]
    Mon Jul 20 08:00:01 CDT 2020	55,641,502 [28.4 TB]
    Mon Jul 20 09:00:01 CDT 2020	55,709,571 [28.5 TB]
    Mon Jul 20 10:00:01 CDT 2020	55,778,340 [28.5 TB]
    Mon Jul 20 11:00:01 CDT 2020	55,855,175 [28.5 TB]
    Mon Jul 20 12:00:01 CDT 2020	55,937,448 [28.6 TB]
    Mon Jul 20 13:00:01 CDT 2020	56,014,597 [28.6 TB]
    Mon Jul 20 14:00:01 CDT 2020	56,092,328 [28.7 TB]
    Mon Jul 20 15:00:01 CDT 2020	56,156,565 [28.7 TB]
    Mon Jul 20 17:00:01 CDT 2020	56,273,142 [28.8 TB]
    Mon Jul 20 18:00:01 CDT 2020	56,344,795 [28.8 TB]
    Mon Jul 20 19:00:01 CDT 2020	56,364,160 [28.8 TB]
    Mon Jul 20 20:00:01 CDT 2020	56,407,275 [28.8 TB]
    Mon Jul 20 21:00:01 CDT 2020	56,447,405 [28.9 TB]
    Mon Jul 20 22:00:01 CDT 2020	56,471,394 [28.9 TB]
    Mon Jul 20 23:00:02 CDT 2020	56,544,547 [28.9 TB]
    Tue Jul 21 00:00:01 CDT 2020	56,558,841 [28.9 TB]
    Tue Jul 21 01:00:01 CDT 2020	56,572,818 [28.9 TB]
    Tue Jul 21 02:00:01 CDT 2020	56,588,893 [28.9 TB]
    Tue Jul 21 03:00:01 CDT 2020	56,619,137 [28.9 TB]
    Tue Jul 21 04:00:01 CDT 2020	56,649,114 [29.0 TB]
    Tue Jul 21 05:00:01 CDT 2020	56,694,088 [29.0 TB]
    Tue Jul 21 06:00:01 CDT 2020	56,734,883 [29.0 TB]
    Tue Jul 21 07:00:01 CDT 2020	56,740,772 [29.0 TB]
    Tue Jul 21 08:00:01 CDT 2020	56,764,329 [29.0 TB]
    Tue Jul 21 09:00:01 CDT 2020	56,791,261 [29.0 TB]
    Tue Jul 21 10:00:01 CDT 2020	57,390,492 [29.3 TB]
    Tue Jul 21 11:00:02 CDT 2020	57,481,471 [29.4 TB]
    Tue Jul 21 12:00:01 CDT 2020	57,522,137 [29.4 TB]
    	
    Tue Jul 21 14:00:01 CDT 2020	58,216,955 [29.8 TB]
    Tue Jul 21 15:00:01 CDT 2020	58,222,173 [29.8 TB]
    Tue Jul 21 16:00:01 CDT 2020	58,235,354 [29.8 TB]
    Tue Jul 21 17:00:01 CDT 2020	58,270,523 [29.8 TB]
    Tue Jul 21 18:00:01 CDT 2020	58,300,798 [29.8 TB]
    Tue Jul 21 19:00:01 CDT 2020	58,346,858 [29.8 TB]
    Tue Jul 21 20:00:01 CDT 2020	58,382,861 [29.8 TB]
    Tue Jul 21 21:00:01 CDT 2020	58,403,922 [29.9 TB]
    Tue Jul 21 22:00:01 CDT 2020	58,420,439 [29.9 TB]
    Tue Jul 21 23:00:01 CDT 2020	58,493,227 [29.9 TB]
    Wed Jul 22 00:00:02 CDT 2020	58,494,926 [29.9 TB]
    Wed Jul 22 01:00:01 CDT 2020	58,529,097 [29.9 TB]
    Wed Jul 22 02:00:01 CDT 2020	58,556,746 [29.9 TB]
    Wed Jul 22 03:00:01 CDT 2020	58,574,415 [29.9 TB]
    Wed Jul 22 04:00:01 CDT 2020	58,605,297 [30.0 TB]
    Wed Jul 22 05:00:01 CDT 2020	58,632,079 [30.0 TB]
    Wed Jul 22 06:00:01 CDT 2020	58,655,069 [30.0 TB]
    Wed Jul 22 07:00:01 CDT 2020	58,672,137 [30.0 TB]
    Wed Jul 22 08:00:01 CDT 2020	58,689,196 [30.0 TB]
    Wed Jul 22 09:00:01 CDT 2020	58,712,601 [30.0 TB]
    Wed Jul 22 10:00:01 CDT 2020	58,731,743 [30.0 TB]
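    A log like this can be reduced to a total with a short awk helper, assuming (per the NVMe spec) that each "Data Units Written" unit is 512,000 bytes and that the counter is the first comma-grouped number on each line:

    ```shell
    # Print total TB written between the first and last entries of a log file.
    total_written() {
        awk '{
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9]+(,[0-9]+)+$/) {   # the comma-grouped counter
                    gsub(/,/, "", $i); last = $i
                    if (first == "") first = $i
                    break
                }
        } END { printf "%.2f TB\n", (last - first) * 512000 / 1e12 }' "$1"
    }
    ```

    For example, `total_written cache1.txt` over the first 24 hours above reports roughly 1.3 TB written.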

     

    1 hour ago, Dephcon said:

    Just to give you some additional info based on my friend's use case who had pretty much identical cache load to me on 6.8.3:

     

    2MB/s btrfs cache, btrfs docker.img

    650kB/s btrfs cache (w/space_cache=v2), btrfs docker.img

    250kB/s xfs cache, btrfs docker.img

     

    So if you're not using or don't need BTRFS RAID, re-formatting your cache disk to XFS makes a huge difference. That's a change from 60TB/yr to 7.5TB/yr.

    I have 5x 128GB laptop SSDs I was given, in a raid5, so I kinda need btrfs lol. This is why I had the extra SSD in the array formatted as XFS.

     

    I put docker on the cache this morning, waiting a few hours to see what kind of writes I end up with. Really considering just updating to the beta and creating a 2nd cache to be done with this.

    27 minutes ago, Wavey said:

    @StevenD What is that output you are using to monitor the amount of writes?

    There is probably a better way, but I just have a script run at the top of every hour:

    # log the NVMe "Data Units Written" counter for each cache device hourly
    date >> /mnt/cache/cache1.txt
    smartctl -a -d nvme /dev/nvme0n1 | grep "Units Written" >> /mnt/cache/cache1.txt
    date >> /mnt/cache/cache2.txt
    smartctl -a -d nvme /dev/nvme1n1 | grep "Units Written" >> /mnt/cache/cache2.txt
    

     


    Upgraded one of my main servers to b25 and re-formatted the cache device with the new alignment, and I consider this issue fixed for me. I have a single btrfs cache device with 3 Windows VMs always running, as well as the docker image and appdata:

     

    v6.8 was writing about 3TB a day

    v6.8 with space cache v2 brought it down to a more reasonable 700GB a day

    v6.9-beta25 with the new alignment brought it down even further to 191.87GB in the last 24 hours

     

    While 192GB a day is still considerable, and I know that if I went with xfs it would be less, as the previously linked study found I believe we must accept that btrfs will always have higher write amplification due to being a COW filesystem. And while I don't need a pool for this, I rely on btrfs snapshots and send/receive for my backups, so I can live with these daily writes; I just couldn't with 3TB a day, that was just crazy.

    5 hours ago, johnnie.black said:

    v6.8 was writing about 3TB a day

    v6.8 with space cache v2 brought it down to a more reasonable 700GB a day

    v6.9-beta25 with the new alignment brought it down even further to 191.87GB in the last 24 hours

    Is the alignment issue something regarding NVMe? I recall something about that, but I only have SATA so I've skimmed over most of it.

     

    *Edit* Before you answer, I noticed my OG cache pool is 4K aligned and my new pool devices are 1M aligned, so I guess it's for all SSDs?

     

    *edit2* That's a 93% decrease in writes! I'm still testing with XFS, but I'd much rather go back to BTRFS RAID10 or a pair of BTRFS RAID1 pools for protection, assuming it's not a massive difference from XFS.

    23 minutes ago, Dephcon said:

    Is the alignment issue something regarding NVMe?

    All SSDs, my cache is NVMe but earlier I tested on a regular SSD and the difference was similar, though it can vary with brand/model.
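    For anyone wanting to check their own devices, a sketch of the alignment test: the new layout starts partitions at 1 MiB, i.e. sector 2048 with 512-byte sectors. The sysfs path below is an example (and the 2048 fallback is only so the sketch runs anywhere); substitute your cache device and partition:

    ```shell
    # True if the given start sector sits on a 1 MiB boundary (512 B sectors).
    aligned_1mib() { [ $(( $1 % 2048 )) -eq 0 ]; }

    # Read the start sector of a partition from sysfs (example device name).
    start=$(cat /sys/block/sda/sda1/start 2>/dev/null || echo 2048)
    if aligned_1mib "$start"; then
        echo "starts at sector $start: 1 MiB aligned"
    else
        echo "starts at sector $start: not 1 MiB aligned"
    fi
    ```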

    25 minutes ago, Dephcon said:

    I'm still testing with XFS

    Based on some quick earlier tests, xfs would still write much less, I would estimate at least 5 times less in my case. Still, I can live with 190GB a day instead of the 30/40GB xfs would give, so I can have checksums and snapshots.


    If I install 6.9 beta 25 and change my cache to the 1 MiB partition alignment, is it still possible to roll back to 6.8.3 without changing the new alignment?

    12 minutes ago, Gragorg said:

    I install 6.9 beta 25 and change my cache to the 1 MiB Partition Alignment is it still possible to roll back to 6.8.3 without changing the new alignment?

    Unfortunately no, you'd need to re-format.

    17 minutes ago, johnnie.black said:

    Based on some quick earlier tests xfs would still write much less, I would estimate at least 5 times less in my case, still I can live with 190GB instead of 30/40GB a day so I can have checksums and snapshots.

    damn that's still pretty significant. 

     

    I'm really torn on this whole issue. I'm super butt-hurt over how much wear I've been putting on cache SSDs over the years and want to limit it as much as possible, but I'd also prefer to never have to restore my pool devices from backup, reconfigure containers, etc.

    1 minute ago, Dephcon said:

    damn that's still pretty significant. 

    It is, but like the previously linked study found, some write amplification is unavoidable:

     


     

     

     

    As long as it's not ridiculous like before I'm fine with it, but anyone that doesn't need a pool or the other btrfs features might as well stick with xfs.


    Just switched appdata from xfs to a single-disk btrfs, and it's about 2x the writes.


     

    Ignore the "avg", it's btrfs-heavy as it starts with all my containers starting back up. If I exclude the container boot-up, the average until now is ~121kB/s, and my 2hr average on XFS before the cut-over was 49kB/s. So that's a ~2.5x difference.





