[6.8.3] docker image huge amount of unnecessary writes on cache

Lucanus · June 10, 2020

Can confirm that the Binhex Plexpass Docker drastically reduces writes in contrast to the official Docker.

Am down from a bit more than 16GB (official Plex + Pihole) an hour to around 2,5GB an hour (Binhex Plexpass + Pihole).

Am now thinking about running the Pihole on my actual RaspberryPi again^^

Of course, how much the Pihole Docker writes is dependent on the amount of DNS queries at that moment, so take these numbers with a grain of salt. They are however obtained with

iotop -ao

over one hour of time, so it should not be too much of a factor.

Edited June 10, 2020 by Lucanus
Added information

S1dney · June 10, 2020

Hey All,

It is very much appreciated to update your findings, but like many have said here, this bug simply amplifies writes and it affects BTRFS not XFS, so reformatting to XFS or putting the loopdevice out of the equation fixes this bug, but in case of XFS you'll lose redundancy and modifying shell scripts is obviously unsupported.

So please post containers that write like crazy in the accompanying threads for that specific container, also note that these writes are most likely amplified by a factor x, so the real number of writes might not be that bad.

Cheers.

Edited June 10, 2020 by S1dney

bastl · June 10, 2020

And again the hint for all readers. Try the following: dissable all Dockers and keep docker itself active and the writes will still happen.

It's not an issue of one specific docker!

Each docker more or less produces writes to the disk, thats normal. The issue is that docker itself in combination with btrfs produces constant writes for no reasons.

albertogomcas · June 10, 2020

On 6/9/2020 at 12:01 PM, johnnie.black said:

No, you have 89% left.

Which is not great either, at that pace that drive is going to bite the dust in less than a year...

Moz80 · June 11, 2020

8 hours ago, albertogomcas said:

Which is not great either, at that pace that drive is going to bite the dust in less than a year...

Yeah, not great, but considerable relief that it wasn't worse news!

Do we know if this problem is LIKELY to have a solution, to continue using BTRFS, or is reformatting the cache drive to XFS really the best option at this point?

goodGame · June 11, 2020

1 hour ago, Moz80 said:

Yeah, not great, but considerable relief that it wasn't worse news!

Do we know if this problem is LIKELY to have a solution, to continue using BTRFS, or is reformatting the cache drive to XFS really the best option at this point?

We do not, I wish that Lime could say if it's fixed for 6.9 or not. At this point the biggest issue for me isn't if it will get fixed or not. It's that I have no clue if it'll be fixed next month in a hot patch or in 6.9 half a year away, or not at all by diverging from btrfs or whatever. That way I could give myself a remote chance of planning my approach.

But it is what it is so I'll go the xfs route, worst case I could just switch back later down the road I guess.

hotio · June 11, 2020

crazy thought, just throwing it out there, what if the combo BTRFS (COW) + sqlite (most our databases) is causing the write amplification...for appdata COW is iirc not disabled....

And truth be told, I'm still seeing a constant write pattern on a non-unraid system, with docker data on a btrfs ssd

JorgeB · June 11, 2020

Last unofficial information I have is that this issue is hopefully fixed and the fix will be in v6.9-rc1 which is expected soon™, no idea if a patch/update for v6.8 will be available, but very much doubt that as LT doesn't usually patch older releases after a newer one is in active development, maybe if it was a critical security issue.

Also, and completely by accident I might have found a "fix" for current users, at least it's working for be and it's repeatable, this is iotop after 5 minutes with just the plex docker installed (which appears to be one of the most active):

dawb.PNG.e632674ac1c5ea008e1260ce4a2d445d.PNG

The "fix" is using the mover to move the docker image to the array (docker service must be stopped first and share set to cache="yes") then back to the cache device (set share to cache="prefer"), start the docker service back up and this is the same docker image running for 5 minutes after the move:

image.png.af89665b0290d523d2a4e1cfd68b56d6.png

So a 10x decrease in write activity, please try and let me known if it helps.

hotio · June 11, 2020

probably did that 10x times, didn't work for me at all at the time

JorgeB · June 11, 2020

2 hours ago, hotio said:

probably did that 10x times, didn't work for me at all at the time

Maybe it doesn't work with all dockers, like mentioned it's repeatable and works consistently for me, but I only had the single plex docker installed.

-Daedalus · June 11, 2020

5 hours ago, johnnie.black said:

Last unofficial information I have is that this issue is hopefully fixed and the fix will be in v6.9-rc1 which is expected soon™, no idea if a patch/update for v6.8 will be available, but very much doubt that as LT doesn't usually patch older releases after a newer one is in active development, maybe if it was a critical security issue.

Thanks for the info, and for that work-around. I'm already back to an XFS cache, and spent a couple of days setting up backups and the like, so not really bothered moving back to BTRFS at this point, but it's wonderful if this works for more people.

However, we shouldn't have to hear this from you. I'm sure Limetech are working on this, I'm sure there's some sort of fix coming at some point, but radio silence for something this severe really shouldn't be the norm, especially if Limetech is shooting for a more official, polished vibe, as a company.

Event something simple, like:

Quote

Hi everyone, we realise this is an issue for a lot of you, and we're very sorry it's happened. We're not entirely sure where the bug originated at this point, but rest assured we are working on a fix for the next release. This should enter RC phase within the next few weeks, but obviously this is subject to change. We won't be porting this fix to 6.8.x, as too many things have changed already. We advise anyone affected to this to switch to XFS and take appropriate measures regarding backups, or at the very least to be aware of the increased wear of SSDs due to extra writes and plan accordingly.

This actually tells us very little, other than not to expect a patch for 6.8, and that the release is only "soon", but at least it's something reassuring. I'm usually not the guy to advocate being seen to be doing something, rather than actually just doing the thing, but in this case I think a little more communication would have been warranted.

S1dney · June 11, 2020

3 hours ago, johnnie.black said:

Maybe it doesn't work with all dockers, like mentioned it's repeatable and works consistently for me, but I only had the single plex docker installed.

Thanks for your work @johnnie.black, it would feels strange though that moving the image would work for one docker container, but doesn't for the other?

Any ideas what changed after the mover kicked in?

50 minutes ago, -Daedalus said:

However, we shouldn't have to hear this from you. I'm sure Limetech are working on this, I'm sure there's some sort of fix coming at some point, but radio silence for something this severe really shouldn't be the norm, especially if Limetech is shooting for a more official, polished vibe, as a company.

I think Johnnie is very closely affiliated to Limetech so I'm sure this is coming from first hand. Do remember that the fact that they are not actively responding does not mean they're not working on it.

Although I do kind of agree a bit with you, not hearing anything tends to let users believe that no work is being done. It might be a quick win for the dev's to give some info once in a while, I have read some ideas somewhere on the forum to create some kind of blog (or a newsletter by mail or something) about "What we're doing" , which sounded appealing to me.

Nevertheless you have to acknowledge the fact that they do a lot and we all profit from it.

Keep up the good work guys 🍻

-Daedalus · June 11, 2020

3 minutes ago, S1dney said:

I think Johnnie is very closely affiliated to Limetech so I'm sure this is coming from first hand. Do remember that the fact that they are not actively responding does not mean they're not working on it.

Although I do kind of agree a bit with you, not hearing anything tends to let users believe that no work is being done. It might be a quick win for the dev's to give some info once in a while, I have read some ideas somewhere on the forum to create some kind of blog (or a newsletter by mail or something) about "What we're doing" , which sounded appealing to me.

Nevertheless you have to acknowledge the fact that they do a lot and we all profit from it.

Keep up the good work guys 🍻

Absolutely. I in no way meant for that to come across as complaining (it may have, I apologise), more "passionate suggestion", shall we say. If anyone from the dev team ever decides to visit Ireland, I'll happily buy them a round. 🍻

S1dney · June 11, 2020

3 minutes ago, -Daedalus said:

Absolutely. I in no way meant for that to come across as complaining (it may have, I apologise), more "passionate suggestion", shall we say. If anyone from the dev team ever decides to visit Ireland, I'll happily buy them a round. 🍻

Hahaha BIG like! Cheers.

goodGame · June 11, 2020

7 hours ago, johnnie.black said:

Last unofficial information I have is that this issue is hopefully fixed and the fix will be in v6.9-rc1 which is expected soon™, no idea if a patch/update for v6.8 will be available, but very much doubt that as LT doesn't usually patch older releases after a newer one is in active development, maybe if it was a critical security issue.

Thanks for the update, appreciated

JorgeB · June 11, 2020

49 minutes ago, S1dney said:

it would feels strange though that moving the image would work for one docker container, but doesn't for the other?

I only tested with one, hence why I asked for other users to test.

Also note that like mentioned my information is unofficial, it's not been confirmed by LT, and it might be wrong, or possibly not work for everyone or every situation, and if it is wrong you can only blame me.

goodGame · June 11, 2020

1 minute ago, johnnie.black said:

Also note that like mentioned my information is unofficial, it's not been confirmed by LT, and it might be wrong, or possibly not work for everyone or every situation, and if it is wrong you can only blame me.

At this point, I'll take a vague rumor over an empty void!

TexasUnraid · June 13, 2020

Just going to say I am also having these excessive writes. All dockers are writing way more then they should but some are really out of control with a gig written in 15 minutes from a single docker (netdata) and others not far behind.

Then some only have a few MB of writes, more then I would expect but not enough to worry about. I systemically went through and tested every docker individually.

Also I noticed that "btrfs-transacti" writes quite a bit as well, I was seeing almost 300MB/hr from that alone. So 2.5TB a year just from that, not nearly as bad as the docker issue but still quite excessive IMHO.

In my testing I decided to try increasing the vm.dirty_expire_centisecs to see if maybe the writes would overwrite themselves in ram before going to the SSD.

My hunch was correct, with most dockers increasing this to 5 minutes dropped the writes by a factor of 8x-10x. Although some were only minorly effected.

Now obviously this is longer then I would run daily so I tried different lengths and ~2 minutes seems to of been about the sweet spot (still saw reduction increasing further but not real comfortable going much loner), cutting writes drastically for most dockers while being acceptable risk short term until a real fix is found.

Not recommending anyone set it that long, but here is a little script I made up to make changing this easy and automating for testing. I have it set to 12000 myself at the moment to see how that goes.

#!/bin/bash
#description=This script allows changing the time data sits in ram before being written to disk. This can be useful if you have a lot of small writes to the same file as it will overwrite in ram and only write to disk once the time is up. Set the desired time in ms either in the argument box if running directly or after the "argumentDefault" below if scheduling it. 3000 = 30 seconds and is the default value. Careful setting this too long as anything still in ram if the power is lost will be lost forever.
#argumentDescription= set time in ms here, 3000 = 30 seconds
#argumentDefault=6000


echo
echo Below is the old config
echo

sysctl -a | grep dirty

echo
echo

grep -q 'vm.dirty_expire_centisecs.*' /etc/sysctl.conf && sed -i "s/vm.dirty_expire_centisecs.*/vm.dirty_expire_centisecs = "$1"/g" /etc/sysctl.conf || echo "vm.dirty_expire_centisecs = "$1"" >> /etc/sysctl.conf
                
sysctl -p

echo
echo Below is the new config
echo

sysctl -a | grep dirty

Any timeline for an official fix for this? I need a cache pool as I only have 128gb SSD's and they are not large enough individually for the files on the cache. Only option I am coming up with right now is tossing an old 2.5" HDD into the array and putting the docker image on there till a fix is released.

Edited June 13, 2020 by TexasUnraid

TexasUnraid · June 13, 2020

Decided to try the suggestion above to use mover to move the docker to array and back again.

I moved it to the array and decided to try starting it up to see what would happen as I have a BTRFS encrypted array at the moment.

Even with all the "bad" dockers running that would of had me well over 1gb in 10 minutes in writes, I only have 50mb of writes in 10 mins and I also reverted the above dirty memory tweak. Although this is on loop3 instead of loop2 but I assume that is for the array.

btrfs-transacti is now writing more then the loop3 though, over 200MB in 10 mins vs 300mb in an hour I saw before.

Although I am now getting this strange message in the docker settings that was not there before, I just installed unraid a few days ago, so not sure how this could be true, guessing it was triggered for some other reason.

"Your existing Docker image file needs to be recreated due to an issue from an earlier beta of Unraid 6. Failure to do so may result in your docker image suffering corruption at a later time. Please do this NOW!"

Edited June 13, 2020 by TexasUnraid

tjb_altf4 · June 13, 2020

19 minutes ago, TexasUnraid said:

"Your existing Docker image file needs to be recreated due to an issue from an earlier beta of Unraid 6. Failure to do so may result in your docker image suffering corruption at a later time. Please do this NOW!"

I knew this message would start popping up for people...

TexasUnraid · June 13, 2020

34 minutes ago, tjb_altf4 said:

I knew this message would start popping up for people...

What is it?

The plot thickens.

I moved the docker back to the cache pool, I still get the message about the docker image needing to be rebuilt.

I started up all the dockers and am watching the writes, the first thing that stands out is that it still says loop3 instead of loop2? Maybe this is what the docker message is complaining about?

The writes also appears to be only marginally higher then when it was on the array drive, around 60MB in 10 mins.

This is offset though by btrfs-transacti now writing ~1.6GB per hour (extrapolated) vs the ~300MB I saw before.

It is actually pretty close to the numbers I saw before if you combine the btrfs-transacti and loop3, almost like some of the writes have moved from the "loop" to the btrfs-transacti.

Maybe this will give some incite to the root issue?

EDIT: I noticed that even with the docker moved to the array and the cache doing exactly nothing (VM's disabled, cache is not setup for anything ATM, etc) I was still seeing writes spike cache. So the btrfs-transacti must be writing to the cache still, even though nothing at all is happening on it. That is concerning.

Turning in for the night.

Tomorrow gonna try installing another array drive and formatting as xfs and try moving it there.

Edited June 13, 2020 by TexasUnraid

Taddeusz · June 13, 2020

I’ve seen this message when my Docker.img file was in a share not set to nocow or did not have the nocow attribute set on it.

JorgeB · June 13, 2020

5 hours ago, TexasUnraid said:

Even with all the "bad" dockers running that would of had me well over 1gb in 10 minutes in writes, I only have 50mb of writes in 10 mins

That's encouraging and similar to my findings, and I was hoping it would work for everyone else before @hotio's comments, but it still might work for most, or everyone in case he was mistaken.

5 hours ago, TexasUnraid said:

Although this is on loop3 instead of loop2 but I assume that is for the array.

The loop# used can change and it's not important, just make sure it's the one where the docker image is mounted.

5 hours ago, TexasUnraid said:

Although I am now getting this strange message in the docker settings that was not there before, I just installed unraid a few days ago, so not sure how this could be true, guessing it was triggered for some other reason.

This will happen every time you move the docker image to a btrfs filesystem/share with COW set to "auto" (enable), it can be ignored for this, you could recreate the image to get rid of the warning but then it would resume the amplified writing.

TexasUnraid · June 13, 2020

6 hours ago, johnnie.black said:

That's encouraging and similar to my findings, and I was hoping it would work for everyone else before @hotio's comments, but it still might work for most, or everyone in case he was mistaken.

The loop# used can change and it's not important, just make sure it's the one where the docker image is mounted.

This will happen every time you move the docker image to a btrfs filesystem/share with COW set to "auto" (enable), it can be ignored for this, you could recreate the image to get rid of the warning but then it would resume the amplified writing.

Ok, that explains the loop and docker message.

Although sadly I don't know that moving it to another drive is working, at least when that other drive is btrfs. While the loop writes dropped, the "btrfs-transacti" writes increased by almost exactly the same amount. So the total writes are only minimally lower now.

Watching the writes again this morning and sure enough, the loop writes are much lower (I set the dirty to 5 mins before bed to help reduce writes) the btrfs-transacti is still going crazy.

Over 150mb in 10 mins. 8tb a year just for that, pretty extreme for a drive literally doing nothing at all.

Next step is to install an XFS formatted drive and try moving it there.

Edited June 13, 2020 by TexasUnraid

TexasUnraid · June 13, 2020

So before adding another drive I restarted the server and now loop 2 is back but so are the writes.

Although btrfs-transacti writes are down by a proportional amount. I am convinced that these are connected somehow.

Total writes are lower then before playing musical docker image but still around ~350mb in 10 mins or 18TB a year. It is just now split 50/50 between loop 2 and btrfs-transacti.

The docker is still on the array drive and my cache is setup to do nothing at all yet I am still getting writes to the cache every ~30 seconds, it should not have any writes at all.

I tried increasing the dirty writeback time and sure enough, the writes mostly fell in line with the new time.

This issue is obviously deeper then just the docker image.

Next step, XFS formatted array drive. It would be SO cool to have multiple cache pools right now.

[6.8.3] docker image huge amount of unnecessary writes on cache

User Feedback

Recommended Comments

Lucanus 0

Link to comment

S1dney 49

Link to comment

bastl 208

Link to comment

albertogomcas 0

Link to comment

Moz80 0

Link to comment

goodGame 3

Link to comment

hotio 26

Link to comment

JorgeB 7519

Link to comment

hotio 26

Link to comment

JorgeB 7519

Link to comment

-Daedalus 73

Link to comment

S1dney 49

Link to comment

-Daedalus 73

Link to comment

S1dney 49

Link to comment

goodGame 3

Link to comment

JorgeB 7519

Link to comment

goodGame 3

Link to comment

TexasUnraid 113

Link to comment

TexasUnraid 113

Link to comment

tjb_altf4 397

Link to comment

TexasUnraid 113

Link to comment

Taddeusz 104

Link to comment

JorgeB 7519

Link to comment

TexasUnraid 113

Link to comment

TexasUnraid 113

Link to comment

Join the conversation