[6.8.3] docker image huge amount of unnecessary writes on cache

TexasUnraid · June 16, 2020

Interesting, I will keep that in mind.

Although I was actually referring to docker as a whole: something like turning it off/on in the settings menu so that it could be mapped to a direct share?

Basically, I'm trying to find a way to map docker to a direct share without having to use the go file.

Although it appears all that would need to be done to revert the changes is to remove the line from the go file and delete the folder from the flash drive. So it's not the end of the world; I just like having everything managed with a GUI. The insistence on terminal usage for even basic tasks in Linux is what kept me from switching to it many years ago.

I have a very love/hate relationship with Windows lol.

TexasUnraid · June 17, 2020

Ok, I added the SSD with LBA logging to the cache pool and have the docker image on an XFS drive and app data on the cache again.

The writes to the pool work out to around ~400 MB/hour, yet when appdata was on an XFS drive they were a mere ~5 MB/hour. An increase of roughly 80x in writes seems pretty extreme, and I can't explain it.
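For anyone wanting to reproduce these per-hour numbers, here is a minimal sketch of the LBA-based measurement. It assumes `smartctl` is installed, that the SSD exposes the SMART attribute `Total_LBAs_Written`, and that the drive counts it in 512-byte units; that unit is vendor-specific, so check your model. The script and function names are hypothetical.

```bash
#!/bin/bash
# Sketch: estimate SSD writes in MB/hour from the SMART Total_LBAs_Written counter.
# Assumption: the drive counts this attribute in 512-byte units; this is
# vendor-specific, so check your SSD's documentation.

LBA_BYTES=512

# Print the raw Total_LBAs_Written value (RAW_VALUE is column 10) for a device such as /dev/sdb.
read_lbas_written() {
    smartctl -A "$1" | awk '$2 == "Total_LBAs_Written" { print $10 }'
}

# Convert two readings taken SECONDS apart into MB (10^6 bytes) per hour.
lba_delta_to_mb_per_hour() {
    local start=$1 end=$2 seconds=$3
    echo $(( (end - start) * LBA_BYTES * 3600 / seconds / 1000000 ))
}

# Only touch real hardware when a device is given, e.g.: ./lba_rate.sh /dev/sdb 600
if [ "$#" -gt 0 ]; then
    dev=$1
    interval=${2:-600}    # sample window in seconds (default 10 minutes)
    first=$(read_lbas_written "$dev")
    sleep "$interval"
    second=$(read_lbas_written "$dev")
    echo "$(lba_delta_to_mb_per_hour "$first" "$second" "$interval") MB/hour on $dev"
fi
```

A longer window (an hour or more) smooths out bursts, which matters here since the rates reportedly climb over time.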

I think going with the direct mount docker is the best option at this point. Guess that is my next project.

chanrc · June 18, 2020

Anyone try out 6.9.0-beta22 yet? I'm assuming that since we haven't heard anything from the LT guys, this is still probably an issue.

JTok · June 18, 2020

12 hours ago, TexasUnraid said:

Ok, I added the SSD with LBA logging to the cache pool and have the docker image on an XFS drive and app data on the cache again.

The writes to the pool work out to around ~400 MB/hour, yet when appdata was on an XFS drive they were a mere ~5 MB/hour. An increase of roughly 80x in writes seems pretty extreme, and I can't explain it.

I think going with the direct mount docker is the best option at this point. Guess that is my next project.

For what it is worth, the direct mount did not actually fix it for me; it just obscured it, since loop2/loop3 no longer showed up in iotop. When I checked the SMART data I had just as many writes as before, so I would be interested to hear your results.
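One way to check whether moving off the loop device reduces writes or only hides them from iotop is to look at the kernel's block-layer counters for the physical SSD. A sketch, assuming the standard Linux `/proc/diskstats` layout, where field 10 is sectors written (always counted in 512-byte units); the device names `loop2` and `sdb` in the usage line are examples only, and the function names are hypothetical.

```bash
#!/bin/bash
# Sketch: compare how much has been written to a loop device and to the
# physical SSD behind it, using the kernel's block-layer counters.

# Print sectors written (field 10 of /proc/diskstats) for device NAME.
# An optional second argument points at a saved copy of the file instead.
sectors_written() {
    local name=$1 stats=${2:-/proc/diskstats}
    awk -v dev="$name" '$3 == dev { print $10 }' "$stats"
}

# Convert 512-byte sectors to whole MB (10^6 bytes); missing input counts as 0.
sectors_to_mb() {
    echo $(( ${1:-0} * 512 / 1000000 ))
}

# Usage: ./diskstats.sh loop2 sdb
if [ "$#" -gt 0 ]; then
    for dev in "$@"; do
        echo "$dev: $(sectors_to_mb "$(sectors_written "$dev")") MB written since boot"
    done
fi
```

Writes issued to a loop device are ultimately written to the disk underneath it, so the physical device's counter (like SMART) is the number to compare between setups, not the loop device's.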

It seems like I might be an outlier here, so possibly I have a different issue affecting my setup.

Edited June 18, 2020 by JTok

TexasUnraid · June 18, 2020

I am sure people are tired of my updates at this point; hopefully I'm about done, though with not a lot to show for it lol.

Quick summary:

Docker and appdata on XFS array drive = not ideal, but acceptable writes in the 200-300 MB/hour range

Docker on cache and appdata on XFS = ~5 MB/hour writes for appdata and 1 GB+/hour for docker, climbing over time

Docker on XFS and appdata on cache = almost 100x the appdata writes compared to appdata on XFS, at 400-500 MB/hour

Docker and appdata on cache = 1 GB+/hour and climbing over time, even with modest dockers doing basically nothing
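To put these configurations in perspective, an hourly write rate can be projected to a yearly total and compared with the TBW rating on the SSD's datasheet. A sketch only: it assumes each rate stays constant for a whole year, which the "climbing over time" results above suggest it may not. The function name is hypothetical.

```bash
#!/bin/bash
# Sketch: project a sustained write rate in MB/hour to GB written per year.
# Assumes the rate is constant; real workloads fluctuate.

mb_per_hour_to_gb_per_year() {
    echo $(( $1 * 24 * 365 / 1000 ))
}

# Rates loosely taken from the summary above.
for rate in 5 300 500 1000; do
    echo "$rate MB/hour ~ $(mb_per_hour_to_gb_per_year "$rate") GB/year"
done
```

At 1 GB/hour that works out to roughly 8.8 TB a year, versus well under 0.1 TB a year at 5 MB/hour.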

I have now implemented S1dney's workaround with a basic script I put together. I started out disabling docker, copying the file, and then starting docker back up, but it seems that simply copying the file at array first start happens early enough that docker does not need to be stopped first and can start up normally.

Here is the script if anyone is interested. I followed S1dney's write-up in post #2, except I used /boot/config/plugins/user.scripts/scripts/Docker\ excessive\ write\ workaround/ to store the files on the flash drive, so they are stored with the script and will be deleted along with it when a fix is released.

```bash
#!/bin/bash
#description=This script changes docker from using an image mounted via a loop device to writing directly to the BTRFS cache.

# Folder on the flash drive where the modified rc.docker is stored alongside this script
SRC_DIR="/boot/config/plugins/user.scripts/scripts/Docker excessive write workaround"

# Stopping docker first turned out to be unnecessary when this runs at array first start,
# so the stop/start calls are kept only as comments.
#/etc/rc.d/rc.docker stop

echo "Putting the modified docker service file over the original one so it does not use docker.img"
cp -v "$SRC_DIR/rc.docker" /etc/rc.d/rc.docker
chmod +x /etc/rc.d/rc.docker

#/etc/rc.d/rc.docker start
```

This is much simpler for me, and reinstalling the dockers was way easier than expected: CA (Community Applications) actually allows batch reinstalling dockers from the Previous Apps menu. I simply checked them all and hit install, and boom, all back up and running. I am also leaving the stock docker image in place, and simply disabling this script does work: it reverts to stock on the next boot.
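Since the script only overwrites /etc/rc.d/rc.docker after boot, it can be useful to confirm which version is actually in effect. A small sketch, assuming the paths used by the script above; `workaround_active` is a hypothetical helper name, and the check simply compares the two files byte for byte with `cmp`.

```bash
#!/bin/bash
# Sketch: report whether the modified rc.docker from the workaround is the one in use.
# Paths are the ones used by the script above; adjust if yours differ.

RUNNING=/etc/rc.d/rc.docker
STORED="/boot/config/plugins/user.scripts/scripts/Docker excessive write workaround/rc.docker"

# Succeeds (exit status 0) when both files are byte-for-byte identical.
workaround_active() {
    cmp -s "$1" "$2"
}

if [ -f "$RUNNING" ] && [ -f "$STORED" ]; then
    if workaround_active "$RUNNING" "$STORED"; then
        echo "Modified rc.docker is active"
    else
        echo "Stock rc.docker is active"
    fi
fi
```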

I am not seeing the issue where the Docker settings menu doesn't show up, but for some reason the Main page takes several seconds to show Unassigned Devices.

One exception is mariadb: for some reason it no longer autostarts. I have also not checked all the dockers to make sure they are working properly; most of them are not really doing anything yet, since this server is still not "active" beyond UD sharing my old Windows drives.

I am going to leave this on overnight and see how things progress with both docker and appdata on the cache.

Edited June 18, 2020 by TexasUnraid

italeffect · June 18, 2020

For those following this thread like I am - Limetech posted in the new 6.9 beta post that they are not aware of this issue?

Perhaps someone more skilled than I can provide a TLDR on the issue and what has been worked out so far in that thread.

grigsby · June 18, 2020

1 hour ago, italeffect said:

Limetech posted in the new 6.9 beta post that they are not aware of this issue?

😳

JorgeB · June 18, 2020

4 hours ago, italeffect said:

Limetech posted in the new 6.9 beta post that they are not aware of this issue?

Well, that's disappointing. The information I had came from someone using the latest betas who said writes to the docker image had decreased several-fold, so I wrongly assumed LT had fixed the issue; the improvement may just be a side effect of other changes. Still, I can confirm that, at least on my test server, the information I got was correct:

Writes to the same docker image after 5 minutes, v6.9-beta1 vs v6.9-beta22, also note no increase on btrfs-transacti:

No idea if it will make a difference for everyone; please try it and post here.

JorgeB · June 18, 2020

Just to add that I did nothing other than boot with the different betas, and if I go back to the old beta, writes increase massively again. Here are a couple of 30-second videos showing the real-time write difference:

TexasUnraid · June 18, 2020

Ok, left it overnight with the direct method writing to the cache along with appdata also on the cache.

Sadly it did not seem to change much from the normal setup; I'm still getting 1 GB+/hour of writes like this. I'm guessing this is more of a fix for particular problem dockers? It was worth a shot though.

Seems the only real option at this point is a sacrificial 2.5" HDD formatted as XFS in the array. Multiple cache pools would be amazing right now. Guess I will find out how long a 2.5" drive can last with constant writes.

Edited June 18, 2020 by TexasUnraid

TexasUnraid · June 18, 2020

5 hours ago, johnnie.black said:

Well, that's disappointing, but basically the information I had was from someone using the latest betas saying that the writes to the docker image for him had decreased by multiple times, so I wrongly assumed the issue was fixed by LT, possibly just a result of other changes, still I can confirm that at least on my test server the information I got was correct:

Writes to the same docker image after 5 minutes, v6.9-beta1 vs v6.9-beta22, also note no increase on btrfs-transacti:

No idea if it will make a difference for everyone; please try it and post here.

Interesting, how can I test out beta 22 without messing up my current install / and be able to easily revert to my current install?

JorgeB · June 18, 2020

3 minutes ago, TexasUnraid said:

Interesting, how can I test out beta 22 without messing up my current install / and be able to easily revert to my current install?

You can easily revert back to the previous release, manually or using the GUI (if the update was done using the GUI):

Niklas · June 18, 2020

To revert from beta22 you have to do some manual configuration too. Read the release notes for beta22 before trying it out.

TexasUnraid · June 18, 2020

8 minutes ago, Niklas said:

To revert from beta22 you have to do some manual configuration too. Read the release notes for beta22 before trying it out.

I read the notes but didn't see any required manual config? The only thing I noticed was that if you use the multiple cache pools option the config would be lost; does this apply if only using one pool?

I have jacked around with this so much that I am kind of accepting I will need to wipe Unraid and start fresh before actually going live with this server. In the past on Windows, I have had too many bad experiences where a slight issue at setup caused big issues down the road, so I am a bit paranoid now lol.

Edited June 18, 2020 by TexasUnraid

JorgeB · June 18, 2020

3 minutes ago, TexasUnraid said:

The only thing I noticed was if you use the multiple cache pools options the config would be lost but does this apply if only using 1 pool?

You'll need to reassign the cache devices.

TexasUnraid · June 18, 2020

8 minutes ago, johnnie.black said:

You'll need to reassign the cache devices.

Ok, that's not a big deal. Does it matter if they are back in the same order, or just that the correct devices are in the same pool? For some reason my drive letters have been changing on reboots, and the drives in my cache pool are all the same model.

JorgeB · June 18, 2020

4 minutes ago, TexasUnraid said:

Or just that the correct devices are in the same pool?

This.
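Since only pool membership matters and the sdX letters can shuffle between boots, it helps to tell identical-model drives apart by serial number. A sketch that lists the persistent `/dev/disk/by-id` names (which include model and serial) next to the kernel name each currently points to; `list_disk_ids` is a hypothetical helper, and the optional directory argument exists only so it can be tried against a test directory.

```bash
#!/bin/bash
# Sketch: map persistent /dev/disk/by-id names (model + serial) to the
# kernel names (sdX) they currently point to, which can change between boots.

list_disk_ids() {
    local dir=${1:-/dev/disk/by-id}
    local link name
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        name=${link##*/}
        # Skip partition entries such as ata-...-part1
        case $name in
            *-part[0-9]*) continue ;;
        esac
        printf '%s -> %s\n' "$name" "$(basename "$(readlink -f "$link")")"
    done
}

if [ -d /dev/disk/by-id ]; then
    list_disk_ids
fi
```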

TexasUnraid · June 18, 2020

5 minutes ago, johnnie.black said:

This.

Thanks, that was a question I had for some time.

Doing a backup now and then will try updating to the beta to see how it goes.

I really like where this beta is heading. Add in official snapshot support and read caching / tiered storage (which honestly could be as simple as tweaking mover to move recently accessed files to a cache pool) and I can't think of any major features that would be missing, except direct VM snapshots like VMware.

Lignumaqua · June 18, 2020

There’s an elephant in this room which needs mentioning. We have a long thread marked urgent in the Bugs forum and Limetech state they have no knowledge of it? Were all the reports here a waste of time?

After repeated requests for official acknowledgment of the issue we got posts from insiders telling us not to worry and that Limetech had it in hand. Were those posts untrue?

As one of those who has had an SSD die extremely early with a huge number of writes that took it out of warranty, this is very, very disappointing. Please, please tell me that this was just a misunderstanding.

TexasUnraid · June 18, 2020

Ok, updated to the beta, sadly early indications are not promising.

After 10 minutes, writes extrapolate to ~1 GB/hour. I'm going to let it run for a while so I can track actual LBAs written, but it seems unchanged from the stable version.

At least with multiple cache pools I could format a drive as XFS just for the docker but that is such a waste of a drive.

Strangely, my CPU usage is higher than on the stable version; one thread is consistently pegged.

Average CPU usage used to be ~5% on the stable version.

In the beta it is hovering in the 15%-20% range although it will settle down to ~5% for a second every now and then.

Might just be doing background stuff after the upgrade, going to see if it settles down over the next few hours.
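To see which process is pegging a thread, sorting the process list by CPU usage is usually a good first step. A sketch assuming procps-ng `ps` (as found on most Linux systems); note that `ps` reports %CPU averaged over each process's lifetime, so `top` gives a better live view. `top_cpu` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: list the processes using the most CPU (procps-ng ps).
# %CPU from ps is averaged over each process's lifetime; use top for a live view.

top_cpu() {
    local n=${1:-10}
    # +1 line so the header is kept alongside N processes
    ps -eo pid,comm,pcpu --sort=-pcpu | head -n "$((n + 1))"
}

top_cpu 5
```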

Edited June 18, 2020 by TexasUnraid

T0rqueWr3nch · June 18, 2020

15 hours ago, chanrc said:

Anyone try out 6.9.0-beta22 yet? I'm assuming that since we haven't heard anything from the LT guys, this is still probably an issue.

I did this morning. While it's still very early, I think this may finally be fixed:

Screenshots here: https://forums.engineerworkshop.com/t/unraid-6-9-0-beta22-update-fixes-and-improvements/215

I am seeing a drop from ~8 MB/s to ~500 kB/s after upgrade with a similar server load (basically idle) and the same Docker containers running. Hopefully the trend holds.
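For comparison with the MB/hour and GB/hour figures elsewhere in this thread, an iotop-style rate in kB/s can be converted to an hourly total. A sketch only: it assumes the rate is sustained for the whole hour, whereas iotop shows a momentary rate. The function name is hypothetical.

```bash
#!/bin/bash
# Sketch: convert a sustained write rate in kB/s into MB per hour (decimal units).
# Assumes the rate holds for the whole hour; iotop shows a momentary rate.

kbps_to_mb_per_hour() {
    echo $(( $1 * 3600 / 1000 ))
}

echo "8 MB/s   ~ $(kbps_to_mb_per_hour 8000) MB/hour"
echo "500 kB/s ~ $(kbps_to_mb_per_hour 500) MB/hour"
```

If it held constant, ~500 kB/s would still be about 1.8 GB/hour, so confirming with the drive's LBA counter over a longer window is worthwhile.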

-TorqueWrench

Edited June 18, 2020 by T0rqueWr3nch

TexasUnraid · June 18, 2020

Well, after a full hour the LBAs have increased by a total of 1.5 GB on the beta. That could just be first-hour-after-boot work going on, since it is actually a bit worse than the stable version. It does not appear to be any better though, nothing like when it was on the XFS drive.

The CPU still spends ~70-80% of its time with 1-2 threads pegged and 15-20% total CPU usage. I can actually see the higher power draw in my UPS reporting.

Going to leave it for a few more hours at least, and more than likely revert things tomorrow. We'll see how things progress.

Edit: Another hour, another 1.5 GB of writes. It seems it somehow got worse with the beta. Still high CPU usage as well.

Edited June 18, 2020 by TexasUnraid

mf808 · June 18, 2020

2 hours ago, Lignumaqua said:

There’s an elephant in this room which needs mentioning. We have a long thread marked urgent in the Bugs forum and Limetech state they have no knowledge of it? Were all the reports here a waste of time?

After repeated requests for official acknowledgment of the issue we got posts from insiders telling us not to worry and that Limetech had it in hand. Were those posts untrue?

As one of those who has had an SSD die extremely early with a huge number of writes that took it out of warranty, this is very, very disappointing. Please, please tell me that this was just a misunderstanding.

This

T0rqueWr3nch · June 18, 2020

57 minutes ago, TexasUnraid said:

Well, after a full hour the LBAs have increased by a total of 1.5 GB on the beta. That could just be first-hour-after-boot work going on, since it is actually a bit worse than the stable version. It does not appear to be any better though, nothing like when it was on the XFS drive.

The CPU still spends ~70-80% of its time with 1-2 threads pegged and 15-20% total CPU usage. I can actually see the higher power draw in my UPS reporting.

Going to leave it for a few more hours at least, and more than likely revert things tomorrow. We'll see how things progress.

Edit: Another hour, another 1.5 GB of writes. It seems it somehow got worse with the beta. Still high CPU usage as well.

Very strange. I had the exact opposite experience from the latest beta update to 6.9.0-beta22. My cache writes are way down to a much more reasonable ~500 kB/s and it's still holding from this morning.

It's weird that we have such discrepancies.

TexasUnraid · June 18, 2020

4 minutes ago, T0rqueWr3nch said:

Very strange. I had the exact opposite experience from the latest beta update to 6.9.0-beta22. My cache writes are way down to a much more reasonable ~500 kB/s and it's still holding from this morning.

It's weird that we have such discrepancies.

Agreed, I can't make sense of it.

I think most of you who have the truly extreme write black holes are running things like Plex. My best guess is that these fixes help with the issue those dockers have, but not the underlying issue.

I only run very mild dockers, lancache, krusader, mumble, qbittorrent etc that are not actively doing anything right now.

The difference between putting docker/appdata on an XFS array drive vs the btrfs cache is undeniable though: around 200-300 MB/hour vs 1000-1500 MB/hour, and climbing in most cases.
