• [6.8.3] docker image huge amount of unnecessary writes on cache


    S1dney
    • Solved Urgent

    EDIT (March 9th 2021):

    Solved in 6.9 and up. Reformatting the cache to new partition alignment and hosting docker directly on a cache-only directory brought writes down to a bare minimum.

     

    ###

     

    Hey Guys,

     

    First of all, I know you're all very busy getting version 6.8 out there, something I'm very much waiting on as well. I'm seeing great progress, so thanks so much for that! I'm not expecting this to be at the top of the priority list, but I'm hoping someone on the development team is willing to invest some time in it (perhaps after the release).

     

    Hardware and software involved:

    2 x 1TB Samsung EVO 860, set up with LUKS encryption in a BTRFS RAID1 pool.
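    For reference, this is roughly what that pool looks like when assembled by hand (a sketch only; unRAID builds this itself through the GUI, and the device names are placeholders):

    # Encrypt both SSDs with LUKS and open the mappings (device names are placeholders)
    cryptsetup luksFormat /dev/sdX
    cryptsetup luksFormat /dev/sdY
    cryptsetup luksOpen /dev/sdX cache1
    cryptsetup luksOpen /dev/sdY cache2

    # Create a BTRFS pool with RAID1 for both data and metadata, then mount it
    mkfs.btrfs -d raid1 -m raid1 /dev/mapper/cache1 /dev/mapper/cache2
    mount /dev/mapper/cache1 /mnt/cache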

     

    ###

    TL;DR (but I'd suggest reading on anyway 😀)

    The image file mounted as a loop device is causing massive writes on the cache, potentially wearing out SSDs quite rapidly.

    This appears to only be happening on encrypted caches formatted with BTRFS (maybe only in a RAID1 setup, but I'm not sure).

    Hosting the Docker files directory on /mnt/cache instead of using the loopdevice seems to fix this problem.

    A possible implementation idea is proposed at the bottom.

     

    Grateful for any help provided!

    ###

     

    I have written a topic in the general support section (see link below), but I have done a lot of research lately and think I have gathered enough evidence pointing to a bug. I was also able to build a (kind of) workaround for my situation. More details below.

     

    So to see what was actually hammering on the cache I started with all the obvious things, like using a lot of find commands to trace files that were written to every few minutes, and I also used the file activity plugin. Neither was able to trace down any writes that would explain 400 GB worth of writes a day for just a few containers that aren't even that active.
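    (To give an idea of the kind of checks I mean; the path and interval are just examples:)

    # List files on the cache that were modified in the last 10 minutes
    find /mnt/cache -type f -mmin -10 2>/dev/null

    # Show accumulated I/O per process, only for processes actually doing I/O
    iotop -ao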

     

    Digging further, I moved the docker.img to /mnt/cache/system/docker/docker.img, so directly on the BTRFS RAID1 mountpoint. I wanted to check whether the unRAID FS layer was causing the loop2 device to write this heavily. No luck either.

    This gave me a situation I was able to reproduce on a virtual machine though, so I started with a recent Debian install (I know, it's not Slackware, but I had to start somewhere ☺️). I created some vDisks, encrypted them with LUKS, bundled them in a BTRFS RAID1 setup, created the loopdevice on the BTRFS mountpoint (same as /dev/cache) and mounted it on /var/lib/docker. I made sure I had the NoCoW flag set on the IMG file like unRAID does. Strangely this did not show any excessive writes; iotop shows really healthy values for the same workload (I migrated the docker content over to the VM).
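    The loopdevice part of that setup was roughly the following (a sketch from memory; the size and paths are illustrative):

    # Create the image file and set the NoCoW flag while it is still empty,
    # the same way unRAID marks docker.img
    touch /mnt/cache/system/docker/docker.img
    chattr +C /mnt/cache/system/docker/docker.img
    truncate -s 20G /mnt/cache/system/docker/docker.img

    # Format the image with BTRFS, attach it to a loop device and mount it
    # as the docker root
    mkfs.btrfs /mnt/cache/system/docker/docker.img
    LOOP=$(losetup --show -f /mnt/cache/system/docker/docker.img)
    mount "$LOOP" /var/lib/docker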

     

    After my Debian troubleshooting I went back to the unRAID server, wondering whether the loopdevice was created weirdly, so I took the exact same steps to create a new image and pointed the settings from the GUI there. Still the same write issues.

     

    Finally I decided to put the whole image out of the equation and took the following steps:

    - Stopped docker from the WebGUI so unRAID would properly unmount the loop device.

    - Modified /etc/rc.d/rc.docker to not check whether /var/lib/docker was a mountpoint

    - Created a share on the cache for the docker files

    - Created a softlink from /mnt/cache/docker to /var/lib/docker

    - Started docker using "/etc/rc.d/rc.docker start"

    - Started my Bitwarden containers.
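    In shell terms those steps boil down to roughly this (a sketch, assuming rc.docker no longer insists on /var/lib/docker being a mountpoint):

    /etc/rc.d/rc.docker stop

    # Host the docker root directly on the BTRFS cache instead of the loop image
    mkdir -p /mnt/cache/docker
    rmdir /var/lib/docker 2>/dev/null          # remove the (empty) mountpoint dir
    ln -s /mnt/cache/docker /var/lib/docker    # softlink the docker root to the cache

    /etc/rc.d/rc.docker start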

     

    Looking into the stats with "iotop -ao" I did not see any excessive writing taking place anymore.

    I had the containers running for about 3 hours and maybe got 1GB of writes total (note that on the loopdevice this gave me 2.5GB every 10 minutes!).

     

    Now don't get me wrong, I understand why the loopdevice was implemented. Dockerd is started with options to make it run with the BTRFS driver, and since the image file is formatted with the BTRFS filesystem this works on every setup; it doesn't even matter whether it runs on XFS, EXT4 or BTRFS, it will just work. In my case I had to point the softlink to /mnt/cache, because pointing it at /mnt/user would not allow me to use the BTRFS driver (obviously the unRAID user filesystem isn't BTRFS). Also the WebGUI has commands to scrub the filesystem inside the container; it's all based on the assumption that everyone is using docker on BTRFS (which of course they are, because of the container 😁)
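    For context, the way dockerd gets started is something along these lines (flags shown for illustration only):

    # The btrfs storage driver requires /var/lib/docker to live on a BTRFS filesystem,
    # which the loop-mounted docker.img guarantees regardless of the cache filesystem
    dockerd --storage-driver=btrfs --data-root=/var/lib/docker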

    I must say that my approach also broke when I changed something in the shares; certain services get restarted, causing docker to be turned off for some reason. No big issue, since it wasn't meant to be a long-term solution, just to see whether the loopdevice was causing the issue, which I think my tests did point out.

     

    Now I'm at the point where I would definitely need some developer help. I'm currently keeping nearly all docker containers off all day, because 300-400GB worth of writes a day is just a BIG waste of expensive flash storage, especially since I've pointed out that it's not needed at all. It does defeat the purpose of my NAS and SSD cache though, since its main purpose was hosting docker containers while allowing the HDDs to spin down.

     

    Again, I'm hoping someone on the dev team acknowledges this problem and is willing to invest. I did get quite a few hits on the forums and reddit, but without anyone actually pointing out the root cause of the issue.

     

    I'm missing the technical know-how to troubleshoot the loopdevice issues on a lower level, but I have been thinking about possible ways to implement a workaround, like adjusting the Docker Settings page to switch off the use of a vDisk and, if all requirements are met (pointing to /mnt/cache and BTRFS formatted), starting docker on a share on the /mnt/cache partition instead of using the vDisk.

    That way you would still keep all the advantages of the docker.img file (working across filesystem types), and users who don't care about the writes could still use it, but you'd be massively helping out others who are concerned about these writes.
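    As a rough illustration of what that toggle could look like inside the startup logic (purely hypothetical; the variable names and paths are made up):

    # Hypothetical setting: skip the vDisk only when the chosen docker root
    # lives on a BTRFS-formatted cache path
    DOCKER_ROOT="/mnt/cache/docker"

    if [ "$USE_VDISK" = "no" ] && [ "$(stat -f -c %T "$DOCKER_ROOT")" = "btrfs" ]; then
        # Point dockerd straight at the cache directory
        mount --bind "$DOCKER_ROOT" /var/lib/docker
    else
        # Default behaviour: loop-mount docker.img as before
        mount -o loop /mnt/cache/system/docker/docker.img /var/lib/docker
    fi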

     

    I'm not attaching diagnostic files since they would probably not show anything relevant.

    Also, if this should have been in feature requests, I'm sorry. But I feel that, since the current solution is misbehaving in terms of writes, this could also be placed in the bug report section.

     

    Thanks though for this great product, I have been using it with a lot of joy so far!

    I'm just hoping we can solve this one so I can keep all my dockers running without the cache wearing out quickly,

     

    Cheers!

     

    • Like 3
    • Thanks 17



    User Feedback

    Recommended Comments



    Interesting, I will keep that in mind.

     

    Although I was actually referring to docker in its entirety, like turning it off/on in the settings menu so that mapping it to a direct share could be done?

     

    Basically trying to allow mapping the docker to a direct share without having to use the go file.

     

    Although it appears all that would need to be done to revert the changes is to remove the line from the go file / delete the folder from the flash drive. So it's not the end of the world, I just like having everything managed with a GUI. The insistence on terminal usage for even basic tasks in Linux is what kept me from switching to it many years ago.

     

    I have a very love/hate relationship with windows lol.

    Link to comment

    Ok, I added the SSD with LBA logging to the cache pool, and have the docker image on an XFS drive and appdata on the cache again.

     

    The writes to the pool work out to around ~400MB/hour, yet when the appdata was on an XFS drive it was a mere ~5MB/hour. Almost a 100x increase in writes seems pretty extreme and I can't explain it.
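    (For anyone wanting to check the same numbers, this is roughly how the per-hour figure can be derived from SMART; the attribute name and the 512-byte LBA unit are vendor dependent, and LBA_BEFORE/LBA_NOW are placeholders for two readings taken an hour apart:)

    # Read the Total_LBAs_Written attribute (ID 241 on many consumer SSDs)
    smartctl -A /dev/sdX | awk '/Total_LBAs_Written/ {print $NF}'

    # Difference between two readings converted to MB, assuming 512-byte LBAs
    echo "$(( (LBA_NOW - LBA_BEFORE) * 512 / 1024 / 1024 )) MB written"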

     

    I think going with the direct mount docker is the best option at this point. Guess that is my next project.

    Link to comment

    Anyone try out 6.9.0-beta22 yet? I'm assuming since we haven't heard anything from the LT guys this is still probably an issue.

    Link to comment
    12 hours ago, TexasUnraid said:

    Ok, I added the SSD with LBA logging to the cache pool, and have the docker image on an XFS drive and appdata on the cache again.

     

    The writes to the pool work out to around ~400MB/hour, yet when the appdata was on an XFS drive it was a mere ~5MB/hour. Almost a 100x increase in writes seems pretty extreme and I can't explain it.

     

    I think going with the direct mount docker is the best option at this point. Guess that is my next project.

     

    For what it's worth, the direct mount did not actually fix it for me, it just obfuscated it by making it so loop2/3 didn't show up in iotop. When I checked the SMART status I wound up with just as many writes as before, so I would be interested to hear your results.

    It seems like I might be an outlier here, so possibly I have a different issue affecting my setup.

    Edited by JTok
    Link to comment

    I am sure people are tired of my updates at this point; hopefully I'm about done, with not a lot to show for it lol.

     

    Quick summary:

     

    Docker and appdata on XFS array drive = not ideal, but acceptable writes in the 200-300MB/hour range

    Docker on cache and appdata on XFS = ~5MB/hour writes for appdata and 1GB+/hour for docker, climbing over time

    Docker on XFS and appdata on the cache = almost 100x the writes vs appdata on XFS, at 400-500MB/hour

    Docker and appdata on cache = 1GB+/hour and climbing over time, even with modest dockers doing basically nothing

     

    I have now implemented S1dney's workaround with a basic script I made up. I started out disabling docker, then copying the file and starting docker back up, but it seems that simply copying the file at first array start happens early enough that it does not need to stop docker first, and it can start up normally.

     

    Here is the script if anyone is interested. I followed S1dney's write-up in post #2, except I used /boot/config/plugins/user.scripts/scripts/Docker\ excessive\ write\ workaround/ to store the files on the flash drive, so it is stored with the script and will be deleted along with it when a fix is released.

     

    #!/bin/bash
    #description=This script changes docker from using an image mounted via a loop device to writing directly to the BTRFS cache.

    # Stopping/starting docker turned out not to be necessary when this runs at
    # first array start, so those calls are left commented out on purpose.
    #echo "stopping docker"
    #/etc/rc.d/rc.docker stop

    echo "Copying the modified docker service file over the original one so it does not use docker.img"
    cp -v /boot/config/plugins/user.scripts/scripts/Docker\ excessive\ write\ workaround/rc.docker /etc/rc.d/rc.docker
    chmod +x /etc/rc.d/rc.docker

    #echo "starting docker"
    #/etc/rc.d/rc.docker start

     

    This is much simpler for me, and reinstalling the dockers was way easier than expected; CA actually allows batch reinstalling dockers from the Previous Apps menu. I simply checked them all and hit install, and boom, all back up and running. I am also leaving the stock docker image in place, and it does work to simply disable this script so it reverts to stock on the next boot.

     

    I am not getting any issues with the docker settings menu not showing up, but the main menu is taking several seconds for unassigned devices to show up for some reason?

     

    Except mariadb; for some reason it does not autostart now. I have also not checked all the dockers to make sure they are working properly; most of them are not really doing anything yet since this server is still not "active" outside of UD sharing my old windows drives.

     

    I am going to leave this on overnight and see how things progress with both docker and appdata on the cache.

    Edited by TexasUnraid
    • Like 1
    Link to comment

    For those following this thread like I am - Limetech posted in the new 6.9 beta post that they are not aware of this issue? 

     

     

    Perhaps someone more skilled than I can provide a TLDR on the issue and what has been worked out so far in that thread.

     

     

    Link to comment
    1 hour ago, italeffect said:

    Limetech posted in the new 6.9 beta post that they are not aware of this issue?

    😳

    Link to comment
    4 hours ago, italeffect said:

    Limetech posted in the new 6.9 beta post that they are not aware of this issue? 

    Well, that's disappointing. Basically the information I had was from someone using the latest betas saying that the writes to the docker image had decreased by multiple times for him, so I wrongly assumed the issue was fixed by LT, possibly just as a result of other changes. Still, I can confirm that at least on my test server the information I got was correct:

     

    Writes to the same docker image after 5 minutes, v6.9-beta1 vs v6.9-beta22, also note no increase on btrfs-transacti:

     

    [screenshots: iotop output after 5 minutes on v6.9-beta1 and on v6.9-beta22]

     

    No idea if there will be a difference for everyone, please try and post here.

    • Like 1
    Link to comment

    Just to add that I did nothing else other than boot with the different betas, and if I go back to the old beta the writes increase massively again. Here are a couple of 30-second videos showing the real-time write difference:

     

     

    Link to comment

    Ok, I left it overnight with the direct method writing to the cache, along with appdata also on the cache.

     

    Sadly it did not seem to change much from the normal setup, still getting 1GB+ per hour of writes like this. Guessing this is more a fix for particular problem dockers? It was worth a shot though.

     

    Seems the only real option at this point is a sacrificial 2.5" HDD formatted as XFS in the array. Multiple cache pools would be amazing right now. Guess I will find out how long a 2.5" drive can last with constant writes.

    Edited by TexasUnraid
    Link to comment
    5 hours ago, johnnie.black said:

    Well, that's disappointing. Basically the information I had was from someone using the latest betas saying that the writes to the docker image had decreased by multiple times for him, so I wrongly assumed the issue was fixed by LT, possibly just as a result of other changes. Still, I can confirm that at least on my test server the information I got was correct:

     

    Writes to the same docker image after 5 minutes, v6.9-beta1 vs v6.9-beta22, also note no increase on btrfs-transacti:

     

    [screenshots: iotop output after 5 minutes on v6.9-beta1 and on v6.9-beta22]

     

    No idea if there will be a difference for everyone, please try and post here.

    Interesting, how can I test out beta 22 without messing up my current install, and be able to easily revert to it?

    Link to comment
    3 minutes ago, TexasUnraid said:

    Interesting, how can I test out beta 22 without messing up my current install, and be able to easily revert to it?

    You can easily revert back to the previous release, manually or using the GUI (if the update was done using the GUI):


     

     

    Link to comment

    To revert from beta22 you have to do some manual configuration too. Read the release notes for beta22 before trying it out.

    Link to comment
    8 minutes ago, Niklas said:

    To revert from beta22 you have to do some manual configuration too. Read the release notes for beta22 before trying it out.

    I read the notes but didn't see what manual config was needed. The only thing I noticed was that if you use the multiple cache pools option the config would be lost, but does this apply if only using 1 pool?

     

    I have jacked around with this so much that I am kind of accepting I will need to wipe unRAID and start fresh before actually going live with this server. I have had too many bad experiences in the past on Windows where a slight issue at setup caused big issues down the road; I am a bit paranoid now lol.

    Edited by TexasUnraid
    Link to comment
    3 minutes ago, TexasUnraid said:

    The only thing I noticed was that if you use the multiple cache pools option the config would be lost, but does this apply if only using 1 pool?

    You'll need to reassign the cache devices.

    Link to comment
    8 minutes ago, johnnie.black said:

    You'll need to reassign the cache devices.

    Ok, that's not a big deal. Does it matter if they are back in the same order, or just that the correct devices are in the same pool? For some reason my drive letters have been changing on reboots and the models are the same for my cache pool.

    Link to comment
    5 minutes ago, johnnie.black said:

    This.

    Thanks, that was a question I had for some time.

     

    Doing a backup now and then will try updating to the beta to see how it goes.

     

    I really like where this beta is heading. Add in official snapshot support and read caching / tiered storage (which could honestly be as simple as tweaking mover to move recently accessed files to a cache pool) and I can't think of any major features that would be missing, except direct VM snapshots like VMware.

    Link to comment

    There’s an elephant in this room which needs mentioning. We have a long thread marked urgent in the Bugs forum and Limetech state they have no knowledge of it? Were all the reports here a waste of time? 

     

    After repeated requests for official acknowledgment of the issue we got posts from insiders telling us not to worry and that Limetech had it in hand. Were those posts untrue?

     

    As one of those who has had an SSD die extremely early with a huge number of writes that took it out of warranty, this is very, very disappointing. Please, please tell me that this was just a misunderstanding.

    • Like 1
    Link to comment

    Ok, updated to the beta, sadly early indications are not promising.

     

    After 10 mins, writes would equal ~1GB/hour. Going to let it run for a while so I can track actual LBAs written, but it seems unchanged from the stable version.

     

    At least with multiple cache pools I could format a drive as XFS just for the docker but that is such a waste of a drive.

     

    Strangely my CPU usage is higher than on the stable version; 1 thread is consistently pegged.

     

    Average CPU usage used to be ~5% on the stable version.

     

    In the beta it is hovering in the 15-20% range, although it will settle down to ~5% for a second every now and then.

     

    Might just be doing background stuff after the upgrade, going to see if it settles down over the next few hours.

    Edited by TexasUnraid
    Link to comment
    15 hours ago, chanrc said:

    Anyone try out 6.9.0-beta22 yet? I'm assuming since we haven't heard anything from the LT guys this is still probably an issue.

    I did this morning. While it's still very early, I think this may finally be fixed:

     

    Screenshots here: https://forums.engineerworkshop.com/t/unraid-6-9-0-beta22-update-fixes-and-improvements/215

     

    I am seeing a drop from ~8 MB/s to ~500 kB/s after the upgrade, with a similar server load (basically idle) and the same Docker containers running. Hopefully the trend holds.

     

    -TorqueWrench

    Edited by T0rqueWr3nch
    • Thanks 2
    Link to comment

    Well, after a full hour, the LBAs have increased by a total of 1.5GB on the beta, but that could just be first-hour-after-boot work going on, since it is actually a bit worse than the stable version. It does not appear to be any better though, nothing like when it was on the XFS drive.

     

    The CPU still spends ~70-80% of its time with 1-2 threads pegged and 15-20% total CPU usage. I can actually see the higher power draw in my UPS reporting.

     

    Going to leave it for a few more hours at least; more than likely I'll revert things tomorrow. See how things progress.

     

    edit: Another hour, another 1.5GB of writes. It somehow got worse with the beta it seems. Still high CPU usage as well.

    Edited by TexasUnraid
    Link to comment
    2 hours ago, Lignumaqua said:

    There’s an elephant in this room which needs mentioning. We have a long thread marked urgent in the Bugs forum and Limetech state they have no knowledge of it? Were all the reports here a waste of time? 

     

    After repeated requests for official acknowledgment of the issue we got posts from insiders telling us not to worry and that Limetech had it in hand. Were those posts untrue?

     

    As one of those who has had an SSD die extremely early with a huge number of writes that took it out of warranty, this is very, very disappointing. Please, please tell me that this was just a misunderstanding.

    This

     

    Link to comment
    57 minutes ago, TexasUnraid said:

    Well, after a full hour, the LBAs have increased by a total of 1.5GB on the beta, but that could just be first-hour-after-boot work going on, since it is actually a bit worse than the stable version. It does not appear to be any better though, nothing like when it was on the XFS drive.

     

    The CPU still spends ~70-80% of its time with 1-2 threads pegged and 15-20% total CPU usage. I can actually see the higher power draw in my UPS reporting.

     

    Going to leave it for a few more hours at least; more than likely I'll revert things tomorrow. See how things progress.

     

    edit: Another hour, another 1.5GB of writes. It somehow got worse with the beta it seems. Still high CPU usage as well.

    Very strange. I had the exact opposite experience with the latest beta update to 6.9.0-beta22. My cache writes are way down to a much more reasonable ~500 kB/s, and it's still holding from this morning.

     

    It's weird that we have such discrepancies. 

    Link to comment
    4 minutes ago, T0rqueWr3nch said:

    Very strange. I had the exact opposite experience with the latest beta update to 6.9.0-beta22. My cache writes are way down to a much more reasonable ~500 kB/s, and it's still holding from this morning.

     

    It's weird that we have such discrepancies. 

    Agreed, I can't make sense of it.

     

    I think most of you that have the truly extreme write black holes are running things like Plex; my best guess is that these fixes help the issue those dockers have, but not the underlying issue.

     

    I only run very mild dockers (lancache, krusader, mumble, qbittorrent, etc.) that are not actively doing anything right now.

     

    The difference between putting docker/appdata on an XFS array drive vs the BTRFS cache is undeniable though, at around 200-300MB/hour vs 1000-1500MB/hour and climbing in most cases.

    • Like 1
    Link to comment




