• [6.8.3] docker image huge amount of unnecessary writes on cache


    S1dney
    • Solved Urgent

    EDIT (March 9th 2021):

Solved in 6.9 and up. Reformatting the cache to the new partition alignment and hosting Docker directly in a cache-only directory brought writes down to a bare minimum.

     

    ###

     

    Hey Guys,

     

First of all, I know you're all very busy getting version 6.8 out there, something I'm very much waiting for as well. I'm seeing great progress, so thanks so much for that! I also don't expect this to be at the top of the priority list, but I'm hoping someone on the development team is willing to look into it (perhaps after the release).

     

    Hardware and software involved:

2 x 1TB Samsung EVO 860, set up with LUKS encryption in a BTRFS RAID1 pool.

     

    ###

TLDR (but I'd suggest reading on anyway 😀)

The image file mounted as a loop device is causing massive writes on the cache, potentially wearing out SSDs quite rapidly.

This appears to only happen on encrypted caches formatted with BTRFS (maybe only in a RAID1 setup, but I'm not sure).

    Hosting the Docker files directory on /mnt/cache instead of using the loopdevice seems to fix this problem.

A possible idea for an implementation is proposed at the bottom.

     

    Grateful for any help provided!

    ###

     

I have written a topic in the general support section (see link below), but I have done a lot of research lately and think I have gathered enough evidence pointing to a bug. I was also able to build a (kind of) workaround for my situation. More details below.

     

So, to see what was actually hammering the cache, I started with all the obvious things, like using a lot of find commands to trace files that were being written to every few minutes, and I also used the File Activity plugin. Neither was able to trace down any writes that would explain 400 GB worth of writes a day for just a few containers that aren't even that active.
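
For reference, the kind of find command I mean looks roughly like this (the path is just an example of where I looked):

find /mnt/cache -type f -mmin -10 2>/dev/null    # files modified in the last 10 minutes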

     

Digging further, I moved the docker.img to /mnt/cache/system/docker/docker.img, so directly on the BTRFS RAID1 mountpoint. I wanted to check whether the unRAID FS layer was causing the loop2 device to write this heavily. No luck either.

This did give me a setup I was able to reproduce in a virtual machine though, so I started with a recent Debian install (I know, it's not Slackware, but I had to start somewhere ☺️). I created some vDisks, encrypted them with LUKS, bundled them in a BTRFS RAID1 setup, created the loop device on the BTRFS mountpoint (same as on the cache) and mounted it on /var/lib/docker. I made sure I had the NoCOW flag set on the IMG file, like unRAID does. Strangely this did not show any excessive writes; iotop shows really healthy values for the same workload (I migrated the Docker content over to the VM).
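
Roughly, the VM setup looked like this (device names and the image size are assumptions, and the exact mount options unRAID uses may differ):

# Encrypt both vDisks with LUKS and open them
cryptsetup luksFormat /dev/vdb && cryptsetup open /dev/vdb crypt1
cryptsetup luksFormat /dev/vdc && cryptsetup open /dev/vdc crypt2

# Bundle them into a BTRFS RAID1 pool and mount it
mkfs.btrfs -d raid1 -m raid1 /dev/mapper/crypt1 /dev/mapper/crypt2
mkdir -p /mnt/cache
mount /dev/mapper/crypt1 /mnt/cache

# Create the image with the NoCOW attribute set (like unRAID does),
# format it with BTRFS and loop-mount it as Docker's data directory
mkdir -p /mnt/cache/system/docker /var/lib/docker
touch /mnt/cache/system/docker/docker.img
chattr +C /mnt/cache/system/docker/docker.img
truncate -s 20G /mnt/cache/system/docker/docker.img
mkfs.btrfs /mnt/cache/system/docker/docker.img
mount -o loop /mnt/cache/system/docker/docker.img /var/lib/docker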

     

After my Debian troubleshooting I went back to the unRAID server, wondering whether the loop device was being created in a weird way, so I took the exact same steps to create a new image and pointed the settings in the GUI there. Still the same write issues.

     

Finally I decided to take the whole image out of the equation and took the following steps (sketched as commands below the list):

    - Stopped docker from the WebGUI so unRAID would properly unmount the loop device.

    - Modified /etc/rc.d/rc.docker to not check whether /var/lib/docker was a mountpoint

    - Created a share on the cache for the docker files

    - Created a softlink from /mnt/cache/docker to /var/lib/docker

- Started docker using "/etc/rc.d/rc.docker start"

- Started my Bitwarden containers.
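
Roughly, in commands (paths and share names as described above; the rc.docker edit itself was done by hand and isn't shown):

/etc/rc.d/rc.docker stop            # or stop Docker from the WebGUI first

# Host the Docker files directly on the cache and point /var/lib/docker at them
mkdir -p /mnt/cache/docker
rmdir /var/lib/docker 2>/dev/null   # assumes the old mountpoint is left behind as an empty directory
ln -s /mnt/cache/docker /var/lib/docker

/etc/rc.d/rc.docker start           # with the mountpoint check removed from rc.docker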

     

Looking into the stats with "iotop -ao" I did not see any excessive writing taking place anymore.

I had the containers running for about 3 hours and maybe got 1 GB of writes total (note that with the loop device this gave me 2.5 GB every 10 minutes!).
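
For anyone wanting to repeat the measurement, those numbers come from iotop's accumulated view:

iotop -ao    # -a: accumulated I/O since start, -o: only show processes actually doing I/O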

     

Now don't get me wrong, I understand why the loop device was implemented. Dockerd is started with options to make it run with the BTRFS driver, and since the image file is formatted with the BTRFS filesystem this works on every setup; it doesn't even matter whether it runs on XFS, EXT4 or BTRFS, it will just work. In my case I had to point the softlink to /mnt/cache, because pointing it to /mnt/user would not allow me to start using the BTRFS driver (obviously the unRAID user filesystem isn't BTRFS). Also, the WebGUI has commands to scrub the filesystem inside the image; everything is based on the assumption that everyone is running Docker on BTRFS (which of course they are, because of the image 😁)
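
For context, this is roughly what running dockerd against a BTRFS-backed data root looks like (the exact options unRAID's rc.docker passes may differ):

# /var/lib/docker has to live on a BTRFS filesystem for the btrfs storage driver to work
dockerd --storage-driver=btrfs --data-root=/var/lib/docker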

I must say that my approach also broke when I changed something in the shares; certain services get restarted, causing Docker to be turned off for some reason. No big issue, since it wasn't meant to be a long-term solution, just a way to see whether the loop device was causing the issue, which I think my tests did point out.

     

Now I'm at the point where I would definitely need some developer help. I'm currently keeping nearly all Docker containers off all day, because 300-400 GB worth of writes a day is just a BIG waste of expensive flash storage, especially since I've shown that it's not needed at all. It does defeat the purpose of my NAS and SSD cache though, since its main purpose was hosting Docker containers while allowing the HDs to spin down.

     

Again, I'm hoping someone on the dev team acknowledges this problem and is willing to look into it. I did get quite a few hits on the forums and Reddit, but no one actually pointed out the root cause of the issue.

     

I'm missing the technical know-how to troubleshoot the loop device issues on a lower level, but I have been thinking about possible ways to implement a workaround, like adjusting the Docker Settings page to switch off the use of a vDisk and, if all requirements are met (pointing to /mnt/cache and BTRFS formatted), starting Docker on a share on the /mnt/cache partition instead of using the vDisk.

In this way you would still keep all the advantages of the docker.img file (it works across filesystem types), and users who don't care about the writes could still use it, but you'd be massively helping out others who are concerned about them.
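
A very rough sketch of what such a check in the startup logic could look like (purely illustrative; the variable name and the fallback are made up, this is not unRAID's actual rc.docker code):

DOCKER_DIR=/mnt/cache/docker   # hypothetical value coming from the Docker Settings page

# Only skip the vDisk when the target really is a BTRFS path
if [ "$(stat -f -c %T "$DOCKER_DIR" 2>/dev/null)" = "btrfs" ]; then
    ln -sfn "$DOCKER_DIR" /var/lib/docker    # run Docker directly on the cache share
else
    echo "Requirements not met, falling back to the docker.img vDisk"
    # ... existing loop-mount logic ...
fi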

     

I'm not attaching diagnostic files since they would probably not show anything relevant.

Also, if this should have gone in feature requests, I'm sorry. But I feel that, since the current solution misbehaves in terms of writes, it can also be placed in the bug report section.

     

Thanks though for this great product, I have been using it with a lot of joy so far!

I'm just hoping we can solve this one so I can keep all my dockers running without the cache wearing out quickly.

     

    Cheers!

     

    • Like 3
    • Thanks 17



    User Feedback

    Recommended Comments



I was wrong, it is around 70 GB right now. lol

     

Several of them are pretty hefty; hard to know which ones. I tried running the container size tool but it just sat thinking for longer than I wanted to wait.

     

Most of the dockers I get are GUI-based, so they can be pretty big. I try my hardest to avoid the CLI if at all possible; that's part of why I collect all the GUI dockers I can find.

    Link to comment
    4 hours ago, TexasUnraid said:

    I tend to install any docker that looks interesting just to have it on hand.

     

    I can't think of any good reason to do this. It's just digital hoarding. All the crap you're "collecting" is readily available and downloadable anytime you want it.

    Link to comment
    6 minutes ago, grigsby said:

     

    I can't think of any good reason to do this. It's just digital hoarding. All the crap you're "collecting" is readily available and downloadable anytime you want it.

     

But then I have to track it down again, which is easier said than done in many cases: finding a GUI that works properly and getting it set up properly.

     

It also sounds worse than it is. I think I have around 60 dockers right now; all of them serve a purpose, they just don't need to be running all the time.

     

Most of them are basic GUI tools that eliminate the need to use the CLI for some basic task, worth every GB. The insistence on the CLI is what kept me from using Linux for the last 20 years.

     

Plus, I have 112 TB of space on my server, what's 100 GB for dockers? I have regretted not downloading something when I had the chance, but never once regretted downloading it and not using it. I can always delete it if I run out of room.

    Edited by TexasUnraid
    Link to comment

It's nothing to redownload them. It takes a couple of seconds of your time; the computer time is limited by your modem speed.

     

    Apps - Previous Apps.  Within that section you can also delete whatever you no longer wish to ever use again.

    Link to comment

I am aware of that, I was simply curious whether it would pick up where it left off if I put the image back. It saves a few hours of downloading and is simpler.

    Link to comment

Yeah, because you're not touching the appdata (I do it about once a month to verify it works). Only if you've got custom networks created via the command line do you have to recreate them by hand.

    Link to comment

Cool, that's what I wanted to know.

     

    No custom networks that I know of unless sending one container through another (VPN) counts.

    Link to comment
    14 hours ago, mgutt said:

    This is something which needs to be solved by docker:

    https://github.com/moby/moby/issues/42200

I get that, but considering it was listed as 'resolved' in the 6.9 update - if it's still an issue, then there should be a clear migration path that doesn't require users to faff around with moving and changing Docker to remove the issue from Unraid's point of view while working around the core Docker issue.

     

Instead, I, like others, have rebuilt the pool to a new partition map as suggested, incurring the time cost of doing so, but the issue persists regardless, so it's ultimately been a pointless endeavor that would be better described as a 'workaround'.

    Link to comment
    27 minutes ago, boomam said:

I get that, but considering it was listed as 'resolved' in the 6.9 update - if it's still an issue

The part that is related to Unraid was solved. Nobody can solve the write amplification of BTRFS, and Unraid can't influence how Docker stores status updates. Docker decided to save this data in a file instead of in RAM, which causes the writes. Feel free to like / comment on the issue. Maybe it will be solved sooner if the devs see how many people are suffering from wearing out their SSDs.

    Link to comment
    1 hour ago, mgutt said:

The part that is related to Unraid was solved. Nobody can solve the write amplification of BTRFS, and Unraid can't influence how Docker stores status updates. Docker decided to save this data in a file instead of in RAM, which causes the writes. Feel free to like / comment on the issue. Maybe it will be solved sooner if the devs see how many people are suffering from wearing out their SSDs.

    You are missing the other part of my point that lends greater context to it ;-)

     

     

Anyway, do we know for sure yet whether the folder-based method is a viable solution, or just another workaround?

     

    Edited by boomam
    Link to comment

So after letting things settle down somewhat, it looks like my average writes per day to the appdata SSD work out like this:

     

XFS = ~20-25 GB/day

BTRFS single drive = 75-85 GB/day

     

Roughly 3x the writes. I wish there was a way to narrow down which Docker container is causing most of the writes.

     

I am torn on whether I will go back to XFS or stick with BTRFS. I really like the idea of snapshots, but then again it is not super important, as I take weekly backups of appdata anyway.

    Link to comment
    14 hours ago, TexasUnraid said:

     

XFS = ~20-25 GB/day

BTRFS single drive = 75-85 GB/day

    Both folder or docker.img?

    Link to comment
    1 hour ago, mgutt said:

    Both folder or docker.img?

     

Both are using an image.

     

    I might try moving to a folder if I have some time.

    Link to comment

Just dropping this note here, mostly to remind myself when this part of the test started lol.

     

Just converted Docker from a BTRFS image to a folder on a BTRFS-formatted SSD. Reinstalling was not as simple as it was made to sound, as there were a LOT of previous apps I did not want to reinstall lol (I tend to keep apps in there on purpose as a way of bookmarking them).

     

The server should not be doing much in the coming days, so I can hopefully get some usable numbers quicker this time.

     

    Link to comment

While I am doing all this, is there a way to tell which files are being written to by Docker? I would like to see which dockers are causing the most writes, so I know whether they need a health check disabled or I should look for another option.

     

    It does not seem like the file activity plugin lists appdata / docker?

    Link to comment
    23 hours ago, TexasUnraid said:

    is there a way to tell which files are being written to by docker?

     

    You could start with this, which returns the 100 most recent files of the docker directory:

    find /mnt/user/system/docker -type f -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- | head -n100n

     

    Another method would be to log all file changes:

    inotifywait -e create,modify,attrib,moved_from,moved_to --timefmt %c --format '%T %_e %w %f' -mr /mnt/user/system/docker > /mnt/user/system/recent_modified_files_$(date +"%Y%m%d_%H%M%S").txt
    

     

More about --no-healthcheck and these commands:

    https://forums.unraid.net/bug-reports/stable-releases/683-unnecessary-overwriting-of-json-files-in-dockerimg-every-5-seconds-r1079/?tab=comments#comment-10983

     

    Link to comment
    2 hours ago, mgutt said:

     

    You could start with this, which returns the 100 most recent files of the docker directory:

    
    find /mnt/user/system/docker -type f -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- | head -n100n

     

    Another method would be to log all file changes:

    
    inotifywait -e create,modify,attrib,moved_from,moved_to --timefmt %c --format '%T %_e %w %f' -mr /mnt/user/system/docker > /mnt/user/system/recent_modified_files_$(date +"%Y%m%d_%H%M%S").txt
    

     

More about --no-healthcheck and these commands:

    https://forums.unraid.net/bug-reports/stable-releases/683-unnecessary-overwriting-of-json-files-in-dockerimg-every-5-seconds-r1079/?tab=comments#comment-10983

     

     

Thanks, good info there. Didn't think about the fact that I can see the individual Docker files now.

     

    I tried the first command but got this error:

     

    head: invalid number of lines: ‘100n’

     

    I tried removing the last n but it just sat thinking for half an hour before I killed it?

    Link to comment
    On 6/16/2021 at 3:35 PM, mgutt said:

     

     

    Another method would be to log all file changes:

    
    inotifywait -e create,modify,attrib,moved_from,moved_to --timefmt %c --format '%T %_e %w %f' -mr /mnt/user/system/docker > /mnt/user/system/recent_modified_files_$(date +"%Y%m%d_%H%M%S").txt
    

     

     

     

I tried this command as well; it worked for a long time and said it set up the watches, but after a few hours the log is still empty?

     

    I know for a fact there have been writes to the docker images in this time?

    Link to comment
    4 hours ago, TexasUnraid said:

    after a few hours

    Was the terminal open in this time? After closing the terminal, the watch process is killed as well.

     

If you want long-term monitoring, you could add " &" at the end of the command to run it permanently in the background, and later you could kill the process with the following command:

    pkill -xc inotifywait

     

    Are you using the docker.img? The command can't monitor file changes inside the docker.img. If you want to monitor them, you need to change the path to "/var/lib/docker".
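
Put together, a long-running variant could look like this (same event list as above, path switched to /var/lib/docker; nohup added here so the watch also survives the terminal being closed):

nohup inotifywait -e create,modify,attrib,moved_from,moved_to --timefmt %c --format '%T %_e %w %f' \
  -mr /var/lib/docker > /mnt/user/system/recent_modified_files_$(date +"%Y%m%d_%H%M%S").txt 2>/dev/null &

# Stop it again later
pkill -xc inotifywait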

    Link to comment
    6 hours ago, mgutt said:

    Was the terminal open in this time? After closing the terminal, the watch process is killed as well.

     

If you want long-term monitoring, you could add " &" at the end of the command to run it permanently in the background, and later you could kill the process with the following command:

    
    
    pkill -xc inotifywait

     

    Are you using the docker.img? The command can't monitor file changes inside the docker.img. If you want to monitor them, you need to change the path to "/var/lib/docker".

     

I ran the command from inside User Scripts so I would not have to keep a terminal open, but yes, it was running all this time; I just killed it a few minutes ago.

     

    Still using the docker folder.

     

Really odd that it is not working; the inotifywait was set up properly, as checking for existing inotifywait processes shows this one as active, yet the log is still empty.

     

    I will try running it from a direct terminal and see what happens.

     

I was not aware that adding & makes it run in the background. So simply adding a single & to the end of a command puts it in the background and the terminal can be closed? I knew that && would run the next command if the first completed successfully, but this is quite useful as well.

    Edited by TexasUnraid
    Link to comment

    Ok, tried it again in an open terminal window and still get an empty log file after several hours:

     

    inotifywait -e create,modify,attrib,moved_from,moved_to --timefmt %c --format '%T %_e %w %f' -mr /mnt/user/system/docker > /mnt/user/system/recent_modified_files_$(date +"%Y%m%d_%H%M%S").txt &
    [1] 28515
    Setting up watches.  Beware: since -r was given, this may take a while!
    Watches established.

     

    Link to comment

    So it has been a few days now and the results are pretty consistent.

     

BTRFS formatted drive + BTRFS image = 75-85 GB/day

BTRFS drive + Docker folder = 60-65 GB/day

XFS drive with BTRFS image = 20-25 GB/day

     

Before I switch back to the image (I might try XFS, but I don't expect much difference from the folder), I would really like to figure out which containers are causing the writes.

     

    Any idea why the inotifywait is not working?

    Link to comment

OK, trying to troubleshoot the inotifywait and I'm totally lost.

     

    I kept breaking it down and testing it by manually editing a file but it never once logged a single thing.

     

    I now have it down to

     

    inotifywait /mnt/user/Temp/Temp/Test.txt

     

    Yet no matter what I do to the file, it never outputs a thing?

     

    How is this even possible?

     

    I need to move back to the image but really want to see what is causing the writes before I do.

    Link to comment

OK, after a lot of testing and troubleshooting, I finally narrowed it down to the command needing to be run against the /var/lib/docker folder and NOT the user folder.

     

    Now it is logging properly.
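
For reference, the variant that finally produced output was simply the earlier watch pointed at the real Docker data directory:

inotifywait -e create,modify,attrib,moved_from,moved_to --timefmt %c --format '%T %_e %w %f' \
  -mr /var/lib/docker > /mnt/user/system/recent_modified_files_$(date +"%Y%m%d_%H%M%S").txt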

     

It seems the vast majority of the activity is <container-id>-json.log files; interestingly, they seem to be focused on the binhex dockers.

     

Anyone know what these logs are for? Are they needed? Any way to reduce or stop them?

     

I tried a command I saw online, but the container would throw an error about not being able to load the logging driver and exit.

     

    --log-driver none
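
For reference, these are standard docker run options; instead of disabling logging entirely, the default json-file driver can also just be capped (the image name below is only a placeholder, the sizes are arbitrary):

# Disable container logging completely (what was attempted above)
docker run -d --log-driver none some/image

# Or keep json-file logging but limit how much it writes
docker run -d --log-driver json-file --log-opt max-size=10m --log-opt max-file=3 some/image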

     

    Edited by TexasUnraid
    Link to comment





  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.