• [6.8.3] docker image huge amount of unnecessary writes on cache


    S1dney
    • Solved Urgent

    EDIT (March 9th 2021):

Solved in 6.9 and up. Reformatting the cache to the new partition alignment and hosting docker directly on a cache-only directory brought writes down to a bare minimum.

     

    ###

     

    Hey Guys,

     

First of all, I know that you're all very busy getting version 6.8 out there, something I'm very much waiting on as well. I'm seeing great progress, so thanks so much for that! Furthermore, I'm not expecting this to be at the top of the priority list, but I'm hoping someone on the developer team is willing to invest some time in it (perhaps after the release).

     

    Hardware and software involved:

2 x 1 TB Samsung EVO 860, set up with LUKS encryption in a BTRFS RAID1 pool.

     

    ###

TL;DR (but I'd suggest reading on anyway 😀)

The image file mounted as a loop device is causing massive writes on the cache, potentially wearing out SSDs quite rapidly.

This appears to only be happening on encrypted caches formatted with BTRFS (maybe only in a RAID1 setup, but I'm not sure).

Hosting the Docker files directory on /mnt/cache instead of using the loop device seems to fix this problem.

A possible idea for an implementation is proposed at the bottom.

     

    Grateful for any help provided!

    ###

     

I have written a topic in the general support section (see link below), but I have done a lot of research lately and think I have gathered enough evidence pointing to a bug. I was also able to build (kind of) a workaround for my situation. More details below.

     

So, to see what was actually hammering the cache, I started with all the obvious things, like running a lot of find commands to trace files that were written to every few minutes, and I also used the File Activity plugin. Neither was able to trace down any writes that would explain 400 GB worth of writes a day for just a few containers that aren't even that active.
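
For reference, this is the kind of find command I mean (GNU find; lists anything under the cache modified in the last 10 minutes):

find /mnt/cache -type f -newermt "-10 minutes" 2>/dev/null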

     

Digging further, I moved the docker.img to /mnt/cache/system/docker/docker.img, so directly on the BTRFS RAID1 mountpoint. I wanted to check whether the unRAID FS layer was causing the loop2 device to write this heavily. No luck either.

This did give me a situation I was able to reproduce on a virtual machine though, so I started with a recent Debian install (I know, it's not Slackware, but I had to start somewhere ☺️). I created some vDisks, encrypted them with LUKS, bundled them in a BTRFS RAID1 setup, created the loop device on the BTRFS mountpoint (same as /mnt/cache) and mounted it on /var/lib/docker. I made sure I had the NoCoW flag set on the IMG file, like unRAID does. Strangely, this did not show any excessive writes; iotop showed really healthy values for the same workload (I migrated the docker content over to the VM).
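
Roughly what I did on the VM, as a sketch (the device names are placeholders for whatever the vDisks show up as):

# encrypt both vDisks and open them
cryptsetup luksFormat /dev/vdb
cryptsetup luksFormat /dev/vdc
cryptsetup open /dev/vdb crypt1
cryptsetup open /dev/vdc crypt2

# BTRFS RAID1 across both, mounted where unRAID would mount the cache
mkfs.btrfs -d raid1 -m raid1 /dev/mapper/crypt1 /dev/mapper/crypt2
mkdir -p /mnt/cache
mount /dev/mapper/crypt1 /mnt/cache

# sparse image, NoCoW set while the file is still empty, BTRFS inside, looped onto /var/lib/docker
truncate -s 20G /mnt/cache/docker.img
chattr +C /mnt/cache/docker.img
mkfs.btrfs /mnt/cache/docker.img
mkdir -p /var/lib/docker
mount -o loop /mnt/cache/docker.img /var/lib/docker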

     

After my Debian troubleshooting I went back to the unRAID server, wondering whether the loop device was created weirdly, so I took the exact same steps to create a new image and pointed the settings in the GUI there. Still the same write issues.

     

Finally I decided to take the whole image out of the equation with the following steps (rough shell equivalents after the list):

    - Stopped docker from the WebGUI so unRAID would properly unmount the loop device.

    - Modified /etc/rc.d/rc.docker to not check whether /var/lib/docker was a mountpoint

    - Created a share on the cache for the docker files

    - Created a softlink from /mnt/cache/docker to /var/lib/docker

- Started docker using "/etc/rc.d/rc.docker start"

- Started my Bitwarden containers.
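
The gist of it in shell terms (after commenting out the mountpoint check in /etc/rc.d/rc.docker):

/etc/rc.d/rc.docker stop                  # lets unRAID unmount the loop device cleanly
mkdir -p /mnt/cache/docker                # the share created on the cache
rmdir /var/lib/docker 2>/dev/null         # remove the now-empty mountpoint
ln -s /mnt/cache/docker /var/lib/docker   # softlink so dockerd writes straight to the cache
/etc/rc.d/rc.docker start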

     

Looking at the stats with "iotop -ao" I did not see any excessive writing taking place anymore.

I had the containers running for about 3 hours and got maybe 1 GB of writes total (note that on the loop device this gave me 2.5 GB every 10 minutes!).

     

Now don't get me wrong, I understand why the loop device was implemented. Dockerd is started with options to make it run with the BTRFS driver, and since the image file is formatted with the BTRFS filesystem this works on every setup; it doesn't even matter whether the cache runs XFS, EXT4 or BTRFS, it will just work. In my case I had to point the softlink to /mnt/cache, because pointing it to /mnt/user would not allow me to start using the BTRFS driver (obviously the unRAID filesystem isn't BTRFS). Also, the WebGUI has commands to scrub the filesystem inside the image; everything is based on the assumption that everyone is running docker on BTRFS (which of course they are, because of the image 😁)

I must say that my approach also broke when I changed something in the shares: certain services get restarted, causing docker to be turned off for some reason. No big issue, since it wasn't meant to be a long-term solution, just a way to see whether the loop device was causing the issue, which I think my tests did point out.

     

Now I'm at the point where I would definitely need some developer help. I'm currently keeping nearly all docker containers off all day, because 300-400 GB worth of writes a day is just a BIG waste of expensive flash storage, especially since I've shown that it's not needed at all. It does defeat the purpose of my NAS and SSD cache though, since its main purpose was hosting docker containers while allowing the HDDs to spin down.

     

Again, I'm hoping someone on the dev team acknowledges this problem and is willing to investigate. I did get quite a few hits on the forums and Reddit, but no one has actually pointed out the root cause of the issue.

     

I'm missing the technical know-how to troubleshoot the loop device issues at a lower level, but I have been thinking about possible ways to implement a workaround, like adjusting the Docker settings page to switch off the use of a vDisk and, if all requirements are met (pointing to /mnt/cache and BTRFS formatted), start docker on a share on the /mnt/cache partition instead of using the vDisk.

That way you would still keep all the advantages of the docker.img file (cross filesystem type), and users who don't care about the writes could still use it, but you'd be massively helping out others who are concerned about them.
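
Sketching that switch in rc.docker terms, with hypothetical variable names, just to illustrate the idea:

# hypothetical addition to /etc/rc.d/rc.docker
if [[ $DOCKER_USE_FOLDER == yes && $(findmnt -no FSTYPE /mnt/cache) == btrfs ]]; then
  DOCKER_ROOT=/mnt/cache/system/docker            # run dockerd directly on the pool
else
  mount -o loop "$DOCKER_IMAGE" /var/lib/docker   # current vDisk behaviour
fi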

     

I'm not attaching diagnostic files since they would probably not show anything useful.

Also, if this should have been in feature requests, I'm sorry. But I feel that, since the current solution is misbehaving in terms of writes, it can also be placed in the bug report section.

     

Thanks though for this great product, I have been using it with a lot of joy so far!

I'm just hoping we can solve this one so I can keep all my dockers running without the cache wearing out quickly.

     

    Cheers!

     

    • Like 3
    • Thanks 17



    User Feedback

    Recommended Comments



    1 hour ago, TexasUnraid said:

    /var/lib/docker

    If you are using the docker.img, but not if you are using the folder.

     

    1 hour ago, TexasUnraid said:

    container-id-json.log files

You could add a path to the container, so for example /log (container) is written to /tmp (host). And /tmp is located in RAM, so it does not touch your SSD. This would be a similar trick to Plex RAM transcoding.

     

    Another method would be to define the /log path inside the container as a RAM Disk:

    https://forums.unraid.net/topic/35878-plex-guide-to-moving-transcoding-to-ram/page/12/?tab=comments#comment-894460

     

    Of course "/log" is only an example. You need to check the path where the log files are written to.
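
In docker run terms the two variants would look something like this (paths just examples, untested):

docker run -v /tmp/mycontainer-log:/log ...   # container's /log lands on the host's RAM-backed /tmp

docker run --mount type=tmpfs,destination=/log,tmpfs-size=100000000 ...   # or a dedicated RAM disk inside the container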

     

PS: it could be necessary to rebuild the container (delete / re-add through Apps > Previous Apps), so the content of /log gets deleted (you can't set paths if they already exist in a container).

    Link to comment
8 hours ago, mgutt said:

You could add a path to the container, so for example /log (container) is written to /tmp (host). [...]

     

I know in theory that you can use the direct path with the folder, but the log sat empty for hours when I did that. Switched to the /var folder and the log was populated as expected. Can't explain it, but that's what happened.

     

Good option for situations where you can control the logs. I have no idea what these json.log files are though; they are part of the docker system and I could not find any command to change the path they are written to.

     

You can set the log type to none, but for some reason it breaks the container. The log is stored in the root folder of the container, so I don't think a mount point inside the container would affect anything, as this is lower level than the container.

    Link to comment
    32 minutes ago, TexasUnraid said:

    they are part of the docker system

    Which is the primary reason why Unraid can't quickly "fix" this. They didn't author the docker system.

    Link to comment
    1 hour ago, TexasUnraid said:

    It is stored in the root folder of the container

you can map a specific file instead of a folder too, so there's no need to map the entire folder. It might be useful when the file is at root level inside the container.

    Link to comment
    1 hour ago, jonathanm said:

    Which is the primary reason why Unraid can't quickly "fix" this. They didn't author the docker system.

I know, I am not saying the log files are Unraid's problem. I am simply asking if there is a universal way to deal with it.

    Link to comment
    2 minutes ago, uldise said:

you can map a specific file instead of a folder too, so there's no need to map the entire folder. It might be useful when the file is at root level inside the container.

     

Please explain, are you talking about mapping a file inside the container? The container is not aware of this log; it is the docker system itself that maintains it. After some research it seems to be a log of all the output from the container.

     

My thinking was that I could redirect the log files to a tmp folder with a symlink, but docker complains with this:

     

      

    error can not open /var/lib/docker/containers/112811851ab079fcf45f423e95abc48e466a34f8468c917c7bdd1686d22f6ab4/tmp/test/112811851ab079fcf45f423e95abc48e466a34f8468c917c7bdd1686d22f6ab4-json.log: no such file or directory"}

     

    Why does it stack the paths of the symlink like that instead of just going right to the tmp folder?

     

    I created the link with this command:

     

    ln -sf /tmp/test/112811851ab079fcf45f423e95abc48e466a34f8468c917c7bdd1686d22f6ab4-json.log /var/lib/docker/containers/112811851ab079fcf45f423e95abc48e466a34f8468c917c7bdd1686d22f6ab4/112811851ab079fcf45f423e95abc48e466a34f8468c917c7bdd1686d22f6ab4-json.log

     

Any other ideas that could somehow move the writes to these logs into RAM without docker knowing?

     

    I tried a hard link but it won't work across file systems.

    Link to comment
    24 minutes ago, TexasUnraid said:

    The container is not aware of this log, it is the docker system itself that maintains the log.

if it's not in the container file system, then you can't map it.

i'm talking about simple volume mapping; for example you can map /mnt/user/appdata/container/container.log on the host side to /container.log inside a container.

    Link to comment
1 hour ago, uldise said:

if it's not in the container file system, then you can't map it. [...]

     

Yeah, I thought that was what you were talking about. Sadly that won't help in this case.

     

The easiest option would be to move these log files to a /tmp/ folder and use a symlink, but I can't get it to work for some reason.

     

If I move the log to a lower directory, a symlink works fine, but as soon as I try to move it outside of the normal directory tree, it fails.

     

    When looking at the logs of the failure, it stacks the paths one after the other:

     

    /var/lib/docker/containers/112811851ab079fcf45f423e95abc48e466a34f8468c917c7bdd1686d22f6ab4/tmp/test/112811851ab079fcf45f423e95abc48e466a34f8468c917c7bdd1686d22f6ab4-json.log

     

    Instead of just

     

    /tmp/test/112811851ab079fcf45f423e95abc48e466a34f8468c917c7bdd1686d22f6ab4-json.log

     

I can't find anything helpful on Google either.

     

    The link works fine if I manually cat the file though.

    Link to comment
    4 hours ago, TexasUnraid said:

    I have no idea what these json.log files are though,

    You need to be more clear. Are you talking about these files?

    https://forums.unraid.net/bug-reports/stable-releases/683-unnecessary-overwriting-of-json-files-in-dockerimg-every-5-seconds-r1079/

/var/lib/docker/containers/*/hostconfig.json is updated every 5 seconds with the same content
/var/lib/docker/containers/*/config.v2.json is updated every 5 seconds with the same content except for some timestamps (which shouldn't be part of a config file, I think)

     

    Writes to them can be disabled through --no-healthcheck

     

    If not, what is the content of the files you mean?

     

    EDIT: Ok, it seems these are the general logs:

    https://stackoverflow.com/questions/31829587/docker-container-logs-taking-all-my-disk-space

     

    Docker offers different options to influence the logs:

    https://docs.docker.com/config/containers/logging/configure/

     

    As an example this should disable the logs:

    --log-driver none

     

Another idea could be to raise the buffer, so it collects a large amount of logs before writing them to the SSD:

    --log-opt max-buffer-size=32m

     

    Another interesting thing is this description:

    local	    Logs are stored in a custom format designed for minimal overhead.
    json-file	The logs are formatted as JSON. The default logging driver for Docker.

     

    So "local" seems to produce smaller log files?!

     

Another possible option is maybe "syslog", so it writes the logs to the host's syslog (which is located in RAM) and not inside a json file:

    syslog	Writes logging messages to the syslog facility. The syslog daemon must be running on the host machine.
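
And if you want to keep the logs but cap their size, the options can be combined; something like this (untested, options per the Docker docs):

--log-driver local --log-opt max-size=1m --log-opt max-file=2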

     

    Happy testing ;)

    Link to comment

    I tried the log-driver none option before but it broke the containers for some reason.

     

I could not understand how the buffer option works. As best I could understand it, it doesn't actually buffer the log into memory first and then write it out when it is full, which would have been great.

     

The way I understood it, it is simply a buffer for when the logging driver is overloaded, so that it can catch back up.

     

    Am I wrong in that understanding?

     

I considered syslog but didn't want it filling my unraid log. I know very little about this stuff, but I found a syslog-ng docker container that can accept logs via a TCP connection.

     

This might be the only way to change the location of the log files, although I'm not sure how to handle multiple dockers going to the same syslog.

     

During testing just now, I set it to the syslog-ng settings but stopped the syslog-ng container. This time there was no local log and it didn't complain either; seems it just didn't log at all.

     

I am curious, where are the unraid docker logs pulled from (when you click Logs in the GUI)?

    Link to comment
    2 hours ago, TexasUnraid said:

    The way I understood it is that it is simply a buffer for when the logging driver is overloaded so that it can catch back up.

     

    Yes, I think you are right. What about this:

    --log-opt mode=non-blocking --log-opt max-buffer-size=1K

     

Maybe it "loses" the logs because of this 😅

    When the buffer is full and a new message is enqueued, the oldest message in memory is dropped. Dropping messages is often preferred to blocking the log-writing process of an application.

     

Or maybe it's possible to create empty log files?!

    --log-opt max-size=0k

     

    2 hours ago, TexasUnraid said:

    via tcp connection.

    Sounds like a lot of overhead.

     

     

    Link to comment
5 minutes ago, mgutt said:

Sounds like a lot of overhead.

Indeed it would be, but it was just a test to see what is possible.

     

Interestingly, while syslog-ng does indeed work and log properly, the more interesting option for this situation is that by simply entering a null IP address the containers do not error out, and they also do not seem to be storing logs anywhere. The syslog-ng container is shut down and there are no json.log files in the container folders.

     

I restarted the inotify logging and sure enough, no more log activity for those log files; it vastly reduced the number of entries in total.

     

    The nice thing about this option is that it seems to be universal to any problem container.

     

Now I am trying to narrow down the remaining entries. Looks like the healthcheck files you listed above exist in a few containers, so I'm going to add --no-healthcheck to those and just keep working my way along.

     

Is there any downside to --no-healthcheck?

    Link to comment
    18 hours ago, TexasUnraid said:

Is there any downside to --no-healthcheck?

Some people use external tools to auto-check the running state of docker containers. The healthcheck usually runs a simple "heartbeat" shell script to verify the docker is not only in a running state, but also really working. In Unraid you can see this state in the docker container overview as well. But finally I think this is only important for business usage, if you really need to know that everything works as expected.
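
For reference, the heartbeat can also be tuned per container instead of disabled; docker run has flags that override the image's builtin healthcheck (values here just an example):

--health-cmd='curl -f http://localhost/ || exit 1' --health-interval=5m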

    Link to comment

Makes sense. Looks like I have disabled most of the docker logs at this point, got to give it a day to see what effect it has on writes. Then I will start disabling healthchecks and see what improvement that brings.

    • Like 3
    Link to comment

It's only been one day and it does vary a bit day to day, but it looks like disabling those logs dropped the writes from 75-85 GB/day to 50 GB/day.

     

    Not bad really.

     

    Think I will give it one more day and then disable the healthchecks and see how that does.

    • Thanks 2
    Link to comment

OK, another day has passed; this time it was 47 GB of writes, so I can confirm that disabling those logs (I only cut the logs from the worst offenders, about half are still logging) cut the writes by ~35%.

     

Now I'm going to disable the worst healthchecks, or possibly change the timing. I noticed the option to set the healthcheck interval and am thinking about raising it to 30-60 mins; that should cause minimal writes and still check in often enough for most of the dockers.

     

I am curious, the inotifywait log has grown to 100 MB and is really hard to view. I tried using sort | uniq but it doesn't seem to be removing all the duplicate lines, as I still see a lot. Is there a better way to remove duplicates?
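
The closest I've gotten is counting the repeats instead of removing them, after cutting off the leading timestamp column (the path and field number here are guesses, adjust for the actual log format):

cut -d' ' -f2- /path/to/inotify.log | sort | uniq -c | sort -rn | head -40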

    Link to comment
    On 6/24/2021 at 3:06 PM, TexasUnraid said:

    Looks like I have disabled most of the docker logs at this point

    How did you disable the logs exactly?

     

    7 hours ago, TexasUnraid said:

    worst offenders

    and which dockers were the worst offenders?

    Link to comment
    11 minutes ago, vakilando said:

    How did you disable the logs exactly?

     

    and which dockers were the worst offenders?

     

I explained it earlier but might not have posted the exact command. Basically you set it to syslog and then a remote server with an invalid IP address. Put this in the "Extra Parameters:" section of the docker settings.

     

    --log-driver syslog --log-opt syslog-address=udp://192.168.1.100:514

     

    This seems to work to disable the internal logging of the docker without breaking it.

     

The worst offenders so far seem to be the VPN-enabled containers and the binhex containers, for some reason.

     

Healthchecks I am not sure about yet; I have not had time to dig into that.

    Edited by TexasUnraid
    • Like 1
    Link to comment

    Thank you! I'll try this out.

I disabled the healthcheck on almost all containers and have had no problems, nor noticed any changes.

    Link to comment

OK, just putting this here as notes to myself and anyone else who finds this.

     

So far the commands I am using to reduce writes start with disabling the low-level logging from the docker engine. When possible I'm using:

     

    --log-driver none

     

This breaks some containers though, and also disables the unraid GUI logs, so I'm using this for the picky containers or the ones I want to be able to see the logs for in unraid:

     

    --log-driver syslog --log-opt syslog-address=udp://127.0.0.1:541

     

I used my unraid server's IP so that I could spin up a syslog-ng container and easily capture the logs, but that should work as well.

     

I am then using this to mostly disable the health checks:

     

     --health-interval=60m

     

It extends the health checks to 60 mins; I figure they should cause virtually no noticeable writes like that and still function well enough for home use.

     

Slowly working my way through the containers using the watch commands mgutt posted on the last page. Netdata was doing a lot of logging and the only way to stop it was with --log-driver none.
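
For anyone who can't find that page: the watch command in question is inotifywait-based (going by the log name above); something along these lines, though not necessarily mgutt's exact flags:

inotifywait -mr -e create,modify,attrib,moved_from,moved_to /mnt/cache >> /tmp/inotify.log 2>/dev/null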

    Edited by TexasUnraid
    • Thanks 1
    Link to comment

    Really going down the rabbit hole now, trying to eliminate all unnecessary writes possible lol.

     

I have been tweaking things based on the activity logs, slowly knocking out the big hitters. It has made a BIG difference in the logs themselves: I was racking up ~75 MB of activity logs a day at first, and now I got less than 1 MB of activity logs overnight!

     

So far I have mostly just disabled the docker logging of most of the containers using the above commands, and extended the healthchecks of a few containers to 60 mins, but only for like 2 problem containers so far.

     

The log is still seeing a lot of activity from /btrfs/subvolumes/*id*/tmp in various containers. Obviously the program inside thinks it is writing an internal log to RAM, but by default docker does not map /tmp to RAM; instead it maps to the disk.

     

Found another command that should help quite a bit in this case though, testing it out now on the worst offenders. It's added to the Extra Parameters in the docker config as well.

     

    --mount type=tmpfs,destination=/tmp,tmpfs-mode=1777,tmpfs-size=256000000

     

This sets up a 256 MB RAM drive (it only uses the RAM as needed, so I oversized it a bit) at the /tmp folder inside the container. Seems to be working on the first container I am testing it on.
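
The shorter --tmpfs flag should do the same thing, if you prefer (note the size is in bytes either way):

--tmpfs /tmp:rw,size=256000000,mode=1777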

     

I just restarted the server, so I've got to wait for the activity logs to start back up and keep testing from there.

     

I can confirm that the healthcheck interval works; I only see that activity every 60 mins now, which is OK for me.

    Edited by TexasUnraid
    • Like 1
    Link to comment
    1 minute ago, BRiT said:

    Why not map the existing unraid host /tmp to the dockers' /tmp?

     

From my reading it doesn't sound like this works, or at the least it is discouraged. Everyone recommended setting up a dedicated RAM drive inside the container. Makes sense: if it had access to the system RAM drive it could cause system instability if it, for example, filled up the RAM, whereas inside the container, with a limit on the size, the damage would be contained.

     

I don't know the reasons, only that they recommended doing it this way, and it seems to be working well: no more writes to /tmp inside the containers.

    Link to comment

Ah, I see, normal unRaid systems don't have their own tmpfs set up for unRaid's own /tmp.

     

They only have it set up for the following: /run, /dev/shm, /var/log. It would be nice if they also used it for /tmp by default.

    Link to comment

Well, after some trial and error I am getting to the point where the activity log is actually staying empty, so I have to wait and see what causes the next writes!

     

Does anyone know how to figure out which container each /var/lib/docker/volumes/*folder* connects to?

     

Seems a lot of the remaining writes are happening in those folders, but I can't figure out a way to connect them to a specific container. Luckily I don't think they cause a lot of writes, but while I'm in this deep I might as well chase that rabbit as well.
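
One angle I still need to try is asking docker itself which named volumes each container mounts, something like this (untested):

docker ps -aq | xargs docker inspect --format '{{ .Name }}: {{ range .Mounts }}{{ .Name }} {{ end }}'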

     

    Link to comment




