• [6.8.3] docker image huge amount of unnecessary writes on cache


    S1dney
    • Solved Urgent

    EDIT (March 9th 2021):

    Solved in 6.9 and up. Reformatting the cache to the new partition alignment and hosting Docker directly in a cache-only directory brought writes down to a bare minimum.

     

    ###

     

    Hey Guys,

     

    First of all, I know that you're all very busy getting version 6.8 out there, something I'm very much waiting on as well. I'm seeing great progress, so thanks so much for that! I'm not expecting this to be at the top of the priority list, but I'm hoping someone on the developer team is willing to investigate (perhaps after the release).

     

    Hardware and software involved:

    2 x 1TB Samsung EVO 860, set up with LUKS encryption in a BTRFS RAID1 pool.

     

    ###

    TLDR (but I'd suggest reading on anyway 😀)

    The image file mounted as a loop device is causing massive writes on the cache, potentially wearing out SSDs quite rapidly.

    This appears to happen only on encrypted caches formatted with BTRFS (maybe only in a RAID1 setup, but I'm not sure).

    Hosting the Docker files directory on /mnt/cache instead of using the loop device seems to fix this problem.

    A possible idea for implementation is proposed at the bottom.

     

    Grateful for any help provided!

    ###

     

    I have written a topic in the general support section (see link below), but I have done a lot of research lately and think I have gathered enough evidence pointing to a bug. I was also able to build a (kind of) workaround for my situation. More details below.

     

    So to see what was actually hammering the cache I started with all the obvious things, like using a lot of find commands to trace files that were written to every few minutes, and I also used the File Activity plugin. Neither was able to trace down any writes that would explain 400 GB worth of writes a day for just a few containers that aren't even that active.
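    For reference, this is the kind of find invocation used for that tracing (the path and time window below are illustrative, my own sketch rather than the exact commands from the post):

```shell
# List files under a directory that were modified within the last 10 minutes.
# GNU find's -newermt accepts relative date strings like '-10 minutes'.
recent_writes() {
    find "$1" -type f -newermt '-10 minutes' 2>/dev/null
}

# On the server: recent_writes /mnt/cache
```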

     

    Digging further, I moved the docker.img to /mnt/cache/system/docker/docker.img, so directly on the BTRFS RAID1 mountpoint. I wanted to check whether the unRAID FS layer was causing the loop2 device to write this heavily. No luck either.

    This gave me a situation I was able to reproduce in a virtual machine though, so I started with a recent Debian install (I know, it's not Slackware, but I had to start somewhere ☺️). I created some vDisks, encrypted them with LUKS, bundled them in a BTRFS RAID1 setup, created the loop device on the BTRFS mountpoint (same as /mnt/cache) and mounted it on /var/lib/docker. I made sure I had the NoCoW flag set on the IMG file like unRAID does. Strangely this did not show any excessive writes; iotop shows really healthy values for the same workload (I migrated the docker content over to the VM).

     

    After my Debian troubleshooting I went back to the unRAID server, wondering whether the loop device was created weirdly, so I took the exact same steps to create a new image and pointed the settings from the GUI there. Still the same write issues.

     

    Finally I decided to put the whole image out of the equation and took the following steps:

    - Stopped docker from the WebGUI so unRAID would properly unmount the loop device.

    - Modified /etc/rc.d/rc.docker to not check whether /var/lib/docker was a mountpoint

    - Created a share on the cache for the docker files

    - Created a softlink from /mnt/cache/docker to /var/lib/docker

    - Started docker using "/etc/rc.d/rc.docker start"

    - Started my Bitwarden containers.
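    Sketched as shell commands, the bypass above looks roughly like this (the helper name and share path are my own; back up the docker data first, this deletes the old mountpoint contents):

```shell
#!/bin/bash
# Hypothetical helper: point the docker data directory at a cache share
# via a softlink instead of the loop-mounted docker.img.
link_docker_share() {
    local share="$1" target="$2"
    mkdir -p "$share"          # share on the cache for the docker files
    rm -rf "$target"           # clear the old /var/lib/docker mountpoint
    ln -s "$share" "$target"   # softlink, e.g. /mnt/cache/docker -> /var/lib/docker
}

# On the server, after stopping Docker from the WebGUI:
#   link_docker_share /mnt/cache/docker /var/lib/docker
#   /etc/rc.d/rc.docker start
```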

     

    Looking at the stats with "iotop -ao" I did not see any excessive writing taking place anymore.

    I had the containers running for about 3 hours and got maybe 1 GB of writes total (note that on the loop device this was 2.5 GB every 10 minutes!)
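    Extrapolating both measurements to a full day makes the gap obvious (the arithmetic below is my addition, not from the original post):

```shell
# 2.5 GB per 10 minutes vs. 1 GB over 3 hours, extrapolated to GB per day.
awk 'BEGIN {
    printf "loop device:      %.0f GB/day\n", 2.5 * 6 * 24  # 6 ten-minute slots per hour
    printf "direct directory: %.0f GB/day\n", 1.0 / 3 * 24  # 1 GB over 3 hours
}'
# -> 360 GB/day vs. 8 GB/day, consistent with the 300-400 GB/day seen on the server.
```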

     

    Now don't get me wrong, I understand why the loop device was implemented. Dockerd is started with options to make it run with the BTRFS driver, and since the image file is formatted with the BTRFS filesystem this works on every setup; it doesn't even matter whether the cache runs XFS, EXT4 or BTRFS, it will just work. In my case I had to point the softlink to /mnt/cache, because pointing it at /mnt/user would not allow docker to start with the BTRFS driver (obviously the unRAID user share filesystem isn't BTRFS). Also, the WebGUI has commands to scrub the filesystem inside the image; it's all based on the assumption that everyone is running docker on BTRFS (which of course they are, because of the image 😁)

    I must say that my approach also broke when I changed something in the shares: certain services get restarted, causing docker to be turned off for some reason. No big issue, since it wasn't meant as a long-term solution, just to see whether the loop device was causing the issue, which I think my tests did point out.

     

    Now I'm at the point where I would definitely need some developer help. I'm currently keeping nearly all docker containers off all day, because 300-400 GB worth of writes a day is just a BIG waste of expensive flash storage, especially since I've pointed out that it's not needed at all. It does defeat the purpose of my NAS and SSD cache though, since its main purpose was hosting docker containers while allowing the HDs to spin down.

     

    Again, I'm hoping someone on the dev team acknowledges this problem and is willing to investigate. I did get quite a few hits on the forums and Reddit without anyone actually pointing out the root cause of the issue.

     

    I'm missing the technical know-how to troubleshoot the loop device issues at a lower level, but I have been thinking about possible ways to implement a workaround, like adjusting the Docker settings page to switch off the use of a vDisk and, if all requirements are met (pointing to /mnt/cache and BTRFS formatted), start docker on a share on the /mnt/cache partition instead of using the vDisk.

    That way you would still keep all the advantages of the docker.img file (cross-filesystem type), and users who don't care about writes could still use it, but you'd be massively helping out others who are concerned about these writes.
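    The requirements check could be as simple as something like this (a sketch of my own; the function name and checks are hypothetical, not unRAID code):

```shell
# Return success only if the chosen docker directory lives on /mnt/cache
# and the underlying filesystem is BTRFS; otherwise fall back to the vDisk.
use_directory_instead_of_image() {
    local dir="$1"
    case "$dir" in /mnt/cache/*) ;; *) return 1 ;; esac   # must be on the cache
    [ "$(stat -f -c %T "$dir" 2>/dev/null)" = "btrfs" ]   # must be BTRFS-backed
}
```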

     

    I'm not attaching diagnostic files since they would probably not show anything relevant.

    Also, if this should have been in feature requests, I'm sorry. But I feel that, since the current solution is misbehaving in terms of writes, it could also be placed in the bug report section.

     

    Thanks for this great product, by the way; I have been using it with a lot of joy so far!

    I'm just hoping we can solve this one so I can keep all my dockers running without the cache wearing out quickly.

     

    Cheers!

     




    User Feedback

    Recommended Comments



    OK, solved the Ghost container issue in two ways:

    One was caused by a bad theme; changing to 'stock' removed the error and replaced it with normal info messages. Still a lot of them, but no errors anymore.

    The workaround was to move the log folder in the container into memory (/tmp).

    Link to comment

    By default /tmp still writes to disk unless you added the command to mount a ramdisk, which I assume you did since you say it is fixed.

    Edited by TexasUnraid
    Link to comment

    Not sure that that's accurate; most documentation on the topic lists /tmp as a default ramdisk for the system.

    Link to comment
    21 minutes ago, boomam said:

    Not sure that that's accurate; most documentation on the topic lists /tmp as a default ramdisk for the system.

    /tmp on the host is RAM. /tmp in the container is in the docker image file, unless mapped.
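    A quick generic check (my addition, not from the thread) shows which filesystem actually backs /tmp in either context:

```shell
# Prints the filesystem type backing /tmp; "tmpfs" means it is RAM-backed.
# On a typical Linux host this reports tmpfs; inside a container, run e.g.
#   docker exec <name> stat -f -c %T /tmp
# and you will usually see the image filesystem (e.g. btrfs or overlayfs) instead.
stat -f -c %T /tmp
```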

    Link to comment
    9 minutes ago, JonathanM said:

    /tmp on the host is RAM. /tmp in the container is in the docker image file, unless mapped.

     

    ^That.

     

    On Linux you are correct that /tmp is a ramdrive by default.

     

    Inside docker, on the other hand, it is just part of the filesystem by default.

     

    If you read through the docker writes guide I put together, it gives you the command to mount a ramdisk at /tmp inside the container. You simply add the command to the Extra Parameters section of the docker template.

    Edited by TexasUnraid
    Link to comment

    Mapping /tmp from the host to wherever in the container places said mapping into RAM.

    That is what I did to fix the Ghost container, and is common practice for media containers too.

    Link to comment
    40 minutes ago, boomam said:

    Mapping /tmp from the host to wherever in the container places said mapping into RAM.

    That is what I did to fix the Ghost container, and is common practice for media containers too.

     

    While this works, it is not the proper way to do it as it does not keep everything contained inside the container.

     

    The proper method is as explained in the guide here:

     

    Simply add this to the Extra Parameters of the container settings and it will create a ramdrive in the container mounted at /tmp.

     

    --mount type=tmpfs,destination=/tmp,tmpfs-size=256000000

     

    This prevents the container from mistakenly causing issues with the host system and can also limit the max size of that container's /tmp data. The last argument is the size; that is 256 MB in the example.

     

    Either way will reach the same result of sending the writes to RAM. The above is just the native method for docker.
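    If it helps, the tmpfs-size value is plain bytes, so a small helper (hypothetical, my own) can build the flag for a given size in MB:

```shell
# Build the Extra Parameters mount flag for a tmpfs of the given size in MB
# (1 MB = 1,000,000 bytes here, matching the 256000000 example above).
tmpfs_flag() {
    local mb="$1"
    printf -- '--mount type=tmpfs,destination=/tmp,tmpfs-size=%d' "$((mb * 1000000))"
}

# tmpfs_flag 256  ->  --mount type=tmpfs,destination=/tmp,tmpfs-size=256000000
```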

    Edited by TexasUnraid
    Link to comment

    Late to the party here, but I wanted to share a script I tweaked to monitor NVMe TBW/GBW/health. Requires nvme-cli from Nerd Pack.


     

    #!/bin/bash
    #arrayStarted=true

    # Ref:
    # https://forums.unraid.net/bug-reports/stable-releases/683-docker-image-huge-amount-of-unnecessary-writes-on-cache-r733/page/16/?tab=comments#comment-9945

    # SSDs
    drives=(
      "nvme0n1"
      "nvme1n1"
    )
    for drive in "${drives[@]}"
    do
        device=/dev/$drive
        log=/mnt/user/logs/"$drive"_health.log

        # Query the SMART log once per drive and pull out the fields we need.
        # data_units_written counts units of 512,000 bytes per the NVMe spec.
        smart=$(nvme smart-log "$device")
        TBWSDB_TB=$(echo "$smart" | awk '/data_units_written/{ gsub(",", "") ; print $3 * 512000 / 1e+12 }')
        TBWSDB_GB=$(echo "$smart" | awk '/data_units_written/{ gsub(",", "") ; print $3 * 512000 / 1e+9 }')
        PWRON_DAYS=$(echo "$smart" | awk '/power_on_hours/{ print $3 / 24 }')
        HEALTH=$(echo "$smart" | awk '/percentage_used/{ print $3 }')

        # Set permissions and clear the previous rolling-average line
        touch "$log"
        chmod a+rw "$log"
        sed -i '/Average writes/d' "$log"
        echo "TBW on $(date +"%d-%m-%Y %H:%M:%S") --> $TBWSDB_TB TB, which is $TBWSDB_GB GB -- Health (% used, higher is worse): $HEALTH" >> "$log"
        echo "$TBWSDB_GB $PWRON_DAYS" | awk '{printf "Average writes/day:  %.2fGB\n", $1/$2}' >> "$log"
    done
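    For anyone wondering about the 512000 multiplier in the script: per the NVMe specification, data_units_written counts units of 1,000 512-byte sectors, i.e. 512,000 bytes per unit. For example (the counter value below is illustrative):

```shell
# One NVMe SMART data unit = 1000 * 512 bytes = 512000 bytes.
# A data_units_written count of 1953504 therefore corresponds to roughly 1 TB:
awk 'BEGIN { printf "%.2f TB\n", 1953504 * 512000 / 1e12 }'
# -> 1.00 TB
```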

     

    Link to comment
    On 8/20/2021 at 8:14 PM, TexasUnraid said:

     

    While this works, it is not the proper way to do it as it does not keep everything contained inside the container.

    Containers should be immutable in all ways; that's the point of them ;-)
     

    On 11/5/2021 at 11:28 AM, kindofblue42 said:

    Late to the party here, but I wanted to share a script I tweaked to monitor NVMe TBW/GBW/health. Requires nvme-cli from Nerd Pack.

    What does it specifically show that's different from usual?

    Link to comment

    Depends on what all you have running and what you are doing. Overall though, assuming you have dockers running and use the cache for file transfers, I would not be worried about that.

     

    I do notice the 15c temps, you have it in the garage lol?

    Link to comment
    On 1/22/2022 at 3:35 PM, TexasUnraid said:

    Depends on what all you have running and what you are doing. Overall though, assuming you have dockers running and use the cache for file transfers, I would not be worried about that.

     

    I do notice the 15c temps, you have it in the garage lol?

     

    Not much running actually. It's near the garage. 😁

    [attached screenshot: Capture.PNG]

    Link to comment

    Plex is something you need to watch out for; it can really hammer the cache. I don't use it personally, but others can help you with that.

     

    I have a server in the garage and even in 20 degree weather I hardly see those kinds of temps lol.

     

    I did read a paper from Google that said keeping drives too cold actually caused them to fail sooner. The sweet spot seemed to be around 30-35°C. I try to keep my drives in that range if possible.

    Link to comment

    The amount of writes reported in the GUI is basically meaningless; it can vary wildly with the device/controller used. You need to check the SSD SMART report, and then check again after 24 hours, to see the actual writes.
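    For a SATA drive like the EVO 860, one way to take those two SMART readings is via smartctl's Total_LBAs_Written attribute. The helper below is my own sketch; the 512-byte LBA size holds for Samsung EVOs but not for every model:

```shell
# Convert smartctl's Total_LBAs_Written raw value (column 10 of `smartctl -A`
# output) to GB, assuming 512-byte LBAs as on Samsung EVO drives.
lbas_to_gb() {
    awk '/Total_LBAs_Written/ { printf "%.1f GB\n", $10 * 512 / 1e9 }'
}

# Usage: smartctl -A /dev/sda | lbas_to_gb
# Run once, then again 24 hours later, and subtract the two readings.
```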

    Link to comment
    On 8/21/2021 at 2:14 AM, TexasUnraid said:

     

    While this works, it is not the proper way to do it as it does not keep everything contained inside the container.

     

    The proper method is as explained in the guide here:

     

    Simply add this to the Extra Parameters of the container settings and it will create a ramdrive in the container mounted at /tmp.

     

    --mount type=tmpfs,destination=/tmp,tmpfs-size=256000000

     

    This prevents the container from mistakenly causing issues with the host system and can also limit the max size of that container's /tmp data. The last argument is the size; that is 256 MB in the example.

     

    Either way will reach the same result of sending the writes to RAM. The above is just the native method for docker.

     

     

    Do you have this for normal Intel SSDs as well?

    Link to comment
    5 hours ago, furian said:

     

     

    Do you have this for normal Intel SSDs as well?

     

    Not sure I understand your question; the quoted post applies no matter what brand of SSD you use.

     

    If you are referring to the script I posted to log SSD writes, I do not own any Intel SSDs personally, so I don't know what the multiplier is.

     

    One of the existing multipliers might work; if not, you will need to do some research or testing to figure it out (get a file of a known size, copy it to the drive, see how much the writes counter increased in the SMART log, and do the math to work out the multiplier; it generally takes a few different-sized files in the 1-10 GB range to narrow in on it).
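    The calibration described above boils down to dividing the known bytes written by the observed counter delta (the helper name and numbers below are illustrative, my own):

```shell
# Estimate bytes-per-unit for an unknown SMART write counter:
# copy a file of known size, note how much the counter increased, divide.
estimate_multiplier() {
    local units_delta="$1" bytes_written="$2"
    echo $(( bytes_written / units_delta ))
}

# e.g. if copying a 1 GiB file raised the counter by 2097 units:
#   estimate_multiplier 2097 1073741824   -> 512037, i.e. ~512000 bytes per unit
```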

    Link to comment



