  • [6.8.3] docker image huge amount of unnecessary writes on cache


    S1dney
    • Urgent

    Hey Guys,

     

    First of all, I know that you're all very busy getting version 6.8 out there, something I'm very much waiting on as well. I'm seeing great progress, so thanks so much for that! Furthermore, I don't expect this to be at the top of the priority list, but I'm hoping someone on the development team is willing to invest time in it (perhaps after the release).

     

    Hardware and software involved:

    2 x 1TB Samsung EVO 860, setup with LUKS encryption in BTRFS RAID1 pool.

     

    ###

    TLDR (but I'd suggest to read on anyway 😀)

    The image file mounted as a loop device is causing massive writes on the cache, potentially wearing out SSDs quite rapidly.

    This appears to be only happening on encrypted caches formatted with BTRFS (maybe only in a RAID1 setup, but I'm not sure).

    Hosting the Docker files directory on /mnt/cache instead of using the loop device seems to fix this problem.

    A possible implementation idea is proposed at the bottom.

     

    Grateful for any help provided!

    ###

     

    I have written a topic in the general support section (see link below), but I have done a lot of research lately and think I have gathered enough evidence pointing to a bug. I was also able to build (kind of) a workaround for my situation. More details below.

     

    To see what was actually hammering the cache, I started with all the obvious things, like using a lot of find commands to trace files that were written to every few minutes, and I also used the fileactivity plugin. Neither was able to track down any writes that would explain 400GB worth of writes a day for just a few containers that aren't even that active.

     

    Digging further, I moved the docker.img to /mnt/cache/system/docker/docker.img, so directly on the BTRFS RAID1 mountpoint. I wanted to check whether the unRAID FS layer was causing the loop2 device to write this heavily. No luck either.

    This gave me a situation I was able to reproduce on a virtual machine though, so I started with a recent Debian install (I know, it's not Slackware, but I had to start somewhere ☺️). I created some vDisks, encrypted them with LUKS, bundled them in a BTRFS RAID1 setup, created the loop device on the BTRFS mountpoint (same as /dev/cache) and mounted it on /var/lib/docker. I made sure I had the NoCow flag set on the IMG file like unRAID does. Strangely, this did not show any excessive writes; iotop showed really healthy values for the same workload (I migrated the docker content over to the VM).

     

    After my Debian troubleshooting I went back over to the unRAID server, wondering whether the loop device was created weirdly, so I took the exact same steps to create a new image and pointed the settings from the GUI there. Still the same write issues. 

     

    Finally I decided to put the whole image out of the equation and took the following steps:

    - Stopped docker from the WebGUI so unRAID would properly unmount the loop device.

    - Modified /etc/rc.d/rc.docker to not check whether /var/lib/docker was a mountpoint

    - Created a share on the cache for the docker files

    - Created a softlink at /var/lib/docker pointing to /mnt/cache/docker

    - Started docker using "/etc/rc.d/rc.docker start"

    - Started my Bitwarden containers.
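
    The directory and softlink part of the steps above can be sketched roughly as follows. This is a minimal sketch, not the exact commands that were run: the function name `link_docker_dir` and its optional prefix argument are hypothetical additions so the layout can be tried in a scratch directory first. Only the two paths come from the post, and the rc.docker edit and stopping docker are left as manual steps.

```shell
#!/bin/bash
# Sketch only: create the share directory and the softlink used in the workaround.
# link_docker_dir and its prefix argument are illustrative names, not part of unRAID.
# With an empty prefix the paths are the real ones from the post, so test in a scratch dir first.

link_docker_dir() {
    local root="${1:-}"                      # e.g. a temp dir; empty means the live system
    local target="$root/mnt/cache/docker"    # share on the cache that holds the docker files
    local link="$root/var/lib/docker"        # where dockerd expects its data

    mkdir -p "$target" "$(dirname "$link")" || return 1

    # Refuse to touch an existing directory or mountpoint at the link location.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "refusing: $link exists and is not a symlink" >&2
        return 1
    fi

    # -n replaces an existing symlink instead of descending into it.
    ln -sfn "$target" "$link"
}
```

    Calling `link_docker_dir "$(mktemp -d)"` builds the layout under a throwaway directory, which makes it easy to check the link direction before touching /var/lib/docker.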

     

    Looking at the stats with "iotop -ao" I did not see any excessive writing taking place anymore.

    I had the containers running for like 3 hours and maybe got 1GB of writes total (note that on the loopdevice this gave me 2.5GB every 10 minutes!)
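
    To put those two observations on the same scale, here is a small sketch that extrapolates a measured write volume to a full day. The helper name `gb_per_day` is made up for illustration, and it assumes the write rate stays constant over the day.

```shell
#!/bin/bash
# Extrapolate an observed write volume to a full day.
# Arguments: gigabytes written, length of the observation window in minutes.
# gb_per_day is an illustrative helper, not an existing tool.
gb_per_day() {
    awk -v gb="$1" -v minutes="$2" 'BEGIN {
        if (minutes <= 0) { print "window must be positive" > "/dev/stderr"; exit 1 }
        printf "%.0f\n", gb * (24 * 60) / minutes
    }'
}

gb_per_day 2.5 10     # loop device: 2.5GB every 10 minutes, prints 360
gb_per_day 1 180      # without the loop device: about 1GB in 3 hours, prints 8
```

    The 360GB/day result is in line with the 300/400GB a day mentioned in this post.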

     

    Now don't get me wrong, I understand why the loop device was implemented. Dockerd is started with options to make it run with the BTRFS driver, and since the image file is formatted with the BTRFS filesystem this works on every setup; it doesn't matter whether it sits on XFS, EXT4 or BTRFS, it will just work. In my case I had to point the softlink to /mnt/cache because pointing it to /mnt/user would not allow me to start docker using the BTRFS driver (obviously the unRAID filesystem isn't BTRFS). Also, the WebGUI has commands to scrub the filesystem inside the container; everything is based on the assumption that everyone is using docker on BTRFS (which of course they are because of the container 😁)

    I must say that my approach also broke when I changed something in the shares, certain services get a restart causing docker to be turned off for some reason. No big issue since it wasn't meant to be a long term solution, just to see whether the loopdevice was causing the issue, which I think my tests did point out.

     

    Now I'm at the point where I definitely need some developer help. I'm currently keeping nearly all docker containers off all day because 300/400GB worth of writes a day is just a BIG waste of expensive flash storage, especially since I've shown that it's not needed at all. It does defeat the purpose of my NAS and SSD cache though, since its main purpose was hosting docker containers while allowing the HDs to spin down.

     

    Again, I'm hoping someone in the dev team acknowledges this problem and is willing to invest time in it. I did get quite a few hits on the forums and reddit, but no one actually pointed out the root cause of the issue.

     

    I'm missing the technical know-how to troubleshoot the loop device issues at a lower level, but I have been thinking about possible ways to implement a workaround, like adjusting the Docker Settings page to allow switching off the use of a vDisk and, if all requirements are met (pointing to /mnt/cache and BTRFS formatted), starting docker on a share on the /mnt/cache partition instead of using the vDisk.

    In this way you would still keep all advantages of the docker.img file (cross filesystem type) and users who don't care about writes could still use it, but you'd be massively helping out others that are concerned over these writes.

     

    I'm not attaching diagnostic files since they would probably not show what is needed.

    Also if this should have been in feature requests, I'm sorry. But I feel that, since the solution is misbehaving in terms of writes, this could also be placed in the bugreport section.

     

    Thanks though for this great product, have been using it so far with a lot of joy! 

    I'm just hoping we can solve this one so I can keep all my dockers running without the cache wearing out quickly.

     

    Cheers!

     



    User Feedback

    Recommended Comments



    One more data point here, a couple of weeks ago I spotted that my used Intel DC S3500 SSDs had dropped their SMART Media Wearout Indicator values from 95% to 60% in the year or so since I installed them.  Based on a 'host writes 32mib' value of 22,840,656, I think that's almost 700TBW.

     

    Looking at the writes column on the main Unraid UI page shows pretty constant writes around 15-20MB/s.  I have two drives in a BTRFS RAID 1 pool, no encryption.
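
    The "almost 700TBW" estimate can be checked with a one-line conversion. This is a sketch under the assumption that the 'host writes 32mib' raw value counts units of 32 MiB, as the attribute name suggests; the function name is made up for illustration.

```shell
#!/bin/bash
# Convert a raw "host writes 32MiB" SMART value to TiB.
# Assumption: each unit of the raw value stands for 32 MiB written by the host.
host_writes_32mib_to_tib() {
    awk -v units="$1" 'BEGIN { printf "%.2f\n", units * 32 / 1024 / 1024 }'
}

host_writes_32mib_to_tib 22840656    # prints 697.04
```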

    mf808

    Posted (edited)

    Do we have an update on this issue? 

     

    I stumbled on this yesterday and have been analyzing my docker containers behaviour and was able to pinpoint it to a handful of containers that cause the massive amount of writes. (2x 1TB WD Black Raid 1 btrfs)

    • pihole
    • unms
    • sonarr (latest linuxserver version)
    • hydra2 (latest linuxserver version)
    • Plex (official version)

     

    After analyzing with iotop and checking the writes of each and every container I am running for 10 minutes each, I calculated ~120 TBW/year. (This is from a ~10 min sample size.)

     

    What I did to mitigate:

    Using the Sonarr binhex version and Plex linuxserver somehow seems to have reduced the writes massively. I also stopped using hydra2 and will migrate Pihole and UNMS to a spare pi I have laying around.

    With these changes  I was able to reduce to ~2 TBW/year, which I think is way more acceptable.

     

    I have found a couple of reddit threads which describe the same problem. This is hitting everybody with their docker.img on SSDs. 

    What logging or other procedures would need 20-30MB/s worth of writes? This is totally unnecessary.

     

     

    Edited by mf808


    It seems I got hit with this one also.

    Cache is a 1TB Crucial SATA SSD, btrfs, not encrypted afaik

    I seem to have 5MB/s+ of constant writes on the SSD. After finding one of the culprits (unifi controller), which was also writing a lot to appdata, my process with the 2nd most writes is now [loop2]...

    In a few minutes there's 500MB of writes from [loop2]...

    At this rate my ssd will probably die soon 😨
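
    For anyone who wants a number without keeping iotop open, the kernel's per-device counters can be sampled directly. A minimal sketch, assuming the docker image is on loop2 as in this thread; the function names are made up for illustration. It relies on field 7 of /sys/block/<dev>/stat being the count of sectors written in 512-byte units.

```shell
#!/bin/bash
# Field 7 of /sys/block/<dev>/stat is the number of sectors written,
# counted by the kernel in 512-byte units.
# All function names below are illustrative helpers, not existing tools.

# Print the sectors-written counter from a stat file.
sectors_written() {
    awk '{ print $7 }' "$1"
}

# Convert two counter readings into MiB written in between.
mib_between() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", (after - before) * 512 / 1024 / 1024 }'
}

# Sample a block device twice, a number of seconds apart, and report the delta.
# The device defaults to loop2 (assumed to be the docker.img loop device).
sample_writes() {
    local dev="${1:-loop2}" interval="${2:-60}"
    local stat="/sys/block/$dev/stat" before after
    [ -r "$stat" ] || { echo "cannot read $stat" >&2; return 1; }
    before=$(sectors_written "$stat")
    sleep "$interval"
    after=$(sectors_written "$stat")
    echo "$dev: $(mib_between "$before" "$after") MiB written in ${interval}s"
}
```

    Running `sample_writes loop2 600` would give a 10 minute figure that can be compared with the numbers other people posted here.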

     

    My docker list maybe it can help find common dockers?

    linuxserver/sonarr
    linuxserver/qbittorrent
    nextcloud:latest
    gitlab/gitlab-runner
    gitlab/gitlab-ce
    linuxserver/tvheadend
    linuxserver/minisatip
    linuxserver/plex
    binhex/arch-krusader
    linuxserver/mariadb
    linuxserver/ombi
    linuxserver/bazarr
    linuxserver/radarr
    phpmyadmin/phpmyadmin
    didstopia/tvhproxy
    debian8/apcupsd-cgi

     


    I suspect I am also experiencing this issue.

     

    [Screenshot: Annotation 2020-05-04 055430_LI.jpg]

     

    The iotop screenshot is from a 2 hour period where the server was idling for most of the time.

    ~11GB from loop2 in 2 hours....

    2 Samsung 500gb Evo SSD in BTRFS pool, no encryption.

     

    iotop_7AM-9AM 2020-05-04 090045.png

     

     

    ****update****

    8 hours later it looks like almost 900gb in writes? I hope I am interpreting this incorrectly?

     

    I need to fix this ASAP otherwise these SSDs will be cooked by the end of the month.

     

    Update_Annotation 2020-05-04 144126_LI.jpg

     

    Edited by nas_nerd
    updated

    14 hours ago, mf808 said:

    pinpoint it to a handful of containers

    I have the same issue and am not using a single one of these containers. Even with all my containers turned off I see the same 3-5MB/s writes to the cache. The only thing that helps is to completely disable docker.


    Another update.

     

    I stopped all my docker containers overnight (but docker was still enabled), and I barely had any writes to the cache.

     

    This to me suggests potentially a rogue docker application, or having dockers running is causing an issue.

     

    More testing is required on my behalf.


    Been following this thread as I believe I'm also having the issue. Just wanted to list what I've come across in the off chance this is of any use to anyone else. 

     

    I stopped all the dockers containers leaving docker still enabled like @nas_nerd did. There were no writes of any sort to the cache drive overnight while all the docker containers were stopped.

     

    Docker Containers

    binhex nzbHydra2

    binhex plexpass

    binhex radarr

    binhex sonarr

    binhex sabnzbdvpn

     

    Process I followed

    I rebooted unraid, after all the dockers above were up and running (as I forgot I had them set to auto-start), I had around 30,000 writes to cache. I stopped them all one-by-one, and the writes stopped around 39,000. All docker containers were stopped overnight and no writes occurred.  The next morning I enabled the dockers listed above, ~ 44 hours ago, and I'm now sitting at 1,220,283 writes.

     

    My next step is to stop all docker containers, disable auto-start, reboot unraid, then enable one docker at a time (without actually using them) and monitor the number of writes to the cache to see if I can find an offending docker container.


    For me the official Plex container was the largest offender. That and my Windows 10 VM rack up the most writes.

     

    I switched to the linuxserver Plex container and that reduced writes by quite a lot. Not much I can do about the VM other than move its drive images.


    running 6.8.3, and official plex container exhibits this problem

    Odessa

    Posted (edited)

    Running 6.8.3, cache is btrfs - getting between 5-20 MB/s constant writes to my SSD for no apparent reason, with temperature warnings. Running Official Plex docker, some common Binhex media dockers. How can we escalate this to critical since it is potentially causing actual hardware damage?

    Edited by Odessa

    Yeah, i don't get it why this is only minor.. if my brand new ssd dies by the end of the year because of this i will be pissed!


    I believe this issue is much more widespread than it appears - I found this on the unraid subreddit and decided to poke around my server. Currently loop2 is writing over 2GB in under 10 minutes to my unencrypted BTRFS cache pool. 

     

    Unraid: 6.8.3

     

    Added a new Samsung 860 1TB SSD to my btrfs pool 4 months ago:

    Total LBAs written: 22.01 TB (47269069408)

    Power-on hours: 3383 (4m, 18d, 23h)

     

    I'd rather not have to run XFS and/or modify unraid beyond what is supported. Hopefully we can get an official update on this and/or a fix soon, as this is causing excessive writes to my SSDs - thus reducing their life and possibly causing unforeseen damage. 

     

    Referenced subreddit post: 

    https://www.reddit.com/r/unRAID/comments/ggbvgv/unraid_is_unusable_for_me_because_of_the_docker/

     

    Edited by beneath

    I have the same issue. Testing with all dockers stopped, loop2 by itself would still be writing data at 5-15MB/s in iotop to my single unencrypted BTRFS cache SSD. I tried converting my cache drive to XFS and now it's down to 20MB over the past 10 minutes with no dockers running and 100MB over 10 minutes with all my dockers up (binhex sonarr, radarr, tautulli, sabnzbd, deluge, ombi, pihole, nextcloud). Huge improvement with XFS over BTRFS, though still a problem when there is really no usage in any of those dockers.

     

    My month-and-a-half-old cache SSD is already at 66TBW (of the 640TBW my manufacturer rates the drive for) before I noticed this. Can the devs look at this as an urgent instead of a minor issue? It has probably cratered a lot of people's SSDs already. 
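
    To see what a figure like this means for drive life, a rough projection helps. A sketch only: the function name is made up, and it assumes the average write rate so far simply continues, which may not hold once the cause is fixed or the workload changes.

```shell
#!/bin/bash
# Months until the rated TBW is reached at the average rate seen so far.
# Arguments: TB written so far, rated TBW, age of the drive in months.
# months_until_rated_tbw is an illustrative helper, not an existing tool.
months_until_rated_tbw() {
    awk -v used="$1" -v rated="$2" -v age="$3" 'BEGIN {
        if (used <= 0 || age <= 0) { print "need positive usage and age" > "/dev/stderr"; exit 1 }
        rate = used / age                       # TB per month so far
        printf "%.1f\n", (rated - used) / rate  # months left at that rate
    }'
}

months_until_rated_tbw 66 640 1.5    # prints 13.0
```

    With the numbers from this comment, the drive would reach its rated 640TBW in roughly 13 more months.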


    Someone should really make a PSA for this. I purchased a brand new ssd in January, 163 TBW on it now. Like others have mentioned earlier, iotop shows loop2 is constantly writing to the disk. Using df -kh shows /var/lib/docker is mounted on loop2.
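
    The same check can be scripted by reading the mount table instead of eyeballing df output. A minimal sketch: `mount_source` is a hypothetical helper, and its second argument exists only so it can be fed a sample file instead of /proc/mounts.

```shell
#!/bin/bash
# Print the device mounted at a given mountpoint, read from a mounts table.
# Each line of /proc/mounts is: device mountpoint fstype options dump pass.
# mount_source is an illustrative helper, not an existing tool.
mount_source() {
    local mountpoint="$1" table="${2:-/proc/mounts}"
    # Keep the last match, since a later mount hides an earlier one on the same path.
    awk -v mp="$mountpoint" '$2 == mp { src = $1 } END { if (src == "") exit 1; print src }' "$table"
}
```

    On an affected server `mount_source /var/lib/docker` would be expected to print /dev/loop2, and `losetup /dev/loop2` should then show which image file backs that loop device.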

     

    I have a 5 disk btrfs encrypted array with a non-encrypted btrfs cache disk (Samsung EVO 860 1TB). Running several dockers, mainly for web hosting (traefik, cloudflare, organizr, etc.) and data storage (ms SQL Server, influxdb). No VMs.


    Below is a slightly modified/simplified version of a script to calculate drive TBW and health %. Source here

    #!/bin/bash

    ### replace sdg below with the label of the drive you want TBW calculated for ###
    device=/dev/sdg

    # Assumes 512-byte LBAs; reads the raw SMART attribute values (column 10)
    sudo smartctl -A "$device" | awk '
    $0 ~ /Power_On_Hours/ { poh=$10; printf "%s / %d hours / %d days / %.2f years\n",  $2, $10, $10 / 24, $10 / 24 / 365.25 }
    $0 ~ /Total_LBAs_Written/ {
       bytes=$10 * 512;
       gb= bytes / 1024^3;
       tb= bytes / 1024^4;
       printf "%s / %.2f gb / %.2f tb\n", $2, gb, tb
       # poh comes from the Power_On_Hours row; skip the mean if it is missing or zero
       if (poh > 0) printf "mean writes per hour: / %.3f gb / %.3f tb\n", gb/poh, tb/poh
    }
    $0 ~ /Wear_Leveling_Count/ { printf "%s / %d (%% health)\n", $2, int($4) }
    ' |
       sed -e 's:/:@:' |
       sed -e "s|^|$device @ |" |
       column -ts@

     


    That script (well, smartctl) gave me wrong numbers: my SSD reports that the power-on hours are only 349 (14 days). That's not correct; I bought it new when I installed unraid on 14 March, and it has been running 24/7 since then (uptime ~57 days).

    Recalculating manually and assuming the total LBAs written is not wrong... 48905653011 LBA=22.77 TB

    perDay = 22.77 TB/57days = 0.40 TB/day = 400 GB/day

    perHour = 0.40 TB/24h = 0.017 TB/h = 17 GB/hour

    It seems I'm not getting hit as hard as I thought I was... still think it's a bit on the high side though.
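
    The manual recalculation above can be repeated for any drive with a few lines of awk. A sketch that assumes 512-byte LBAs, like the script earlier in the thread, and takes the real uptime in days instead of the reported power-on hours; the function name is made up.

```shell
#!/bin/bash
# Reproduce the manual calculation: LBAs written -> TB, then per day and per hour.
# Assumes 512-byte LBAs. Arguments: total LBAs written, days the drive has been in use.
# writes_summary is an illustrative helper, not an existing tool.
writes_summary() {
    awk -v lbas="$1" -v days="$2" 'BEGIN {
        if (days <= 0) { print "days must be positive" > "/dev/stderr"; exit 1 }
        tb = lbas * 512 / 1024^4
        printf "total %.2f TB, %.2f TB/day, %.3f TB/hour\n", tb, tb / days, tb / days / 24
    }'
}

writes_summary 48905653011 57    # prints: total 22.77 TB, 0.40 TB/day, 0.017 TB/hour
```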

    woble

    Posted (edited)

    Unraid 6.8.3

    Cache: Crucial MX500 2TB SSD - BTRFS w/o encryption

     

    Just want to chip in and say that I have a similar issue: writes on the cache drive hover between 5-10MB/s constantly. `iotop` reports a huge amount of writes to `loop2` for no reason, or so it seems.

     

    I disabled all active dockers and started enabling them one by one to see how each affects `loop2`. Out of all the dockers that I have, `nginx-proxy-manager` seems to have the most effect on it. Without it, `loop2` writes around 100MB/min and the cache drive in the UI shows as little as a few KB/s or even 0 for writes. That is for a total of 14 dockers, and arguably some of them aren't that heavy to begin with or don't do much IO in the first place. Those 100MB might be related to the dockerd logs, which are reported as `dockerd -p /var/run/dockerd.pid --log-opt max-size=10m --log-opt max-file=1 --storage-driver=btrfs --log-level=error` in `iotop`, although these report less than 1MB/min. Then there are the `shfs /mnt/user -disks 4095 2048000000 -o noatime,allow_other -o remember=330` processes, which report 2-30MB/min. With the `nginx-proxy-manager` docker enabled (plus all the other dockers), the writes to `loop2` jump to around 400MB/min, the dockerd log processes go to around 2-3MB/min, and the shfs ones jump to 70-90MB/min each.

     

    I used to have 6 Crucial SSDs in RAID10 for cache, which ran for about 2 years. Upon inspecting them with Crucial tool, all of them reported around 260TBW which is crazy high for 2 years of really not that intensive load.

     

    I've seen people mention `pihole` and `nzbhydra2`, which I also run, but they don't seem to affect it overall as much as `nginx-proxy-manager` does.

    Edited by woble


    I'm seeing this behavior, too. New unraid build (6.8.3), with two NVMe drives as a cache pool formatted btrfs without encryption. Numerous docker containers (all the fun stuff -- plex, sonarr, radarr, grafana, telegraf, bitwarden, etc.). iotop shows a huge amount of write activity from loop2 (GBs after just a few minutes of watching). I removed the cache pool, removed one of the drives, and formatted one of the NVMe drives as XFS to use as a single cache drive, brought everything back online again, and now the I/O is at what I would consider normal levels (a few megabytes in a few minutes).

     

    It would be great to have some resolution to this bug, since my cache is now unprotected, which makes me uncomfortable. Right now I have to make a choice between having an unprotected cache instance, or thrashing my 1TB NVMe drives....


    Can we update the title of this report to [6.8.3], since it's still happening with this latest version? And I would personally consider this to be more severe than a "minor" bug -- I think it fits the category of "urgent" since it potentially leads to data loss if a cache pool is not a viable option.


    Update from my end:

     

    Converted my 500GB SSD BTRFS cache pool to a single XFS 500GB cache.

     

    Writes to the cache have now dropped significantly. I am running the exact same dockers as previously.

     

    This suggests a BTRFS + Docker combination is contributing to this excessive write problem.

     

    Unfortunately now my cache is unprotected and I have a spare 500gb SSD (I'm sure I'll find a use for this :))

     

    I agree with a few comments about this issue/bug being more significant than "minor".

    5 hours ago, grigsby said:

    Can we update the title of this report to [6.8.3], since it's still happening with this latest version? And I would personally consider this to be more severe than a "minor" bug -- I think it fits the category of "urgent" since it potentially leads to data loss if a cache pool is not a viable option.

    I definitely agree that this bug deserves moving to "urgent". Many users are more than likely affected without knowing that they are burning through their SSDs. Without reading that reddit thread referenced above, I would have been one of them too. 

     

    I keep weekly backups outside of my unraid server - but I know many users don't have that luxury. Would hate to see a perfect "storm" and potential data loss. 

    Edited by beneath
    S1dney

    Posted (edited)

    Changed Priority to Urgent

     

    >>

     

    Since I noticed this thread getting more and more attention lately, and more and more people urging it to be urgent instead of minor, I'll raise priority on this one.

     

    Just an FYI, I made/kept it minor initially cause I had a workable workaround that I felt satisfied with. If the Command Line Interface isn't really your thing or you have any other reason to not tweak the OS in an unsupported way I can fully understand this frustration.

     

    In the end... The community decides priority.

     

    Also updated the title to version 6.8.3 as requested.

     

    Cheers

    Edited by S1dney
    27 minutes ago, S1dney said:

    Changed Priority to Urgent

     


    This actually points out that we could do with another intermediate category called something like “Major” meaning it is very important but is not actually stopping the server working or directly causing data loss.    I would then put this into the “Major” category rather than “Urgent”.   I certainly agree it needs to be more than “Minor”.

    1 hour ago, itimpi said:

    This actually points out that we could do with another intermediate category called something like “Major” meaning it is very important but is not actually stopping the server working or directly causing data loss.    I would then put this into the “Major” category rather than “Urgent”.   I certainly agree it needs to be more than “Minor”.

    Agreed!

    "Urgent" might be too generic in that it sums up "Server crash", "data loss" and "showstopper" under one label.

    For now it seems to be a showstopper for a bunch of people so it's still accurate.

    If a new category is created, let me know and I'll adjust 👍


    The priority "Urgent" means something is seriously wrong and prevents the system from working normally.

     

    This is not really the case here...

     

    The priority "Minor" may sound insignificant, but it does mean Limetech is looking into the issue and will address it as appropriate.

    2 minutes ago, bonienl said:

    The priority "Urgent" means something is seriously wrong and prevents the system from working normally.

     

    This is not really the case here...

     

    The priority "Minor" may sound as insignificant, but it does mean Limetech is looking into the issue and address it as appropriate.

    I strongly disagree with this. My new purchase of a $550 cache SSD which would have lasted 10+ years with my workload is now at 253TBW out of the warranted 300 after 1 year. How this can be seen as system working normally is frustrating to me.






  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.