• [6.8.3] docker image huge amount of unnecessary writes on cache


    S1dney
    • Solved Urgent

    EDIT (March 9th 2021):

    Solved in 6.9 and up. Reformatting the cache to new partition alignment and hosting docker directly on a cache-only directory brought writes down to a bare minimum.

     

    ###

     

    Hey Guys,

     

First of all, I know that you're all very busy getting version 6.8 out there, something I'm very much waiting on as well. I'm seeing great progress, so thanks so much for that! I'm not expecting this to be at the top of the priority list, but I'm hoping someone on the development team is willing to invest some time (perhaps after the release).

     

    Hardware and software involved:

2 x 1TB Samsung EVO 860, set up with LUKS encryption in a BTRFS RAID1 pool.

     

    ###

TL;DR (but I'd suggest reading on anyway 😀)

The image file mounted as a loop device is causing massive writes on the cache, potentially wearing out SSDs quite rapidly.

This appears to only be happening on encrypted caches formatted with BTRFS (maybe only in a RAID1 setup, but I'm not sure).

Hosting the docker files directory on /mnt/cache instead of using the loopdevice seems to fix this problem.

A possible idea for implementation is proposed at the bottom.

     

    Grateful for any help provided!

    ###

     

I have written a topic in the general support section (see link below), but I have done a lot of research lately and think I have gathered enough evidence pointing to a bug. I was also able to build (kind of) a workaround for my situation. More details below.

     

So to see what was actually hammering the cache I started with all the obvious things, like using a lot of find commands to trace files that were written to every few minutes, and also used the file activity plugin. Neither was able to trace down any writes that would explain 400GB worth of writes a day for just a few containers that aren't even that active.

     

Digging further I moved the docker.img to /mnt/cache/system/docker/docker.img, so directly on the BTRFS RAID1 mountpoint. I wanted to check whether the unRAID FS layer was causing the loop2 device to write this heavily. No luck either.

This did give me a situation I was able to reproduce in a virtual machine though, so I started with a recent Debian install (I know, it's not Slackware, but I had to start somewhere ☺️). I created some vDisks, encrypted them with LUKS, bundled them into a BTRFS RAID1 setup, created the loopdevice on the BTRFS mountpoint (same as /mnt/cache) and mounted it on /var/lib/docker. I made sure I had the NoCoW flag set on the IMG file like unRAID does. Strangely this did not show any excessive writes; iotop shows really healthy values for the same workload (I migrated the docker content over to the VM).
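
Roughly, the test setup looked something like the sketch below (device names, sizes and paths are illustrative assumptions, not the exact commands used):

# Illustrative sketch only -- adjust device names/sizes/paths to your own test VM.
cryptsetup luksFormat /dev/vdb && cryptsetup open /dev/vdb crypt1
cryptsetup luksFormat /dev/vdc && cryptsetup open /dev/vdc crypt2
mkfs.btrfs -d raid1 -m raid1 /dev/mapper/crypt1 /dev/mapper/crypt2
btrfs device scan
mkdir -p /mnt/cache && mount /dev/mapper/crypt1 /mnt/cache

# sparse image with the NoCoW flag set (like unRAID does), formatted btrfs and loop-mounted:
touch /mnt/cache/docker.img
chattr +C /mnt/cache/docker.img       # NoCoW only sticks while the file is still empty
truncate -s 20G /mnt/cache/docker.img
mkfs.btrfs /mnt/cache/docker.img
mkdir -p /var/lib/docker
mount -o loop /mnt/cache/docker.img /var/lib/docker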

     

After my Debian troubleshooting I went back over to the unRAID server, wondering whether the loopdevice was created weirdly, so I took the exact same steps to create a new image and pointed the settings from the GUI there. Still the same write issues.

     

Finally I decided to put the whole image out of the equation and took the following steps (a rough shell sketch follows the list):

    - Stopped docker from the WebGUI so unRAID would properly unmount the loop device.

    - Modified /etc/rc.d/rc.docker to not check whether /var/lib/docker was a mountpoint

    - Created a share on the cache for the docker files

    - Created a softlink from /mnt/cache/docker to /var/lib/docker

- Started docker using "/etc/rc.d/rc.docker start"

- Started my Bitwarden containers.
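
In shell form, roughly (the rc.docker edit itself is not shown, it only removes the mountpoint check):

# Rough sketch only -- assumes docker was already stopped from the WebGUI and
# /etc/rc.d/rc.docker was edited to skip the "/var/lib/docker must be a mountpoint" check.
mkdir -p /mnt/cache/docker               # share/directory on the cache pool
rmdir /var/lib/docker                    # remove the (now empty) mountpoint directory
ln -s /mnt/cache/docker /var/lib/docker  # symlink so dockerd writes straight to the btrfs pool
/etc/rc.d/rc.docker start                # start docker through the stock init script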

     

Looking at the stats with "iotop -ao" I did not see any excessive writing taking place anymore.

    I had the containers running for like 3 hours and maybe got 1GB of writes total (note that on the loopdevice this gave me 2.5GB every 10 minutes!)
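
Back-of-the-envelope, that difference works out to roughly:

loopdevice: 2.5 GB / 10 min  ≈ 15 GB/hour   ≈ 360 GB/day
symlinked:  ~1 GB / 3 hours  ≈ 0.33 GB/hour ≈ 8 GB/day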

     

Now don't get me wrong, I understand why the loopdevice was implemented. Dockerd is started with options to make it run with the BTRFS driver, and since the image file is formatted with the BTRFS filesystem this works on every setup; it doesn't even matter whether the cache runs on XFS, EXT4 or BTRFS, it will just work. In my case I had to point the softlink to /mnt/cache because pointing it to /mnt/user would not allow me to use the BTRFS driver (obviously the unRAID user share filesystem isn't BTRFS). Also the WebGUI has commands to scrub the filesystem inside the image; it's all based on the assumption that everyone is using docker on BTRFS (which of course they are, because of the image file 😁)

I must say that my approach also broke when I changed something in the shares; certain services get restarted, causing docker to be turned off for some reason. No big issue, since it wasn't meant to be a long term solution, just to see whether the loopdevice was causing the issue, which I think my tests did point out.

     

Now I'm at the point where I would definitely need some developer help. I'm currently keeping nearly all docker containers off all day, because 300-400GB worth of writes a day is just a BIG waste of expensive flash storage, especially since I've pointed out that it's not needed at all. It does defeat the purpose of my NAS and SSD cache though, since its main purpose was hosting docker containers while allowing the HDs to spin down.

     

Again, I'm hoping someone in the dev team acknowledges this problem and is willing to invest. I did get quite a few hits on the forums and reddit, without anyone actually pointing out the root cause of the issue.

     

I'm missing the technical know-how to troubleshoot the loopdevice issues on a lower level, but I have been thinking about possible ways to implement a workaround. Like adjusting the Docker Settings page to switch off the use of a vDisk and, if all requirements are met (pointing to /mnt/cache and BTRFS formatted), start docker on a share on the /mnt/cache partition instead of using the vDisk.

That way you would still keep all the advantages of the docker.img file (cross filesystem type) and users who don't care about the writes could still use it, but you'd be massively helping out others who are concerned about these writes.
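
Purely to illustrate the idea (variable names are made up, this is not actual unRAID code), the docker start logic could branch on such a setting roughly like this:

# Hypothetical sketch of the proposed setting -- names are invented for illustration only.
DOCKER_USE_IMAGE="no"              # new toggle on the Docker Settings page
DOCKER_DIR="/mnt/cache/docker"     # must live on /mnt/cache, not on /mnt/user

if [ "$DOCKER_USE_IMAGE" = "no" ] && [ "$(findmnt -no FSTYPE /mnt/cache)" = "btrfs" ]; then
    # requirements met: run dockerd directly against a btrfs directory
    mkdir -p "$DOCKER_DIR"
    dockerd --data-root "$DOCKER_DIR" --storage-driver=btrfs &
else
    # fall back to current behaviour: loop-mount docker.img on /var/lib/docker
    mount -o loop /mnt/cache/system/docker/docker.img /var/lib/docker
    dockerd --data-root /var/lib/docker --storage-driver=btrfs &
fi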

     

I'm not attaching diagnostic files, since they would probably not show what's needed here.

Also, if this should have been in feature requests, I'm sorry. But I feel that, since the current solution is misbehaving in terms of writes, this could also be placed in the bug report section.

     

Thanks though for this great product, I have been using it with a lot of joy so far!

I'm just hoping we can solve this one so I can keep all my dockers running without the cache wearing out quickly.

     

    Cheers!

     

    • Like 3
    • Thanks 17



    User Feedback

    Recommended Comments



Ok, still poking things to see what happens. I am sure others have already done most of this testing, but I didn't have time to read the whole thread.

     

So while moving the docker image to an XFS-formatted array drive (I decided to use unbalance instead of mover so I could watch the progress), dockers were obviously disabled. With the dockers disabled entirely, I noticed zero writes from btrfs-transacti or loop after 5 minutes. In fact I saw no writes of any kind, exactly as I would expect.

     

When I tried to start docker after moving it to the XFS array drive, it would not start, so I restarted the server. It then started up OK.

     

So now I'm watching the writes with docker on an XFS array drive. After 10 mins I am at 50MB of writes to loop2 and 120MB of writes to btrfs-transacti, which netdata shows is indeed still being written to the cache. That is still over 5TB a year to the cache for no reason at all.

     

Increasing the dirty writeback time to 3 minutes drops the writes over 10 minutes to 35MB for loop2 and 70MB for btrfs-transacti. Not nearly as much of a drop as when docker was on the cache, but still noticeable.
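
For reference, the dirty writeback interval is a kernel sysctl (values are in centiseconds); setting it to 3 minutes looks something like the lines below. The trade-off is that more unflushed data sits in RAM, so a crash or power loss loses more of it.

# flush dirty pages every 3 minutes instead of the default 5 seconds (values in centiseconds)
sysctl vm.dirty_writeback_centisecs=18000
sysctl vm.dirty_expire_centisecs=18000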

     

This is confirmed by the writes increasing on the cache in the main view. In 45 mins of uptime I have about 7500 writes on both the array drive with the docker and my cache pool.

     

    BTW, what does each "write" represent in the main view?

     

    If I disable the docker service, then btrfs-transacti doesn't even show up.

     

For anyone that has gone the path of mapping docker directly without loop2: does btrfs-transacti still cause writes to the cache? I might have to go that route if the official fix is months away.

     

I was checking the SMART status of my SSDs, and in just over a week since installing unraid they all ticked down 1% of lifetime in the SMART stats. It took me well over a year to do that when using Windows. Sure glad I caught it now.

    Edited by TexasUnraid
    • Like 1
    Link to comment
    28 minutes ago, TexasUnraid said:

For anyone that has gone the path of mapping docker directly without loop2: does btrfs-transacti still cause writes to the cache? I might have to go that route if the official fix is months away.

    There you go!

    10 minutes into "iotop -ao", btrfs-transacti produced nearly 60MB of writes.

My /var/lib/docker is symlinked to /mnt/cache/docker, so all writes that should go into the image go straight to the btrfs mountpoint.
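
If anyone wants to verify that kind of setup on their own box, something along these lines does it (paths assumed to match mine):

readlink -f /var/lib/docker            # should print /mnt/cache/docker
findmnt -no FSTYPE,SOURCE /mnt/cache   # should show btrfs and the cache pool device
iotop -ao                              # accumulated per-process writes since iotop started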

     

Note that running iotop initially gave me gigabytes of data.

    After several months I'm still one happy redundant btrfs camper 😁

     

    Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
    Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
      TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
    25615 be/4 root          0.00 B     13.98 M  0.00 %  0.25 % dockerd -p /var/run/dockerd.pid --log-opt max-size=50m --log-opt max-file=1 --storage-driver=btrfs --log-level=error
    25612 be/4 root          0.00 B     12.67 M  0.00 %  0.23 % dockerd -p /var/run/dockerd.pid --log-opt max-size=50m --log-opt max-file=1 --storage-driver=btrfs --log-level=error
    23465 be/4 root          0.00 B     12.51 M  0.00 %  0.21 % dockerd -p /var/run/dockerd.pid --log-opt max-size=50m --log-opt max-file=1 --storage-driver=btrfs --log-level=error
    24974 be/4 root          0.00 B     11.56 M  0.00 %  0.21 % dockerd -p /var/run/dockerd.pid --log-opt max-size=50m --log-opt max-file=1 --storage-driver=btrfs --log-level=error
    25613 be/4 root          0.00 B      8.93 M  0.00 %  0.16 % dockerd -p /var/run/dockerd.pid --log-opt max-size=50m --log-opt max-file=1 --storage-driver=btrfs --log-level=error
    23449 be/4 root          0.00 B      8.39 M  0.00 %  0.15 % dockerd -p /var/run/dockerd.pid --log-opt max-size=50m --log-opt max-file=1 --storage-driver=btrfs --log-level=error
    24682 be/4 root          0.00 B      7.35 M  0.00 %  0.12 % dockerd -p /var/run/dockerd.pid --log-opt max-size=50m --log-opt max-file=1 --storage-driver=btrfs --log-level=error
    25959 be/4 root          0.00 B      7.48 M  0.00 %  0.12 % dockerd -p /var/run/dockerd.pid --log-opt max-size=50m --log-opt max-file=1 --storage-driver=btrfs --log-level=error
    23468 be/4 root          0.00 B      6.84 M  0.00 %  0.12 % dockerd -p /var/run/dockerd.pid --log-opt max-size=50m --log-opt max-file=1 --storage-driver=btrfs --log-level=error
     6192 be/4 root          0.00 B      6.37 M  0.00 %  0.12 % dockerd -p /var/run/dockerd.pid --log-opt max-size=50m --log-opt max-file=1 --storage-driver=btrfs --log-level=error
    25616 be/4 root          0.00 B      4.96 M  0.00 %  0.10 % dockerd -p /var/run/dockerd.pid --log-opt max-size=50m --log-opt max-file=1 --storage-driver=btrfs --log-level=error
    16693 be/4 65534         0.00 B      6.95 M  0.00 %  0.09 % sqlservr
    23464 be/4 root          0.00 B      4.44 M  0.00 %  0.09 % dockerd -p /var/run/dockerd.pid --log-opt max-size=50m --log-opt max-file=1 --storage-driver=btrfs --log-level=error
    23441 be/4 root          0.00 B      2.51 M  0.00 %  0.05 % dockerd -p /var/run/dockerd.pid --log-opt max-size=50m --log-opt max-file=1 --storage-driver=btrfs --log-level=error
     5523 be/4 root          0.00 B     58.58 M  0.00 %  0.04 % [btrfs-transacti]
    26247 be/4 root          0.00 B      2.19 M  0.00 %  0.03 % containerd --config /var/run/docker/containerd/containerd.toml --log-level error
    27636 be/4 root          0.00 B   1952.00 K  0.00 %  0.03 % containerd --config /var/run/docker/containerd/containerd.toml --log-level error
    25702 be/4 root          0.00 B      2.00 M  0.00 %  0.03 % containerd --config /var/run/docker/containerd/containerd.toml --log-level error
    26773 be/4 root          0.00 B   1460.00 K  0.00 %  0.02 % containerd --config /var/run/docker/containerd/containerd.toml --log-level error
    29889 be/4 root          0.00 B   1272.00 K  0.00 %  0.02 % containerd --config /var/run/docker/containerd/containerd.toml --log-level error
     7181 be/4 root          0.00 B   1192.00 K  0.00 %  0.02 % containerd --config /var/run/docker/containerd/containerd.toml --log-level error
    30421 be/4 root          0.00 B    992.00 K  0.00 %  0.01 % containerd --config /var/run/docker/containerd/containerd.toml --log-level error
    14571 be/4 root          0.00 B      2.86 M  0.00 %  0.01 % [kworker/u8:1-btrfs-endio-write]
    26949 be/4 root          0.00 B      2.88 M  0.00 %  0.01 % [kworker/u8:4-btrfs-endio-write]
    12185 be/4 root          0.00 B   1600.00 K  0.00 %  0.01 % [kworker/u8:2-bond0]
    32373 be/4 root          0.00 B   2016.00 K  0.00 %  0.01 % [kworker/u8:5-btrfs-endio-write]
    24913 be/4 root          0.00 B    588.00 K  0.00 %  0.01 % containerd --config /var/run/docker/containerd/containerd.toml --log-level error
    30128 be/4 root          0.00 B      2.38 M  0.00 %  0.01 % [kworker/u8:3-btrfs-endio-write]

     

    Link to comment

    Very interesting, thanks for the info.

     

So btrfs-transacti is doing about the same amount of writes with the symlink as with the docker on an XFS drive, yet with docker disabled it does nothing at all. I'm not thrilled with the idea of my SSDs being eaten away when they should be doing nothing at all; they could be with me for a good long time.

     

While it is a heck of a lot better than the gigs I was getting in 10 minutes, it is still over 3TB of writes per year for no reason and will really chew through SSDs. There's also not a lot of reason to possibly mess up the unraid config before a real fix is released vs just leaving the docker on the array HDD.

     

    Really wish there was a time frame for when this would be fixed. I am still on the trial and don't really feel comfortable buying the license with this issue active.

    Edited by TexasUnraid
    Link to comment
    7 minutes ago, TexasUnraid said:

    Very interesting, thanks for the info.

     

So btrfs-transacti is doing about the same amount of writes with the symlink as with the docker on an XFS drive, yet with docker disabled it does nothing at all. I'm not thrilled with the idea of my SSDs being eaten away when they should be doing nothing at all; they could be with me for a good long time.

     

While it is a heck of a lot better than the gigs I was getting in 10 minutes, it is still over 3TB of writes per year for no reason and will really chew through SSDs. There's also not a lot of reason to possibly mess up the unraid config before a real fix is released vs just leaving the docker on the array HDD.

     

    Really wish there was a time frame for when this would be fixed. I am still on the trial and don't really feel comfortable buying the license with this issue active.

    Totally understandable.

3TB on my two Samsung Evo 860 1TB drives is negligible though, since their warranty only voids after 300TB written hahaha. They should last 100 years 😛 

 

It does feel strange however that moving to an XFS volume still has btrfs-transacti "do stuff".

If I interpret my quick Google query correctly this is snapshotting at work, which is a btrfs process, not an XFS one.

 

Also, indeed, only mess with unRAID's config if you're confident about doing it 🙂 

The great thing about unRAID in these cases is that a reboot sets you back to default, with a default go file and no other scripts running, at least on the OS side.

    Edited by S1dney
    Link to comment
    2 minutes ago, TexasUnraid said:

    it is still over 3TB of writes per year

I'm sorry, but you are worried about 3TB of extra writes per year? My cache drive is currently writing around 2TB per day and I don't really worry about it. I do understand users being worried about that, but not about 3TB per year.

    Link to comment

Remember that 3TB per year is with the docker on a hard drive added to the array for the strict purpose of hosting docker AND the dirty writeback set to 3 mins. These are OK for the short term but not something I would want to have to do long term.

     

Without the writeback change it is 6TB a year.

     

And with the docker on the cache like it should be, it was closer to 30-40TB a year on an SSD with a TBW rating of 72.

     

3TB a year that should not be happening by itself is something I would not be happy about but could live with. If there was a reason for the writes, I would not even have an issue with it.

     

The root cause of this presents the possibility of killing my SSDs in ~2 years, plus the chance that it could pop up again with some insane writes and start killing them without me realizing it; that is what worries me.

     

    That is a reasonable thing to be worried about IMHO. Particularly when buying new hardware is not exactly feasible and I am kinda stuck with what I have for the foreseeable future.

     

Don't get me wrong, I REALLY like unraid as a whole; this is the first issue with it that was not a lack of a feature (aka snapshots). But before spending $130, basically what I have in my whole server, on a license, I am considering all angles.

     

    I want to be able to set it and forget it. With this issue active I will never be able to truly relax and trust it, I will always be worried that it could start writing again and constantly feel the need to check on it.

    Link to comment
    1 hour ago, S1dney said:

    Totally understandable.

3TB on my two Samsung Evo 860 1TB drives is negligible though, since their warranty only voids after 300TB written hahaha. They should last 100 years 😛 

 

It does feel strange however that moving to an XFS volume still has btrfs-transacti "do stuff".

If I interpret my quick Google query correctly this is snapshotting at work, which is a btrfs process, not an XFS one.

 

Also, indeed, only mess with unRAID's config if you're confident about doing it 🙂 

The great thing about unRAID in these cases is that a reboot sets you back to default, with a default go file and no other scripts running, at least on the OS side.

     

Yeah, if I had nice SSDs like that it would not be as big of a deal. I am stuck with some new-old-stock laptop drives though, that a friend gave me, with a mere 72TBW lifespan. Not exactly built for high endurance lol.

     

I agree, it is very odd that simply starting docker causes btrfs-transacti to do stuff when there is nothing on any BTRFS drive that should be being accessed or written to, as shown when docker is disabled.

     

The fact that the issue is so strange is why it bothers me; you never know when it could flare back up with a vengeance, since no one understands it.

     

Once again, I am not hating on unraid, I truly love it even with its lack of snapshots lol. I figured out how to install ESXi as a guest in unraid (you've got to set the SATA controller to 1 instead of 0 and set the network type to vmxnet3), so that should take care of most of my VM needs.

     

    Combined with BTRFS and the scripts someone posted on here to at least take crude manual snapshots, I am really loving unraid as a whole. Just want to have an idea of what to expect going forward with this issue.

     

    I have no idea how things generally work around here, do patches for this kind of thing generally take days, weeks, months or years? We all know of software that falls into each of those categories, I simply don't know which one unraid falls into.

    Edited by TexasUnraid
    Link to comment

I started monitoring with S1dney's script and saw almost 500GB/day of writes on my drive.  I originally switched from the official Plex container to the lsio docker, which reduced my writes dramatically to around 150GB/day, which is still high.  As others have said this isn't the fix, and I saw it start creeping back up towards 500GB/day in about a week.  Since I only have 1x SSD I decided to reformat to XFS instead of BTRFS earlier this week.  I just looked at it today and, with nothing else changing outside of the XFS format, I'm only showing 20GB of writes in total over the last week.

     

    Thank you to everyone's research so far in at least finding a workaround and saving my SSD from an early death!  Hopefully this gets solved in an update soon.

    Link to comment
    2 minutes ago, WackyWRZ said:

I started monitoring with S1dney's script and saw almost 500GB/day of writes on my drive. 

    What script is that? I didn't see this.

    Link to comment

After letting it sit most of the day, it seems things have gotten worse. This is why this bug worries me: it can flare up at any time without warning, and if you don't catch it, it can really mess you up lol.

     

Loop2 is now very low usage for some reason, basically what I would expect (although I was not counting it earlier since it is on the XFS hard drive), but btrfs-transacti has doubled its writes even with the mitigations in place.

     

     

    Link to comment

Going to throw my 2 cents in on this thread... I seem to have this issue only when the tdarr-aio docker container is running.  My Samsung 960 Pros are showing quite a bit of TBW compared to reads.  If I stop that container, it immediately quiets down.

    -Data units read [38.7 TB]

    -Data units written [107 TB]

     

    This is just in less than a minute of tdarr-aio running

    11601 be/0 root        292.48 M    127.82 M  [loop2]

    Edited by nickp85
    Link to comment

So my troubleshooting continues. I have a theory at the bottom based on the latest testing. It seems plausible to me; I would love to hear the opinions of those more knowledgeable than me.

     

TL;DR:

Moving appdata and the docker image to an XFS drive reduced writes to basically nothing on the btrfs drives.

I still saw more writes than there should be to the XFS drive, but acceptable.

Even small file access to a btrfs filesystem results in large writes from btrfs-transacti.

It was mentioned online that this could be btrfs autodefrag at work, and this lines up with my results.

It was also mentioned that random access inside a file (like our docker image file) can really cause defrag to freak out.

     

     

    Today I used unbalance to move everything off the cache and onto my btrfs array drive.

     

    The cache now has almost no writes, although it is still ticking up slightly even with completely empty drives which is strange.

     

btrfs-transacti is still ticking away though, 200MB in 15 mins, with the writes now going to the array drive that is formatted btrfs and has the cache data on it.

     

    So I then started slowly moving data back to the cache to see if any data caused writes, or if it is particular data.

     

     

1: Started out with 5GB of ISOs; no change in writes from empty drives after 10 mins. (This also allowed me to calculate that the writes on the main screen translate roughly to a write size of 4096 bytes each.)

     

2: I then moved 30GB of VM images over (VMs are disabled right now while testing all of this). Same results as an empty cache.

     

    3: I then moved the rest of the VM images along with everything else that was on the cache except system and appdata. Same results as empty cache.

     

Interestingly, the btrfs-transacti writes to the array drives slowed down to about 1/3 of what they were doing at this point yesterday with the same system settings / dockers running: ~40MB/10 mins vs 120MB, which kinda falls in line with my theory below. Loop2 is about the same, ~50MB in 10 mins.

     

4: I moved the appdata to the XFS array drive; btrfs-transacti dropped again, down to ~25MB/10 mins, but all of these writes appear to be going to the XFS drive with the docker image. Both the BTRFS-formatted array and cache drives are showing the same writes as when empty. Loop2 remained the same at around 40MB/10 mins.

     

5: Moved the "well behaved" docker appdata to the cache. Writes start increasing on the cache drive again, and now I know what the second btrfs-transacti I see is: one is for the docker image, the other is for the cache, and sometimes I saw a third for the array drive.

     

Loop2 increased slightly to around 45MB/10 mins. btrfs-transacti #1 was up a bit to 35MB, btrfs-transacti #2 was at 30MB.

     

Also discovered something interesting: when moving the appdata to the array I changed the share to no cache, and when I moved it back to the cache I forgot to change the share setting back. Thus any new file writes went to the array, easily allowing me to see exactly what activity the dockers are actually doing to the appdata.

     

As expected it is mostly log files and startup files: a grand total of 3MB worth of files from starting the dockers and letting them run for half an hour. Exactly what I would expect. When the actual writes are almost 100x more than this due to loop2 and btrfs-transacti, something is wrong.

     

     

     

I've been doing some research on btrfs-transacti. Snapshots are one cause that people pointed out, but since snapshots are not used by unraid that seems unlikely, unless something on the backend is trying to take snapshots?

     

The more plausible explanation IMHO is that someone mentioned btrfs-transacti manages the BTRFS auto-defrag mechanism. This actually makes a lot of sense based on the above results: when the files on the drive are not being accessed at all, there are no writes.

     

As soon as you put files on the drive that see even very minor access, btrfs-transacti sweeps in and starts writing excessive data, as if it is trying to keep any possible fragmentation at bay. It was mentioned somewhere that btrfs can be particularly bad about this with image files that have a lot of random access inside the file. Like our docker file.

     

It is like it is "double de-fragging": it is defragging the main cache filesystem and then also defragging the docker image, which then causes the cache to defrag again, and on and on it goes.

     

At least it is a theory, and it lines up well with mapping docker directly to the cache helping things, since that would just defrag one filesystem instead of 2 filesystems fighting each other. Just spitballing here.

     

Reminds me of the PS3, which would not allow any fragmentation at all and would thrash the hard drive, reducing things to a crawl even for fairly minor tasks.

     

    Does anyone know if autodefrag is enabled? How can you find out? How can it be disabled for testing?
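
(For anyone wanting to check themselves, the mount options tell you; a quick sketch assuming the pool is mounted at /mnt/cache:)

# autodefrag would show up in the mount options if it were enabled
findmnt -no OPTIONS /mnt/cache | tr ',' '\n' | grep -i defrag
# disable it for a test without rebooting (only relevant if it was actually set)
mount -o remount,noautodefrag /mnt/cache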

     

Would love to hear others' opinions on this; maybe this has already been considered and debunked?

    Edited by TexasUnraid
    Link to comment
    On 6/13/2020 at 3:48 PM, S1dney said:

    Totally understandable.

     

The great thing about unRAID in these cases is that a reboot sets you back to default, with a default go file and no other scripts running, at least on the OS side.

     

    Is there a way to extract the BTRFS image? Or will the docker settings be saved so it is just a 1 click reinstall using the instructions from page 2?

    Edited by TexasUnraid
    Link to comment
    23 hours ago, TexasUnraid said:

    What script is that? I didn't see this.

    The script is on Page 7 of this post. Link

     

    I just have the code below run 2x day from the "User Scripts" plugin.  Change the /dev/sdX to whatever your drive is and change the SHARENAME to a share on the server.

     

#!/bin/bash
# Get the TBW of /dev/sdX
TBWSDB_TB=$(/usr/sbin/smartctl -A /dev/sdX | awk '$0~/LBAs/{ printf "%.1f\n", $10 * 512 / 1024^4 }') 
TBWSDB_GB=$(/usr/sbin/smartctl -A /dev/sdX | awk '$0~/LBAs/{ printf "%.1f\n", $10 * 512 / 1024^3 }') 

echo "TBW on $(date +"%d-%m-%Y %H:%M:%S") --> $TBWSDB_TB TB, which is $TBWSDB_GB GB." >> /mnt/user/SHARENAME/TBW_sdb.log

     

    Edited by WackyWRZ
    Found script
    • Thanks 1
    Link to comment
    4 hours ago, WackyWRZ said:

    The script is on Page 7 of this post. Link

     

    I just have the code below run 2x day from the "User Scripts" plugin.  Change the /dev/sdX to whatever your drive is and change the SHARENAME to a share on the server.

     

#!/bin/bash
# Get the TBW of /dev/sdX
TBWSDB_TB=$(/usr/sbin/smartctl -A /dev/sdX | awk '$0~/LBAs/{ printf "%.1f\n", $10 * 512 / 1024^4 }') 
TBWSDB_GB=$(/usr/sbin/smartctl -A /dev/sdX | awk '$0~/LBAs/{ printf "%.1f\n", $10 * 512 / 1024^3 }') 

echo "TBW on $(date +"%d-%m-%Y %H:%M:%S") --> $TBWSDB_TB TB, which is $TBWSDB_GB GB." >> /mnt/user/SHARENAME/TBW_sdb.log

     

Great, I will try that out. I've got a suspicion that it will not work with my drives though, since they don't output LBAs written in the SMART data.
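
(A quick one-liner to check whether a drive reports that attribute at all; sdX is a placeholder:)

/usr/sbin/smartctl -A /dev/sdX | grep -iE 'LBAs|Units Written'
# no output means the TBW script has nothing to work with for that drive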

     

EDIT: Hunch was correct, but I found an old Samsung laptop SSD, put that in the array as XFS, and moved the docker image and appdata there for the time being to get some baseline numbers; then I will convert it to btrfs and see how it goes. Thanks for pointing that out.

     

    EDIT 2:

     

    Made a few tweaks to the script to allow setting the drive you want to watch with an argument:

     

    #!/bin/bash
    #description=Basic script to display the amount of data written to SSD on drives that support this. Set "argumentDefault" to the drive you want if you will schedule this.
    #argumentDescription= Set drive you want to see here.
    #argumentDefault=sdk
    
    ### the drive label is taken from the first argument (or argumentDefault above)  ###
    device=/dev/"$1"
    
    sudo smartctl -A $device |awk '
    $0 ~ /Power_On_Hours/ { poh=$10; printf "%s / %d hours / %d days / %.2f years\n",  $2, $10, $10 / 24, $10 / 24 / 365.25 }
    $0 ~ /Total_LBAs_Written/ {
       lbas=$10;
       bytes=$10 * 512;
       mb= bytes / 1024^2;
       gb= bytes / 1024^3;
       tb= bytes / 1024^4;
       #printf "%s / %s  / %d mb / %.1f gb / %.3f tb\n", $2, $10, mb, gb, tb
         printf "%s / %.2f gb / %.2f tb\n", $2, gb, tb
       printf "mean writes per hour:  / %.3f gb / %.3f tb",  gb/poh, tb/poh
    }
    $0 ~ /Wear_Leveling_Count/ { printf "%s / %d (%% health)\n", $2, int($4) }
    ' |
       sed -e 's:/:@:' |
       sed -e "s\$^\$$device @ \$" |
       column -ts@
    
    
    
    
    # Get the TBW of the selected drive
    TBWSDB_TB=$(/usr/sbin/smartctl -A /dev/"$1" | awk '$0~/LBAs/{ printf "%.1f\n", $10 * 512 / 1024^4 }')
    TBWSDB_GB=$(/usr/sbin/smartctl -A /dev/"$1" | awk '$0~/LBAs/{ printf "%.1f\n", $10 * 512 / 1024^3 }')
    TBWSDB_MB=$(/usr/sbin/smartctl -A /dev/"$1" | awk '$0~/LBAs/{ printf "%.1f\n", $10 * 512 / 1024^2 }')
    
    echo "TBW on $(date +"%d-%m-%Y %H:%M:%S") --> if 2 numbers, Written data first line, read data second line > $TBWSDB_TB TB, which is $TBWSDB_GB GB, which is $TBWSDB_MB MB." >> /mnt/user/Temp/TBW_"$1".log 
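
(If saved as a standalone file, usage is just the device label as the argument; the path below is only a placeholder:)

bash /path/to/tbw.sh sdk    # appends a line to /mnt/user/Temp/TBW_sdk.log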

     

    Edited by TexasUnraid
    Link to comment
    17 hours ago, TexasUnraid said:

    Does anyone know if autodefrag is enabled? How can you find out? How can it be disabled for testing?

    Good theory but autodefrag is disabled by default for any btrfs mount, you'd need to use that option at mount time to enable it, and it's not used by Unraid.

    Link to comment
    23 hours ago, TexasUnraid said:

     

    Is there a way to extract the BTRFS image? Or will the docker settings be saved so it is just a 1 click reinstall using the instructions from page 2?

Well, the instructions from page 2 (you meant these, right?) are meant to make the changes persistent, by editing the go file and injecting a file from flash into the OS on boot.

 

If you want a solution that is temporary, you should follow the steps to create a directory on the cache and edit the start_docker() function in the /etc/rc.d/rc.docker file.

Then stop docker from the docker tab and start it again.

Docker will now write directly into the btrfs mountpoint.

 

One thing here though is that you end up with no docker containers.

To recreate every one of them, go to Add Container and select the containers from the user templates one by one (assuming all your containers were created via the GUI).

This downloads the images again and all the persistent mappings are restored.

 

If you want to revert, simply reboot and voila.

 

An easy way to see how the investigation you did stands up against this; kind of curious as well 🙂

    Cheers.

     

    Link to comment
    8 hours ago, johnnie.black said:

    Good theory but autodefrag is disabled by default for any btrfs mount, you'd need to use that option at mount time to enable it, and it's not used by Unraid.

     

Darn, well it was worth a shot. I guess there is no way it got enabled by mistake?

     

After running things for 10 hours with LBA monitoring on an XFS format, I averaged 325MB/hour of writes; much higher than necessary but tolerable. If I disabled the 3 worst dockers, that dropped down to ~200MB/hour.

     

I then converted the drive to btrfs and moved the dockers back. Still waiting for data to come in, but it's looking like almost a gig/hour and growing over time; trying to see if it will settle down.

    Link to comment
    Just now, TexasUnraid said:

Darn, well it was worth a shot. I guess there is no way it got enabled by mistake?

No, it would need to be part of the mount command; the only options Unraid uses are noatime and nodiratime:

     

    Jun 11 19:07:38 Tower1 emhttpd: shcmd (113): mount -t btrfs -o noatime,nodiratime /dev/nvme1n1p1 /mnt/cache

     

    Link to comment
    52 minutes ago, S1dney said:

Well, the instructions from page 2 (you meant these, right?) are meant to make the changes persistent, by editing the go file and injecting a file from flash into the OS on boot.

 

If you want a solution that is temporary, you should follow the steps to create a directory on the cache and edit the start_docker() function in the /etc/rc.d/rc.docker file.

Then stop docker from the docker tab and start it again.

Docker will now write directly into the btrfs mountpoint.

 

One thing here though is that you end up with no docker containers.

To recreate every one of them, go to Add Container and select the containers from the user templates one by one (assuming all your containers were created via the GUI).

This downloads the images again and all the persistent mappings are restored.

 

If you want to revert, simply reboot and voila.

 

An easy way to see how the investigation you did stands up against this; kind of curious as well 🙂

    Cheers.

     

     

Yeah, those are the ones I saw. I have done so much reading on this subject that it all kind of runs together at this point lol. Is there another write-up on the subject?

     

The only issue I saw with that write-up was that it seems like, when an official fix is released, I will have to rebuild all of the dockers to go back to the official setup? This is my first time using dockers (I've been working with PCs since DOS 3.0 but only really messed with linux in VMs before this); I get them in theory, but are there no settings/information stored inside the containers themselves that would be lost?

     

I would hate to get everything set up just right and then have to do it all over again when a patch is released.

     

Of course it would be easier to figure that out if we knew an approximate timeframe for a fix.

     

Does anyone know when "run at array start" in user scripts actually runs? Is it when you click the button or after the array has started? It would be cool if it could be scripted into a user script, so it is a simple matter of enabling or disabling it and everything is reverted.

     

    Is it possible to stop / start docker from a script so it could all be automated into a single script?

     

    Edited by TexasUnraid
    Link to comment

Getting some really strange results in the latest tests using the drive LBA method for counting writes.

     

If I format it as XFS and have both the docker and appdata on the XFS drives, total writes are ~200-250MB/hour.

     

I moved the appdata to another XFS drive and kept the docker on the SSD; writes were reduced to 150-200MB/hour.

     

I reformatted the SSD with LBA reporting as btrfs and put everything back on it; it gets ~1GB of writes per hour and seemed to be climbing. I plan to leave it overnight sometime to see how bad it gets.

     

This is where it gets strange: I formatted the SSD back to XFS, put the docker image on the XFS SSD and the appdata on the cache. LBA writes are reporting 350-400MB/hour on the SSD alone, plus the writes to the cache, which seem to be around 200MB/hour based on iotop.

     

I have no idea how this is possible or what is going on. Going to recheck my findings and try reversing it: appdata on the SSD and docker on the cache.

    Edited by TexasUnraid
    Link to comment
    14 hours ago, TexasUnraid said:

     

Yeah, those are the ones I saw. I have done so much reading on this subject that it all kind of runs together at this point lol. Is there another write-up on the subject?

 

The only issue I saw with that write-up was that it seems like, when an official fix is released, I will have to rebuild all of the dockers to go back to the official setup? This is my first time using dockers (I've been working with PCs since DOS 3.0 but only really messed with linux in VMs before this); I get them in theory, but are there no settings/information stored inside the containers themselves that would be lost?

 

I would hate to get everything set up just right and then have to do it all over again when a patch is released.

 

Of course it would be easier to figure that out if we knew an approximate timeframe for a fix.

 

Does anyone know when "run at array start" in user scripts actually runs? Is it when you click the button or after the array has started? It would be cool if it could be scripted into a user script, so it is a simple matter of enabling or disabling it and everything is reverted.

 

Is it possible to stop / start docker from a script so it could all be automated into a single script?

     

There is no other write-up on the subject as far as I know. I improvised in this one to find a solution that would not destroy my SSDs ;) 

     

    Well no docker container should contain persistent data.

    Persistent data should always be mapped to a location on the host.

Updating a docker container destroys it too, and if set up correctly this doesn't cause you to lose data, by design.

     

You're right though, if this gets fixed in a newer release you would have to redownload all the image files, but because of docker's nature this will only take some time to download (depending on your line speed) and (again, if set up correctly) will not cause you to lose any data.

     

The user scripts plugin is indeed able to run scripts when the array comes up; you don't have to press any button or anything, it runs, like the name says, on array startup (I guess straight after).

     

Taking the approach from page two makes this persistent though, and reverting back to default in a future upgrade would just require you to remove the added lines from the /boot/config/go file, after which docker will mount its image file again.
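
For illustration only (the exact lines depend on the page 2 write-up), the go file addition boils down to copying the modified rc.docker from flash at boot, so reverting means deleting something like:

# hypothetical example of the added go file lines -- the real paths/filenames may differ
cp /boot/config/custom/rc.docker /etc/rc.d/rc.docker   # overwrite the stock script with the modified copy
chmod +x /etc/rc.d/rc.docker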

    Link to comment
    7 hours ago, S1dney said:

There is no other write-up on the subject as far as I know. I improvised in this one to find a solution that would not destroy my SSDs ;) 

 

Well no docker container should contain persistent data.

Persistent data should always be mapped to a location on the host.

Updating a docker container destroys it too, and if set up correctly this doesn't cause you to lose data, by design.

 

You're right though, if this gets fixed in a newer release you would have to redownload all the image files, but because of docker's nature this will only take some time to download (depending on your line speed) and (again, if set up correctly) will not cause you to lose any data.

 

The user scripts plugin is indeed able to run scripts when the array comes up; you don't have to press any button or anything, it runs, like the name says, on array startup (I guess straight after).

 

Taking the approach from page two makes this persistent though, and reverting back to default in a future upgrade would just require you to remove the added lines from the /boot/config/go file, after which docker will mount its image file again.

Ok, thanks for the info. I'm not worried about redownloading the containers (although the ability to batch install them using the saved templates would be fantastic), only really worried about losing settings/data in the process. I always thought of dockers like portable apps in windows; it seems they are a bit more advanced than that.

     

My reasoning for wanting to use user scripts vs the go file is, put simply, that I would easily remember to disable the user script when a patch is released. Still being pretty new to linux, I am worried I might forget exactly what I need to edit/change to revert things if it is terminal-based commands, although I suppose I can reference back to this thread to figure that out.

     

Still, I think I might try to get it working with user scripts first before going with the go file.

     

Is it possible to stop / start docker from a script? It would be cool if run at array start could handle it all automatically before docker starts normally, but worst case I guess it could stop docker, copy the file and then start docker up again?

    Link to comment

    In other news, I swapped the docker image to the cache and put appdata on the BTRFS SSD with LBA logging.

     

The LBA count looks to have increased by around ~5MB/hour overnight with the appdata.

     

The cache doesn't have LBA logging, but iotop is once again showing ~1GB+ an hour and climbing.

     

    Going to reverse it back and see if I get the same results as before, those made no sense at all.

     

Edit: Reversed things back to appdata on the btrfs array drive and docker on the XFS array SSD, and LBA writes are back to an average of ~250MB/hour. Very odd that the earlier results were so high; going to put the appdata back on the cache and see if that was what caused it before.

    Edited by TexasUnraid
    Link to comment
    1 hour ago, TexasUnraid said:

    Is it possible to stop / start docker from a script?

Sure. When you make changes to a container in the GUI and apply them, you can see the docker run command that is issued by Unraid. You can certainly run that same command in a script to start that container.
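
The docker service itself can also be stopped and started from a script via the same init script used earlier in the thread, so a crude automation sketch would be:

/etc/rc.d/rc.docker stop     # stops dockerd and unmounts the docker image
# ... make your changes here, e.g. the symlink swap described earlier in the thread ...
/etc/rc.d/rc.docker start    # note: the stock script's mountpoint check has to be removed
                             # first if /var/lib/docker points at a plain directory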

    Link to comment




