• [6.8.3] Unnecessary overwriting of JSON files in docker.img every 5 seconds


    mgutt
    • Minor

@Valerio found this out first, but never received an answer. Today I noticed it too. It has been present since 2019 (or even longer).

     

    I would say it's a bug, as:

    • it prevents HDD/SSD spindown/sleep (depending on the location of docker.img)
    • it wears out the SSD in the long run (if docker.img is located on an SSD) - see this bug, too.
    • it prevents the CPU from reaching its deep sleep states

     

    What happens:

    • /var/lib/docker/containers/*/hostconfig.json is updated every 5 seconds with the same content
    • /var/lib/docker/containers/*/config.v2.json is updated every 5 seconds with the same content except for some timestamps (which, I think, shouldn't be part of a config file)

     

    Which docker containers:

    • verified so far are Plex (Original) and PiHole, but this may be general behaviour

     

As an example, here is the content of hostconfig.json, which was updated 17,280 times yesterday with the same content (a short monitoring sketch follows the dump below):

find /var/lib/docker/containers/40b4197fdea122178139e9571ae5f4040a2ef69449acf14e616010c7e293bb44 -name hostconfig.json -ls -exec cat {} \;
    2678289      4 -rw-r--r--   1 root     root         1725 Oct  8 13:46 /var/lib/docker/containers/40b4197fdea122178139e9571ae5f4040a2ef69449acf14e616010c7e293bb44/hostconfig.json
    {
       "Binds":[
          "/mnt/user/tv:/tv:ro",
          "/mnt/cache/appdata/Plex-Media-Server:/config:rw",
          "/mnt/cache/appdata/Plex-Transcode:/transcode:rw",
          "/mnt/user/movie:/movie:ro"
       ],
       "ContainerIDFile":"",
       "LogConfig":{
          "Type":"json-file",
          "Config":{
             "max-file":"1",
             "max-size":"50m"
          }
       },
       "NetworkMode":"host",
       "PortBindings":{
          
       },
       "RestartPolicy":{
          "Name":"no",
          "MaximumRetryCount":0
       },
       "AutoRemove":false,
       "VolumeDriver":"",
       "VolumesFrom":null,
       "CapAdd":null,
       "CapDrop":null,
       "Capabilities":null,
       "Dns":[
          
       ],
       "DnsOptions":[
          
       ],
       "DnsSearch":[
          
       ],
       "ExtraHosts":null,
       "GroupAdd":null,
       "IpcMode":"private",
       "Cgroup":"",
       "Links":null,
       "OomScoreAdj":0,
       "PidMode":"",
       "Privileged":false,
       "PublishAllPorts":false,
       "ReadonlyRootfs":false,
       "SecurityOpt":null,
       "UTSMode":"",
       "UsernsMode":"",
       "ShmSize":67108864,
       "Runtime":"runc",
       "ConsoleSize":[
          0,
          0
       ],
       "Isolation":"",
       "CpuShares":0,
       "Memory":0,
       "NanoCpus":0,
       "CgroupParent":"",
       "BlkioWeight":0,
       "BlkioWeightDevice":[
          
       ],
       "BlkioDeviceReadBps":null,
       "BlkioDeviceWriteBps":null,
       "BlkioDeviceReadIOps":null,
       "BlkioDeviceWriteIOps":null,
       "CpuPeriod":0,
       "CpuQuota":0,
       "CpuRealtimePeriod":0,
       "CpuRealtimeRuntime":0,
       "CpusetCpus":"",
       "CpusetMems":"",
       "Devices":[
          {
             "PathOnHost":"/dev/dri",
             "PathInContainer":"/dev/dri",
             "CgroupPermissions":"rwm"
          }
       ],
       "DeviceCgroupRules":null,
       "DeviceRequests":null,
       "KernelMemory":0,
       "KernelMemoryTCP":0,
       "MemoryReservation":0,
       "MemorySwap":0,
       "MemorySwappiness":null,
       "OomKillDisable":false,
       "PidsLimit":null,
       "Ulimits":null,
       "CpuCount":0,
       "CpuPercent":0,
       "IOMaximumIOps":0,
       "IOMaximumBandwidth":0,
       "MaskedPaths":[
          "/proc/asound",
          "/proc/acpi",
          "/proc/kcore",
          "/proc/keys",
          "/proc/latency_stats",
          "/proc/timer_list",
          "/proc/timer_stats",
          "/proc/sched_debug",
          "/proc/scsi",
          "/sys/firmware"
       ],
       "ReadonlyPaths":[
          "/proc/bus",
          "/proc/fs",
          "/proc/irq",
          "/proc/sys",
          "/proc/sysrq-trigger"
       ]
    }
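The monitoring sketch mentioned above - a hedged example rather than part of the original report, and it assumes inotify-tools is installed (inotifywait is used later in this thread):

# watch the containers directory for write events; with the bug present,
# hostconfig.json and config.v2.json produce a CLOSE_WRITE event roughly every 5 seconds
inotifywait -m -r -e close_write --timefmt '%H:%M:%S' --format '%T %w%f' \
  /var/lib/docker/containers 2>/dev/null | grep -E 'hostconfig\.json|config\.v2\.json'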

     





    Recommended Comments



    1 hour ago, mgutt said:

    The location of your docker.img is irrelevant. You can even use a docker directory, which is a method I recommend as it reduces write amplification.

    Cool - just implemented now.

    Will monitor and see how it goes! 🙂


    I've had to revert this change as it breaks both NextCloud & Matomo, causing their config/log area to be read only.

     

    ## edit ##

    ...maybe, just noticed both my cache drives have gone offline, at the same time....investigating....

    3 hours ago, boomam said:

    both my cache drives have gone offline, at the same time....investigating....

Sounds like you are using ASPM (NVMe) or ALPM (SATA). That would mean that, because of my code, your SSD entered a sleep state for the very first time, and that killed your BTRFS RAID (which should not happen = NVMe firmware or BIOS bug). Check your logs for errors.

     

    9 hours ago, mgutt said:

Sounds like you are using ASPM (NVMe) or ALPM (SATA). That would mean that, because of my code, your SSD entered a sleep state for the very first time, and that killed your BTRFS RAID (which should not happen = NVMe firmware or BIOS bug). Check your logs for errors.

     

    SATA drives.

And yes, BTRFS did get messed up; luckily I was able to recover.

    No log errors though, other than being unable to mount.

     

    I'll set the drives to never sleep, should work around it.

     

For others, it could be worth updating the original post (here and on Reddit) to list that caveat and its workaround; otherwise this fix will become better known as the cache killer instead of the cache saver ;-)

    48 minutes ago, boomam said:

    For others, it could be worth updating the original post

    Your hardware problem has nothing to do with this modification. If it is related to the power management of your SSD, it would even happen if you disable docker.

    18 minutes ago, mgutt said:

    Your hardware problem has nothing to do with this modification. If it is related to the power management of your SSD, it would even happen if you disable docker.

You literally just clarified the opposite of that, saying that the script affects the drive's sleep mode.

     

    Whilst your script doesn't directly affect that, it does dramatically increase the likelihood of it, and warrants a warning/note in the documentation.

    15 minutes ago, boomam said:

You literally just clarified the opposite of that, saying that the script affects the drive's sleep mode.

     

    Whilst your script doesn't directly affect that, it does dramatically increase the likelihood of it, and warrants a warning/note in the documentation.


He didn't clarify the opposite. He said that because it no longer uses your SSD for logs, your SSD finally went to sleep, resulting in your issue. The script doesn't affect the sleep mode.

     

He also said that if you disabled Docker, or didn't use containers, you would still have the problem, since your SSD went to sleep.

    30 minutes ago, boomam said:

saying that the script affects the drive's sleep mode

This was only a guess. Maybe the problem lies somewhere else?! As long as nobody else has this problem and it isn't verified, it doesn't make sense to warn everyone. So far you are the only one who has had this problem. And as I said, if it is related to power management, it can happen at any time, not only because of this modification.

    30 minutes ago, boomam said:

    Whilst your script doesn't directly affect that, it does dramatically increase the likelihood of it

If this was your problem, then yes, but by the same argument Unraid would need to throw a warning if you disable Docker or create multiple pools or...?!

     

PS: Wait a week or so. If it does not happen again, revert your sleep setting and we will see whether this was the reason. By the way: how did you disable sleep? For SATA, these link power management policies exist:

    max_performance
    medium_power
    med_power_with_dipm
    min_power

     

    Which was active in your setup?
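A minimal sketch for checking (and, if necessary, forcing) the active policy via the standard Linux sysfs interface - not part of the original post:

# show the currently active ALPM policy for every SATA host
cat /sys/class/scsi_host/host*/link_power_management_policy

# force the most conservative policy (no link power management); not persistent across reboots
for p in /sys/class/scsi_host/host*/link_power_management_policy; do echo max_performance > "$p"; done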

    48 minutes ago, lJakel said:


He didn't clarify the opposite. He said that because it no longer uses your SSD for logs, your SSD finally went to sleep, resulting in your issue. The script doesn't affect the sleep mode.

I've not said that it does, but if it increases the chances of BTRFS corruption due to allowing sleep, then as a by-product it should be listed as a note in the documentation, directly caused or not.

     

    44 minutes ago, mgutt said:

As long as nobody else has this problem and it isn't verified, it doesn't make sense to warn everyone. So far you are the only one who has had this problem. And as I said, if it is related to power management, it can happen at any time, not only because of this modification.

If this was your problem, then yes, but by the same argument Unraid would need to throw a warning if you disable Docker or create multiple pools or...?!

    Actually, yes - Unraid should have warnings about this (and other issues) on their install FAQ for new users.

There are a lot of gotchas with Unraid that, using this as an example, can cause either data risk or hardware failure. Even if Unraid isn't the direct cause, the fact that it uses a component - in this case Docker - that has the bug should be noted, so users of the platform are correctly informed.

     

    Using the same logic for your great diagnostic/workaround of the docker/loop2 issue - any perceived issues, small or large, should be noted so people can understand the risk.

Otherwise many will put issues down to other causes, and there will be no consistent thread of shared knowledge tying an inherent risk back to a root cause, directly caused or not.

     

It's not a comment on the intent of a given app/script/platform, nor on the people creating it - to be clear, it's appreciated work - but noting down issues as they occur and then discounting them afterwards ensures that issues are captured, analyzed and prioritized for further remediation if needed, by letting that collective knowledge build up.

Ignoring it and going "it's just you/your hardware" when an issue has never existed previously is not great, as it dismisses potential side-effect issues until they get much larger in scope.

    It's Dev/QA 101, especially in the Open Source community.

     

If it's not going to be added to the docs/notes, then since this isn't too far into the thread, others will see it with a quick scroll down, so it's arguably moot - they'll get the info if they read a little further. ;-)

     

    RE: Drive Spin downs

The spin-down delay variable dictates when the AHCI command to spin down is sent to the drive.

The command for this, I think, is the same for SSDs & HDDs, just acted on in different ways - and often differently depending on the SSD manufacturer & firmware.

In this case, it's running on Crucial drives - from what I read a while ago, I think it actually does help give it a nudge in the sleep direction, so setting it to 'never' should help. It just warrants a little more testing to make sure that it works as intended. Failing that, some modifications elsewhere should achieve the same results too.

     

    On 8/22/2021 at 7:59 PM, boomam said:

    so setting it to 'never' should help

As far as I know, Unraid does not send any sleep commands to non-rotational disks. So "never" should only affect HDDs.

    1 hour ago, mgutt said:

As far as I know, Unraid does not send any sleep commands to non-rotational disks. So "never" should only affect HDDs.

I'd have to look into it closer, but if they've set it up that way instead of using industry-standard AHCI commands, then it's a bit of a weird decision.

    We'll see I guess. 😛 

    8 hours ago, boomam said:

    I'd have to look into it closer

I did. It was not as easy as I thought, but finally I was successful. At first I thought I could open two terminals and watch for smartctl and hdparm processes (which Unraid uses to set standby):

    while true; do pid=$(pgrep 'smartctl' | head -1); if [[ -n "$pid" ]];  then ps -p "$pid" -o args && strace -v -t -p "$pid"; fi; done
    
    while true; do pid=$(pgrep 'hdparm' | head -1); if [[ -n "$pid" ]];  then ps -p "$pid" -o args && strace -v -t -p "$pid"; fi; done

     

But I found out that some of the processes were too fast to monitor. So I changed the source code of hdparm and smartctl and added a 1-second sleep to both tools ("Trick 17", as we say in Germany ^^). Then I used this command to watch for the processes:

    while true; do for pid in $(pgrep 'smartctl|hdparm'); do if [[ $lastcmd != $cmd ]] || [[ $lastpid != $pid ]]; then cmd=$(ps -p "$pid" -o args); echo $cmd "($pid)"; lastpid=$pid; lastcmd=$cmd; fi; done; done

     

     

After that I pressed the spin-down icon of an HDD, which returned:

    COMMAND /usr/sbin/hdparm -y /dev/sdb (5766)

     

After the disk spun down, Unraid started to spam the following command every second:

    COMMAND /usr/sbin/hdparm -C /dev/sdb (5966)

     

I think this is how Unraid's WebGUI is able to update the icon as quickly as possible if a process wakes up the disk.

     

Then I pressed the spin-up icon, which returned this:

    COMMAND /usr/sbin/hdparm -S0 /dev/sdb (27296)

     

    And several seconds later, after the disk spun up, this command appeared (Unraid checks SMART values):

    COMMAND /usr/bin/php /usr/local/sbin/smartctl_type disk1 -A (28152)
    COMMAND /usr/sbin/smartctl -A /dev/sdb (28155)

     

    The next step was to click on the spin down icon of the SSD... but nothing happened. So this icon has no function. Buuh ^^

     

    Now I set my Default spin down delay to 15 minutes and waited... and then this appeared:

    COMMAND /usr/sbin/hdparm -y /dev/sdb (5826)
    COMMAND /usr/sbin/hdparm -y /dev/sde (6203)
    COMMAND /usr/sbin/hdparm -y /dev/sdc (6204)

     

    And Unraid is spamming again:

    COMMAND /usr/sbin/hdparm -C /dev/sde (6465)
    COMMAND /usr/sbin/hdparm -C /dev/sdb (6555)
    COMMAND /usr/sbin/hdparm -C /dev/sdc (6643)

     

But, as I thought, no command mentions /dev/sdd, which is my SATA SSD. So Unraid never sends any standby commands to your SSD.

     

I remember one of the Unraid devs saying in the forums that SSDs do not consume measurably more energy in standby, so they did not implement equivalent commands.

     

Conclusion: As you did not change any setting that covers your SSDs' power management, and as they are working now, your problem is probably something else.

    hdparm-9.58-sleep1.txz smartmontools-7.2-sleep1.txz

    20 minutes ago, mgutt said:

I did. It was not as easy as I thought, but finally I was successful. [...] Conclusion: As you did not change any setting that covers your SSDs' power management, and as they are working now, your problem is probably something else.

    Well that certainly saves me some research at the weekend. 😛


    Elegant solution mgutt!

     

    I see the logic of how it works, but for the life of me I can't get it to work... I must be doing something wrong.

     

    Opening up two console windows, triggering events and watching the inotifywait output for the below directories:

    /mnt/cache/system/docker/docker/containers

    /var/lib/docker/containers

     

I see the exact same events on both, whereas I'm expecting to see the former directory being very quiet (except for the 30-minute sync cycle).

     

    • Using a Docker directory
    • Triple-checked the code snippet in the go file
    • Can see the 3x RAM-disk related log entries in the system log
    • Can see the tmpfs mounted on /var/lib/docker/containers

     

    Anything very obvious I'm overlooking?

     

     

     

    5 hours ago, ungeek67 said:

    /mnt/cache/system/docker/docker/containers

    /var/lib/docker/containers

     

I see the exact same events on both

Yes, this drove me crazy as well. Don't ask me why, but because one path mounts to the other, both return the same activity results from the RAM-Disk, although I would expect /mnt/cache to reflect the SSD content?!

     

The only way to see the real traffic on the SSD is to bind-mount the parent dir:

    mkdir /var/lib/docker_test
    mount --bind /var/lib/docker /var/lib/docker_test

     

Now you can monitor /var/lib/docker_test and you will see the difference. That's why I had to use the same trick to create the backup every 30 minutes. It took multiple days to figure that out 😅
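A minimal sketch of such a comparison, assuming the docker_test bind mount from above: the RAM disk copies change every few seconds, while the bind-mounted view of the SSD should only change on the 30-minute sync.

# timestamps as seen through the RAM disk (change every few seconds)
stat --format '%y %n' /var/lib/docker/containers/*/hostconfig.json | head -n3

# timestamps of the copies actually stored on the SSD (should only change at xx:00 and xx:30)
stat --format '%y %n' /var/lib/docker_test/containers/*/hostconfig.json | head -n3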

     

    Screenshot:

[screenshot]

     

    If you use the docker.img it is easier. It has the correct timestamp every 30 minutes. 🤷‍♂️

     

PS: You can see the higher write activity on your SSD in the Main tab if you wait for the 30-minute backup (xx:00 and xx:30).

     

Unmount and remove the test dir when you have finished your tests:

    umount /var/lib/docker_test
    rmdir /var/lib/docker_test

     


Since the last rebuild of the cache, I've had my second pool corrupt too - issues that did not exist before and are highly coincidental with the script being implemented. I'll look into it properly at the weekend, but for now I've turned it off.

Let me know if you think some diagnostic logs would help with diagnosis.

     

On the upside, I finally found an excuse to convert from a docker vdisk to a directory. 😛


@mgutt I went ahead and implemented the changes by following the instructions; all went smoothly so far. My cache drives are 2x NVMe A-Data SX8200. I did not change the docker log file size, it wasn't necessary. I double-checked that your script worked by checking the RAM-disk size and the code itself in the different locations as you have explained - all good. I do see 0.00 B/s, but the writes jump up and down (averaging about 200 KB/s). Is that normal? Does it mean I have some other Docker containers in the appdata folder writing something (besides status and log files) to my cache drives, like Nextcloud or Plex? Thank you!

     

    14 hours ago, pervin_1 said:

Does it mean I have some other Docker containers in the appdata folder writing something (besides status and log files) to my cache drives,

    Yes. Execute this command multiple times and check which files are updated frequently:

    find /var/lib/docker -type f -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- | head -n30

     

    Execute this to find out which folder name belongs to which container:

    csv="CONTAINER ID;NAME;SUBVOLUME\n"; for f in /var/lib/docker/image/*/layerdb/mounts/*/mount-id; do sub=$(cat $f); id=$(dirname $f | xargs basename | cut -c 1-12); csv+="$id;" csv+=$(docker ps --format "{{.Names}}" -f "id=$id")";" csv+="/var/lib/docker/.../$sub\n"; done; echo -e $csv | column -t -s';'
    

     

    14 hours ago, boomam said:

    Let me know if you want/think some diagnostic logs would help diagnose.

    Please post them (and the time when the pool stopped working).

     

     

    4 hours ago, mgutt said:

    Yes. Execute this command multiple times and check which files are updated frequently:

    find /var/lib/docker -type f -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- | head -n100

     

If you are using XFS, something will happen in the overlay2 subfolder. If you are using BTRFS, it will be in the btrfs/subvolumes folder. The BTRFS subvolume<>container dependencies can be checked as follows:

    for f in /var/lib/docker/image/btrfs/layerdb/mounts/*/mount-id; do echo $(dirname $f | xargs basename | cut -c 1-12)' (Container-ID) > '$(cat $f)' (BTRFS subvolume ID)'; done && docker ps -a

     

    Please post them (and the time when the pool stopped working).

     

     

The scripts you shared are working perfectly. TBH, my writes (I have been monitoring for a good while now) are not excessive. The first script is barely showing any activity for my containers. I see some repetitive log entries for JSON files from /docker/containers, but I am assuming this is the RAM disk for status and logs. Correct me if I am wrong, please.

Another question: the appdata is located on my NVMe SSD cache pool. My understanding is that there will be some kind of writes from the containers regardless, correct? Besides that, I left the script to dump the log/status files every 30 minutes for safety reasons.

BTW, I can see that my writes are basically coming from the OnlyOffice Document Server container (and some from Nextcloud). It's connected to my Nextcloud server as a document editor. I am not sure if you are familiar with it or using it.

     

    Edit:

Nextcloud was writing to its tmp folder. I added an extra parameter to mount the tmp folder as a RAM disk (not sure if your script handles /tmp inside Docker containers). For OnlyOffice, the writes mainly go to /run/postgresql every minute or so. I am assuming that is some kind of database system. Do you think it's a good idea to mount it to a RAM disk via the extra parameters?

    8 hours ago, pervin_1 said:

Nextcloud was writing to its tmp folder. I added an extra parameter to mount the tmp folder as a RAM disk (not sure if your script handles /tmp inside Docker containers).

My script covers only /docker/containers. Everything that happens inside the container isn't covered, as it's in the /docker/overlay2 or /docker/image/btrfs path. So yes, adding a RAM disk path for the /tmp folder of Nextcloud was a good step.

     

    8 hours ago, pervin_1 said:

For OnlyOffice, the writes mainly go to /run/postgresql every minute or so

This is something I would not touch. PostgreSQL is a database. It contains important data which shouldn't be in RAM. Note: if you link a container's /tmp path to a RAM disk, all data inside this path will be deleted on server reboot.

     

Note: Using /tmp as a RAM disk is the default behavior of Unraid, Debian and Ubuntu. It seems not to be the default for Alpine, but since such popular distributions use RAM disks for /tmp, I assume application developers do not store important data in /tmp.
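For reference, a minimal sketch of the kind of extra parameter pervin_1 describes - the size and the container name are illustrative placeholders, not values from the original posts:

# Unraid "Extra Parameters" field (appended to docker run): mount the container's /tmp as tmpfs
--tmpfs /tmp:rw,size=256m,mode=1777

# verify from the host that /tmp inside the running container is now a tmpfs
docker exec <container-name> df -h /tmp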

    7 hours ago, mgutt said:

My script covers only /docker/containers. [...] This is something I would not touch. PostgreSQL is a database. It contains important data which shouldn't be in RAM. [...]

lol, you were right about the PostgreSQL database. It messed up my OnlyOffice container; I had to rebuild it, which was not a big deal - got it rebuilt even better. I have been testing a lot lately, to the point where I corrupted my Nextcloud MariaDB container and decided to get rid of Nextcloud and its related containers for good. I wasn't utilizing its features enough to justify the time and effort spent maintaining it. Besides, I heavily rely on Google Drive anyway (I do back up my stuff from G Drive regardless). Yes, I am also going to mount /tmp to the RAM disk inside the containers where it is safe. My understanding is that containers are not aware that /tmp is not a real RAM disk, so the writes go to the SSD instead. I appreciate the hard work and input!

    Thank you! 

    30 minutes ago, pervin_1 said:

My understanding is that containers are not aware that /tmp is not a real RAM disk, so the writes go to the SSD instead.

    Correct.

    On 8/17/2021 at 10:09 AM, mgutt said:

    @limetech

    I solved this issue as follows and successfully tested it in:

    • Unraid 6.9.2
    • Unraid 6.10.0-rc1

     

    1. Add this to /boot/config/go (by Config Editor Plugin):
      # -------------------------------------------------
      # RAM-Disk for Docker json/log files
      # -------------------------------------------------
      # create RAM-Disk on starting the docker service
      sed -i '/^  echo "starting \$BASE ..."$/i \
        # move json/logs to ram disk\
        rsync -aH --delete /var/lib/docker/containers/ ${DOCKER_APP_CONFIG_PATH%/}/containers_backup\
        mount -t tmpfs tmpfs /var/lib/docker/containers\
        rsync -aH --delete ${DOCKER_APP_CONFIG_PATH%/}/containers_backup/ /var/lib/docker/containers\
        logger -t docker RAM-Disk created' /etc/rc.d/rc.docker
      # remove RAM-Disk on stopping the docker service
      sed -i '/^  # tear down the bridge$/i \
        # backup json/logs and remove RAM-Disk\
        rsync -aH --delete /var/lib/docker/containers/ ${DOCKER_APP_CONFIG_PATH%/}/containers_backup\
        umount /var/lib/docker/containers\
        rsync -aH --delete ${DOCKER_APP_CONFIG_PATH%/}/containers_backup/ /var/lib/docker/containers\
        logger -t docker RAM-Disk removed' /etc/rc.d/rc.docker
      # Automatically backup Docker RAM-Disk
      sed -i '/^<?PHP$/a \
      $sync_interval_minutes=30;\
      if ( ! ((date('i') * date('H') * 60 + date('i')) % $sync_interval_minutes) && file_exists("/var/lib/docker/containers")) {\
        exec("mkdir /var/lib/docker_bind");\
        exec("mount --bind /var/lib/docker /var/lib/docker_bind");\
        exec("rsync -aH --delete /var/lib/docker/containers/ /var/lib/docker_bind/containers");\
        exec("umount /var/lib/docker_bind");\
        exec("rmdir /var/lib/docker_bind");\
        exec("logger -t docker RAM-Disk synced");\
      }' /usr/local/emhttp/plugins/dynamix/scripts/monitor
    2. Optional: Limit the Docker LOG size to avoid using too much RAM (an equivalent per-container log option is sketched after this list):
      [screenshot]
    3. Reboot server
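Since the settings screenshot from step 2 is not reproduced here, a hedged note: the same limits can also be set per container via Docker's log options - the values below simply mirror the LogConfig section of the hostconfig.json dump at the top of this report, they are not taken from the screenshot:

--log-opt max-size=50m --log-opt max-file=1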

     

    Notes:

    • By this change, /var/lib/docker/containers, which contains only status and log files, becomes a RAM-Disk, which avoids wearing out your SSD and allows the SSD to sleep permanently (energy efficient)
    • It automatically syncs the RAM-Disk every 30 minutes to your default appdata location  (for server crash / power-loss scenarios). If container logs are important to you, feel free to change the value of "$sync_interval_minutes" in the above code to a smaller value to sync the RAM-Disk every x minutes.
    • If you want to update Unraid OS, you should remove the change from the go file until it's clear that this enhancement is still working/needed!

     

    Your Reward:

[screenshot]

     

After you have enabled the Docker service, you can check whether the RAM-Disk has been created (and its usage):

[screenshot]
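A terminal equivalent of that check - a hedged sketch, not part of the original post:

mount | grep '/var/lib/docker/containers'   # should list a tmpfs mount
df -h /var/lib/docker/containers            # RAM-Disk size and current usage
grep 'RAM-Disk' /var/log/syslog             # "created" / "synced" / "removed" messages (assuming the default syslog location)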

     

    Screenshot of changes in /etc/rc.d/rc.docker

[screenshot]

     

    and /usr/local/emhttp/plugins/dynamix/scripts/monitor

[screenshot]

    I followed this and now my log memory constantly shows 100%. Is this something to be concerned about?


     

    EDIT: this is unrelated, looks like nginx is running away with the logs

    I found another topic (unsolved) that is about this issue: https://forums.unraid.net/topic/86114-nginx-running-out-of-shared-memory/


Set this up on a new cache drive and verified the directory existed. Still saw 12 MB/s writes - thinking maybe it's because my drive is BTRFS, as this is higher than it was without this on my XFS cache drive. Moving data off now to change the new drive to XFS and test again.





