Guide on how to stop excessive writes destroying your cache SSD



------------------------------------------------------------------------------------------------------------------

Part 1: unraid write amplification

------------------------------------------------------------------------------------------------------------------

 

To start out with, this is just a journal of my own experiences dealing with these writes; you run any and all of these commands at your own risk! Always make a backup of any data/drives you will be messing with. I would recommend getting a fresh backup of your cache drive and docker/appdata before starting any of this, just in case. Honestly, it is a good excuse to update your complete backup.
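
For reference, a one-off backup can be as simple as an rsync of appdata to a share on the array. This is only a rough sketch (the destination path is an example; ideally stop the docker service first so nothing is mid-write):

rsync -ah --progress /mnt/cache/appdata/ /mnt/user/backups/appdata_backup_$(date +%Y%m%d)/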

 

Ok, so as many have started to realize, unraid has had a serious issue with massively inflated writes to the cache SSD for the last few years, to the point that it has killed a number of SSDs in a very short amount of time and used up a hefty amount of the life of many other drives.

 

A lot of it was documented here:

but instead of making you read all that, I am going to give you the results of all the testing that went on in that thread.

 

My writes when starting this journey, with far fewer dockers than I have now, were around:

 

~200GB+/day IIRC (I forgot the exact numbers and lost my notes from that long ago, but it was a LOT)

 

The first step to reducing writes is to update to unraid 6.9+ and then temporarily move all the data off your cache SSDs to the array. You will then erase the cache pool using the built-in erase option and reformat it when you restart the array. This fixes the core unraid side of the excessive writes, as it fixes some partition and mounting issues with the filesystem.
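
If you want to sanity-check that the pool really was re-created with the new layout, something like this will print the partition table for the cache device (sdX is just a placeholder); the partition start sector should reflect the new 6.9+ alignment:

fdisk -l /dev/sdX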

 

After that move the data back to the cache from the array.

 

This dropped my writes to around ~75-85GB/day using a single BTRFS-formatted drive with a BTRFS docker image.

Formatted as XFS with a BTRFS image, my writes dropped to ~25GB/day, but then you can't have redundancy and XFS has its own issues.

 

As you can see, the excessive writes still persist after this, just to a lesser extent. The remaining writes depend on which dockers you are using and are an issue with docker itself.

 

 

 

 

 

------------------------------------------------------------------------------------------------------------------

Part 2: Docker logs causing write inflation

------------------------------------------------------------------------------------------------------------------

All the docker commands I will put below need to be entered into the

 

Extra Parameters:

 


 

section of the docker template in unraid (you will need to go to the advanced view in the top right corner)

 

To match up a long container ID with a container in the unraid GUI, simply use ctrl+F to search the docker page in unraid for the container ID you see in the activity logs. Generally the first 3 or 4 characters are enough to find the right container.
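
If you prefer the terminal, a quick way to do the same lookup (just a sketch; replace abc123 with the first few characters of the ID you saw in the logs):

docker ps --no-trunc --format '{{.ID}}  {{.Names}}' | grep abc123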


 

There are a few basic places writes come from with docker and each has its own fix.

 

------------------------------------------------------------------------------------------------------------------

 

The first step is to run the inotifywait command from mgutt:

 

This command will watch the internal docker image for writes and log them to /mnt/user/system/recentXXXXXX.txt

 

inotifywait -e create,modify,attrib,moved_from,moved_to --timefmt %c --format '%T %_e %w %f' -mr /var/lib/docker > /mnt/user/system/recent_modified_files_$(date +"%Y%m%d_%H%M%S").txt

 

An alternate and less effective method is to use this command to return the 100 most recently modified files in the docker image

 

find /mnt/user/system/docker -type f -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- | head -n100

 

I chose to make a userscript with the first command and then use the "run in background" option so I don't have to keep the terminal open.
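
For reference, the user script is nothing more than the same command dropped into a script body, roughly like this:

#!/bin/bash
#description=Logs writes inside the docker image with inotifywait (start with the "run in background" option)
inotifywait -e create,modify,attrib,moved_from,moved_to --timefmt %c --format '%T %_e %w %f' -mr /var/lib/docker > /mnt/user/system/recent_modified_files_$(date +"%Y%m%d_%H%M%S").txt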

 

To kill the inotifywait run this:

pkill -xc inotifywait

 

------------------------------------------------------------------------------------------------------------------

 

For me, the first and most common writes came from the internal logging mechanism in docker. It basically logs, among other things, all the messages that would show up in the terminal if the program were run directly instead of in docker.

 

These writes will be to:

/var/lib/docker/containers/{containerid}/{containerid}-json.log

 


 

These are stopped by the following command while leaving the unraid GUI logs intact:

--log-driver syslog --log-opt syslog-address=udp://127.0.0.1:541

 

------------------------------------------------------------------------------------------------------------------

 

The next type of writes are from:

/var/lib/docker/containers/{containerid}/container-cached.log

 


 

These are the logs you see when you click the log option in the unraid GUI; they require a stronger version of the above command:

--log-driver none

 

This disables both of the above types of logs.

------------------------------------------------------------------------------------------------------------------

 

Next up are the healthcheck logs; these show up as writes to these files:

/var/lib/docker/containers/{containerID}/.tmp-hostconfig.json{randomnumbers}
/var/lib/docker/containers/{containerID}/.tmp-config.v2.json{randomnumbers}

 


 

These are solved by either extending the health checks or disabling them. I prefer extending them to ~1 hour.

--health-interval=60m

 

They can be disabled completely with:

--no-healthcheck

 

------------------------------------------------------------------------------------------------------------------

 

The next type of writes are internal logs written by the program in the container to the container's /tmp directory:

/var/lib/docker/containers/{containerid}/tmp/some type of log file
or
/var/lib/docker/containers/{containerid}/var/tmp/some type of log file
or
/var/lib/docker/subvolumes/{Randomstring}/tmp/some type of log file

This last one is hard to figure out, as it can be difficult to connect the subvolume to a container; sometimes opening the log file in question can clue you into which docker it is for. This is a more advanced rabbit hole that was not really necessary to chase in my case.
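
If you do want to chase it, one rough way to connect those random storage IDs back to a container name is to ask docker itself. This is only a sketch, and depending on the storage driver (overlay2 vs btrfs) it may show the storage path or nothing useful:

# print each running container's name next to its storage driver data so a
# random subvolume/overlay ID from the logs can be matched to a container
for id in $(docker ps -q); do
  docker inspect --format '{{.Name}}  {{.GraphDriver.Name}}  {{json .GraphDriver.Data}}' "$id"
done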

 

These /tmp writes come from a program thinking it is writing to a ramdrive, but by default docker does not map a ramdrive to the /tmp directory. You can easily do it yourself though with the following command (it can be adapted to other directories and use cases as well).

 

This command creates a ramdrive at /tmp with full read/write permissions and a max size of 256MB (much larger than needed in most cases, but it only uses RAM as needed so it should not hurt anything; you can make it smaller as well):

--mount type=tmpfs,destination=/tmp,tmpfs-mode=1777,tmpfs-size=256000000

 

And that's pretty much it for properly created containers.

 

Applying these commands to the worst offending containers dropped my writes down to around ~40GB/day.

 

I left a few containers' logs intact as I have needed them a few times.

 

 

 

 

 

------------------------------------------------------------------------------------------------------------------

Part 3: Dealing with appdata writes

------------------------------------------------------------------------------------------------------------------

After this, things get a bit more complicated. Each container will behave differently and you will kinda have to wing it. I saw random writes to various files in containers; sometimes you can change the logging folder in the program to the /tmp folder and add a ramdisk to the container.

 

For others you can map another ramdrive to some other log folder, and still others can use workarounds unique to that specific program. It takes some know-how and digging to fix writes internally in the dockers.
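
As a made-up example, a container that insists on logging to /var/log inside the container could get the same tmpfs treatment from part 2 pointed at that path instead (path and size here are just placeholders, added to Extra Parameters like before):

--mount type=tmpfs,destination=/var/log,tmpfs-mode=1777,tmpfs-size=64000000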

 

The alternate and universal option (and required option in many cases) is to simply copy the appdata folder to a ramdisk on unraid and sync it back to the SSD hourly. This works with any docker and vastly reduces writes from logs / constant database access.

 

Like above, first you need to log the appdata folder to see where the writes come from:

 

 This command will watch the appdata folder for writes and log them to /mnt/user/system/appdata_recentXXXXXX.txt

 

inotifywait -e create,modify,attrib,moved_from,moved_to --timefmt %c --format '%T %_e %w %f' -mr /mnt/user/appdata/*[!ramdisk] > /mnt/user/system/appdata_recent_modified_files_$(date +"%Y%m%d_%H%M%S").txt

 

From here it will take some detective work to find the misbehaving containers and see what and where they are writing to. In my case all the *arr's (sonarr etc) were causing a lot of writes and there was nothing that could be done internally to fix it.

 

After figuring out which appdata needs to move to the ramdisk, the next step is to create the ramdisk itself and then copy the appdata into it from the SSD.
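
To get an idea of how big the ramdisk needs to be, it helps to check the size of the candidate folders first (folder names here are just examples, use the ones you plan to move):

du -sh /mnt/cache/appdata/binhex-qbittorrentvpn /mnt/cache/appdata/*arr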

 

First create a folder in /mnt/cache/appdata/. It is very important to create the folder on the drive itself and NOT in /user.

 

mkdir /mnt/cache/appdata/appramdisk
chmod 777 /mnt/cache/appdata/appramdisk

 

After this I use a very basic user script that is set to "run at array start". Adjust the max size of the disk to suit your use case; it only uses RAM as needed, so there is not a lot of harm in making it too big as long as it leaves enough room for everything else to run.

 

Naturally, you will need to customize the rsync commands with the folders you want to copy.

 

#!/bin/bash
#description=This script runs at the start of the array creating the appdata ramdisk and rsyncing the data into it

echo ---------------------------------Create ramdisk for appdata----------------------------------
mount -vt tmpfs -o size=8G appramdisk /mnt/cache/appdata/appramdisk


echo ---------------------------------rsync to ramdisk in appdata----------------------------------
rsync -ah --stats --delete /mnt/user/appdata/binhex-qbittorrentvpn /mnt/user/appdata/appramdisk
rsync -ah --stats --delete /mnt/user/appdata/binhex-nzbhydra2 /mnt/user/appdata/appramdisk
rsync -ah --stats --delete /mnt/user/appdata/*arr /mnt/user/appdata/appramdisk

 

I then have a separate script set to run hourly that rsyncs everything in the ramdisk back to the SSD; it only copies data that has changed, which saves writes:

 

#!/bin/bash
#description=This script syncs the ramdisk appdata back to the ssd

rsync -ahv --progress --delete /mnt/user/appdata/appramdisk/* /mnt/user/appdata/

 

You will also need to apply a delay to the first docker container that is set to autostart in the unraid GUI (enable the advanced view, right side of the container). Preferably put a container that is not being run out of the ramdisk first and put the delay on it, as the delay takes effect after the selected container has started.

 

The delay needs to be long enough for the ramdisk rsync to complete.

 


 

UPDATE THE DOCKER APPDATA FOLDER TO USE THE NEW "appramdisk" copy of the appdata or it will just keep writing to the cache.
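
As an illustration (the container and its /config mapping are just an example), a template that used to point at the normal appdata path would now point at the ramdisk copy:

# before
/config  ->  /mnt/user/appdata/binhex-qbittorrentvpn
# after
/config  ->  /mnt/user/appdata/appramdisk/binhex-qbittorrentvpn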

 

Now for a clean shutdown, I created a "stop" file on the USB drive at /boot/config. It is called first thing when you click shutdown/reboot in the GUI and the rest of the shutdown will wait until it is finished.

 

touch /boot/config/stop

In the stop file I decided to simply redirect it to a script in user scripts called "Run at Shutdown" to make it easier to manage.

 

#!/bin/bash

#Runs the user script "Run at Shutdown" during shutdown or reboot.
#it is called before anything else during the shutdown process

# Invoke 'Run at Shutdown' script if present
if [ -f /boot/config/plugins/user.scripts/scripts/Run\ at\ Shutdown/script ]; then
  echo "Preparing Run at Shutdown script"
  cp /boot/config/plugins/user.scripts/scripts/Run\ at\ Shutdown/script /var/tmp/shutdown
  chmod +x /var/tmp/shutdown
  logger Starting Run at Shutdown script
  /var/tmp/shutdown
fi

 

The run at shutdown script itself first stops all running docker containers so they can close out open files. It then rsyncs the appramdisk back to the SSD before clearing the ramdisk and unmounting it.

 

#!/bin/bash
#description=This script runs first thing at shutdown or reboot and handles rsyncing appramdisk and unmounting it.

logger Stopping Dockers
docker stop $(docker ps -q)
logger Dockers stopped

logger Started appramdisk rsync
rsync -ah --stats --delete /mnt/user/appdata/appramdisk/* /mnt/user/appdata/ | logger
logger rsync finished

logger clearing appramdisk data
rm -r /mnt/user/appdata/appramdisk/* | logger

logger unmounting appramdisk
umount -v appramdisk | logger

 

And that's it. It seems to be working well, with no hang-ups when rebooting, and everything happens automatically.

 

Risks are minimal for these containers, as worst case I lose an hour's worth of data from sonarr, big deal. I would not use this on a container with data you can't afford to lose an hour's worth of.

 

The writes are finally low enough that I would be OK putting appdata and docker back onto my main SSDs with redundancy, instead of the single piece-of-junk drive I am using now (which has gone from ~98% life to 69% in the last year doing nothing but handling docker on unraid).

 

I am really impressed with how well this is working.

 

So to recap:

 

unraid 6.8 > BTRFS image > BTRFS formatted cache =

~200GB+/day

 

unraid 6.9 > BTRFS image > separate unprotected XFS SSD everything stock =

~25GB/day

 

unraid 6.9 > BTRFS image > BTRFS SSD everything stock =

75-85GB/day

 

unraid 6.9 > Docker Folder > BTRFS SSD everything stock =

~60GB/day

 

unraid 6.9 > BTRFS image > BTRFS SSD > Disabled the low hanging fruit docker json logs =

~48GB/day

 

unraid 6.9 > BTRFS image > BTRFS SSD > Disabled all misbehaving docker json logs for running containers except those I want to see + added ramdrives to /tmp in containers that do internal logging =

~30GB/day

 

unraid 6.9 > BTRFS image > BTRFS SSD > Disabled all misbehaving docker json logs for running containers except those I want to see + added ramdrives to /tmp in containers that do internal logging + moved appdata for the *arr's and qbittorrent to a ramdisk with hourly rsyncs to the ssd appdata =

~10-12GB/day

 

Since most of the writes are now large sequential writes from the rsync, there is very little write amplification, which vastly improves the total writes for the day even though possibly more raw data is being handled.
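
To put rough, purely illustrative numbers on it: a container that fsyncs a 4KB log append every 5 seconds only produces about 70MB of real data per day (17,280 writes x 4KB), but if each of those tiny commits costs the filesystem and SSD on the order of 1MB in metadata and NAND rewrites, that is around 17GB/day at the drive. One big hourly rsync of the same data is a handful of mostly sequential writes, so that overhead largely disappears.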

 

I don't use plex, but it and database dockers are known for being FAR worse in writes than what I run. People were regularly seeing hundreds of GB in writes a day from these alone. They could be vastly improved with the above commands.


These steps would lend themselves to a plugin very well and easily IMHO.

 

The commands are quite simple and a plugin would just need to:

 

Set up the logging and let the user figure out the problem containers

Let the user select the containers to tweak

Have check boxes to add each of the above docker fixes to the docker templates

Set up the ramdisk for appdata and calculate the size required so the user knows what to expect

Handle the rsyncing of the appdata to and from the ramdisk at a user-selected interval

Unmount the ramdisk at shutdown

 

I have no clue how the plugin system works or even where to start, but someone who knows what they are doing should be able to figure it out without a lot of hassle, I would think.


Thanks for all your hard work here TexasUnraid.

I am quite shocked to see that you can bring it lower than an XFS cache. I just converted my desktop, which has a cache pool, over to unraid this weekend and immediately jumped over to check if this was fixed. I will dive down the rabbit hole maybe in the coming evenings and see if I can replicate what you have done here.

 

Couple of questions: I am using InfluxDB and telegraf to monitor writes; do you mind sharing the monitoring software and process you were using to measure this as well?


Never heard of InfluxDB; it looks like just a database, how does it collect writes? Sounds interesting. Got a link to what you are using?

 

Me, I am just monitoring the raw LBA writes from the SSD SMART data. This is the only way to see the actual writes to the SSD; every other method will not correctly account for things like write amplification.

 

The exact command varies depending on how the drive counts LBAs, but here is the script I use. Or just enter the data into this site and play with the numbers until the writes look right lol.

 

https://www.virten.net/2016/12/ssd-total-bytes-written-calculator/

 

As you can see, the Samsung, Liteon and Sandisk drives all use different calculations to convert LBAs to MB:

#!/bin/bash
#description=Basic script to display the amount of data written to SSD on drives that support this. Set "argumentDefault" to the drive you want if you will schedule this.
#argumentDescription= Set drive you want to see here
#argumentDefault=sdm


#path to save logs
DIR=/mnt/user/Backup/Smart\ reports/SSD\ TBW\ Logs

### Argument selected drive, replace argument above with label of drive you want TBW calculated for  ###
DRIVE1="$1"
DRIVE1NAME=Argument_SSD

#hard coded Samsung compatible LBA reporting drives
DRIVE2=disk/by-id/ata-Samsung_SSD_860_EVO_1TB_
DRIVE2NAME=Samsung-_1TB-860_EVO_1
DRIVE3=disk/by-id/ata-Samsung_SSD_860_EVO_1TB_
DRIVE3NAME=Samsung-_1TB_860_EVO_2

#Sandisk compatible LBA reporting drives
DRIVE10=disk/by-id/ata-SanDisk_
DRIVE10NAME=Sandisk_256GB_ssd

#liteon compatible LBA reporting drives
DRIVE11=disk/by-id/ata-LITEONIT_LCT-
DRIVE11NAME=Liteon_128GB_SSD


echo ---------------- First drive you want to log from argument "$1" ----------------
sudo smartctl -A /dev/"$1" |awk '
$0 ~ /Power_On_Hours/ { poh=$10; printf "%s / %d hours / %d days / %.2f years\n",  $2, $10, $10 / 24, $10 / 24 / 365.25 }
$0 ~ /Total_LBAs_Written/ {
   lbas=$10;
   bytes=$10 * 512;
   mb= bytes / 1024^2;
   gb= bytes / 1024^3;
   tb= bytes / 1024^4;
   #printf "%s / %s  / %d mb / %.1f gb / %.3f tb\n", $2, $10, mb, gb, tb
     printf "%s / %.2f gb / %.2f tb\n", $2, gb, tb
   printf "mean writes per hour:  / %.3f gb / %.3f tb",  gb/poh, tb/poh
}
$0 ~ /Wear_Leveling_Count/ { printf "%s / %d (%% health)\n", $2, int($4) }
' |
   sed -e 's:/:@:' |
   sed -e "s\$^\$/dev/"$1" @ \$" |
   column -ts@


# Get the TBW of this drive and log it
TBWSDB_TB=$(/usr/sbin/smartctl -A /dev/"$1" | awk '$0~/LBAs_Written/{ printf "%.1f\n", $10 * 512 / 1024^4 }')
TBWSDB_GB=$(/usr/sbin/smartctl -A /dev/"$1" | awk '$0~/LBAs_Written/{ printf "%.1f\n", $10 * 512 / 1024^3 }')
TBWSDB_MB=$(/usr/sbin/smartctl -A /dev/"$1" | awk '$0~/LBAs_Written/{ printf "%.1f\n", $10 * 512 / 1024^2 }')

echo "TBW on $DRIVE1 $(date +"%d-%m-%Y %H:%M:%S") -->> $TBWSDB_TB TB, which is $TBWSDB_GB GB, which is $TBWSDB_MB MB." >> "$DIR/TBW_$DRIVE1NAME.log"


echo
echo ---------------- Samsung Compatible LBA $DRIVE2 ----------------
sudo smartctl -A /dev/$DRIVE2 |awk '
$0 ~ /Power_On_Hours/ { poh=$10; printf "%s / %d hours / %d days / %.2f years\n",  $2, $10, $10 / 24, $10 / 24 / 365.25 }
$0 ~ /Total_LBAs_Written/ {
   lbas=$10;
   bytes=$10 * 512;
   mb= bytes / 1024^2;
   gb= bytes / 1024^3;
   tb= bytes / 1024^4;
   #printf "%s / %s  / %d mb / %.1f gb / %.3f tb\n", $2, $10, mb, gb, tb
     printf "%s / %.2f gb / %.2f tb\n", $2, gb, tb
   printf "mean writes per hour:  / %.3f gb / %.3f tb",  gb/poh, tb/poh
}
$0 ~ /Wear_Leveling_Count/ { printf "%s / %d (%% health)\n", $2, int($4) }
' |
   sed -e 's:/:@:' |
   sed -e "s\$^\$/dev/$DRIVE2 @ \$" |
   column -ts@


# Get the TBW of this drive and log it
TBWSDB_TB=$(/usr/sbin/smartctl -A /dev/$DRIVE2 | awk '$0~/LBAs_Written/{ printf "%.1f\n", $10 * 512 / 1024^4 }')
TBWSDB_GB=$(/usr/sbin/smartctl -A /dev/$DRIVE2 | awk '$0~/LBAs_Written/{ printf "%.1f\n", $10 * 512 / 1024^3 }')
TBWSDB_MB=$(/usr/sbin/smartctl -A /dev/$DRIVE2 | awk '$0~/LBAs_Written/{ printf "%.1f\n", $10 * 512 / 1024^2 }')

echo "TBW on $DRIVE2 $(date +"%d-%m-%Y %H:%M:%S") -->> $TBWSDB_TB TB, which is $TBWSDB_GB GB, which is $TBWSDB_MB MB." >> "$DIR/TBW_$DRIVE2NAME.log"


echo
echo ---------------- Samsung Compatible LBA $DRIVE3 ----------------
sudo smartctl -A /dev/$DRIVE3 |awk '
$0 ~ /Power_On_Hours/ { poh=$10; printf "%s / %d hours / %d days / %.2f years\n",  $2, $10, $10 / 24, $10 / 24 / 365.25 }
$0 ~ /Total_LBAs_Written/ {
   lbas=$10;
   bytes=$10 * 512;
   mb= bytes / 1024^2;
   gb= bytes / 1024^3;
   tb= bytes / 1024^4;
   #printf "%s / %s  / %d mb / %.1f gb / %.3f tb\n", $2, $10, mb, gb, tb
     printf "%s / %.2f gb / %.2f tb\n", $2, gb, tb
   printf "mean writes per hour:  / %.3f gb / %.3f tb",  gb/poh, tb/poh
}
$0 ~ /Wear_Leveling_Count/ { printf "%s / %d (%% health)\n", $2, int($4) }
' |
   sed -e 's:/:@:' |
   sed -e "s\$^\$/dev/$DRIVE3 @ \$" |
   column -ts@


# Get the TBW of this drive and log it
TBWSDB_TB=$(/usr/sbin/smartctl -A /dev/$DRIVE3 | awk '$0~/LBAs_Written/{ printf "%.1f\n", $10 * 512 / 1024^4 }')
TBWSDB_GB=$(/usr/sbin/smartctl -A /dev/$DRIVE3 | awk '$0~/LBAs_Written/{ printf "%.1f\n", $10 * 512 / 1024^3 }')
TBWSDB_MB=$(/usr/sbin/smartctl -A /dev/$DRIVE3 | awk '$0~/LBAs_Written/{ printf "%.1f\n", $10 * 512 / 1024^2 }')

echo "TBW on $DRIVE3 $(date +"%d-%m-%Y %H:%M:%S") -->> $TBWSDB_TB TB, which is $TBWSDB_GB GB, which is $TBWSDB_MB MB." >> "$DIR/TBW_$DRIVE3NAME.log"


echo
echo ---------------- Sandisk LBA drive you want to log $DRIVE10 ----------------
#this uses a different LBA to byte calculation for sandisk SSD drives

sudo smartctl -A /dev/$DRIVE10 |awk '
$0 ~ /Power_On_Hours/ { poh=$10; printf "%s / %d hours / %d days / %.2f years\n",  $2, $10, $10 / 24, $10 / 24 / 365.25 }
$0 ~ /Total_LBAs_Written/ {
   lbas=$10;
   bytes=$10;
   mb= bytes / 64 * 1000;
   gb= bytes / 64 ;
   tb= bytes / 64 / 1000;
   #printf "%s / %s  / %d mb / %.1f gb / %.3f tb\n", $2, $10, mb, gb, tb
     printf "%s / %.2f gb / %.2f tb\n", $2, gb, tb
   printf "mean writes per hour:  / %.3f gb / %.3f tb",  gb/poh, tb/poh
}
$0 ~ /Wear_Leveling_Count/ { printf "%s / %d (%% health)\n", $2, int($4) }
' |
   sed -e 's:/:@:' |
   sed -e "s\$^\$/dev/$DRIVE10 @ \$" |
   column -ts@


# Get the TBW of this drive and log it
TBWSDB_TB=$(/usr/sbin/smartctl -A /dev/$DRIVE10 | awk '$0~/LBAs_Written/{ printf "%.1f\n", $10 / 64 / 1000}')
TBWSDB_GB=$(/usr/sbin/smartctl -A /dev/$DRIVE10 | awk '$0~/LBAs_Written/{ printf "%.1f\n", $10 / 64 }')
TBWSDB_MB=$(/usr/sbin/smartctl -A /dev/$DRIVE10 | awk '$0~/LBAs_Written/{ printf "%.1f\n", $10 / 64 * 1000 }')

echo "TBW on $DRIVE10 $(date +"%d-%m-%Y %H:%M:%S") -->> $TBWSDB_TB TB, which is $TBWSDB_GB GB, which is $TBWSDB_MB MB." >> "$DIR/TBW_$DRIVE10NAME.log"


echo
echo ---------------- Liteon LBA drive you want to log $DRIVE11 ----------------
#this uses a different LBA to byte calculation for liteon drives

sudo smartctl -A /dev/$DRIVE11 |awk '
$0 ~ /Power_On_Hours/ { poh=$10; printf "%s / %d hours / %d days / %.2f years\n",  $2, $10, $10 / 24, $10 / 24 / 365.25 }
$0 ~ /Total_LBAs_Written/ {
   lbas=$10;
   bytes=$10;
   mb= bytes / 32 * 1000;
   gb= bytes / 32 ;
   tb= bytes / 32 / 1000;
   #printf "%s / %s  / %d mb / %.1f gb / %.3f tb\n", $2, $10, mb, gb, tb
     printf "%s / %.2f gb / %.2f tb\n", $2, gb, tb
   printf "mean writes per hour:  / %.3f gb / %.3f tb",  gb/poh, tb/poh
}
$0 ~ /Wear_Leveling_Count/ { printf "%s / %d (%% health)\n", $2, int($4) }
' |
   sed -e 's:/:@:' |
   sed -e "s\$^\$/dev/$DRIVE11 @ \$" |
   column -ts@


# Get the TBW of this drive and log it
TBWSDB_TB=$(/usr/sbin/smartctl -A /dev/$DRIVE11 | awk '$0~/LBAs_Written/{ printf "%.1f\n", $10 / 32 / 1000}')
TBWSDB_GB=$(/usr/sbin/smartctl -A /dev/$DRIVE11 | awk '$0~/LBAs_Written/{ printf "%.1f\n", $10 / 32 }')
TBWSDB_MB=$(/usr/sbin/smartctl -A /dev/$DRIVE11 | awk '$0~/LBAs_Written/{ printf "%.1f\n", $10 / 32 * 1000 }')

echo "TBW on $DRIVE11 $(date +"%d-%m-%Y %H:%M:%S") -->> $TBWSDB_TB TB, which is $TBWSDB_GB GB, which is $TBWSDB_MB MB." >> "$DIR/TBW_$DRIVE11NAME.log"

 


Hey, what great work. Thanks for this.

I have some questions about this topic.
I just created my Unraid server, so I'm a noob :D. So please be patient with me and maybe some of my questions :D

 

So in the last few days I saw a lot of random writes to my cache. I think the docker app pi-hole does a lot of it, but I'm not 100% sure. I just disabled it and it seems to be a bit less.

So my first question is, how did you see the exact GB written to your SSD? I'm only able to see the total writes in the main section in unraid.
And also, how can I see what format my cache has, and how can I change it to the better one? Maybe there is a step-by-step guide?

Unraid is so feature rich, so I need to start slow :D

 

Thanks again for the post here and the help for the community


Two posts up shows the script I use to track SSD writes, but if you are new to this it is a bit advanced.

 

You can simply click the SSD in the main unraid page and look for "total LBA written" in the smart report.

 

Enter that number into this site: https://www.virten.net/2016/12/ssd-total-bytes-written-calculator/

 

You might have to play around with the multiplier to get the correct conversion to GB depending on the drive brand. This is the only way to see the actual writes to the drive.
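
If you just want a quick number without the full script, something like this should work for drives that report 512-byte LBAs (sdX is a placeholder; other brands need a different multiplier as mentioned above):

smartctl -A /dev/sdX | awk '/Total_LBAs_Written/ {printf "%.1f GB written\n", $10 * 512 / 1024^3}'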

 

The format is also listed on the main unraid page under "FS" (aka file system).

 

To change the format of the drive there should be a video on https://www.youtube.com/c/SpaceinvaderOne youtube.


Excellent and detailed guide, thank you very much! I pretty much completed the first two steps, but did not want to gamble with the third step since I do have sensitive data in the rest of the containers. Not that the 3rd step is hard to follow, but as you mentioned, you don't want to implement these steps with sensitive information like databases, Nextcloud, etc. My basic calculations show that my two NVMe Adata drives had on average 180-192GB of random writes daily starting from 2018-2019. Glad I randomly decided to read this topic when it was delivered to my email box lol. Nonetheless, always make sure to have a backup. I made some mistakes and experienced some problems with my appdata folder and the "mover" function, which made a few important dockers unusable. Restored the dockers from the backup, boom - ready to Rock & Roll!

Again, I appreciate the effort and all the time taken to write this guide! You are awesome!

On 7/3/2021 at 5:03 PM, TexasUnraid said:
/var/lib/docker/containers/{containerid}/{containerid}-json.log

 

 

I found an alternative solution:

https://forums.unraid.net/bug-reports/stable-releases/683-unnecessary-overwriting-of-json-files-in-dockerimg-every-5-seconds-r1079/?tab=comments#comment-15472

 

With that, "--no-healthcheck" isn't needed anymore.

 

On 7/3/2021 at 5:03 PM, TexasUnraid said:

This is from a program thinking it is writing to a ramdrive but by default docker does not map a ramdrive to the /tmp directory. You can do it yourself easily though with the following command (can be adapted to other dirs and use cases as well).

 

This command creates a ramdrive in /tmp with full read/write permissions and a max size of 256mb (much larger then needed in most cases but it only uses ram as needed so should not hurt anything in most cases, you can make it smaller as well):

--mount type=tmpfs,destination=/tmp,tmpfs-mode=1777,tmpfs-size=256000000

 

You forgot to explain that this needs to be added to the "extra parameters" of a container. The "tmpfs-mode" can be removed, as 1777 is the default.

 

PS I'm using the same trick for Plex and changed the transcode path to /tmp:

[screenshot]

 

Another idea would be to replace all these paths with tmpfs mounts:

find /var/lib/docker/overlay2/*/merged/tmp -type d -maxdepth 0

 

But I'm not sure about this, as docker changes the IDs every time the container is re-created. Needs testing.


First part

45 minutes ago, mgutt said:

 

Interesting idea, I had not even considered that. I like it and think I will switch my setup over to that; it is just a few extra commands to add to my existing sync script.

 

I will add this to the OP as well.

 

1 hour ago, mgutt said:

You missed to explain, that it is needed to add this to "extra parameters" of a container. The "tmpfs-mode" can be removed as 1777 is the default.

 

I said at the start of the section that all commands need to be entered into the extra parameters; maybe that was not clear enough?

1 hour ago, mgutt said:

Another idea would be to replace all these paths by tmpfs mounts:

find /var/lib/docker/overlay2/*/merged/tmp -type d -maxdepth 0

 

But I'm not sure about this as docker changes the ids everytime the container is re-created. Needs testing.

 

Another interesting idea, but I am also not sure of the ramifications.

1 minute ago, TexasUnraid said:

Interesting idea, I had not even considered that. I like it and think I will switch my setup over to that, it is just a few extra commands to add to my existing sync script.

Sadly it does not work. Something is really strange between /var/lib/docker and /mnt/user/system/docker. I tried several ram disk paths, but it seems to write to an invisible path.

Just now, mgutt said:

Sadly it does not work. Something is really strange between /var/lib/docker and /mnt/user/system/docker. I tried several ram disk paths, but it seems to write to an invisible path.

 

Yeah, I had this issue when dealing with the inotifywait command.

 

Have you tried working directly in /var/lib/docker? This is how I got inotifywait to work, and better yet it worked fine with both folder and image docker setups.


Ok, I found the problem. Although I created the ram disk under /mnt/cache/system/docker:

df
Filesystem        1K-blocks         Used   Available Use% Mounted on
tmpfs              32850360          176    32850184   1% /mnt/cache/system/docker/docker/containers

 

It is not part of /var/lib/docker (which is a mount of /mnt/cache/system/docker/docker)

ls -la /mnt/cache/system/docker/docker/containers
total 0
drwx-----x  5 root root 100 Aug 17 17:08 ./
drwx--x--x 16 root root 271 Aug 17 17:49 ../
drwx-----x  4 root root 220 Aug 17 17:39 48d6ab4b4ef6c6a480f59a8b8f7152be3d9e7a2b3c3f1e95a87225e1a25c8cfd/
drwx-----x  4 root root 220 Aug 17 17:41 7d92ab7aad8a794e78635a346b700585a30af1bea35946ed6a5146a1799bfd27/
drwx-----x  4 root root 200 Aug 17 17:41 82f6d43a65b699d1c988349274720cd59b230866159d638c5278c701ee66e3b1/
ls -la /var/lib/docker/containers
total 0
drwx-----x  2 root root   6 Aug 17 17:47 ./
drwx--x--x 16 root root 271 Aug 17 17:49 ../

 

Not sure why this happens, but I can't add a ram disk inside of /var/lib/docker, because by the time this path contains files the docker service is already running, and remounting then would break the containers. I even tried to create a tmpfs mount on /var/lib/docker/containers before starting the docker service, but starting the docker service creates the mount /var/lib/docker > /mnt/cache/system/docker/docker, which overwrites the tmpfs mount 🙈


Sad, it is a good idea and would solve a lot of issues.

 

Good enough that it should be done in the official unraid docker mount command IMHO. It would not be hard at all to sort it out and the timing of the syncs could be adjusted in the docker settings menu easily.


@TexasUnraid

Ok finally solved it:

https://forums.unraid.net/bug-reports/stable-releases/683-unnecessary-overwriting-of-json-files-in-dockerimg-every-5-seconds-r1079/?tab=comments#comment-15472

 

With that, the RAM-Disk is created every time the docker service starts and removed when it is stopped.

 

I tested it only with the docker "path" option, but as the ram disk is created on "/var/lib/docker/containers" it should work for the docker.img as well.

 

My Plex container now runs without the "--no-healthcheck" option and I have no write activity on my SSD 😀 In addition, all container logs are in the RAM-Disk as well, so I killed two birds with one stone 🥳


A very involved fix, but it should work. Although I think I will stick with the setup I have now; making permanent changes that will not survive updates is where I tend to draw the line, unless there are no other options.

 

Still, it is very good to have a complete option to deal with the writes; I will keep it in mind if I have to keep dealing with them.

 

Seems like something that could be added into unraid in general, considering the writes it saves and how easy it would be to implement.

1 hour ago, JonathanM said:

I've not followed this very closely, I have a couple questions.

 

1. How much RAM would this use, worst case?

2. If the power is pulled mid write, will the docker subsystem be ok with it on next boot?

 

1: I did a du of the folder that is being converted to a ramdrive and even with my 70 dockers I only had 60MB in it. I figure it would grow some as the logs grew, but I still don't see it growing past ~256MB even with all my dockers, unless there is a misbehaving docker with far too many log writes or VERY long uptimes. Edit: come to think of it, the docker log rotation setting should limit the max size it can grow to.

 

2: This folder seems to only hold log files, so docker should not care if a write was missed as long as the folders exist (it might even re-create them if they are missing, not sure). During testing I deleted a few of the log files and don't remember there being any issues; it simply started a new one.

 

That said, an hourly rsync to the backup folder would not be a horrible idea; it could be an adjustable setting in the docker settings as well if this was combined into the official docker implementation. This basically cut my writes in half.

 

Worst possible case, the docker can simply be re-installed / updated and it would correct any issues.


@TexasUnraid

I rewrote the enhancement again. Now you only need to change the Go file:

https://forums.unraid.net/bug-reports/stable-releases/683-unnecessary-overwriting-of-json-files-in-dockerimg-every-5-seconds-r1079/?tab=comments#comment-15472

 

And it even includes an auto backup feature (more in #2 of this post).

 

On 8/18/2021 at 3:24 PM, JonathanM said:

1. How much RAM would this use, worst case?

2. If the power is pulled mid write, will the docker subsystem be ok with it on next boot?

 

1.) After a fresh reboot the RAM-Disk uses 328KB:

[screenshot]

EDIT: After 60 days it uses 500MB (11 containers):

[screenshot]

 

It all depends on the container logs. But they can be limited through the docker settings:

[screenshot]

 

With that, 10 containers each having a full log file (which is rather unlikely) would use 500MB of RAM.

 

2.) First you should know: /var/lib/docker/containers contains "useless" data like the container logs, healthcheck state and network settings. Each folder will be deleted and re-created if you change something in the container's settings. They are only re-used when starting/stopping a container or rebooting the server. If they are missing, the containers won't load anymore, but you can easily re-install all containers through Apps > Previous Apps without data loss. I tested this by simply deleting the folders:

rm -r /var/lib/docker/containers/*

 

But: to avoid the loss of these folders on a server crash or power loss, they are automatically synced to disk every time you open the docker tab. This creates a syslog entry:

[screenshot]

 

But what happens if you don't access the docker tab and the server crashes? Then the container log entries added after the last backup are lost. So if your container logs are super important to you, don't use this enhancement (or add a cron job).


Very interesting idea. I guess it still counts on unraid not changing the docker file too much, but interesting for sure. I will have to think about whether I want to implement it. Leaning towards yes.

 

Personally I would just set up a script to sync the ramdisk hourly. I do that with my ramdisk now and still get 1/8th the writes I used to. It could even sync to the array as well, I suppose, if you have a drive you don't spin down.

On 8/19/2021 at 2:48 PM, TexasUnraid said:

Personally I would just setup a script to sync the ramdisk hourly.

Of course that's possible, but ultimately I don't see the benefit. I mean, when do we need the container log files? There is only one scenario: multiple server crashes where we would like to investigate why they happen. In that case we would do the following steps:

- Remove/disable the docker RAM-Disk enhancement to check the very last log entries

- Enable Mirror syslog to flash to check the very last log entries

- Connect a monitor to our server / check output through IPMI to check the direct output of errors

 

If you instead only create a backup of the RAM-Disk / container logs every hour, they will be too old to contain relevant information.

 


@TexasUnraid

With this command, it's now easier to check which path belongs to which container:

 

csv="CONTAINER;PATHS\n"; for f in /var/lib/docker/image/*/layerdb/mounts/*/mount-id; do subid=$(cat $f); idlong=$(dirname $f | xargs basename); id="$(echo $idlong | cut -c 1-12)"; name=$(docker ps --format "{{.Names}}" -f "id=$id"); [[ -z $name ]] && continue; csv+="\n"$(printf '=%.0s' {1..20})";"$(printf '=%.0s' {1..100})"\n"; [[ -n $name ]] && csv+="$name;" csv+="/var/lib/docker/(btrfs|overlay2).../$subid\n"; csv+="$id;"; csv+="/var/lib/docker/containers/$idlong\n"; for vol in $(docker inspect -f '{{ range .Mounts }}{{ if eq .Type "volume" }}{{ .Destination }}{{ printf ";" }}{{ .Source }}{{ end }}{{ end }}' $id); do csv+="$vol\n"; done; done; echo ""; echo -e $csv | column -t -s';'; echo "";

 

Sample output:

[screenshot]

 

And I optimized the "find" command as well:

find /var/lib/docker -type f -not -path "*/diff*" -print0 | xargs -0 stat --format '%Y:%.19y %n' | sort -nr | cut -d: -f2- 2> /dev/null | head -n30 | sed -e 's|/merged|/...|; s|^[0-9-]* ||'

 

Sample output:

[screenshot]

 

PS Instead of this:

--mount type=tmpfs,destination=/tmp,tmpfs-mode=1777,tmpfs-size=256000000

 

I prefer creating a new path and linking it to Unraid's /tmp, which is already a RAM-Disk:

[screenshot]

 

Another example:

[screenshot]

 

The tmp size is already limited to 50% of the RAM, and in any case no application should write that much temporary data. My total size in Unraid is 57MB:

du -sh /tmp
57M     /tmp

 

 

