[Guide] How to keep cache drive full of media



I want to keep my cache drive full-ish of new media, so that:

 

* Newly downloaded torrents seed from NVME rather than disk

* Users watching media have better performance (and more recent downloads are more likely to be watched)

* My disks stay spun down more, for power consumption and noise

* In-flight torrent/usenet downloads that are in progress when the mover is triggered do not get moved mid-download

 

* When newly downloaded media would make the cache too full, have the mover move off the oldest files to stay under the threshold (i.e. keep the newest files on cache, up to the threshold)

 

The advantage of my script over what you can do with Mover Tuning natively: the plugin's age and size settings will delay moves, but once those triggers are reached, it moves everything that matches. This could completely empty your cache, or move more than it needed to.

For example, let's say you start with an empty cache, set to move files older than 90 days once the cache is 50% full. You then download enough content, all on the same day, to reach 49% of your cache space. Nothing happens. 90 days go by. Nothing happens. You download 2% more content. All 49% of the 90-day-old content gets moved at once, leaving you at 2% cache usage.

My script will only move the oldest content, and only enough of it to drop your usage below the desired threshold. What's oldest could be from yesterday, or from a decade ago.
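The keep-or-move decision can be sketched as a toy loop (fake data and a tiny byte budget for illustration only; the real script further below also dedupes hardlinks): sort records newest first, accumulate sizes, and stop keeping files once the budget is exhausted.

```shell
#!/bin/bash
# Toy sketch: newest-first records of "ctime size name",
# kept on cache until a byte budget is exhausted.
budget=300
total=0
printf '%s\n' "30 100 new.mkv" "20 150 mid.mkv" "10 200 old.mkv" |
while read -r ctime size name; do
    if (( total + size <= budget )); then
        total=$(( total + size ))
        echo "keep $name"   # still fits under the budget
    else
        echo "move $name"   # over budget: mover may take it
    fi
done
```

With the sample data above, the two newest files (100 + 150 bytes) fit under the 300-byte budget and the oldest does not, so it prints `keep new.mkv`, `keep mid.mkv`, `move old.mkv`.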

 

WARNING: ONLY USE THIS SCRIPT IF YOU HAVE A MIRRORED CACHE, OR REALLY DON'T CARE IF YOU LOSE ITEMS HELD IN CACHE. (Without a mirrored cache, items kept in cache are not protected from drive failure.)

 

The Mover Tuning plugin supports two features we can use to make this happen:

1) Run a script before move

2) Don't move any files which are listed in a given text file. 

 

So, all we have to do is write a script that puts the newest files into a text file.

 

Setup

Copy the script into your appdata share or somewhere similar (preferably someplace that won't get moved by the mover). You may need to chmod the file to make it executable. I recommend appdata, because if you put it in your data share it may either get moved by the mover, or reading the file will spin up a drive when there is nothing to actually move.
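For example (the path here is just an assumption; use wherever you actually saved the script):

```shell
#!/bin/bash
# Make the script executable after copying it into appdata
# (example path -- adjust to your own location)
chmod +x /mnt/user/appdata/moverignore.sh
```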

 

Modify the variables at the top of the file as needed.

 

Add the script to Mover Tuning to run before moves.

[screenshot: the "Script to run before mover (No checks, always runs):" setting in Mover Tuning]

 

Set Mover Tuning to ignore files listed in the output file:

[screenshot: the "Ignore files listed inside of a text file:" setting in Mover Tuning]

 

 

If you run the script by hand (bash moverignore.sh), you can see the files it will keep on the cache.

You can then run the following command to verify that the mover will leave the listed files alone (no file from the ignore list should appear in this command's output):

find "/mnt/cache/data" -depth | grep -vFf '/mnt/user/appdata/moverignore.txt'
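As a complementary sanity check (a sketch, using the same assumed paths as above), you can also count how many candidate files appear in the ignore list; that count should match the number of entries you expect to keep:

```shell
#!/bin/bash
# Count candidate files that are also present in the ignore list.
# grep -c exits non-zero when the count is 0, hence the || true.
find "/mnt/cache/data" -depth | grep -cFf '/mnt/user/appdata/moverignore.txt' || true
```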

 

You can also run the mover from the command line and verify that it does not move any of the listed files (but continues to move unlisted files).

 

Additionally, if you want to manually move a directory off the cache, you can run the following command:

 

find /mnt/cache/DirectoryNameGoesHere* -type f | /usr/local/sbin/move&

Edited by Terebi
clarify advantage
#!/bin/bash

# Define variables
TARGET_DIR="/mnt/cache/data"
OUTPUT_DIR="/mnt/user/appdata"
OUTPUT_FILE="$OUTPUT_DIR/moverignore.txt"
MAX_SIZE="500000000000"  # 500 gigabytes (decimal) in bytes
EXTENSIONS=("mkv" "srt" "mp4" "avi" "rar")

# Ensure the output directory exists
mkdir -p "$OUTPUT_DIR"

# Clean up previous temporary and output files
rm -f "$OUTPUT_DIR/temp_metadata.txt" "$OUTPUT_DIR/temp_filtered_metadata.txt"
rm -f "$OUTPUT_FILE"

# Step 1: Change to the target directory
cd "$TARGET_DIR" || exit 1

# Step 2: Find files with the specified extensions and record
# "inode ctime path" triples, NUL-terminated so paths with spaces survive
for ext in "${EXTENSIONS[@]}"; do
    find "$(pwd)" -type f -iname "*.$ext" -exec stat --printf="%i %Z %n\0" {} + >> "$OUTPUT_DIR/temp_metadata.txt"
done

# Step 3: Sort metadata by ctime (second field) in descending order
sort -z -k 2,2nr -o "$OUTPUT_DIR/temp_metadata.txt" "$OUTPUT_DIR/temp_metadata.txt"

# Step 4: Keep the newest files up to the specified size limit
total_size=0
processed_inodes=()
while IFS= read -r -d $'\0' line; do
    read -r inode ctime path <<< "$line"

    # Skip if this inode has already been processed (a hardlinked copy);
    # the padded-space match avoids false hits on partial inode numbers
    if [[ " ${processed_inodes[*]} " == *" $inode "* ]]; then
        continue
    fi

    size=$(stat --printf="%s" "$path")

    if ((total_size + size <= MAX_SIZE)); then
        echo "Processing file: $total_size $path"  # Debug information to screen
        total_size=$((total_size + size))

        # Mark the current inode as processed
        processed_inodes+=("$inode")

        # Step 4a: List every hardlink of the current file so the mover
        # ignores all of them, not just the first path found
        hard_links=$(find "$TARGET_DIR" -type f -samefile "$path")
        if [ -n "$hard_links" ]; then
            echo "$hard_links" >> "$OUTPUT_FILE"
        else
            echo "$path" >> "$OUTPUT_FILE"
        fi
    else
        break
    fi
done < "$OUTPUT_DIR/temp_metadata.txt"

# Step 5: Clean up temporary files
rm "$OUTPUT_DIR/temp_metadata.txt"

echo "File list generated and saved to: $OUTPUT_FILE"

 

* 1.0 Initial public post

* 1.1 Fix not clearing main output file, skip file if same hardlink already processed. 

* 1.2 Fix hardlinks not outputting correctly. 

* 1.3 Sort explicitly by date, in case inodes get out of order

Edited by Terebi
fix sort by ctime

Really excited to try this out. I've been having problems getting mover tuning to run on 12.6 or 12.8; only the standard mover seems to trigger when forced from a share. I will take some time to try the script for sure. Love the idea of moving data from the performance tier to the capacity tier based on age. Brilliant.

Edited by wuudogg

Thanks! I'm in the process of implementing similar functionality in the plugin and also fixing some bugs. It might be some time though, so for the time being I think this is an excellent solution for those that want it :)

Great work and appreciate the effort!

Edited by Swarles
5 minutes ago, flyize said:

So wait, is MAX_SIZE how much free space to keep on the disk? And then it adds older files until it hits that number? Sorry, I'm terrible with bash scripts.

No, it's the amount of space to use for files that stay on cache. So if you have a 1 TB drive and would like to leave 25% free, set it to 750 GB in bytes. This does not account for files in appdata etc. that may also be taking up space, so adjust accordingly.
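To get the byte value, a quick bit of shell arithmetic works (decimal units, matching the script's plain-byte MAX_SIZE):

```shell
#!/bin/bash
# 750 GB (decimal) expressed in bytes for MAX_SIZE
echo $((750 * 1000 * 1000 * 1000))   # prints 750000000000
```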


So in your example, if it gets over 750GB, the script will make a list of files from newest to oldest. Then remove the oldest file from the list and recalculate total space used, and do this over and over until it gets under 750GB. Then pass all this off to the Mover, with a list of files it can't touch. Am I understanding correctly?


Kind of the other way around. Each night when the mover runs, it runs the script, which lists files from newest to oldest up to 750 GB. The mover then ignores those.

 

So on day 1 there will only be a small handful of files listed (whatever you downloaded that day). That list keeps growing each day until you reach 750 GB. At that point the oldest files no longer make the list and will get moved.


Although you can use the other Mover Tuning settings like age and threshold, the script works fine with the mover running every day: everything not listed keeps moving off cache (including files with other extensions), and the script more or less self-manages keeping the cache full.

 


Just so that I can keep track of all my scripts in one place, can I use User Scripts? It puts scripts in /boot/config/plugins/user.scripts/scripts/. I'd still call it from the Mover Tuner, but at least all my scripts would be in one place then.

On 3/6/2024 at 2:13 PM, flyize said:

Just so that I can keep track of all my scripts in one place, can I use User Scripts? It puts scripts in /boot/config/plugins/user.scripts/scripts/. I'd still call it from the Mover Tuner, but at least all my scripts would be in one place then.

 

You can stick it there, just without a schedule, yeah. As long as the Mover Tuning plugin has read access to that folder (it runs as root, I think, so it should?)

 

It doesn't really matter where it is; you just want it someplace where it can be run without spinning up drives.

You will definitely want the tuner to call it though, because then the list is up to date as of the moment the mover is about to run. Otherwise data could change between the time the script ran and the time the mover ran.


Hey thanks for this!  This was pretty much exactly what I needed the mover to do with some slight modifications.  I've attached my version below in case this is useful for anyone else:

 

#!/bin/bash

START_TIME=$(date +%s)
DATE=$(date)

# Define variables
TARGET_DIRS=("/mnt/cache/plex_lib")
OUTPUT_DIR="/dev/shm/cache_mover"
OUTPUT_FILE="$OUTPUT_DIR/ignore.txt"
MOVE_FILE="$OUTPUT_DIR/moved.log"
LOG_FILE="$OUTPUT_DIR/verbose.log"
MAX_SIZE="500000000000"  # 500 gigabytes (decimal) in bytes
#EXTENSIONS=("mkv" "srt" "mp4" "avi" "rar")
VERBOSE=false

# Ensure the output directory exists
mkdir -p "$OUTPUT_DIR"

# Ensure the moved log exists
touch "$MOVE_FILE"

# Cleanup previous temporary files
rm -f "$OUTPUT_DIR/temp_metadata.txt" "$OUTPUT_DIR/temp_all_files.txt"
rm -f "$OUTPUT_FILE"
# MOVE_FILE and LOG_FILE intentionally kept persistent

for target_dir in "${TARGET_DIRS[@]}"; do
    # Step 1: Change to the target directory
    cd "$target_dir" || exit 1

    # Step 2: Find files with specified extensions and obtain metadata
    #for ext in "${EXTENSIONS[@]}"; do
    #    find "$(pwd)" -type f -iname "*.$ext" -exec stat --printf="%i %Z %n\0" {} + >> "$OUTPUT_DIR/temp_metadata.txt"
    #done
    # Step 2 (alt): Find all files, no extension filter
    find "$(pwd)" -type f -exec stat --printf="%i %Z %n\0" {} + >> "$OUTPUT_DIR/temp_metadata.txt"
done

# Step 3: Sort metadata by ctime (second field) in descending order
sort -z -k 2,2nr -o "$OUTPUT_DIR/temp_metadata.txt" "$OUTPUT_DIR/temp_metadata.txt"

# Step 4: Keep the newest files up to the specified size limit
total_size=0
move_size=0
processed_inodes=()
while IFS= read -r -d $'\0' line; do
    read -r inode ctime path <<< "$line"

    # Keep track of all files seen
    echo "$path" >> "$OUTPUT_DIR/temp_all_files.txt"

    # Skip if the inode has already been processed; the padded-space
    # match avoids false hits on partial inode numbers
    if [[ " ${processed_inodes[*]} " == *" $inode "* ]]; then
        continue
    fi

    size=$(stat --printf="%s" "$path")

    if ((total_size + size <= MAX_SIZE)); then
        if $VERBOSE; then
            echo "$DATE: Processing file: $total_size $path" >> "$LOG_FILE"  # Debug information to log
        fi
        total_size=$((total_size + size))

        # Mark the current inode as processed
        processed_inodes+=("$inode")

        # Step 4a (alt): No hardlink handling -- significantly faster,
        # and supports multiple TARGET_DIRS
        echo "$path" >> "$OUTPUT_FILE"
    else
        if $VERBOSE; then
            echo "$DATE: Moving file: $move_size $path" >> "$LOG_FILE"  # Debug information to log
        fi
        move_size=$((move_size + size))

        # Do not add to the move file log if previously added
        if ! grep -qF -- "$path" "$MOVE_FILE"; then
            echo "$DATE: $path" >> "$MOVE_FILE"
        fi
    fi
done < "$OUTPUT_DIR/temp_metadata.txt"

# Step 5: Cleanup temporary files
rm "$OUTPUT_DIR/temp_metadata.txt"

END_TIME=$(date +%s)

if $VERBOSE; then
    echo "$DATE: File list generated and saved to: $OUTPUT_FILE" >> "$LOG_FILE"
    echo "$DATE: Execution time: $((END_TIME - START_TIME)) seconds." >> "$LOG_FILE"
fi


Summary:
- Removed hardlink handling from step 4a. I will not have hardlinks in my share, and removing it significantly improves execution time

- Removed looping over an extension list.  I didn't feel like there was anything in my share that couldn't be moved between cache and array

- Added execution time logging

- Added a 'temp_all_files.txt' file. This was useful in convincing myself that ignore.txt was capturing everything on the share and nothing was missed. You can diff the two files when the cache is below MAX_SIZE and they should be identical.

- Added a verbose logging mode

- Added a 'moved.log' for tracking what has been moved between cache/array

- Changed TARGET_DIR to TARGET_DIRS to loop over all potential shares.  At the moment this is just my plex library, but I have some ideas where else this could be used.

- Moved the output directory to /dev/shm. My mover is set up to run on the hour, so this reduces the number of reads and writes on my cache drive
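The completeness check described above could be sketched like this (file paths assumed from the script; only meaningful while cache usage is below MAX_SIZE):

```shell
#!/bin/bash
# Compare the ignore list against the list of all files seen.
# Sort both so ordering differences don't register as a diff.
diff <(sort /dev/shm/cache_mover/ignore.txt) \
     <(sort /dev/shm/cache_mover/temp_all_files.txt) \
  && echo "lists are identical"
```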

On 3/13/2024 at 7:41 PM, ronia said:

Hey thanks for this!  This was pretty much exactly what I needed the mover to do with some slight modifications.  I've attached my version below in case this is useful for anyone else:

 

 



 

Some interesting changes.  If I ever need to make big updates to the script I may port some of these back into the main line. 


Does this script handle active torrent seeds at all? I have a qbit mover script that is supposed to pause active torrents and move them if they are over a certain age, but I like your methodology better, and would like to preserve hardlinks and trigger this mover based on cache fill instead of only time. I'm probably not saying it right, but hopefully you understand what I am looking for?


Since we're already using MAX_SIZE to define how much of the cache the kept files may occupy, how should the Mover Tuning plugin be set up? I currently have it so it only moves at 90% usage; should I remove that?

1 hour ago, Sptz87 said:

Since we're already using MAX_SIZE here to define the maximum amount we want to take files into account as they occupy the cache, how should the mover tuning plugin be setup? I currently have it so it only moves if it's at 90% usage, should I remove that?

 

Yes, you can remove the tuner threshold, which will let my script incrementally move things off each day.

2 hours ago, Terebi said:

 

Yes, you can remove the tuner threshold, which will let my script incrementally move things off each day.

 

Thank you! That makes total sense.

 

Also, just regarding your warning: WARNING : ONLY USE THIS SCRIPT IF YOU HAVE MIRRORED CACHE, OR REALLY DON'T CARE IF YOU LOSE ITEMS HELD IN CACHE.

 

Why do you say that? As far as I understand, you're just collecting the files' ages and sorting them in that order, and the mover is ignoring the files that are in the output txt file. So why would there be any loss at all?

 

39 minutes ago, Sptz87 said:

 

 

Why do you say that? As far as I understand, you're just collecting the files' ages and sorting them in that order, and the mover is ignoring the files that are in the output txt file. So why would there be any loss at all?

 

 

The array is (or should be) protected by parity against disk failure. But if you have a single cache drive, the files there are not protected. For users not using mover shenanigans like this script, that's probably OK, because things get moved to the array every morning. But with this script, things may stay on cache for weeks or months, so if you don't have a mirrored cache set up, there is increased risk of data loss if the cache drive fails. If you do have a mirrored cache, the risk is pretty minimal.

Edited by Terebi

Thanks for linking to this from the Mover Tuning page. Looks exactly like what I am looking for.

 

So for the simple minded like me:

Copy the script into Notepad++, change MAX_SIZE to 3.5 TB (in bytes; I have a 4 TB cache), save it as "moverignore.sh", and place it in \appdata\scripts. Do I need to change anything in TARGET_DIR or OUTPUT_DIR? I created moverignore.txt and placed it in \appdata\scripts.

In Mover Tuning, set the path to moverignore.sh in "Script to run before mover (No checks, always runs):" and moverignore.txt for "Ignore files listed inside of a text file:".

