
[Guide] How to keep cache drive full of media



  • 3 weeks later...

Have we considered using something like this (60 s here is just a sample timer):

inotifywatch -v -e access -e modify -t 60 -r /mnt/user0

which collects access statistics for all non-cached (/mnt/user0) files and folders.

 

And then using a script to move the most accessed files to the (unraid) cache (/mnt/user).

 

This would effectively create a "read cache": the process could be iterated over and over, for X amount of time, until the cache is X % full.

 

Please note this records literal disk reads; content served from memory caches or application caches is not included.

 

Random example excerpt from the depths of the interwebz...

% inotifywatch -v -e access -e modify -t 60 -r ~/.beagle
Establishing watches...
Setting up watch(es) on /home/rohan/.beagle
OK, /home/rohan/.beagle is now being watched.
Total of 302 watches.
Finished establishing watches, now collecting statistics.
Will listen for events for 60 seconds.
total  access  modify  filename
1436   1074    362     /home/rohan/.beagle/Indexes/FileSystemIndex/PrimaryIndex/
1323   1053    270     /home/rohan/.beagle/Indexes/FileSystemIndex/SecondaryIndex/
303    116     187     /home/rohan/.beagle/Indexes/KMailIndex/PrimaryIndex/
261    74      187     /home/rohan/.beagle/TextCache/
206    0       206     /home/rohan/.beagle/Log/
42     0       42      /home/rohan/.beagle/Indexes/FileSystemIndex/Locks/
18     6       12      /home/rohan/.beagle/Indexes/FileSystemIndex/
12     0       12      /home/rohan/.beagle/Indexes/KMailIndex/Locks/
3      0       3       /home/rohan/.beagle/TextCache/54/
3      0       3       /home/rohan/.beagle/TextCache/bc/
3      0       3       /home/rohan/.beagle/TextCache/20/
3      0       3       /home/rohan/.beagle/TextCache/62/
2      2       0       /home/rohan/.beagle/Indexes/KMailIndex/SecondaryIndex/
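A rough, untested sketch of how those statistics could feed such a move (the one-hour window, the top-10 cutoff and the specific destination pool path are all assumptions for illustration):

#!/bin/bash
# Hypothetical sketch: promote the most-read array paths to a cache pool.
WATCH_ROOT="/mnt/user0"   # array-only view of the user shares
DEST_POOL="/mnt/cache"    # written as a specific pool path

# Collect one hour of access statistics, then keep only the numeric result
# rows of the table (header and status lines are discarded).
inotifywatch -e access -e modify -t 3600 -r "$WATCH_ROOT" 2>/dev/null |
  awk '$1 ~ /^[0-9]+$/' |       # result rows: total access modify filename
  sort -k2,2nr | head -n 10 |   # ten most-accessed entries (column 2 = access)
  awk '{print $NF}' |           # keep only the path (naive: breaks on spaces)
  while read -r hot; do
      # Recreate the share-relative path on the pool and copy the data over.
      # Assumes the reported paths are directories, as in the example above.
      rel="${hot#"$WATCH_ROOT"/}"
      mkdir -p "$DEST_POOL/$rel"
      rsync -a "$hot/" "$DEST_POOL/$rel/"
  done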

 

2 hours ago, Samsonight said:

all non-cached (/mnt/user0) files and folders.

 

And then using a script to move the most accessed files to the (unraid) cache (/mnt/user).

/mnt/user is the user shares including all pools. You would have to specify /mnt/poolname, such as /mnt/cache, for the move destination.

 

You definitely don't want to try to move anything from /mnt/user0 to /mnt/user. That would almost certainly result in lost data.

Posted (edited)
1 hour ago, trurl said:

/mnt/user is the user shares including all pools. You would have to specify /mnt/poolname, such as /mnt/cache, for the move destination.

 

You definitely don't want to try to move anything from /mnt/user0 to /mnt/user. That would almost certainly result in lost data.

 

Afaik, if a share has its primary storage on a pool (cache) and you write to /mnt/user/ShareX, it goes to the pool (instead of the array).

If you write to /mnt/user0, it goes to the array (disk).

 

If /mnt/user0 isn't the array-only (disk) path and /mnt/user is the top layer, then what does Mover actually do when moving to/from the array/pool? It also seems Lime is deprecating user0, but they have not clearly stated before, nor are they stating now, where that functionality would end up. That is basically what causes threads and confusion like this to fill the forums.

 

Documentation search for user0 https://docs.unraid.net/search/?q=user0

returns only https://docs.unraid.net/unraid-os/manual/shares/#user-shares

while the actual reference is https://docs.unraid.net/unraid-os/manual/shares/user-shares/

 

Please, since you imply factual knowledge of the actual state of the FUSE layering, could you explain in detail why/how "That would almost certainly result in lost data"? Preferably citing documentation, just to avoid obscure FUD and misunderstandings like this... and suggest how a better result could be achieved. ❤️

Edited by Samsonight
clarification seems needed

It certainly might be possible to expand my script to include recently read items among the things to preserve on cache, rather than just things that were added recently.

 

For my purposes, recently added and recently viewed overlap a lot though, so it might not be as useful as imagined. 

 

And without more complicated work, it would not handle promoting things from the array back to the cache.

Posted (edited)

I deleted the moverignore.txt file, since it was keeping entries for files I had deleted manually from the cache drive and which were no longer present. Instead of the file being recreated, the whole cache drive got emptied last night. Is the .txt file always meant to be created before the first run? I don't think the script was able to create it itself.

 

EDIT: I created the moverignore.txt file manually with the command, but it seems it is no longer being updated automatically by the mover plugin either.

Edited by Unraidmule
On 5/27/2024 at 2:22 AM, Unraidmule said:

I deleted the moverignore.txt file, since it was keeping entries for files I had deleted manually from the cache drive and which were no longer present. Instead of the file being recreated, the whole cache drive got emptied last night. Is the .txt file always meant to be created before the first run? I don't think the script was able to create it itself.

 

EDIT: I created the moverignore.txt file manually with the command, but it seems it is no longer being updated automatically by the mover plugin either.

 

Every time my script runs it should create the file.  Perhaps the script is not in the same location the mover tuner setting is looking for it in? Or the permissions on the script may need to be fixed?

6 hours ago, Terebi said:

 

Every time my script runs it should create the file.  Perhaps the script is not in the same location the mover tuner setting is looking for it in? Or the permissions on the script may need to be fixed?

 

I think it is a permission issue: the script lost its execute rights. IDK what the original file permissions were; I did chmod 777 because I forgot about the chmod +x command. Will let it run tonight and hopefully get the desired results. When it was running it was actually really cool.

Just now, Unraidmule said:

 

I think it is a permission issue: the script lost its execute rights. IDK what the original file permissions were; I did chmod 777 because I forgot about the chmod +x command. Will let it run tonight and hopefully get the desired results. When it was running it was actually really cool.

 

In a console, if you just run mover, you can see if it triggers my script or not.

Posted (edited)
12 hours ago, Terebi said:

 

In a console, if you just run mover, you can see if it triggers my script or not.

 

I can test it, what do I type in the console? Just “mover”?

 

EDIT: It executed overnight and the .txt file seems to be updating now. IDK what happened, but the execute permissions got removed somehow. If anyone else has issues, that would be the first thing to check.

Edited by Unraidmule
On 5/28/2024 at 5:17 PM, Unraidmule said:

 

I can test it, what do I type in the console? Just “mover”?

 

EDIT: It executed overnight and the .txt file seems to be updating now. IDK what happened, but the execute permissions got removed somehow. If anyone else has issues, that would be the first thing to check.

Are you using the User Scripts plugin for it?  Is your script located on your boot device?

I had the same problem and found out that Unraid won't let you make anything on the boot drive executable.  I made a copy on the cache disk and I believe it's working right now.
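In case it helps, roughly what that workaround looks like (the paths below are made up; adjust them to wherever the script actually lives):

# Hypothetical paths: copy the script off the flash/boot device, where the
# execute bit doesn't stick, to a pool location and mark it executable there.
mkdir -p /mnt/cache/scripts
cp /boot/config/custom/moverignore.sh /mnt/cache/scripts/moverignore.sh
chmod +x /mnt/cache/scripts/moverignore.sh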

 

Had another issue with some files not being added to the moverignore.txt file, specifically some files with 3-digit inodes.

If the script was currently looking at a file with inode 416, and inode 5416 had already been processed and added to the array, it would count that as a match.  Fixed it by changing the line to:

 

if  [[ "${processed_inodes[*]}" =~ (^|[[:space:]])"$inode"($|[[:space:]]) ]]; then
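A quick standalone illustration of the difference (the inode values are made up):

#!/bin/bash
processed_inodes=(5416 9023)
inode=416

# Unanchored match: "416" is a substring of "5416", so this wrongly matches.
if [[ "${processed_inodes[*]}" =~ $inode ]]; then
    echo "unanchored: false positive"
fi

# Anchored match: only whole, space-delimited entries count, so this does not match.
if [[ "${processed_inodes[*]}" =~ (^|[[:space:]])"$inode"($|[[:space:]]) ]]; then
    echo "anchored: match"
else
    echo "anchored: no match (correct)"
fi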

 

  • 2 weeks later...

Great idea with this, I was looking for exactly this behavior in the Mover Tuning Plugin!

Have you considered allowing us to specify a desired amount of free space to maintain, rather than specifying the size we want TARGET_DIR to be?

 

Unless I am missing something, it would be pretty easy to calculate MAX_SIZE from the user-specified free space requirement: (total size of cache pool - desired free space) - ((total current size of data in /mnt/cache) - (total current size of data in TARGET_DIR))
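As a rough sketch of that formula (the variable names, the example 200 GiB value and the use of df/du below are purely illustrative, not the plugin's or the script's actual code):

#!/bin/bash
# Hypothetical sketch of the free-space-based MAX_SIZE calculation described above.
CACHE="/mnt/cache"
TARGET_DIR="/mnt/cache/media"          # illustrative target directory
DESIRED_FREE=$(( 200 * 1024**3 ))      # keep 200 GiB free (example value)

pool_total=$(( $(df --output=size "$CACHE" | tail -1) * 1024 ))   # bytes
cache_used=$(( $(df --output=used "$CACHE" | tail -1) * 1024 ))   # bytes
target_used=$(du -sb "$TARGET_DIR" | awk '{print $1}')            # bytes

# MAX_SIZE = (total pool size - desired free) - (used by everything except TARGET_DIR)
MAX_SIZE=$(( (pool_total - DESIRED_FREE) - (cache_used - target_used) ))
echo "MAX_SIZE=$MAX_SIZE bytes"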

 

 

1 minute ago, crafty35a said:

Great idea with this, I was looking for exactly this behavior in the Mover Tuning Plugin!

Have you considered allowing us to specify a desired amount of free space to maintain, rather than specifying the size we want TARGET_DIR to be?

 

Unless I am missing something, it would be pretty easy to calculate MAX_SIZE from the user-specified free space requirement: (total size of cache pool - desired free space) - ((total current size of data in /mnt/cache) - (total current size of data in TARGET_DIR))

 

 

It's probably doable, but since the two values are mostly equivalent (unless appdata grows suddenly), it doesn't matter enough to me to fix :)

  • 2 weeks later...

Hi,

 

For the extensions part, I want to keep rar files, but in this case the parts are r01-r99, for example. So is there a way to add a wildcard, like r*? Because after r99 it goes on to, e.g., s01-s99, etc.,

 

and it's too long to add all the different numbers in manually.

 

Thanks
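For what it's worth, one way this could be handled, if the script's extension filter were switched to find-style glob patterns (a sketch only; the pattern list below is an example, not the script's current behaviour):

# Hypothetical: treat entries as find -iname globs instead of plain extensions,
# so a single pattern covers .r00-.r99, .s00-.s99 and so on.
EXTENSIONS=("mkv" "mp4" "rar" "[a-z][0-9][0-9]")
for ext in "${EXTENSIONS[@]}"; do
    find . -type f -iname "*.$ext" -print
done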

  • 2 weeks later...
Posted (edited)
On 6/10/2024 at 1:18 PM, crafty35a said:

Great idea with this, I was looking for exactly this behavior in the Mover Tuning Plugin!

Have you considered allowing us to specify a desired amount of free space to maintain, rather than specifying the size we want TARGET_DIR to be?

 

Unless I am missing something, it would be pretty easy to calculate MAX_SIZE from the user-specified free space requirement: (total size of cache pool - desired free space) - ((total current size of data in /mnt/cache) - (total current size of data in TARGET_DIR))

 

 

I wanted something similar to this as well and this is what I came up with:

 

#!/bin/bash

START_TIME=`date +%s`
DATE=`date`
# Define variables
CACHE_DRIVE="/mnt/cache"
TARGET_DIRS=("/mnt/cache/plex_lib")
DUPLICATES_LOG="/dev/shm/duplicates.log" # Note, this file must be created by another script that tracks duplicated files (ie. tv_series_to_cache script)
OUTPUT_DIR="/dev/shm/cache_mover"
OUTPUT_FILE="$OUTPUT_DIR/ignore.txt"
OUTPUT_FILE_SHADOW="$OUTPUT_DIR/ignore_shadow.txt" # Shadow file so OUTPUT_FILE is always in valid state
MOVE_FILE="$OUTPUT_DIR/move_queue.log" # Note, shadow file not needed as move file is not used by any process
MOVE_LOG="$OUTPUT_DIR/moved.log" # List of files that are moved if mover is expected to be invoked
LOG_FILE="$OUTPUT_DIR/verbose.log"
TRIM_RATIO_TARGET=50 # Ratio of used space to target after moving/cleaning
MOVER_RATIO_TARGET=80 # Must match the value setup in mover scheduler
#EXTENSIONS=("mkv" "srt" "mp4" "avi" "rar")
VERBOSE=false

### Perform size calculations on cache drive, target directories and trim target ###
cd $CACHE_DRIVE
# Get remaining size of cache
remaining_size=$(df --output=avail $CACHE_DRIVE | awk 'FNR ==2 {print}')
remaining_size=$(( $remaining_size * 1024 ))

# Get cache usage
cache_used=0
target_dirs_used=0
non_target_dirs_used=0
for dir in */; do
    dsize=$(du -bs "$dir" | awk '{print $1}')   # quoted so paths with spaces work
    cache_used=$(($cache_used + $dsize))
    for target_dir in "${TARGET_DIRS[@]}"; do
        full_path="$CACHE_DRIVE/$dir"
        if [ "$full_path" = "$target_dir/" ]; then
            target_dirs_used=$((target_dirs_used + $dsize))
        else
            non_target_dirs_used=$((non_target_dirs_used + $dsize))
        fi
    done
    #echo "$dir $dsize"
done

total_size=$(($cache_used + $remaining_size))
#echo "total_size $total_size"


# Get ratio of target_dirs to total size of cache
target_dirs_ratio=$(( 100 * $target_dirs_used / $total_size ))
remaining_size_ratio=$(( 100 * $remaining_size / $total_size ))
max_possible_target_ratio=$(( target_dirs_ratio + remaining_size_ratio ))
cache_used_ratio=$(( 100* $cache_used / $total_size ))
#echo "target_dirs_ratio=$target_dirs_ratio"
if (( max_possible_target_ratio < 100-TRIM_RATIO_TARGET )); then
    # notification
    /usr/local/emhttp/webGui/scripts/notify -i alert -s "cache_mover script failed!" -d "Trim ratio target ($((100-TRIM_RATIO_TARGET))%) is greater than possible free space of monitored targets ($max_possible_target_ratio%).  Ignore file could not be created."
    exit
fi

trim_size=$(( total_size * TRIM_RATIO_TARGET / 100 ))
#echo "TARGET_DIRS needs to be trimmed to $trim_size bytes"

# Ensure the output directory exists
mkdir -p "$OUTPUT_DIR"

# Ensure the moved log exists
touch $MOVE_FILE

# Ensure the duplicates log exists
touch $DUPLICATES_LOG

# Cleanup previous temporary files
rm -f "$OUTPUT_DIR/temp_metadata.txt" "$OUTPUT_DIR/temp_all_files.txt" "$OUTPUT_DIR/temp_deleted.txt"
touch "$OUTPUT_DIR/temp_deleted.txt"
rm -f $OUTPUT_FILE_SHADOW
rm -f $MOVE_FILE
# LOG_FILE intentionally kept persistent

for target_dir in "${TARGET_DIRS[@]}"; do
    # Step 1: Change directory to the target directory
    cd "$target_dir" || exit

    # Step 2: Find files with specified extensions and obtain metadata (loop through extensions)
    #for ext in "${EXTENSIONS[@]}"; do
    #    find "$(pwd)" -type f -iname "*.$ext" -exec stat --printf="%i %Z %n\0" {} + >> "$OUTPUT_DIR/temp_metadata.txt"
    #done
    # Step 2(alt): Find all files.  No filter.
    find "$(pwd)" -type f -iname "*" -exec stat --printf="%i %Z %n\0" {} + >> "$OUTPUT_DIR/temp_metadata.txt"
done

# Step 3: Sort metadata by ctime (second column) in descending order
sort -z -k 2,2nr -o "$OUTPUT_DIR/temp_metadata.txt" "$OUTPUT_DIR/temp_metadata.txt"


# Step 4: Get the newest files up to the specified size limit
total_size=$non_target_dirs_used
move_size=0
processed_inodes=()
while IFS= read -r -d $'\0' line; do
    read -r inode ctime path <<< "$line"
    
    # Check if folder was deleted
    deleted=false
    while IFS=':' read -r ln_num; do
        ln_num=$(echo "$ln_num" | xargs)
        
        if [ -z "$ln_num" ]; then
            break
        else
            deleted=true  
        fi
    done <<< "$(grep -n "$(dirname "${path}")" "$OUTPUT_DIR/temp_deleted.txt")"
    if [ "$deleted" = true ]; then
        continue
    fi
    
    # Keep track of all files
    echo "$path" >> "$OUTPUT_DIR/temp_all_files.txt"
    
    # Skip if the inode has already been processed (anchored match, so e.g.
    # inode 416 is not treated as a substring match against 5416)
    if [[ "${processed_inodes[*]}" =~ (^|[[:space:]])"$inode"($|[[:space:]]) ]]; then
        continue
    fi
    
    size=$(stat --printf="%s" "$path")

    if ((total_size + size <= trim_size)); then
        if $VERBOSE; then
            echo "$DATE: Processing file: $total_size $path" >> $LOG_FILE  # Debug information to log
        fi
        #echo "$path" >> "$OUTPUT_FILE_SHADOW"  # Appending only path and filename to the file
        total_size=$((total_size + size))
        
        # Mark the current inode as processed
        processed_inodes+=("$inode")

        # Step 4a: List hardlinks for the current file
        #hard_links=$(find "$TARGET_DIR" -type f -samefile "$path")
        #if [ -n "$hard_links" ]; then
        #       echo "$hard_links" >> "$OUTPUT_FILE_SHADOW"
        #else
        #   echo $path >> $OUTPUT_FILE_SHADOW
        #fi
        # Step 4a(alt): Script does not support hardlinks, but is significantly faster and supports multiple TARGET_DIR
        echo "$path" >> "$OUTPUT_FILE_SHADOW"

    else
        # TODO: I don't actually think the duplicates logic is necessary.  rsync seems to be able to reconcile duplicates when performed manually.  Mover uses rsync so it shouldn't be a problem?
        # Check if file is already duplicated on the array
        #duplicated=false
        #while IFS=':' read -r ln_num array_path cache_path; do
        #    # Trim trailing spaces
        #    ln_num=$(echo "$ln_num" | xargs)
        #    array_path=$(echo "$array_path" | xargs)
        #    cache_path=$(echo "$cache_path" | xargs)
        #    
        #    if [ -z "$ln_num" ]; then
        #        break
        #    else
        #        duplicated=true  
        #    fi
        #    
        #    # Rsync cache version based on crc.  Skip mod-time & size.
        #    rsync -avcP -X "$cache_path"/ "$array_path"/
        #done <<< "$(grep -n "$(dirname "${path}")" $DUPLICATES_LOG)"
        
        #if [ "$duplicated" = true ]; then
        #    # Remove log entry
        #    sed -i "\#$(dirname "${path}")#d" $DUPLICATES_LOG
        #    
        #    size=$(du -b "$(dirname "${path}")" | awk '{print $1}')
        #    
        #    # Delete cache folder
        #    rm -rf "$(dirname "${path}")"
        #    echo "$(dirname "${path}")" >> "$OUTPUT_DIR/temp_deleted.txt"
        #fi
        
        #if $VERBOSE; then
        #    if [ "$duplicated" = true ]; then
        #        echo "$DATE: Removing duplicate folder: $move_size "$(dirname "${path}")"" >> $LOG_FILE  # Debug information to log
        #    else
        #        echo "$DATE: Moving file: $move_size $path" >> $LOG_FILE  # Debug information to log
        #    fi
        #fi
        
        move_size=$((move_size + size))
        
        ## Do not add to the move file log if previously added
        #if ! grep -q "$path" $MOVE_FILE ; then
        #    echo "$DATE: $path" >> $MOVE_FILE
        #fi
        # Add the path to the move file
        echo "$path" >> $MOVE_FILE
        
        continue
        #break
    fi
done < "$OUTPUT_DIR/temp_metadata.txt"

# Step 5: Cleanup temporary files
rm "$OUTPUT_DIR/temp_metadata.txt"

# Step 6: Overwrite the output file with the shadow file.  
rm -f $OUTPUT_FILE
mv $OUTPUT_FILE_SHADOW $OUTPUT_FILE

# Step 7: Update move log if mover is expected to run
if [[ $cache_used_ratio -ge $MOVER_RATIO_TARGET ]]; then
    cat $MOVE_FILE >> $MOVE_LOG
fi

END_TIME=`date +%s`

if $VERBOSE; then
    echo "$DATE: File list generated and saved to: $OUTPUT_FILE" >> $LOG_FILE
    echo "$DATE: Execution time: $(($END_TIME - $START_TIME)) seconds." >> $LOG_FILE
fi

 

Notes:

- Ignore the "duplicates" log logic.  I use it in conjunction with an "array to cache" script that polls Tautulli for in-progress media and sends it back to the cache.  It also probably doesn't work. :)

- My version does not deal with hard links, unlike the OP's.  It's assumed that the media library is hard-link free or that hard links are being handled externally.

- I think all you would need to do is port the portions involving "TRIM_RATIO_TARGET" and "MOVER_RATIO_TARGET" into the OP's script.  (His is also cleaner, because I'm still working on/debugging mine.)

Edited by ronia