rsync Incremental Backup



The following script creates incremental backups by using rsync. Check the settings to define your own paths.

 

Donate? 🤗

 

#!/bin/bash
# #####################################
# Name:        rsync Incremental Backup
# Description: Creates incremental backups and deletes outdated versions
# Author:      Marc Gutt
# Version:     1.3
# 
# Changelog:
# 1.3
# - Fixed typo which prevented deleting skipped backups
# - Fixed typo in notify function which returned wrong importance level
# - Better error reporting
# - Added support for SSH sources
# - Fixed bug while creating destination path on remote server
# - Empty dir is not a setting anymore
# - Logfile is now random to avoid race conditions
# - Delete logfile of skipped backups
# 1.2
# - Fixed typo in new "backup is older than..." feature
# 1.1
# - Fixed copying log file although backup has been skipped
# - Fixed deleting empty_dir while creating destination path
# - Create notification if last backup is older than X days (something went wrong for a long time)
# 1.0
# - Allow setting "--dry-run" as rsync option (which skips some parts of the script)
# - Create destination path if it does not exist
# - Fixed wrong minimum file count check
# - Fixed broken recognition of remote "mv" command
# 0.9
# - Fixed wrong backup path in some situations
# - User-defined replacement rules for backup path
# - new setting "skip_error_host_went_down" which skips backups if host went down during file transfers
# - Important Change: /source/<timestamp>/source has been changed to /source/<timestamp>
# - Fixed wrong count of kept backups if multiple source paths have been set
# 0.8
# - Fixed wrong path while making backups visible on SSH targets
# 0.7
# - Empty backups stay invalid (must include at least X files)
# - Fixed growing log file problem
# - Logs are now located in the backup dir itself
# - Added support for SSH destinations (replaced find, rm and mv commands with pure rsync hacks)
# - User-defined rsync options
# - User can exclude directories, defaults are /Temp, /Tmp and /Cache
# - Enhanced user settings (better descriptions and self-explanatory variable names)
# - Multi-platform support (should now work with Unraid, Ubuntu, Synology...)
# - Replaced potentially unsafe "rm -r" command with rsync
# - User-defined rsync command to allow optional sshpass support
# - Keep multiple backups of a day only of the last X days (default 1)
# - Important Change: The latest backup of a month is kept as monthly backup (in the past it was only the backup of the 1st of a month)
# - Important Change: The latest backup of a year is kept as yearly backup (in the past it was only the backup of the 1st january of a year)
# 
# Todo:
# - chunksync hardlinks for huge files (like images)
# - docker auto stop and start for consistent container backups (compare container volumes against source paths)
# - what happens if backup source disappears while creating backup (like a mounted smb share which goes offline)
# - rare case scenario: log filename is not atomic
# - test on very first backup if destination supports hardlinks
# - resume if source went offline during last backup
# #####################################

# #####################################
# Settings
# #####################################

# backup source to destination
backup_jobs=(
  # source                          # destination
  "/mnt/user/Music"                 "/mnt/user/Backups/Shares/Music"
  "user@server:/home/Maria/Photos"  "/mnt/user/Backups/server/Maria/Photos"
  "/mnt/user/Documents"             "user@server:/home/Backups/Documents"
)

# keep backups of the last X days
keep_days=14

# keep multiple backups of one day for X days
keep_days_multiple=1

# keep backups of the last X months
keep_months=12

# keep backups of the last X years
keep_years=3

# keep the most recent X failed backups
keep_fails=3

# rsync options which are used while creating the full and incremental backup
rsync_options=(
#  --dry-run
  --archive # same as --recursive --links --perms --times --group --owner --devices --specials
  --human-readable # output numbers in a human-readable format
  --itemize-changes # output a change-summary for all updates
  --exclude="[Tt][Ee][Mm][Pp]/" # exclude dirs with the name "temp" or "Temp" or "TEMP"
  --exclude="[Tt][Mm][Pp]/" # exclude dirs with the name "tmp" or "Tmp" or "TMP"
  --exclude="Cache/" # exclude dirs with the name "Cache"
)

# notify if the backup was successful (1 = notify)
notification_success=0

# notify if last backup is older than X days
notification_backup_older_days=30

# create destination if it does not exist
create_destination=1

# backup does not fail if files vanished during transfer https://linux.die.net/man/1/rsync#:~:text=vanished
skip_error_vanished_source_files=1

# backup does not fail if source path returns "host is down".
# This could happen if the source is a mounted SMB share, which is offline.
skip_error_host_is_down=1

# backup does not fail if file transfers return "host is down"
# This could happen if the source is a mounted SMB share, which went offline during transfer
skip_error_host_went_down=1

# backup does not fail, if source path does not exist, which for example happens if the source is an unmounted SMB share
skip_error_no_such_file_or_directory=1

# a backup fails if it contains less than X files
backup_must_contain_files=2

# a backup fails if more than X % of the files couldn't be transferred because of "Permission denied" errors
permission_error_treshold=20

# user-defined rsync command
#alias rsync='sshpass -p "<password>" rsync -e "ssh -o StrictHostKeyChecking=no"'

# user-defined ssh command
#alias ssh='sshpass -p "<password>" ssh -o "StrictHostKeyChecking no"'

# #####################################
# Script
# #####################################

# make script race condition safe
if [[ -d "/tmp/${0//\//_}" ]] || ! mkdir "/tmp/${0//\//_}"; then echo "Script is already running!" && exit 1; fi; trap 'rmdir "/tmp/${0//\//_}"' EXIT;

# allow usage of alias commands
shopt -s expand_aliases

# functions
remove_last_slash() { [[ "${1%?}" ]] && [[ "${1: -1}" == "/" ]] && echo "${1%?}" || echo "$1"; }
notify() {
  echo "$2"
  if [[ -f /usr/local/emhttp/webGui/scripts/notify ]]; then
    /usr/local/emhttp/webGui/scripts/notify -i "$([[ $2 == Error* ]] && echo alert || echo normal)" -s "$1 ($src_path)" -d "$2" -m "$2"
  fi
}

# check user settings
backup_path=$(remove_last_slash "$backup_path")
[[ "${rsync_options[*]}" == *"--dry-run"* ]] && dryrun=("--dry-run")

# check if rsync exists
! command -v rsync &> /dev/null && echo "rsync command not found!" && exit 1

# check if sshpass exists if it has been used
echo "$(type rsync) $(type ssh)" | grep -q "sshpass" && ! command -v sshpass &> /dev/null && echo "sshpass command not found!" && exit 1

# set empty dir
empty_dir="/tmp/${0//\//_}"

# loop through all backup jobs
for i in "${!backup_jobs[@]}"; do

  # get source path and skip to next element
  ! (( i % 2 )) && src_path="${backup_jobs[i]}" && continue

  # get destination path
  dst_path="${backup_jobs[i]}"

  # check user settings
  src_path=$(remove_last_slash "$src_path")
  dst_path=$(remove_last_slash "$dst_path")
 
  # get ssh login and remote path
  ssh_login=$(echo "$dst_path" | grep -oP "^.*(?=:)")
  remote_dst_path=$(echo "$dst_path" | grep -oP "(?<=:).*")
  if [[ ! "$remote_dst_path" ]]; then
    ssh_login=$(echo "$src_path" | grep -oP "^.*(?=:)")
  fi

  # create timestamp for this backup
  new_backup="$(date +%Y%m%d_%H%M%S)"

  # create log file
  log_file="$(mktemp)"
  exec &> >(tee "$log_file")

  # obtain last backup
  if last_backup=$(rsync --dry-run --recursive --itemize-changes --exclude="*/*/" --include="[0-9]*/" --exclude="*" "$dst_path/" "$empty_dir" 2>&1); then
    last_backup=$(echo "$last_backup" | grep -oP "[0-9_/]*" | sort -r | head -n1)
  # create destination path
  elif echo "$last_backup" | grep -q "No such file or directory" && [[ "$create_destination" == 1 ]]; then
    unset last_backup last_include
    if [[ "$remote_dst_path" ]]; then
      mkdir -p "$empty_dir$remote_dst_path" || exit 1
    else
      mkdir -p "$empty_dir$dst_path" || exit 1
    fi
    IFS="/" read -r -a includes <<< "${dst_path:1}"
    for j in "${!includes[@]}"; do
      includes[j]="--include=$last_include/${includes[j]}"
      last_include="${includes[j]##*=}"
    done
    rsync --itemize-changes --recursive "${includes[@]}" --exclude="*" "$empty_dir/" "/"
    find "$empty_dir" -mindepth 1 -type d -empty -delete
  else
    rsync_errors=$(grep -Pi "rsync:|fail|error:" "$log_file" | tail -n3)
    notify "Could not obtain last backup!" "Error: ${rsync_errors//[$'\r\n'=]/ } ($rsync_status)!"
    continue
  fi

  # create backup
  echo "# #####################################"
  # incremental backup
  if [[ "$last_backup" ]]; then
    echo "last_backup: '$last_backup'"
    # warn user if last backup is really old
    last_backup_days_old=$(( ($(date +%s) - $(date +%s -d "${last_backup:0:4}${last_backup:4:2}${last_backup:6:2}")) / 86400 ))
    if [[ $last_backup_days_old -gt $notification_backup_older_days ]]; then
      notify "Last backup is too old!" "Error: The last backup is $last_backup_days_old days old!"
    fi
    # rsync returned only the subdir name, but we need an absolute path
    last_backup="$dst_path/$last_backup"
    echo "Create incremental backup from $src_path to $dst_path/$new_backup by using last backup $last_backup"
    # remove ssh login if part of path
    last_backup="${last_backup/$(echo "$dst_path" | grep -oP "^.*:")/}"
    rsync "${rsync_options[@]}" --stats --delete --link-dest="$last_backup" "$src_path/" "$dst_path/.$new_backup"
  # full backup
  else
    echo "Create full backup from $src_path to $dst_path/$new_backup"
    rsync "${rsync_options[@]}" --stats "$src_path/" "$dst_path/.$new_backup"
  fi

  # check backup status
  rsync_status=$?
  # obtain file count of rsync
  file_count=$(grep "^Number of files" "$log_file" | cut -d " " -f4)
  file_count=${file_count//,/}
  [[ "$file_count" =~ ^[0-9]+$ ]] || file_count=0
  echo "File count of rsync is $file_count"
  # success
  if [[ "$rsync_status" == 0 ]]; then
    message="Success: Backup of $src_path was successfully created in $dst_path/$new_backup ($rsync_status)!"
  # source path is a mounted SMB server which is offline
  elif [[ "$rsync_status" == 23 ]] && [[ "$file_count" == 0 ]] && [[ $(grep -c "Host is down (112)" "$log_file") == 1 ]]; then
    message="Skip: Backup of $src_path has been skipped as host is down"
    [[ "$skip_error_host_is_down" != 1 ]] && message="Error: Host is down!"
  elif [[ "$rsync_status" == 23 ]] && [[ "$file_count" -gt 0 ]] && [[ $(grep -c "Host is down (112)" "$log_file") == 1 ]]; then
    message="Skip: Backup of $src_path has been skipped as host went down"
    [[ "$skip_error_host_went_down" != 1 ]] && message="Error: Host went down!"
  # source path is wrong (maybe unmounted SMB server)
  elif [[ "$rsync_status" == 23 ]] && [[ "$file_count" == 0 ]] && [[ $(grep -c "No such file or directory (2)" "$log_file") == 1 ]]; then
    message="Skip: Backup of $src_path has been skipped as source path does not exist"
    [[ "$skip_error_no_such_file_or_directory" != 1 ]] && message="Error: Source path does not exist!"
  # check if there were too many permission errors
  elif [[ "$rsync_status" == 23 ]] && grep -c "Permission denied (13)" "$log_file"; then
    message="Warning: Some files had permission problems"
    permission_errors=$(grep -c "Permission denied (13)" "$log_file")
    error_ratio=$((100 * permission_errors / file_count)) # note: integer result, not float!
    if [[ $error_ratio -gt $permission_error_treshold ]]; then
      message="Error: $permission_errors/$file_count files ($error_ratio%) return permission errors ($rsync_status)!"
    fi
  # some source files vanished
  elif [[ "$rsync_status" == 24 ]]; then
    message="Warning: Some files vanished"
    [[ "$skip_error_vanished_source_files" != 1 ]] && message="Error: Some files vanished while backup creation ($rsync_status)!"
  # all other errors are critical
  else
    rsync_errors=$(grep -Pi "rsync:|fail|error:" "$log_file" | tail -n3)
    message="Error: ${rsync_errors//[$'\r\n'=]/ } ($rsync_status)!"
  fi

  # backup remains or is deleted depending on status
  # delete skipped backup
  if [[ "$message" == "Skip"* ]]; then
    echo "Delete $dst_path/.$new_backup"
    rsync "${dryrun[@]}" --recursive --delete --include="/.$new_backup**" --exclude="*" "$empty_dir/" "$dst_path"
  # check if enough files have been transferred
  elif [[ "$message" != "Error"* ]] && [[ "$file_count" -lt "$backup_must_contain_files" ]]; then
    message="Error: rsync transferred less than $backup_must_contain_files files! ($message)!"
  # keep successful backup
  elif [[ "$message" != "Error"* ]]; then
    echo "Make backup visible ..."
    # remote backup
    if [[ "$remote_dst_path" ]]; then
      # check if "mv" command exists on remote server as it is faster
      if ssh -n "$ssh_login" "command -v mv &> /dev/null"; then
        echo "... through remote mv (fast)"
        [[ "${dryrun[*]}" ]] || ssh "$ssh_login" "mv \"$remote_dst_path/.$new_backup\" \"$remote_dst_path/$new_backup\""
      # use rsync (slower)
      else
        echo "... through rsync (slow)"
        # move all files from /.YYYYMMDD_HHIISS to /YYYYMMDD_HHIISS
        if ! rsync "${dryrun[@]}" --delete --recursive --backup --backup-dir="$remote_dst_path/$new_backup" "$empty_dir/" "$dst_path/.$new_backup"; then
          message="Error: Could not move content of $dst_path/.$new_backup to $dst_path/$new_backup!"
        # delete empty source dir
        elif ! rsync "${dryrun[@]}" --recursive --delete --include="/.$new_backup**" --exclude="*" "$empty_dir/" "$dst_path"; then
          message="Error: Could not delete empty dir $dst_path/.$new_backup!"
        fi
      fi
    # use local renaming command
    else
      echo "... through local mv"
      [[ "${dryrun[*]}" ]] || mv -v "$dst_path/.$new_backup" "$dst_path/$new_backup"
    fi
  fi

  # notification
  if [[ $message == "Error"* ]]; then
    notify "Backup failed!" "$message"
  elif [ "$notification_success" == 1 ]; then
    notify "Backup done." "$message"
  fi

  # loop through all backups and delete outdated backups
  echo "# #####################################"
  echo "Clean up outdated backups"
  unset day month year day_count month_count year_count
  while read -r backup_name; do

    # failed backups
    if [[ "${backup_name:0:1}" == "." ]] && ! [[ "$backup_name" =~ ^[.]+$ ]]; then
      if [[ "$keep_fails" -gt 0 ]]; then
        echo "Keep failed backup: $backup_name"
        keep_fails=$((keep_fails-1))
        continue
      fi
      echo "Delete failed backup: $backup_name"

    # successful backups
    else
      last_year=$year
      last_month=$month
      last_day=$day
      year=${backup_name:0:4}
      month=${backup_name:4:2}
      day=${backup_name:6:2}
      # all date parts must be integers
      if ! [[ "$year$month$day" =~ ^[0-9]+$ ]]; then
        echo "Error: $backup_name is not a backup!"
        continue
      fi
      # keep all backups of a day
      if [[ "$day_count" -le "$keep_days_multiple" ]] && [[ "$last_day" == "$day" ]] && [[ "$last_month" == "$month" ]] && [[ "$last_year" == "$year" ]]; then
        echo "Keep multiple backups per day: $backup_name"
        continue
      fi
      # keep daily backups
      if [[ "$keep_days" -gt "$day_count" ]] && [[ "$last_day" != "$day" ]]; then
        echo "Keep daily backup: $backup_name"
        day_count=$((day_count+1))
        continue
      fi
      # keep monthly backups
      if [[ "$keep_months" -gt "$month_count" ]] && [[ "$last_month" != "$month" ]]; then
        echo "Keep monthly backup: $backup_name"
        month_count=$((month_count+1))
        continue
      fi
      # keep yearly backups
      if [[ "$keep_years" -gt "$year_count" ]] && [[ "$last_year" != "$year" ]]; then
        echo "Keep yearly backup: $backup_name"
        year_count=$((year_count+1))
        continue
      fi
      # delete outdated backups
      echo "Delete outdated backup: $backup_name"
    fi

    # ssh
    if [[ "$remote_dst_path" ]]; then
      if ssh -n "$ssh_login" "command -v rm &> /dev/null"; then
        echo "... through remote rm (fast)"
        [[ "${dryrun[*]}" ]] || ssh "$ssh_login" "rm -r \"${remote_dst_path:?}/${backup_name:?}\""
      else
        echo "... through rsync (slow)"
        rsync "${dryrun[@]}" --recursive --delete --include="/$backup_name**" --exclude="*" "$empty_dir/" "$dst_path"
      fi
    # local (rm is 50% faster than rsync)
    else
      [[ "${dryrun[*]}" ]] || rm -r "${dst_path:?}/${backup_name:?}"
    fi

  done < <(rsync --dry-run --recursive --itemize-changes --exclude="*/*/" --include="[.0-9]*/" --exclude="*" "$dst_path/" "$empty_dir" | grep -oP "[.0-9_]*" | sort -r)

  # move log file to destination
  log_path=$(rsync --dry-run --itemize-changes --include=".$new_backup/" --include="$new_backup/" --exclude="*" --recursive "$dst_path/" "$empty_dir" | cut -d " " -f 2)
  [[ $log_path ]] && rsync "${dryrun[@]}" --remove-source-files "$log_file" "$dst_path/$log_path/$new_backup.log"
  [[ -f "$log_file" ]] && rm "$log_file"

done

 

Explanations

  • All created backups are full backups with hardlinks to already existing files (~ incremental backup)
  • Each backup uses the most recent backup as reference: unchanged files become hardlinks, new or changed files are copied, and deleted files are not copied (1:1 backup)
  • There are no dependencies between the most recent backup and the previous backups. You can delete as many backups as you like; all remaining backups are still full backups. This can be confusing, as most incremental backup software needs the previous backups to restore data, but this does not apply to rsync and hardlinks. Read here if you need more information about links, inodes and files.
  • After a backup has been created, the script purges the backup dir and keeps only the backups of the last 14 days, 12 months and 3 years, which can be changed through the settings
  • Logs can be found inside each backup folder
  • Sends notifications after job execution
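The hardlink mechanism can be seen in a minimal sketch (throwaway example paths, rsync must be installed): the second run passes --link-dest pointing at the first backup, so the unchanged file ends up as a second hardlink to the same inode instead of a new copy.

```shell
# Minimal sketch of incremental backups via --link-dest (example paths only)
src=$(mktemp -d); dst=$(mktemp -d)
echo "song" > "$src/track.mp3"

rsync -a "$src/" "$dst/20201001_040000"                                    # full backup
rsync -a --link-dest="$dst/20201001_040000" "$src/" "$dst/20201002_040000" # incremental

# unchanged file: both backups reference the same inode (no extra data stored)
stat -c %i "$dst/20201001_040000/track.mp3" "$dst/20201002_040000/track.mp3"
```

The timestamps are only illustrative directory names; the script generates them with `date +%Y%m%d_%H%M%S`.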

 

How to execute this script?

  • Use the User Scripts Plugin (Unraid Apps) to execute it by schedule
  • Use the Unassigned Devices Plugin (Unraid Apps) to execute it after mounting a USB drive

 

What does a backup look like?

This is what the backup dir looks like after several months (it kept the backups of 2020-07-01, 2020-08-01 ... and all backups of the last 14 days):

[screenshot of the backup dir]

 

And as it's an incremental backup, the storage usage is low (as you can see, I bought new music before "2020-08-01" and before "2020-10-01"):

du -d1 -h /mnt/user/Backup/Shares/Music | sort -k2
168G    /mnt/user/Backup/Shares/Music/20200701_044011
4.2G    /mnt/user/Backup/Shares/Music/20200801_044013
3.8M    /mnt/user/Backup/Shares/Music/20200901_044013
497M    /mnt/user/Backup/Shares/Music/20201001_044014
4.5M    /mnt/user/Backup/Shares/Music/20201007_044016
4.5M    /mnt/user/Backup/Shares/Music/20201008_044015
4.5M    /mnt/user/Backup/Shares/Music/20201009_044001
4.5M    /mnt/user/Backup/Shares/Music/20201010_044010
4.5M    /mnt/user/Backup/Shares/Music/20201011_044016
4.5M    /mnt/user/Backup/Shares/Music/20201012_044020
4.5M    /mnt/user/Backup/Shares/Music/20201013_044014
4.5M    /mnt/user/Backup/Shares/Music/20201014_044015
4.5M    /mnt/user/Backup/Shares/Music/20201015_044015
4.5M    /mnt/user/Backup/Shares/Music/20201016_044017
4.5M    /mnt/user/Backup/Shares/Music/20201017_044016
4.5M    /mnt/user/Backup/Shares/Music/20201018_044008
4.5M    /mnt/user/Backup/Shares/Music/20201018_151120
4.5M    /mnt/user/Backup/Shares/Music/20201019_044002
172G    /mnt/user/Backup/Shares/Music

 

Warnings

  1. It's not a good idea to back up huge files that change often, such as disk images, as the whole file will be copied again each time.
  2. A file that changes while rsync copies it will end up corrupted, as rsync does not lock files. If you want to back up, for example, a VM image file or a container database, stop it first (to avoid further writes) before executing this script!
  3. Never change a file inside a backup directory. This changes the file in all backups (this is how hardlinks work)!
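Warning 3 can be demonstrated in a few lines (throwaway paths, with plain `ln` standing in for the hardlinks rsync creates): writing through one link changes the content seen through every other link, because all links share one inode.

```shell
# Two "backups" sharing one file via a hardlink (example paths only)
tmp=$(mktemp -d)
mkdir "$tmp/backup1" "$tmp/backup2"
echo "v1" > "$tmp/backup1/file"
ln "$tmp/backup1/file" "$tmp/backup2/file"  # what --link-dest does for unchanged files

echo "v2" > "$tmp/backup1/file"             # "fix" a file inside one backup ...
cat "$tmp/backup2/file"                     # ... prints "v2": the other backup changed too
```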

 

 

 

2 hours ago, kizer said:

From your Example which is the complete up today with all changes directory?

You mean the directories visible in the screenshot? The most recent backup is inside "/20201019_044002" (created on 2020-10-19 at 04:40:02). It contains a full 1:1 backup of the source path, but as you can see it occupies only little space (because of the hardlinks):

du -d1 -h /mnt/user/Backup/Shares/Music
168G    /mnt/user/Backup/Shares/Music/20200701_044011
4.2G    /mnt/user/Backup/Shares/Music/20200801_044013
3.8M    /mnt/user/Backup/Shares/Music/20200901_044013
497M    /mnt/user/Backup/Shares/Music/20201001_044014
4.5M    /mnt/user/Backup/Shares/Music/20201007_044016
4.5M    /mnt/user/Backup/Shares/Music/20201008_044015
4.5M    /mnt/user/Backup/Shares/Music/20201009_044001
4.5M    /mnt/user/Backup/Shares/Music/20201010_044010
4.5M    /mnt/user/Backup/Shares/Music/20201011_044016
4.5M    /mnt/user/Backup/Shares/Music/20201012_044020
4.5M    /mnt/user/Backup/Shares/Music/20201013_044014
4.5M    /mnt/user/Backup/Shares/Music/20201014_044015
4.5M    /mnt/user/Backup/Shares/Music/20201015_044015
4.5M    /mnt/user/Backup/Shares/Music/20201016_044017
1.1M    /mnt/user/Backup/Shares/Music/20201017_044016
5.0M    /mnt/user/Backup/Shares/Music/20201018_044008
4.5M    /mnt/user/Backup/Shares/Music/20201018_151120
4.5M    /mnt/user/Backup/Shares/Music/20201019_044002
172G    /mnt/user/Backup/Shares/Music

 

 

On 10/18/2020 at 2:14 PM, mgutt said:
  • All following backups use the most recent backup to copy only new files while existing files a hardlinked (= incremental backup)
  • This means each backup contains a 1:1 full backup, but does not waste storage

Hi, new to linux .. excuse my ignorance ...

-If I delete the main backup folder (the 1st one created = full backup) > I lose everything, right?

-If I delete any of the "incremental" backups (the ones created after the 1st backup) > I need at least the latest backup folder + the 1st one to recover my latest backup status?

On 10/18/2020 at 2:14 PM, mgutt said:

Yearly backups are only kept if they were generated on the first january

Just want to make sure what this means. Let's assume I created a 1st full backup on 10 October 2019. Then I have not been running my server every day, so I do not have daily incremental backups (it is possible that some months the server is not powered on at all). Assume that on 1 January 2021 my server is not working. Does the rule mean it will delete all daily and monthly backups (incremental) created during 2020?

 

Rgds

40 minutes ago, luca2 said:

If I delete the main backup file (the 1st one created=full backup) > I lost everything, right?

No. Only that link to the file is deleted. A hardlink is just an additional link to the file, and as long as at least one link exists, the file exists.

 

47 minutes ago, luca2 said:

I should at least have the latest backup file + the 1st one to recover my latest backup status?

No. You need only one of the folders. Nice, isn't it ;)
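This can be verified directly (throwaway example paths): after deleting the "original" link, the data is still reachable through the remaining hardlink.

```shell
tmp=$(mktemp -d)
echo "hello" > "$tmp/full_backup_file"
ln "$tmp/full_backup_file" "$tmp/incremental_backup_file" # second link to the same inode
rm "$tmp/full_backup_file"                                # delete the "first backup"
cat "$tmp/incremental_backup_file"                        # prints "hello" - data survives
```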

 

48 minutes ago, luca2 said:

Let's assume I created a 1st full backup on 10 October 2019. Then I have not been running my server every day, so I do not have daily incremental backups (it is possible that some months the server is not powered on at all). Assume that on 1 January 2021 my server is not working. Does the rule mean it will delete all daily and monthly backups (incremental) created during 2020?

 

I'm not sure if I understand the question properly. I'll try to explain as follows:

- the backup from 10th October 2019 will be deleted if 14 newer daily backups exist

- the 10th of October is not kept as monthly backup as it's not generated on the 1st of a month

- the 10th of October is not kept as yearly backup as it's not generated on the 1st of January

 

Note: I know this is not an optimal situation. I'm still working on a solution for that.


 

You got me thinking about the missing DB backup of the Plex appdata folder, which is very large on the NVMe.

So I wanted to use your incremental backup script, but I am not sure if I need to remove the cache in the path,

and I am also not sure if this supports UAD drives. I am trying to create the backup on a drive, not the array.

Like this:

 

# settings
user_share="/appdata/plex"
backup_share="/mnt/disks/Drive1/backup" 
days=14 # preserve backups of the last X days
months=12 # preserve backups of the first day of the last X month
years=3 # preserve backups of the first january of the last X years
fails=3 # preserve the recent X failed backups
.....

But the cache path fails (I got it working with the normal path).

Update: this creates a backup on my UAD disk with the path:

\Drive1\backup\appdata\plex\.20201021_144056

 

But I am having a little problem using the cache path. I tried to replace all the /mnt/user/ paths in the script but keep getting errors.

Is it better just to use /mnt/user and not the cache path directly for the script to work?

(You got me using /cache/ everywhere now! 🙂)

 


 

 


Some questions allowed?

 

1.) The Plex folder on my cache contains "trillions" of directories and files. Those files change rapidly, and new files are added at high frequency. Does that mean that the result is trillions of hardlinks? Stupid question, I know, but I have never worked with rsync that way.

 

2.) What about backups to remote locations? I feed backups to Unraid servers at two different remote locations (see below). Will this work and create hardlinks at the remote location?

 

rsync -avPX --delete-during --protect-args -e ssh "/mnt/diskx/something/" "user@###.###.###.###:/mnt/diskx/Backup/something/"

 

Thanks in advance.

 

22 minutes ago, hawihoney said:

Does that mean that the result are trillions of hardlinks?

Yes and no. You said they change rapidly, which means they are new files, and new files will be copied. Hardlinks are only generated if the files already exist (= "old" files). And yes, many files mean many hardlinks. That's the reason why each folder is an independent full backup. If you are concerned about performance: no clue. I would say it takes longer than copying only new files, as usual incremental backup software does. But that approach has the downside that you need to keep all old folders, and it must generate an index file covering deletions. I'm not sure which method is faster in the end. I only know that it's easier to reason about "this folder is my full backup of day X".

 

22 minutes ago, hawihoney said:

Will this work and create hardlinks at the remote location

Yes. "--link-dest" works remotely, too. rsync compares the size and date of a file, and if they do not match, it creates a new copy. If the file already exists in the "--link-dest" folder, it creates a hardlink to the already copied file. The only requirement: --link-dest and destination must be on the same volume (which is logical).

 

I will consider remote backups in a future version.

 

 

I'm not sure if I understand the question properly. I try to explain as follows:

- the backup from 10th October 2019 will be deleted if 14 newer daily backups exist

- the 10th of October is not kept as monthly backup as it's not generated on the 1st of a month

- the 10th of October is not kept as yearly backup as it's not generated on the 1st of January

 

Note: I know this is not an optimal situation. I'm still working on a solution for that.

Thx for your detailed explanation. Will try it hopefully this weekend.

 


 

 

 


Hi,

 

I finally started testing today (I picked a small share). I still must do some daily backups, but it is very easy. Thx!

 

Besides share backups, I have a particular scenario where I placed some important shares (which are critical to me) on a disk (disk1) which belongs to the array. I am thinking about doing a backup of the full disk instead of several backups of the shares. Do you think that would be possible?

 

Now I get this when I back up a share (isos) using your script. It is the first full backup.

/mnt/disks/UD_hdd2/backuptoexternalHDD/Shares/isos/20201023_195306

Maybe in my specific scenario I am looking for this:

/mnt/disks/UD_hdd2/backuptoexternalHDD/disk1/20201023_195306

Since I am not into coding, I just want to make sure it is feasible. If yes, please let me know what I should look into and I will try to modify the script to adapt it.

 

Rgds.


How should we handle soft errors? At the moment my script marks the complete backup as "failed". I had these three permission problems although it transferred ~200k files:

rsync: readdir("/mnt/disks/DESKTOP-1234_Documents/Eigene Bilder"): Permission denied (13)
rsync: readdir("/mnt/disks/DESKTOP-1234_Documents/Eigene Musik"): Permission denied (13)
rsync: readdir("/mnt/disks/DESKTOP-1234_Documents/Eigene Videos"): Permission denied (13)

I checked my client; those paths are hidden and I'm not able to open them either.


 

I have multiple paths which were marked as "not successful" because of similar problems.


 

Sadly, rsync does not return an error/success statistic, only "code 23" without the number of files:

sent 28,943,321,517 bytes  received 2,773,155 bytes  32,688,983.25 bytes/sec
total size is 28,927,317,322  speedup is 1.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1189) [sender=3.1.3]
Preserve failed backup: .20201103_231558

And rsync does not even return a proper error code, as "23" means "partial transfer", not specifically "permission error":

https://unix.stackexchange.com/a/491461/101920

 

"23" is even returned if the complete source path is wrong. :|


Sorry, need to ask an additional question:

 

Consider an existing full backup. Two more daily incremental backups exist as well. Now I delete a file. What's the state after the next run of the script?

 

Does the file exist in the full backup folder? Should be.

Does the file exist in the two incremental backup folders? Should be.

Does the file exist in the new latest incremental backups? Should not.

 

Thanks. This stuff is new to me.

 

4 hours ago, hawihoney said:

Does the file exist in the full backup folder? Should be.

Does the file exist in the two incremental backup folders? Should be.

Does the file exist in the new latest incremental backups? Should not.

Yes, works exactly as you described ;)

 

 

21 hours ago, mgutt said:

How should we handle soft errors?

I tried the "--stats" option:

https://serverfault.com/a/678308/44086

 

The result of the "failing" job

...
DESKTOP-I0HHMD9_Downloads/FileBot_4.9.0_x64.msi
DESKTOP-I0HHMD9_Downloads/FileBot_4.9.1_x64.msi
DESKTOP-I0HHMD9_Downloads/FileZilla_Pro_3.49.2_win64-setup.exe
DESKTOP-I0HHMD9_Downloads/FileZilla_Pro_3.50.0_win64-setup.exe
rsync: send_files failed to open "/mnt/disks/DESKTOP-I0HHMD9_Downloads/FileZilla_Pro_3.51.0_win64-setup.exe": Permission denied (13)
DESKTOP-I0HHMD9_Downloads/FilmeMKVsortbyAudioLastFile.txt
DESKTOP-I0HHMD9_Downloads/Firefox Installer.exe
DESKTOP-I0HHMD9_Downloads/Firefox Setup 82.0.2.exe
...

Number of files: 24,371 (reg: 22,745, dir: 1,626)
Number of created files: 24,371 (reg: 22,745, dir: 1,626)
Number of deleted files: 0
Number of regular files transferred: 22,745
Total file size: 29,887,743,868 bytes
Total transferred file size: 29,887,743,868 bytes
Literal data: 29,875,526,492 bytes
Matched data: 0 bytes
File list size: 589,788
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 29,884,270,967
Total bytes received: 442,439

sent 29,884,270,967 bytes received 442,439 bytes 30,386,083.79 bytes/sec
total size is 29,887,743,868 speedup is 1.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1189) [sender=3.1.3]
Script Finished Nov 04, 2020 17:01.49

 

 

As it returned "22,745 of 22,745" transferred files altough one is missing, I opened an rsync bug report.

 

So we can't rely on the "stats" summary to solve this issue. I need to think about this further.


I don't use User Shares, so I used Disk Shares instead, and these don't work well with this script. I know, I know, it's not designed that way, but I want to share my experience. The reason I don't use User Shares is that I have two different remote backup locations, and transferring huge directories and files from User Shares over SMB to remote locations often crashes. Using Disk Shares helps most of the time:

 

What I did is:

[...]
source_paths=(
    "/mnt/disk17/Notizen"
)
backup_path1="/mnt/hawi/192.168.178.101_disk17/Backup"
#backup_path2="/mnt/hawi/192.168.178.102_disk1/Backup"

That's the result:

Create backup of /mnt/disk17/Notizen
Backup path has been set to /mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen
Create full backup 20201105_130528
sending incremental file list
Notizen/
[...]
sent 61,223 bytes received 759 bytes 41,321.33 bytes/sec
total size is 57,872 speedup is 0.93
mv: cannot move '/mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen/.20201105_130528' to '/mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen/20201105_130528': Permission denied
Preserve failed backup: .20201105_130528
DONE

In your original post you mention 'shares' but disk shares are shares as well.

 

 

I don't want to rant, I just want to add feature requests:

 

1.) Always use the last subdir and don't check for /mnt/user or /mnt. In my case, 'Notizen' would be used as the new subdir for the backup path, regardless of whether the source is /mnt/user/Notizen or /mnt/disk17/Notizen. The resulting backup path would be cleaner and independent of whether a user share or a disk share is used.

 

2.) Don't know why there's a permission problem. My own rsync jobs have worked that way for years. Any idea?

 

 

Quote

I don't use User Shares so I did use Disk Shares

Nothing wrong with that. The script should work with all paths (and not only "shares").

 

Quote

Always use the last subdir and don't check for /mnt/user resp. /mnt. So in my case 'Notizen' would be used as new subdir for the backup path. It's irrelevant if it's /mnt/user/Notizen or /mnt/disk17/Notizen then. The resulting backup path would be better then and independent from a user share or a disk share. 

This would cause a huge problem if a user backs up different paths with the same last subdir name. Example:

/mnt/user/Moritz/Notizen
/mnt/user/Max/Notizen

Both would target "/backup/Notizen". I know it's "ugly" having super-long paths, but how else could we solve this? Maybe an optional setting like "force last subdir name"?
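For comparison, deriving the subdir from the full path below /mnt would keep both sources distinct while still dropping the /mnt prefix. A sketch of that alternative (the /backup prefix and the example paths are hypothetical):

```shell
#!/bin/bash
# Map each source path to a collision-free backup subdir by keeping
# everything after /mnt/ instead of only the last path component.
for source_path in /mnt/user/Max/Notizen /mnt/user/Moritz/Notizen; do
  subdir="${source_path#/mnt/}"        # strip the /mnt/ prefix
  echo "/backup/${subdir}"
done
# /backup/user/Max/Notizen
# /backup/user/Moritz/Notizen
```

This still distinguishes /mnt/user/Notizen from /mnt/disk17/Notizen, though, so it would not merge user-share and disk-share backups of the same share.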

 

2 hours ago, hawihoney said:

Don't know why there's a permission problem

The permission problem is not related to rsync. It's only related to the "mv" command. The "mv" renames (or "moves") the backup folder from the hidden ".20201105" to "20201105" if the backup was successful:

mv "${backup_path}/.${new_backup}" "${backup_path}/${new_backup}"

It's a really basic Linux command, so I wonder why it does not work for you.

 

How did you mount "/mnt/hawi/192.168.178.101_disk17/Backup" and what could be the reason why read & write is allowed, but not renaming?

 

Please manually repeat the command through the WebTerminal:

mv '/mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen/.20201105_130528' '/mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen/20201105_130528'

If it works now, then something in this path had locked the directory or a file inside of it at the time. Maybe an indexing service on the external location or similar?
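A slightly more complete manual test is to create a fresh hidden directory over the same mount and rename it immediately, mimicking what the script does. In this sketch a local temp directory stands in for the CIFS mount; point $mountpoint at the real share to test the actual problem:

```shell
#!/bin/bash
# Reproduce the script's hidden-dir-then-rename pattern in isolation.
mountpoint=$(mktemp -d)   # stand-in for /mnt/hawi/192.168.178.101_disk17

mkdir "$mountpoint/.test_snapshot"
touch "$mountpoint/.test_snapshot/file"
sync                      # flush the write cache before renaming
if mv "$mountpoint/.test_snapshot" "$mountpoint/test_snapshot"; then
  echo "rename ok"
else
  echo "rename failed"
fi
```

If this fails over the CIFS mount but the later manual `mv` succeeds, the problem is timing-related rather than a general permission issue.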
 

Edited by mgutt

Interesting. When issued manually from console on source server it works:

root@Tower:~# mv '/mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen/.20201105_183346' '/mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen/20201105_183346'
root@Tower:~#

This is what I see on target server after the mv:

root@TowerVM01:/mnt/disk17/Backup# ls -lisa /mnt/disk17/Backup/disk17/Notizen/
total 0
10952386795 0 drwxrwxrwx 3 hawi users 37 Nov  5 18:34 ./
 8877660570 0 drwxrwxrwx 3 hawi users 29 Nov  5 18:33 ../
   30738631 0 drwxrwxrwx 3 hawi users 51 Nov  5 18:33 20201105_183346/

This is the mount command:

mount -t cifs -o rw,nounix,iocharset=utf8,_netdev,file_mode=0777,dir_mode=0777,uid=99,gid=100,vers=3.0,username=hawi,password=******** '//192.168.178.101/disk17' '/mnt/hawi/192.168.178.101_disk17'

Looks ok to me. Is it possible that something from within the script is holding the directory when the rename is issued via SMB against a remote server?

 

Edited by hawihoney
4 hours ago, hawihoney said:

Looks ok to me. Is it possible that something from within the script is holding the directory when the rename is issued via SMB against a remote server?

 

As it works for me while collecting files from an external SMB share and backing up local paths, I don't think it's related to my script. I can only guess. If you are sure that no other processes on the destination server are accessing the freshly generated backup, it could be something related to the Linux write cache. Maybe it's not possible to rename the folder because it has not been fully written to the HDD yet. If that is the case, a timeout should help.

 

So I released v0.3

# - rsync returns summary
# - typo in notification corrected
# - skip some rsync errors (defaults are "0" = skip on success and "24" = skip if some files vanish from the source while transfer)
# - add timeout for backup renaming https://forums.unraid.net/topic/97958-rsync-incremental-backup/?tab=comments#comment-910188

This version tries once per second to rename the backup and gives up after rename_timeout tries (default: 100). Please give feedback whether this solves your issue and, if it works, how many tries/seconds were needed (visible in the logs).
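The retry logic could look roughly like this (a sketch only; the variable names follow the script's settings, but the implementation here is assumed and a temp directory stands in for the real backup path):

```shell
#!/bin/bash
# Retry the hidden-to-visible rename once per second, up to rename_timeout.
backup_path=$(mktemp -d)
new_backup="20201106_072853"
rename_timeout=100
mkdir "$backup_path/.$new_backup"

for i in $(seq 1 "$rename_timeout"); do
  if mv "$backup_path/.$new_backup" "$backup_path/$new_backup" 2>/dev/null; then
    echo "Renamed after $i attempt(s)"
    break
  fi
  echo "Try #$i to make backup visible"
  sleep 1
done
```

On a local filesystem this succeeds on the first attempt; the interesting case is a mount where the rename is blocked transiently, which is what the retries are meant to ride out.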

Edited by mgutt

I found a small bug. If the 15th day, which should be deleted, contains multiple backups, only one of them is deleted per script execution while the rest is kept by the "Keep multiple backups per day" condition:

Preserve daily backup: 20201106_011906
Preserve daily backup: 20201105_232908
Preserve daily backup: 20201103_044001
Preserve daily backup: 20201102_044001
Preserve monthly backup: 20201101_044004
Preserve daily backup: 20201031_044001
Preserve daily backup: 20201030_044001
Preserve daily backup: 20201029_044001
Preserve daily backup: 20201028_044001
Preserve daily backup: 20201027_044001
Preserve daily backup: 20201026_044001
Preserve daily backup: 20201025_044005
Preserve daily backup: 20201024_044001
Preserve daily backup: 20201023_044002
Preserve daily backup: 20201022_044001
Delete 20201018_183054
Keep multiple backups per day: 20201018_182807
Keep multiple backups per day: 20201018_181402
Keep multiple backups per day: 20201018_181234
Keep multiple backups per day: 20201018_181134
Keep multiple backups per day: 20201018_151209
Keep multiple backups per day: 20201018_044006
Preserve monthly backup: 20201001_044014
Preserve monthly backup: 20200901_044011

Instead it should delete all of them.
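The intended behavior can be sketched like this (the dates are taken from the log above, but the cutoff logic is a simplified assumption, not the script's actual retention code):

```shell
#!/bin/bash
# Once a day falls outside the retention window, delete EVERY backup of
# that day instead of only the first one found.
backups="20201022_044001 20201018_183054 20201018_182807 20201018_044006"
cutoff="20201022"   # pretend everything before this day is outdated

for backup in $backups; do
  day="${backup%%_*}"                 # strip the _HHMMSS time part
  if [ "$day" -lt "$cutoff" ]; then
    echo "Delete $backup"             # all three 20201018_* backups go
  else
    echo "Preserve daily backup: $backup"
  fi
done
```

The key point is that the delete decision depends only on the day portion of the timestamp, so multiple backups of the same outdated day all match it in a single run.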

Edited by mgutt
8 hours ago, mgutt said:

Please return feedback if this solves your issue and if it works, how many tries / seconds were needed (are visible through the logs).

Thanks for your change. Unfortunately it didn't work either, even after 100 tries to move the temporary folder. Without the hidden temporary folder and its finishing 'mv', the script works perfectly.

 

Please forget it. It must be something on my side, I think. No idea what. I would suggest removing that particular change because it's related to my system only.

 

[...]
Try #99 to make backup visible
mv: cannot move '/mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen/.20201106_072853' to '/mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen/20201106_072853': Permission denied
Preserve failed backup: .20201106_072853

 

Edited by hawihoney
