rsync Incremental Backup



On 8/19/2021 at 7:30 PM, Shantarius said:

Hi,

 

if I execute the script, I get the following error:

 

root@Avalon:/boot/config/scripts# sh backup_rsync.sh 
backup_rsync.sh: line 74: syntax error near unexpected token `>'
backup_rsync.sh: line 74: `    exec &> >(tee "${backup_path}/logs/${new_backup}.log")'
root@Avalon:/boot/config/scripts# 

 

How can I solve this?

 

Thank you!

 

I am running into the exact same issue... is there a fix for this yet?


Ok, thank you! Once the initial backup is complete, I will compare the directories with Krusader's synchronize feature and see what happened.

 

Edit: You were right, the destination size is almost half of the original directory. Maybe you could put a short warning in your opening post. Sooner or later there will be somebody out there who sets this up the way I intended and thinks that it works.

 

Looking further opened a whole new can of worms. I suspect using another backup solution such as CCC instead of Time Machine would end up with the same issue.

I will keep looking for another solution for simple synchronizing, maybe something like Unison.

On 8/19/2021 at 7:30 PM, Shantarius said:

Hi,

 

if I execute the script, I get the following error:

 

root@Avalon:/boot/config/scripts# sh backup_rsync.sh 
backup_rsync.sh: line 74: syntax error near unexpected token `>'
backup_rsync.sh: line 74: `    exec &> >(tee "${backup_path}/logs/${new_backup}.log")'
root@Avalon:/boot/config/scripts# 

 

How can I solve this?

 

Thank you!

 

@daNick73

 

You are trying to execute a bash script with sh. Either run "bash backup_rsync.sh" or use "./backup_rsync.sh" and let the OS pick the correct interpreter from the script's shebang line.
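For reference, both ways look like this (a minimal sketch; the path matches the one from the error output above and assumes the script starts with a bash shebang):

# run it explicitly with bash
bash /boot/config/scripts/backup_rsync.sh

# or make it executable once and let the shebang line pick the interpreter
chmod +x /boot/config/scripts/backup_rsync.sh
/boot/config/scripts/backup_rsync.sh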


At the moment I'm working on the next release. Many people asked for SSH destinations and I'm still working on that. I created some really nice hacks to achieve this:

 

As an example I use this rsync hack to get the most recent backup instead of using the classic "find" command:

last_backup=$(rsync --dry-run --recursive --itemize-changes --exclude="*/*/" --include="[0-9]*/" --exclude="*" "$backup_path"/ /tmp | grep -oP "[0-9_/]*" | sort -r | head -n1)
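# (filter logic: --exclude="*/*/" stops rsync from descending below the first level,
#  --include="[0-9]*/" keeps only the timestamped backup dirs, and the final --exclude="*"
#  drops everything else; the --dry-run listing is then reduced to the newest name via sort/head)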

 

Or I use this rsync hack to delete old backups, instead of using the "rm" command:

rsync --recursive --delete --include="/${backup_name}**" --exclude="*" /tmp/empty_dir/ "$backup_path"
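# (the source /tmp/empty_dir/ is empty, so --delete removes every destination entry matching
#  the include filter, while --exclude="*" protects all other backups from deletion)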

 

Of course I could instead use "ssh" to execute "find", "rm", etc. on the destination itself, but this would blow up the script, and it still wouldn't work if the destination server only supports rsync and not bash commands. So my goal is to use rsync alone for any file manipulation on the destination server. This is kinda fun... okay, non-devs won't understand it 😂

 

But stay tuned 😋


Today I hopefully solved the very last hurdle for backing up to SSH destinations and replaced the renaming command "mv" with a new rsync hack as follows:

# SSH (this rsync hack renames each file individually)
if [[ "$backup_path" == *"@"* ]] && [[ "$backup_path" == *":"* ]]; then
  # move all files from /.YYYY-MM-DD to /YYYY-MM-DD
  if ! "$rsync" --delete --recursive --backup --backup-dir="$backup_path/$new_backup" "$empty_dir/" "$backup_path/.$new_backup"; then
    message="Error: Could not move content of $backup_path/.$new_backup to $backup_path/$new_backup!"
  # delete empty source dir
  elif ! "$rsync" --recursive --delete --include="/.$new_backup**" --exclude="*" "$empty_dir/" "$backup_path"; then
    message="Error: Could not delete empty dir $backup_path/.$new_backup!"
  fi
# local (mv is much faster as it renames only the root dir)
else
  mv -v "$backup_path/.$new_backup" "$backup_path/$new_backup"
fi

 

I posted this solution here as well, since everyone assumes rsync can only copy files:

https://unix.stackexchange.com/questions/43957/using-rsync-to-move-not-copy-files-between-directories/

 

 

The downside is that rsync renames each file individually, making it much slower than executing "mv" on the SSH destination. Because of that, I'll try to add a check that verifies whether the usual "mv" command is available over SSH before using the rsync hack.
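A possible shape of that check, as a rough sketch only (the variables $ssh_host and $remote_path are hypothetical and would be split off the "user@server:/path" style $backup_path beforehand):

if ssh "$ssh_host" "command -v mv >/dev/null"; then
  # the remote shell offers mv: rename the hidden dir in one fast call
  ssh "$ssh_host" "mv '$remote_path/.$new_backup' '$remote_path/$new_backup'"
else
  # rsync-only destination: keep using the rsync rename hack shown above
  echo "Remote mv not available, falling back to the rsync hack"
fi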

 


I've released version 0.7 with the following enhancements:

# - Empty backups stay invalid (must include at least X files)
# - Fixed growing log file problem
# - Logs are now located in the backup dir itself
# - Added support for SSH destinations (replaced find, rm and mv commands with pure rsync hacks)
# - User-defined rsync options
# - User can exclude directories, defaults are /Temp, /Tmp and /Cache
# - Enhanced user settings (better descriptions and self-explanatory variable names)
# - Multi-platform support (should now work with Unraid, Ubuntu, Synology...)
# - Replaced potentially unsafe "rm -r" command with rsync
# - User-defined rsync command to allow optional sshpass support
# - Keep multiple backups per day only for the last X days (default 1)
# - Important Change: The latest backup of a month is kept as the monthly backup (in the past it was only the backup of the 1st of a month)
# - Important Change: The latest backup of a year is kept as the yearly backup (in the past it was only the backup of the 1st of January of a year)

 

If you want to back up to an SSH server, simply add the login information to the backup_path in the following format:

backup_path="user@server:/home/backups/"

 

Of course, this only works if you have exported your SSH keys to enable passwordless SSH connections. If you instead prefer "sshpass", you could enable these two settings:

# user-defined rsync command
alias rsync='sshpass -p "<password>" rsync'

# user-defined ssh command
alias ssh='sshpass -p "<password>" ssh -o "StrictHostKeyChecking no"'
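For the key-based approach mentioned above, the usual one-time setup on the machine running the script looks roughly like this (host is an example):

# generate a key pair if you don't have one yet
ssh-keygen -t ed25519
# copy the public key to the backup server to enable passwordless logins
ssh-copy-id user@server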

 

 

I still have several ideas for the next version(s):

# - chunksync hardlinks for huge files (like images)
# - docker auto stop and start for consistent container backups (compare container volumes against source paths)
# - auto accept ssh key, but only if /backup_path/source_path is empty (on very first execution)
# - what happens if backup source disappears while creating backup (like a mounted smb share which goes offline)
# - add support for ssh sources
# - rare case scenario: log filename is not atomic
# - test on very first backup if destination supports hardlinks
# - check if ssh server supports "rm" (is 50% faster than rsync hack)
# - should we use failed backups as source if last X backups failed (to allow progress)?

 


Whilst I'm on a roll...

 

rsync cracks the poops if you ask it to create more than one folder deep at once (at least when operating via SSH), so I've made it work by putting the following in (just above "# obtain most recent backup"):

  if [[ $dst_path == *"@"*":"* ]]; then
    # this is a remote destination: split it into host and path
    mkdirDir=$(cut -d ":" -f2 <<< "$dst_path")
    sshDest=$(cut -d ":" -f1 <<< "$dst_path")
    # create the directory synchronously (no -f), so it exists before rsync starts
    ssh "$sshDest" "mkdir -p '$mkdirDir'"
  else
    mkdir -p "$dst_path"
  fi

 

17 hours ago, Meles Meles said:

rsync cracks the poops if you ask it to create more than one folder deep at once

Why do you set a destination path that does not exist?!

 

This is something I should not add, as some people back up to directories that are mounts, and if they don't exist when the backup starts, mkdir would create a destination that should never exist. Simply think of a user who accidentally starts a backup to a USB drive, SMB mount or SSHFS mount that isn't connected or has lost its connection. In Unraid you would then create a directory that lives in RAM and crash the server. The same applies if you write locally to Unraid and stop the array.

 

18 hours ago, Meles Meles said:

Why does it add "Shares" into the destination path?

It replaces /mnt/user with /Shares; it does not just add /Shares. It's only a small feature to shorten the destination path. I think I will move this to the settings and let the user decide which replacement rules should be used.
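In bash this boils down to a single parameter expansion; a minimal sketch (variable name illustrative):

# turns e.g. /mnt/user/Music into /Shares/Music (only the first match is replaced)
dst_path="${dst_path/\/mnt\/user//Shares}"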

 

17 hours ago, Meles Meles said:

surely the "shortening" of dst_path "if" statement needs an else

Yes you are correct. I will fix this bug.


Version 0.9 has been released:

 

# - Fixed wrong backup path in some situations
# - User-defined replacement rules for backup path
# - New setting "skip_error_host_went_down" which skips backups if the host went down during file transfers
# - Important Change: /source/<timestamp>/source has been changed to /source/<timestamp>
# - Fixed wrong counting of kept backups if multiple source paths have been set

 

 

Now everyone is able to set up their own replacement rules for the backup path. The default replacements are:

replace_paths=(
  "/mnt/user;/Shares"
  "/mnt/;/"
)

 

 

This means that if the source path contains "/mnt/user", it will be replaced with "/Shares". So instead of this path:

/mnt/user/Backups/mnt/user/Music/<timestamp>

 

It will copy the files to:

/mnt/user/Backups/Shares/Music/<timestamp>

 

Feel free to create your own rules or remove all of them.

 

Another important change is the subdirectory after the <timestamp> path. Many people were confused about why it was generated "twice", like "/Music/<timestamp>/Music". Now it's only "/Music/<timestamp>". Note: this change will cause a new full backup to be generated without hardlinks. Of course, old backups stay intact.


@Meles Meles With version 1.0 I will add a solution that creates the target path through an rsync hack if it does not exist:

[screenshot: the rsync hack that creates the destination path if it does not exist]

 

As you can see, it requires the new setting "create_destination" to be enabled (it is enabled by default).

 

In addition, I completely changed the source/destination path settings as follows:

[screenshot: the new per-job source/destination path settings]

 

Now the user needs to set the full destination path for each backup job individually (it is created automatically if it does not exist). And thanks to this change, I could completely remove the path replacement feature. 😋

 

I will test the new version and release it in a few hours.

Version 1.0 released:

# - Allow setting "--dry-run" as rsync option (which skips some parts of the script)
# - Create destination path if it does not exist
# - Fixed wrong minimum file count check
# - Fixed broken recognition of remote "mv" command

 

As announced, the script now creates the destination directory. And it is possible to test the script by adding the rsync option "--dry-run", which disables all "mv" and "rm" commands.


Version 1.1 released

 

# - Fixed copying the log file although the backup had been skipped
# - Fixed deleting empty_dir while creating the destination path
# - Create a notification if the last backup is older than X days (i.e. something has been going wrong for a long time)

 

The new feature warns the user if the last successful backup is older than 30 days.
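Such a check could look roughly like this (a sketch only, with illustrative names; it assumes the timestamped backup dirs sit directly below $backup_path):

# newest backup dir by modification time
last_backup=$(find "$backup_path" -maxdepth 1 -type d -name "[0-9]*" -printf "%T@ %p\n" | sort -n | tail -n1 | cut -d" " -f2-)
age_days=$(( ( $(date +%s) - $(stat -c %Y "$last_backup") ) / 86400 ))
if [[ $age_days -gt 30 ]]; then
  echo "Warning: the last backup is $age_days days old!"
fi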


I released version 1.3:

# - Fixed typo which prevented deleting skipped backups
# - Fixed typo in notify function which returned wrong importance level
# - Better error reporting
# - Added support for SSH sources
# - Fixed bug while creating destination path on remote server
# - Empty dir is not a setting anymore
# - Log file name is now randomized to avoid race conditions
# - Delete the log file of skipped backups

 

As you can see, it's now possible to back up SSH sources, as shown by the example in the settings:

# backup source to destination
backup_jobs=(
  # source                          # destination
  "/mnt/user/Music"                 "/mnt/user/Backups/Shares/Music"
  "user@server:/home/Maria/Photos"  "/mnt/user/Backups/server/Maria/Photos"
  "/mnt/user/Documents"             "user@server:/home/Backups/Documents"
)

 

 


I'm working on a new feature that stops containers which use a path that is part of a backup job.

 

This is how I obtain all paths used by all running containers:

docker_mounts=$(docker container inspect -f '{{$id := .Id}}{{range .Mounts}}{{if .Source}}{{printf $id}}:{{.Source}}{{println}}{{end}}{{end}}' "$(docker ps -q)" | grep -v -e "^$")

 

It returns something like the following:

ea6b8e2ace9179:/mnt/user/Photos
91b65a1fcb95e0:/mnt/disk5/Movies
9a9e5f90ae1767:/mnt/cache/appdata/npm
7a55c107226adf:/mnt/user/appdata/pihole

 

Now I'm able to stop the relevant container(s):

 # stop docker container
 for container_id in $(echo "$docker_mounts" | grep -oP ".*?(?=:$src_path)" | uniq); do
  container_name=$(docker container inspect -f '{{ .Name }}' "$container_id")
  container_name=${container_name//\/}
  echo "Stop container $container_name as it uses $src_path"
  docker stop "$container_id"
 done
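For the planned "auto stop and start" behaviour, the stopped IDs could simply be collected and started again once the backup job has finished; a minimal sketch (the array name is illustrative, and the IDs would be added inside the loop above via stopped_containers+=("$container_id")):

# after the backup job has finished, start the stopped containers again
for container_id in "${stopped_containers[@]}"; do
  echo "Start container $container_id again"
  docker start "$container_id"
done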

 

But I'm facing a problem: let's say the user creates a backup of the path "/mnt/user/Movies". In the above example, it would fail to stop the container "91b65a1fcb95e0", as that container uses the "different" path "/mnt/disk5/Movies" (which, as we know, is not actually different in the Unraid universe).

 

The only idea I had is to ignore the disk/pool part of the path and check for /mnt/*/Movies instead, but I'm not sure whether this could cause a problem if my script is used on a different OS?!

1 hour ago, mgutt said:

The only idea I had is to ignore the disk/pool part of the path and check for /mnt/*/Movies instead, but I'm not sure whether this could cause a problem if my script is used on a different OS?!

Depending on how you code it, surely the worst-case scenario is that you stop additional containers unnecessarily at the same time, right?

I would think it is an edge case, so the small performance/efficiency penalty would only be felt by a minority.

 

You could always do a uname -r check (or similar) to see whether the script should run in an Unraid-optimised way.

On 12/29/2021 at 1:06 AM, tjb_altf4 said:

Depending on how you code it, surely the worst-case scenario is that you stop additional containers unnecessarily at the same time, right?

Yes. It's "only" this.

 

Yesterday I had another idea. I'm creating backups of my clients every hour. The creation is really fast, as it adds a handful of files, creates the hardlinks locally, and is done. But deleting all these multiple backups of one day is really slow on the next day. In addition, it needs to wake up the HDD on every backup, as the hardlinks are created there instead of on the cache (which is logical).

 

Now I'm thinking about adding a differential backup feature. At first it creates a full backup, and if a full backup of the same day already exists, it creates only differential backups. But this makes restoring all files more complicated for the user, as they need to restore the full backup, then the differential backup (as it contains the more recent files), and finally delete all files that are inside a third folder (rsync calls it the backup folder). That could be too complicated for some users, I think 🤔
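One way to produce such a differential is rsync's --compare-dest option, which skips files that are identical to an existing (full) backup; a rough sketch only, not necessarily what the script will end up using (the variable names are illustrative):

# files identical to today's full backup are skipped, so the differential dir
# contains only new or changed files
rsync -a --compare-dest="$backup_path/$full_backup/" "$src_path/" "$backup_path/$diff_backup/"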

 

 

Another "problem" is that the user opens the most recent differential backup folder and does not see all the files they probably expect, as the files are spread across backup folders depending on their last modification time.

 

 

 

On 12/29/2021 at 1:06 AM, tjb_altf4 said:

You could always do a uname -r check (or similar) to see whether the script should run in an Unraid-optimised way.

 

Ok, I solved it as follows:

 

  # stop docker container
  host_path="$src_path"
  if [[ $( uname -r ) == *"Unraid"* ]]; then
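    # replace the concrete pool/disk part of the path (user, cache, disk5, ...) with the regex ".*",
    # so the grep below matches the same share regardless of which /mnt/<disk> path a container uses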
    host_path=${src_path/\/mnt\/$( echo "$src_path" | grep -oP "(?<=/mnt/).*(?=/)")//mnt/.*}
  fi
  for container_id in $(echo "$docker_mounts" | grep -oP ".*?(?=:$host_path)" | uniq); do
    container_name=$(docker container inspect -f '{{ .Name }}' "$container_id")
    container_name=${container_name//\/}
    echo "Stop container $container_name as it uses $host_path"
    #docker stop "$container_id"
  done

 

Now I need to expand this to support SSH sources as well.

 

