mgutt

Incremental Backup through rsync

The following script creates incremental backups of a share to a target path using rsync. As an example, the default settings create a backup of the share "Music" in the path "/mnt/user/Backup":

#!/bin/bash

# settings
user_share="Music"
backup_share="/mnt/user/Backup"
days=14 # preserve backups of the last X days
months=12 # preserve backups of the first day of the last X months
years=3 # preserve backups of January 1st of the last X years
fails=3 # preserve the X most recent failed backups

# make script race condition safe
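# ("${0///}" strips all slashes from the script's path to form a unique lock directory name)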
if [[ -d "/tmp/${0///}" ]] || ! mkdir "/tmp/${0///}"; then exit 1; fi; trap 'rmdir "/tmp/${0///}"' EXIT;

# new backup timestamp
new_backup="$(date +%Y%m%d_%H%M%S)"

# create directory tree as rsync is not able to do that (https://askubuntu.com/a/561239/227119)
mkdir -p "${backup_share}/Shares/${user_share}/.${new_backup}"

# create log file
exec &> >(tee "${backup_share}/Shares/${user_share}/.${new_backup}/backup.log")

# obtain most recent backup
last_backup=$(ls -t "${backup_share}/Shares/${user_share}/" | head -n1)
echo "Most recent backup in ${backup_share}/Shares/${user_share}/* is ${last_backup}"

# create full backup
if [[ -z "${last_backup}" || ! -d "${backup_share}/Shares/${user_share}/${last_backup}" ]]; then
    echo "Create full backup ${new_backup}"
    # create very first backup
    rsync -av "/mnt/user/${user_share}" "${backup_share}/Shares/${user_share}/.${new_backup}"
# create incremental backup
else
    echo "Create incremental backup ${new_backup} by using last backup ${last_backup}"
    rsync -av --delete --link-dest="${backup_share}/Shares/${user_share}/${last_backup}" "/mnt/user/${user_share}" "${backup_share}/Shares/${user_share}/.${new_backup}"
fi
# capture rsync's exit code (rsync is the last command executed in either branch)
rsync_result=$?

# obtain the job name (the name of this script's parent directory)
job_name="$(basename "$(dirname "$0")")"
if [[ $rsync_result -eq 0 ]]; then
    /usr/local/emhttp/webGui/scripts/notify -i normal -s "Backup done." -d "Job $job_name successfully finished."
    # make the backup visible (a failed backup stays hidden and is handled by the cleanup below)
    mv "${backup_share}/Shares/${user_share}/.${new_backup}" "${backup_share}/Shares/${user_share}/${new_backup}"
else
    /usr/local/emhttp/webGui/scripts/notify -i alert -s "Backup failed!" -d "Job $job_name failed!"
fi

# clean up
ls -tA "${backup_share}/Shares/${user_share}/" | while read -r backup; do
    if [ "${backup:0:1}" = "." ]; then
        if [ "$fails" -gt "0" ]; then
            echo "Preserve failed backup: $backup"
            fails=$(($fails-1))
            continue
        fi
        echo "Delete failed backup: $backup"
        rm -r "${backup_share}/Shares/${user_share}/${backup}"
        continue
    fi
    last_year=$year
    last_month=$month
    last_day=$day
    year=${backup:0:4}
    month=${backup:4:2}
    day=${backup:6:2}
    if [ "$last_day" = "$day" ] && [ "$last_month" = "$month" ] && [ "$last_year" = "$year" ]; then
        echo "Keep multiple backups per day: $backup"
        continue
    fi
    # preserve yearly backups
    if [ "$month" = "01" ] && [ "$day" = "01" ] && [ "$years" -gt "0" ]; then
        echo "Preserve yearly backup: $backup"
        years=$(($years-1))
        continue
    fi
    # preserve monthly backups
    if [ "$day" = "01" ] && [ "$months" -gt "0" ]; then
        echo "Preserve monthly backup: $backup"
        months=$(($months-1))
        continue
    fi
    # preserve daily backups
    if [ "$days" -gt "0" ]; then
        echo "Preserve daily backup: $backup"
        days=$(($days-1))
        continue
    fi
    echo "Delete $backup"
    rm -r "${backup_share}/Shares/${user_share}/${backup}"
done
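The script is meant to run on a schedule, e.g. daily. A sketch of a cron entry (the path is a placeholder; on Unraid the User Scripts plugin can be used for scheduling instead):

# hypothetical cron entry: run the backup every day at 04:40
40 4 * * * /boot/scripts/backup/script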

Explanation

  • The first backup is a full backup
  • All following backups use the most recent backup to copy only new files, while unchanged files are hardlinked (= incremental backup)
  • This means each backup contains a full 1:1 backup, but does not waste storage (see the demonstration after this list)
  • Finally, the script purges the backup dir and keeps only the backups of the last 14 days, 12 months and 3 years, which can be changed through the settings
  • Monthly backups are only kept if they were generated on the first day of a month
  • Yearly backups are only kept if they were generated on the first of January
  • Partial/crashed backups stay hidden; only the most recent ones are preserved (3 by default, set through "fails"), older ones are deleted
  • rsync logs can be found inside each backup folder
  • Sends notifications after job execution
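To see the hardlink mechanism in isolation, here is a minimal sketch (all paths and file names are examples, not part of the script above):

#!/bin/bash
# minimal demonstration of rsync's --link-dest mechanism (example paths only)
mkdir -p /tmp/demo/source
echo "some data" > /tmp/demo/source/file.txt

# first run: a full copy
rsync -a /tmp/demo/source/ /tmp/demo/backup1/

# second run: unchanged files become hardlinks to the files in backup1
rsync -a --link-dest=/tmp/demo/backup1 /tmp/demo/source/ /tmp/demo/backup2/

# both entries share one inode, so the data is stored only once;
# the second column (the link count) shows 2
stat -c '%i %h %n' /tmp/demo/backup1/file.txt /tmp/demo/backup2/file.txt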

 

This is what the backup dir looks like after several months (it kept the backups of 2020-07-01, 2020-08-01, ... and all backups of the last 14 days):

[Screenshot: listing of the backup directory]

 

Meanwhile, the storage usage stays low (I bought new music before 2020-08-01 and 2020-10-01):

du -d1 -h /mnt/user/Backup/Shares/Music
168G    /mnt/user/Backup/Shares/Music/20200701_044011
4.2G    /mnt/user/Backup/Shares/Music/20200801_044013
3.8M    /mnt/user/Backup/Shares/Music/20200901_044013
497M    /mnt/user/Backup/Shares/Music/20201001_044014
4.5M    /mnt/user/Backup/Shares/Music/20201007_044016
4.5M    /mnt/user/Backup/Shares/Music/20201008_044015
4.5M    /mnt/user/Backup/Shares/Music/20201009_044001
4.5M    /mnt/user/Backup/Shares/Music/20201010_044010
4.5M    /mnt/user/Backup/Shares/Music/20201011_044016
4.5M    /mnt/user/Backup/Shares/Music/20201012_044020
4.5M    /mnt/user/Backup/Shares/Music/20201013_044014
4.5M    /mnt/user/Backup/Shares/Music/20201014_044015
4.5M    /mnt/user/Backup/Shares/Music/20201015_044015
4.5M    /mnt/user/Backup/Shares/Music/20201016_044017
1.1M    /mnt/user/Backup/Shares/Music/20201017_044016
5.0M    /mnt/user/Backup/Shares/Music/20201018_044008
4.5M    /mnt/user/Backup/Shares/Music/20201018_151120
4.5M    /mnt/user/Backup/Shares/Music/20201019_044002
172G    /mnt/user/Backup/Shares/Music
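Note that du counts every inode only once per invocation: files shared through hardlinks are attributed to the first directory du scans (here the oldest backup), which is why the newer folders appear tiny and the grand total stays at 172G instead of roughly 18 x 168G. The link count of a single file shows how many backups share it (the file name below is only an example):

stat -c '%h %n' "/mnt/user/Backup/Shares/Music/20201019_044002/Music/example.mp3"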

 

Notes

  1. It's not the best idea to back up VM images or docker.img through rsync, as one small change causes the whole file to be backed up again.
  2. If a file changes while rsync is copying it, the backup copy will be corrupted, as rsync does not lock files. If you want to back up a VM image file, for example, stop the VM first (to avoid further writes) before executing this script; see the sketch below.
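For note 2, a wrapper along these lines could stop a VM around the backup run (a sketch; the domain name and the script path are placeholders):

#!/bin/bash
# hypothetical wrapper: stop a VM, run the backup, start the VM again
vm_name="Windows10"            # placeholder libvirt domain name

virsh shutdown "$vm_name"      # ask the guest to shut down cleanly

# wait until the VM is really off
while virsh list --state-running --name | grep -qx "$vm_name"; do
    sleep 5
done

/boot/scripts/backup/script    # placeholder path to the backup script above

virsh start "$vm_name"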

 

 

 


From your example, which directory is the complete one, up to date with all changes?

I'm asking because I'm rather intrigued.

2 hours ago, kizer said:

From your example, which directory is the complete one, up to date with all changes?

You mean the directories that are visible in the screenshot? The most recent backup is inside "/20201019_044002" (generated at 2020-10-19 04:40:02). It contains a full 1:1 backup of the source path, but as you can see, it occupies only a small amount of space (because of the hardlinks):

du -d1 -h /mnt/user/Backup/Shares/Music
168G    /mnt/user/Backup/Shares/Music/20200701_044011
4.2G    /mnt/user/Backup/Shares/Music/20200801_044013
3.8M    /mnt/user/Backup/Shares/Music/20200901_044013
497M    /mnt/user/Backup/Shares/Music/20201001_044014
4.5M    /mnt/user/Backup/Shares/Music/20201007_044016
4.5M    /mnt/user/Backup/Shares/Music/20201008_044015
4.5M    /mnt/user/Backup/Shares/Music/20201009_044001
4.5M    /mnt/user/Backup/Shares/Music/20201010_044010
4.5M    /mnt/user/Backup/Shares/Music/20201011_044016
4.5M    /mnt/user/Backup/Shares/Music/20201012_044020
4.5M    /mnt/user/Backup/Shares/Music/20201013_044014
4.5M    /mnt/user/Backup/Shares/Music/20201014_044015
4.5M    /mnt/user/Backup/Shares/Music/20201015_044015
4.5M    /mnt/user/Backup/Shares/Music/20201016_044017
1.1M    /mnt/user/Backup/Shares/Music/20201017_044016
5.0M    /mnt/user/Backup/Shares/Music/20201018_044008
4.5M    /mnt/user/Backup/Shares/Music/20201018_151120
4.5M    /mnt/user/Backup/Shares/Music/20201019_044002
172G    /mnt/user/Backup/Shares/Music
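Because each folder is a complete backup, a restore works from any single one of them, e.g. (a sketch; adjust source and target to your setup):

rsync -av "/mnt/user/Backup/Shares/Music/20201019_044002/Music/" "/mnt/user/Music/"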

 

 

On 10/18/2020 at 2:14 PM, mgutt said:
  • All following backups use the most recent backup to copy only new files, while unchanged files are hardlinked (= incremental backup)
  • This means each backup contains a full 1:1 backup, but does not waste storage

Hi, new to Linux... excuse my ignorance...

If I delete the main backup folder (the first one created = the full backup), I lose everything, right?

If I delete any of the "incremental" backups (the ones created after the first backup), do I need at least the latest backup folder plus the first one to recover my latest backup state?

On 10/18/2020 at 2:14 PM, mgutt said:

Yearly backups are only kept if they were generated on the first of January

Just want to make sure what this means. Let's assume I created the first full backup on 10th October 2019. I have not been running my server every day, so I don't have daily incremental backups (it is possible that some months the server is not powered on at all). Assume on 1st January 2021 my server is not running. Does the rule mean it will delete all daily and monthly (incremental) backups created during 2020?

 

Rgds

40 minutes ago, luca2 said:

If I delete the main backup folder (the first one created = the full backup), I lose everything, right?

No. Only the link to the file is deleted. A hardlink is just an additional link to the file, and as long as links exist, the file exists.
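A minimal demonstration of this behaviour (the file names are just examples):

echo "song data" > original.mp3
ln original.mp3 copy.mp3   # create a second hardlink to the same data
rm original.mp3            # removes one link, not the data
cat copy.mp3               # still prints "song data"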

 

47 minutes ago, luca2 said:

do I need at least the latest backup folder plus the first one to recover my latest backup state?

No. You need only one of the folders. Nice, isn't it? ;)

 

48 minutes ago, luca2 said:

Let's assume I created the first full backup on 10th October 2019. I have not been running my server every day, so I don't have daily incremental backups (it is possible that some months the server is not powered on at all). Assume on 1st January 2021 my server is not running. Does the rule mean it will delete all daily and monthly (incremental) backups created during 2020?

 

I'm not sure if I understand the question properly. I'll try to explain as follows:

- the backup from 10th October 2019 will be deleted once 14 newer daily backups exist

- the 10th of October is not kept as a monthly backup, as it was not generated on the 1st of a month

- the 10th of October is not kept as a yearly backup, as it was not generated on the 1st of January

 

Note: I know this is not an optimal situation. I'm still working on a solution for that.


 

You got me thinking about the missing DB backup of the Plex appdata folder, which is very large on the NVMe.

So I wanted to use your incremental backup script, but I am not sure if I need to remove the cache in the path,

and I am also not sure if this supports UAD (Unassigned Devices) drives. I am trying to create the backup on a drive outside the array.

Like this:

 

# settings
user_share="/appdata/plex"
backup_share="/mnt/disks/Drive1/backup" 
days=14 # preserve backups of the last X days
months=12 # preserve backups of the first day of the last X months
years=3 # preserve backups of January 1st of the last X years
fails=3 # preserve the X most recent failed backups
.....

But the cache path fails (I got it working with the normal path).

Update: this creates a backup on my UAD disk with the path:

\Drive1\backup\appdata\plex\.20201021_144056

But I am having a little problem using the cache path. I tried to replace all occurrences of /mnt/user/ in the script, but I keep getting errors.

Is it better to just use /mnt/user instead of the direct cache path for the script to work?

(You got me using /cache/ everywhere now! 🙂)

 


 

 

2 hours ago, casperse said:

But I am having a little problem using the cache path

Good point. I will upgrade the script so it supports multiple and complete paths (not only a share name).
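Until then, the idea is roughly this (a sketch, not the final version; all values are placeholders):

# sketch: use a full source path instead of a share name
source_path="/mnt/cache/appdata/plex"    # any full path, e.g. directly on the cache
backup_path="/mnt/disks/Drive1/backup"   # an Unassigned Devices disk works, too
dir_name="$(basename "$source_path")"    # "plex" in this example

new_backup="$(date +%Y%m%d_%H%M%S)"
mkdir -p "${backup_path}/${dir_name}/.${new_backup}"
rsync -av "$source_path" "${backup_path}/${dir_name}/.${new_backup}"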


Some questions allowed?

 

1.) The Plex folder on my cache contains "trillions" of directories and files. Those files change rapidly, and new files are added at high frequency. Does that mean that the result is trillions of hardlinks? Stupid question, I know, but I have never worked with rsync that way.

 

2.) What about backups to remote locations? I feed backups to Unraid servers at two different remote locations (see below). Will this work and create hardlinks at the remote location?

 

rsync -avPX --delete-during --protect-args -e ssh "/mnt/diskx/something/" "user@###.###.###.###:/mnt/diskx/Backup/something/"

 

Thanks in advance.

 

22 minutes ago, hawihoney said:

Does that mean that the result is trillions of hardlinks?

Yes and no. You said they change rapidly, which means they are new files, and new files will be copied. Hardlinks are only generated for files that already exist in the previous backup ("old" files). And yes, many files mean many hardlinks. That's the reason why each folder is an independent full backup. If you are concerned about performance: no clue. I would say it takes longer than copying only new files, as the usual incremental backup software does. But those tools have the downside that you need to keep all old folders, and they must generate an index file which covers deletions. I'm not sure which method is faster in the end. I only know that it's easier to reason about: "this folder is my full backup of day X".

 

22 minutes ago, hawihoney said:

Will this work and create hardlinks at the remote location

Yes. "--link-dest" works remotely, too. rsync compares size and date of a file and if they do not fit it creates a new copy. If it already exists in the "--link-dest" folder, it creates a hardlink to the already copied file. Only requirement: --link-dest and destination must be on the same volume (I think this is logical).

 

I will consider remote backups in a future version.

 

 

mgutt said:

I'm not sure if I understand the question properly. I'll try to explain as follows:

- the backup from 10th October 2019 will be deleted once 14 newer daily backups exist

- the 10th of October is not kept as a monthly backup, as it was not generated on the 1st of a month

- the 10th of October is not kept as a yearly backup, as it was not generated on the 1st of January

Note: I know this is not an optimal situation. I'm still working on a solution for that.

Thanks for your detailed explanation. I will try it, hopefully this weekend.

 


Hi,

 

I finally started testing today (I picked a small share). I still must do some daily backups, but it is very easy. Thanks!

 

Besides share backups, I have a particular scenario where I placed some important shares (which are critical to me) on a disk (disk1) that belongs to the array. I am thinking about backing up the full disk instead of making several backups of the shares. Do you think that would be possible?

 

Right now I get this when I back up a share (isos) using your script. It is the first full backup:

/mnt/disks/UD_hdd2/backuptoexternalHDD/Shares/isos/20201023_195306

Maybe in my specific scenario I am looking for this:

/mnt/disks/UD_hdd2/backuptoexternalHDD/disk1/20201023_195306

Since I am not into coding, I just want to make sure it is feasible. If yes, please let me know what I should look into and I will try to modify the script to adapt it.

 

Rgds.


Wait for the next release. It comes in a few days and supports complete path names, not only share names, so you can set a disk path.

