rsync Incremental Backup



The following script creates incremental backups by using rsync. Check the settings to define your own paths.

 

Donate? 🤗

 

https://codeberg.org/mgutt/rsync-incremental-backup > incbackup.sh

 

Explanations

  • Every backup is a full backup in which unchanged files are hardlinks to files from earlier backups (~ incremental backup)
  • Each backup uses the most recent backup to create hardlinks for unchanged files and copies only new or changed files. Files deleted from the source are not carried over (1:1 backup)
  • There are no dependencies between the most recent backup and the previous backups. You can delete as many backups as you like; all remaining backups are still full backups. This can be confusing, as most incremental backup software needs the previous backups to restore the data, but that does not apply to rsync with hardlinks. Read here if you need more information about links, inodes and files.
  • After a backup has been created, the script purges the backup dir and keeps only the backups of the last 14 days, 12 months and 3 years; these values can be changed in the settings
  • Logs can be found inside each backup folder
  • Sends notifications after job execution
  • Unraid exclusive: Stops docker containers if the source path is the appdata path, to create consistent backups
  • Unraid exclusive: Creates a snapshot of the docker container source path, before creating a backup of it. This allows an extremely short downtime of the containers (usually only seconds).
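
The hardlink mechanism described in the list above can be sketched with rsync's `--link-dest` option. This is a minimal illustration and not the actual logic of incbackup.sh: the function name `incremental_backup`, its arguments and the use of `ls` to find the previous backup are assumptions made for this example.

```shell
#!/bin/sh
# Sketch only; incbackup.sh itself does more (retention, logs, notifications).
# Usage: incremental_backup <source_dir> <backup_root> [backup_name]
# <backup_root> should be an absolute path: rsync resolves a relative
# --link-dest against the destination directory.
incremental_backup() (
    src=$1
    root=$2
    new=${3:-$(date +%Y%m%d_%H%M%S)}
    mkdir -p "$root" || exit 1
    # Most recent finished backup; hidden (unfinished) folders are not listed.
    last=$(ls -1 "$root" | sort | tail -n 1)
    if [ -n "$last" ]; then
        # Unchanged files become hardlinks into $last, the rest is copied.
        rsync -a --link-dest="$root/$last" "$src/" "$root/.$new" || exit 1
    else
        # First run: nothing to link against, so everything is copied.
        rsync -a "$src/" "$root/.$new" || exit 1
    fi
    # Make the backup visible only after rsync finished without errors.
    mv "$root/.$new" "$root/$new" || exit 1
    echo "$root/$new"
)
```

Called twice on an unchanged source, the second folder should contain the same files as the first, but as hardlinks that take almost no extra space.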

 

How to execute this script?

  • Use the User Scripts Plugin (Unraid Apps) to execute it by schedule
  • Use the Unassigned Devices Plugin (Unraid Apps) to execute it after mounting a USB drive
  • Call the script manually (Example: /usr/local/bin/incbackup /mnt/cache/appdata /mnt/disk6/Backups/Shares/appdata)

 

What does a backup look like?

This is what the backup dir looks like after several months (it kept the backups of 2020-07-01, 2020-08-01 ... and all backups of the last 14 days):

[Screenshot: backup directory listing]

 

And as it's an incremental backup, the storage usage is low (as you can see, I bought new music before "2020-08-01" and before "2020-10-01"):

du -d1 -h /mnt/user/Backup/Shares/Music | sort -k2
168G    /mnt/user/Backup/Shares/Music/20200701_044011
4.2G    /mnt/user/Backup/Shares/Music/20200801_044013
3.8M    /mnt/user/Backup/Shares/Music/20200901_044013
497M    /mnt/user/Backup/Shares/Music/20201001_044014
4.5M    /mnt/user/Backup/Shares/Music/20201007_044016
4.5M    /mnt/user/Backup/Shares/Music/20201008_044015
4.5M    /mnt/user/Backup/Shares/Music/20201009_044001
4.5M    /mnt/user/Backup/Shares/Music/20201010_044010
4.5M    /mnt/user/Backup/Shares/Music/20201011_044016
4.5M    /mnt/user/Backup/Shares/Music/20201012_044020
4.5M    /mnt/user/Backup/Shares/Music/20201013_044014
4.5M    /mnt/user/Backup/Shares/Music/20201014_044015
4.5M    /mnt/user/Backup/Shares/Music/20201015_044015
4.5M    /mnt/user/Backup/Shares/Music/20201016_044017
4.5M    /mnt/user/Backup/Shares/Music/20201017_044016
4.5M    /mnt/user/Backup/Shares/Music/20201018_044008
4.5M    /mnt/user/Backup/Shares/Music/20201018_151120
4.5M    /mnt/user/Backup/Shares/Music/20201019_044002
172G    /mnt/user/Backup/Shares/Music
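
The small sizes in this listing come from how `du` treats hardlinks: within one invocation it counts a file that is reachable through several links only once, so the data is charged to the first folder it is found in. The following throwaway demo illustrates that under assumed names (the temp directory and file names are made up for the example):

```shell
#!/bin/sh
# Demo in a temp directory; folder and file names are placeholders.
store=$(mktemp -d)
mkdir "$store/20200701" "$store/20200801"
# 1 MiB of random data stands in for a music file.
head -c 1048576 /dev/urandom > "$store/20200701/album.flac"
# The "second backup" only gets a hardlink to the same data.
ln "$store/20200701/album.flac" "$store/20200801/album.flac"
sync  # make sure the block usage is up to date before measuring
# One du call: the shared file is charged to the first folder only.
du -sk "$store/20200701" "$store/20200801"
# Separate call: the second folder now reports the full file size.
du -sk "$store/20200801"
```

This is also why the sizes shift to other folders once older backups are deleted.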

 

Warnings

  1. It's not a good idea to back up huge files that change often, such as disk images, as the whole file is copied each time.
  2. If a file changes while rsync is copying it, the copy will be corrupted, as rsync does not lock files. If you want to back up, for example, a VM image file, stop the VM first (to avoid further writes) before executing this script!
  3. Never change a file inside a backup directory. This changes the file in all backups that link to it (this is how hardlinks work)!
  4. Do not use NTFS or other filesystems that do not support hardlinks and/or Linux permissions. Format external USB drives with BTRFS and install WinBTRFS if you want to access your backups from Windows.
  5. Do NOT use the docker safe perms tool if you back up the appdata share to the array. It changes all file permissions, after which the files can no longer be used by your docker containers. Docker safe perms skips only the /mnt/*/appdata share and not, for example, /mnt/disk5/Backups/appdata!
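
Warning 3 can be reproduced in a temp directory. The sketch below uses made-up folder and file names and plain `ln` instead of rsync, but the effect is the same for hardlinks created by `--link-dest`:

```shell
#!/bin/sh
# Demo in a temp directory; names are placeholders.
work=$(mktemp -d)
mkdir "$work/day1" "$work/day2"
echo "original" > "$work/day1/notes.txt"
# day2 holds a hardlink to the same file, as an unchanged file in a backup would.
ln "$work/day1/notes.txt" "$work/day2/notes.txt"
# An in-place edit through one path ...
echo "edited" >> "$work/day2/notes.txt"
# ... is visible through the other path as well.
cat "$work/day1/notes.txt"
# List files that have more than one link, i.e. are shared between backups.
find "$work" -type f -links +1
```

To work with a file from a backup, copy it out of the backup folder first and edit the copy.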

 

 

Link to comment
2 hours ago, kizer said:

In your example, which directory is the complete, up-to-date one with all changes?

You mean those directories, which are visible in the screenshot? The most recent backup is inside "/20201019_044002" (generated at 2020-10-19 04:40:02). It contains a full 1:1 backup of the source path, but as you can see it has only a small size (because of the hardlinks):

du -d1 -h /mnt/user/Backup/Shares/Music
168G    /mnt/user/Backup/Shares/Music/20200701_044011
4.2G    /mnt/user/Backup/Shares/Music/20200801_044013
3.8M    /mnt/user/Backup/Shares/Music/20200901_044013
497M    /mnt/user/Backup/Shares/Music/20201001_044014
4.5M    /mnt/user/Backup/Shares/Music/20201007_044016
4.5M    /mnt/user/Backup/Shares/Music/20201008_044015
4.5M    /mnt/user/Backup/Shares/Music/20201009_044001
4.5M    /mnt/user/Backup/Shares/Music/20201010_044010
4.5M    /mnt/user/Backup/Shares/Music/20201011_044016
4.5M    /mnt/user/Backup/Shares/Music/20201012_044020
4.5M    /mnt/user/Backup/Shares/Music/20201013_044014
4.5M    /mnt/user/Backup/Shares/Music/20201014_044015
4.5M    /mnt/user/Backup/Shares/Music/20201015_044015
4.5M    /mnt/user/Backup/Shares/Music/20201016_044017
1.1M    /mnt/user/Backup/Shares/Music/20201017_044016
5.0M    /mnt/user/Backup/Shares/Music/20201018_044008
4.5M    /mnt/user/Backup/Shares/Music/20201018_151120
4.5M    /mnt/user/Backup/Shares/Music/20201019_044002
172G    /mnt/user/Backup/Shares/Music

 

 

Edited by mgutt
Link to comment
On 10/18/2020 at 2:14 PM, mgutt said:
  • All following backups use the most recent backup to copy only new files while existing files are hardlinked (= incremental backup)
  • This means each backup contains a 1:1 full backup, but does not waste storage

Hi, new to linux .. excuse my ignorance ...

- If I delete the main backup (the first one created = full backup), I lose everything, right?

- If I delete any of the "incremental" backups (the ones created after the first backup), I should at least keep the latest backup plus the first one to recover my latest backup state?

On 10/18/2020 at 2:14 PM, mgutt said:

Yearly backups are only kept if they were generated on the 1st of January

Just want to make sure what this means. Let's assume I created a first full backup on 10th October 2019. I have not been running my server every day since then, so I don't have daily incremental backups (it is possible that in some months the server was not powered on at all). Assume my server is not running on 1st January 2021. Does the rule mean it will delete all daily and monthly (incremental) backups created during 2020?

 

Rgds

Link to comment
40 minutes ago, luca2 said:

If I delete the main backup (the first one created = full backup), I lose everything, right?

No. Only the link to the file is deleted. A hardlink is just an additional link to the file, and as long as at least one link exists, the file exists.
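
This can be tried out safely in a temp directory. The folder and file names below are placeholders, and `ln` stands in for the hardlinks that rsync creates:

```shell
#!/bin/sh
# Demo in a temp directory: a file survives as long as one link remains.
demo=$(mktemp -d)
mkdir "$demo/20200701" "$demo/20200801"
echo "song data" > "$demo/20200701/track.mp3"
# The second backup folder gets a hardlink instead of a copy.
ln "$demo/20200701/track.mp3" "$demo/20200801/track.mp3"
# Delete the "first" backup entirely.
rm -r "$demo/20200701"
# The data is still reachable through the remaining link.
cat "$demo/20200801/track.mp3"   # prints: song data
```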

 

47 minutes ago, luca2 said:

I should at least keep the latest backup plus the first one to recover my latest backup state?

No. You need only one of the folders. Nice, isn't it ;)

 

48 minutes ago, luca2 said:

Let's assume I created a first full backup on 10th October 2019. I have not been running my server every day since then, so I don't have daily incremental backups (it is possible that in some months the server was not powered on at all). Assume my server is not running on 1st January 2021. Does the rule mean it will delete all daily and monthly (incremental) backups created during 2020?

 

I'm not sure I understand the question properly. I'll try to explain as follows:

- the backup from 10th October 2019 will be deleted once 14 newer daily backups exist

- the 10th of October is not kept as a monthly backup, as it was not generated on the 1st of a month

- the 10th of October is not kept as a yearly backup, as it was not generated on the 1st of January

 

Note: I know this is not an optimal situation. I'm still working on a solution for that.
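
The date rules above can be illustrated with a small classifier. It is a sketch, not the script's purge code: `backup_kind` is a hypothetical helper that looks only at the folder name (assumed to follow the YYYYMMDD_HHMMSS pattern used in this thread) and does not count how many backups of each kind are kept.

```shell
#!/bin/sh
# Hypothetical helper: classify a backup folder by the date in its name.
backup_kind() {
    case $1 in
        ????0101_*) echo yearly ;;   # generated on the 1st of January
        ??????01_*) echo monthly ;;  # generated on the 1st of any other month
        *)          echo daily ;;    # only covered by the "last X days" rule
    esac
}

backup_kind 20191010_120000   # prints: daily
backup_kind 20201001_044014   # prints: monthly
backup_kind 20210101_044000   # prints: yearly
```

A backup classified as `daily` is the kind that disappears once enough newer daily backups exist, which is the situation described for the 10th of October.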

Link to comment

 

You got me thinking about the missing DB backup of the Plex appdata folder, which is very large on the NVMe.

So I wanted to use your incremental backup script, but I am not sure whether I need to remove the cache from the path,

and I'm also not sure whether this supports UAD drives. I am trying to create the backup on a drive, not on the array.

Like this:

 

# settings
user_share="/appdata/plex"
backup_share="/mnt/disks/Drive1/backup" 
days=14 # preserve backups of the last X days
months=12 # preserve backups of the first day of the last X month
years=3 # preserve backups of the first january of the last X years
fails=3 # preserve the recent X failed backups
.....

But the cache path fails (I got it working with the normal path).

Update: this creates a backup on my UAD disk with the path:

\Drive1\backup\appdata\plex\.20201021_144056

 

But I am having a little problem using the cache path. I tried to replace all occurrences of /mnt/user/ in the script, but I keep getting errors.

Is it better to just use /mnt/user instead of the cache path directly for the script to work?

(You got me using /cache/ every now! 🙂 )

 


 

 

Edited by casperse
Link to comment

Some questions allowed?

 

1.) The Plex folder on my cache contains "trillions" of directories and files. Those files change rapidly, and new files are added at high frequency. Does that mean that the result is trillions of hardlinks? Stupid question, I know, but I have never worked with rsync that way.

 

2.) What about backup to remote locations? I do feed backups to Unraid servers at two different remote locations (see below). Will this work and create hardlinks at the remote location?

 

rsync -avPX --delete-during --protect-args -e ssh "/mnt/diskx/something/" "user@###.###.###.###:/mnt/diskx/Backup/something/"

 

Thanks in advance.

 

Link to comment
22 minutes ago, hawihoney said:

Does that mean that the result is trillions of hardlinks?

Yes and no. You said they change rapidly, which means they are new files. New files are copied. Hardlinks are only created for files that already exist (= "old" files). And yes, many files mean many hardlinks. That's the reason why each folder is an independent full backup. As for performance: no clue. I would guess it takes longer than copying only the new files, as usual incremental backup software does. But that software has the downside that you need to keep all old folders and that it must generate an index file which covers deletions. I'm not sure which method is ultimately faster. I only know that it's easier to know "this folder is my full backup of day X".

 

22 minutes ago, hawihoney said:

Will this work and create hardlinks at the remote location

Yes. "--link-dest" works remotely, too. rsync compares the size and date of a file, and if they do not match, it creates a new copy. If a matching file already exists in the "--link-dest" folder, it creates a hardlink to that already copied file. Only requirement: --link-dest and the destination must be on the same volume, as hardlinks cannot span volumes.
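
A remote run could look roughly like the sketch below. It is not part of the script: `remote_backup`, the host and all paths are placeholders, and the previous backup folder has to be passed in by hand. The `--link-dest` path is interpreted on the receiving side, so it is written as a plain path on the remote server.

```shell
#!/bin/sh
# Hypothetical helper; host, paths and folder names are placeholders.
# Usage: remote_backup <source_dir> <user@host> <remote_backup_root> <previous> <new>
# Set DRY_RUN=1 to print the rsync command instead of running it.
remote_backup() (
    src=$1
    remote=$2
    root=$3
    previous=$4
    new=$5
    run=
    [ -n "$DRY_RUN" ] && run=echo
    # Files unchanged since <previous> are hardlinked on the remote side;
    # only new or changed files travel over ssh.
    $run rsync -a -e ssh \
        --link-dest="$root/$previous" \
        "$src/" "$remote:$root/$new"
)
```

For example, `DRY_RUN=1 remote_backup /mnt/disk1/data user@backuphost /mnt/disk1/Backup 20201018_044008 20201019_044002` only prints the command, which is a cheap way to check the paths first.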

 

I will consider remote backups in a future version.

 

Link to comment
 


Thx for your detailed explanation. Will try it hopefully this weekend.

 


 

 

 

Link to comment

Hi,

 

I finally started testing today (I picked a small share). I still have to do some daily backups, but it is very easy. Thx!

 

Besides share backups, I have a particular scenario where I placed some important shares (which are critical to me) on a disk (disk1) that belongs to the array. I am thinking about backing up the full disk instead of making several backups of the shares. Do you think that would be possible?

 

Now I get this when I backup a share (isos) using your script. It is the first full backup.

/mnt/disks/UD_hdd2/backuptoexternalHDD/Shares/isos/20201023_195306

Maybe in my specific scenario I am looking for this:

/mnt/disks/UD_hdd2/backuptoexternalHDD/disk1/20201023_195306

Since I am not into coding, I just want to make sure it is feasible. If yes, please let me know what I should look into and I will try to adapt the script.

 

Rgds.

Link to comment

How should we handle soft errors? At the moment my script marks the complete backup as "failed". I had these three permission problems although it transferred ~200k files:

rsync: readdir("/mnt/disks/DESKTOP-1234_Documents/Eigene Bilder"): Permission denied (13)
rsync: readdir("/mnt/disks/DESKTOP-1234_Documents/Eigene Musik"): Permission denied (13)
rsync: readdir("/mnt/disks/DESKTOP-1234_Documents/Eigene Videos"): Permission denied (13)

I checked my client: those paths are hidden there, and I can't open them either:

[Screenshot]

 

I have multiple paths which were marked as "not successful" because of similar problems:

[Screenshot]

 

Sadly, rsync does not return error/success statistics, only "code 23" without the number of affected files:

sent 28,943,321,517 bytes  received 2,773,155 bytes  32,688,983.25 bytes/sec
total size is 28,927,317,322  speedup is 1.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1189) [sender=3.1.3]
Preserve failed backup: .20201103_231558

And rsync does not even return a specific error code, as "23" means "partial transfer" and not just "permission error":

https://unix.stackexchange.com/a/491461/101920
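
One possible way to deal with this is an allow-list of exit codes that still count as a usable backup. The sketch below is an assumption, not the script's code: `soft_codes` and `backup_ok` are hypothetical names. According to the rsync man page, 23 means "partial transfer due to error" and 24 means "partial transfer due to vanished source files"; whether 23 should be tolerated is a judgment call, since it also covers serious failures.

```shell
#!/bin/sh
# Hypothetical setting: space-separated rsync exit codes to tolerate.
soft_codes="0 24"

# backup_ok <rsync_exit_code>: succeeds if the code is on the allow-list.
backup_ok() {
    for code in $soft_codes; do
        [ "$1" -eq "$code" ] && return 0
    done
    return 1
}

# Usage sketch:
#   rsync -a "$src/" "$dst/"; status=$?
#   if backup_ok "$status"; then echo "keep backup"; else echo "mark as failed"; fi
```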

 

"23" is even returned if the complete source path is wrong. :|

Edited by mgutt
Link to comment

Sorry, need to ask an additional question:

 

Consider an existing full backup. Two more daily incremental backups exist as well. Now I delete a file. What's the state after the next run of the script?

 

Does the file exist in the full backup folder? Should be.

Does the file exist in the two incremental backup folders? Should be.

Does the file exist in the new latest incremental backups? Should not.

 

Thanks. This stuff is new to me.

 

Link to comment
4 hours ago, hawihoney said:

Does the file exist in the full backup folder? Should be.

Does the file exist in the two incremental backup folders? Should be.

Does the file exist in the new latest incremental backups? Should not.

Yes, works exactly as you described ;)

 

 

Edited by mgutt
Link to comment
21 hours ago, mgutt said:

How should we handle soft errors?

I tried the "--stats" option:

https://serverfault.com/a/678308/44086

 

The result of the "failing" job

...
DESKTOP-I0HHMD9_Downloads/FileBot_4.9.0_x64.msi
DESKTOP-I0HHMD9_Downloads/FileBot_4.9.1_x64.msi
DESKTOP-I0HHMD9_Downloads/FileZilla_Pro_3.49.2_win64-setup.exe
DESKTOP-I0HHMD9_Downloads/FileZilla_Pro_3.50.0_win64-setup.exe
rsync: send_files failed to open "/mnt/disks/DESKTOP-I0HHMD9_Downloads/FileZilla_Pro_3.51.0_win64-setup.exe": Permission denied (13)
DESKTOP-I0HHMD9_Downloads/FilmeMKVsortbyAudioLastFile.txt
DESKTOP-I0HHMD9_Downloads/Firefox Installer.exe
DESKTOP-I0HHMD9_Downloads/Firefox Setup 82.0.2.exe
...

Number of files: 24,371 (reg: 22,745, dir: 1,626)
Number of created files: 24,371 (reg: 22,745, dir: 1,626)
Number of deleted files: 0
Number of regular files transferred: 22,745
Total file size: 29,887,743,868 bytes
Total transferred file size: 29,887,743,868 bytes
Literal data: 29,875,526,492 bytes
Matched data: 0 bytes
File list size: 589,788
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 29,884,270,967
Total bytes received: 442,439

sent 29,884,270,967 bytes received 442,439 bytes 30,386,083.79 bytes/sec
total size is 29,887,743,868 speedup is 1.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1189) [sender=3.1.3]
Script Finished Nov 04, 2020 17:01.49

 

 

As it returned "22,745 of 22,745" transferred files although one is missing, I opened an rsync bug report.

 

So we can't rely on the "stats" summary to solve this issue. I need to think about this further.

Link to comment

I don't use User Shares, so I used Disk Shares, and these don't work well with this script. I know, I know, it's not designed that way, but I want to share my experience. The reason why I don't use User Shares is that I have two different remote backup locations, and using User Shares with huge directories and files over SMB to remote locations often crashes. Using Disk Shares helps most of the time:

 

What I did is:

[...]
source_paths=(
    "/mnt/disk17/Notizen"
)
backup_path1="/mnt/hawi/192.168.178.101_disk17/Backup"
#backup_path2="/mnt/hawi/192.168.178.102_disk1/Backup"

That's the result:

Create backup of /mnt/disk17/Notizen
Backup path has been set to /mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen
Create full backup 20201105_130528
sending incremental file list
Notizen/
[...]
sent 61,223 bytes received 759 bytes 41,321.33 bytes/sec
total size is 57,872 speedup is 0.93
mv: cannot move '/mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen/.20201105_130528' to '/mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen/20201105_130528': Permission denied
Preserve failed backup: .20201105_130528
DONE

In your original post you mention 'shares' but disk shares are shares as well.

 

 

I don't want to rant, I just want to add feature requests:

 

1.) Always use the last subdir and don't check for /mnt/user or /mnt. In my case 'Notizen' would then be used as the new subdir for the backup path, no matter whether the source is /mnt/user/Notizen or /mnt/disk17/Notizen. The resulting backup path would be better and independent of whether a user share or a disk share is used.

 

2.) I don't know why there's a permission problem. My own rsync jobs have worked that way for years. Any idea?

 

 

Link to comment
Quote

I don't use User Shares so I did use Disk Shares

Nothing wrong with that. The script should work with all paths (and not only "shares").

 

Quote

Always use the last subdir and don't check for /mnt/user resp. /mnt. So in my case 'Notizen' would be used as new subdir for the backup path. It's irrelevant if it's /mnt/user/Notizen or /mnt/disk17/Notizen then. The resulting backup path would be better then and independent from a user share or a disk share. 

This would cause a huge problem if a user backs up different paths with the same last subdir name. Example:

/mnt/user/Moritz/Notizen
/mnt/user/Max/Notizen

Both would target "/backup/Notizen". I know it's "ugly" to have super long paths, but how could we solve this? Maybe an optional setting like "force last subdir name"?
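
The trade-off can be made concrete with two naming variants. Both helpers are hypothetical and are not how the script derives its backup path; they only show why the short variant collides for the two example paths:

```shell
#!/bin/sh
# Both helpers are hypothetical and only illustrate the naming trade-off.
# Variant 1: keep the path below /mnt/ (long, but unique per source path).
target_full() { printf '%s\n' "${1#/mnt/}"; }
# Variant 2: keep only the last subdir (short, but can collide).
target_last() { basename "$1"; }

for src in /mnt/user/Moritz/Notizen /mnt/user/Max/Notizen; do
    echo "$src -> full: $(target_full "$src"), last: $(target_last "$src")"
done
```

With the short variant both sources end up in the same "Notizen" folder, so the second backup would hardlink against and purge the first one's data.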

 

2 hours ago, hawihoney said:

Don't know why there's a permission problem

The permission problem is not related to rsync. It's only related to the "mv" command. The "mv" renames (or "moves") the backup folder from the hidden ".20201105" to "20201105" if the backup was successful:

mv "${backup_path}/.${new_backup}" "${backup_path}/${new_backup}"

It's a really basic linux command so I wonder why it does not work for you.

 

How did you mount "/mnt/hawi/192.168.178.101_disk17/Backup" and what could be the reason why read & write is allowed, but not renaming?

 

Please manually repeat the command through the WebTerminal:

mv '/mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen/.20201105_130528' '/mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen/20201105_130528'

If it works now, then something in this path locked the directory or a file inside of it. Maybe an index service on the external location or similar?
 

Edited by mgutt
Link to comment

Interesting. When issued manually from console on source server it works:

root@Tower:~# mv '/mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen/.20201105_183346' '/mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen/20201105_183346'
root@Tower:~#

This is what I see on target server after the mv:

root@TowerVM01:/mnt/disk17/Backup# ls -lisa /mnt/disk17/Backup/disk17/Notizen/
total 0
10952386795 0 drwxrwxrwx 3 hawi users 37 Nov  5 18:34 ./
 8877660570 0 drwxrwxrwx 3 hawi users 29 Nov  5 18:33 ../
   30738631 0 drwxrwxrwx 3 hawi users 51 Nov  5 18:33 20201105_183346/

This is the mount command:

mount -t cifs -o rw,nounix,iocharset=utf8,_netdev,file_mode=0777,dir_mode=0777,uid=99,gid=100,vers=3.0,username=hawi,password=******** '//192.168.178.101/disk17' '/mnt/hawi/192.168.178.101_disk17'

Looks ok to me. Is it possible that something from within the script is being held when issued via SMB to a remote server?

 

Edited by hawihoney
Link to comment
4 hours ago, hawihoney said:

Looks ok to me. Is it possible that something from within the script is being held when issued via SMB to a remote server?

 

As it works for me both when collecting files from an external SMB share and when backing up local paths, I don't think it is related to my script, but I can only guess. If you are sure that no other processes on the destination server are accessing the freshly generated backup, it could be something related to the Linux write cache. Maybe it's not possible to rename the folder because it's not fully written to the HDD yet. If this is the case, a timeout should help.

 

So I released v0.3

# - rsync returns summary
# - typo in notification corrected
# - skip some rsync errors (defaults are "0" = skip on success and "24" = skip if some files vanish from the source while transfer)
# - add timeout for backup renaming https://forums.unraid.net/topic/97958-rsync-incremental-backup/?tab=comments#comment-910188

This version tries every second to rename the backup and fails after rename_timeout which has a default setting of 100. Please return feedback if this solves your issue and if it works, how many tries / seconds were needed (are visible through the logs).
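
The retry behaviour described here could be sketched as follows. This is an illustration under assumptions, not the released code: `rename_with_retry` and its parameters are made-up names, with defaults chosen to match the described behaviour (100 tries, one second apart).

```shell
#!/bin/sh
# Hypothetical helper: rename the hidden backup folder, retrying on failure.
# Usage: rename_with_retry <hidden_dir> <final_dir> [tries] [delay_seconds]
rename_with_retry() (
    from=$1
    to=$2
    tries=${3:-100}
    delay=${4:-1}
    n=1
    while [ "$n" -le "$tries" ]; do
        echo "Try #$n to make backup visible"
        # Success ends the loop immediately.
        mv "$from" "$to" 2>/dev/null && exit 0
        sleep "$delay"
        n=$((n + 1))
    done
    # All tries used up: leave the hidden folder in place.
    echo "Preserve failed backup: $(basename "$from")"
    exit 1
)
```

The number of the last "Try #" line in the log then shows how many attempts were needed.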

Edited by mgutt
Link to comment

I found a small bug. If the 15th day, which should be deleted, contains multiple backups, it will delete only one on each script execution while the rest is kept by the "Keep multiple backups per day" condition:

Preserve daily backup: 20201106_011906
Preserve daily backup: 20201105_232908
Preserve daily backup: 20201103_044001
Preserve daily backup: 20201102_044001
Preserve monthly backup: 20201101_044004
Preserve daily backup: 20201031_044001
Preserve daily backup: 20201030_044001
Preserve daily backup: 20201029_044001
Preserve daily backup: 20201028_044001
Preserve daily backup: 20201027_044001
Preserve daily backup: 20201026_044001
Preserve daily backup: 20201025_044005
Preserve daily backup: 20201024_044001
Preserve daily backup: 20201023_044002
Preserve daily backup: 20201022_044001
Delete 20201018_183054
Keep multiple backups per day: 20201018_182807
Keep multiple backups per day: 20201018_181402
Keep multiple backups per day: 20201018_181234
Keep multiple backups per day: 20201018_181134
Keep multiple backups per day: 20201018_151209
Keep multiple backups per day: 20201018_044006
Preserve monthly backup: 20201001_044014
Preserve monthly backup: 20200901_044011

Instead it should delete all of them.
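
One way to express the intended behaviour is to decide by date only, so that every backup of an expired day is selected no matter how many exist for that day. The sketch below is an assumption about a possible fix, not the script's purge logic: `expired_backups` is a hypothetical helper, the cutoff (oldest day to keep) is passed in by hand, and backups from the 1st of a month are skipped because they fall under the monthly and yearly rules.

```shell
#!/bin/sh
# Hypothetical helper: read backup names (YYYYMMDD_HHMMSS) from stdin and
# print every one that is older than the cutoff day (YYYYMMDD).
expired_backups() {
    cutoff=$1
    while read -r name; do
        day=${name%%_*}
        # Backups from the 1st of a month are handled by the monthly/yearly rules.
        case $day in ??????01) continue ;; esac
        if [ "$day" -lt "$cutoff" ]; then
            echo "$name"
        fi
    done
    return 0
}

# Example: all backups of 2020-10-18 are listed, not just one of them.
printf '%s\n' 20201022_044001 20201018_183054 20201018_182807 \
    20201018_044006 20201001_044014 | expired_backups 20201022
```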

Edited by mgutt
Link to comment
8 hours ago, mgutt said:

Please return feedback if this solves your issue and if it works, how many tries / seconds were needed (are visible through the logs).

Thanks for your change. Unfortunately it didn't work either, even after 100 tries to move the temporary folder. Without the hidden temporary folder and its final 'mv', the script works perfectly.

 

Please forget it. It must be something on my side, I think. No idea what. I would suggest removing that particular change because it's related to one system (mine) only.

 

[...]
Try #99 to make backup visible
mv: cannot move '/mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen/.20201106_072853' to '/mnt/hawi/192.168.178.101_disk17/Backup/disk17/Notizen/20201106_072853': Permission denied
Preserve failed backup: .20201106_072853

 

Edited by hawihoney
Link to comment
