VM Backup speed



Hi Guys,

I'm using the usual script to back up several VMs every day. I have about 700GB of images and the backup takes a very long time. Granted, it's writing the images to an HDD that doesn't go beyond 150 MB/s.

I'm wondering what strategies/solutions you use to improve that process? What I have in mind right now is backing up to the cache and having the mover transfer the files to the HDDs later, but I'm hoping for better ideas.


@Jaster Have you tried using BTRFS snapshots to back up your VMs? You need the source and target to be on BTRFS. I have my cache drive set up as BTRFS, and an unassigned drive, also formatted as BTRFS, as the target for my backups. The initial backup transfers the first snapshot and takes some time; every new snapshot only transfers the differential (changed or new) data and is much quicker than a full new copy of the VMs. Have a look into the following:

 

 


@Jaster Sorry, I linked you the wrong thread. Here is the one I use for my snapshots. Method 2 is what I use.

I do the initial snapshot by hand every 2-3 months. For this snapshot I shut all my VMs down so they are in a safe shutdown state.

1. Create a read-only snapshot of my VMs share. This share isn't the default "domains" share created by Unraid. It is already a BTRFS subvolume on my cache, created as described in the thread from JorgeB, and it hosts all my VMs.

# create readonly snapshot
btrfs subvolume snapshot -r /mnt/cache/VMs /mnt/cache/VMs_backup
sync

 

2. Send/receive the initial snapshot copy to the target drive mounted at "VMs_backup_hdd". This process will take some time, transferring all my vdisks.

btrfs send /mnt/cache/VMs_backup | btrfs receive /mnt/disks/VMs_backup_hdd
sync

3. After that I have two scripts running. The first script runs every Sunday, checking if VMs are running and, if so, shutting them down and taking a snapshot named "VMs_backup_offline_" with the current date at the end.

#!/bin/bash
#backgroundOnly=false
#arrayStarted=true

# The snapshots are created directly in /mnt/cache, so work from there
cd /mnt/cache || exit 1
today="VMs_backup_offline_$(date '+%Y%m%d')"
sd=$(echo VMs_backup_offline_* | awk '{print $1}')   # oldest existing offline snapshot
ps=$(echo VMs_backup_offline_* | awk '{print $NF}')  # newest existing offline snapshot

if [ "$ps" == "$today" ]
then
    echo "There's already a snapshot from today"
else
    # Ask all running VMs to shut down cleanly
    for i in $(virsh list | grep running | awk '{print $2}'); do virsh shutdown "$i"; done

    # Wait until all domains are shut down or the timeout is reached
    END_TIME=$(date -d "300 seconds" +%s)
    while [ "$(date +%s)" -lt "$END_TIME" ]; do
        # Break out of the loop when no running domains are left
        test -z "$(virsh list | grep running | awk '{print $2}')" && break
        # Wait a little, we don't want to DoS libvirt
        sleep 1
    done
    echo "shutdown completed"
    virsh list | grep running | awk '{print $2}'

    btrfs sub snap -r /mnt/cache/VMs "/mnt/cache/$today"
    # Restart all VMs that are marked for autostart
    for i in $(virsh list --all --autostart | awk '{print $2}' | grep -v Name); do virsh start "$i"; done
    sync
    # Incremental send against the initial read-only snapshot /mnt/cache/VMs_backup
    btrfs send -p /mnt/cache/VMs_backup "/mnt/cache/$today" | btrfs receive /mnt/disks/VMs_backup_hdd
    if [[ $? -eq 0 ]]; then
        /usr/local/emhttp/webGui/scripts/notify -i normal -s "BTRFS send/receive finished" -d "Script executed" -m "$(date '+%Y-%m-%d %H:%M') Information: BTRFS VM offline snapshot to HDD completed successfully"
        # Remove the previous local offline snapshot; the initial snapshot stays
        [ "$sd" != "$today" ] && [ -e "/mnt/cache/$sd" ] && btrfs sub del "/mnt/cache/$sd"
        #btrfs sub del "/mnt/disks/VMs_backup_hdd/$sd"
    else
        /usr/local/emhttp/webGui/scripts/notify -i warning -s "BTRFS send/receive failed" -d "Script aborted" -m "$(date '+%Y-%m-%d %H:%M') Information: BTRFS VM offline snapshot to HDD failed"
    fi
fi

4. The second script runs daily and snapshots the VMs as "VMs_backup_online_" with the date, no matter if they are running or not. Keep in mind that if you have to restore snapshots of VMs which were running at the time the snapshot was taken, they will be in a "crashed" state. I haven't had any issues with that so far, but there may be situations, such as databases running in a VM, where this could cause breakage. That's why I take the weekly snapshots with all my VMs turned off. Just in case.

#!/bin/bash
#description=
#arrayStarted=true
#backgroundOnly=false

# The snapshots are created directly in /mnt/cache, so work from there
cd /mnt/cache || exit 1
today="VMs_backup_online_$(date '+%Y%m%d')"
sd=$(echo VMs_backup_online_* | awk '{print $1}')   # oldest existing online snapshot
ps=$(echo VMs_backup_online_* | awk '{print $NF}')  # newest existing online snapshot

if [ "$ps" == "$today" ]
then
    echo "There's already a snapshot from today"
else
    btrfs sub snap -r /mnt/cache/VMs "/mnt/cache/$today"
    sync
    # Incremental send against the initial read-only snapshot /mnt/cache/VMs_backup
    btrfs send -p /mnt/cache/VMs_backup "/mnt/cache/$today" | btrfs receive /mnt/disks/VMs_backup_hdd
    if [[ $? -eq 0 ]]; then
        /usr/local/emhttp/webGui/scripts/notify -i normal -s "BTRFS send/receive finished" -d "Script executed" -m "$(date '+%Y-%m-%d %H:%M') Information: BTRFS VM online snapshot to HDD completed successfully"
        # Remove the previous local online snapshot; the initial snapshot stays
        [ "$sd" != "$today" ] && [ -e "/mnt/cache/$sd" ] && btrfs sub del "/mnt/cache/$sd"
        #btrfs sub del "/mnt/disks/VMs_backup_hdd/$sd"
    else
        /usr/local/emhttp/webGui/scripts/notify -i warning -s "BTRFS send/receive failed" -d "Script aborted" -m "$(date '+%Y-%m-%d %H:%M') Information: BTRFS VM online snapshot to HDD failed"
    fi
fi
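If the crash-consistent nature of the online snapshots ever becomes a concern, one possible refinement is to freeze the guest filesystems via the QEMU guest agent around the snapshot. This is only a sketch: it assumes the agent is installed and configured inside the VM (via the agent channel in the VM's XML), and the VM name is an example.

```shell
# Sketch: quiesce a running VM's filesystems before an online snapshot.
# Requires qemu-guest-agent inside the guest; "Win10" is a placeholder name.
vm="Win10"
if virsh domfsfreeze "$vm"; then
    # Filesystems are frozen, so the snapshot is filesystem-consistent
    btrfs sub snap -r /mnt/cache/VMs "/mnt/cache/VMs_backup_online_$(date '+%Y%m%d')"
    virsh domfsthaw "$vm"
else
    echo "freeze failed for $vm, taking a crash-consistent snapshot instead"
    btrfs sub snap -r /mnt/cache/VMs "/mnt/cache/VMs_backup_online_$(date '+%Y%m%d')"
fi
```

The freeze window should be kept as short as possible, which is why only the snapshot itself (not the send/receive) sits between freeze and thaw.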

 

I haven't automated the deletion of old snapshots. I monitor the target drive, and if it's getting full I delete some old snapshots. The first command lists all the snapshots and the second deletes a specific one. Don't delete the initial read-only snapshot if you have differential snapshots building on top of it.

btrfs sub list /mnt/disks/VMs_backup_hdd

btrfs sub del /mnt/disks/VMs_backup_hdd/VMs_Offline_20181125
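The pruning could also be scripted. A hedged sketch, assuming the dated naming scheme from the scripts above: it deletes dated snapshots on the backup drive older than a cutoff, and never matches the initial read-only snapshot because that one has no date suffix.

```shell
#!/bin/bash
# Sketch: remove dated snapshots on the backup HDD older than KEEP_DAYS.
# The initial baseline (VMs_backup) has no date suffix and is never matched.
KEEP_DAYS=35
TARGET=/mnt/disks/VMs_backup_hdd
cutoff=$(date -d "-$KEEP_DAYS days" '+%Y%m%d')

for snap in "$TARGET"/VMs_backup_offline_* "$TARGET"/VMs_backup_online_*; do
    [ -e "$snap" ] || continue      # glob matched nothing
    stamp="${snap##*_}"             # date suffix, e.g. 20181125
    if [ "$stamp" -lt "$cutoff" ] 2>/dev/null; then
        btrfs sub del "$snap"
    fi
done
```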

If you have to restore a vdisk, simply go into the specific folder and copy the vdisk of the specific VM back to its original share on the cache. The XML and NVRAM files for the VMs aren't backed up by this, only the vdisks. To back up those files you can use the app "Backup/Restore Appdata" to back up the libvirt.img, for example.

 

EDIT:

Forgot to mention: I use a single 1TB NVMe cache device formatted with BTRFS and a single old spinning-rust 1.5TB HDD as an unassigned device as the target for the snapshots. Nothing special, no BTRFS RAID involved.

On 10/3/2020 at 12:06 PM, bastl said:

@Jaster Sorry, I linked you the wrong thread. Here is the one I use for my snapshots. Method 2 is what I use. [...]

That's very detailed, thanks!

 

I'm planning a single NVMe drive for the VMs and using the cache (BTRFS RAID) as the backup location.

 

If I'd like to copy a specific snapshot, could I just "copy" it (e.g. cp or Krusader) or do I need to do some kind of restore?

 

Are the deltas created from the initial snapshot or from the previous?

 

2 hours ago, Jaster said:

I'm planning a single NVMe drive for the VMs and using the cache (BTRFS RAID) as the backup location.

Keep in mind the only requirement for the snapshot feature is that both source and target are formatted BTRFS. You can use the cache drive as the target, sure, but keep in mind you can't simply copy those backups from this storage to, let's say, the XFS-formatted array; the mover won't work if that's what you're planning. The cache isn't the best solution for this. Imagine your backups filling up your cache drive until it's full, preventing Docker from working or causing issues transferring files over your network to a share using that cache. When the Unraid 6.9 build with multi cache pool support is released, it might be a good option to have a second pool used only for backups, for example.

 

2 hours ago, Jaster said:

If I'd like to copy a specific snapshot, could I just "copy" it (e.g. cp or Krusader) or do I need to do some kind of restore?

For a restore you simply copy the files you need from a snapshot to wherever you want them: overwrite a broken vdisk, for example, or copy to another unassigned device for tests with a new VM. Whether to use cp or Krusader is up to you; both will work. A restore is basically a simple file copy.
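For illustration, restoring one vdisk might look like the following. This is only a sketch: the snapshot date, folder layout, and the VM name "Win10" are hypothetical examples, not paths from the posts above.

```shell
# Hypothetical example: copy a vdisk out of an offline snapshot back to the
# live VMs share. The VM that owns the vdisk should be shut down first.
virsh shutdown Win10
cp /mnt/disks/VMs_backup_hdd/VMs_backup_offline_20201004/Win10/vdisk1.img \
   /mnt/cache/VMs/Win10/vdisk1.img
virsh start Win10
```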

 

2 hours ago, Jaster said:

Are the deltas created from the initial snapshot or from the previous?

The snapshots are differential and all based on the initial read-only snapshot. You can delete all the snapshots between the first initial one and your last one, or keep some in between; it doesn't matter. What's essential is that you keep the first one. Over time each snapshot will use more and more space because the changes compared to the first one will increase. At some point you have to recreate a fresh, up-to-date initial read-only snapshot.
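Recreating the baseline can be sketched roughly as follows. This is a hedged outline, not a tested procedure for this exact setup; it assumes the layout from the earlier posts, and date-stamps the new baseline so its name doesn't clash with the old one on the target.

```shell
# Sketch: start a fresh baseline once the deltas have grown too large.
# A dated name avoids a clash with the old baseline on the backup drive.
new_base="VMs_backup_$(date '+%Y%m%d')"
btrfs subvolume snapshot -r /mnt/cache/VMs "/mnt/cache/$new_base"
sync
# Full (non-incremental) send of the new baseline to the backup drive
btrfs send "/mnt/cache/$new_base" | btrfs receive /mnt/disks/VMs_backup_hdd
# Afterwards the old baseline and its incremental snapshots can be deleted
# on both sides, and the scripts pointed at the new baseline.
```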


I'm about to perform the migration. Should I still use the domains share or should I use a custom share for the images?

How would/should I set it up?

 

What I'm going to do: use an NVMe for the images themselves, and keep the backups on a RAID 10 SSD BTRFS cache, but not transfer those to the parity array (via the mover or whatsoever).

Then create a script that creates a new "root backup" every Sunday and creates increments from it on a daily basis. Once I have 5 full weeks, I'll delete the oldest... I'm wondering if I need to set this up file by file or if I can script it on a folder/share basis...?

12 hours ago, Jaster said:

Should I still use the domains share or should I use a custom share for the images?

How would/should I set it up?

I'm not sure what Unraid will change in the future, so you'd better not change the default "domains" share. In the advanced settings of the VM manager you can point it to a new path for your VMs. This is how I did it.

 

Example from my setup for my main "VM share":

btrfs subvolume create /mnt/cache/VMs

This is the first step if you want to move on with BTRFS snapshots. It creates the subvolume for your VMs, in my case on the cache. At the same time a user share with the same name is created. You can find it under Shares next to all the shares you already have, and you can configure it like any other share. Simply move your VMs to this new share and adjust the paths in the XML, and that's it. The next step is to create a read-only snapshot (see a couple of posts above), in my case "/mnt/cache/VMs_backup", which also creates a new share on the cache and will be the base for future snapshots. Any changes in "/mnt/cache/VMs" compared to "/mnt/cache/VMs_backup" will be included in a new snapshot.
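Put together, the migration of one VM might look like this. It's a sketch under assumptions: the VM name "Win10" and the exact share paths are examples, and the VM is assumed to be shut down before moving its files.

```shell
# Sketch: move an existing VM from the default share into a new subvolume.
btrfs subvolume create /mnt/cache/VMs       # new subvol, appears as share "VMs"
mv /mnt/user/domains/Win10 /mnt/cache/VMs/  # move the VM folder (VM shut down)
# Point the VM at the new vdisk location in its XML, e.g. change
#   <source file='/mnt/user/domains/Win10/vdisk1.img'/>
# to
#   <source file='/mnt/cache/VMs/Win10/vdisk1.img'/>
virsh edit Win10
```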

 

12 hours ago, Jaster said:

I'm wondering if I need to set this up file by file or if I can script it on a folder/share basis...?

Each BTRFS subvolume is handled as its own share. As soon as you create a subvolume, a user share is created by Unraid. Technically you can create a subvolume for each VM, but then you need an extra script for each VM too, and with a couple of VMs it can get messy. It's easier to have, let's say, a "productive" VM subvolume which includes all the important VMs you regularly want to back up with 1-2 scripts, and a second path for test VMs in case you play a lot with VMs. In my case, as said earlier, my main 5 VMs are sitting on the cache drive and another unassigned 500GB SSD hosts a couple of VMs I use for testing only. When creating a new VM I only have to adjust the path for the vdisk to include it in or exclude it from the snapshots, and that's it.

 

13 hours ago, Jaster said:

Create a script that creates a new "root backup" every Sunday and creates increments from it on a daily basis.

Keep in mind each initial snapshot needs the same space on the target as all the source VMs you want to back up. Let's say you have 5 VMs, each with 100GB allocated, that you want to back up. With your idea you have to transfer 500GB of data each Sunday, plus the changes during the week. The changes might not be that big, but with only the Sunday backups over 5 weeks you need 2500GB alone on your RAID 10 SSD cache. This way you wear out your SSDs really fast. Better use a spinner as the target.
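The arithmetic above can be sketched in a few lines; on a live system, `btrfs filesystem du` then shows what the snapshots actually occupy. The paths in the comment are the example paths from earlier in the thread.

```shell
# Back-of-the-envelope estimate for full weekly baselines kept for 5 weeks
vms=5; size_gb=100; weeks=5
echo "full baselines: $((vms * size_gb * weeks)) GB"   # 5 * 100 * 5 = 2500 GB

# Actual shared/exclusive usage per snapshot on the target (run on the server):
# btrfs filesystem du -s /mnt/disks/VMs_backup_hdd/*
```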


I am going to run the images on an unassigned NVMe. Should I create a subvolume there or can I use "the whole thing"?

As of now I am at about 1TB+ of VMs.

I think I'll check the performance with a spinner and decide if I want to kill the SSDs or if I can live with a spinner, which I would only update every 2 weeks or so.

7 minutes ago, Jaster said:

Should I create a subvol there or can I use "the whole thing"?

A subvolume is a requirement for snapshots to work; only formatting a drive with BTRFS is not enough. I think there are a couple of things you're not understanding right about BTRFS and its features. Maybe I wasn't clear enough.

 

Source and target both have to be formatted with BTRFS.

Snapshots are differences between 2 subvolumes.

A subvolume is presented to a user as a share/folder.
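To double-check these requirements on a given system, something like the following should work (the paths are the examples used earlier in the thread):

```shell
# Both mounts should report "btrfs" as their filesystem type
stat -f -c %T /mnt/cache /mnt/disks/VMs_backup_hdd
# Succeeds only if the path is a real subvolume, not a plain folder
btrfs subvolume show /mnt/cache/VMs
```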

 

A small piece of advice: please read the full post from JorgeB again for a better understanding of how this BTRFS feature works and what the differences between reflinks and snapshots are.

 

 
