Copying 27 TB from server A to B


pyrater


What is the best way to do this? Enable parity first, or just do a parity sync once the data is copied? Use rsync over SSH, Windows Explorer, or TeraCopy? Any best practices for this, or am I overthinking it?

 

I kind of just want to disable parity, copy the data at 125 MB/s, and then redo parity... seems fastest, but other threads state that this may result in data corruption that you will not know about...


I'm not in too much of a hurry. I would go for rsync - and I would then run rsync again to verify if there are updated/new files since the first copy started.

 

At 110 MB/second you can manage about 400 GB/hour, so just under 3 days for the full 27 TB. And if you need to, you can stop the copy process in the middle if you want to watch a movie without the copy process stealing too much disk/network bandwidth. For me, it wouldn't be the total time that matters most but how much the copy process will affect my access to the data.
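The arithmetic above can be sketched quickly. The figures (27 TB, 110 MB/s) come from this thread; decimal units (1 TB = 1000 GB) are assumed:

```shell
#!/bin/sh
# Back-of-the-envelope transfer time: 27 TB at ~110 MB/s sustained.
# Real throughput will vary with file sizes and parity settings.
TOTAL_GB=27000
RATE_MB_S=110
GB_PER_HOUR=$(( RATE_MB_S * 3600 / 1000 ))   # MB/s -> GB/hour
HOURS=$(( TOTAL_GB / GB_PER_HOUR ))
DAYS_X10=$(( HOURS * 10 / 24 ))              # tenths of a day, integer math
echo "${GB_PER_HOUR} GB/hour, about ${HOURS} hours (~$(( DAYS_X10 / 10 )).$(( DAYS_X10 % 10 )) days)"
```

That lines up with the "about 400 GB/hour, just under 3 days" estimate above.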

 

Just remember that you want to use turbo-write, i.e. reconstructive writes, or the receiving machine will not be able to keep up with the network link speed.

 

 

I'm assuming most of the data is media data (movies, audio, images), so there is no gain to hope for from stream-compressing the data - rsync's -z option would just burn CPU on already-compressed files.

1 hour ago, pyrater said:

Will rsync account for broken pipes / files if I set it up to run from, say, midnight to 5 AM over and over until all the data is copied? That way it's usable in the day and not killing the network while I'm on it.

 

In my experience rsync is very resilient against problems like disconnects, running out of space at the wrong time, and the like. Even user aborts with a Ctrl-C don't cause problems. I re-run, it sees what's still to be completed, and then it gets on with it. I also use it occasionally to do binary comparisons between servers, so I also have an additional check that the copies are good. It's never let me down.


I'm thinking of doing transfers from 11 PM to 7 AM every night until it's complete. Anyone see any issues with this?

 

Run from source server:


ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]

 

cd /usr/local/

nano rsync_script

 

#!/bin/sh
rsync -e 'ssh -p 22' -av /mnt/user/Data 192.168.2.5:/mnt/user/Data

 

Ctrl+X (then Y and Enter to save)

 

chmod +x rsync_script


crontab -e 
00 21 * * * /usr/local/rsync_script
0 7 * * * killall rsync


Mostly got the info from: https://www.techrepublic.com/article/how-to-set-up-auto-rsync-backups-using-ssh/
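One caveat with the crontab above: if a night's transfer is still running when the next 21:00 start fires, cron will happily launch a second rsync over the same tree. Wrapping the script in flock (from util-linux) prevents that; the lock file path below is just an example. A local demonstration of the behaviour:

```shell
#!/bin/sh
# In the crontab, the guard would look like:
#   00 21 * * * flock -n /var/run/rsync_script.lock /usr/local/rsync_script
# With -n, flock exits immediately if the lock is already held,
# instead of starting a second copy of the same transfer.
LOCK=$(mktemp)
flock -n "$LOCK" sleep 3 &           # first "run" holds the lock
sleep 1
if flock -n "$LOCK" true; then       # second "run" while the first is live
    RESULT=started
else
    RESULT=skipped
fi
echo "$RESULT"
wait
rm -f "$LOCK"
```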

 

 

Edited by pyrater
39 minutes ago, pyrater said:

S80 can you hook me up with the command to do the comparisons between servers? I would like to run that after the data is copied to make sure nothing was borked in the process. Currently I'm using

 

rsync -av --info=progress2 -e ssh Data/ [email protected]:/mnt/user/

OK - so this is what I use, but I am not using ssh. I have the rsync daemon running on the server I am copying to. You can have it running on the source if you prefer, and then pull the files to the destination. Obviously the source and destination parameters would need to change... My destination is called BackupServer, and I am checking the entire contents of a disk on the source server (disk1 in this case) against a share on the destination server (Disk1-backup).

rsync -rvnc --progress --delete --timeout=18000 /mnt/disk1/ BackupServer::mnt/user/Disk1-backup > /mnt/cache/Disk1_differences.txt 2>&1 &

For my use, this spawns a task (the & at the end) so that I can run multiple concurrent sessions on different physical disk drives.  The -n option is to perform a dry-run, so no files get changed by this.  The -c option is to use the checksums of the files as a comparison instead of the date and time stamps.  Any differences are logged in a file for later examination - in this case Disk1_differences.txt.  The very long timeout is simply because if you have a small number of very large files (big blu-ray rips, for example) the connection can go quiet for a long time while the checksums are being calculated, and I did have some issues with rsync timing out in such cases.
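For anyone wanting to replicate the daemon setup mentioned above, the destination machine needs a module defined in its rsyncd.conf. This sketch assumes a module named mnt, to match the BackupServer::mnt/... path in the command; the real module name and options on S80_UK's server may differ:

```ini
# /etc/rsyncd.conf on the destination server (module name assumed)
uid = root
gid = root

[mnt]
    path = /mnt
    read only = no
    comment = backup target for disk compares and copies
```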

 

Another approach altogether would be to use the Dynamix File Integrity plugin. This stores a hash for each file in the extended attributes. rsync copies those across as well when copying files, provided you add the -X option to preserve extended attributes (the plugin can also generate them later for files that arrive without one). Then you can use the same plugin on the destination server to check the files there. I haven't yet tried this plugin to check files on my backup server, but I am now starting to use it to validate files on my main server.

Edited by S80_UK

In my case the backup is a mirror of the master, so --delete tells me if there are any files on the backup which are not present on the source.  If I didn't do a dry run, those files would get deleted from the backup.  That's not what everyone wants, but it suits my use case.

Edited by S80_UK
15 hours ago, pyrater said:

Will rsync account for broken pipes / files if I set it up to run from, say, midnight to 5 AM over and over until all the data is copied? That way it's usable in the day and not killing the network while I'm on it.

Rsync is excellent at allowing you to restart aborted copy operations. That's why I suggested rsync and the ability to just copy during off-hours and break the copy when you want personal access to your files without rsync stealing bandwidth.

