February 2, 2011 I'm about to build my first unRaid server as a replacement for my WHS box. Once complete I will need to transfer over 5TB of data to the unRaid server over Gigabit LAN. I built a small unRaid test server and achieved writes to it of around 20MB/s. Most of my files are large BluRay rips to individual ISO files, and as I'm concerned about data integrity I always copy with verify (using Teracopy, Fastcopy, xxcopy, etc.). Using verify of course approximately doubles the copying time, so at 20MB/s the copying is starting to take an unacceptable amount of time. Copying 5TB at that speed with verify is going to take quite a long time, to say the least. So, investigating alternatives to speed up the copy process, I thought about adding a cache drive to the unRaid server. I haven't verified it yet but I would expect to get maybe 50 to 70MB/s transfers based on reports in this forum. However, thinking a little more deeply, using a cache disk defeats the purpose of verified copying. The copy on the cache will be perfect, but I'm assuming the copy from the cache to the storage drives will be just a plain vanilla copy with no verify. This problem could be easily solved if there was some option within unRaid (a switch in the cron copy job, perhaps?) to do copies from the cache with verify. Does anybody know of such an option, or if not, is there some other solution available for verified cache copies?
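To put rough numbers on the concern above, here is a minimal sketch of the arithmetic. It assumes decimal units (1 TB = 1,000,000 MB) and takes the post's own estimate that verify roughly doubles the copy time; the function name is made up for illustration.

```python
def copy_time_hours(total_tb: float, rate_mb_s: float, verify: bool = True) -> float:
    """Estimate the hours needed to copy total_tb at a sustained rate_mb_s."""
    total_mb = total_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = total_mb / rate_mb_s
    if verify:
        seconds *= 2  # assumption taken from the post: verify roughly doubles the time
    return seconds / 3600


# Rates mentioned in the post: 20 MB/s measured, 50 to 70 MB/s hoped for with a cache drive.
for rate in (20, 50, 70):
    with_verify = copy_time_hours(5, rate)
    without = copy_time_hours(5, rate, verify=False)
    print(f"{rate} MB/s: {with_verify:.0f} h with verify, {without:.0f} h without")
```

At 20MB/s with verify this works out to roughly 139 hours, i.e. nearly six days of continuous copying.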
February 2, 2011 Hi, the copy from cache to the array is done with Rsync and as far as I know that uses "verify" technology with md hash verification. See this wiki: http://en.wikipedia.org/wiki/Rsync
February 2, 2011 The process to move data from cache onto the array is called mover, which uses rsync to verify files as they are moved. This is the command in the script, "rsync -i -dIWRpEAXogt --numeric-ids --inplace --remove-source-files {} /mnt/user0/ \; \) -o ". I'm not going to go look up all the options, but I'm fairly certain that this verifies the files as they are moved from the cache disk to the array. Someone can correct me if I am wrong.
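For anyone who does want the options looked up, here is a small lookup sketch for the short-option cluster in that command. The descriptions are paraphrased from my reading of the rsync 3.x man page, so treat them as assumptions and check `man rsync` on your own version; the table and function names are made up for illustration.

```python
# Short-option meanings paraphrased from the rsync(1) man page (rsync 3.x).
# These are a reading of the documentation, not output from rsync itself.
RSYNC_SHORT_OPTS = {
    "i": "--itemize-changes, print a change summary for each item",
    "d": "--dirs, transfer directories without recursing",
    "I": "--ignore-times, don't skip files that match in size and mtime",
    "W": "--whole-file, copy whole files and skip the delta-transfer algorithm",
    "R": "--relative, use relative path names",
    "p": "--perms, preserve permissions",
    "E": "--executability, preserve executability",
    "A": "--acls, preserve ACLs",
    "X": "--xattrs, preserve extended attributes",
    "o": "--owner, preserve owner",
    "g": "--group, preserve group",
    "t": "--times, preserve modification times",
}


def explain(cluster: str) -> list[str]:
    """Expand a short-option cluster such as '-dIWRpEAXogt' into one line per flag."""
    return [
        f"-{ch}: {RSYNC_SHORT_OPTS.get(ch, 'not in this table')}"
        for ch in cluster.lstrip("-")
    ]


for line in explain("-i") + explain("-dIWRpEAXogt"):
    print(line)
```

Two things stand out in this reading: the cluster includes -W (whole-file), so the block-by-block delta comparison is not used for these moves, and there is no -c (--checksum). As far as I can tell from the man page, rsync still checks a whole-file checksum computed on the data as it is transferred, which is not the same thing as reading the file back from the destination disk after it is written.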
February 2, 2011 Using a cache drive in your situation will slow things down. What happens when the cache drive is full? For maximum speed make sure you are writing to disk shares and not user shares.
February 2, 2011 Author Well that's service with a smile; three replies in less than 30 minutes. Great forum. Ok, I'll look into the rsync options. <<Using a cache drive in your situation will slow things down. What happens when the cache drive is full? >> Yes, I considered that problem. All 8 of my drives will be new WD 2TB EARS, so the transfer will have to be in <2TB steps per day. Or perhaps I can modify the cron job to run more frequently? <<For maximum speed make sure you are writing to disk shares and not user shares.>> I thought as much, but when I did testing on my test unRaid system I found no speed improvement at all. When copying a 4.7GB DVD ISO from my Windows box to unRaid the average transfer time was 158s to a drive share and 159s to a user share. This was repeatable. Seems I may have a bottleneck elsewhere in the system. When I get my new system up and running I'll redo the test. Thanks for the replies.
February 2, 2011 It sounds like you are presently just copying off the WHS box. Unassign the parity drive and copy to the data drives without it. Then, assign it and let it build once you are done copying. Peter
February 2, 2011 Author <<Unassign the parity drive and copy to the data drives without it. Then, assign it and let it build once you are done copying.>> Thank you Peter. That did the trick. My test copy went from 158 seconds to only 109s and averaged 41MB/s (from about 28MB/s). That's certainly a worthwhile improvement. Cheers, Ross
February 2, 2011 Author Just an update on rsync verify: <<This is the command in the script, "rsync -i -dIWRpEAXogt --numeric-ids --inplace --remove-source-files {} /mnt/user0/ \; \) -o ">> <<the copy from cache to the array is done with Rsync and as far as I know that uses "verify" technology with md hash verification >> I took a look at the wiki and also at http://everythinglinux.org/rsync/ and, as far as I can determine, there is no verification that the newly written file was written without error, i.e., there is no post-write verification. It seems to use md hash to find differences between files of the same name that exist on both the cache and the drive store, and then transfers only the blocks of data that contain the differences. I did not read anything to the effect that those writes were then verified, but I'm not really familiar with the Unix OS so I may be wrong (and I hope that I am wrong because I need verified writes from cache). Ross
February 2, 2011 Having a read of the wiki, it would appear that rsync constantly verifies as it transfers. I'd consider that better than a single final md5 / sfv check personally, but if you're still concerned, you can do a final md5 check between the original copy (on the WHS) and the final resting place of the file (post mover script) once you've copied everything, prior to wiping your WHS original copies.
February 2, 2011 Author Hello kal, thanks for the reply. <<having a read of the wiki, it would appear that rsync constantly verifies as it transfers>> Can you paste the section that suggests this? I've re-read the wiki and I still can't see any text that suggests writes are subsequently verified in some way, either on-the-fly or later. <<if you're still concerned, you can do a final md5 check between the original copy (on the WHS) and the final resting place of the file >> Sure, that is possible, but I think that would defeat the purpose of writing to cache (faster transfers). It would be better just to write directly to the user or disk share with verify. Anyway, happy to be proven wrong on this. Ross
February 2, 2011 Well, it's a file synchronisation tool, and it synchronises on a block (chunk) level by comparing md5 checksums as it goes. It keeps resubmitting differing blocks until the whole destination file contains the same (md5 verified) blocks. From the wiki: "The recipient splits its copy of the file into fixed-size non-overlapping chunks and computes two checksums for each chunk: the MD4 hash, and a weaker 'rolling checksum'. It sends these checksums to the sender. Version 30 of the protocol (released with rsync version 3.0.0) now uses MD5 hashes rather than MD4." 2nd point: don't forget, the cache drive only speeds up the writes to the array (by essentially postponing the parity work till the mover script runs), so your TeraCopy verify (or the like) wouldn't be any faster with or without a cache drive. For these large, initial data copies, I'd be considering running without a cache drive and without a parity drive, blitzing all your data over onto the unRaid disks, verifying it's all there ok (md5 checks etc. as necessary), and then adding in the parity drive. Down the line, if your subsequent protected unRaid writes are thought to be slow, consider using a cache drive then.
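The two-checksums-per-chunk idea in the quoted paragraph can be sketched as follows. This is a simplified illustration based on the published description of the rsync algorithm, not rsync's actual code: the block size, the function names, and the use of MD5 via `hashlib` are choices made here for the example.

```python
import hashlib

M = 1 << 16  # both halves of the weak checksum are kept modulo 2^16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Weak checksum of one block: a is the byte sum, b is a position-weighted sum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window one byte forward without re-reading the whole block."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - n * out_byte + a_new) % M
    return a_new, b_new


def block_signatures(data: bytes, block_size: int) -> list[tuple[tuple[int, int], str]]:
    """What the recipient computes: (weak, strong) checksums per fixed-size chunk."""
    blocks = (data[i:i + block_size] for i in range(0, len(data), block_size))
    return [(weak_checksum(blk), hashlib.md5(blk).hexdigest()) for blk in blocks]


# Small demonstration on made-up data: the rolled value matches a full recomputation.
data = bytes(range(256)) * 4
n = 64
a, b = weak_checksum(data[:n])
a, b = roll(a, b, data[0], data[n], n)
print((a, b) == weak_checksum(data[1:n + 1]))
print(len(block_signatures(data, n)), "block signatures")
```

The cheap rolling checksum lets the sender test every byte offset for a possible match, and the stronger hash confirms a candidate. Both are computed from data that is read, which is the part of the wiki text the thread is debating.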
February 2, 2011 Author Here is my understanding of the context of that paragraph: The wiki doc talks about the comparison of two existing similar (but not identical) files. The differences between these two files are found using MD hash, etc. This is all done with only reads; there are not yet any writes done to the recipient file at this stage. Once the differences are found, only then are chunks containing those differences written by the sender to the recipient file. There is no mention in the doc that, after the sender writes this data, a subsequent verify takes place. The wiki does not mention how a new file (non-existent on the recipient drive) is handled, but I assume it is just a standard copy with no verify. <<For these large, initial data copies, I'd be considering running without a cache drive, and without a parity drive ...>> Yes, that is great advice, and it makes our discussion about rsync redundant. As an unRaid newbie I was under the impression that writing to a cache drive would be faster than writing directly to a drive share, but if the parity disk is unassigned I can't see how it can be faster (it should be the same speed as writing to a drive share, assuming the same disk specs). Thanks kal. Ross
February 2, 2011 I would do the copy without verify, and then do md5 calculations after the fact. You can do the md5 in parallel with copy operations involving other physical disks with no or minimal impact on copy speeds (obviously run the md5 program on the same machine where the data lives). If you are setting up a new array, I would consider doing at least part of the copying with parity in place. Running it under real world conditions is a good test. BTW, most users get 30 Mb/sec copying to the protected array (without verify).
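A minimal sketch of that "copy first, md5 afterwards" approach, assuming Python is available on each machine. The function names and the example paths in the comments are hypothetical; the idea is to build a manifest of digests on each side and compare the two.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """MD5 of one file, read in 1 MiB chunks so large ISOs never sit fully in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(src, dst):
    """Return relative paths that are missing from dst or whose digests differ."""
    return sorted(name for name, digest in src.items() if dst.get(name) != digest)


# Hypothetical usage, with each manifest built on the machine that holds the data:
#   source = manifest("D:/shares/Videos")   # on the WHS box (path is an assumption)
#   target = manifest("/mnt/disk1/Videos")  # on the unRaid server (path is an assumption)
#   print(compare(source, target) or "no differences")
```

An empty result from `compare` means every source file exists at the destination with the same MD5. Computing each manifest locally, as suggested above, avoids pulling the data back over the network just to hash it.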
February 2, 2011 Oops, that should have been 30 MB/sec +/-. And after you compare a bunch of md5 results and find no differences, you may actually start to trust your setup to copy data accurately.