Verify Copied Files are Exact Copies


I am new to unRAID.  I have my server running and have been copying a bunch of media files to it via SMB and FTP.  Before I delete the source files, I would like to make sure that the files on the unRAID server are exact, error-free copies of the originals.  What can I use to do this on Win7?

 

TIA.

Another way to ensure you have good copies is to first (BEFORE you copy the files) add a checksum to all of the folders;  then when they're on the server you can verify the checksums.

 

This works well for doing that:  http://corz.org/windows/software/checksum/

 

You install it; then highlight the folder that contains your media files, right-click, and select "Create checksums".

 

Then you can test the files at any time by right-clicking on the folder and selecting "Verify checksums".    So if you copied the folders to UnRAID, you can simply "point" to the UnRAID copy, right-click, and choose "Verify checksums".    If they're all okay, it's statistically almost certain the files are good copies.

 

Would it work to create checksums on the original files/directories, then copy the .hash files to the server and verify the checksums on the server?  It seems to me that should work.  I agree, creating the checksums before hand would be best.


Sure, that would work.  You'd need to change the checksum utility's configuration so it puts all the checksums in a single file at the root -- otherwise it creates a checksum file in each folder, which would mean a lot of copying for you.  I prefer the per-folder files (so I can easily test any specific folder), but for what you want to do it would be much easier to have a single checksum file you could copy over and then run "Verify checksums" against on the array.
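If the checksum file uses the standard md5sum line format (hash, two spaces, filename), you could even run the verification on the unRAID box itself rather than pulling everything back over the network.  A minimal sketch, assuming a single file named media.md5 at the share root (the names and paths here are hypothetical):

    # on the unRAID server, from the directory the checksums are relative to
    cd /mnt/user/Media
    md5sum -c media.md5 | grep -v ': OK$'     # show only files that fail

Whether the Corz .hash files are directly compatible with md5sum -c is an assumption worth testing on a small folder first.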

 


garycase: I'm actually posting this based on your comments in this thread (http://lime-technology.com/forum/index.php?topic=29514.0;topicseen) but figured it would be less of a thread-jack if I posted here. 

 

Anyway my questions are:

 

Using the Windows checksum program pointed at unRAID, doesn't the program have to pull the entire file over the network before it can create the checksum?  Given that, won't it take a very long time versus creating the checksums natively on the server the first time it is run?

 

Which of course brings me to my main question: is it possible to create the checksums directly on unraid the first time in such a way that the windows checksum app will be able to take over ongoing checksum creation and validation?  I just have this image of my PC and Array running full bore for days hammering my network trying to transfer all my files :o

TeraCopy is a free program that can copy files and verify checksums; you can find it here: http://codesector.com/teracopy

It would be a great feature if TeraCopy had the option to save checksum files alongside whatever files it copied.  That would avoid the huge waste of reading a BIG file twice -- once to create a checksum file for it, and a second time to copy it to the server.  If you are copying from a *nix machine, you can accomplish the whole thing in a single pass with a tee command in a pipe.  I haven't found a nice GUI with such functionality for Windows, though.
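For what it's worth, a single-pass copy-plus-hash with tee might look like the sketch below; the filename and the /mnt/user destination are just examples:

    # read the source once: tee writes the copy while md5sum hashes the same stream
    tee /mnt/user/Movies/movie.mkv < movie.mkv | md5sum | awk '{print $1 "  movie.mkv"}' > movie.mkv.md5

md5sum reports the hash against "-" because it read from stdin, so the awk step just substitutes the real filename to produce a standard checksum line.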

 

 

Yes, you can use md5deep to create and check md5 checksums on the server itself.
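As a rough sketch (assuming md5deep is installed on the server; /mnt/disk1 and the /boot paths are just examples):

    # create recursive checksums for everything on disk1
    md5deep -r /mnt/disk1 > /boot/disk1.md5

    # later: print any file whose current hash is NOT in the saved list
    md5deep -r -x /boot/disk1.md5 /mnt/disk1

No output from the second command means every file still matches.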

 


Well right, but I'm with Gary that the Windows app seems to be much more usable in the long run.  My concern is the initial time to hash my entire array -- and mine isn't even that big.  After a bit more research on md5deep, hashdeep, and Checksum, my question is: what would be the best way to use md5deep/hashdeep for the initial pass such that I can then use Checksum to verify those hashes when needed and create new hashes as I add new files?

Using the Windows checksum program pointed at unRAID, doesn't the program have to pull the entire file over the network before it can create the checksum?  Given that, won't it take a very long time versus creating the checksums natively on the server the first time it is run?

 

Yes, it indeed has to read the entire file to create the checksums.  With a Gb network, that's nearly as fast as if it were being done natively on the UnRAID box, but of course it also means your client (e.g. your Windows box) is busy for many hours while creating the checksums.  You can, of course, still use your Windows box with no problem during these computations [just don't shut it down or reboot  :) ].

 

 

is it possible to create the checksums directly on unraid the first time in such a way that the windows checksum app will be able to take over ongoing checksum creation and validation?

 

You can create and verify checksums on the UnRAID box using the Linux utility md5deep.  Note there is also a Windows version of md5deep ... but this would have the same "issue" you've already noted -- the Windows box would be busy for hours creating/verifying the checksums.

 

Note that once you've created your checksums, if you do another "Create checksums" on the same disk, it will prompt you that it's already found checksums -- you then click on "Synchronize" and it will only recompute them for folders that have changed.    If nothing's changed, it only takes a couple minutes to do the check.

 

I simply prefer doing everything from Windows -- and I really like the very simple interface of the Corz utility.


I guess my concern is that with a saturated network and reads from any given drive I'll have problems streaming from Plex.  But I suppose you're correct that doing it natively from unRAID versus through a gigabit network might result in about the same speed.  Not to mention that either way the drive being scanned will still be slammed with reads, possibly causing streaming issues.  I suppose I won't know until I try [shrug].  I will also say that my ability to reliably see 100+ MB/s over my network is sketchy.  I seem to hover more around 80 MB/s even reading from my cache drive (640GB WD Black), which will surely impact overall hashing speed :-(

 


Well yeah, I too want to use the Corz utility after the initial "slog".  When you say I'll be prompted that checksums have been found, are you referring to Corz's utility or md5deep?  Because my ideal would be to use the md5deep CLI followed by Corz's Windows utility.
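One possible bridge, purely as a sketch: build the master list with md5deep on the server, then split it into one file per folder so a per-folder verifier has something to pick up.  The media.hash name is made up, and whether Corz's verifier accepts plain hash-plus-filename lines is an assumption you'd want to test on a single folder first:

    # split "<hash>  <full path>" lines into a per-folder hash file
    while read -r hash path; do
        printf '%s *%s\n' "$hash" "$(basename "$path")" >> "$(dirname "$path")/media.hash"
    done < /boot/disk1.md5

Filenames with unusual leading whitespace would need more care than this.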

 

Well at this point I probably just need to get my hands dirty so I can ask smarter questions :)  Thanks for the tips.

Try the Corz utility -- I doubt you'll ever switch  :)

 

By the way, I don't use it for an entire share ... I do the share "elements" one disk at a time.

 

i.e. I have a "DVDs" share ... but when I created the checksums I did them for \\Tower\Disk1\DVDs, then for \\Tower\Disk2\DVDs, ... until they were all done (14 disks), instead of \\Tower\DVDs (which would definitely have taken a VERY long time).
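If you ever do the first pass natively on the server instead, the same disk-at-a-time idea carries over; a hypothetical loop over unRAID's per-disk mounts, assuming each disk has a DVDs folder:

    # one checksum list per data disk, saved to the flash drive
    for d in /mnt/disk*; do
        md5deep -r "$d/DVDs" > "/boot/$(basename "$d")-DVDs.md5"
    done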

 

There's really no reason to do a "verify" on any regular basis -- you may want to do it once after they're all created just to confirm all's okay; but after that it's only necessary if a parity check indicates some sync errors and you want to confirm they're not in the data, or if a disk shows read errors and you want to check the files on that disk.

 

Yeah, I had no intent to do regular checks, just when (not if) I ever have a failed parity check.  I've been running with scissors for too long IMO by not having checksums.

 

Well right, so doing it piecewise will at least let me run bits at a time at night and during the work day, followed by "time off" while I might be streaming from Plex in the evening.

 

So now you have me thinking more ... doing it your way, you ended up with a single hash file for each disk, right?  Thinking more about it, I guess that is the best way to do it for the intended purpose of dealing with parity failures, since you'd need to check the whole disk.  If I had hash files on a per-folder basis, would I have to check each folder in succession, or can I tell Checksum to check all folders?  Even if I could, would there be any reason to do that?

 

I guess at this point I'm asking more for an idea of best practice regardless of the application I use, with the understanding that my main concern is dealing with a failed parity check and finding the files that have gone bad.

Actually, doing it the way I do ends up with a hash file for each folder -- in my case, since I store all my DVDs in individual folders, that's one per DVD.  You can configure Checksum to keep one hash file for the entire disk, but I prefer the default one-per-folder setup.  Either way will work fine.

 

You can still create/verify the entire disk with a single command => you just right-click on a disk [e.g. \\Tower\disk1]  or the part of the disk you want to work on [\\Tower\disk1\DVDs] and it will process all of the folders it finds.

 

Ahhh ok, I see.  And I take it it will work the same way when verifying?

 

Yes, you can easily verify a single folder, an entire disk, or even an entire share ... simply by selecting what you want; right-clicking; and selecting "verify checksums".

 

The only time it really took a LONG time (in my case ~ a week) was when I first started using it and had to generate the checksums for all of the content on my disks (I did this not only for my arrays, but also for all of my backup disks -- I actually used 3 computers so I could process several disks at a time).

 

:o Yeah, 3 weeks of my desktop being awake 24/7 and my network being hammered is what I am trying to avoid by generating the hashes on unRAID natively.

 

Well, after last night's fiasco I've been promised a night to "catch up on doing nothing", which in my case means getting to hack around on the system :)

 

;-) Yeah, I know you're probably right.  Like I said, I need to start playing around to really know what I'm dealing with here.  I just didn't relish pushing 3.5TB of data over my network, but hey, it's fire and forget anyway, right?  And really, if I think about it, I can probably skip one of my largest directories, since the data is replaceable, disposable, and slightly fault tolerant.
