Verify Copied Files are Exact Copies


I am new to unRAID.  I have my server running and have been copying a bunch of media files to it via SMB and FTP.  Before I delete the source files, I would like to make sure that the files on the unRAID server are exact, error-free copies of the originals.  What can I use to do this on Win7?

 

TIA.

Another way to ensure you have good copies is to first (BEFORE you copy the files) add a checksum to all of the folders;  then when they're on the server you can verify the checksums.

 

This works well for doing that:  http://corz.org/windows/software/checksum/

 

You install it; then highlight the folder that contains your media files, right-click, and select "Create checksums".

 

Then you can test the files at any time by right-clicking on the folder and selecting "Verify checksums".    So if you copied the folders to UnRAID, you can simply "point" to the UnRAID copy, right-click, and choose "Verify checksums".    If they're all okay, it's statistically almost certain the files are good copies.

 

Would it work to create checksums on the original files/directories, then copy the .hash files to the server and verify the checksums on the server?  It seems to me that should work.  I agree, creating the checksums before hand would be best.


Sure, that would work.  You'd need to change the checksum utility's configuration so it puts all the checksums in a single file at the root -- otherwise it creates a checksum file in each folder, which would mean a lot of copying for you.  I prefer the per-folder files (so I can easily test any specific folder), but for what you want to do it would be much easier to have a single checksum file you could copy over and then run "Verify checksums" against on the array.
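If the checksum file uses the standard md5sum line format (hash, two spaces, filename), you could even run the verification on the unRAID box itself rather than pulling everything back over the network.  A minimal sketch, assuming a single file named media.md5 at the share root (the names and paths here are hypothetical):

    # on the unRAID server, from the directory the checksums are relative to
    cd /mnt/user/Media
    md5sum -c media.md5 | grep -v ': OK$'     # show only files that fail

Whether the Corz .hash files are directly compatible with md5sum -c is an assumption worth testing on a small folder first.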

 


garycase: I'm actually posting this based on your comments in this thread (http://lime-technology.com/forum/index.php?topic=29514.0;topicseen) but figured it would be less of a thread-jack if I posted here. 

 

Anyway my questions are:

 

Using the Windows checksum program pointed at unRAID, doesn't the program have to pull the entire file over the network before it can create the checksum?  Given that, won't it take a very long time versus creating the checksums natively on the server the first time it is run?

 

Which of course brings me to my main question: is it possible to create the checksums directly on unraid the first time in such a way that the windows checksum app will be able to take over ongoing checksum creation and validation?  I just have this image of my PC and Array running full bore for days hammering my network trying to transfer all my files :o

TeraCopy is a free program that can copy files and verify checksums; you can find it here: http://codesector.com/teracopy

It would be a great feature if TeraCopy had the option to save checksum files alongside whatever files it copied.  That would avoid the huge waste of reading a BIG file twice -- once to create a checksum file for it, and a second time to copy it to the server.  If you are copying from a *nix machine, you can accomplish the whole thing in a single pass with a tee command in a pipe.  I haven't found a nice GUI with such functionality for Windows, though.
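For what it's worth, a single-pass copy-plus-hash with tee might look like the sketch below; the filename and the /mnt/user destination are just examples:

    # read the source once: tee writes the copy while md5sum hashes the same stream
    tee /mnt/user/Movies/movie.mkv < movie.mkv | md5sum | awk '{print $1 "  movie.mkv"}' > movie.mkv.md5

md5sum reports the hash against "-" because it read from stdin, so the awk step just substitutes the real filename to produce a standard checksum line.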

 

 

Yes, you can use md5deep to create and check md5 checksums on the server itself.
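As a rough sketch (assuming md5deep is installed on the server; /mnt/disk1 and the /boot paths are just examples):

    # create recursive checksums for everything on disk1
    md5deep -r /mnt/disk1 > /boot/disk1.md5

    # later: print any file whose current hash is NOT in the saved list
    md5deep -r -x /boot/disk1.md5 /mnt/disk1

No output from the second command means every file still matches.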

 


Well right, but I'm with Gary that the Windows app seems to be much more usable in the long run.  My concern is the initial time to hash my entire array -- and mine isn't even that big.  After a bit more research on md5deep, hashdeep, and Checksum, my question is: what would be the best way to use md5deep/hashdeep for the initial pass such that I can then use Checksum to verify those hashes when needed and create new hashes as I add new files?

Using the Windows checksum program pointed at unRAID, doesn't the program have to pull the entire file over the network before it can create the checksum?  Given that, won't it take a very long time versus creating the checksums natively on the server the first time it is run?

 

Yes, it indeed has to read the entire file to create the checksums.  With a Gb network, that's nearly as fast as if it were being done natively on the UnRAID box, but of course it also means your client (e.g. your Windows box) is busy for many hours while creating the checksums.  You can, of course, still use your Windows box with no problem during these computations [just don't shut it down or reboot  :) ].

 

 

is it possible to create the checksums directly on unraid the first time in such a way that the windows checksum app will be able to take over ongoing checksum creation and validation?

 

You can create and verify checksums on the UnRAID box using the Linux utility md5deep.  Note there is also a Windows version of md5deep ... but this would have the same "issue" you've already noted -- the Windows box would be busy for hours creating/verifying the checksums.

 

Note that once you've created your checksums, if you do another "Create checksums" on the same disk, it will prompt you that it's already found checksums -- you then click on "Synchronize" and it will only recompute them for folders that have changed.    If nothing's changed, it only takes a couple minutes to do the check.

 

I simply prefer doing everything from Windows -- and I really like the very simple interface of the Corz utility.


I guess my concern is that with a saturated network and reads from any given drive I'll have problems streaming from Plex.  But I suppose you're correct that doing it natively from unRAID versus through a gigabit network might result in about the same speed.  Not to mention that either way the drive being scanned will still be slammed with reads, possibly causing streaming issues.  I suppose I won't know until I try [shrug].  I will also say that my ability to reliably see 100+ MB/s over my network is sketchy.  I seem to hover more around 80 MB/s even reading from my cache drive (640GB WD Black), which will surely impact overall hashing speed :-(

 


Well yeah, I too want to use the Corz utility after the initial "slog".  When you say I'll be prompted that checksums have been found, are you referring to Corz's utility or md5deep?  Because my ideal would be to use the md5deep CLI followed by Corz's Windows utility.
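One possible bridge, purely as a sketch: build the master list with md5deep on the server, then split it into one file per folder so a per-folder verifier has something to pick up.  The media.hash name is made up, and whether Corz's verifier accepts plain hash-plus-filename lines is an assumption you'd want to test on a single folder first:

    # split "<hash>  <full path>" lines into a per-folder hash file
    while read -r hash path; do
        printf '%s *%s\n' "$hash" "$(basename "$path")" >> "$(dirname "$path")/media.hash"
    done < /boot/disk1.md5

Filenames with unusual leading whitespace would need more care than this.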

 

Well at this point I probably just need to get my hands dirty so I can ask smarter questions :)  Thanks for the tips.

Try the Corz utility -- I doubt you'll ever switch  :)

 

By the way, I don't use it for an entire share ... I do the share "elements" one disk at a time.

 

i.e. I have a "DVDs" share ... but when I created the checksums I did them for \\Tower\Disk1\DVDs, then for \\Tower\Disk2\DVDs, ... until they were all done (14 disks), instead of \\Tower\DVDs (which would definitely have taken a VERY long time).
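If you ever do the first pass natively on the server instead, the same disk-at-a-time idea carries over; a hypothetical loop over unRAID's per-disk mounts, assuming each disk has a DVDs folder:

    # one checksum list per data disk, saved to the flash drive
    for d in /mnt/disk*; do
        md5deep -r "$d/DVDs" > "/boot/$(basename "$d")-DVDs.md5"
    done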

 

There's really no reason to do a "verify" on any regular basis -- you may want to do it once after they're all created just to confirm all's okay; but after that it's only necessary if a parity check indicates some sync errors and you want to confirm they're not in the data, or if a disk shows read errors and you want to check the files on that disk.

 

Yeah, I had no intent to do regular checks, just when (not if) I ever have a failed parity check.  I've been running with scissors for too long IMO by not having checksums.

 

Well right, so doing it piecewise will at least let me run bits at a time at night and during the work day, followed by "time off" while I might be streaming from Plex in the evening.

 

So now you have me thinking more ... doing it your way, you ended up with a single hash file for each disk, right?  Thinking more about it, I guess that is the best way to do it for the intended purpose of dealing with parity failures, since you'd need to check the whole disk.  If I had hash files on a per-folder basis, would I have to check each folder in succession, or can I tell Checksum to check all folders?  Even if I could, would there be any reason to do that?

 

I guess at this point I'm asking more for an idea of best practice regardless of the application I use, with the understanding that my main concern is dealing with a failed parity check and finding the files that have gone bad.

Actually, doing it the way I do ends up with a hash file for each folder -- in my case, since I store all my DVDs in individual folders, that's one per DVD.  You can configure Checksum to keep one hash file for the entire disk, but I prefer the default one-per-folder setup.  Either way will work fine.

 

You can still create/verify the entire disk with a single command => you just right-click on a disk [e.g. \\Tower\disk1]  or the part of the disk you want to work on [\\Tower\disk1\DVDs] and it will process all of the folders it finds.

 

Ahhh ok, I see.  And I take it it will work the same way when verifying?

 

Yes, you can easily verify a single folder, an entire disk, or even an entire share ... simply by selecting what you want; right-clicking; and selecting "verify checksums".

 

The only time it really took a LONG time (in my case ~ a week) was when I first started using it and had to generate the checksums for all of the content on my disks (I did this not only for my arrays, but also for all of my backup disks -- I actually used 3 computers so I could process several disks at a time).

 

:o Yeah, 3 weeks of my desktop being awake 24/7 and my network being hammered is what I am trying to avoid by generating the hashes on unRAID natively.

 

Well, after last night's fiasco I've been promised a night to "catch up on doing nothing", which in my case means getting to hack around on the system :)

 

;-) Yeah, I know you're probably right.  Like I said, I need to start playing around to really know what I'm dealing with here.  I just didn't relish pushing 3.5TB of data over my network, but hey, it's fire and forget anyway, right?  And really, if I think about it, I can probably skip one of my largest directories, since the data is replaceable, disposable, and slightly fault tolerant.
