Checksum Suite


Squid

I will probably go back to checksums only once dual parity is available.

 

Dual parity won't help in the case you described => if the files are corrupted, then BOTH parity disks will have been updated to reflect the corruption.  You'd still need some way of correcting the corrupted data.

 

The corruption happened because a 2nd disk had read errors during a rebuild of another failed disk. With dual parity I could disable the 2nd disk and rebuild both from parity, or am I wrong?


In that case it would indeed be okay => since the source of the corruption was a 2nd bad disk.  You wouldn't even have to disable it ... the data should automatically be corrected from the parity information when a read error is detected.    But the concept is as I noted -- if data is corrupted for some other reason, both parity drives would reflect the corruption.

 

 


@Squid:

Is it correct that if a verification pass is missed, it won't be executed until the next scheduled time is reached? My server is sleeping or powered down most of the time. I would really like it if the plugin could check whether verification passes have been missed and execute them in the scheduled order as soon as the server is running again. A user option to toggle such behaviour would be much appreciated. :)


That's correct.  The scheduled verifications (% of drive / share) run through a cron job.  There is currently no built-in method to have it reschedule a missed time.
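For anyone wanting to approximate a catch-up anyway, here is a rough sketch of the idea (not part of the plugin; the stamp path and interval below are hypothetical): a boot-time script that kicks off a verification whenever a stamp file is older than the schedule interval.

```shell
#!/bin/sh
# Hypothetical catch-up check, e.g. called from the flash drive's go file.
# STAMP would be touched whenever a verification completes; if it is older
# than the schedule interval when the server boots, run one now.
STAMP=/boot/config/plugins/checksum/last_verify   # hypothetical path
INTERVAL=$(( 7 * 24 * 3600 ))                     # weekly schedule, in seconds

now=$(date +%s)
last=0
[ -f "$STAMP" ] && last=$(date -r "$STAMP" +%s)

if [ $(( now - last )) -ge "$INTERVAL" ]; then
    echo "verification overdue"   # replace with the actual verification command
    touch "$STAMP"
fi
```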

 

TBH, after I got this plugin to the point where it had the features I was interested in and did what I wanted it to do, I more or less lost interest in further updates.  (It just wasn't as much fun to me as CA is.)

 

I am very grateful that bonienl is working on a GUI for bunker, which should do everything and more and be far more polished than this plugin is.  (Even though his approach is sound, I don't particularly agree with it.)  Once bonienl's plugin is out of the WIP stage, I was actually going to deprecate this plugin in favor of his (even though for my purposes I would actually continue to use it).

 


I'm getting this in my logs by the dozen or so daily.  Anyone know how to correct it?  I have six shares set up the same way, yet only this one is giving this message in the log.

 

crond[1476]: failed parsing crontab for user root: /usr/local/emhttp/plugins/checksum/scripts/checksumShare.php "/mnt/user/Pictures" &>/dev/null 2>&1
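Hard to say without seeing the full crontab line, but two things worth checking (these are guesses, not a confirmed cause): whether the generated entry has valid time fields, and whether the bash-only `&>/dev/null` shorthand (redundant next to `2>&1` anyway) upsets crond's parser. You can inspect and dry-run it like this:

```shell
# Show the exact entries crond is trying to parse for root
crontab -l | grep checksumShare

# Run the command by hand with the portable redirection form to rule out
# script errors ("&>" is a bash-ism; ">/dev/null 2>&1" works everywhere)
/usr/local/emhttp/plugins/checksum/scripts/checksumShare.php "/mnt/user/Pictures" >/dev/null 2>&1
```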


(Even though his approach is sound, I don't particularly agree with it.)

 

I think this is moving in the right direction, and I am grateful for both your and bonienl's work on this.

 

I'm assuming what you don't agree with is the approach of putting the checksum info into extended attributes instead of creating a .hash (or whatever...) file?

 

Also, if you're looking for the next cool new project to start... maybe consider an rsync GUI which makes unRAID-to-unRAID (or other rsync clients) backups easier... lots of discussion about that right now in the General Support forum... people really don't like that they have to create their own scripts and dig into the command line to create backups....


Sparklyballs' WebSync docker might fit the bill.

Good point. I'll have to look into that.

 

Edit: Looking into it... first question, why is a GUI for a tool built into unRAID a docker...?

 

Edit2: Kept looking and figured it out; it's based on a WebGUI designed by someone else... that makes more sense.

 

Edit3: Looks like development has stalled for a few months.

Link to comment

It's a beta for a possible ls.io version.  Just pester sparklyballs about it.


I have a problem with duplicate hash values: one will show corruption for a file, whereas another hash value will show that the same file is intact.

 

Prior to your plugin, I manually created MD5 hash values for each file using the corz utility.

 

On installing and running your plugin, setting it again to create an MD5 hash value per file, I assumed that it would monitor the existing hash values instead of creating new ones.

 

Largely this is the case, but I have situations where the original hash value is shown and another hash value has been created by your plugin. In some cases verifying both hash values will report that the file is intact, but in other cases the original hash value reports that the file is corrupt while the new hash value reports that the file is intact.

 

For example, I have an mp3 file called Test; the original hash file is Test.md5 and the new hash file is Test.mp3.md5.

 

In all cases where I have two hash values for a file, the original hash value is *.md5 and the newly created hash value is *.file_extension.md5.

 

I am concerned that on running the Verification Tool, I don't know whether it looks at the old hash value, the new value, or both. As such it could be reporting that a file is corrupt when it is not.

 

I guess that I could just delete all MD5 files and start again, especially in light of the Bunker GUI plugin?


If the original hash no longer matches the file, then the file has changed since that hash was created, whether from corruption or something else. If the file still plays OK, I suspect something has updated the tags in the mp3. I'll let Squid answer the question of which hash his verification would use, but I suspect it would use the one that follows his naming convention; i.e., *.file_extension.md5

Having done some more detective work, it appears that all files have had MD5 hash values created in the format *.file_extension.md5 by the plugin, regardless of whether they already had hash values in the format *.md5 or not.

 

The plugin was installed in December, so files after this date are OK, but files before this date have two MD5 hash values.

 

Moving forward, it appears that hash values in the format *.file_extension.md5 are going to be used by both Squid's plugin and the competing Bunker plugin.

 

How am I going to delete hash values in the old *.md5 format whilst leaving the newer hash values, and automatically too, as I have 28TB of files!!!

 

Or am I best deleting all md5 hash values and then letting the plugin recalculate them again?
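One possible way to automate the selective cleanup (a sketch only, and it rests on an assumption: every data file keeps its extension, so an old-style `Test.md5` never sits next to a real file literally named `Test`). An old-style hash's base name has no matching file, while a new-style `Test.mp3.md5` sits beside `Test.mp3`:

```shell
#!/bin/sh
# Sketch: list old-style "name.md5" hashes, leaving new-style
# "name.ext.md5" alone.  Dry run by default -- change echo to rm
# once the output looks right on a test share.
find /mnt/user -type f -name '*.md5' | while IFS= read -r hash; do
    base="${hash%.md5}"        # "Test.mp3.md5" -> "Test.mp3"; "Test.md5" -> "Test"
    if [ ! -e "$base" ]; then  # no data file of that exact name => old-style hash
        echo "old-style: $hash"
    fi
done
```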


I am very grateful that bonienl is working on a GUI for bunker, which should do everything and more and be far more polished than this plugin is.  (Even though his approach is sound, I don't particularly agree with it.)  Once bonienl's plugin is out of the WIP stage, I was actually going to deprecate this plugin in favor of his (even though for my purposes I would actually continue to use it).

 

I don't see the plugins as replacements for each other. We both have our own approaches and functionality. It would be a shame if you abandoned your hard work completely; there will be folks in favour of either approach, and yours can also do par2 for error restoration. I guess what I want to say is, even though you want to put development on a low burner, your plugin can still be alive and kicking!


It was the par2 that really took the fun out of it (especially since I couldn't justify it for my own needs).  Beyond that, the md5 / sha / b2s support is still operational and will remain so, if only because I personally prefer my approach with separate checksum files that stay portable as you copy the files from one medium to another, etc.  But I think it's a fair assumption that your interface will be far more polished than mine.

 

But, as requested, here is the format of the hash files if you wish to import them.  Basically a standard checksum format, with comments added so that corz can interpret it:

 

#Squid's Checksum
#
#md5#30-01-10_1239.jpg#[email protected]:26
a75d06dceea03c4a4a98690db387a176  30-01-10_1239.jpg
#md5#30-01-10_1240.jpg#[email protected]:26
a3de82ba450a3ceae1c19f223954b768  30-01-10_1240.jpg
#md5#30-01-10_1305.jpg#[email protected]:26
ac98037b5e2473a66d00eaf1bffcbeeb  30-01-10_1305.jpg
#md5#30-01-10_1340.jpg#[email protected]:26
bff394ccda3d3dd663cb2b53f2b0d485  30-01-10_1340.jpg
#md5#30-01-10_2007.jpg#[email protected]:26
155027556f2179913f3b20a9baca76ee  30-01-10_2007.jpg
#md5#30-01-10_2331.jpg#[email protected]:26
a6d9ffb589bbe70f1e16c2869af4b0de  30-01-10_2331.jpg
#md5#31-01-10_0945.jpg#[email protected]:26
ca669c4f2e543da2465b3239f8c02be6  31-01-10_0945.jpg

 

The comment line is the algorithm (md5 | sha | blake2), followed by the filename, followed by the mtime of the file itself ** (in corz's obscure format).

 

sha as an identifier is actually sha256.  blake2 as an identifier is actually blake2s

 

No "*" on the filenames to indicate a binary file.  (Can't remember right now, but one of the checkers that I tested had issues with the *, and it's pretty much deprecated nowadays anyway - everything assumes binary.)

 

No paths included anywhere on the filenames - everything is relative for portability.

 

** My mtime matches the file exactly.  But if you do allow imports of preexisting hashes, be aware that corz has 2 bugs with the mtime: it can be out by +/- 1 second, and it can also be out by exactly +/- 1 hour (a bug in corz due to daylight savings).
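A nice side effect of this layout: everything after the `#` comment lines is plain `md5sum` output with relative paths, so a stock GNU coreutils install should be able to verify one of these files directly (sketch; the filename `photos.md5` and the path are hypothetical):

```shell
# Strip the "#" comment lines and feed the rest to md5sum -c.
# Must be run from the directory containing the files, because the
# hash file deliberately stores no paths.
cd /mnt/user/Pictures   # hypothetical location
grep -v '^#' photos.md5 | md5sum -c -
```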


Thanks for your reply

 

The file format for the md5 hash values was file_name.md5, created solely with corz on a manual basis. Listed below are the settings chosen in the corz settings ini file:

 

; Individual hashes..    [bool]              [default: individual_hashes=false]
;
; command-line switch: i
;
; Instruct checksum to *always* create individual hash files, even when working
; with folders. This is the same as passing "i" in your switches. Most useful
; when combined with file mask groups, e.g. crim(movies). Most people will want
; to leave this set to false.
;
individual_hashes=true

; Algorithm..                        [string: md5/sha1]  [default: algorithm=md5]
;
; command-line switch: s    (use sha1)
; command-line switch: 2    (use BLAKE2)
;
; Which algorithm to use when creating hashes?
; Choices are currently "md5", "sha1" or "blake2" (no quotes).
;
algorithm=md5
;

 

The following setting looks like I could use it to unify all hash values to the same format, i.e. *.file_extension.md5:

 

; Add file extensions?                  [bool]  [default: file_extensions=false]
;
; command-line switch: e
;
; This is for creation of individual per-file checksums only. For example,
; if you create a checksum of the file "foo.avi", the checksum will be named
; either "foo.hash" or "foo.avi.hash", the former being the default (false).
;
; Setting this to false, as well as being more elegant, enables checksum to
; store hashes for "foo.txt" and "foo.nfo", (if such files exist next to
; "foo.avi") all inside "foo.hash", which is neat.
;
file_extensions=false
;
; NOTE: if checksum encounters existing checksum files *not* using your
; current file_extensions setting, it will rename them to match, so in our
; example, if "foo.avi.md5" existed, it would be renamed to "foo.md5", and
; any other foo.files added to that single checksum file.

 

I appreciate that Checksum Suite and Bunker GUI are two different approaches; at the moment I am concerned just to get all my md5 hash files into the same format.

 


Just to complicate matters, I have now found some files which have only the original file_name.md5 hash values.

 

I have found that if I set the file_extensions switch to true and then manually choose Create Hash Values using the corz utility, it will successfully rename file_name.md5 hashes to file_name.file_extension.md5 hashes.

 

This is the same file_extensions setting quoted in my previous post; it looks like what I could use to unify all hash values to the same format, i.e. file_name.file_extension.md5.

 


 

I am trying it out on one share.

 

This could be a solution, but say I have a file with 2 hashes, one in file_name.md5 and one in file_name.file_extension.md5; the above method would result in 2 file_name.file_extension.md5 hashes with the same name.


Just copied a file into a folder where I have 2 checksums.

 

Folder contains

 

The Circus.wma

The Circus.md5 (how the checksum was created originally)

The Circus.wma.md5 (how the checksum was subsequently created by the plugin)

 

If I run corz manually with the file_extensions switch set to true, because it sees The Circus.wma and The Circus.wma.md5, it will not delete The Circus.md5.

 

If the folder contains just

 

The Circus.wma

The Circus.md5

 

On running create checksum manually, the corz program will rename The Circus.md5 to The Circus.wma.md5.

 

So I seem to have some files with file_name.md5 checksums, some with file_name.file_extension.md5 checksums, and some with both checksums.

 

Where both exist, in some cases both report the file as intact, and in other cases one checksum will report the file as intact and the other as corrupted. Where files are shown as corrupted, they are still playable.

 

I seem to have a right mess here, and with the benefit of hindsight perhaps I should have deleted all *.md5 files before installing the plugin.

 

What would be the best way forward, delete all *.md5 files and start again?

 

 


I'd just delete *.md5 and then redo your checksums.

 

Not the quickest way in terms of how long it will take, but certainly the quickest in terms of "human time" => just a one-line command followed by initiating the checksum computations.    MANY hours later it will be done ... but you don't have to do anything while that's happening.
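For the record, the one-line command described above would look something like this (sketch; double-check the path first, and consider listing before deleting):

```shell
# Dry run: see exactly which checksum files would go
find /mnt/user -type f -name '*.md5' -print

# Then remove them all in one pass and let the plugin rebuild from scratch
find /mnt/user -type f -name '*.md5' -delete
```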

 


What are the key differences (with respect to reliability, outcomes, speed and overheads) between this Plugin and the Dynamix File Integrity plugin created by bonienl?

 

I can see that clearly this plugin allows for recovery via par2, but I'd like to know if there are any more big differences. If it is only the ability for file recovery, I can already do that from my backup and vice versa. I find it unlikely that both my main and backup servers would experience rot on the same file at the same time.

 

Horses for courses - I am just looking at the benefits of one over the other.

