Checksum Suite


Squid


How do you cancel manual Checksum Creation in progress?

It's changing the modification dates of my files!

Changing the modification dates of your files? Or just the modification dates of the folders?

 

The folders are modified by writing the .hash file for the folder.

Creating a checksum does not change the date of the file.

Certainly hasn't on my system. Just folder dates when the .hash is written. I expect that is what he was seeing.

Updated to 2015.11.04

 

Added in scheduled cron jobs for creation and verification jobs.

 

All verification jobs have a % to verify.  This is the amount that will be verified each time the job runs.  E.g., if it's set to 5%, the first time the job runs it will verify 0-4% of the share.  The next time it runs it will verify 5-9%, and so on.  The oldest files on the drive are always verified first.
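To make the rolling window concrete, here is a rough sketch (not the plugin's actual code) of how a "verify the next 5%, oldest modification time first" pass could be expressed in shell; the share path and the pass counter are placeholders:

# list every file with its mtime, oldest first, keeping only the paths
find /mnt/user/Movies -type f -printf '%T@ %p\n' | sort -n | cut -d' ' -f2- > /tmp/all_files
total=$(wc -l < /tmp/all_files)
chunk=$(( (total * 5 + 99) / 100 ))      # 5% of the files, rounded up
pass=0                                   # how many runs have already completed
start=$(( pass * chunk + 1 ))
# the slice of files this run would verify
sed -n "${start},$((start + chunk - 1))p" /tmp/all_files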

 

 

Very cool idea!

Thanks. I don't really see the need to be forced to always run full checksums.  Personally, I'd rather run them a bit at a time every week or so over the course of 6 months or a year than in one go once a year, and I think that running a full verification once a month is a bit of overkill even for the most paranoid among us.

 

If files are moved or a drive has been rebuilt, someone will probably want to do a full verification.

Personally, I intend to do a full verification per drive for files not verified within a certain number of weeks, but only as much as can be done within a 6 hour window or until a certain clock time.

i.e. start at midnight and verify as much as can be done until 6am when I start working on the server.

 

When you say oldest files first, is that based on file modification time or the last verification time?


...

 

If files are moved or a drive has been rebuilt, someone will probably want to do a full verification.

Personally, I intend to do a full verification per drive for files not verified within a certain number of weeks, but only as much as can be done within a 6 hour window or until a certain clock time.

i.e. start at midnight and verify as much as can be done until 6am when I start working on the server.

Interesting idea.  Will add it to the white board.

When you say oldest files first, is that based on file modification time or the last verification time?

File modification time.  Of course, this all assumes that a checksum has been created for the file(s) in question.  There may be an overlap of a file or two between passes due to rounding errors etc. in the math, but I'd rather the plugin verify files A-H on pass one and files H-O on pass two than miss a file.

 

It currently only issues notifications (email / display) on failures, but I was thinking that it would be a good idea to also issue notifications on success.  Also, in the schedule sections, if the last checked time (and %) is green then the last pass was good.  If it's red, then one or more files failed, and the Failure Log will show the details.

 

Additionally, the scheduling is rather rudimentary, only offering Daily (time of day), Weekly (day of week, time of day), Monthly (day of month, time of day), and Yearly (month, day of month, time of day) to keep things simple.  For advanced scheduling (e.g. bi-weekly, or something like the last Saturday of the month), you can enter a custom cron entry.
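For reference, a few standard five-field cron expressions (minute, hour, day of month, month, day of week); whether the plugin's custom field takes just the schedule portion or a full crontab line is an assumption on my part:

30 2 * * 6      # every Saturday at 02:30
0 3 1,15 * *    # the 1st and 15th of every month at 03:00
# "last Saturday of the month" can't be expressed in the five fields alone; the usual
# trick is to run every Saturday and have the command check whether the following
# Saturday falls in a different month before doing any work.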


No it's not folder dates... It's a file type "sparsebundle". Definitely changed the mod dates.

The "sparseimage" files and other file types are unchanged.

 

Well, I also did something risky... I stopped the array.

Had trouble restarting, but I did a reboot and all seems well.

Then I had to go in and delete all the md5-type files created up to that point.

I wish the md5 files could be stored in a separate folder and not the folders being checked since the mod dates also change on those.

Also I wanted to check just disk 1 as a test but it also checked and changed dates on disk 2.

About to do a parity check.


...

It currently only issues notifications (email / display) on failures, but I was thinking that it would be a good idea to also issue notifications on success...

Success notifications should probably be a checkbox. I personally like getting success emails from all my various automated processes but others might rather not.

...

It currently only issues notifications (email / display) on failures, but I was thinking that it would be a good idea to also issue notifications on success...

Success notifications should probably be a checkbox. I personally like getting success emails from all my various automated processes but others might rather not.

Agree

Not sure what you mean here.

 

I'm referring to my reply 4 up from here.

Set that file type as an excluded file.  But in test after test, this plugin has never modified the date (or the file itself) of a file being hashed.

 

 

Storing the checksums in a separate folder is low on the ultimate feature list (but it is there).


...

It currently only issues notifications (email / display) on failures, but I was thinking that it would be a good idea to also issue notifications on success...

Success notifications should probably be a checkbox. I personally like getting success emails from all my various automated processes but others might rather not.

Ask and ye shall receive.  Updated to 2015.11.04a

I wish the md5 files could be stored in a separate folder and not the folders being checked since the mod dates also change on those.

But thinking about it further, storing hashes in a separate folder introduces another problem.  At that point, you are pretty much stuck with using absolute paths within the hash files (e.g. /mnt/user/movieshare/movieA/moviefile.mkv).

So now you can't easily verify the files if / when you copy them to a removable device or share them with a buddy.  You can't move a file from one folder to another without having to recalculate the hash files.  And you can't easily (though it's not impossible) do disk verifications, etc.
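To illustrate the difference (hash values and names below are hypothetical): a hash file that lives next to the files can use bare relative names, so md5sum -c works wherever the folder ends up, while a centrally stored hash file is tied to absolute paths on this particular server.

# folder.hash stored alongside the files (portable):
#   <md5>  moviefile.mkv
# hash file stored centrally (only valid on this server):
#   <md5>  /mnt/user/movieshare/movieA/moviefile.mkv

# after copying the folder to a removable drive, the relative form still verifies:
cd /mnt/usbstick/movieA && md5sum -c folder.hash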

 

 


No it's not folder dates... It's a file type "sparsebundle". Definitely changed the mod dates.

Quick and dirty research here: a sparsebundle, while displayed on your Mac as a file, is actually a folder.  Hence why it's changing the dates.  Not much I can do about that if Apple decides in its infinite wisdom to create a file type that is actually a folder.  Your only option in that case, if you can't live with the folder's modification time changing when the files contained within get updated, is to not monitor / create checksums on that particular share.

 

http://blog.fosketts.net/2015/07/22/how-to-use-mac-os-x-sparse-bundle-disk-images/

 

Excerpt:

 

It consists of a “bundle” in Mac OS X parlance, which is a directory which is treated as a single file by Finder.

 

But the bigger question is: did the contents of the bundle get corrupted because of the extra file (.hash) being stored within it?  If they didn't, then it's not a big deal and only the time changed.  If it did corrupt things, then I'd have to investigate how to automatically exclude a sparse bundle.
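If automatic exclusion ever becomes necessary, one generic way to skip bundle contents when walking a share is to prune directories whose names end in .sparsebundle. This is only a find-based illustration, not a plugin setting:

# list files to hash, skipping anything inside a *.sparsebundle directory
find /mnt/user/TimeMachine -type d -name '*.sparsebundle' -prune -o -type f -print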

 


I wish the md5 files could be stored in a separate folder and not the folders being checked since the mod dates also change on those.

But thinking about it further, storing hashes in a separate folder introduces another problem.  At that point, you are pretty much stuck with using absolute paths within the hash files (e.g. /mnt/user/movieshare/movieA/moviefile.mkv).

So now you can't easily verify the files if / when you copy them to a removable device or share them with a buddy.  You can't move a file from one folder to another without having to recalculate the hash files.  And you can't easily (though it's not impossible) do disk verifications, etc.

Just some food for thought.

An idea might be to store the hash and hash/verify time in extended attributes, like bunker and bitrot do, then export them into the current folder.hash or wherever the user chooses.

Or potentially use a central location.

 

I'm using a C program to scan the whole filesystem and store the data into a .gdbm file.

I do this because it's extremely fast once the file system is cached.

It does double duty as a dircache clone (and a file-list caching utility).

 

I scan the file system with my gdbmsum tool, keeping stat blocks in RAM and matching each against the stat block stored in the GDBM.

For most purposes, mtime and size are all that's needed.

 

I store the whole binary stat block because I'm lazy and a binary match of stat struct to stat struct is fast.

When the stat block changes, I calculate the md5 since the chances are the file is already in the buffer cache.

(I'm still debating whether to do it immediately, wait until mtime is older than some age, do it in a batch overnight, and/or double-check whether the file is open.)

 

From there I can export the data as one large md5sum file, or pipe it to grep and/or md5sum -c.

 

What I plan to add is storage in the extended attributes, and then some other tool to traverse the directories and pull out the data to create the folder.hash files.
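As a rough sketch of that last step (assuming the hash lives in an attribute named user.hash.value, which is only an illustrative name), something like this could walk a tree and emit one corz-style folder.hash per directory:

# for every directory, collect the stored hashes of its files into folder.hash
find /mnt/disk3/Movies -type d | while read -r dir; do
    ( cd "$dir" || exit
      for f in *; do
          [ -f "$f" ] || continue
          h=$(getfattr --only-values -n user.hash.value "$f" 2>/dev/null) || continue
          printf '%s  %s\n' "$h" "$f"
      done > folder.hash )
done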

 

In my particular case, I'm using the GDBM format as it's extremely fast.

I can traverse a filesystem containing over 250,000 files in 2 seconds with lookups of each file into the GDBM.

 

i.e.

276469 items processed in 2 seconds, 256582 fetches, 256582 existing, 0 non-existing, 0 stores, 0 deletes, 0 errors

sync completed in 0 seconds

operation completed in 2 seconds

 

This allows me to dir/inode cache the filesystem and also cache the stat block in the GDBM file for changes.

Then append the md5 inline or via some other batch operation.

 

Another example

131990 items processed in 1 seconds, 125690 fetches, 125690 existing, 0 non-existing, 0 stores, 0 deletes, 0 errors
134716 items processed in 1 seconds, 128316 fetches, 128316 existing, 0 non-existing, 0 stores, 0 deletes, 0 errors
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/11 Electric Rock Reef - Summer Prayer (A B C - Elements and Shadows Mix).mp3
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/05 ZigZag to Paradies - Speed Down (Remixed in Budapest Hotel Cut).mp3
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/13 Lover Banks - Hideaway (Cassette Sunset Ibiza Retro Mix).mp3
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/04 Unchained Bars - Sleepless Eyes (Beyond the Beach Mix).mp3
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/07 Bingo Bus - Passenger of a Dream (Sky Full Of Dance Mix).mp3
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/09 Non-Stop Listening - Nature Is Calling (Sexy Summer Session Cut).mp3
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/01 Chimichanka - Whisper of the Ocean (Never Felt So Good Cut).mp3
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/02 Backyard Players - Light the Darkness (Dance to the Limit Version).mp3
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/03 Ocean View Suite - Down by the Pier (Sigma Sound Cut).mp3
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/14 Timejumpers - Follow Rivers (Breeze of Chill Mix).mp3
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/folder.jpg
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/16 Mazed Emotions - Everybody Can Be Free (Turn This Beat Around Cut).mp3
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/12 Moderate Jungle - The Sun (Shatter Me at Midnight Mix).mp3
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/15 Glitch and Wet - Salt on My Skin (Jealous No More Cut).mp3
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/06 Waterfront Lounge - Stones Against Water (Ibiza Mix).mp3
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/08 Cliffside Chiller - Sunny Beach Days (Beat Drops Out Edit).mp3
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/10 Ambient Therapy - Hear a Whisper (Boys Relaxing At the Disco Cut).mp3
137044 items processed in 3 seconds, 130561 fetches, 130544 existing, 0 non-existing, 17 stores, 0 deletes, 0 errors
....
276487 items processed in 3 seconds, 256599 fetches, 256582 existing, 0 non-existing, 17 stores, 0 deletes, 0 errors
sync completed in 1 seconds
operation completed in 4 seconds

 

I'm not suggesting any changes, just presenting some food for thought.

 

It might be worthwhile to consider a central DB/GDBM, then export folder.hash files after processing.

The downside of GDBM is concurrency: only one writer is allowed, though multiple readers are.

That's why I had been working on an SQLite variant, but that comes at the cost of speed and size.

 

For me, my needs are much larger. I have so many files I need to export a file list nightly in case I want to search for something.

For one of my disks it takes over an hour just to walk the file tree of 750,000 files.

The gdbm can be exported easily into a centralized filelist.

For 250,000 files the overhead is:

-rw-rw-r-- 1 root root 106398151 2015-11-04 17:08 /mnt/disk3/filedb/disk3.md5sum.gdbm

with 133 bytes + stat struct + time() as a record.

 

Exporting this file format is very fast, as in:

root@unRAID:/home/rcotrone/src.slacky/gdbmsum-work# time ./gdbmsum /mnt/disk3/filedb/disk3.md5sum.gdbm |wc -l
256620

real    0m0.855s
user    0m0.650s
sys     0m0.310s

# time ./gdbmsum /mnt/disk3/filedb/disk3.md5sum.gdbm | grep 'Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds' | md5sum -c 
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/02 Backyard Players - Light the Darkness (Dance to the Limit Version).mp3: OK
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/folder.jpg: OK
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/05 ZigZag to Paradies - Speed Down (Remixed in Budapest Hotel Cut).mp3: OK
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/10 Ambient Therapy - Hear a Whisper (Boys Relaxing At the Disco Cut).mp3: OK
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/07 Bingo Bus - Passenger of a Dream (Sky Full Of Dance Mix).mp3: OK
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/14 Timejumpers - Follow Rivers (Breeze of Chill Mix).mp3: OK
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/09 Non-Stop Listening - Nature Is Calling (Sexy Summer Session Cut).mp3: OK
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/13 Lover Banks - Hideaway (Cassette Sunset Ibiza Retro Mix).mp3: OK
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/04 Unchained Bars - Sleepless Eyes (Beyond the Beach Mix).mp3: OK
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/06 Waterfront Lounge - Stones Against Water (Ibiza Mix).mp3: OK
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/01 Chimichanka - Whisper of the Ocean (Never Felt So Good Cut).mp3: OK
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/08 Cliffside Chiller - Sunny Beach Days (Beat Drops Out Edit).mp3: OK
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/03 Ocean View Suite - Down by the Pier (Sigma Sound Cut).mp3: OK
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/11 Electric Rock Reef - Summer Prayer (A B C - Elements and Shadows Mix).mp3: OK
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/16 Mazed Emotions - Everybody Can Be Free (Turn This Beat Around Cut).mp3: OK
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/15 Glitch and Wet - Salt on My Skin (Jealous No More Cut).mp3: OK
/mnt/disk3/Music/music.mp3/Chill/Various Artists/Taste of Summer Del Mar (Cool and Smooth Chill and Lounge Sounds - Deluxe Selection for Easy Listening and Relax)/12 Moderate Jungle - The Sun (Shatter Me at Midnight Mix).mp3: OK

real    0m1.386s
user    0m2.080s
sys     0m0.250s

 

Just some ideas you may want to explore.


I wish the md5 files could be stored in a separate folder and not the folders being checked since the mod dates also change on those.

...

 

And to present more food for thought... another idea for keeping the folder.hash file out of the directory...

 

If you create an md5 of the full path of the folder.hash file, it can be used as the filename stored in some directory database.

Then, in the source directory, create a symlink to the md5-named folder.hash.

i.e.

 

/mnt/disk3/movies/somemovie/folder.hash -> /mnt/cache/filedb/786f8e4beaa1bbb0577ae0cd3638ecd6.hash

You can even store the path as a comment inside 786f8e4beaa1bbb0577ae0cd3638ecd6.hash if need be.
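A minimal sketch of that idea (the paths are only examples, and /mnt/cache/filedb is assumed to already exist):

dir=/mnt/disk3/movies/somemovie
name=$(echo -n "$dir" | md5sum | cut -d' ' -f1)       # hash of the full path
echo "# path: $dir" > /mnt/cache/filedb/$name.hash    # keep the path as a comment
ln -sf /mnt/cache/filedb/$name.hash "$dir/folder.hash"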

 

Unfortunately, this still doesn't get around changing the mtime of /mnt/disk3/movies/somemovie at least once.

But it does let you store the folder.hash elsewhere in case of corruption, and copy the folder.hash into the directory when backing up.

 

Downside would be... a lot of files in one large directory, unless you prefixed them somehow by share name or disk name.

 

I originally thought of this idea as people had stated they did not want a bunch of files littered around their file system.

My future goal is to do this with a folder.par2 for verification and/or reconstruction of a corrupt file.

 

A folder.hash is great for knowing that something is wrong, but you'll have to go to a backup to fix it.

With a folder.par2, you can detect and fix small errors in place.
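For anyone curious, the stock par2 command-line tool already works along these lines; a minimal example (redundancy percentage and filenames are arbitrary):

cd /mnt/disk3/movies/somemovie
par2 create -r10 folder.par2 *.mkv    # create recovery data with roughly 10% redundancy
par2 verify folder.par2               # check the files against the recovery set
par2 repair folder.par2               # attempt an in-place repair if damage is found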


And some more food for thought on storing the path in the folder.hash file.

 

This grabs the md5 into a bash array so it can be utilized further.

 

root@unRAID:/mnt/disk3/filedb# declare -a MD5=( $(md5sum disk3.md5sum.gdbm) )

root@unRAID:/mnt/disk3/filedb# echo $MD5

16fea9e414edd86ef4c63356951d4378

 

This turns the full path into a unique hash value.

root@unRAID:/mnt/disk3/filedb# declare -a MD5PATH=( $(echo -e $PWD\c | md5sum) )

root@unRAID:/mnt/disk3/filedb# echo $MD5PATH

177b1c5dba856f67850883f7b265fe9b

root@unRAID:/mnt/disk3/filedb# echo "# path: $PWD " > /tmp/${MD5PATH[0]}.hash

 

root@unRAID:/mnt/disk3/filedb# cat /tmp/${MD5PATH[0]}.hash

# path: /mnt/disk3/filedb

 

root@unRAID:/mnt/disk3/filedb# set -x

 

root@unRAID:/mnt/disk3/filedb# cat /tmp/${MD5PATH[0]}.hash

+ cat /tmp/177b1c5dba856f67850883f7b265fe9b.hash

# path: /mnt/disk3/filedb

 

This is used as an example of stuffing the hash into an extended attribute.

Its only purpose is to provide food for thought and an example of the export.

root@unRAID:/mnt/disk3/filedb# setfattr -n user.hash -v $MD5 disk3.md5sum.gdbm

root@unRAID:/mnt/disk3/filedb# getfattr -d disk3.md5sum.gdbm

# file: disk3.md5sum.gdbm

user.hash="16fea9e414edd86ef4c63356951d4378"

 

Example of the md5-named folder.hash file, with the path stored inside it.

root@unRAID:/mnt/disk3/filedb# cat /tmp/177b1c5dba856f67850883f7b265fe9b.hash

# path: /mnt/disk3/filedb

16fea9e414edd86ef4c63356951d4378  disk3.md5sum.gdbm

 

root@unRAID:/mnt/disk3/filedb# pwd

/mnt/disk3/filedb

root@unRAID:/mnt/disk3/filedb# md5sum -c /tmp/177b1c5dba856f67850883f7b265fe9b.hash

disk3.md5sum.gdbm: OK


An idea might be to store the hash and hash/verify time in extended attributes, like bunker and bitrot do, then export them into the current folder.hash or wherever the user chooses.

...

Just some ideas you may want to explore.

I had thought about it when this project started.  It would probably have been easier to create a front end for bitrot / bunker.

 

But I discarded it, because I personally feel that if you're going to wind up storing hidden metadata containing checksums within extended attributes (and I hate anything hidden with a passion), then ultimately you're better off having the file system itself (ZFS, or BTRFS once it matures more) handle all of that for you, with the addition of self-healing.  Not saying that there's anything wrong with the bunker / bitrot approach, just that I don't necessarily agree with it.

 

I wanted something where the user wouldn't have to think about it, with the hashes optionally stored alongside the files.  All automatically.  So that once you copied a file to a stick you could verify the checksum(s) through Windows, no problem.  Not impossible without a .hash, but far harder, as the extended attributes would be lost when transferring off of the network, so you'd have to actually remember to go and export the checksums.
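A quick way to see that limitation (hypothetical filenames and value; plain cp does not copy user xattrs, and FAT-formatted sticks can't store them anyway):

setfattr -n user.hash.value -v 16fea9e414edd86ef4c63356951d4378 movie.mkv
cp movie.mkv /mnt/usbstick/
getfattr -d /mnt/usbstick/movie.mkv   # typically prints nothing, so the checksum is gone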

 

I like the fact that any time my wife tells me that such and such movie isn't working properly, I can (or even she can) just double-click the .hash to check whether it's gotten corrupted or if it was just a bad rip.

 

I was already running a bash script that did basically the same thing as this plugin for quite a while.

 

And finally and possibly most importantly - I'm old school and/or just plain stubborn.

 

Storing within extended attributes does have the advantage that there's no excess file(s) within the folder, and full disk checking does not have the extra overhead that this plugin has while it figures out what files are stored on the particular disk.  Speed of creation / verification would be identical between the two approaches.

 

Each approach has its own pros and cons.  My solution isn't perfect for everyone (I acknowledge that in the OP and steer people towards bunker), but ultimately what I'm really striving for here is something that is simple to use (the plugin configuration could be far better here) and whose results (in my case the hash files) are easily portable between platforms (it succeeds there).

 

I don't think that it's too much of a stretch to say that the vast majority of users of unRaid (or of any NAS or computer system in general) have no idea about checksums, and personally I think that ignorance of them is a huge mistake when we're talking about terabytes of data (both important and not-so-important).

 

TBH, I don't particularly care if someone uses bunker / bitrot or this plugin or corz.  What I do want is for more people to actually create and understand checksums.  I think that a big reason why bunker only had 7 downloads (one of which was myself) was that it came across as too "techie" for Joe Six-Pack.

 

And yes, par2 is ultimately planned for an auxiliary plugin for the data that you just can't replace.

 

This plugin is still very much a WIP.  Functional, but still evolving.  At this point in time nothing is off the table.  As I keep telling my wife, this stuff keeps me out of trouble.  If only she could get over the fact that all the computers have female names (mostly ex-gf's  ;) )


The purpose of using the metadata / extended attribute is to store it with the file, so it won't matter whether you are on the disk or the user share.

No database needed. The user does not need to know.

 

At that point, exporting it to a local folder.hash, or to a remote/alternate location as md5hashname.hash with the embedded path, provides what is needed.

 

A centrally / easily managed md5 from the attribute (which the user never needs to know about).

An exported folder.hash within the directory, or in an alternate location with linkage back to the source.

 

The downside of having the original folder.hash within the directory shows up when you have corruption.

 

Building the md5hashname.hash to a central location and using a symlink safeguards the hash file and allows the symlink to exist in the folder (for corz, or export).

 

both bitrot and bunker have export formats, but not one that is compatible with corz.

 

I can do this fairly easily; however, I don't really have the time to do it, or it would have been done by now. LOL!

Maybe I'll get adventurous this week.

I'll still end up using my gdbmsum program since its side effect lets me log all changes, cache the directories, and update the hash in place.


...

 

And yes, par2 is ultimately planned for an auxiliary plugin for the data that you just can't replace.

 

folder.par2 is where this plugin is going to rock and save the day for some people.


both bitrot and bunker have export formats, but not one that is compatible with corz.

I've actually been waiting for someone to ask about that.  I was actually going to build in compatibility to read the extended attributes, to avoid having to rehash all of the existing files that had already been done with bunker, but I looked at the total number of downloads it had (just prior to publishing this plugin it was a whole 7 downloads) and decided that it wasn't worth the huge amount of debugging.  (And because I'm dealing directly with users' data here, my debugging is rather extensive to make sure that there's no way I can inadvertently corrupt data -> to the point that every time the plugin writes a hash file it checks that the file name is correct, and if it's not, it immediately throws up an alert and completely stops the plugin from doing anything and everything until a reboot happens.)

 

I figured that if anyone ever brought it up and supplied me with a sample exported file, I'd just create a script to create the .hash files from it.

 


...

I figured that if anyone ever brought it up and supplied me with a sample exported file, I'd just create a script to create the .hash files from it.

 

It's probably not worth your effort.

 

I have a tool; it still needs a few options added and then to be compiled for 64-bit.

I've been wasting so much time on the conversion to ESX6 and unRAID6, with the ESX USB reset problem, that I have not finished.

 

This is the help screen so far.

root@unRAID:/mnt/disk1/home/rcotrone/src.slacky/hashtools-work# ./hashfattrexport  --help
Usage: %s [OPTION]... PATTERN [PATTERN]...
Export hash extended file attributes on each FILE or DIR recursively
PATTTERN is globbed by the shell, Directories are processed recursively


Filter/Name selection and interpretation:
                           Filter rules are processed like find -name using fnmatch
-n, --name                Filter by name (multiples allowed)
-f, --filter              Filter from file (One filter file only for now)


-X, --one-file-system     don't cross filesystem boundaries
-l  --maxdepth <levels>   Descend at most <levels> of directories below command line
-C  --chdir <directory>   chdir to this directory before operating
-r  --relative            Attempt to build relative path from provided files/dirs
                           A second -r uses realpath() which resolves to full path


-S, --stats               Print statistics
-P, --progress            Print statistic progress every <seconds>.


-Z, --report-missing      Print filenames missing an extended hash attribute
-M, --report-modified     Print filenames modified after extended hash attribute
-0, --null                Terminate filename lines with NULL instead of default \n
-R, --report              Report status OK,FAILED,MODIFIED,MISSING_XATTR,UPDATED
-q, --quiet               Quiet/Less output use multiple -q's to make quieter
-v, --verbose             increment verbosity


-h, --help                Help display
-V, --version             Print Version
-d, --debug               increment debug level

 

 

It works like this:

root@unRAID:/mnt/disk1/home/rcotrone/src.slacky/hashtools-work# getfattr -d strlib.* 
# file: strlib.c
user.hash.time="1419424415"
user.hash.value="67eef48f1199c68381127baded05f051"

# file: strlib.h
user.hash.time="1419424415"
user.hash.value="214635e26ea28ccb3cb18b9b3d484248"

# file: strlib.o
user.hash.time="1419424415"
user.hash.value="1d258b90f70e787b5560897fcb125e1b"

root@unRAID:/mnt/disk1/home/rcotrone/src.slacky/hashtools-work# ./hashfattr -r strlib.*
67eef48f1199c68381127baded05f051  strlib.c
214635e26ea28ccb3cb18b9b3d484248  strlib.h
1d258b90f70e787b5560897fcb125e1b  strlib.o

root@unRAID:/mnt/disk1/home/rcotrone/src.slacky/hashtools-work# ./hashfattr -r strlib.* | md5sum -c
strlib.c: OK
strlib.h: OK
strlib.o: OK

 

It's like doing a find down a tree | grep | some filter to convert the output of getfattr | md5sum -c

 

What I have to perfect is writing an individual folder.hash per directory and/or doing the whole /mnt/somearchivefolder/hashed_directory_name.hash as previously mentioned.


I followed the bunker thread but never bothered to use it because, in the end, I wanted something that was filesystem-independent. I back up my really important user shares to NTFS disks and store them offsite. That makes it much easier to access them without unRAID.

 

I had been running an md5 script I got from bjp999 before this plugin came along, and I had used corz in the past but not diligently, and never seemed to get around to actually verifying anything.

 

This plugin is indispensable in my opinion and it already has all the functionality I could hope for, but I will look forward to further refinements. I just hope it continues to maintain corz compatibility and filesystem independence.

 

Thanks


I just hope it continues to maintain corz compatibility and filesystem independence.

Put it this way...  At the moment, I don't foresee building a complete file system browser into it so that you can check a single hash file.  Bulk verification only.  Which means that corz is here to stay.

No problems.  If I get time I'll look into a bash script to help you.  (I'm working on a new verification script to avoid a PHP bug that affects people with hundreds of thousands of files and low memory - intermittent bus errors with the existing script.)

 

I'm good, I have my .c programs. I only wanted to add some ideas if you were looking to expand further.

I think the corz compatibility is/was a great idea and it's something I've been proposing to the other authors.

I'll probably borrow some of the plugin code to see how I can do the same.

 

I have millions of files with all sorts of file names. I need to do it in C to avoid quoting issues.

I learned that linking against the OpenSSL libraries provides the fastest md5 implementation I could find. I wouldn't be surprised if PHP uses them.

Along with a compiled walk through the file system, it's as fast as it can possibly be.
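An easy sanity check of that claim on any box is to time both implementations against the same large file (the filename is just a placeholder):

time md5sum /mnt/disk3/somebigfile.mkv
time openssl dgst -md5 /mnt/disk3/somebigfile.mkv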
