bitrot - a utility for generating sha256 keys for integrity checks (version 1.0)


Thanks for the script. It works great. One question: can you run the script with more than one mask?

 

i.e.  bitrot.sh -a -p /mnt/user/Movies -m *.mkv -m *.m4v

 

It uses the mask in the find command. I'll look into whether find supports more than one mask, or else repeat the find command once per mask.
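For what it's worth, find can take several -name tests joined with -o inside escaped parentheses, so one find invocation can cover multiple masks. A rough sketch (the demo directory and masks are made up for illustration; bitrot.sh's real find call may look different):

```shell
#!/bin/bash
set -e
# Demo tree standing in for e.g. /mnt/user/Movies (illustrative only).
path=$(mktemp -d)
touch "$path/a.mkv" "$path/b.m4v" "$path/c.txt"

# Build:  \( -name '*.mkv' -o -name '*.m4v' \)
masks=("*.mkv" "*.m4v")
find_args=("(")
for m in "${masks[@]}"; do
  find_args+=(-name "$m" -o)
done
unset 'find_args[${#find_args[@]}-1]'   # drop the trailing -o
find_args+=(")")

found=$(find "$path" -type f "${find_args[@]}")
echo "$found"                           # lists a.mkv and b.m4v, not c.txt
```

The same grouping, wrapped in ! \( ... \), would give an exclude mask.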


Hi All,

 

I can't get it to work with version 5.05. What are the steps to install it, and what else is needed for 5.05?

Have you installed the hashdeep package, which is a required dependency? On v5 it is recommended that you do this via unMenu, although there is nothing to stop you installing it directly.


Thanks for the reply. The only package I can see in unMenu is md5deep-3.6.orig.tar.gz.

 

Should I install this?

 

Regards

 

Duppie

 

Install that.

 

I installed it but still getting the following error:

 

root@Tower:/boot/tools# ./bitrot.sh -a -p /mnt/user/Movies/Documentaries

 

bitrot, by John Bartlett, version 1.0

 

Error: The hashdeep package has not been installed.

 

root@Tower:/boot/tools# which hasdeep

which: no hasdeep in (.:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin:/usr/bin:/bin:/usr/kerberos/bin:/usr/kerberos/sbin)

root@Tower:/boot/tools# which md5deep

/usr/local/bin/md5deep

root@Tower:/boot/tools#

 

Duppie


I've downloaded gcc and md5deep. I set md5deep to reinstall on reboot, but not gcc. Since md5deep downloads and compiles on reboot, won't I need to set gcc to download and install itself too?

 

EDIT:

I'm getting the same issue as Duppie:

root@NAS:/boot/packages/md5deep/hashdeep# bitrot.sh -a -p "/mnt/user/Home Video"

bitrot, by John Bartlett, version 1.0

Error: The hashdeep package has not been installed.
root@NAS:/boot/packages/md5deep/hashdeep# which hashdeep
/boot/packages/md5deep/hashdeep/hashdeep

 

Running 5.0.4, I installed md5deep via unMenu.

 

Any thoughts?


root@NAS:/boot/packages/md5deep/hashdeep# which hashdeep

/boot/packages/md5deep/hashdeep/hashdeep

 

'which' is only showing that a hashdeep file exists there, not that it is installed. There should be a hashdeep binary on the PATH, or wherever bitrot is looking for it.
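A hedged sketch of that distinction using command -v, which only succeeds when the named binary is actually on the PATH (the helper function and binary names are my own illustration, not bitrot's detection logic):

```shell
#!/bin/bash
# "A file named hashdeep exists somewhere" is not the same as
# "hashdeep is executable and on the PATH", which is what bitrot needs.
check_installed() {
  local bin="$1"
  if command -v "$bin" >/dev/null 2>&1; then
    echo "installed: $(command -v "$bin")"
  else
    echo "not installed: $bin is not on the PATH" >&2
    return 1
  fi
}

check_installed sh                 # present on any POSIX system
check_installed hashdeep || true   # fails until the package is really installed
```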


OK, started digging through the script itself.

which sha256deep

yields nothing, so it executes

installpkg hashdeep-4.4-x86_64-1rj.txz > /dev/null

and still doesn't work (even though hashdeep-4.4-x86_64-1rj.txz is in the same directory as bitrot.sh)

 

I tried executing

root@NAS:/boot/Scripts# installpkg hashdeep-4.4-x86_64-1rj.txz
Verifying package hashdeep-4.4-x86_64-1rj.txz.
expr: syntax error
/sbin/installpkg: line 439: echo: write error: No space left on device
/sbin/installpkg: line 439: echo: write error: No space left on device
/sbin/installpkg: line 439: echo: write error: No space left on device
/sbin/installpkg: line 439: echo: write error: No space left on device
/sbin/installpkg: line 439: echo: write error: No space left on device
/sbin/installpkg: line 439: echo: write error: No space left on device
/sbin/installpkg: line 439: echo: write error: No space left on device
/sbin/installpkg: line 439: echo: write error: No space left on device
/sbin/installpkg: line 439: echo: write error: No space left on device
/sbin/installpkg: line 439: echo: write error: No space left on device
/sbin/installpkg: line 439: echo: write error: No space left on device
/sbin/installpkg: line 439: echo: write error: No space left on device
/sbin/installpkg: line 442: echo: write error: No space left on device
cat: write error: No space left on device
Installing package hashdeep-4.4-x86_64-1rj.txz:
PACKAGE DESCRIPTION:
/sbin/installpkg: line 508: echo: write error: No space left on device
/sbin/installpkg: line 509: echo: write error: No space left on device
/sbin/installpkg: line 510: echo: write error: No space left on device
/sbin/installpkg: line 511: echo: write error: No space left on device
/sbin/installpkg: line 516: echo: write error: No space left on device
/sbin/installpkg: line 521: echo: write error: No space left on device
WARNING:  Package has not been created with 'makepkg'
/sbin/installpkg: line 530: echo: write error: No space left on device
Package hashdeep-4.4-x86_64-1rj.txz installed.

 

Where is this trying to install to that I'm running out of disk space? I've got 3.7GB free on my flash drive. Am I out of space on the virtual drive? The server has been up for 132 days, and I've got 8GB of RAM installed. It looks like I've got about 520MB RAM used, 4.14GB cached and 2.75GB free.
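If I'm reading the unRAID setup right, the root filesystem lives in RAM, so installpkg can hit "No space left on device" even with gigabytes free on /boot; the full filesystem is likely the RAM disk, not the flash drive. A quick way to see which mount is actually out of space (the mount points shown are generic examples):

```shell
#!/bin/bash
# Show free space per mount; the one at 100% use is the culprit.
df -h / /var /tmp 2>/dev/null || true

# Free KB on the root filesystem, where installpkg unpacks and logs:
avail_kb=$(df -Pk / | awk 'NR==2 {print $4}')
echo "rootfs available: ${avail_kb} KB"
```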


The script won't work as-is on a 32bit system, and the package it installs is a 64bit package, so it won't work for you either.

I don't have a 32bit version of unRAID so I can't really develop it to work with 32bit - though if someone could link to the 32bit package that UnMenu uses (UnMenu on 64bit doesn't list it), I can add code for you to try.


 

Well, that answers quite a few questions.

 

I've been meaning to set up a 6.0beta test system, guess I'd better get on it...


Installing the md5deep package from UnMenu on a 32bit system should, in theory, work.


That's what I installed, following your instructions in the OP. Then, when it didn't work, I downloaded the 64-bit hashdeep from the link in your OP.

 

I'll have to take a look through the unMenu md5deep package to see if there's a 32-bit hashdeep in there, or maybe alter the script to call one of the hashers that is installed.

 

That'll have to be after work, though...


I made this change:

#shabin="/usr/bin/sha256deep"
shabin="/usr/local/bin/sha256deep"

at line 136 and now it's running a treat!

 

Except that my test folder was on a drive with 0 bytes free and there wasn't enough room to write the extended attributes to disk. A touch of file rearranging will fix that right up.

 

Thanks for the script and for the tips to get it working.


I'll add a check for a 32bit system and add the path you gave, thanks! I'll also add a check for at least 1 MB of free space on a specified drive (/mnt/diskx) or on all drives (/mnt/user*).
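Such a pre-flight check could look roughly like this; the 1 MB threshold comes from the post above, while the function name and the mount point tested are my own illustration:

```shell
#!/bin/bash
# Sketch: refuse to write extended attributes unless the target
# filesystem has at least 1 MB free. In bitrot this would run against
# /mnt/diskX or each /mnt/user* disk; /tmp is used here as a stand-in.
min_kb=1024   # 1 MB, in 1 KB blocks

has_space() {
  local avail
  avail=$(df -Pk "$1" | awk 'NR==2 {print $4}')
  [ "$avail" -ge "$min_kb" ]
}

if has_space /tmp; then
  echo "ok: enough free space to write attributes"
else
  echo "error: less than 1 MB free, refusing to continue" >&2
fi
```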


The way bitrot stores hashes for individual files is very clever. The use of extended attributes makes housekeeping of the hashes very easy: the hash is stored together with the other file information, and, best of all, the actual file stays untouched, as it should.

 

When starting to use bitrot, I soon wanted a couple of things more ...

 

The ability to run completely silently, which makes it suitable to run in the background as part of a daily or weekly schedule: regularly check for new files and add them automatically, and regularly verify existing files to see if anything has changed, reporting any mismatches.

 

The ability to use an exclude filter alongside an include filter is also handy. Sometimes you want all files with an exception, e.g. skipping all *.tmp files.

 

The same goes for file size: the ability to skip files below or above a certain size helps filter out unwanted files.

 

A bit more speed in the file scanning would also help, though that is maybe less important when everything is done in the background on a schedule.

 

The programmer in me was looking for ways to make my wishlist come true, and soon I found myself changing and adjusting the bitrot script, and eventually doing a large rewrite.

 

I certainly don't want to hijack your brilliant idea, and I don't know if there is any interest in an alternative solution, but speed-wise I could make a huge difference.

 

E.g. a 300,000-file scan with bitrot takes over 35 minutes on my machine; the same scan in the alternative version takes 29 seconds.

 

Anyway, I wanted to let you know that I like your work; it inspired me to come up with something which suits my needs. Alongside the script I created a daily and weekly schedule to look for new files, and keep a history of mismatches, if/when they occur.

 

 

 


Certainly a great idea; I did the same thing, taking it a little further in C.

 

I was stuck on speed until I found a routine which does the md5 internally faster than the one I had, and/or can use the version from OpenSSL.

I also got stuck on details for export/import and the ability to use other hashers.

It's really important to me that the data is exportable/importable as an md5sums file, for use with other tools.

 

This is what I had so far.

root@slacky:/mnt/disk1/home/rcotrone/src.slacky/hashtools-work# ./hashfattr  --help
Usage: %s [OPTION]... PATTERN [PATTERN]...
Manage hash extended file attributes on each FILE or DIR recursively
PATTTERN is globbed by the shell, Directories are processed recursively

-e, --export              Export extended hash attributes
-c, --check               Check  extended hash attributes
-u, --update              Update extended hash attributes
-z, --update-missing      Only Update hash when hash is missing
-m, --update-modified     Only Update hash when mtime > last hash time
-D, --delete              Delete extended hash attributes

Filter/Name selection and interpretation:
                           Filter rules are processed like find -name using fnmatch
-n, --name                Filter by name (multiples allowed)
-f, --filter              Filter from file (One filter file only for now)

-X, --one-file-system     don't cross filesystem boundaries
-l  --maxdepth <levels>   Descend at most <levels> of directories below command line
-C  --chdir <directory>   chdir to this directory before operating
-r  --relative            Attempt to build relative path from provided files/dirs
                           A second -r uses realpath() which resolves to full path

-S, --stats               Print statistics
-P, --progress            Print statistic progress every <seconds>.

-Z, --report-missing      Print filenames missing an extended hash attribute
-M, --report-modified     Print filenames modified after extended hash attribute
-0, --null                Terminate filename lines with NULL instead of default \n
                           More useful with -Z -M for path names.
-R, --report              Report status OK,FAILED,MODIFIED,MISSING_XATTR,UPDATED
-q, --quiet               Quiet/Less output use multiple -q's to make quieter
-v, --verbose             increment verbosity

                           Calculates hash internally (md5) (DEFAULT)
-x, --hash-exec           Calculate hash eXternally for check/upsert
     example:              /usr/bin/md5sum -b {}

-h, --help                Help display
-V, --version             Print Version
-d, --debug               increment debug level

 

I've held off on anything further after the reports of ReiserFS corruption, and after seeing the setfattr/getfattr routines in the kernel stack trace.

 

I'm going to take this up again properly when I'm about ready to upgrade to unRAID 6.

 

A core need is a method of exporting/importing the hashes into other tools in the standard hash file format.

It's also important to rsync files using the -X parameter, so the extended attributes are preserved.

 

On the plus side, if you rsync -aX from one machine to another unraid machine, the extended attributes are preserved, which allows you to test them on the receiving side.


Wow, that looks impressive!

 

Certainly a lot of good ideas in your version, and perhaps I am going to "steal" a few :)

 

I didn't take the bold step of writing it in C, but a faster hash calculation certainly does benefit a lot, though I believe the sha256deep included with bitrot does a decent job.

 

Extended attributes are a nifty thing, but you do have to be aware of how to use the tools so that they are preserved.

 

Below is my "help":

Usage: bunker -a|A|v|V|u|e|i|r [-fdsSlq] [-md5] path [!] [mask]
  -a          add SHA key attribute for files, specified in path and optional mask
  -A          same as -a option with implicit export function (may use -f)
  -v          verify SHA key attribute and report mismatches (may use -f)
  -V          same as -v option with updating of scandate of files (may use -f)
  -u          update mismatched SHA keys with correct SHA key attribute (may use -f)
  -e          export SHA key attributes to the export file (may use -f)
  -i          import SHA key attributes from file and restore them (must use -f)
  -r          remove SHA key attributes from specified selection (may use -f)
  -f <file>   optional set file reference to <file>. Defaults to /tmp/bunker.store.log
  -d <days>   optional only verify/update/remove for files scanned <days> or longer ago
  -s <size>   optional only include files smaller than <size>
  -S <size>   optional only include files greater than <size>
  -l          optional create log entry in the syslog file
  -q          optional quiet mode, suppress all output. Use for background processing
  -md5        optional use md5 hashing algorithm instead of sha256
  path        path to starting directory, mandatory with 3 exceptions (see examples)
  mask        optional filter for file selection. Default is all files
              when path or mask names have spaces, then place names between quotes
              precede mask with ! to change its operation from include to exclude

Examples:
bunker -a /mnt/user/TV                                      add SHA keys for files in share 'TV'
bunker -a -S 10M /mnt/user/TV                               add SHA keys for files greater than 10 MB in share 'TV'
bunker -a /mnt/user/TV *.mov                                add SHA keys for '*.mov' files only in share 'TV'
bunker -a /mnt/user/TV ! *.mov                              add SHA keys for all files in share 'TV' except '*.mov'
bunker -A -f /tmp/keys.txt /mnt/user/TV                     add SHA keys for files in share 'TV' and export to file keys.txt
bunker -v /mnt/user/Documents                               verify SHA keys for previously scanned files
bunker -V /mnt/user/Documents                               verify SHA keys for scanned files and update their scandate
bunker -v -d 90 /mnt/user/Documents                         verify SHA keys for files scanned 90 days or longer ago
bunker -v -f /tmp/mismatches.txt /mnt/disk2/Documents       verify SHA keys and save mismatches in user defined file
bunker -u  /mnt/disk2/Documents                             update SHA keys for mismatching files
bunker -u -f /tmp/mismatches.txt                            update SHA keys for files listed in user defined file - no path
bunker -e /mnt/disk1/Movies                                 export SHA keys to default export file
bunker -e -f /mnt/cache/disk1_keys.txt /mnt/disk1/Movies    export SHA keys to user defined file
bunker -i -f /mnt/cache/disk1_keys.txt                      import and restore SHA keys from user defined file - no path
bunker -r  /mnt/user/TV                                     remove SHA keys for files in share 'TV'
bunker -r -f /tmp/mismatches.txt                            remove SHA keys for files listed in user defined file - no path


The filter logic I used is similar to rsync's.

 

+ thisnamemask

- thisnamemask

- thisnamemask

+ thisnamemask

 

It was too much work (and I'm lazy) to do the whole ! -name thing, as it means parsing the argument vector manually instead of using getopt_long.

 

 

So my filters use -name "+ thismask" -name "- thismask".

 

Since I use ftw(), as find does, I was going to add a --mmin or age factor.

When I use an --age factor, I use a delta of DD:HH:MM:SS, i.e. days:hours:minutes:seconds.

Since that gets converted to a number of seconds internally, you can pass either the human-readable delta or a plain number of seconds.
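That conversion is simple enough to sketch in shell; the function name and exact field handling are my own illustration of the idea (bash arithmetic, with 10# guarding against leading zeros being read as octal):

```shell
#!/bin/bash
# Convert a DD:HH:MM:SS age delta to seconds; a bare number is
# already seconds and passes straight through.
delta_to_seconds() {
  case "$1" in
    *:*:*:*)
      local IFS=':'
      set -- $1                 # split the delta on the colons
      echo $(( 10#$1*86400 + 10#$2*3600 + 10#$3*60 + 10#$4 ))
      ;;
    *) echo "$1" ;;
  esac
}

delta_to_seconds 1:02:03:04     # 1 day 2 h 3 min 4 s -> 93784
delta_to_seconds 3600           # already seconds -> 3600
```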

 

Where I've seen issues is in export/import, more so with files with special characters, so my default export is exactly like the output of the hash-sum tools: hash filename. I'm planning the extended export as mtime size hash filename.

 

Truth be told, I've never thought of a size mask, i.e. excluding on size.

 

However, I have considered the ability to traverse the directory tree and export a folder.hash file in each directory for compatibility with corz checksum, and then having an import tool use that to add the data to the extended attributes.

 

And, as always, I go off on tangents and get sidetracked.

 

As far as logging via syslog goes, that could cause issues on large filesystems such as my own; I have about 350,000 files on one filesystem. So logging to a unique facility might be worthwhile, so that rsyslog or syslog.conf can redirect the messages. With rsyslog you can also have it automatically date the output logs.

 

I really have four parallel projects going on with this.

 

One, like bitrot, keeps the hash sums with the files, i.e. hashfattr, hashfattrexport, hashfattrimport.

 

Another uses .gdbm files, because they are very fast: it provides a very fast index to all the stat information on the drive. gdbmsums, gdbmftw (like cache_dirs, only it stores all the stat data that is read).

This was my attempt at my own cache_dirs, which would also cache the stat blocks for rapid review.

 

The third is the sqlite locate database, which allows importing foreign filesystems for hash-sum storage and locating.

 

The fourth is unique in that it uses par2 to create a folder.par2 in each directory.

This provides checking of files, but also the ability to correct a percentage of corruption if there is a problem.

 

I think all of these have merit, but the key issue is that the data also needs to be exported off the filesystem, in case there is a problem with the filesystem itself. This export file needs to be usable by standard tools, i.e. md5sum, sha256sum, etc., so there is something usable at the command level without significant reprogramming.

 

With a basic exported file you can run it against the hasher directly.

With the extended format of  mtime size hash filename you can chop off mtime and size with sed or cut and pipe it to md5sum/sha256sum for a quick verify.
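A small end-to-end demo of that: build one line in the proposed extended format (mtime size hash filename, which is the poster's proposal rather than any standard), cut the extra fields off, and feed the result straight to md5sum -c:

```shell
#!/bin/bash
set -e
# Create a demo file and one extended-format line for it.
tmp=$(mktemp -d); cd "$tmp"
echo "hello" > demo.txt

mtime=$(stat -c %Y demo.txt)
size=$(stat -c %s demo.txt)
sum=$(md5sum demo.txt)                    # already "hash  filename"
printf '%s %s %s\n' "$mtime" "$size" "$sum" > extended.txt

# Drop the first two fields -> standard md5sum format, then verify:
cut -d' ' -f3- extended.txt > plain.txt
md5sum -c plain.txt                       # prints "demo.txt: OK"
```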

 

Certainly a lot of good ideas in your version, and perhaps I am going to "steal" a few :)

 

That's why I posted where I am so far. Each of us learns and builds on ideas from the others!


I benchmarked several hashing utilities and the speed was more or less negligible (i.e. a tie) when reading off a platter, so I opted for the algorithm that didn't have any known or potential collisions.

 

I've never thought of excluding based on size. I use the find tool to locate files, and I believe it only allows for one mask. Grep regexes are flaky with files with special characters, so I opted not to go that route.

 

Is there a link detailing the standard hash file layout? I can update the import/export to utilize that.


I think collisions aren't so much of an issue in this case.

Correct me if I'm wrong, but the goal is to check the file's integrity, and for that md5 is adequate.

I don't think there will be any false positives.

 

I don't think grep's regexes are flaky, but I would love to see some real-world examples.

I know there can be all sorts of issues with command-line quoting, which is why I often do it in C or via fgrep.

 

Standard hash file layout is

 

hash(space)(space or * to signal file is opened in binary vs text)filename

 

I would propose to add

mtime size as a two-field prefix for an extended hash file format.

This way the first two fields can be cut out easily, thus leaving the regular hash file format.

If you really needed to export/save the last verify time, then add it before these two fields.

 

Thus providing

 

time(of last verify) mtime(of file) size(of file) hash(of hasher)(space)(space or *)filename

 

At the very minimum,

hash(of hasher)(space)(space or *)filename

is the most universal format, as it can be piped directly into the hashers for verification.

Can't tell you how often I'll do this.

 

On slower systems (such as my N54L), the forking overhead of running an individual hasher for each and every file via a pipe makes the process slow, whereas running the hasher on the exported hash file is faster.

It really becomes apparent with many small files, such as my mp3 archive with half a million files.

