jbartlett Posted September 21, 2014 Author

> Thanks for the script. It works great. I have a question: can you run the script with more than one mask? i.e. bitrot.sh -a -p /mnt/user/Movies -m *.mkv -m *.m4v

It uses the mask in the find command. I'll look into whether find supports more than one mask, or whether I need to repeat the find command multiple times.
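For what it's worth, find can take several name tests joined with -o (logical OR) inside escaped parentheses, so one pass can cover several masks. The sketch below is only an illustration of that idea; the function name find_multi_mask is hypothetical and is not part of bitrot.sh.

```shell
#!/bin/bash
# Hypothetical helper (not from bitrot.sh): list regular files under a
# directory that match ANY of the given name masks, in one find invocation.
find_multi_mask() {
    local dir=$1; shift
    local expr=() mask

    # No masks given: fall back to listing every regular file.
    if [ "$#" -eq 0 ]; then
        find "$dir" -type f
        return
    fi

    for mask in "$@"; do
        # Join the masks with -o (OR); no -o before the first mask.
        if [ "${#expr[@]}" -gt 0 ]; then
            expr+=(-o)
        fi
        expr+=(-name "$mask")
    done

    # The escaped parentheses group the ORed tests so that -type f
    # applies to all of them.
    find "$dir" -type f \( "${expr[@]}" \)
}
```

The masks need to be quoted on the command line so the shell does not expand them first, for example: find_multi_mask /mnt/user/Movies '*.mkv' '*.m4v'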
Duppie Posted September 21, 2014

Hi all, I can't get it to work with version 5.05. What are the steps to install, and what else is needed for 5.05?

Thanks
Duppie
itimpi Posted September 21, 2014

> Hi all, I can't get it to work with version 5.05. What are the steps to install, and what else is needed for 5.05?

Have you installed the hashdeep package, which is a required dependency? For v5 it is recommended that you do this via unMenu, although there is nothing to stop you adding it directly.
Duppie Posted September 22, 2014

Thanks for the reply. There is only md5deep-3.6.orig.tar.gz in unMenu. Should I install this?

Regards
Duppie
jbartlett Posted September 22, 2014 Author

> Thanks for the reply. There is only md5deep-3.6.orig.tar.gz in unMenu. Should I install this?

Install that.
Duppie Posted September 22, 2014

OK thanks, will give it a go.
sureguy Posted October 1, 2014

Thanks a lot for this, it's much appreciated!
Duppie Posted October 5, 2014

> > Thanks for the reply. There is only md5deep-3.6.orig.tar.gz in unMenu. Should I install this?
>
> Install that.

I installed it, but I am still getting the following error:

    root@Tower:/boot/tools# ./bitrot.sh -a -p /mnt/user/Movies/Documentaries
    bitrot, by John Bartlett, version 1.0
    Error: The hashdeep package has not been installed.
    root@Tower:/boot/tools# which hasdeep
    which: no hasdeep in (.:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin:/usr/bin:/bin:/usr/kerberos/bin:/usr/kerberos/sbin)
    root@Tower:/boot/tools# which md5deep
    /usr/local/bin/md5deep
    root@Tower:/boot/tools#

Duppie
FreeMan Posted December 7, 2014

I've downloaded gcc and md5deep. I set md5deep to reinstall on reboot, but not gcc. Since md5deep downloads and compiles on reboot, won't I need to set gcc to download and install itself, too?

EDIT: I'm getting the same issue as Duppie:

    root@NAS:/boot/packages/md5deep/hashdeep# bitrot.sh -a -p "/mnt/user/Home Video"
    bitrot, by John Bartlett, version 1.0
    Error: The hashdeep package has not been installed.
    root@NAS:/boot/packages/md5deep/hashdeep# which hashdeep
    /boot/packages/md5deep/hashdeep/hashdeep

I'm running 5.0.4, and I installed md5deep via unMenu. Any thoughts?
RobJ Posted December 7, 2014

>     root@NAS:/boot/packages/md5deep/hashdeep# which hashdeep
>     /boot/packages/md5deep/hashdeep/hashdeep

'which' is showing only that there is a hashdeep package, but not that it is installed. There should be a hashdeep binary in the path, or wherever bitrot is looking for it.
FreeMan Posted December 7, 2014

OK, I started digging through the script itself. which sha256deep yields nothing, so it executes

    installpkg hashdeep-4.4-x86_64-1rj.txz > /dev/null

and still doesn't work (even though hashdeep-4.4-x86_64-1rj.txz is in the same directory as bitrot.sh). I tried executing it by hand:

    root@NAS:/boot/Scripts# installpkg hashdeep-4.4-x86_64-1rj.txz
    Verifying package hashdeep-4.4-x86_64-1rj.txz.
    expr: syntax error
    /sbin/installpkg: line 439: echo: write error: No space left on device
    /sbin/installpkg: line 439: echo: write error: No space left on device
    /sbin/installpkg: line 439: echo: write error: No space left on device
    /sbin/installpkg: line 439: echo: write error: No space left on device
    /sbin/installpkg: line 439: echo: write error: No space left on device
    /sbin/installpkg: line 439: echo: write error: No space left on device
    /sbin/installpkg: line 439: echo: write error: No space left on device
    /sbin/installpkg: line 439: echo: write error: No space left on device
    /sbin/installpkg: line 439: echo: write error: No space left on device
    /sbin/installpkg: line 439: echo: write error: No space left on device
    /sbin/installpkg: line 439: echo: write error: No space left on device
    /sbin/installpkg: line 439: echo: write error: No space left on device
    /sbin/installpkg: line 442: echo: write error: No space left on device
    cat: write error: No space left on device
    Installing package hashdeep-4.4-x86_64-1rj.txz:
    PACKAGE DESCRIPTION:
    /sbin/installpkg: line 508: echo: write error: No space left on device
    /sbin/installpkg: line 509: echo: write error: No space left on device
    /sbin/installpkg: line 510: echo: write error: No space left on device
    /sbin/installpkg: line 511: echo: write error: No space left on device
    /sbin/installpkg: line 516: echo: write error: No space left on device
    /sbin/installpkg: line 521: echo: write error: No space left on device
    WARNING: Package has not been created with 'makepkg'
    /sbin/installpkg: line 530: echo: write error: No space left on device
    Package hashdeep-4.4-x86_64-1rj.txz installed.

Where is this trying to install to that I'm running out of disk space? I've got 3.7GB free on my flash drive. Am I out of space on the virtual drive? The server has been up for 132 days, and I've got 8GB of RAM installed. It looks like I've got about 520MB RAM used, 4.14GB cached and 2.75GB free.
WeeboTech Posted December 7, 2014

Maybe /var/log ran out of space. Do df -vH /var/log

Example:

    root@unRAID:/var/log/packages# df -vH
    Filesystem  Size  Used  Avail  Use%  Mounted on
    tmpfs       135M   25M   110M   19%  /var/log
jbartlett Posted December 7, 2014 Author

The script won't work as-is on a 32bit system, and the package it installs is a 64bit package, so that won't work for you either. I don't have a 32bit version of UNRAID, so I can't really develop it to work with 32bit. If someone could link to the 32bit package that UnMenu uses (UnMenu on 64bit doesn't list it), I can add code for you to try.
FreeMan Posted December 8, 2014

> The script won't work as-is on a 32bit system, and the package it installs is a 64bit package, so that won't work for you either. I don't have a 32bit version of UNRAID, so I can't really develop it to work with 32bit. If someone could link to the 32bit package that UnMenu uses (UnMenu on 64bit doesn't list it), I can add code for you to try.

Well, that answers quite a few questions. I've been meaning to set up a 6.0beta test system; guess I'd better get on it...
jbartlett Posted December 8, 2014 Author

> > The script won't work as-is on a 32bit system, and the package it installs is a 64bit package, so that won't work for you either. I don't have a 32bit version of UNRAID, so I can't really develop it to work with 32bit. If someone could link to the 32bit package that UnMenu uses (UnMenu on 64bit doesn't list it), I can add code for you to try.
>
> Well, that answers quite a few questions. I've been meaning to set up a 6.0beta test system; guess I'd better get on it...

Installing the md5deep package from UnMenu on a 32bit system should, in theory, work.
FreeMan Posted December 8, 2014

That's what I installed, following your instructions in the OP. Then, when it didn't work, I downloaded the 64-bit hashdeep from your OP link. I'll have to take a look through the unMenu md5deep package to see if there's a 32-bit hashdeep in there or, maybe, alter the script to call one of the hashers that is installed. That'll have to be after work, though...
FreeMan Posted December 9, 2014

I made this change at line 136:

    #shabin="/usr/bin/sha256deep"
    shabin="/usr/local/bin/sha256deep"

and now it's running a treat! Except that my test folder was on a drive with 0 bytes free, and there wasn't enough room to write the extended attributes to disk. A touch of file rearranging will fix that right up.

Thanks for the script and for the tips to get it working.
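Rather than hard-coding one location, a script could probe the candidate paths in order and use the first one that is executable. This is only a sketch of that approach, not the actual bitrot.sh code; the helper name find_sha_bin is hypothetical, and the two paths are the ones mentioned in the post above.

```shell
#!/bin/bash
# Hypothetical helper (not from bitrot.sh): print the first candidate path
# that exists and is executable; return non-zero if none qualifies.
find_sha_bin() {
    local candidate
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example use with the two locations discussed in the thread.
if shabin=$(find_sha_bin /usr/bin/sha256deep /usr/local/bin/sha256deep); then
    echo "Using $shabin"
else
    echo "sha256deep not found in the expected locations" >&2
fi
```

Passing the candidates as arguments keeps the search order explicit, so a further install location can be added without touching the function.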
jbartlett Posted December 9, 2014 Author

> I made this change at line 136:
>
>     #shabin="/usr/bin/sha256deep"
>     shabin="/usr/local/bin/sha256deep"
>
> and now it's running a treat! Except that my test folder was on a drive with 0 bytes free, and there wasn't enough room to write the extended attributes to disk. A touch of file rearranging will fix that right up.
>
> Thanks for the script and for the tips to get it working.

I'll add a check for a 32bit system and add the path you gave, thanks! I'll also add a check for at least 1 MB of free space on a specified drive (/mnt/diskx) or on all drives (/mnt/user*).
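A free-space check like the one described could be built on df. The sketch below is an assumption about how such a check might look, not the code that went into bitrot.sh; the function name has_free_kb is hypothetical, and 1 MB is expressed as 1024 KiB.

```shell
#!/bin/bash
# Hypothetical helper (not from bitrot.sh): succeed if the filesystem that
# holds PATH has at least NEED_KB kibibytes available.
has_free_kb() {
    local path=$1 need_kb=$2 avail_kb
    # -P forces the POSIX one-line-per-filesystem layout, -k reports in
    # 1024-byte blocks; the fourth column of the data row is "Available".
    avail_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$need_kb" ]
}

# Example: warn when the current directory's filesystem has under 1 MB free.
if ! has_free_kb . 1024; then
    echo "Less than 1 MB free; extended attributes may fail to write" >&2
fi
```

For the "all drives" case, the same function could be called in a loop over each /mnt/disk* mount point.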
bonienl Posted December 17, 2014

I find the way bitrot stores hashes for individual files very clever. Using extended attributes makes housekeeping of the hashes very easy: each hash is simply stored together with the other file information. The bright thing is that the actual file stays untouched, as it should.

When I started using bitrot I soon wanted a couple more things:

- The possibility to run completely silently, which makes it suitable to run in the background as part of a daily or weekly schedule for updates and verifications. For example, regular checks for new files that add them automatically, and regular verifications of existing files to see if anything has changed and to report any mismatches.
- The ability to use an exclude filter next to an include filter. Sometimes you want all files with an exception, e.g. skip all *.tmp files.
- The same for file size: the ability to skip files below or above a certain size can help to filter out unwanted files.
- A bit more speed in the file scanning, though that may be less important when everything is done in the background on a schedule.

The programmer in me was looking for ways to make my wishlist come true, and soon I found myself changing and adjusting the bitrot script, which eventually became a large rewrite.

I certainly don't want to hijack your brilliant idea, and I don't know if there is any interest in an alternative solution, but speedwise I could make a huge difference. E.g. a scan of 300,000 files with bitrot takes over 35 minutes on my machine; the same scan in the alternative version takes 29 seconds.

Anyway, I wanted to let you know that I like your work and that it inspired me to come up with something which suits my needs. Next to the script I created a daily and weekly schedule to look for new files and to keep a history of mismatches, if and when they occur.
WeeboTech Posted December 17, 2014

Certainly a great idea. I did the same thing, taking it a little further in C. I was stuck on speed until I found a routine which does the md5 internally faster than what I had, and/or I can use the version from openssl.

I also got stuck on details for export/import and the ability to use other hashers. It's really important for me that the data is exportable/importable as an md5sums file for use with other tools.

This is what I had so far.

    root@slacky:/mnt/disk1/home/rcotrone/src.slacky/hashtools-work# ./hashfattr --help
    Usage: %s [OPTION]... PATTERN [PATTERN]...
    Manage hash extended file attributes on each FILE or DIR recursively
    PATTTERN is globbed by the shell, Directories are processed recursively

      -e, --export            Export extended hash attributes
      -c, --check             Check extended hash attributes
      -u, --update            Update extended hash attributes
      -z, --update-missing    Only Update hash when hash is missing
      -m, --update-modified   Only Update hash when mtime > last hash time
      -D, --delete            Delete extended hash attributes

    Filter/Name selection and interpretation:
    Filter rules are processed like find -name using fnmatch
      -n, --name              Filter by name (multiples allowed)
      -f, --filter            Filter from file (One filter file only for now)
      -X, --one-file-system   don't cross filesystem boundaries
      -l  --maxdepth <levels> Descend at most <levels> of directories below command line
      -C  --chdir <directory> chdir to this directory before operating
      -r  --relative          Attempt to build relative path from provided files/dirs
                              A second -r uses realpath() which resolves to full path

      -S, --stats             Print statistics
      -P, --progress          Print statistic progress every <seconds>.
      -Z, --report-missing    Print filenames missing an extended hash attribute
      -M, --report-modified   Print filenames modified after extended hash attribute
      -0, --null              Terminate filename lines with NULL instead of default \n
                              More useful with -Z -M for path names.
      -R, --report            Report status OK,FAILED,MODIFIED,MISSING_XATTR,UPDATED
      -q, --quiet             Quiet/Less output use multiple -q's to make quieter
      -v, --verbose           increment verbosity

    Calculates hash internally (md5) (DEFAULT)
      -x, --hash-exec         Calculate hash eXternally for check/upsert
                              example: /usr/bin/md5sum -b {}

      -h, --help              Help display
      -V, --version           Print Version
      -d, --debug             increment debug level

I've held off on anything further after the reports of ReiserFS corruption and seeing the setfattr/getfattr routines in the kernel stack trace. I'm going to take this up fully when I'm about ready to upgrade to unRAID 6.

A core need is a method of exporting/importing the hashes into other tools in the standard hash file format.

It's also important to rsync files using the -X parameter to save the extended attributes. On the plus side, if you rsync -aX from one machine to another unRAID machine, the extended attributes are preserved, which allows you to test them on the receiving side.
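To make the extended-attribute idea concrete, here is a minimal sketch of storing and verifying a hash with the setfattr and getfattr command-line tools. It is not how bitrot, bunker or hashfattr actually store their data: the attribute name user.sha256 and the function names are assumptions chosen for illustration, and the filesystem must support user extended attributes.

```shell
#!/bin/bash
# Assumed attribute name for this sketch; the real tools may use another.
XATTR_NAME="user.sha256"

# Hypothetical helper: compute the SHA-256 of FILE and store it as an xattr.
store_hash() {
    local file=$1 digest
    digest=$(sha256sum -- "$file" | cut -d' ' -f1)
    [ -n "$digest" ] || return 1
    setfattr -n "$XATTR_NAME" -v "$digest" -- "$file"
}

# Hypothetical helper: return 0 if the stored hash matches the file content,
# 1 on mismatch, 2 if no stored hash could be read.
verify_hash() {
    local file=$1 stored current
    stored=$(getfattr --only-values -n "$XATTR_NAME" -- "$file" 2>/dev/null) || return 2
    current=$(sha256sum -- "$file" | cut -d' ' -f1)
    [ "$stored" = "$current" ]
}
```

After an rsync -aX copy, running verify_hash against the copy on the receiving side checks the received content against the hash that travelled with it.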
bonienl Posted December 17, 2014

Wow, that looks impressive!

Certainly a lot of good ideas in your version, and perhaps I am going to "steal" a few.

I didn't take the bold step of writing it in C, but a faster hash calculation certainly helps a lot, though I believe the sha256deep which is included with bitrot does a decent job.

Extended attributes are a nifty thing, but yes, you have to be aware of how to use the tools to preserve them.

Below is my "help":

    Usage: bunker -a|A|v|V|u|e|i|r [-fdsSlq] [-md5] path [!] [mask]

      -a          add SHA key attribute for files, specified in path and optional mask
      -A          same as -a option with implicit export function (may use -f)
      -v          verify SHA key attribute and report mismatches (may use -f)
      -V          same as -v option with updating of scandate of files (may use -f)
      -u          update mismatched SHA keys with correct SHA key attribute (may use -f)
      -e          export SHA key attributes to the export file (may use -f)
      -i          import SHA key attributes from file and restore them (must use -f)
      -r          remove SHA key attributes from specified selection (may use -f)

      -f <file>   optional set file reference to <file>. Defaults to /tmp/bunker.store.log
      -d <days>   optional only verify/update/remove for files scanned <days> or longer ago
      -s <size>   optional only include files smaller than <size>
      -S <size>   optional only include files greater than <size>
      -l          optional create log entry in the syslog file
      -q          optional quiet mode, suppress all output. Use for background processing
      -md5        optional use md5 hashing algorithm instead of sha256

      path        path to starting directory, mandatory with 3 exceptions (see examples)
      mask        optional filter for file selection. Default is all files

      when path or mask names have spaces, then place names between quotes
      precede mask with ! to change its operation from include to exclude

    Examples:

      bunker -a /mnt/user/TV
          add SHA keys for files in share 'TV'
      bunker -a -S 10M /mnt/user/TV
          add SHA keys for files greater than 10 MB in share 'TV'
      bunker -a /mnt/user/TV *.mov
          add SHA keys for '*.mov' files only in share 'TV'
      bunker -a /mnt/user/TV ! *.mov
          add SHA keys for all files in share 'TV' except '*.mov'
      bunker -A -f /tmp/keys.txt /mnt/user/TV
          add SHA keys for files in share 'TV' and export to file keys.txt
      bunker -v /mnt/user/Documents
          verify SHA keys for previously scanned files
      bunker -V /mnt/user/Documents
          verify SHA keys for scanned files and update their scandate
      bunker -v -d 90 /mnt/user/Documents
          verify SHA keys for files scanned 90 days or longer ago
      bunker -v -f /tmp/mismatches.txt /mnt/disk2/Documents
          verify SHA keys and save mismatches in user defined file
      bunker -u /mnt/disk2/Documents
          update SHA keys for mismatching files
      bunker -u -f /tmp/mismatches.txt
          update SHA keys for files listed in user defined file - no path
      bunker -e /mnt/disk1/Movies
          export SHA keys to default export file
      bunker -e -f /mnt/cache/disk1_keys.txt /mnt/disk1/Movies
          export SHA keys to user defined file
      bunker -i -f /mnt/cache/disk1_keys.txt
          import and restore SHA keys from user defined file - no path
      bunker -r /mnt/user/TV
          remove SHA keys for files in share 'TV'
      bunker -r -f /tmp/mismatches.txt
          remove SHA keys for files listed in user defined file - no path
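An exclude mask and a minimum-size filter of the kind shown in the help text can both be expressed directly in find. The sketch below is not bunker's implementation; the function name list_candidates and its argument order are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not bunker's code): list regular files under DIR that
# are larger than MIN_SIZE and whose names do NOT match EXCLUDE_MASK.
list_candidates() {
    local dir=$1 min_size=$2 exclude=$3
    # -size +N means "more than N units"; find rounds file sizes UP to whole
    # units first, so "+1k" matches any file larger than 1024 bytes.
    # "! -name" negates the name test, turning the mask into an exclude.
    find "$dir" -type f -size "+$min_size" ! -name "$exclude"
}
```

For example, list_candidates /mnt/user/TV 10M '*.mov' would list files over 10 MiB except those ending in .mov; the mask is quoted so the shell passes it to find unexpanded.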
WeeboTech Posted December 17, 2014

The filter logic I used is similar to rsync:

    + thisnamemask
    - thisnamemask
    - thisnamemask
    + thisnamemask

It was too much work (and I'm lazy) to do the whole ! -name thing, as it means parsing the argument vector manually instead of using getopt_long. So my filters use

    -name "+ thismask" -name "- thismask"

Since I use ftw(), as does find, I was going to add a --mmin or age factor. When I use an --age factor I use a delta of DD:MM:HH:SS, i.e. days:minutes:hours:seconds. Since that gets converted to a number of seconds, you can pass either the human delta factor or a number of seconds.

Where I've seen issues is in export/import, more so with files that have special characters, so my default export is exactly like the output of the hash sum tools: hash filename. I'm planning the extended export as mtime size hash filename.

Truth be told, I've never thought of a size mask, i.e. excluding on size. However, I have considered the ability to traverse the directory tree and export a folder.hash file in each directory for compatibility with corz checksum, and then also having an import tool use that to add the data to the extended attributes. And as always, I go off on tangents and get sidetracked.

As far as logging via syslog goes, that could cause issues on large filesystems such as my own. I have about 350,000 files on one filesystem, so logging to a unique facility might be worthwhile so that rsyslog or syslog.conf can redirect the messages. With rsyslog you can have it automatically date the output logs too.

I really have four parallel projects going on with this.

One is like bitrot, keeping the hash sums with the files, i.e. hashfattr, hasfattrexport, hashfattrimport.

Another uses .gdbm files because they are very fast. It provides a very fast index to all the stat information on the drive: gdbmsums and gdbmftw (like cache-dirs, only it stores all the stat data that is read). This was my attempt at my own cache-dirs, which would also cache the stat blocks for rapid review.

The third is the sqlite locate database, which allows importing of foreign file systems for hash sum storage and locating.

The fourth is unique in that I use par2 to create a folder.par2 in each directory. This provides checking of files, but also provides the ability to correct a percentage of corruption if there is a problem.

I think all of these have merit, but the key issue is that the data also needs to be exported off the file system in case there is a problem with the filesystem itself. This export file needs to be usable by standard tools, i.e. md5sum, sha256sum, etc., so there is something usable at the command level without significant reprogramming.

With a basic exported file you can run it against the hasher directly. With the extended format of mtime size hash filename you can chop off mtime and size with sed or cut and pipe it to md5sum/sha256sum for a quick verify.

> Certainly a lot of good ideas in your version, and perhaps I am going to "steal" a few.

That's why I posted where I am so far. Each of us learns and builds on ideas from one another!
jbartlett Posted December 18, 2014 Author

I benchmarked several hashing utilities and the speed difference was more or less negligible (i.e. a tie) for reading off of a platter, so I opted for the algorithm that didn't have any known or potential collisions.

I've never thought of excluding based on size. I use the find tool to locate files and I believe it only allows for one mask. Grep regexes are flaky with file names that contain special characters, so I opted not to go that route.

Is there a link detailing the standard hash file layout? I can update the import/export to utilize that.
WeeboTech Posted December 18, 2014

I think collisions aren't so much of an issue in this case. Correct me if I'm wrong, but the goal is to check the file's integrity. In that case an md5 is adequate. I don't think there will be any false positives.

I don't think grep's regexes are flaky, but I would love to see some real-world examples. I know there can be all sorts of issues with command line quoting, which is why I often do it in C or via fgrep.

The standard hash file layout is

    hash(space)(space or * to signal file is opened in binary vs text)filename

I would propose adding mtime and size as a two-field prefix for an extended hash file format. This way the first two fields can be cut out easily, leaving the regular hash file format. If you really needed to export/save the last verify time, then add it before these two fields, giving

    time(of last verify) mtime(of file) size(of file) hash(of hasher)(space)(space or *)filename

At the very minimum,

    hash(of hasher)(space)(space or *)filename

is the most universal format, as it can be piped directly into the hashers for verification. I can't tell you how often I do this.

With slower systems (such as my N54L), the forking overhead of running an individual hasher for each and every file via a pipe makes the process slow, whereas running the hasher on the exported hash file is faster. It really becomes apparent with many smaller files, such as my mp3 archive with half a million files.
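The proposed extended layout and the "cut the first two fields" trick can be sketched as follows. This assumes GNU coreutils (stat -c and sha256sum) and single-space separators between the prefix fields; the function names export_extended and to_standard are hypothetical and not taken from any of the tools in this thread. The export is deliberately non-recursive to keep the example short.

```shell
#!/bin/bash
# Hypothetical helper: print one "mtime size hash  filename" line for each
# regular file directly inside DIR (no recursion in this sketch).
export_extended() {
    local dir=$1 file
    (
        cd "$dir" || exit 1
        for file in *; do
            [ -f "$file" ] || continue
            # GNU stat: %Y = mtime in epoch seconds, %s = size in bytes.
            # sha256sum already prints the standard "hash  filename" layout.
            printf '%s %s\n' "$(stat -c '%Y %s' -- "$file")" "$(sha256sum -- "$file")"
        done
    )
}

# Hypothetical helper: drop the mtime and size fields from stdin, leaving
# the standard hash file format that sha256sum -c understands.
to_standard() {
    cut -d' ' -f3-
}
```

A quick verify from an exported file then looks like: cd into the directory and run to_standard < export.txt | sha256sum -c - (here export.txt is a placeholder name). Because cut keeps everything from the third field onward, file names containing spaces survive intact.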
jbartlett Posted December 19, 2014 Author

If memory serves, the issues I had were in the use of bracket [ ] characters in the file name.
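The bracket problem is reproducible: in a regular expression, [2014] is a bracket expression that matches a single character, so a file name used as a grep pattern no longer matches itself. Matching with fixed strings (grep -F, the modern spelling of fgrep) avoids this. The helper name list_has_name below is hypothetical and only illustrates the technique.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the list on stdin contains NAME as an
# exact, literal line.
list_has_name() {
    # -F: treat the pattern as a fixed string, not a regex
    # -x: the whole line must match
    # -q: no output, exit status only
    # -e: pattern follows, even if it starts with '-'
    grep -Fxq -e "$1"
}

name='Movie [2014].mkv'

# As a regex, the brackets mean "one of 2, 0, 1, 4", so this does not match.
if ! printf '%s\n' "$name" | grep -q -e "$name"; then
    echo "regex match failed for: $name"
fi

# As a fixed string, the name matches itself.
if printf '%s\n' "$name" | list_has_name "$name"; then
    echo "fixed-string match succeeded for: $name"
fi
```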