bonienl Posted December 23, 2014

Introduction

bunker is a file integrity check utility based on the original bitrot utility by jbartlett, and the SHA keys it calculates are compatible with bitrot. I want to thank jbartlett for bringing out his excellent idea. My version can be seen as an alternative which I initially developed to fulfill my own requirements and which may now be useful to others.

The purpose of bunker is to save the calculated hash value in the extended attributes of a given file. This allows for regular checking of the integrity of the file content. Note that the original file is never touched or altered by bunker, no matter which options are chosen. Different hashing methods are stored under different extended attributes, so it is possible to store sha256, md5 and blake2 hashes together with a file, if desired.

Versions

Version 1.16 verify location of 'notify' script to support unRAID v6.x
Version 1.15 fixed execution of export (-e) command; fixed -D option (modified time calculation); swapped -r and -R commands; added new extended attribute: file size; added display of file name being processed; code optimizations
Version 1.14 logger corrections and code optimization (thx itimpi)
Version 1.13 logger improvements (thx itimpi)
Version 1.12 added new option -L, which logs only changes
Version 1.11 bug fix, correction in report calculation (thx archedraft)
Version 1.10 more comprehensive reporting, and minor bug fix
Version 1.9 (minor) bug fixing release
Version 1.8 introduced new option -n (notify), which lets bunker send alert notifications when file corruption is detected
Version 1.7 introduced new filedate attribute. New -t (touch) command. Various improvements.
Version 1.6 regression fix in -v command and added missing files display for -c command
Version 1.5 introduced new command -U and new option -D
Version 1.4 is a bug fixing release. Correct ETA calculation, fix scandate with -C option, sort output files.
Version 1.3 has new options -c and -C, which allow checking hash values against a previously exported file. This can be used, for example, when transferring files from one filesystem to another, e.g. from reiserfs to xfs. During this process extended attributes are not copied over, but with the -c (-C) option they can be verified/restored afterwards; make sure to do an export of the hash keys before the file transfer.
Version 1.2 uses the already installed utilities sha256sum and md5sum; no external package needs to be installed unless one wants to make use of the new option '-b2', which uses the blake2 algorithm. A download of the blake2 utility can be obtained from the Blake2 site. Extract the file "b2sum-amd64-linux", rename it "b2sum" and copy it to your server, e.g. to /usr/bin/b2sum.
Version 1.1 is the initial release.

Usage

The following is from the help of bunker.

bunker v1.16 - Copyright (c) 2015 Bergware International
Usage: bunker -a|A|v|V|u|U|e|t|i|c|C|r|R [-fdDsSlLnq] [-md5|-b2] path [!] [mask]

  -a  add hash key attribute for files, specified in path and optional mask
  -A  same as -a option with implicit export function (may use -f)
  -v  verify hash key attribute and report mismatches (may use -f)
  -V  same as -v option with updating of scandate of files (may use -f)
  -u  update mismatched hash keys with correct hash key attribute (may use -f)
  -U  same as -u option, but only update files which are newer than last scandate
  -e  export hash key attributes to the export file (may use -f)
  -t  touch file, i.e. copy file modified time to extended attribute
  -i  import hash key attributes from file and restore them (must use -f)
  -c  check hash key attributes from input file (must use -f)
  -C  same as -c option and add hash key attribute for files (must use -f)
  -r  remove hash key attribute from specified selection (may use -f)
  -R  same as -r option and remove filedate, filesize, scandate values too (may use -f)

  -f <file>  optional: set file reference to <file>. Defaults to /tmp/bunker.store.log
  -d <days>  optional: only verify/update/remove files which were scanned <days> or longer ago
  -D <time>  optional: only add/verify/update/export/remove files newer than <time>, time = NNs,m,h,d,w
  -s <size>  optional: only include files smaller than <size>
  -S <size>  optional: only include files greater than <size>
  -l         optional: create log entry in the syslog file
  -L         optional: same as -l but only create log entry when changes are present
  -n         optional: send notifications when file corruption is detected
  -q         optional: quiet mode, suppress all output. Use for background processing
  -md5       optional: use md5 hashing algorithm instead of sha256
  -b2        optional: use blake2 hashing algorithm instead of sha256

  path  path to starting directory, mandatory with some exceptions (see examples)
  mask  optional filter for file selection. Default is all files

When path or mask names have spaces, place the names between quotes. Precede mask with ! to change its operation from include to exclude.

Examples:
  bunker -a /mnt/user/tv                          add SHA key for files in share tv
  bunker -a -S 10M /mnt/user/tv                   add SHA key for files greater than 10 MB in share tv
  bunker -a /mnt/user/tv *.mov                    add SHA key for .mov files only in share tv
  bunker -a /mnt/user/tv ! *.mov                  add SHA key for all files in share tv except .mov files
  bunker -A -f /tmp/keys.txt /mnt/user/tv         add SHA key for files in share tv and export to file keys.txt
  bunker -v -n /mnt/user/files                    verify SHA key for previously scanned files and send notifications
  bunker -V /mnt/user/files                       verify SHA key for scanned files and update their scandate
  bunker -v -d 90 /mnt/user/movies                verify SHA key for files scanned 90 days or longer ago
  bunker -v -f /tmp/errors.txt /mnt/user/movies   verify SHA key and save mismatches in file errors.txt
  bunker -u /mnt/disk1                            update SHA key for mismatching files
  bunker -U /mnt/disk1                            update SHA key only for mismatching files newer than last scandate
  bunker -u -D 12h /mnt/disk1                     update SHA key for mismatching files created in the last 12 hours
  bunker -u -f /tmp/errors.txt                    update SHA key for files listed in user defined file - no path
  bunker -e -f /tmp/disk1_keys.txt /mnt/disk1     export SHA key to file disk1_keys.txt
  bunker -i -f /tmp/disk1_keys.txt                import and restore SHA key from user defined file - no path
  bunker -c -f /tmp/disk1_keys.txt                check SHA key from user defined input file - no path
  bunker -C -f /tmp/disk1_keys.txt                check SHA key and add SHA attribute (omit mismatches) - no path
  bunker -r /mnt/user/tv                          remove SHA key for files in share tv
  bunker -r -f /tmp/errors.txt                    remove SHA key for files listed in file errors.txt - no path

Look at the examples to make use of the possibilities of bunker.

Operation

There are two main ways to use the utility: [1] interactive or [2] scheduled.

Interactive

When used in interactive mode the utility can be executed from a telnet session with the given options, and results are made visible on screen. The utility can be stopped at any time using 'CTRL-C'. It is also possible to open several telnet sessions and run multiple bunker instances concurrently, e.g. for checking different disks.
Most of the time is spent on I/O access, and calculating/checking a large disk can be a lengthy process; for example, it takes almost 7.5 hours on my system to go through a nearly full 2TB disk.

Scheduled

Another way of operation is to create scheduled tasks to do regular file verifications and/or other activities. For example, I created the script 'bunker-daily' and copied this file to the folder /etc/cron.daily. It checks for new files and file changes, and updates the export file. It will go through all available disks (thx itimpi).

#!/bin/bash
bunker=/boot/custom/bin/bunker
log=/boot/custom/hash/blake2
var=/proc/mdcmd
day=$(date +%Y%m%d)
array=$(grep -Po '^mdState=\K\S+' $var)
rsync=$(grep -Po '^mdResync=\K\S+' $var)
mkdir -p $log
# Daily check on new files, report errors and create export file
if [[ $array == STARTED && $rsync -eq 0 ]]; then
  for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 ; do
    if [[ -e /mnt/disk$i ]]; then
      $bunker -A -D 1 -q -l -f $log/disk$i.$day.new.txt /mnt/disk$i
      $bunker -U -D 1 -q -l -n -f $log/disk$i.$day.bad.txt /mnt/disk$i
      if [[ -s $log/disk$i.export.txt ]]; then
        if [[ -s $log/disk$i.$day.new.txt || -s $log/disk$i.$day.bad.txt ]]; then
          mv $log/disk$i.export.txt $log/disk$i.$day.txt
          $bunker -e -q -l -f $log/disk$i.export.txt /mnt/disk$i
        fi
      else
        $bunker -e -q -l -f $log/disk$i.export.txt /mnt/disk$i
      fi
    fi
  done
fi

This can be combined with a verification script which checks for file corruptions. I do this on a monthly basis, but it may be done on a weekly basis instead. Copy the file bunker-monthly to /etc/cron.monthly. Note that you need to adjust the script to the number of disks in your system.
#!/bin/bash
bunker=/boot/custom/bin/bunker
log=/boot/custom/hash/blake2
var=/proc/mdcmd
day=$(date +%Y%m%d)
array=$(grep -Po '^mdState=\K\S+' $var)
rsync=$(grep -Po '^mdResync=\K\S+' $var)
mkdir -p $log
# Monthly verification of different group of disks (quarterly rotation)
if [[ $array == STARTED && $rsync -eq 0 ]]; then
  case $(($(date +%m)%4)) in
    0) for i in 1 2   ; do $bunker -v -n -q -l -f $log/disk$i.$day.bad.txt /mnt/disk$i & done ;;
    1) for i in 3 4 5 ; do $bunker -v -n -q -l -f $log/disk$i.$day.bad.txt /mnt/disk$i & done ;;
    2) for i in 6 7 8 ; do $bunker -v -n -q -l -f $log/disk$i.$day.bad.txt /mnt/disk$i & done ;;
    3) for i in 9 10  ; do $bunker -v -n -q -l -f $log/disk$i.$day.bad.txt /mnt/disk$i & done ;;
  esac
fi

Export / Import

The purpose of export and import is to create and save a copy of the hash keys of the given files (export), which can be restored at a later time (import). For example, after a disk crash the keys can be imported and a file verification run afterwards to see if content has been damaged. Export/import is not a file repair method but a mechanism to find corruptions; other tools need to be used to do any file repair.

Download

See the attachment to download the zip file. Copy the file 'bunker' to your flash drive or another convenient location, and execute it from there. Optionally use the files bunker-daily and bunker-monthly.

Extra

Included in the zip file are two additional scripts, bunker-update and bunker-verify, provided courtesy of itimpi. You can place these scripts in the cron.hourly and cron.daily folders respectively to automate the file checking.

bunker.zip
jbartlett Posted December 23, 2014 Share Posted December 23, 2014 Interesting. Any plans on adding a recover method which will use SHA value to restore the file to the original location in the event that it ends up in a lost+found directory? You're welcome to examine my logic to base yours on. Quote Link to comment
jbartlett Posted December 23, 2014 Share Posted December 23, 2014 It was also an excellent idea to make the tools compatible with each other. Prevents against rescanning in the event that people switch from one to the other or decides to use both. Quote Link to comment
bonienl Posted December 23, 2014 (Author)

It was also an excellent idea to make the tools compatible with each other. It prevents rescanning in the event that people switch from one to the other or decide to use both.

I started with your tool and had all my files scanned/hashed. I didn't want to go through the whole lengthy process again just because of changing tools. Compatibility is indeed a nice addition when people use one or the other, or perhaps both ... Maybe you want to consider making your file output "<hash value>|<full path/filename>|<scandate>"; it will also make the files interchangeable. I studied your approach to recovery in bitrot, but I am not confident enough to build that into my script; it is hard for me to test.
jbartlett Posted December 23, 2014 Share Posted December 23, 2014 It was also an excellent idea to make the tools compatible with each other. Prevents against rescanning in the event that people switch from one to the other or decides to use both. I started with your tool and had all my files scanned/hashed. Didn't want to go through the whole timely process again by just changing a tool. Compatibility is indeed a nice addition when people use one or the other or perhaps both ... Maybe you want to consider your file output as "<hash value>|<full path/filename>|<scandate>" it will also make files interchangeable. I studied your approach of recovery in bitrot, but I am not confident enough myself to build that into my script, this is hard to test for me I plan on adding support for specifying the format of the export, something like "-F hash|filename|scandate" or "-F scandate|filename|hash" The way I tested the recovery was to scan a directory of a few files, copy those files to a different directory, alter the path in the export, and then recover against the copied files. Quote Link to comment
WeeboTech Posted December 23, 2014

I plan on adding support for specifying the format of the export, something like "-F hash|filename|scandate" or "-F scandate|filename|hash".

Many tools that allow reformatting of internal values on demand use printf-like formatters. So if an agreed-upon set of formatters is used, the export can be configured as needed. Think of /bin/date and /bin/stat; even find uses them in -printf, e.g. -printf "%h". An example from the find man page:

  %f  File's name with any leading directories removed (only the last element).
  %h  Leading directories of file's name (all but the last element). If the file
      name contains no slashes (since it is in the current directory) the %h
      specifier expands to ".".

I'm not saying to use these; I'm suggesting to make it configurable with something like an --exportf argument:

  --exportf "%H|%F|%D"

While the names are more intuitive, these types of formatters are pretty commonplace. Do /bin/stat --help and /bin/date --help. I used this method for internal functions which dump the stat() blocks (grabbed it from /bin/stat), then added a few for hash/filename. With the tools I've built, since everything links to this function, I can dump anything in any format as needed.
Another idea is to allow the use of full variables using eval:

  --exportf '${HASH}|${FILENAME}|${SCANDATE}'

This would allow exposing any variable from the shell in the export, as long as it was exported and eval was used to get the definitions into another line, i.e.

  eval 'EXPORTLINE="${EXPORTSTRING}"'
  printf "${EXPORTLINE}\n"

I've used this technique a lot, but it also means people have to understand shell quoting, so I wouldn't recommend it.
bonienl Posted December 24, 2014 (Author)

Weebotech, thanks for your input here, much appreciated; certainly more food for thought. Though I use "date" and "stat" quite regularly, I always find myself looking at the help information to see which abbreviation is used for which particular item. In other words, although their capabilities are very versatile, they aren't very intuitive. If an export format option is added, I would prefer to use abbreviations which make sense, like your proposed "%H|%F|%D" (one can even argue to make these case insensitive). Exporting of internal variables I find a risky business; using eval you would allow potentially every variable to be exported ... not my choice.
WeeboTech Posted December 24, 2014 Share Posted December 24, 2014 As much as the % formatters are not as easily intuitive, they are used in many tools and in c printfs. I learned to live with them. I wouldn't make them case insensitive as no other tool does that. I have a generic C function to do this that I include and modify in all programs. I bet if we come up with a general bash function to do the replacements, it will make it easier for both tools. The ability to have a basic hash export that can be immediately used in the source tool is of value when you actually have corruption. From issues seen throughput the forum, when there is corruption the getfattr and setfattr will exasperate that issue and even cause kernel crashes. You cannot rely on the metadata in all cases. In the heat of a problem, walking someone through parsing the intermediary export file makes it more work then simply archiving the raw hash for input to the hasher. This is the approach I use with export for those reasons. At the very least, perhaps provide an additional tool to convert the export file to a standard hash type file. I'll probably add the ability to use the % modifiers so the hash/scan date can be exported at will. I don't necessarily see the value in exporting it at this time. I would love to understand more about it's value and potential future use. Quote Link to comment
bonienl Posted December 24, 2014 (Author)

I am also thinking about what could be a potential usage or benefit of introducing an export format option. Perhaps another approach is to build into the tool an immediate verification from an import file with a hash utility. It would then basically replace the piped conversion syntax; this

  cut -d'|' -f1,2 hashfile | sed 's/|/ */' | md5sum -c

becomes

  bunker -c -md5 -f hashfile

Something similar can be done for bitrot as well, of course ...
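A runnable illustration of that piped conversion, building a one-line export file in the "hash|path|scandate" layout discussed in this thread and verifying it with the stock md5sum tool (the temp file names here are throwaway, not part of bunker):

```shell
#!/bin/bash
# Demo: convert a "hash|path|scandate" export line into md5sum's
# "hash *path" check format and verify it with md5sum -c.
set -e
dir=$(mktemp -d)
echo "hello" > "$dir/sample.txt"
hash=$(md5sum "$dir/sample.txt" | cut -d' ' -f1)
printf '%s|%s|%s\n' "$hash" "$dir/sample.txt" "$(date +%s)" > "$dir/hashfile"
# Keep fields 1-2 (hash, path) and turn the remaining '|' into ' *':
result=$(cut -d'|' -f1,2 "$dir/hashfile" | sed 's/|/ */' | md5sum -c)
echo "$result"   # prints "<path>: OK" on success
rm -rf "$dir"
```

The sed replacement ' *' matters: md5sum's check format expects either two spaces or a space plus '*' (binary mode) between hash and file name.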
WeeboTech Posted December 24, 2014 Share Posted December 24, 2014 Also thinking of what can be a potential usage or benefit for introducing an export format option. Perhaps another approach can be to build into the tool an immediate verification from an import file to a hash utility, it would then basically replace the piped conversion syntax this cut -d'|' -f1,2 hashfile | sed 's/|/ */' | md5sum -c becomes bunker -c -md5 -f hashfile Something similar can be done for bitrot as well of course ... That works, but when the raw tools exist, use them when possible. i.e. the lowest common denominator is a hash file that can be used by the hasher itself without the need of other tools. I've had to do this so many times as I'm consolidating a box of 30 hard drives into larger hard drives. Keeping md5sums files as model-serial.md5sums has been invaluable and I can use it with any of the basic unix tools. So the ability to export the file to be used by something that already exists has merit. That being said, this quick script provides an quick example for using the meta characters to export data in a configurable manner. 
#!/bin/bash
declare -A meta

meta[D]=`date +%s`
meta[F]="/tmp/somefilename.ext"
meta[H]="d41d8cd98f00b204e9800998ecf8427e"

set | grep meta

echo "%H=${meta[H]}"
echo "%F=${meta[F]}"
echo "%D=${meta[D]}"

string="%H|%F|%D"
string="%H *%F"

for k in "${!meta[@]}"
do
  echo "key  : ${k}"
  echo "value: ${meta[${k}]}"
  string=${string//%${k}/${meta[${k}]}}
done

echo "$string"

And example output:

[email protected]:/tmp# /boot/bin/bashmetachar.sh
BASH_SOURCE=([0]="/boot/bin/bashmetachar.sh")
meta=([D]="1419428621" [F]="/tmp/somefilename.ext" [H]="d41d8cd98f00b204e9800998ecf8427e" )
%H=d41d8cd98f00b204e9800998ecf8427e
%F=/tmp/somefilename.ext
%D=1419428621
key  : D
value: 1419428621
key  : F
value: /tmp/somefilename.ext
key  : H
value: d41d8cd98f00b204e9800998ecf8427e
d41d8cd98f00b204e9800998ecf8427e */tmp/somefilename.ext

or

[email protected]:/tmp# /boot/bin/bashmetachar.sh
BASH_SOURCE=([0]="/boot/bin/bashmetachar.sh")
meta=([D]="1419428644" [F]="/tmp/somefilename.ext" [H]="d41d8cd98f00b204e9800998ecf8427e" )
%H=d41d8cd98f00b204e9800998ecf8427e
%F=/tmp/somefilename.ext
%D=1419428644
key  : D
value: 1419428644
key  : F
value: /tmp/somefilename.ext
key  : H
value: d41d8cd98f00b204e9800998ecf8427e
d41d8cd98f00b204e9800998ecf8427e|/tmp/somefilename.ext|1419428644

This was a real quick example. I'm not sure what will happen if the filename itself has these meta characters in it. Anyway, just some food for thought.
WeeboTech Posted December 24, 2014 Share Posted December 24, 2014 And another point worth considering. How do people import already existing hash/md5sums files? Here's a quick example of what I had in place do this via shell. It takes quite some time to run on a large filesystem with allot of files, which is why I started to go the .c route. #!/bin/bash [ ${DEBUG:=0} -gt 0 ] && set -x -v P=${0##*/} # basename of program R=${0%%/$P} # dirname of program P=${P%.*} # strip off after last . character TMPFILE=/tmp/${P}.$$ trap "rm -f ${TMPFILE}; exit" EXIT HUP INT QUIT TERM i=0 #if [ -z "${1}" ] # then echo "Usage: $0 <md5sum hash file>" # exit #fi if [ ! -z "${1}" ] then exec 6<&0 0<${1} TIME=`stat -c %Y ${1}` else TIME=`date +%s` fi while read HASH FILENAME do (( i++ )) printf "%.6d %s\n" $i "${FILENAME}" >&2 FILENAME="${FILENAME#\*}" setfattr -n user.hash.value -v ${HASH} "${FILENAME}" TIME=`stat -c %Y "${FILENAME}"` setfattr -n user.hash.time -v ${TIME} "${FILENAME}" done if [ ! -z "${1}" ] then exec 0<&6 6<&- fi Quote Link to comment
tr0910 Posted December 24, 2014

@weebo, @bonienl, @jbartlett: seeing you create tools for advancing the stability and reliability of our data is the work of giants. We will all thank you for this later. I want to thank you now.
WeeboTech Posted December 24, 2014 Share Posted December 24, 2014 @weebo, @bonienl Seeing the 2 of you creating tools for advancing the stability and reliability of our data is the work of giants. We will all thank you for this later. I want to thank you now. Let's include jbartlett in this as well since bitrot is the impetus to doing this. jbartlett's contributions are invaluable. Quote Link to comment
bonienl Posted December 24, 2014 (Author)

Definitely, without bitrot we wouldn't have this conversation!

A side note ... it would be good if some "standardization" were used in the extended attributes. This will make the different tools interchangeable. So far bitrot / bunker uses:

  user.sha256=value
  user.scandate=value

In order to support different hashing methods, I have added for bunker:

  user.md5=value
  user.blake2=value
WeeboTech Posted December 24, 2014 Share Posted December 24, 2014 My attributes are configurable on the command line. They 'default' to user.hash.value= user.hash.time= if ( !opt.attrhashvaluename ) { sprintf(buf,"user.%s.value", opt.hashbase ? opt.hashbase : "hash" ); opt.attrhashvaluename = strdup(buf); } if ( !opt.attrhashtimename ) { sprintf(buf,"user.%s.time", opt.hashbase ? opt.hashbase : "hash" ); opt.attrhashtimename = strdup(buf); } If you use an external hasher, that hasher's basename is used in the attribute name such as --hash-exec '/bin/md5sum -b {}' user.md5sum.value user.md5sum.time However each can be overridden with --hash-name --time-name I use the epoch time in the time value, so it can be used in arithmetic or in date conversions with strftime Quote Link to comment
MortenSchmidt Posted February 5, 2015

I've been using bunker for a while now - thank you gentlemen! But the -i (import) option isn't working for me; I just get an "Invalid Parameter Specified" error. I have tried:

  bunker -i -b2 -f /mnt/disk4/disk4blake2.txt /mnt/disk4

Same story with -c (check). But -a (add), -v (verify), -u (update) and -e (export) all worked fine. Are the import and check features simply not implemented yet?
bonienl Posted February 5, 2015 (Author)

But the -i (import) option isn't working for me; I just get an "Invalid Parameter Specified" error. I have tried: bunker -i -b2 -f /mnt/disk4/disk4blake2.txt /mnt/disk4

With the import and check options only a file reference is given, no folder (this information is already in the file to be imported/checked). So the syntax simply becomes:

  bunker -i -b2 -f /mnt/disk4/disk4blake2.txt
MortenSchmidt Posted February 9, 2015

With the import and check options only a file reference is given, no folder (this information is already in the file to be imported/checked).

Thank you. Makes sense.

A problem I have noticed with using this tool: if I rename a folder, all files within that folder lose their extended attributes and thus their hash. This is a problem if you want to rsync your files to a new drive in a temp location and then later rename folders after the files have been verified (this avoids having duplicate files while the transfer & verification is ongoing). There are a couple of workarounds, one of which is:

1) Generate hashes on old (reiserfs) drive (bunker -a /mnt/disk1)
2) Copy to new (XFS) disk (rsync -avX /mnt/disk1 /mnt/disk2/temp)
3) Verify files (bunker -v /mnt/disk2/temp)
4) Export hashes from temp location (bunker -e -f /mnt/cache/disk2temp.txt /mnt/disk2)
5) Manually edit hash file to replace '/mnt/disk2/temp/' with '/mnt/disk2/'
6) Move files from temp to final location (mv /mnt/disk2/temp/* /mnt/disk2 or something along those lines)
7) Re-import hashes (bunker -i /mnt/cache/disk2temp.txt)

However, it is also a problem when you want to reorganize your media library and rename/move many folders around. I'd love to hear of a solution that is more elegant than exporting hashes to one big file, then manually find & replacing paths, and then manually re-importing.
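The manual edit in step 5 of the workaround above can be scripted. A minimal sketch, run here against a synthetic one-line export file (the "hash|path|scandate" layout is assumed from this thread; the sample hash and file name are made up):

```shell
#!/bin/bash
# Rewrite the path prefix inside a bunker-style export file so the
# hashes can be re-imported after the files move out of the temp folder.
set -e
export_file=$(mktemp)
printf '%s\n' 'd41d8cd98f00b204e9800998ecf8427e|/mnt/disk2/temp/movies/a.mkv|1419428621' > "$export_file"
# Using '#' as the sed delimiter avoids clashing with the '/' in the paths:
sed -i 's#|/mnt/disk2/temp/#|/mnt/disk2/#' "$export_file"
result=$(cat "$export_file")
echo "$result"   # path field now reads /mnt/disk2/movies/a.mkv
rm -f "$export_file"
```

Anchoring the pattern on the leading '|' keeps the substitution inside the path field, so a file that happens to be named "temp" elsewhere is not touched.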
I would also love it if bunker had the capability to store hash files per directory instead of the whole extended-attribute approach. This would elegantly avoid the above problem. It would also make it possible to generate hashes on the server, but from time to time verify a file over the network (with, say, the corz tool). Is this a feature request that can be considered? Thanks again!
WeeboTech Posted February 9, 2015 Share Posted February 9, 2015 A problem I have noticed with using this tool is if I rename a folder, all files within that folder loose their extended attribute and thus their hash. This is a problem if you want to rsync your files to a new drive in a temp location and then later rename folders after files have been verified (this avoids having duplicate files while the transfer&verification is ongoing). There are a couple of workarounds, one of which is 1) Generate hashes on old (reiserfs) drive (bunker -a /mnt/disk1) 2) Copy to new (XFS) disk (rsync -avX /mnt/disk1 /mnt/disk2/temp 3) Verify files (bunker -v /mnt/disk2/temp 4) Export hashes from temp location (bunker -e -f /mnt/cache/disk2temp.txt /mnt/disk2) 5) Manually edit hash file to replace '/mnt/disk2/temp/' with '/mnt/disk2/' 6) Move files from temp to final location (mv /mnt/disk2/temp/* /mnt/disk2 or something along those lines) 7) Re-import hashes (bunker -i /mnt/cache/disk2temp.txt) However, it is also a problem when you want to reorganize your media library and rename/move many folders around. I'd love to hear of a solution that is more elegant than exporting hashes to one big file, then manually find&replace paths and then manually re-importing. The problem is in using mv and not rsync for the second restore. consider the following instead of move. rsync --remove-source-files -avPX source dest Quote Link to comment
WeeboTech Posted February 9, 2015 Share Posted February 9, 2015 I would also love it if bunker had the capability to store hash files per directory instead of the whole ext-attrib thing. This would elegantly avoid the above problem. It would also make it possible to generate hashes on the server, but from time to time verify a file over the network (with, say the corz tool). Is this a feature request that can be considered? Thanks again! I've suggested the ability to export files to an md5sums type folder.hash file per directory for the exact reason described. I believe the current export file can be run through sed & cut to create one but not on a per directory basis. i.e. it may traverse from a provided root, down the tree, but I'm not sure if it will use relative paths in the file without some work. Quote Link to comment
MortenSchmidt Posted February 10, 2015

The problem is in using mv and not rsync for the second restore. Consider the following instead of move: rsync --remove-source-files -avPX source dest

Hmmm, on my system, even when source and dest are on the same physical disk, the above results in reading and re-writing all the files, instead of simply renaming the top-level dirs. It takes a long time and defeats the purpose of not deleting the source files until after the destination files have been verified. Am I missing something?
WeeboTech Posted February 10, 2015

I'm not sure what's going on with the environment. rsync works fine for me, but yes, it copies the file and removes the source before it's been verified; yet it's already been verified, i.e. moving from /mnt/disk2/temp to /mnt/disk2 shouldn't require it to be verified again.

In any case, I thought it odd that mv was not preserving extended attributes. I remember seeing in the source code that it did. In my most recent test, extended attributes are preserved with mv:

[email protected]:/mnt/disk2# pwd
/mnt/disk2
declare -a MD5=$(md5sum folder.hash)
setfattr -n user.hash -v ${MD5[0]} folder.hash
[email protected]:/mnt/disk2# getfattr -d folder.hash
# file: folder.hash
user.hash="65b69d37c3d3f8cccce56a6f4ac7d49a"
mkdir tmp
mv folder.hash tmp
cd tmp
[email protected]:/mnt/disk2/tmp# pwd
/mnt/disk2/tmp
[email protected]:/mnt/disk2/tmp# getfattr -d folder.hash
# file: folder.hash
user.hash="65b69d37c3d3f8cccce56a6f4ac7d49a"
[email protected]:/mnt/disk2/tmp# mkdir /mnt/disk1/tmp
[email protected]:/mnt/disk2/tmp# mv folder.hash /mnt/disk1/tmp
[email protected]:/mnt/disk2/tmp# getfattr -d /mnt/disk1/tmp/folder.hash
getfattr: Removing leading '/' from absolute path names
# file: mnt/disk1/tmp/folder.hash
user.hash="65b69d37c3d3f8cccce56a6f4ac7d49a"

I wouldn't use rsync to move from /mnt/disk2/temp to /mnt/disk2; however, your statement said 'something along those lines'. After re-verifying, I would use mv. However, when going disk to disk, or disk to a disk on another system, I would use rsync. Usually what I do is:

  rsync -avPX source rsync://host/path

After that is done, I do it again with -rc instead of -a:

  rsync -rcvPX source rsync://host/path

This does a checksum comparison the second time around, instead of comparing mtime/size. After that I'll do a third one with --remove-source-files:

  rsync --remove-source-files -rcvPX source rsync://host/path

You can probably eliminate the second step if you do the hash verification with bunker.
This is just how I do it, though. mv should be working for you. I would double-check that the drives are mounted with extended attributes.
MortenSchmidt Posted March 1, 2015

"mv should be working for you." You are right, mv should be working, and as it turns out, in most cases it does work for me. I may have been mistaken; I haven't run into it since. Sorry to cry wolf. Moving with MC works too (unless you are merging files into existing directories). Thank you for your elaborate note.

However, while rsync'ing files (converting disks from ReiserFS to XFS), I ran into a problem with rsync: I apparently had some files with invalid extended attributes, and the way rsync handles that is to not copy the files at all. So count your directory sizes before deleting anything from the old disks! Here's what I got when trying to re-transfer a folder that turned out smaller on the destination disk:

```shell
[email protected]:~# rsync -avX /mnt/disk1/Common/* /mnt/disk16/temp/Common/
sending incremental file list
rsync: get_xattr_data: lgetxattr("/mnt/disk1/Common/MindBlowing/MVI_1423.MOV","user.com.dropbox.attributes",159) failed: Input/output error (5)
rsync: get_xattr_data: lgetxattr("/mnt/disk1/Common/MindBlowing/MVI_1433.MOV","user.com.dropbox.attributes",159) failed: Input/output error (5)
rsync: get_xattr_data: lgetxattr("/mnt/disk1/Common/MindBlowing/MVI_7397.MOV","user.com.dropbox.attributes",159) failed: Input/output error (5)
rsync: get_xattr_data: lgetxattr("/mnt/disk1/Common/MindBlowing/Rasmus film/Mindblowing transformation (1)/MVI_1433.MOV","user.com.dropbox.attributes",159) failed: Input/output error (5)
rsync: get_xattr_data: lgetxattr("/mnt/disk1/Common/MindBlowing/Rasmus film/Mindblowing transformation (1)/Jyllingeskole/913_1622_02.MOV","user.com.dropbox.attributes",159) failed: Input/output error (5)
rsync: get_xattr_data: lgetxattr("/mnt/disk1/Common/MindBlowing/Rasmus film/Mindblowing transformation (1)/Pilegaardsskolen/913_1620_01.MOV","user.com.dropbox.attributes",159) failed: Input/output error (5)
rsync: get_xattr_data: lgetxattr("/mnt/disk1/Common/MindBlowing/Rasmus film/Mindblowing transformation (1)/Pilegaardsskolen/913_1618_02.MOV","user.com.dropbox.attributes",159) failed: Input/output error (5)
rsync: get_xattr_data: lgetxattr("/mnt/disk1/Common/MindBlowing/Rasmus film/Mindblowing transformation (1)/Pilegaardsskolen/913_1621_01.MOV","user.com.dropbox.attributes",159) failed: Input/output error (5)
rsync: get_xattr_data: lgetxattr("/mnt/disk1/Common/MindBlowing/Rasmus film/Mindblowing transformation (1)/ekstra/MVI_1423.MOV","user.com.dropbox.attributes",159) failed: Input/output error (5)
sent 2,017,059 bytes  received 6,978 bytes  15,161.33 bytes/sec
total size is 118,985,987,854  speedup is 58,786.47
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1165) [sender=3.1.0]
```

I can copy those same files with MC, no problem. Perhaps your way of doing things would end up with the source disk having everything else deleted and only the problem files left. I dunno. But your method doesn't store the checksum with the files for checking for bitrot later on, except if you run bunker as a separate step. I still think having the checksums stored in a file in each directory would be simpler and more robust overall, and it would solve this issue as well.
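The "count your directory sizes before deleting" advice can be scripted with plain find and awk. A minimal sketch, assuming GNU find (`-printf`); `tree_summary` is a hypothetical helper, and the two paths are just the examples from the transfer above:

```shell
#!/bin/bash
# Compare file count and total bytes between two trees before deleting the
# source. tree_summary is a made-up helper, not part of bunker or rsync.
tree_summary() {
    # Prints "<file count> <total bytes>" for a directory tree
    find "$1" -type f -printf '%s\n' 2>/dev/null |
        awk '{n++; b+=$1} END {print n+0, b+0}'
}

src=$(tree_summary /mnt/disk1/Common)
dst=$(tree_summary /mnt/disk16/temp/Common)
if [ "$src" = "$dst" ]; then
    echo "trees match ($src)"
else
    echo "MISMATCH: source $src vs destination $dst"
fi
```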
Squid Posted March 1, 2015

"I still think having checksums stored in a file in each directory would be simpler and more robust overall, and it would solve this issue as well." I agree, and it has the advantage of being checkable through Windows. I have a cron job running this script daily. It walks through all the disks and creates an md5 for every video file that doesn't already have one, and if a file has been modified (i.e. streams removed, etc.) since its md5 was created, it recreates it. I used disk shares to guarantee the .md5 winds up on the same disk as the file, regardless of the split settings. I never made it handle every file; it only looks for video files (I really don't want the clutter of an md5 for every .nfo), and it takes no parameters: the extensions looked for are hard-coded.

```shell
#!/bin/sh
# Script to create MD5 hashes for video files only

# Get all disks installed in the system
ALLDISK=$(ls /mnt --color="never" | grep "disk")
logger "Scanning disks for media files without .md5"
# Loop through the disks
for DISK in $ALLDISK
do
    DIR="/mnt/$DISK"
    # Loop through all the video files on the disk.
    # The parentheses make -type f apply to every extension, not just the first.
    find $DIR -type f \( -iname "*.mkv" -o -iname "*.ts" -o -iname "*.avi" \
        -o -iname "*.vob" -o -iname "*.m2ts" -o -iname "*.mp4" -o -iname "*.mpg" \
        -o -iname "*.mpeg" -o -iname "*.3gp" -o -iname "*.wmv" \) | while read FILENAME
    do
        MD5FILE="$FILENAME.md5"
        # Does the MD5 already exist?
        if [ -e "$MD5FILE" ]; then
            # Recreate the MD5 if the file is newer than its sidecar
            if [ $(date +%s -r "$FILENAME") -gt $(date +%s -r "$MD5FILE") ]; then
                cd "${FILENAME%/*}"
                logger "$FILENAME changed... Updating MD5"
                md5sum -b "$(basename "$FILENAME")" > /tmp/md5file.md5
                mv /tmp/md5file.md5 "$MD5FILE"
            fi
        else
            # cd to the path of the file
            cd "${FILENAME%/*}"
            logger "Creating MD5 for $FILENAME"
            md5sum -b "$(basename "$FILENAME")" > /tmp/md5file.md5
            mv /tmp/md5file.md5 "$MD5FILE"
        fi
    done
done
logger "Finished scanning"
```
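As a companion to the script above, the .md5 sidecars it creates can be re-verified later with `md5sum -c`. This is only a sketch, not part of Squid's cron job; `verify_md5` is a made-up name, and mismatches are sent to syslog with logger, just as the creation script logs its progress:

```shell
#!/bin/bash
# Walk a tree and verify every .md5 sidecar created by the script above.
# verify_md5 is a hypothetical helper; md5sum -c --quiet prints only the
# files that fail, and each failure is also sent to syslog via logger.
verify_md5() {
    find "$1" -type f -name "*.md5" 2>/dev/null | while read -r MD5FILE
    do
        # The sidecar stores a bare relative name, so check from its directory
        cd "${MD5FILE%/*}" || continue
        if ! md5sum -c --quiet "$(basename "$MD5FILE")"; then
            logger "MD5 mismatch: $MD5FILE"
        fi
    done
}

verify_md5 /mnt/disk1
```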
tr0910 Posted April 26, 2015

I have been running the latest bunker on 6b14b and noticed this error on one run. Subsequent runs are fine.

```shell
bunker -a /mnt/disk3
Scanning for new files... \
awk: cmd. line:1: fatal: division by zero attempted
Finished. Added 1 files.
```

I'm also trying to understand the differences between bunker and jbartlett's bitrot. I find only these differences.

Bunker can't, but bitrot can:

Recover lost+found files by matching the SHA key against an exported list:

```shell
bitrot.sh --recover -p /mnt/disk1/lost+found -f /tmp/shakeys_disk1.txt
```

Verify the SHA of a specific file:

```shell
bitrot.sh -v -p /mnt/user/Documents -m modifiedword.doc
```

Bitrot can't, but bunker can:

Check SHA keys from a user-defined input file, with no path required:

```shell
bunker -c -f /tmp/disk1_keys.txt
```

Are there other differences I've missed?
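For what it's worth, the lost+found recovery that bitrot offers can be approximated with plain md5sum against an exported key list. A hedged sketch: `recover_names` is a made-up helper, not a bunker or bitrot command, and it assumes the export file uses md5sum's "hash  path" layout (paths containing spaces would need more care):

```shell
#!/bin/bash
# Match anonymous lost+found files back to their original names by hashing
# each one and looking the hash up in an exported "hash  path" list.
# recover_names is a hypothetical helper; the two paths below are examples.
recover_names() {
    local lostdir=$1 keyfile=$2
    find "$lostdir" -type f 2>/dev/null | while read -r FILE
    do
        HASH=$(md5sum "$FILE" | awk '{print $1}')
        # First field is the hash, second the original path (no spaces assumed)
        ORIG=$(awk -v h="$HASH" '$1 == h {print $2; exit}' "$keyfile")
        if [ -n "$ORIG" ]; then
            echo "$FILE -> $ORIG"
        else
            echo "$FILE -> no match"
        fi
    done
}

recover_names /mnt/disk1/lost+found /tmp/disk1_keys.txt
```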