bunker - yet another utility for file integrity checks



Introduction

bunker is a file integrity check utility based on the original bitrot utility by jbartlett, and the calculated SHA keys are compatible with bitrot. I want to thank jbartlett for bringing out his excellent idea; my version can be seen as an alternative which I initially developed to fulfill my own requirements and which may now be useful to others.

 

The purpose of bunker is to save the calculated hash value in the extended attributes of a given file, which allows regular checking of the integrity of the file content. Note that the original file is never touched or altered by bunker, no matter which options are chosen.

 

Different hashing methods are stored under different extended attributes; it is possible to store sha256, md5 and blake2 hashes together with a file, if desired.
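
To illustrate what bunker stores, here is a minimal sketch using the standard setfattr/getfattr tools (bunker automates all of this; the scandate value is shown here as epoch time, which is an assumption, and movie.mkv is a made-up file name):

# compute a hash and store it as extended attributes of the file
HASH=$(sha256sum "movie.mkv" | cut -d' ' -f1)
setfattr -n user.sha256 -v "$HASH" "movie.mkv"
setfattr -n user.scandate -v "$(date +%s)" "movie.mkv"

# read it back later to verify
getfattr -n user.sha256 "movie.mkv"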

 

Versions

Version 1.16 verifies the location of the 'notify' script to support unRAID v6.x

 

Version 1.15

  • Fixed execution of export (-e) command
  • Fixed -D option (modified time calculation)
  • Swapped -r and -R commands
  • Added new extended attribute: file size
  • Added display of file name being processed
  • Code optimizations

Version 1.14 logger corrections and code optimization (thx itimpi)

 

Version 1.13 logger improvements (thx itimpi)

 

Version 1.12 added new option -L, which creates a log entry only when changes are present

 

Version 1.11 bug fix, correction in report calculation (thx archedraft)

 

Version 1.10 more comprehensive reporting, and minor bug fix

 

Version 1.9 (minor) bug fixing release

 

Version 1.8 introduced new option -n (notify), which lets bunker send alert notifications when file corruption is detected

 

Version 1.7 introduced new filedate attribute. New -t (touch) command. Various improvements.

 

Version 1.6 fixed a regression in the -v command and added a missing-files display for the -c command

 

Version 1.5 introduced new command -U and new option -D

 

Version 1.4 is a bug fixing release. Corrects the ETA calculation, fixes scandate with the -C option, sorts output files.

 

Version 1.3 has new options -c and -C which allow checking the hash values from a previously exported file. This can be used, for example, when transferring files from one filesystem to another, e.g. from reiserfs to xfs. During this process extended attributes are not copied over, but with the -c (-C) option they can be verified/restored afterwards. Make sure to do an export of the hash keys before the file transfer.
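
In command form, that migration flow looks like this (paths are examples, commands per the usage section below):

# before the transfer: export the stored hash keys of the source disk
bunker -e -f /tmp/disk1_keys.txt /mnt/disk1

# ... copy the files to the new filesystem ...

# after the transfer: check the copied files and restore the hash attributes
bunker -C -f /tmp/disk1_keys.txt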

 

Version 1.2 uses the already installed utilities sha256sum and md5sum; no external package needs to be installed unless one wants to make use of the new option '-b2', which uses the blake2 algorithm. A download of the blake2 utility can be obtained from the Blake2 site. Extract the file "b2sum-amd64-linux", rename it to "b2sum" and copy it to your server, e.g. to /usr/bin/b2sum.
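
The installation boils down to the following (destination path as suggested above):

# after downloading and extracting b2sum-amd64-linux from the Blake2 site
mv b2sum-amd64-linux /usr/bin/b2sum
chmod +x /usr/bin/b2sum
b2sum /usr/bin/b2sum    # quick smoke test: hash the binary itself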

 

Version 1.1 is the initial release.

 

Usage

The following is from bunker's help output.

bunker v1.16 - Copyright (c) 2015 Bergware International

Usage: bunker -a|A|v|V|u|U|e|t|i|c|C|r|R [-fdDsSlLnq] [-md5|-b2] path [!] [mask]
  -a          add hash key attribute for files, specified in path and optional mask
  -A          same as -a option with implicit export function (may use -f)
  -v          verify hash key attribute and report mismatches (may use -f)
  -V          same as -v option with updating of scandate of files (may use -f)
  -u          update mismatched hash keys with correct hash key attribute (may use -f)
  -U          same as -u option, but only update files which are newer than last scandate
  -e          export hash key attributes to the export file (may use -f)
  -t          touch file, i.e. copy file modified time to extended attribute
  -i          import hash key attributes from file and restore them (must use -f)
  -c          check hash key attributes from input file (must use -f)
  -C          same as -c option and add hash key attribute for files (must use -f)
  -r          remove hash key attribute from specified selection (may use -f)
  -R          same as -r option and remove filedate, filesize, scandate values too (may use -f)

  -f <file>   optional set file reference to <file>. Defaults to /tmp/bunker.store.log
  -d <days>   optional only verify/update/remove files which were scanned <days> or longer ago
  -D <time>   optional only add/verify/update/export/remove files newer than <time>, time = NNs,m,h,d,w
  -s <size>   optional only include files smaller than <size>
  -S <size>   optional only include files greater than <size>
  -l          optional create log entry in the syslog file
  -L          optional, same as -l but only create log entry when changes are present
  -n          optional send notifications when file corruption is detected
  -q          optional quiet mode, suppress all output. Use for background processing
  -md5        optional use md5 hashing algorithm instead of sha256
  -b2         optional use blake2 hashing algorithm instead of sha256

  path        path to starting directory, mandatory with some exceptions (see examples)
  mask        optional filter for file selection. Default is all files
              when path or mask names have spaces, then place names between quotes
              precede mask with ! to change its operation from include to exclude

Examples:
bunker -a /mnt/user/tv                                 add SHA key for files in share tv
bunker -a -S 10M /mnt/user/tv                          add SHA key for files greater than 10 MB in share tv
bunker -a /mnt/user/tv *.mov                           add SHA key for .mov files only in share tv
bunker -a /mnt/user/tv ! *.mov                         add SHA key for all files in share tv except .mov files
bunker -A -f /tmp/keys.txt /mnt/user/tv                add SHA key for files in share tv and export to file keys.txt
bunker -v -n /mnt/user/files                           verify SHA key for previously scanned files and send notifications
bunker -V /mnt/user/files                              verify SHA key for scanned files and update their scandate
bunker -v -d 90 /mnt/user/movies                       verify SHA key for files scanned 90 days or longer ago
bunker -v -f /tmp/errors.txt /mnt/user/movies          verify SHA key and save mismatches in file errors.txt
bunker -u  /mnt/disk1                                  update SHA key for mismatching files
bunker -U  /mnt/disk1                                  update SHA key only for mismatching files newer than last scandate
bunker -u -D 12h /mnt/disk1                            update SHA key for mismatching files created in the last 12 hours
bunker -u -f /tmp/errors.txt                           update SHA key for files listed in user defined file - no path
bunker -e -f /tmp/disk1_keys.txt /mnt/disk1            export SHA key to file disk1_keys.txt
bunker -i -f /tmp/disk1_keys.txt                       import and restore SHA key from user defined file - no path
bunker -c -f /tmp/disk1_keys.txt                       check SHA key from user defined input file - no path
bunker -C -f /tmp/disk1_keys.txt                       check SHA key and add SHA attribute (omit mismatches) - no path
bunker -r  /mnt/user/tv                                remove SHA key for files in share tv
bunker -r -f /tmp/errors.txt                           remove SHA key for files listed in file errors.txt - no path

Look at the examples above to make the most of bunker's possibilities.

 

Operation

There are two main ways to use the utility: [1] interactive or [2] scheduled.

 

Interactive

When used in interactive mode, the utility is executed from a telnet session with the given options, and results are made visible on screen. The utility can be stopped at any time with 'CTRL-C'.

 

It is also possible to open several telnet sessions and run multiple bunker instances concurrently, e.g. for checking different disks. Most of the time is spent on I/O access, and calculating/checking a large disk can be a lengthy process; for example, it takes almost 7.5 hours on my system to go through a nearly full 2TB disk.

 

Scheduled

Another way of operating is to create scheduled tasks that do regular file verifications and/or other activities. For example, I created the script 'bunker-daily' and copied it to the folder /etc/cron.daily. It checks for new files and file changes, and updates the export file. It goes through all available disks (thx itimpi).

#!/bin/bash
bunker=/boot/custom/bin/bunker
log=/boot/custom/hash/blake2
var=/proc/mdcmd
day=$(date +%Y%m%d)
array=$(grep -Po '^mdState=\K\S+' $var)   # array state, STARTED when the array is up
rsync=$(grep -Po '^mdResync=\K\S+' $var)  # non-zero while a parity sync/check is running

mkdir -p $log

# Daily check on new files, report errors and create export file
# Only run when the array is started and no parity operation is active
if [[ $array == STARTED && $rsync -eq 0 ]]; then
  for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 ; do
    if [[ -e /mnt/disk$i ]]; then
      # add hash keys for recent files (-D 1) and export them to a daily 'new' file
      $bunker -A -D 1 -q -l -f $log/disk$i.$day.new.txt /mnt/disk$i
      # update hash keys of recently modified files, notify on detected corruption
      $bunker -U -D 1 -q -l -n -f $log/disk$i.$day.bad.txt /mnt/disk$i
      # refresh the full export file; keep the previous one as a dated copy when changes occurred
      if [[ -s $log/disk$i.export.txt ]]; then
        if [[ -s $log/disk$i.$day.new.txt || -s $log/disk$i.$day.bad.txt ]]; then
          mv $log/disk$i.export.txt $log/disk$i.$day.txt
          $bunker -e -q -l -f $log/disk$i.export.txt /mnt/disk$i
        fi
      else
        $bunker -e -q -l -f $log/disk$i.export.txt /mnt/disk$i
      fi
    fi
  done
fi

This can be combined with a verification script which checks for file corruption. I do this on a monthly basis, but it may be done weekly instead. Copy the file bunker-monthly to /etc/cron.monthly. Note that you need to adjust the script to the number of disks in your system.

#!/bin/bash
bunker=/boot/custom/bin/bunker
log=/boot/custom/hash/blake2
var=/proc/mdcmd
day=$(date +%Y%m%d)
array=$(grep -Po '^mdState=\K\S+' $var)   # array state, STARTED when the array is up
rsync=$(grep -Po '^mdResync=\K\S+' $var)  # non-zero while a parity sync/check is running

mkdir -p $log

# Monthly verification of a different group of disks (quarterly rotation)
# The month number modulo 4 selects the group; disks within a group run in parallel (&)
if [[ $array == STARTED && $rsync -eq 0 ]]; then
  case $(($(date +%m)%4)) in
  0) for i in 1 2 ; do
       $bunker -v -n -q -l -f $log/disk$i.$day.bad.txt /mnt/disk$i &
     done ;;
  1) for i in 3 4 5 ; do
       $bunker -v -n -q -l -f $log/disk$i.$day.bad.txt /mnt/disk$i &
     done ;;
  2) for i in 6 7 8 ; do
       $bunker -v -n -q -l -f $log/disk$i.$day.bad.txt /mnt/disk$i &
     done ;;
  3) for i in 9 10 ; do
       $bunker -v -n -q -l -f $log/disk$i.$day.bad.txt /mnt/disk$i &
     done ;;
  esac
fi

Export / Import

The purpose of export and import is to create and save a copy of the hash keys of the given files (export), which can be restored at a later time (import). E.g. after a disk crash, keys can be imported and a file verification run afterwards to see if content has been damaged. Export/import is not a file repair method but a mechanism to find corruption; other tools need to be used to do any file repair.
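
For example, after a disk has been rebuilt, the stored keys can be brought back and checked like this (file name is an example):

# restore the hash attributes from an earlier export ...
bunker -i -f /tmp/disk1_keys.txt

# ... then verify the rebuilt disk against them
bunker -v /mnt/disk1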

 

Download

See the attachment to download the zip file. Copy the file 'bunker' to your flash drive or another convenient location, and execute it from there. Optionally use the files bunker-daily and bunker-monthly.

 

Extra

Included in the zip file are two additional scripts, bunker-update and bunker-verify, provided courtesy of itimpi. You can place these scripts in the cron.hourly and cron.daily folders respectively to automate the file checking.

bunker.zip

Link to comment

It was also an excellent idea to make the tools compatible with each other. It prevents rescanning in the event that people switch from one tool to the other, or decide to use both.

 

I started with your tool and had all my files scanned/hashed. Didn't want to go through the whole time-consuming process again just by changing tools.  ;)

 

Compatibility is indeed a nice addition when people use one or the other, or perhaps both ... Maybe you want to consider formatting your file output as "<hash value>|<full path/filename>|<scandate>"; it will also make the files interchangeable.

 

I studied your recovery approach in bitrot, but I am not confident enough to build that into my script; it is hard for me to test  ;D

 

 

 

Link to comment


I plan on adding support for specifying the format of the export, something like "-F hash|filename|scandate" or "-F scandate|filename|hash"

 

The way I tested the recovery was to scan a directory of a few files, copy those files to a different directory, alter the path in the export, and then recover against the copied files.

Link to comment


Many tools that allow reformatting of internal values on demand use printf-like formatters.

So if an agreed-upon set of formatters is used, the export can be configured as needed.

 

Think of /bin/date and /bin/stat; even find uses them in -printf, à la -printf "%h".

Example from the find man page:

 

%f  File's name with any leading directories removed (only the last element).

%h  Leading directories of file's name (all but the last element). If the file name contains no slashes (since it is in the current directory) the %h specifier expands to ".".

 

I'm not saying to use these; I'm suggesting to make it configurable with something like an --exportf argument:

--exportf "%H|%F|%D"

 

While the names are more intuitive, these types of formatters are pretty commonplace.

Do /bin/stat --help and /bin/date --help to see them.
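
For illustration, this is the kind of formatter syntax being referred to (standard usage of these tools; somefile is a placeholder):

date +"%Y%m%d %H:%M:%S"             # date with an explicit format
stat -c "%n|%s|%Y" somefile         # name|size|mtime (epoch) from stat
find . -maxdepth 1 -printf "%f\n"   # basename only, per the man page excerpt above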

 

I used this method for internal functions which dump the stat() blocks (grabbed it from /bin/stat), then added a few formatters for hash/filename.

With the tools I've built, since everything links to this function, I can dump anything in any format as needed.

 

Another idea is to allow the use of full variables using eval.

 

--exportf '${HASH}|${FILENAME}|${SCANDATE}'

This would allow exposing any variable from the shell in the export, as long as it was exported and eval was used to expand the definitions.

 

i.e.

# assuming EXPORTSTRING holds the template, e.g. '${HASH}|${FILENAME}|${SCANDATE}'
eval 'EXPORTLINE="${EXPORTSTRING}"'

printf "${EXPORTLINE}\n"

 

I've used this technique a lot, but it also means people have to understand shell quoting, so I wouldn't recommend it.

Link to comment

Weebotech, thanks for your input here, much appreciated; certainly more food for thought. :)

 

Though I use "date" and "stat" quite regularly, I always find myself looking at the help information to see which abbreviation is used for which particular item. In other words, although their capabilities are very versatile, they aren't very intuitive.

 

If an export format option is added, I would prefer abbreviations which make sense, like your proposed "%H|%F|%D" (one can even argue to make these case insensitive).

 

Exporting internal variables I find a risky business; using eval you would potentially allow every variable to be exported ... not my choice.

 

Link to comment

As much as the % formatters are not as intuitive, they are used in many tools and in C printf. I learned to live with them.

I wouldn't make them case insensitive, as no other tool does that.

 

I have a generic C function to do this that I include and modify in all my programs.

I bet if we come up with a general bash function to do the replacements, it will make it easier for both tools.

 

The ability to have a basic hash export that can be immediately used by the source tool is of value when you actually have corruption.

 

From issues seen throughout the forum, when there is corruption, getfattr and setfattr will exacerbate that issue and can even cause kernel crashes.

You cannot rely on the metadata in all cases.

 

In the heat of a problem, walking someone through parsing the intermediary export file is more work than simply archiving the raw hash for input to the hasher. This is the approach I use with export, for those reasons. At the very least, perhaps provide an additional tool to convert the export file to a standard hash-type file. I'll probably add the ability to use the % modifiers so the hash/scan date can be exported at will.

I don't necessarily see the value in exporting it at this time. I would love to understand more about its value and potential future use.

Link to comment

I am also thinking about what the potential usage or benefit of introducing an export format option could be.

 

Perhaps another approach could be to build into the tool an immediate verification from an import file against a hash utility; it would then basically replace the piped conversion syntax.

 

This:

 

cut -d'|' -f1,2 hashfile | sed 's/|/ */' | md5sum -c

 

becomes:

 

bunker -c -md5 -f hashfile

 

Something similar can be done for bitrot as well, of course ...

 

 

Link to comment

 

That works, but when the raw tools exist, use them when possible.

I.e. the lowest common denominator is a hash file that can be used by the hasher itself without the need for other tools.

I've had to do this so many times as I'm consolidating a box of 30 hard drives onto larger hard drives.

Keeping md5sums files as model-serial.md5sums has been invaluable, and I can use them with any of the basic unix tools.

So the ability to export the file to be used by something that already exists has merit.
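
A hypothetical example of that workflow with the plain unix tools (drive serial and paths made up):

# after attaching the old drive, verify it straight from the archived hash file
cd /mnt/disk5
md5sum -c /boot/hashes/WDC-WD20EARS-XYZ123.md5sums | grep -v ': OK$'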

 

That being said, this quick script provides an example of using the meta characters to export data in a configurable manner.

#!/bin/bash

# associative array mapping each meta character to its value
declare -A meta

meta[D]=`date +%s`
meta[F]="/tmp/somefilename.ext"
meta[H]="d41d8cd98f00b204e9800998ecf8427e"

set | grep meta

echo "%H=${meta[H]}"
echo "%F=${meta[F]}"
echo "%D=${meta[D]}"

# pick one of the two example formats (the second assignment is the active one)
string="%H|%F|%D"
string="%H *%F"

# substitute every %X token in the format string with its value
for k in "${!meta[@]}"
do  echo "key  : ${k}"
    echo "value: ${meta[${k}]}"
    string=${string//%${k}/${meta[${k}]}}
done

echo "$string"

And example output, here with string="%H *%F":

root@unRAID:/tmp# /boot/bin/bashmetachar.sh     
BASH_SOURCE=([0]="/boot/bin/bashmetachar.sh")
meta=([D]="1419428621" [F]="/tmp/somefilename.ext" [H]="d41d8cd98f00b204e9800998ecf8427e" )
%H=d41d8cd98f00b204e9800998ecf8427e
%F=/tmp/somefilename.ext
%D=1419428621
key  : D
value: 1419428621
key  : F
value: /tmp/somefilename.ext
key  : H
value: d41d8cd98f00b204e9800998ecf8427e
d41d8cd98f00b204e9800998ecf8427e */tmp/somefilename.ext

or, with string="%H|%F|%D":

root@unRAID:/tmp# /boot/bin/bashmetachar.sh   
BASH_SOURCE=([0]="/boot/bin/bashmetachar.sh")
meta=([D]="1419428644" [F]="/tmp/somefilename.ext" [H]="d41d8cd98f00b204e9800998ecf8427e" )
%H=d41d8cd98f00b204e9800998ecf8427e
%F=/tmp/somefilename.ext
%D=1419428644
key  : D
value: 1419428644
key  : F
value: /tmp/somefilename.ext
key  : H
value: d41d8cd98f00b204e9800998ecf8427e
d41d8cd98f00b204e9800998ecf8427e|/tmp/somefilename.ext|1419428644

 

 

This was a real quick example. I'm not sure what will happen if the filename itself has these meta characters in it.

Anyway, just some food for thought.

Link to comment

And another point worth considering. How do people import already existing hash/md5sums files?

 

Here's a quick example of what I had in place to do this via shell.

It takes quite some time to run on a large filesystem with a lot of files, which is why I started to go the .c route.

#!/bin/bash 

[ ${DEBUG:=0} -gt 0 ] && set -x -v

P=${0##*/}              # basename of program
R=${0%%/$P}             # dirname of program
P=${P%.*}               # strip off after last . character

TMPFILE=/tmp/${P}.$$
trap "rm -f ${TMPFILE}; exit" EXIT HUP INT QUIT TERM

i=0

#if [ -z "${1}" ]
#   then echo "Usage: $0 <md5sum hash file>"
#        exit
#fi

# When a hash file is given, read from it (fd 6 saves the original stdin);
# otherwise the hash lines are read from stdin directly.
if [ ! -z "${1}" ]
   then exec 6<&0 0<${1}
        TIME=`stat -c %Y ${1}`
   else TIME=`date +%s`
fi

# each input line is in md5sum format: <hash> <filename>
while read -r HASH FILENAME
do    (( i++ ))
      printf "%.6d %s\n" $i "${FILENAME}" >&2
      FILENAME="${FILENAME#\*}"               # strip md5sum's leading '*' (binary marker)
      setfattr -n user.hash.value  -v ${HASH} "${FILENAME}"
      TIME=`stat -c %Y "${FILENAME}"`         # per-file modified time (epoch)
      setfattr -n user.hash.time   -v ${TIME} "${FILENAME}"
done

# restore the original stdin
if [ ! -z "${1}" ]
   then exec 0<&6 6<&-
fi

Link to comment

@weebo, @bonienl

 

Seeing the 2 of you creating tools for advancing the stability and reliability of our data is the work of giants. We will all thank you for this later. I want to thank you now.

 

Let's include jbartlett in this as well since bitrot is the impetus to doing this.

jbartlett's contributions are invaluable.

Link to comment

Definitely, without bitrot we wouldn't have this conversation!

 

A side note ...

 

It would be good if some "standardization" were used in the extended attributes. This would make the different tools interchangeable.

 

So far bitrot / bunker uses:

 

user.sha256=value

user.scandate=value

 

For bunker I have added the following, in order to support different hashing methods:

 

user.md5=value

user.blake2=value
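
These can be inspected directly with getfattr; dumping a file shows whatever attributes the tools have stored:

getfattr -d movie.mkv

and it returns something like (values shortened, file name made up):

# file: movie.mkv
user.blake2="..."
user.scandate="..."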

 

Link to comment

My attributes are configurable on the command line.

They 'default' to

 

user.hash.value=

user.hash.time=

if ( !opt.attrhashvaluename ) {
   sprintf(buf,"user.%s.value", opt.hashbase ? opt.hashbase : "hash" );
   opt.attrhashvaluename = strdup(buf);
}

if ( !opt.attrhashtimename ) {
   sprintf(buf,"user.%s.time", opt.hashbase ? opt.hashbase : "hash" );
   opt.attrhashtimename = strdup(buf);
}

 

If you use an external hasher, that hasher's basename is used in the attribute name, such as:

--hash-exec '/bin/md5sum -b {}'

 

user.md5sum.value

user.md5sum.time

However, each can be overridden with --hash-name / --time-name.

I use the epoch time in the time value, so it can be used in arithmetic or in date conversions with strftime.
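
For example, an epoch value stored that way converts directly (attribute name per the defaults above, somefile is a placeholder):

# read the stored epoch time and turn it into a human-readable date
T=$(getfattr --only-values -n user.hash.time somefile)
date -d @$T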

Link to comment
  • 1 month later...

I've been using Bunker for a while now - thank you, gentlemen! But the -i (import) option isn't working for me; I just get an "Invalid Parameter Specified" error. I have tried

bunker -i -b2 -f /mnt/disk4/disk4blake2.txt /mnt/disk4

 

Same story with -c (check). But -a (adding), -v (verify), -u (update) and -e (export) all worked fine. Are the import and check features simply not implemented yet?

Link to comment


 

With the import and check options only a file reference is given, no folder (this information is already in the file to be imported/checked). So the syntax simply becomes:

 

bunker -i -b2 -f /mnt/disk4/disk4blake2.txt

Link to comment


 

Thank you. Makes sense.

 

A problem I have noticed with using this tool: if I rename a folder, all files within that folder lose their extended attribute and thus their hash. This is a problem if you want to rsync your files to a new drive in a temp location and later rename the folders after the files have been verified (this avoids having duplicate files while the transfer & verification is ongoing).

 

There are a couple of workarounds, one of which is:

1) Generate hashes on the old (reiserfs) drive (bunker -a /mnt/disk1)

2) Copy to the new (XFS) disk (rsync -avX /mnt/disk1 /mnt/disk2/temp)

3) Verify the files (bunker -v /mnt/disk2/temp)

4) Export hashes from the temp location (bunker -e -f /mnt/cache/disk2temp.txt /mnt/disk2)

5) Manually edit the hash file to replace '/mnt/disk2/temp/' with '/mnt/disk2/'

6) Move the files from temp to their final location (mv /mnt/disk2/temp/* /mnt/disk2 or something along those lines)

7) Re-import the hashes (bunker -i -f /mnt/cache/disk2temp.txt)
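
Step 5 doesn't have to be manual; a one-liner along these lines does the replace (check the result before re-importing):

sed -i 's|/mnt/disk2/temp/|/mnt/disk2/|g' /mnt/cache/disk2temp.txt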

 

However, it is also a problem when you want to reorganize your media library and rename/move many folders around.

 

I'd love to hear of a solution that is more elegant than exporting hashes to one big file, manually find & replacing the paths, and then manually re-importing.

 

I would also love it if bunker had the capability to store hash files per directory instead of the whole ext-attrib thing. This would elegantly avoid the above problem. It would also make it possible to generate hashes on the server, but from time to time verify a file over the network (with, say, the corz tool). Is this a feature request that can be considered? Thanks again!

 

Link to comment


 

The problem is in using mv and not rsync for the second restore.

Consider the following instead of move:

rsync --remove-source-files -avPX source dest

Link to comment

I would also love it if bunker had the capability to store hash files per directory instead of the whole ext-attrib thing. [...]

 

I've suggested the ability to export files to an md5sums-type folder.hash file per directory for the exact reason described.

I believe the current export file can be run through sed & cut to create one, but not on a per-directory basis.

I.e. it may traverse from a provided root down the tree, but I'm not sure if it will use relative paths in the file without some work.

Link to comment

Consider the following instead of move:

rsync --remove-source-files -avPX source dest

 

Hmmm, on my system, even when source and dest are on the same physical disk, the above results in reading and re-writing all the files instead of simply renaming the top-level dirs. It takes a long time and defeats the purpose of not deleting the source files until after the destination files have been verified. Am I missing something?

Link to comment

 

I'm not sure what's going on with the environment.

 

rsync works fine for me, but yes, it copies the file and removes the source before it's been verified - yet it's already been verified.

I.e. moving from /mnt/disk2/temp to /mnt/disk2 shouldn't require it to be verified again.

 

In any case, I thought it odd that mv was not preserving extended attributes.

I remember seeing that it did in the source code.

In my most recent test with mv, extended attributes are preserved.

 

root@unRAIDb:/mnt/disk2# pwd
/mnt/disk2

declare -a MD5=$(md5sum folder.hash)
setfattr -n user.hash -v ${MD5[0]} folder.hash

root@unRAIDb:/mnt/disk2# getfattr -d folder.hash
# file: folder.hash
user.hash="65b69d37c3d3f8cccce56a6f4ac7d49a"

mkdir tmp
mv folder.hash tmp
cd tmp
root@unRAIDb:/mnt/disk2/tmp# pwd
/mnt/disk2/tmp

root@unRAIDb:/mnt/disk2/tmp# getfattr -d folder.hash
# file: folder.hash
user.hash="65b69d37c3d3f8cccce56a6f4ac7d49a"

root@unRAIDb:/mnt/disk2/tmp# mkdir /mnt/disk1/tmp

root@unRAIDb:/mnt/disk2/tmp# mv folder.hash /mnt/disk1/tmp
root@unRAIDb:/mnt/disk2/tmp# getfattr -d /mnt/disk1/tmp/folder.hash 
getfattr: Removing leading '/' from absolute path names
# file: mnt/disk1/tmp/folder.hash
user.hash="65b69d37c3d3f8cccce56a6f4ac7d49a"

 

I wouldn't use rsync to move from /mnt/disk2/temp to /mnt/disk2; however, your statement said 'something along those lines'.

 

After re-verifying, I would use mv.

 

However, when going disk to disk, or disk to a disk on another system, I would use rsync.

 

Usually what I do is:

rsync -avPX source rsync://host/path

After that is done, I do it again with -rc instead of -a:

rsync -rcvPX source rsync://host/path

This does a checksum comparison the second time around instead of mtime/size.

 

After that, I'll do a third one with --remove-source-files:

 

rsync --remove-source-files -rcvPX source rsync://host/path

 

You can probably eliminate the second step if you do the hash verification from bunker. This is just how I do it though.

 

 

mv should be working for you. I would double check that the drives are mounted with extended attributes.

Link to comment
  • 3 weeks later...

mv should be working for you.

 

You are right, mv should be working, and as it turns out, in most cases it does work for me. I might have been mistaken - I haven't run into it since. Sorry to cry wolf. Moving with MC works too (unless you are merging files into existing directories). Thank you for your elaborate note.

 

However, while rsync'ing files (converting disks from ReiserFS to XFS), I ran into a problem with rsync - I apparently had some files with invalid extended attributes, and the way rsync handles that is to... not copy the files at all. So count your directory sizes before deleting anything from the old disks!

 

Here's what I got when trying to re-transfer a folder that turned out smaller on the destination disk:

 

root@FileServer:~# rsync -avX /mnt/disk1/Common/* /mnt/disk16/temp/Common/
sending incremental file list
rsync: get_xattr_data: lgetxattr(""/mnt/disk1/Common/MindBlowing/MVI_1423.MOV"","user.com.dropbox.attributes",159) failed: Input/output error (5)
rsync: get_xattr_data: lgetxattr(""/mnt/disk1/Common/MindBlowing/MVI_1433.MOV"","user.com.dropbox.attributes",159) failed: Input/output error (5)
rsync: get_xattr_data: lgetxattr(""/mnt/disk1/Common/MindBlowing/MVI_7397.MOV"","user.com.dropbox.attributes",159) failed: Input/output error (5)
rsync: get_xattr_data: lgetxattr(""/mnt/disk1/Common/MindBlowing/Rasmus film/Mindblowing transformation (1)/MVI_1433.MOV"","user.com.dropbox.attributes",159) failed: Input/output error (5)
rsync: get_xattr_data: lgetxattr(""/mnt/disk1/Common/MindBlowing/Rasmus film/Mindblowing transformation (1)/Jyllingeskole/913_1622_02.MOV"","user.com.dropbox.attributes",159) failed: Input/output error (5)
rsync: get_xattr_data: lgetxattr(""/mnt/disk1/Common/MindBlowing/Rasmus film/Mindblowing transformation (1)/Pilegaardsskolen/913_1620_01.MOV"","user.com.dropbox.attributes",159) failed: Input/output error (5)
rsync: get_xattr_data: lgetxattr(""/mnt/disk1/Common/MindBlowing/Rasmus film/Mindblowing transformation (1)/Pilegaardsskolen/913_1618_02.MOV"","user.com.dropbox.attributes",159) failed: Input/output error (5)
rsync: get_xattr_data: lgetxattr(""/mnt/disk1/Common/MindBlowing/Rasmus film/Mindblowing transformation (1)/Pilegaardsskolen/913_1621_01.MOV"","user.com.dropbox.attributes",159) failed: Input/output error (5)
rsync: get_xattr_data: lgetxattr(""/mnt/disk1/Common/MindBlowing/Rasmus film/Mindblowing transformation (1)/ekstra/MVI_1423.MOV"","user.com.dropbox.attributes",159) failed: Input/output error (5)

sent 2,017,059 bytes  received 6,978 bytes  15,161.33 bytes/sec
total size is 118,985,987,854  speedup is 58,786.47
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1165) [sender=3.1.0]

 

I can copy those same files with MC no problem.

 

Perhaps your way of doing things would end up with the source disk having everything else deleted and only the problem files left. I dunno. But your method doesn't store the checksum with the files for checking for bitrot later on - except if you run bunker as a separate step.

I still think having checksums stored in a file in each directory would be simpler and more robust overall, and it would solve this issue as well.

 

Link to comment

I still think having checksums stored in a file in each directory would be simpler and more robust overall, and it would solve this issue as well.

I agree, and it has the advantage of being able to be checked from Windows.  I have a cron job running this script daily. It walks through all the disks and creates an md5 for every file that doesn't already have one, and if the file has been modified (i.e. streams removed, etc.) since the md5 was created, it will recreate it. I used disk shares to guarantee the .md5 winds up on the same disk as the file regardless of the split settings. Unfortunately, I never made it handle every file - didn't care. It's only looking for video files (I really don't care about, and don't want the clutter from, making md5s for every .nfo), and it takes no parameters - the extensions looked for are coded in.

 

#!/bin/sh
# Script to create MD5 hashes for video files only

# Get all disks installed in the system
ALLDISK=$(ls /mnt --color="never" | grep "disk")
logger "Scanning disks for media files without .md5"

# Loop through the disks
for DISK in $ALLDISK
do
	DIR="/mnt/$DISK"

	# Loop through all the video files on the disk
	# (the parentheses group the -iname tests so -type f applies to all of them)
	find "$DIR" -type f \( -iname "*.mkv" -o -iname "*.ts" -o -iname "*.avi" -o -iname "*.vob" -o -iname "*.mt2s" -o -iname "*.mp4" -o -iname "*.mpg" -o -iname "*.mpeg" -o -iname "*.3gp" -o -iname "*.wmv" \) | while read -r FILENAME
	do
		MD5FILE="$FILENAME.md5"

		# Does the MD5 already exist?
		if [ -e "$MD5FILE" ]
		then
			# Recreate the MD5 when the file is newer than its .md5
			if [ $(date +%s -r "$FILENAME") -gt $(date +%s -r "$MD5FILE") ]
			then
				cd "${FILENAME%/*}"

				logger "$FILENAME changed... Updating MD5"
				md5sum -b "$(basename "$FILENAME")" > /tmp/md5file.md5
				mv /tmp/md5file.md5 "$MD5FILE"
			fi
		else
			# CD to the path of the file so the .md5 contains a relative name
			cd "${FILENAME%/*}"

			logger "Creating MD5 for $FILENAME"
			md5sum -b "$(basename "$FILENAME")" > /tmp/md5file.md5
			mv /tmp/md5file.md5 "$MD5FILE"
		fi

	done
done
logger "Finished Scanning"
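
A possible companion sketch (hypothetical, not part of the script above) to verify all the .md5 files it creates:

#!/bin/sh
# Verify every .md5 created by the script above and log mismatches
for DISK in $(ls /mnt --color="never" | grep "disk")
do
	find "/mnt/$DISK" -type f -name "*.md5" | while read -r MD5FILE
	do
		# run md5sum -c from the file's directory, since the .md5 holds a relative name
		( cd "${MD5FILE%/*}" && md5sum -c --quiet "${MD5FILE##*/}" ) || logger "MD5 mismatch: $MD5FILE"
	done
done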


Link to comment
  • 1 month later...

I have been running the latest bunker on 6b14b and noticed this error on one run.  Subsequent runs are fine.

 

bunker -a /mnt/disk3
Scanning for new files... \ awk: cmd. line:1: fatal: division by zero attempted
Finished. Added 1 files.

 

I'm also trying to understand the differences between Bunker and jbartlett's Bitrot.

 

I find only these differences.

 

Bunker can't, but Bitrot can:

 

Recover lost+found files by matching the SHA key with an exported list
  bitrot.sh --recover -p /mnt/disk1/lost+found -f /tmp/shakeys_disk1.txt
Verify SHA on a specific file
  bitrot.sh -v -p /mnt/user/Documents -m modifiedword.doc

 

Bitrot can't but Bunker can:

 

bunker -c -f /tmp/disk1_keys.txt                          check SHA key from user defined input file - no path

 

Are there other differences I've missed?

 

 

Link to comment
