File Integrity Checking

ajeffco · August 28, 2010

Hello,

I've looked around the forums for this, but didn't see anything. I did see md5deep on the wiki, just haven't installed it yet.

Does unRaid do any integrity checking of the files after they are written to the array? Here's why I ask.

I have been copying my media over to my array. Part of this process has been cleaning up duplicates. I'm using TeraCopy to verify that some of the duplicates match what had already been copied to the array. One of the pictures on my PC was a different size than the copy on the array. When I opened both, they both looked perfectly fine, so I'm not sure which one is correct.

If unRAID doesn't do periodic integrity checking, I was thinking of modifying a script from work that does some integrity checking on files using md5sum. (We had a high-end SAN storage controller corrupt database exports from a patient registration system, and we found out during DR testing... Fun times.) It seems like md5sum checking against files in the array would be easy enough to accomplish with unRAID.

Before I re-invent the wheel, I wanted to see if 1) Unraid already did it, 2) Someone else already did it, or 3) someone thought it might not be a good idea for any reason that I don't know of with unRaid.

BRiT · August 28, 2010

Nothing out of the gate, other than what you saw: an md5deep addon for the 4.5 series - http://lime-technology.com/forum/index.php?topic=5906.0

Something more advanced than that is one of many items being worked on in the background by the community (WeeboTech) for the unRAID 5 series. Here is one of numerous discussions about it: http://lime-technology.com/forum/index.php?topic=6798.0

ajeffco · August 31, 2010

Thanks for that, I think I need to refine my searching skills.

I'm having a hard time doing a find and reading filenames with spaces into a variable, or doing anything against them. The commands see the whitespace and treat each piece as a separate name, as is usual on Unix.

This is a quick hack of a script that runs fine on AIX and SLES10, so I'm not sure if it's the script itself or Slackware (can't see why that'd be, but who knows).

So, here's what I have so far:

#!/bin/bash
#
# Script to check the integrity of files
#
#
################################################################################
# A space separated, quoted list of directories that you'd like to have
# checked
DIR_LIST=( '/mnt/disk1/techstuff' )

#echo ${DIR_LIST[*]}
for DIR in ${DIR_LIST[*]}
do
  echo "Now Checking $DIR"
  find $DIR -type f -name '* *' | while read FILENAME
    do
     echo "$FILENAME"
    done
read junk
done

So, the script should stop after each file, and just wait for input on the read. Here's what's coming out though (the ^C is my CTRL-C on the read).

Now Checking /mnt/disk1/techstuff

/mnt/disk1/techstuff/SVC/Procedures for replacing nodes and adding nodes to existing SVC clusters V3R4.pdf

/mnt/disk1/techstuff/TPC/SG24-7725-00 - IBM Tivoli Storage Productivity Center V4.1.pdf

/mnt/disk1/techstuff/XiV/SG247659-01 - IBM XIV Storage System Architecture Implementation and Usage.pdf

/mnt/disk1/techstuff/DS8300/DS8000 overview.pdf

/mnt/disk1/techstuff/Brocade/2498-B80 Front Graphic.PNG

/mnt/disk1/techstuff/Brocade/SG24-6116-09 - Implementing an IBM-Brocade SAN with 8 Gbps Directors and Switches.pdf

/mnt/disk1/techstuff/SVC Performance/TPC Performance Workshop V5.pdf

/mnt/disk1/techstuff/SVC Performance/SVC Performance Analysis with TPC.pdf

/mnt/disk1/techstuff/SVC Performance/Hello and thank Coral Gables.pdf

/mnt/disk1/techstuff/p560 VIO SEA Failover Testing.docx

/mnt/disk1/techstuff/AIXinfo/dev/AIXinfo - Info gathered.htm

/mnt/disk1/techstuff/AIXinfo/AIXinfo Installation Guide.doc

/mnt/disk1/techstuff/AIXinfo/Documentation/AIXinfo - Info gathered.xls

/mnt/disk1/techstuff/AIXinfo/Documentation/AIXinfo Installation Guide.doc

/mnt/disk1/techstuff/AIXinfo/Documentation/Process Flow.vsd

^C

Any advice on how to get around this problem would be much appreciated.

ajeffco · August 31, 2010

It's even more fun with an apostrophe in the name of a file...

Still looking for a way to do this, and soliciting advice.

"find /mnt/disk1 -type f" appears to work fine, but the second I pipe it to another command, the whitespace and special characters kick in. Still not clear why find | while isn't working. I tried it this morning on an AIX server and my Ubuntu workstation, and it works perfectly fine on files with spaces (I didn't try special characters). So it must be something with Slackware, I just can't think why.

Joe L. · August 31, 2010

To pipe the output of find to another command and have no issues with special characters use something like this:

find /mnt/disk1 -type f -print0 | xargs -0 md5sum

The -print0 option makes find terminate each name with a null character instead of a newline, so spaces and special characters in the names no longer matter. The "-0" option to xargs tells it to expect that style of input.
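If you'd rather keep a while loop than hand everything to xargs, the same null-terminated trick can feed read. This is only a minimal sketch, assuming bash: list_files is a made-up helper name and the demo file names are invented. Reading from file descriptor 3 is my own choice here, so that a "read junk" inside the loop would still wait on the keyboard rather than swallow file names.

```shell
#!/bin/bash
# Sketch: loop over NUL-terminated names from find so that spaces and
# apostrophes in file names survive intact.
# list_files is a hypothetical helper name, not something from unRAID.
list_files() {
    local dir="$1" name
    # -d '' makes read stop at each NUL; -u 3 reads from descriptor 3,
    # leaving the script's stdin untouched inside the loop.
    while IFS= read -r -d '' -u 3 name; do
        printf '%s\n' "$name"
    done 3< <(find "$dir" -type f -print0)
}

# Demo on a throwaway directory (invented file names).
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/two words.pdf" "$demo/it's here.doc"
list_files "$demo" | sort
rm -rf "$demo"
```

Replace the printf with whatever you want to run against "$name", and keep the double quotes around it.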

Joe L.

WeeboTech · August 31, 2010

It's the way the bash shell works.

You can use find with -print0 | xargs -0 -L1

Although you should be able to do

find -exec /bin/ls -l "{}" \;

Replace /bin/ls with your command.
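As a rough illustration of that substitution for the integrity-check case, here is a sketch that uses find -exec with md5sum to build a checksum list and then re-checks it later. The helper names make_manifest and verify_manifest and the temp paths are made up for the example; it assumes GNU md5sum, whose -c and --quiet options do the verification.

```shell
#!/bin/bash
# Sketch: build an md5sum manifest with find -exec, then re-check it.
# make_manifest / verify_manifest are hypothetical helper names.
make_manifest() {
    local dir="$1" manifest="$2"
    # "{} +" hands many names to each md5sum call, much like xargs would.
    find "$dir" -type f -exec md5sum {} + > "$manifest"
}

verify_manifest() {
    local manifest="$1"
    # --quiet suppresses the per-file OK lines; the exit status is
    # non-zero if any file fails the check.
    md5sum -c --quiet "$manifest"
}

# Demo: the manifest lives outside the scanned directory.
demo=$(mktemp -d)
manifest=$(mktemp)
printf 'hello\n' > "$demo/two words.txt"
make_manifest "$demo" "$manifest"
verify_manifest "$manifest" && echo "all files match"
printf 'changed\n' > "$demo/two words.txt"
verify_manifest "$manifest" 2>/dev/null || echo "mismatch detected"
rm -rf "$demo" "$manifest"
```

Keep the manifest somewhere other than the directory being scanned, otherwise it ends up listed in itself.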

I would have to see the whole context of the find pipe.

Here is a rather long script that I started to work on before the sqlite DB method.

It was designed to create symlinks of a whole directory tree in another directory.

The links were to be named as the md5sum of the path, a dot, then the md5sum of the file.

That would then allow you to run par2cmd on the directory.

The problems I had were the sheer size of the link directory and the length of time par2cmd took with 1TB of data.

I would consistently get out of memory errors in unRAID, so I stopped the project.

In any case, here's a shell script to help you get your head around some techniques.

I cannot say it's done. Consider it pre-alpha code, but it has some examples.

#!/bin/bash

[ ${DEBUG:=0} -gt 0 ] && set -x -v

P=${0##*/}              # basename of program
R=${0%%$P}              # dirname of program
P=${P%.*}               # strip off after last . character



DBDIR="/mnt/cache/.${P}"
CONF=${CONF:=/boot/custom/etc/${P}.conf}


# TMPFILE=/tmp/${P}.$$
TMPFILE=${DBDIR}/${P}.$$



# sed line to delete a line from md5sum file. 
# sed -i -e "#/mnt/disk1/pub/CDR/CDBenchmarks/cddae_progress.gif#d" md5par2db.md5sum
# unused for now, comment for me. 


log()
{
   : ${SYSHOSTNAME:=`hostname`}
   echo "`date '+%b %d %T'` ${SYSHOSTNAME} $P[$$]: $*"
}

process_dir()
{
   DIR="${1}"
   shift

   MAX_MTIME=0
   MAX_MTIME_FILE=""

   new=0 
   failed=0
   verified=0
   files=0
   filecount=0

   find "${DIR}" -depth -type f "${@}" -print > "${DBDIR}/${P}.findlist"

   # Count files in list so log message can show progress.
   while IFS= read -r FILENAME
   do    (( files++ ))
   done  < "${DBDIR}/${P}.findlist"

   exec 3<&0                 # Save stdin to file descriptor 3.
                             # Redirect standard input.
   exec 0<"${DBDIR}/${P}.findlist"       

   while IFS= read -r FILENAME
   do 
       # [ -z "${FILENAME%\#*}" ] && continue
       (( filecount++ ))
       log "Processing [$filecount/$files]: ${FILENAME}"
       MD5PATH=`echo "${FILENAME}" | md5sum | sed -e 's# ##g' -e 's#-##g'`
       MD5SUM=`md5sum "${FILENAME}"` # Capture MD5SUM Output.
       MD5="${MD5SUM%% *}"  # Take off ' ' and filename
       # MD5NAME="${MD5PATH}.${MD5SUM}.md5"    # New Linked Filename. 
       MD5NAME="${MD5}.md5"                    # New Linked Filename. 
       if [ ! -L "${DBDIR}/${MD5NAME}" ]
          then echo "${MD5SUM}" >> "${DBDIR}/${P}.md5sum"
               if ln -s "${FILENAME}" "${DBDIR}/${MD5NAME}"
                  then log "    Linked: ${MD5NAME}"
                       (( new++ ))
                  else log "link FAILED: '${FILENAME}' to: '${DBDIR}/${MD5NAME}'"
                       (( failed++ ))
               fi
          else log "  Verified: ${MD5NAME}"
               (( verified++ ))
       fi

       # now get mtime, Save highest mtime for later.
       # This can be used for touch -r later.
       # May be useful for doing a find -newer in next run
       MTIME=`stat -L --printf "%Y" "${FILENAME}"`
       if [ $? -eq 0 ]; then 
          if [ ${MTIME} -gt ${MAX_MTIME} ]; then 
             MAX_MTIME=${MTIME}
             MAX_MTIME_FILE="${FILENAME}"
             touch -r "${MAX_MTIME_FILE}" "${DBDIR}/${P}.newest"
          fi
       fi
   done 

   exec 0<&-                 # Close stdin.
   exec 0<&3                 # Restore old stdin.

   # set mtime of md5sum file to highest mtime or that of findlist.
   # this can be used later to do a find -newer for incremental updates.
   # touch -r ${MAX_MTIME_FILE}      ${DBDIR}/${P}.md5sum
   # touch -r ${DBDIR}/${P}.findlist ${DBDIR}/${P}.md5sum

   echo "Files:     ${filecount}"
   echo "New:       ${new}"
   echo "Failed:    ${failed}"
   echo "Verified:  ${verified}"

}



[ -f "${CONF}" ] && source "${CONF}"

trap "rm -f ${TMPFILE}" EXIT

export DBDIR=$1
shift

export SCANDIR=$1
shift

if [ -z "${DBDIR}" -o -z "${SCANDIR}" ]
  then echo -e "Creates md5 & par2 db directory from a directory to scan"
       echo -e "Usage: $0 [md5par2db directory path] [directory to scan] [optional args to find]"
       echo -e "  example:  $0 /mnt/cache/.md5par2db/disk1  /mnt/disk1"
       echo -e "            $0 /mnt/cache/.md5par2db/disk2  /mnt/disk2 -name '*.mp3'"
       exit
fi


[ ! -d "${DBDIR}" ] && mkdir -p "${DBDIR}"

process_dir "${SCANDIR}" "$@"
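
The comments in the script mention using find -newer for incremental runs but never get that far. A minimal sketch of the idea, assuming a marker file whose mtime records the previous run, might look like the following. scan_new and the marker path are invented for the example, and touch -t is used only to give the demo files predictable timestamps.

```shell
#!/bin/bash
# Sketch: incremental scan using a timestamp marker and find -newer.
# scan_new and the marker file are hypothetical, not part of the script above.
scan_new() {
    local dir="$1" marker="$2"
    if [ -e "$marker" ]; then
        # Only files modified after the marker's mtime.
        find "$dir" -type f -newer "$marker" -print0
    else
        # No marker yet: first run, list everything.
        find "$dir" -type f -print0
    fi
}

# Demo with fixed timestamps (invented file names).
demo=$(mktemp -d)
marker="$demo.marker"                      # sibling path, outside the scanned tree
touch -t 201008010000 "$demo/old file.txt"
touch -t 201008150000 "$marker"
touch -t 201008310000 "$demo/new file.txt"
scan_new "$demo" "$marker" | tr '\0' '\n'  # lists only "new file.txt"
rm -rf "$demo" "$marker"
```

In a real run you would probably touch the marker just before starting the scan, so that anything modified while the scan is in progress gets picked up next time.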
