MD5 Scripting help!


TheDragon

Recommended Posts

Just done a little reading on the AIDE homepage - sounds really interesting!

 

Sounds similar to the system you'd set up yourself through scripts. Is there a key difference between the two that I'm missing? I'm sure there must be, otherwise you wouldn't have gone to the trouble of writing all those scripts  ;)

Link to comment

I plan on putting my monthly MD5 files through BeyondCompare to see if any changes have occurred that weren't caused by file operations I know about (bit rot), then delete the oldest. As long as I do this regularly every couple of months, I should spot and maybe be able to correct any problems - re-record or re-download the files.

 

That was phase 2 of my plan! The reason I chose MD5deep was that it seems to have an audit mode...  My thought process was that if I could cobble together a script to create the MD5 files, I could probably also cobble together another to automate running a comparison and emailing details of any files that didn't match.
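Pending md5deep's audit mode, the comparison half can also be driven by plain md5sum -c, provided the checksum files use the standard "HASH  PATH" format. A rough sketch; the directory layout, file naming, and the configured mail command are all assumptions, not from any script in this thread:

```shell
#!/bin/bash
# Verify each monthly checksum file and mail a report of any failures.
# The MD5_disk*.txt naming and the mail recipient are assumptions.

check_sums() {        # usage: check_sums /path/to/checksum/dir
    SUMDIR="$1"
    for SUMFILE in "${SUMDIR}"/MD5_disk*.txt
    do
        [ -f "${SUMFILE}" ] || continue
        # md5sum -c prints "<file>: OK" or "<file>: FAILED" per entry;
        # keep only the failures.
        md5sum -c "${SUMFILE}" 2>/dev/null | grep -v ': OK$'
    done
}

# Cron entry point: only send mail if something actually failed.
REPORT=$(check_sums /mnt/cache/Backups)
if [ -n "${REPORT}" ]
then echo "${REPORT}" | mail -s "MD5 mismatches found" root
fi
```

Run it from the same directory the checksums were created from, since the paths stored in the files are relative.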

 

Having read about your script (Weebo) that was lost, the ability to create MD5s only for new files and merge them into existing MD5 files sounds really interesting and a big time saver.  I can see I'm going to have to try and teach myself some more about scripting in Linux!  ;)

 

Can anyone recommend any good literature regarding scripting under Linux?

Link to comment

In my case, primarily so that in the event of a parity sync error I can compare array disk contents with the MD5 checksum files and determine whether the parity disk or an array disk is the cause of the sync error, rather than just blindly correcting my parity disk and hoping!

Link to comment

Why do you want a script to create MD5 checksums of each of the array disks? What purpose?

 

Yesterday morning I had a data drive red-ball on me when I started a parity check.  I did the usual checks (generated a SMART report, reseated the drive in its slot, reseated the cable) but otherwise found nothing obviously wrong with the drive.  This drive has not had problems before.  All data on this drive is static, I have backups holding identical data, and I had a check-sum file for the disk.

 

I restored the data from parity back to the same data disk.  Then I ran my MD5 check-sum file against the restored data disk.  Everything came up clean (no check-sum errors).  I am now running a parity check (correction off) on the system.  So far everything is coming up clean.

 

I also run it (the check-sum file) against the backup drives to check their integrity.  Doing this on a periodic basis forces each drive to read every in-use byte.  If there is a problem, I will know the exact file affected and can take corrective measures.

 

Link to comment

Can anyone recommend any good literature regarding scripting under Linux?

 

I learn by example and there are plenty of examples on the net.

 

My first and most comprehensive book was

"The New KornShell Command And Programming Language"

 

This showed me that shell could be quite a powerful language.

 

While we don't have ksh on unRAID, it was a good book for the time.

If you can find a used copy cheap, it's worth the read.

 

If I were starting today, I would start with:

 

Learning the bash Shell

bash Cookbook

 

These would be a good staple.

 

I might add these two as well, but I've never read them, so I would suggest visiting a book store first:

Classic Shell Scripting

Wicked Cool Shell Scripts.

 

The O'Reilly learning books are good to start; the cookbooks contain recipes and scriptlets which could be useful as chunklets or to learn by example.

 

However, I learned a lot about exec and co-processes on the net. There's plenty of information out there.

Read everyone's example, ask questions.

 

Link to comment

Why do you want a script to create MD5 checksums of each of the array disks? What purpose?

 

In my case, twofold.

Since I can't utilize cache_dir, I create md5sum files of all the disks.

Then I can use egrep -i filename filelist.disk* to look for a file across all the disks.

Cheap man's locate.

 

Then if I have any issues with the disk, I can use md5sum to double check if I have corruption or not.

 

If I lose a disk totally, at least I'll know what was on it and what I might have to recover.

Link to comment

This was the start of something I was working on.

I had gone far past this, but it's the only version I was able to recover.

 

You'll see how I read the output of find as a file using exec to set file descriptors.

I grab the most recent mtime from the stat command.

Then at the end set the md5sum to the mtime of the newest file.

 

This was to allow me to use the find -newer option to find all the newest files since the last time I ran the md5sum.
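The -newer trick can be sketched roughly like this. This is a simplified illustration, not the lost script; the list name is hypothetical and paths containing spaces are not handled by the awk merge:

```shell
#!/bin/bash
# Incremental hashing: hash only files newer than the existing
# checksum list in a directory, then merge the two lists.

update_sums() (       # subshell body, so the cd doesn't leak out
    cd "$1" || exit 1
    LIST="filelist.md5sum"

    if [ -f "${LIST}" ]
    then find . -type f ! -name "${LIST}*" -newer "${LIST}" -print
    else find . -type f ! -name "${LIST}*" -print
    fi |
    while read -r FILENAME
    do nice md5sum "${FILENAME}"
    done > "${LIST}.new"

    # Merge: the first occurrence of each path wins, so fresh
    # entries replace stale ones for files that changed.
    cat "${LIST}.new" "${LIST}" 2>/dev/null |
        awk '!seen[$2]++' > "${LIST}.$$"
    mv "${LIST}.$$" "${LIST}"
    rm -f "${LIST}.new"
)
```

The mtime of the list file itself marks the last run, in the same spirit as the touch -r trick in the recovered script below.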

I found a fancy way of doing adds and deletes but I gave up doing in shell.

 

I had planned to use sqlite to store the information.

Found a cool way to make sqlite and a couple key routines bash loadables and I've been waiting for V5 to be stable and released before I continue.

 

#!/bin/bash

if [ -z "${1}" ]
   then echo "usage: $0 filenamestem dir"
        exit 1
fi

FILELIST="$1"
shift

# No directories given? Re-exec with the current directory.
[ $# -eq 0 ] && exec "$0" "${FILELIST}" .

HOSTNAME="$(hostname)"

find "${@}" -type f > "${FILELIST}"
sort < "${FILELIST}" > "${FILELIST}.$$"
mv "${FILELIST}.$$" "${FILELIST}"

# Save stdout on fd 3, then redirect stdout to the md5sum file
# and stdin from the sorted file list.
exec 3>&1
exec 1>"${FILELIST}.md5sum"
exec 0<"${FILELIST}"

while read -r FILENAME
do
      echo "Processing: ${FILENAME}" >&2
      MTIME=$(stat -c "%Y" "${FILENAME}")
      if [ "${MTIME}" -gt "${MTIMENEWEST:=0}" ]
         then   MTIMENEWEST=${MTIME}
                FILENEWEST=${FILENAME}
      fi
      nice md5sum "${FILENAME}"
done

# Restore stdout and close fd 3.
exec 1>&3
exec 3>&-

# Stamp both output files with the mtime of the newest file seen,
# so a later "find -newer" picks up only files changed since this run.
touch -r "${FILENEWEST}" "${FILELIST}" "${FILELIST}.md5sum"
ls -l    "${FILENEWEST}" "${FILELIST}" "${FILELIST}.md5sum"

Link to comment

Seems like quite a few people have this idea for binary file verification.

 

I'm working on one too that utilizes SHA256 and sqlite3 to provide an inventory of your files. It defaults to comparing the file size & date but you have the option of telling it to store the SHA checksum as well.

 

I decided to use SHA256 instead of MD5 because MD5 has known hash collisions. It's likely fine for a sanity check, but I wanted as close to zero chance of a collision as possible.

 

Add/Update or Verify by specified path (with optional SHA verification)

Add new files only by path

Optional file name filter

Include SHA with inventory

Remove missing files from the database
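A minimal sketch of the default verify step (size and date, with optional SHA256) under that design might look like this; the function name and argument layout are my own illustration, not the tool being described:

```shell
#!/bin/bash
# Compare a file against stored size/mtime, and optionally a stored
# SHA256. All names here are hypothetical.

verify_file() {       # usage: verify_file FILE SIZE MTIME [SHA256]
    FILE="$1"; WANT_SIZE="$2"; WANT_MTIME="$3"; WANT_SHA="$4"

    SIZE=$(stat -c '%s' "${FILE}")
    MTIME=$(stat -c '%Y' "${FILE}")

    [ "${SIZE}" = "${WANT_SIZE}" ]   || { echo "${FILE}: size mismatch";  return 1; }
    [ "${MTIME}" = "${WANT_MTIME}" ] || { echo "${FILE}: mtime mismatch"; return 1; }

    # Only hash when a stored checksum was supplied.
    if [ -n "${WANT_SHA}" ]
    then
        SHA=$(sha256sum "${FILE}" | awk '{print $1}')
        [ "${SHA}" = "${WANT_SHA}" ] || { echo "${FILE}: hash mismatch"; return 1; }
    fi
    return 0
}
```

The cheap size/mtime check catches most changes without reading the whole file; the SHA pass is what actually catches silent corruption.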

Link to comment

I did the same thing with sqlite3 and also gdbm.

 

I like GDBM cuz it's self contained.

I like sqlite3 because it provides more and you have a better interface.

 

I found code on the net letting you use gdbm and sqlite3 within the bash shell as a loadable module.

Cool stuff.

 

I had written a modified strftime as a loadable and was working on an md5 loadable to alleviate all the forking needed to do this in bash.

 

While doing strftime/date/stat/md5sum via external commands isn't all that bad on its own, when you multiply it by 7 million files it can take a really long time.

 

My goal was to use the md5 (even though there are possible collisions).

 

The hash is a verification of the file, a sanity check. I was using the full path of the file as a key.

I also stored mtime, size, device and inode.

There was a time field for last time the md5 was verified.

From there you could auto calculate some duration to determine when a file should be checked again.

 

The reason was to

1. Sanity-check whether the file changed.

2. If you had corruption and the file was moved to lost+found with some unrecognizable name, you had some information in your table to "try" and determine what it was and where it lived.

 

The other reason for the sqlite3 table was to use it to locate files using regular expressions like the locate command.

 

Finally I had the goal of being able to export the table as an md5sum file so you could just use the md5sum command on the file to verify your files should they exist on some backup somewhere.
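That export step is nearly a one-liner with the sqlite3 command-line tool; the table and column names below are hypothetical:

```shell
# Dump hash + path in md5sum's own "HASH  PATH" format (hash, two
# spaces, path) so plain `md5sum -c` can verify the files anywhere,
# no sqlite needed. Table/column names are hypothetical.

export_md5() {        # usage: export_md5 files.db exported.md5sum
    sqlite3 -separator '  ' "$1" \
        'SELECT md5, path FROM inventory;' > "$2"
}
```

After exporting, `md5sum -c exported.md5sum` run from the files' root directory verifies everything against the table.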

 

Link to comment

The only real drawback I've experienced with sqlite3 is that it's slow if you're invoking it for every single file. At the start, I export the rows for the given path to a file, and I then grep that file for the file (with path) and the hash. When adding files, I build a file containing all of the SQL statements and execute them all at once.
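The batching idea can be sketched like this (the schema and names are hypothetical); collecting the INSERTs into one transaction means the journal is flushed once rather than once per row:

```shell
#!/bin/bash
# Hash every file under a directory and insert the results into
# sqlite3 in one shot: all INSERTs go into a temp file and run
# inside a single transaction. Schema is hypothetical.

batch_insert() {      # usage: batch_insert /mnt/disk1 files.db
    BATCH=$(mktemp)
    echo 'BEGIN TRANSACTION;' > "${BATCH}"
    find "$1" -type f -print | while read -r F
    do
        H=$(md5sum "${F}" | awk '{print $1}')
        # Double any single quotes in the path for SQL.
        SAFE=$(printf '%s' "${F}" | sed "s/'/''/g")
        echo "INSERT INTO inventory (md5, path) VALUES ('${H}', '${SAFE}');" >> "${BATCH}"
    done
    echo 'COMMIT;' >> "${BATCH}"
    sqlite3 "$2" < "${BATCH}"
    rm -f "${BATCH}"
}
```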

Link to comment

The only real drawback I've experienced with sqlite3 is that it's slow if you're invoking it for every single file. At the start, I export the rows for the given path to a file, and I then grep that file for the file (with path) and the hash. When adding files, I build a file containing all of the SQL statements and execute them all at once.

 

It's the journaling.

Do it on the cache drive it's faster.

Do it on a SSD it's even faster.

Do it on the ramdrive, it's lickety split.

 

Considering how long it takes to do an md5sum on a big file, the time to add a record to sqlite isn't all that bad.

 

But yes, I experienced the delays too.

 

This is why I considered using a GDBM file. I use them everywhere and it's easy to dump or access a key.

 

 

There was a cool co-process feature that php-syslog-ng (now LogZilla) used.

 

It made a fifo, and a shell process read SQL commands from the fifo.

syslog-ng would then format the SQL command and write it to the fifo.

 

It was good for the output writes. Worked well for what it was.
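The same fifo pattern can be sketched in shell; the names are hypothetical, and here the reader is sqlite3 itself rather than a shell loop:

```shell
#!/bin/bash
# One long-lived sqlite3 reader drains a fifo while writers push
# formatted SQL into it. Runs in a scratch directory for safety.

cd "$(mktemp -d)" || exit 1

DB="files.db"
FIFO=$(mktemp -u)          # pathname for the fifo
mkfifo "${FIFO}"

sqlite3 "${DB}" 'CREATE TABLE IF NOT EXISTS inventory (md5 TEXT, path TEXT);'

# Reader: everything arriving on the fifo is fed to sqlite3.
sqlite3 "${DB}" < "${FIFO}" &

# Writer: hold the fifo open on fd 4 and push SQL lines down it.
exec 4> "${FIFO}"
echo "INSERT INTO inventory (md5, path) VALUES ('abc123', '/tmp/example');" >&4

# Closing the fd sends EOF, letting the reader finish cleanly.
exec 4>&-
wait
rm -f "${FIFO}"
```

Keeping fd 4 open between writes is what makes it a co-process: the reader never sees EOF until you decide to shut it down.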

Link to comment

Can anyone recommend any good literature regarding scripting under Linux?

 

I learn by example and there are plenty of examples on the net.

 

My first and most comprehensive book was

"The New KornShell Command And Programming Language"

 

This showed me that shell could be quite a powerful language.

 

While we don't have ksh on unRAID, it was a good book for the time.

If you can find a used copy cheap, it's worth the read.

 

If I were starting today, I would start with:

 

Learning the bash Shell

bash Cookbook

 

These would be a good staple.

 

I might add these two as well, but I've never read them, so I would suggest visiting a book store first:

Classic Shell Scripting

Wicked Cool Shell Scripts.

 

The O'Reilly learning books are good to start; the cookbooks contain recipes and scriptlets which could be useful as chunklets or to learn by example.

 

However, I learned a lot about exec and co-processes on the net. There's plenty of information out there.

Read everyone's example, ask questions.

 

Thanks so much for the recommendations Weebo! I'll be sure to check them out  ;)

Link to comment

The script was supposed to run according to the schedule today... sadly no luck!  :(

 

Although the script runs fine when executed from the console, when it's started by crontab it seems to fail for some reason

 

Jul 2 17:00:08 Atlas crond[1102]: exit status 127 from user root /sbin/md5hash.sh 1>/dev/null 
Jul 2 17:00:08 Atlas crond[26357]: unable to exec /usr/sbin/sendmail: cron output for user root /sbin/md5hash.sh 1>/dev/null to /dev/null 

 

From reading online, 'exit status 127' seems to mean command not found... I'm a little confused though how the command can be found when the script is executed from the console, but not via crontab!

 

Does anyone have any ideas?  :)

Link to comment

The script was supposed to run according to the schedule today... sadly no luck!  :(

 

Although the script runs fine when executed from the console, when it's started by crontab it seems to fail for some reason

 

Jul 2 17:00:08 Atlas crond[1102]: exit status 127 from user root /sbin/md5hash.sh 1>/dev/null 
Jul 2 17:00:08 Atlas crond[26357]: unable to exec /usr/sbin/sendmail: cron output for user root /sbin/md5hash.sh 1>/dev/null to /dev/null 

 

From reading online, 'exit status 127' seems to mean command not found... I'm a little confused though how the command can be found when the script is executed from the console, but not via crontab!

 

Does anyone have any ideas?  :)

Is it executable?

If not, you need to make it so.

chmod +x /sbin/md5hash.sh

 

Joe L.

Link to comment

The script was supposed to run according to the schedule today... sadly no luck!  :(

 

Although the script runs fine when executed from the console, when it's started by crontab it seems to fail for some reason

 

Jul 2 17:00:08 Atlas crond[1102]: exit status 127 from user root /sbin/md5hash.sh 1>/dev/null 
Jul 2 17:00:08 Atlas crond[26357]: unable to exec /usr/sbin/sendmail: cron output for user root /sbin/md5hash.sh 1>/dev/null to /dev/null 

 

From reading online, 'exit status 127' seems to mean command not found... I'm a little confused though how the command can be found when the script is executed from the console, but not via crontab!

 

Does anyone have any ideas?  :)

Is it executable?

If not, you need to make it so.

chmod +x /sbin/md5hash.sh

 

Joe L.

 

Thanks for that Joe, I've just checked by doing an 'ls -l' in '/sbin/':

 

-rwxrwxrwx 1 root root    281 2013-07-02 16:34 md5hash.sh*

 

I believe I'm right in saying the asterisk on the end of the file name indicates it's executable?

 

 

 

Does anyone have any further ideas?

 

I've included a copy of the script here, in case there is an error in that somewhere that is causing this:

 

# Script to Create MD5 Hashes of Data Files on Array Disks (Monthly)
#!/bin/bash 
dt=$(date +"%Y-%m-%d")
find /mnt/ -type d -maxdepth 1 -name disk* -print | while read DIR
do   DISK="${DIR##*/}"
     cd /mnt/
     md5deep -r ${DISK} > /mnt/cache/Backups/MD5_${DISK}_${dt}.txt
done

Link to comment

Try putting the

 

#!/bin/bash

 

as the first line.

I cannot say this is the reason, but I remember seeing lots of code that inspects the first line for the shell to run.

 

Thanks for the suggestion Weebo! I've just tried re-running the script via crontab, without my commented-out first line. I also removed an extra space after #!/bin/bash.

 

Unfortunately it is still failing with the same error as before  :-\

Link to comment

Thanks for taking the time to take a look Weebo.

 

# Run hourly cron jobs at 47 minutes after the hour:
47 * * * * /usr/bin/run-parts /etc/cron.hourly 1> /dev/null
#
# Run daily cron jobs at 4:40 every day:
40 4 * * * /usr/bin/run-parts /etc/cron.daily 1> /dev/null
#
# Run weekly cron jobs at 4:30 on the first day of the week:
30 4 * * 0 /usr/bin/run-parts /etc/cron.weekly 1> /dev/null
#
# Run monthly cron jobs at 4:20 on the first day of the month:
20 4 1 * * /usr/bin/run-parts /etc/cron.monthly 1> /dev/null
# Scheduled Parity Check
0 8 1 * * /root/mdcmd check NOCORRECT 1>/dev/null 2>&1
# System data collection - poll every minute for 2 seconds
*/1 * * * * /usr/lib/sa/sa1 2 1 & >/dev/null
# list file locations at 9:AM every day:
0 9 * * * /sbin/filelocation.sh 1>/dev/null
# backup flash drive full at 9:AM every Monday:
0 9 * * 1 /sbin/flashbackupfull.sh 1>/dev/null
# backup flash drive config at 9:AM every day:
0 9 * * * /sbin/flashbackupconfig.sh 1>/dev/null
# Create hash of data files on array disks Monthly on 2nd at 9:AM
0 12 3 * * /sbin/md5hash.sh 1>/dev/null
# shutdown server if disk temperatures get too high:
*/5 * * * * /usr/local/sbin/overtemp_shutdown.sh 1>/dev/null 2>&1

Link to comment

Looks fine.

 

Make sure there are no special or unprintable characters in the cron table entry and also on the first line of the shell.

Make sure it's not a dos ^M terminated file.

 

Just given this a try: I opened both files in gvim, set the file format to unix, resaved and retested... still no joy unfortunately  :(

 

The only other thing I've noticed is that the text files appear in the location specified in the script, with the correct names, but are all empty - not sure if this gives any clues! I can only think it means it's failing on either the 'cd' command or the 'md5deep' command - why, though, is still a mystery to me!! I'm finding this baffling given the script runs perfectly from the console.

 

 

 

 

Link to comment

Looks fine.

 

Make sure there are no special or unprintable characters in the cron table entry and also on the first line of the shell.

Make sure it's not a dos ^M terminated file.

 

Just given this a try: I opened both files in gvim, set the file format to unix, resaved and retested... still no joy unfortunately  :(

 

The only other thing I've noticed is that the text files appear in the location specified in the script, with the correct names, but are all empty - not sure if this gives any clues! I can only think it means it's failing on either the 'cd' command or the 'md5deep' command - why, though, is still a mystery to me!! I'm finding this baffling given the script runs perfectly from the console.

 

use the full path to the md5deep command.

 

If md5deep is not found via the path, the shell will still create empty output files.

Another choice is to put md5deep somewhere in a path that cron can access.
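For reference, the reason the console and cron behave differently is the environment: a login shell builds a full PATH from /etc/profile and friends, while crond typically hands a job little more than PATH=/usr/bin:/bin. A quick way to confirm, plus the explicit-PATH fix:

```shell
# To see exactly what PATH your cron jobs get, add a temporary
# crontab entry such as:
#
#   * * * * * env > /tmp/cronenv.txt
#
# and compare it against `env` from your console session.
#
# Besides calling commands by full path, you can set PATH
# explicitly at the top of the script:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
export PATH
```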

 

I would probably change the cron line itself to

 

/sbin/md5hash.sh 2>&1 | logger -tmd5hash.sh[$$] -puser.info

 

This will capture stdout and stderr to syslog with the tag md5hash.sh

If there is anything unexpected, it will be in your syslog.

 

Another choice is to redirect the output of the script directly with the following lines at the top of the script.

 

exec 1>/var/log/md5hash.log

exec 2>&1

Link to comment

Thank you very much for your help Weebo - very much appreciated!

 

Just wanted to let you know, after adding the full path to md5deep the script has started from crontab without an error!!  ;D

 

Now just have to wait and see how long it'll take to run on all my disks!  ;)

Link to comment
