Simple MD5 script



This is a response to another thread regarding my MD5 script (which I am not sure I am the original creator of, or if I found parts of it somewhere).

http://lime-technology.com/forum/index.php?topic=34988.msg338408#msg338408

 

This post describes the content of the attached md5-script.

 

 

You're welcome, Stewart, and I'm glad it could help.

 

It is correct that the program initially writes hash values to

HASHDIR=/hash

That's the root of the filesystem, which resides in memory and is lost on reboot.

 

The program writes the hashes to one file per disk, called e.g.

MD5_2014-11-07_disk1.md5

and it also writes timestamps to a separate file for each disk.

 

The timestamp is just a crude 'ls' to get the modification time of files, so in case a file changes I can check the md5's and their timestamp. I expect a timestamp to have changed if the file content has.

 

After generating hashes it zips everything together and archives it in /boot/hash (on the flash drive)

zip -jm /boot/hash/md5s-${dt}.zip $HASHDIR/*

The archive is named with a date, so old archives are kept.

 

That's it. No fancy skipping of files that didn't change since the last run, and no validation that checksums equal those from the last run. For me it's meant to give me extra information in case of emergency.

 

I run the script by creating a run_md5 script in /etc/cron.monthly/. Remember to set up some copying of the run-script to this location on each boot. Personally I recursively copy /boot/custom to / on each boot, by placing this in my go-file. I also have a /home/admin folder in my custom dir, hence the chown admin lines.

 

cp  --preserve=timestamps -R /boot/custom/* /

ln -s /home/admin /admin

chown -R admin:users /home/admin

chmod -R 0755 /home/admin

chown -R nobody:users /nobody

chmod -R 0755 /nobody

 

One more thing I forgot: I set the monthly crontab to run on the 7th, not the 1st, to avoid a clash with my monthly parity check. I do that by loading my custom root-crontab file on boot:

crontab -l > /boot/config/myconfig/crontab_atLastBoot

crontab /boot/config/myconfig/mycrontab
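
A minimal sketch of how a file like mycrontab could be derived from the running crontab. It assumes the stock root crontab contains a run-parts line for /etc/cron.monthly scheduled on day 1; the helper name move_monthly_to_day is hypothetical and not part of the attached script.

```shell
# Read a crontab on stdin and rewrite the day-of-month field (3rd field)
# of the non-comment line that runs /etc/cron.monthly. All other lines
# are passed through unchanged.
move_monthly_to_day() {
    awk -v day="$1" '!/^#/ && /cron\.monthly/ { $3 = day } { print }'
}
```

Usage could look like `crontab -l | move_monthly_to_day 7 > /boot/config/myconfig/mycrontab`. Note that awk rejoins the fields of the rewritten line with single spaces.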

 

Best Alex

md5_array.zip


Thank you, Alex - that's a great bit of documentation!

 

I like your use case - monthly hashes to guard against silent corruption and bit rot.  I seem to have survived the data corruption bug earlier this year, but I welcome a strategy to guard against undetectable corruptions in the future.

 

Here's my current use case - I'm migrating my 2TB array volumes from ReiserFS to XFS.  I'm using the checksum program (by corz) from Windows to validate each step before I wipe and convert my next data drive.  My process takes about 7 hours per drive to compute the hashes because it has to ship the data over the network.  I expect building hashes directly on the unRAID server would be faster.

 

I noticed one difference between your method and my Windows-based method.  The checksum program I've been using appears to add the '-b' option to consider each file as binary rather than text, which was shown in the output as an asterisk before the path/filename.  Do you think there would be any benefit to adding that option to your script?  I suppose the md5sum utility included in unRAID may automatically handle file types without needing this option.

 

Thanks again for sharing your knowledge and your effort on this front with the unRAID community!

 

-- stewartwb


Ah, so that's what the star means in md5 files generated on Windows. I often wondered about that difference.

 

According to the documentation it makes no difference on Linux, but as far as I can tell it may make a difference on Windows.

 

I have enabled binary mode in the attached script.
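
A quick way to check this on a Linux box, as a sketch assuming GNU md5sum: hash the same throwaway file with and without -b and compare. Only the marker before the file name should differ (a space versus an asterisk), not the hash.

```shell
# Hash one temporary file in text mode and in binary mode.
sample=$(mktemp)
printf 'hello\n' > "$sample"

text_line=$(md5sum "$sample")        # "<hash>  <path>"
binary_line=$(md5sum -b "$sample")   # "<hash> *<path>"

# Keep only the hash: strip everything from the first space onward.
text_hash=${text_line%% *}
binary_hash=${binary_line%% *}

echo "text:   $text_line"
echo "binary: $binary_line"
rm -f "$sample"
```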

 

Best Alex

 

This is great... I'm able to interpret and modify the script to meet my use case, although I know it's not elegant.

 

I created a script for each disk, which I can run when I'm ready to build the hashes to validate the data migration after each file copy operation.

 

Here's what I came up with to grab hashes immediately for disk3, though I should make the volume a command line parameter instead.

 

#!/bin/bash
find /mnt/disk3 -type f -exec md5sum -b {} \; > /hash/MD5_$(date +"%Y-%m-%d")_disk3.md5
find /mnt/disk3 -type f -exec ls -lc {} \; > /hash/TS_$(date +"%Y-%m-%d")_disk3.txt

 

I could even simplify further by dispensing with the date stamps.

#!/bin/bash
find /mnt/disk3 -type f -exec md5sum -b {} \; > /hash/MD5_disk3.md5
find /mnt/disk3 -type f -exec ls -lc {} \; > /hash/TS_disk3.txt
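
The command-line-parameter variant mentioned above might look like the following sketch. The function name hash_disk and the overridable MNT_ROOT variable are assumptions added for illustration; the defaults follow the /mnt and /hash paths used in this thread.

```shell
#!/bin/bash
# Sketch: hash one disk given by name, e.g. "hash_disk disk3".
# MNT_ROOT and HASHDIR can be overridden so the sketch can be tried
# outside an unRAID box (assumption, not part of the original script).
MNT_ROOT="${MNT_ROOT:-/mnt}"
HASHDIR="${HASHDIR:-/hash}"

hash_disk() {
    disk="$1"
    if [ -z "$disk" ] || [ ! -d "$MNT_ROOT/$disk" ]; then
        echo "usage: hash_disk <disk>   (e.g. disk3)" >&2
        return 1
    fi
    stamp=$(date +"%Y-%m-%d")
    mkdir -p "$HASHDIR" || return 1
    # Same two commands as above, with the disk name substituted.
    find "$MNT_ROOT/$disk" -type f -exec md5sum -b {} \; > "$HASHDIR/MD5_${stamp}_${disk}.md5"
    find "$MNT_ROOT/$disk" -type f -exec ls -lc {} \; > "$HASHDIR/TS_${stamp}_${disk}.txt"
}
```

Ending the script with `hash_disk "$1"` would let it be run as, say, `./hash_disk.sh disk3` (the script name is hypothetical).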

 

Again, thanks for your help!

 

-- stewartwb


This

find /mnt/disk3 -type f -exec ls -lc {} \; > /hash/TS_$(date +"%Y-%m-%d")_disk3.txt

Can probably be adjusted to this.

find /mnt/disk3 -type f -ls > /hash/TS_$(date +"%Y-%m-%d")_disk3.txt

 

find has a -ls action that lists each file in ls -dils format (like ls -l, plus the inode number and block count)

Example:

    19   32 -rwxr-xr-x   1 root     root        31110 Sep 24 10:13 /mnt/disk3/filedb/bin/gdbm
    29    4 -rwxrw-rw-   1 root     root          701 Oct 27 20:49 /mnt/disk3/filedb/bin/setfattr_md5sums
    21    4 -rwxrw-rw-   1 root     root         1008 Oct 26 09:04 /mnt/disk3/filedb/bin/gdbm_md5sum_delete.sh
    17  721 -rwxr-xr-x   1 root     root       735690 Sep 17 23:06 /mnt/disk3/filedb/bin/b2sum
    24    4 ---xrw-rw-   1 root     root         1112 Oct 26 09:27 /mnt/disk3/filedb/bin/gdbm_md5sum_insert.sh
    23    4 -rwxrw-rw-   1 root     root          743 Oct 26 09:28 /mnt/disk3/filedb/bin/gdbm_md5sum_import.sh
    20   40 -rwxr-xr-x   1 root     root        39505 Sep 24 09:38 /mnt/disk3/filedb/bin/gdbm.so
  3965   40 -rwxr-xr-x   1 root     root        39524 Nov  2 19:14 /mnt/disk3/filedb/bin/ftwgdbm
    27   20 -rwxr-xr-x   1 root     root        20106 Sep 24 09:39 /mnt/disk3/filedb/bin/md5.so
    30   16 -rwxr-xr-x   1 root     root        13023 Sep 24 09:39 /mnt/disk3/filedb/bin/strftime.so

 

also I think this is more useful for programmatic purposes.

find /mnt/disk3 -type f -fprintf /tmp/outputfile.txt '%Ts %s %p\n'

 

 

Example:

root@unRAID:/mnt/disk1/pub/unraid# head /tmp/outputfile.txt 
1411568002 31110 /mnt/disk3/filedb/bin/gdbm
1414457357 701 /mnt/disk3/filedb/bin/setfattr_md5sums
1414328679 1008 /mnt/disk3/filedb/bin/gdbm_md5sum_delete.sh
1411009607 735690 /mnt/disk3/filedb/bin/b2sum
1414330039 1112 /mnt/disk3/filedb/bin/gdbm_md5sum_insert.sh
1414330126 743 /mnt/disk3/filedb/bin/gdbm_md5sum_import.sh
1411565935 39505 /mnt/disk3/filedb/bin/gdbm.so
1414973688 39524 /mnt/disk3/filedb/bin/ftwgdbm
1411565956 20106 /mnt/disk3/filedb/bin/md5.so
1411565944 13023 /mnt/disk3/filedb/bin/strftime.so

 

What this does is save the mtime in epoch time and size in bytes along with the file.

While it's not all that human readable, it allows you to do quick programmatic compares with the stat command or a massive diff to find files that have changed.

 

Then you do not have to md5sum the whole disk.
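
As a rough sketch of that comparison, assuming two lists written in the '%Ts %s %p\n' format above (one from an earlier run, one from now): any line in the new list with no identical line in the old list is a file that is new or whose mtime or size changed. The function name changed_files is hypothetical.

```shell
# Print the paths of files that are new or changed between two
# "mtime size path" lists. Paths with spaces are kept intact; paths
# containing newlines are not handled.
changed_files() {
    old_list="$1"
    new_list="$2"
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next }
         !($0 in seen)       { sub(/^[0-9]+ [0-9]+ /, ""); print }' "$old_list" "$new_list"
}
```

Only the paths it prints would then need to be passed to md5sum again.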

 

 

here's how to pull out that data from the filesystem with stat

root@unRAID:/mnt/disk1/pub/unraid# stat -c '%Y %s' /mnt/disk3/filedb/bin/setfattr_md5sums
1414457357 701

 

 

and in a bash array

declare -a STAT # this is only done once
STAT=( $(stat -c '%Y %s' /mnt/disk3/filedb/bin/setfattr_md5sums) ) 
echo ${STAT[0]} ${STAT[1]}
1414457357 701

 

and another interesting lesson.

declare -a MD5
MD5=( $(md5sum /mnt/disk3/filedb/bin/setfattr_md5sums) ) 
set | grep MD5
MD5=([0]="b48148e6661a238fe6dde8c919c16edd" [1]="/mnt/disk3/filedb/bin/setfattr_md5sums")

This breaks on the filename if the filename has spaces, but what you usually want is the hash anyway.
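
If only the hash is wanted, one way to sidestep the filename problem is to feed the file to md5sum on stdin, so no file name is printed at all. This is a sketch; file_md5 is a hypothetical helper name.

```shell
# Print just the MD5 hash of a file. Reading from stdin makes md5sum
# print "<hash>  -", so the file name (spaces or not) never enters
# the output; everything from the first space onward is stripped.
file_md5() {
    _sum=$(md5sum < "$1") || return 1
    printf '%s\n' "${_sum%% *}"
}
```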

 

I see that the whole disk is hashed, but is there another tool that actually compares these values, or do you rerun md5sum -c using the output of the prior run? (Either way works.)
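
For reference, the md5sum -c route can be sketched as below, assuming GNU md5sum and a hash list from an earlier run that stores absolute paths (as the find commands in this thread produce). verify_hashes is a hypothetical name.

```shell
# Re-check files against a hash list from a prior run.
# --quiet suppresses the per-file "OK" lines, so only mismatches and
# unreadable files are reported; the exit status is non-zero if any
# file fails verification.
verify_hashes() {
    md5sum -c --quiet "$1"
}
```

For example, `verify_hashes /hash/MD5_disk3.md5 || echo "disk3 has mismatches"`.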

 

furthermore this command

find /mnt/disk3 -type f -exec md5sum -b {} \; > /hash/MD5_$(date +"%Y-%m-%d")_disk3.md5

 

could possibly be combined to do the md5sum and make the filelist in one command using -fprintf like above.

find /mnt/disk3 -type f -fprintf /hash/TS_$(date +"%Y-%m-%d")_disk3.txt '%Ts %s %p\n' -exec md5sum -b {} \; > /hash/MD5_$(date +"%Y-%m-%d")_disk3.md5

 

You'll have to adjust the '%Ts %s %p\n' to what it is you want to preserve.

I usually make a crlf .txt file that is accessible so I can load it via notepad quickly for scanning

so my personal end of line is \r\n for windows notepad.
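
A sketch of that CRLF variant, assuming GNU find (its -printf and -fprintf formats accept the \r escape). The function name write_crlf_filelist is hypothetical.

```shell
# Write "mtime size path" lines terminated with CRLF, so the list
# opens with proper line breaks in Windows Notepad.
# $1 = directory to scan, $2 = output file.
write_crlf_filelist() {
    find "$1" -type f -fprintf "$2" '%Ts %s %p\r\n'
}
```

For example, `write_crlf_filelist /mnt/disk3 /hash/TS_disk3.txt`.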

  • 2 months later...

@Weebotech,

 

Rereading your advice on my code, I noticed you mention you use '\r\n'. I 'recently' found an almost 'solve-all' solution to that problem (as if those ever existed :) ), which is simply, and surprisingly, to never use \r\n in files I write in my text editor. Just use unix-newlines. The only application I have found which does not understand unix-newlines as newlines is Notepad (which I find useless anyway). For instance, the following applications seem to understand unix-newlines just fine in Windows 7:

• cmd scripts

• powershell scripts

• word 2013 (I can open a txt file with unix-newlines, though I never use that; I can also paste text with unix-newlines and get correct/desired line breaking)

• OneNote 2013 (pasting text)

• wordpad (not that I use it)

• Sublime Text 3 (naturally, just on the list because it's the best in my experience! )

• Eclipse

That cmd-scripts work with unix-newlines was the most surprising and crucial feature for me. There are bound to be gotchas that may be discovered over the years, but so far so good; I haven't had any problems for 9 months now. I think Microsoft is trying to help here...

 

I then configure git with 'autocrlf = false', which seems to give me the least trouble, but your mileage may vary there.

 

And a PS: I really love Sublime Text 3 for text editing. And it's got vertical selections, which many otherwise feature-rich editors do not.

 

PS: I just hijacked my own thread - wow :)

 

Best Alex


@Weebotech,

 

Rereading your advice on my code, I noticed you mention you use '\r\n'. I 'recently' found an almost 'solve-all' solution to that problem (as if those ever existed :) ), which is simply, and surprisingly, to never use \r\n in files I write in my text editor. Just use unix-newlines. The only application I have found which does not understand unix-newlines as newlines is Notepad (which I find useless anyway).

 

How many regular users do you know who install better, more advanced editors and set the association correctly?

For a quick 'make a text file' that any windows machine can read, \r\n as a name.txt works easily.

 

If you ever had to process the file in another script, breaking on \r\n would probably be more reliable than \n alone.

If there were a \n in a filename (and I have seen them), it could not always be parsed correctly. Whereas if it's \r\n delimited, the chances of that being in a file name are even slimmer, and | is even slimmer!

 

I use editpad++ and Notepad++, but it also means I have to right click on the file and select that program to open it with.

 

So instead I just use \r\n for my filelists and I'm done with it.

I use editpad++ and Notepad++, but it also means I have to right click on the file and select that program to open it with.
I just associate .txt files with the alternate notepad and then I can double click the text file.  On Windows 7 it is "Open with/Choose default program..." from the right click popup menu.  Then a double click opens in my alternate (notepad2 in my case).