Alex R. Berg Posted November 21, 2014

This is a response to another thread regarding my MD5 script (I am not sure whether I am its original creator or whether I found parts of it somewhere).

http://lime-technology.com/forum/index.php?topic=34988.msg338408#msg338408

This post describes the content of the attached md5 script.

You're welcome, Stewart, and I'm glad it could help.

It is correct that the program initially writes hash values to HASHDIR=/hash. That is in the root of the filesystem, which resides in memory and is lost on reboot. The program writes the hashes to one file per disk, named e.g. MD5_2014-11-07_disk1.md5, and it also writes timestamps to a separate file for each disk. The timestamp file is just a crude 'ls' to get the modification time of the files, so if a file changes I can check both the md5s and their timestamps. I expect a timestamp to have changed if the file content has.

After generating the hashes it zips everything together and archives it in /boot/hash (on the flash drive):

    zip -jm /boot/hash/md5s-${dt}.zip $HASHDIR/*

The archive is named with a date, so old archives are kept.

That's it. No fancy skipping of files that didn't change since the last run, and no validation that checksums are equal to the previous ones. For me it's meant to give extra information in case of emergency.

I run the script by creating a run_md5 script in /etc/cron.monthly/. Remember to set up some copying of the run script to this location on each boot. Personally I recursively copy /boot/custom to / on each boot by placing the following in my go file. I also have a /home/admin folder in my custom dir, hence the chown admin lines.

    cp --preserve=timestamps -R /boot/custom/* /
    ln -s /home/admin /admin
    chown -R admin:users /home/admin
    chmod -R 0755 /home/admin
    chown -R nobody:users /nobody
    chmod -R 0755 /nobody

One more thing I forgot: I set the monthly crontab to run on the 7th, not the 1st, to avoid a clash with my monthly parity check. I do that by loading my custom root crontab file on boot:

    crontab -l > /boot/config/myconfig/crontab_atLastBoot
    crontab /boot/config/myconfig/mycrontab

Best
Alex

Attachment: md5_array.zip
stewartwb Posted November 21, 2014

Thank you, Alex - that's a great bit of documentation!

I like your use case - monthly hashes to guard against silent corruption and bit rot. I seem to have survived the data corruption bug earlier this year, but I welcome a strategy to guard against undetectable corruption in the future.

Here's my current use case: I'm migrating my 2TB array volumes from ReiserFS to XFS. I'm using the checksum program (by corz) from Windows to validate each step before I wipe and convert my next data drive. My process takes about 7 hours per drive to compute the hashes because it has to ship the data over the network. I expect building hashes directly on the unRAID server would be faster.

I noticed one difference between your method and my Windows-based method. The checksum program I've been using appears to add the '-b' option to treat each file as binary rather than text, which shows up in the output as an asterisk before the path/filename.

Do you think there would be any benefit to adding that option to your script? I suppose the md5sum utility included in unRAID may handle file types automatically without needing this option.

Thanks again for sharing your knowledge and your effort on this front with the unRAID community!

-- stewartwb
Alex R. Berg Posted November 21, 2014

Ah, so that's what the star means in md5 files generated on Windows. I often wondered about that difference. According to the documentation it makes no difference on Linux, but as far as I can tell it may make a difference on Windows.

I have enabled binary mode in the attached script.

Best
Alex

Attachment: md5_array_binaryoption.zip
stewartwb Posted November 21, 2014

> Ah so that's what the star means on md5-files generated on windows, I often wondered about that difference. According to the documentation it makes no difference on linux, but may make a difference on windows as far as I can tell. I have enabled binary mode in attached script. Best Alex

This is great... I'm able to interpret and modify the script to meet my use case, although I know it's not elegant. I created a script for each disk, which I can run when I'm ready to build the hashes to validate the data migration after each file copy operation.

Here's what I came up with to grab hashes immediately for disk3, though I should make the volume a command line parameter instead.

    #!/bin/bash
    find /mnt/disk3 -type f -exec md5sum -b {} \; > /hash/MD5_$(date +"%Y-%m-%d")_disk3.md5
    find /mnt/disk3 -type f -exec ls -lc {} \; > /hash/TS_$(date +"%Y-%m-%d")_disk3.txt

I could even simplify further by dispensing with the date stamps.

    #!/bin/bash
    find /mnt/disk3 -type f -exec md5sum -b {} \; > /hash/MD5_disk3.md5
    find /mnt/disk3 -type f -exec ls -lc {} \; > /hash/TS_disk3.txt

Again, thanks for your help!

-- stewartwb
WeeboTech Posted November 21, 2014

This

    find /mnt/disk3 -type f -exec ls -lc {} \; > /hash/TS_$(date +"%Y-%m-%d")_disk3.txt

can probably be adjusted to this:

    find /mnt/disk3 -type f -ls > /hash/TS_$(date +"%Y-%m-%d")_disk3.txt

find has a -ls parameter that does an internal ls -l (with the inode number and block count in front), so no separate ls is started for each file. Example:

    19 32 -rwxr-xr-x 1 root root 31110 Sep 24 10:13 /mnt/disk3/filedb/bin/gdbm
    29 4 -rwxrw-rw- 1 root root 701 Oct 27 20:49 /mnt/disk3/filedb/bin/setfattr_md5sums
    21 4 -rwxrw-rw- 1 root root 1008 Oct 26 09:04 /mnt/disk3/filedb/bin/gdbm_md5sum_delete.sh
    17 721 -rwxr-xr-x 1 root root 735690 Sep 17 23:06 /mnt/disk3/filedb/bin/b2sum
    24 4 ---xrw-rw- 1 root root 1112 Oct 26 09:27 /mnt/disk3/filedb/bin/gdbm_md5sum_insert.sh
    23 4 -rwxrw-rw- 1 root root 743 Oct 26 09:28 /mnt/disk3/filedb/bin/gdbm_md5sum_import.sh
    20 40 -rwxr-xr-x 1 root root 39505 Sep 24 09:38 /mnt/disk3/filedb/bin/gdbm.so
    3965 40 -rwxr-xr-x 1 root root 39524 Nov 2 19:14 /mnt/disk3/filedb/bin/ftwgdbm
    27 20 -rwxr-xr-x 1 root root 20106 Sep 24 09:39 /mnt/disk3/filedb/bin/md5.so
    30 16 -rwxr-xr-x 1 root root 13023 Sep 24 09:39 /mnt/disk3/filedb/bin/strftime.so

Also, I think this is more useful for programmatic purposes:

    find /mnt/disk3 -type f -fprintf /tmp/outputfile.txt '%Ts %s %p\n'

Example:

    root@unRAID:/mnt/disk1/pub/unraid# head /tmp/outputfile.txt
    1411568002 31110 /mnt/disk3/filedb/bin/gdbm
    1414457357 701 /mnt/disk3/filedb/bin/setfattr_md5sums
    1414328679 1008 /mnt/disk3/filedb/bin/gdbm_md5sum_delete.sh
    1411009607 735690 /mnt/disk3/filedb/bin/b2sum
    1414330039 1112 /mnt/disk3/filedb/bin/gdbm_md5sum_insert.sh
    1414330126 743 /mnt/disk3/filedb/bin/gdbm_md5sum_import.sh
    1411565935 39505 /mnt/disk3/filedb/bin/gdbm.so
    1414973688 39524 /mnt/disk3/filedb/bin/ftwgdbm
    1411565956 20106 /mnt/disk3/filedb/bin/md5.so
    1411565944 13023 /mnt/disk3/filedb/bin/strftime.so

What this does is save the mtime in epoch time and the size in bytes along with the file name. While it's not all that human readable, it allows you to do quick programmatic compares with the stat command, or a massive diff, to find files that have changed. Then you do not have to md5sum the whole disk.

Here's how to pull that data out of the filesystem with stat:

    root@unRAID:/mnt/disk1/pub/unraid# stat -c '%Y %s' /mnt/disk3/filedb/bin/setfattr_md5sums
    1414457357 701

and in a bash array:

    declare -a STAT   # this is only done once
    STAT=( $(stat -c '%Y %s' /mnt/disk3/filedb/bin/setfattr_md5sums) )
    echo ${STAT[0]} ${STAT[1]}
    1414457357 701

and another interesting lesson:

    declare -a MD5
    MD5=( $(md5sum /mnt/disk3/filedb/bin/setfattr_md5sums) )
    set | grep MD5
    MD5=([0]="b48148e6661a238fe6dde8c919c16edd" [1]="/mnt/disk3/filedb/bin/setfattr_md5sums")

This breaks on the filename if the filename has spaces, because the name is split into several array elements, but what you usually want is the hash in element 0 anyway.

While I see that the whole disk is hashed, is there another tool that actually compares these values, or do you rerun md5sum -c against the output of the prior run? (Either way works.)

Furthermore, this command

    find /mnt/disk3 -type f -exec md5sum -b {} \; > /hash/MD5_$(date +"%Y-%m-%d")_disk3.md5

could possibly be combined to do the md5sum and make the file list in one command, using -fprintf like above:

    find /mnt/disk3 -type f -fprintf /hash/TS_$(date +"%Y-%m-%d")_disk3.txt '%Ts %s %p\n' -exec md5sum -b {} \; > /hash/MD5_$(date +"%Y-%m-%d")_disk3.md5

You'll have to adjust the '%Ts %s %p\n' to whatever it is you want to preserve. I usually make a CRLF .txt file that is accessible so I can load it in Notepad quickly for scanning, so my personal end of line is \r\n for Windows Notepad.
stewartwb Posted November 21, 2014

> ...and another interesting lesson...

Much appreciated - guided tips like yours are an excellent way to learn. Thanks!!

-- stewartwb

PS: only 10 more posts until you reach 8192, or 2^13 - quite a milestone! Thanks for all of your contributions, Weebo!
Alex R. Berg Posted November 21, 2014

Wow, awesome help, WeeboTech. That should make it possible to write a much smoother, simple bash script that selectively scans only changed files. And yes, a timestamp file with epoch timestamps is much better and easier to use programmatically.

Best
Alex
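A selective scan along those lines could be sketched like this: keep the previous '%Ts %s %p' listing, build a new one, and hash only the files whose line is new or different. This is a sketch under assumptions, not a finished tool. The function names are made up, GNU find is assumed, and file names containing newlines are not handled.

```shell
#!/bin/bash
# Sketch: hash only files whose mtime or size differs from the previous listing.
# Listings use the '%Ts %s %p\n' format from this thread. Function names are
# illustrative. File names containing newlines are not handled.

# make_listing DIR OUTFILE
# Writes a sorted "mtime size path" listing of all regular files under DIR.
make_listing() {
    find "$1" -type f -printf '%Ts %s %p\n' | LC_ALL=C sort > "$2"
}

# changed_files OLD_LISTING NEW_LISTING
# Prints the path of every file that is new or whose mtime/size line changed.
changed_files() {
    # comm -13 keeps lines that appear only in the second (new) listing;
    # cut drops the mtime and size fields and keeps the path, spaces included.
    LC_ALL=C comm -13 "$1" "$2" | cut -d' ' -f3-
}

# hash_changed OLD_LISTING NEW_LISTING
# Prints md5sum lines for the changed files only.
hash_changed() {
    changed_files "$1" "$2" | while IFS= read -r path; do
        md5sum -b "$path"
    done
}
```

Deleted files show up only in the old listing, so they are ignored here; `comm -23` on the same two listings would print them if that is of interest.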
Alex R. Berg Posted February 4, 2015

@WeeboTech, rereading your advice on my code, I noticed you mention that you use '\r\n'. I 'recently' found an almost 'solve-all' solution to that problem (as if those ever existed), which is simply, and surprisingly, to never use \r\n in files I write in my text editor. Just use unix newlines. The only application I have found which does not understand unix newlines as newlines is Notepad (which I find useless anyway).

For instance, the following applications seem to understand unix newlines just fine on Windows 7:

• cmd scripts
• powershell scripts
• Word 2013 (I can open a txt file with unix newlines, though I never use that; I can also paste text with unix newlines and get correct/desired line breaking)
• OneNote 2013 (pasting text)
• WordPad (not that I use it)
• Sublime Text 3 (naturally, just on the list because it's the best in my experience!)
• Eclipse

That cmd scripts work with unix newlines was the most surprising and crucial feature for me. There are bound to be gotchas that may be discovered over the years, but so far so good: I haven't had any problems for 9 months now. I think Microsoft is trying to help here...

I then configure git with 'autocrlf = false', which seems to give me the least trouble, but your mileage may vary there.

And a PS: I really love Sublime Text 3 for text editing. And it's got vertical selections, which many otherwise feature-rich editors do not.

PS: I just hijacked my own thread - wow

Best
Alex
WeeboTech Posted February 5, 2015

> @Weebotech, Rereading your advice on my code I noticed you mention you use '\r\n'. I 'recently' found an almost 'solve-all' solution to that problem (as if those ever existed), which is simply and surprisingly to never use \r\n in files I write in my text editor. Just use unix-newlines. The only application I have found which does not understand unix-newlines as newlines is the notepad (which I find useless anyway).

How many regular users do you know who install better, more advanced editors and set the association correctly?

For a quick 'make a text file' that any Windows machine can read, \r\n in a name.txt works easily. If you ever had to process the file in another script, breaking on \r\n would probably be more reliable than \n alone. If there were a \n in a filename (and I have seen them), it could not always be parsed correctly. Whereas if it's \r\n delimited, the chances of that being in a file name are even slimmer. And | is even slimmer!

I use editpad++ and Notepad++, but it also means I have to right click on the file and select that program to open it with. So instead I just use \r\n for my file lists and I'm done with it.
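For what it's worth, writing the file list with \r\n and still processing it in a script can be sketched as below. This assumes GNU find (whose -printf understands \r), the function names are illustrative, and file names that themselves contain newlines are still not handled.

```shell
#!/bin/bash
# Sketch: write a CRLF-terminated file list and read it back in a script.
# Function names are illustrative.

# write_crlf_listing DIR OUTFILE
# Same "mtime size path" fields as before, but each line ends in \r\n.
write_crlf_listing() {
    find "$1" -type f -printf '%Ts %s %p\r\n' > "$2"
}

# read_crlf_listing LISTING
# Prints each path with the trailing carriage return removed.
read_crlf_listing() {
    local cr mtime size path
    cr=$(printf '\r')
    # The third variable receives the rest of the line, spaces included.
    while IFS=' ' read -r mtime size path; do
        printf '%s\n' "${path%"$cr"}"
    done < "$1"
}
```

The listing opens with proper line breaks in Notepad, and a script only has to strip one trailing character per line.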
BobPhoenix Posted February 5, 2015

> I use editpad++ and Notepad++, but it also means I have to right click on the file and select that program to open as.

I just associate .txt files with the alternate notepad, and then I can double click the text file. On Windows 7 it is "Open with/Choose default program..." from the right-click popup menu. After that, a double click opens the file in my alternate editor (notepad2 in my case).
Alex R. Berg Posted February 8, 2015

On newlines... Ah yes, if you live in an environment where you interact with Windows PCs controlled by other users, then I definitely see why keeping the Windows newline \r\n is best. I'm just so used to not having to worry about reading my scripts/files from other people's Windows PCs.