TheDragon Posted June 23, 2013
I'm trying to cobble together a script to create MD5 checksums of each of my array disks. I've been successful for the most part; the finishing touch I can't seem to get right is including the disk number that each file relates to. Here is what I have so far:

    # Script to Create MD5 Hashes of Data Files on Array Disks (Monthly)
    cd /mnt/
    find /mnt/ -type d -maxdepth 1 | grep -v cache | grep -v user | grep disk | cut -c 6-11 | xargs -n 1 -I {} md5deep -re {} > /mnt/cache/Backups/MD5_{}_$(date +"%d_%m_%Y").txt

It creates the file, with MD5 hashes as expected. However, the file is being generated with a name like this:

    MD5_{}_23_06_2013.txt

I was expecting to get files named like this:

    MD5_disk1_23_06_2013.txt
    MD5_disk2_23_06_2013.txt

Can anyone more knowledgeable than myself see where I'm going wrong? Any input would be greatly appreciated!
cassiusdrow Posted June 23, 2013
You are getting "{}" instead of "disk1" because the > redirection is handled by the shell before xargs ever runs, so the {} in the output file name is never substituted; xargs only replaces {} in the arguments of the command it starts. How about this:

    cd /mnt
    dt=$(date +"%d_%m_%Y")
    for dsk in disk*
    do
        md5deep -re "/mnt/${dsk}" > "/mnt/cache/Backups/MD5_${dsk}_${dt}.txt"
    done
    cd "${OLDPWD}"

I don't have md5deep to test it.
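For anyone who would rather keep the xargs approach: the {} in the original one-liner stays literal because the > redirection belongs to the outer shell, not to xargs. A minimal sketch of one way around that is to let xargs start a small child shell per directory, so the redirection is evaluated once per item. The function name hash_each_dir and its arguments are hypothetical, and md5sum stands in for md5deep here only so the sketch runs without extra packages.

```shell
#!/bin/bash
# hash_each_dir OUT_DIR DIR... (hypothetical helper, not from the thread)
# Writes OUT_DIR/MD5_<dirname>.txt for every DIR given.
hash_each_dir() {
    out_dir=$1
    shift
    # One NUL-terminated item per directory; xargs appends each item as the
    # last argument of the child shell, where it shows up as $2.
    printf '%s\0' "$@" |
        xargs -0 -n 1 sh -c '
            name=$(basename "$2")
            # This redirection is parsed by the child shell, so $name is expanded.
            find "$2" -type f -exec md5sum {} + > "$1/MD5_${name}.txt"
        ' child "$out_dir"
}
```

Called as `hash_each_dir /mnt/cache/Backups /mnt/disk1 /mnt/disk2`, this would be expected to leave one MD5_diskN.txt per disk; the for loop above is simpler and probably the better choice for a cron job.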
WeeboTech Posted June 23, 2013
Here's a scriptlet that may give you a different idea on how to do this. While I don't have the whole dated thing going on, it's how I created a filelist.disk# and a filelist.disk#.newer. This was a precursor to a larger script to do it with md5sums.

    #!/bin/bash
    # 'disk*' is quoted so the shell hands the pattern to find unexpanded
    find /mnt -type d -maxdepth 1 -name 'disk*' -print | while read DIR
    do  DISK="${DIR##*/}"
        FILELIST=/mnt/cache/.flocate/filelist.${DISK}
        find ${DIR} -type f -newer ${FILELIST} -print | sort > ${FILELIST}.newer
        find ${DIR} -type f -print | sort > ${FILELIST}
        # ls -l /mnt/cache/.flocate/filelist.${DISK}
    done
TheDragon Posted June 23, 2013
Thanks for such speedy replies guys! I'm a bit of a rookie when it comes to Linux scripting (as you may have guessed!). I'll try and get my head around the suggestions you have both posted and see what I can do. Thank you!
BobPhoenix Posted June 23, 2013
When you get your script completed, please post it. I would be interested in it myself.
WeeboTech Posted June 23, 2013
FWIW, you are better off dating your scripts with YYYY-MM-DD or some derivative like that. This way, when you sort the file list, the files are sorted in date order.

    date "+%Y-%m-%d"
    2013-06-23

Since YYYY-MM-DD is an ISO standard, I usually use that, or the same thing without the dashes so I can just check the length and know I have a date:

    20130623

i.e. 8 digits following very specific limits.
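To illustrate the sorting point, here is a small sketch that sorts two file names in each date style as plain text. The helper names sort_names and is_compact_date are made up for this example; the file names follow the pattern used earlier in the thread.

```shell
# sort_names: sort the given names as plain text (hypothetical helper)
sort_names() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# dd_mm_YYYY: the 1 July file sorts ahead of the 23 June file.
sort_names MD5_disk1_23_06_2013.txt MD5_disk1_01_07_2013.txt

# YYYY-mm-dd: text order and date order agree.
sort_names MD5_disk1_2013-06-23.txt MD5_disk1_2013-07-01.txt

# is_compact_date: succeed only for exactly 8 digits, e.g. 20130623.
# It checks the shape only, not whether the month or day is in range.
is_compact_date() {
    case $1 in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

is_compact_date "$(date +%Y%m%d)" && echo "looks like a date"
```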
TheDragon Posted June 23, 2013
Okay, using both of your suggestions I've managed to cobble something together that seems to have the desired effect! I can't say I'm entirely sure how/why it works though. Any constructive criticism welcome! This is what I've got:

    #!/bin/bash
    # Script to Create MD5 Hashes of Data Files on Array Disks (Monthly)
    dt=$(date +"%Y-%m-%d")
    find /mnt/ -type d -maxdepth 1 -name disk* -print | while read DIR
    cd /mnt/
    do  DISK="${DIR##*/}"
        md5deep -re ${DISK} > /mnt/cache/Backups/MD5_${DISK}_${dt}.txt
    done

I have a couple of questions about how/why this works, if anyone is happy to answer.

If I type ' find /mnt/ -type d -maxdepth 1 -name disk* -print ' at the console, it returns the disks, along with the '/mnt/' prefix. I can't see where this is removed in the script, since the file names I end up with don't include '/mnt/'.

I'm also not 100% sure of the effect of 'while read DIR'. From my googling, I'm guessing this is reordering the disks in alpha/numerical order?

Final question: how/why do the disk numbers end up in the ${DISK} variable?

EDIT: Updated code to include WeeboTech's suggestion re date format. Thank you!
kegler Posted June 23, 2013
As I am not yet fluent in the fine art of scripting, I found an open source Windows Explorer shell extension called HashCheck that accomplishes the same thing.

http://code.kliu.org/hashcheck/

You can generate MD5 hash files for entire directories, and the files can be edited with your favorite text editor. I use it as a check on both my unRAID array and backup disks. It gives me peace of mind that everything is working properly. It will also generate and display, as a property page, all the different hashes for a given file. Very easy to use. I use it from a Win7 machine.
WeeboTech Posted June 23, 2013
DISK="${DIR##*/}" takes apart the /mnt/disk# path:

    root@unRAID:~# DIR=/mnt/disk1
    root@unRAID:~# echo ${DIR##*/}
    disk1
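A short sketch of what the ## operator is doing, next to two related forms of the same shell parameter expansion. The function name disk_name is hypothetical; the expansions themselves are standard in POSIX shells.

```shell
DIR=/mnt/disk1

echo "${DIR##*/}"   # strip the longest leading match of */   -> disk1
echo "${DIR#*/}"    # strip the shortest leading match of */  -> mnt/disk1
echo "${DIR%/*}"    # strip the shortest trailing match of /* -> /mnt

# disk_name PATH: print the last path component (hypothetical helper)
disk_name() {
    echo "${1##*/}"
}
```

This is where the '/mnt/' prefix disappears in the script: everything up to and including the last slash is removed before the value is stored in DISK.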
WeeboTech Posted June 23, 2013
Here are some other topics/scripts for assistance.

http://lime-technology.com/forum/index.php?topic=27296.msg239754#msg239754
http://lime-technology.com/forum/index.php?topic=23152.msg204749#msg204749
graywolf Posted June 23, 2013
Here you are piping the list from find into a while loop, which reads it one line at a time into the variable DIR:

    find /mnt/ -type d -maxdepth 1 -name disk* -print | while read DIR

So for each value read into DIR, you set the variable DISK and then run md5deep on it. Then it hits the done, goes back to read the next line, and repeats until there are no more lines.

    cd /mnt/
    do  DISK="${DIR##*/}"
        md5deep -re ${DISK} > /mnt/cache/Backups/MD5_${DISK}_${dt}.txt
    done

Personally, I would put the cd /mnt/ line just above the find /mnt/ line. There is no need to cd /mnt/ on each iteration.
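On the earlier question of whether 'while read DIR' reorders anything: it does not. A minimal sketch, with the hypothetical helper print_disk_names standing in for the loop body, shows that lines are handled in exactly the order they arrive; any ordering has to come from the command feeding the pipe, for example by adding sort.

```shell
# print_disk_names: read one path per line from standard input and print
# its last component (hypothetical helper, mirrors the loop in the thread).
print_disk_names() {
    # IFS= and -r keep leading spaces and backslashes in each line intact.
    while IFS= read -r DIR; do
        echo "${DIR##*/}"
    done
}

# Input order is kept: disk2 is printed before disk1.
printf '%s\n' /mnt/disk2 /mnt/disk1 | print_disk_names

# Sorting first gives disk1, then disk2.
printf '%s\n' /mnt/disk2 /mnt/disk1 | sort | print_disk_names
```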
TheDragon Posted June 23, 2013
Quoting graywolf: "Here you are piping the list into the variable DIR [...] Personally, I would put the cd /mnt/ line just above the find /mnt/ line."

I hadn't twigged that the DIR in caps was indicating a variable. Thanks, that makes sense now!

As for the cd /mnt/ line, I did try putting that above the find /mnt/ line, but for some reason when I did that the script produced an error. Not really sure why! However, by moving it down after the find /mnt/ line it seemed to work fine. The error I got was:

    find: paths must precede expression: disk2
    Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
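A likely cause of that error, offered as an explanation rather than something confirmed in the thread: once the script has changed into /mnt/, the unquoted disk* matches the real directories there, so the shell expands it to disk1 disk2 ... before find starts, and find sees a stray disk2 after -name disk1. Quoting the pattern avoids that. The sketch below uses a hypothetical helper, list_disks, with the mount root as an optional argument.

```shell
# list_disks [ROOT]: print the disk directories directly under ROOT
# (hypothetical helper; ROOT defaults to /mnt).
list_disks() {
    # The quotes make the shell pass disk* to find as a pattern, so the result
    # is the same no matter which directory the script is run from.
    find "${1:-/mnt}" -maxdepth 1 -type d -name 'disk*' -print
}
```

With the pattern quoted, the cd /mnt/ can sit above the find line without triggering the "paths must precede expression" message.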
TheDragon Posted June 23, 2013
Quoting WeeboTech: "DISK="${DIR##*/}" Takes apart the /mnt/disk#"

Thank you! That also makes perfect sense now. Hopefully one of these days, once I've learnt a little more, I might actually be able to make a script from scratch.
WeeboTech Posted June 23, 2013
Everything you want to run on each pass of a while loop must come after the do; anything placed between the while and the do is treated as part of the loop condition. In my case I had

    do  DISK=

Just move the DISK= to the line below and indent accordingly. Add the cd after the DISK= or before it.

    while read DIR
    do
        <your lines go here>
    done
BobPhoenix Posted June 23, 2013
Here is a script that I put together from one I found a while back, the threads Joe L posted, and portions of yours, jack0w:

    #!/bin/bash
    # Script to Create MD5 Hashes of Data Files on Array Disks (Monthly)
    dt=$(date +"%Y_%m_%d")
    mkdir -p /mnt/cache/Backup/N40L
    find /mnt/disk* -type f -exec md5sum {} \; >> /mnt/cache/Backup/N40L/MD5_${dt}.txt

Obviously the directories above that put the text file on the cache drive would have to change, but it appears to do what I needed anyway. I think it will be several hours before it gets to my next disk, so I can't yet say whether that part works, but it is working on the first disk so far just like I want.
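Since a listing in md5sum's own format can be fed back to md5sum for checking, here is a hedged sketch of the round trip. The helper names make_checksums and verify_checksums are invented for illustration, and the --quiet option is the GNU coreutils one, which is assumed (not confirmed in the thread) to be what unRAID ships.

```shell
# make_checksums DIR OUT_FILE: hash every regular file under DIR.
# OUT_FILE should live outside DIR so it does not end up hashing itself.
make_checksums() {
    # "{} +" hands many files to each md5sum call instead of one per file.
    find "$1" -type f -exec md5sum {} + > "$2"
}

# verify_checksums SUM_FILE: re-read the files and compare with the listing.
# --quiet suppresses the OK lines, so only mismatches are printed; the exit
# status is non-zero if any file fails.
verify_checksums() {
    md5sum -c --quiet "$1"
}
```

A run such as `verify_checksums /mnt/cache/Backup/N40L/MD5_2013_06_23.txt` would read every listed file again, so it should take roughly as long as creating the listing did.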
TheDragon Posted June 23, 2013
Quoting WeeboTech: "All operations in a while loop must occur on or after the do [...] add the cd after the DISK= or before it."

Okay, think I've got this sussed. I started with:

    #!/bin/bash
    # Script to Create MD5 Hashes of Data Files on Array Disks (Monthly)
    dt=$(date +"%Y-%m-%d")
    find /mnt/ -type d -maxdepth 1 -name disk* -print | while read DIR
    cd /mnt/
    do  DISK="${DIR##*/}"
        md5deep -re ${DISK} > /mnt/cache/Backups/MD5_${DISK}_${dt}.txt
    done

As per your suggestion I've now changed this to:

    #!/bin/bash
    # Script to Create MD5 Hashes of Data Files on Array Disks (Monthly)
    dt=$(date +"%Y-%m-%d")
    find /mnt/ -type d -maxdepth 1 -name disk* -print | while read DIR
    do
        DISK="${DIR##*/}"
        cd /mnt/
        md5deep -re ${DISK} > /mnt/cache/Backups/MD5_${DISK}_${dt}.txt
    done

Is what I've got now correct? graywolf seemed to suggest that it was incorrect or bad script etiquette (I'm not sure which) to cd /mnt/ for each iteration.
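Repeating cd /mnt/ inside the loop is harmless, just unnecessary. One way to combine the suggestions so far is sketched below, with the directory change done once and the find pattern quoted. This is not a script anyone in the thread posted: the function name hash_array_disks, its two optional arguments, and the HASHER variable (which lets another command stand in for md5deep) are all assumptions added for illustration.

```shell
#!/bin/bash
# hash_array_disks [MNT_ROOT] [OUT_DIR]  (hypothetical wrapper)
# The body runs in a subshell, so the cd does not affect the caller.
hash_array_disks() (
    mnt_root=${1:-/mnt}
    out_dir=${2:-/mnt/cache/Backups}
    dt=$(date +%Y-%m-%d)

    # Change directory once, so md5deep records paths relative to the mount root.
    cd "$mnt_root" || exit 1

    find . -maxdepth 1 -type d -name 'disk*' -print | sort |
    while IFS= read -r DIR; do
        DISK=${DIR##*/}
        # HASHER is an illustrative override; by default this runs md5deep -re.
        ${HASHER:-md5deep} -re "$DISK" > "$out_dir/MD5_${DISK}_${dt}.txt"
    done
)
```

Calling `hash_array_disks` with no arguments uses the paths from the thread; passing other directories makes it easy to try on a small test tree first.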
TheDragon Posted June 23, 2013
Quoting BobPhoenix: "Here is a script that I put together from one I found a while back, the threads Joe L posted and portions of yours jack0w:"

I'm glad what I started cobbling together was of some use to somebody else!
BobPhoenix Posted June 23, 2013
Quoting TheDragon: "I'm glad what I started cobbling together was of some use to somebody else!!"

It was a big help. Now to get it to execute periodically once it completes. I don't think that will be a problem, but I've got to wait until it gets done. I'm thinking I will zip up the file as well so it takes less space, so I will have to add that. I also have to add it to my other unRAID servers.
TheDragon Posted June 23, 2013
Quoting BobPhoenix: "Now to get it to execute periodically once it completes. [...] I'm thinking I will zip up the file as well to take less space"

If it helps, I did try creating MD5 hashes on a single disk a few days ago, just to get an idea of file size and time taken to complete per disk. On an almost full 3TB disk, I ended up with a 411.2 kB file containing the hashes, and it took around 7 hours to complete.

Have you given much thought to the interval you will run it at? I am thinking of doing it monthly, although I wonder whether, since it reads the whole disk, it may contribute to the 'ageing' of the disk. As a result I'm not sure if running it monthly is wise! I did wonder about running it bi-monthly. That's the only reason I haven't immediately added a line to my go file to get crontab to run it regularly.

What interval do you run your version of this script at, WeeboTech?
BobPhoenix Posted June 23, 2013
Quoting TheDragon: "Have you given much thought to the interval you will run it at? I am thinking of doing it monthly [...]"

On my Media unRAID servers my current MD5 checksum files take 7749KB and 1318KB respectively, with my old script which I had to edit for each disk change. On my N40L, which contains downloads and backups, I will know sometime tomorrow when it completes. It should be larger than the media servers because it has many more small files. Before I deleted the old MD5Sums, the file for the largest drive (by file count) was approximately 33300KB.

I have already added a copy command to my go file to copy my version of the script to cron.monthly so that it will run monthly. I'm not worried about the drives, especially since I'm using WD Reds for 2 of my 3 unRAID boxes. They are designed to be spinning 24x7 at least, and I'm sure reading will not add much more wear. The only thing I'm worried about is that this will run at the same time as the monthly parity check and slow both down a lot.

Edit: The first disk on the N40L just completed and has 295833 files in 786.59G. The rest of the disks (all 2TB WD Reds) are about the same usage or have less free space, but the files are much larger. I probably only have a couple of thousand files on the other 4 disks of the array, but each disk has between 400G and 950G free.
TheDragon Posted June 23, 2013
Quoting BobPhoenix: "On my Media unRAID servers my current MD5 checksums take 7749KB and 1318KB respectively [...] Not worried about the drives especially since I'm using WD Reds for 2 of my 3 unRAID boxes."

Ah I see! Well, I guess since my array is full of a smaller quantity of larger files, that accounts for why our checksum file sizes are significantly different. I can now understand why you want them zipped!

I'm also using all WD Red drives (bar one), and your argument sounds logical to me. I think I will follow suit and run it monthly. Thanks for the additional info.

Also, big thanks to everyone who pitched in to help me get to the point of having a working script! Hopefully it may prove useful, even if only as a starting point, for others in future.
WeeboTech Posted June 23, 2013
I'm resurrecting some old snippets here to give you some ideas, since I cannot locate the full script. I had a daily job that would traverse one disk each day via cron. It would find today's day of the month and look to see if that disk mount point existed. If so, it would continue and make a filelist. Here are the snippets.

    # DD=`date "+%e"|sed -e 's# ##g'`
    # [ ! -e /mnt/disk${DD} ] && exit 3
    # find /mnt/disk${DD} -type f > /mnt/cache/.flocate/filelist.disk${DD}

My script was a bit more extravagant though. The later version made the md5sums, as did the other scriptlet shown in the thread. I then read the md5sum file and checked each file for its respective MTIME. The latest MTIME was used as a reference to set the time of the md5sum file. On subsequent iterations, I used

    find /mnt/disk${DD} -newer (seed md5sum file) >> filelist.disk${DD}.newer

From here I would use the newer file as input to make new md5sums of only the updated files, then merge them into the original md5sum file, then set the timestamp of the md5sum file to the timestamp of the newest file.

So rather than doing all your md5sums daily or monthly, I did an md5sum for each /mnt/disk${DD} based on the day of the month. I changed my parity check to run on the 27th day so that there were no collisions. I started the parity check on the 27th in case it ran over 24 hours. This allowed disks 1-24 to be processed on their respective days, with the cache disk on any day in between and reruns of any other day able to occur overnight.

Before I lost everything, I had this neat scheduler. It read a named pipe and would spawn up to as many parallel md5sum processes as there were CPUs, so the job got done quicker.
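The day-of-month idea above can be sketched roughly as follows. This is a reconstruction under stated assumptions, not the lost script: the helper names todays_disk and list_newer_files are made up, the mount root is a parameter only so the sketch can be tried outside /mnt, and the merge and timestamp steps described above are left out.

```shell
# todays_disk [MNT_ROOT]: print the disk directory whose number equals today's
# day of the month, or return 3 if it does not exist (as in the snippet above).
todays_disk() {
    mnt_root=${1:-/mnt}
    # %e is the space-padded day of the month; tr removes the padding.
    DD=$(date +%e | tr -d ' ')
    [ -e "$mnt_root/disk$DD" ] || return 3
    echo "$mnt_root/disk$DD"
}

# list_newer_files DIR SEED_FILE: list regular files under DIR modified more
# recently than SEED_FILE, e.g. the previous md5sum listing for that disk.
list_newer_files() {
    find "$1" -type f -newer "$2" -print | sort
}
```

A daily cron job could then call `disk=$(todays_disk) || exit 3` and hash only what `list_newer_files "$disk" "$seed"` returns, where seed is a hypothetical variable holding the path of the previous listing.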
BobPhoenix Posted June 23, 2013
I plan on putting my monthly MD5 files through BeyondCompare to see if any changes have occurred that were not caused by file operations I know about (i.e. bit rot), and then deleting the oldest file. As long as I do this regularly, every couple of months, I should see any problems and maybe be able to correct them by re-recording or re-downloading the files.
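For anyone who would rather do that comparison on the server, a rough sketch with awk is below. The function name changed_hashes is hypothetical, and the sketch assumes both listings use the plain md5sum layout (32 hex digits, two separator characters, then the path). It reports only paths present in both listings whose hash differs; new or deleted files are ignored, and it cannot tell an intentional edit from bit rot.

```shell
# changed_hashes OLD_LIST NEW_LIST: print paths whose MD5 differs between
# two md5sum-format listings (hypothetical helper).
changed_hashes() {
    awk '
        # The path starts at column 35: 32 hash characters plus 2 separators.
        { hash = $1; path = substr($0, 35) }
        # First file: remember the old hash for each path.
        NR == FNR { old[path] = hash; next }
        # Second file: report paths seen before with a different hash.
        (path in old) && old[path] != hash { print path }
    ' "$1" "$2"
}
```

Usage would look like `changed_hashes MD5_2013_06_23.txt MD5_2013_07_23.txt`, with the file names here being examples only.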
WeeboTech Posted June 23, 2013
What about something like tripwire or AIDE?

http://aide.sourceforge.net/stable/manual.html
BobPhoenix Posted June 23, 2013
Quoting WeeboTech: "What about something like tripwire or AIDE?"

I'm more comfortable with Windows, and I already have BeyondCompare at home and use it often at work (weekly/monthly). I'm sure what you posted would work, however.