MortenSchmidt, posted April 27, 2015:

Quote: "Are there other differences I've missed??"

Yes, Blake2 support in Bunker. It is much less CPU intensive, which is a must if you want to scan several disks at the same time, or if you want Plex transcoding to keep working while it runs.
tr0910, posted June 7, 2015:

I have a text file from SHA256 for Windows that needs to be translated for Linux as part of a repetitive Win->unRaid workflow. Image files get created in Windows and have their SHA256 hashes calculated. They are then rsynced to unRaid via cwrsync along with the hash text file and have their hashes verified, and bunker then puts the hashes into the extended attributes.

The first problem is Windows EOL vs Unix EOL. Note the difference between what we have from Windows and what we need for bunker import.

From Windows:

0c50530b7c6e9ce822aadd70a6d69af2620dc311f40d018c2e247017a08547e7 C:\Users\Images\Image Compare\IMG_2456.JPG
bec92581255ff043d576f3bd4f2dc530b41c75662390d14b34480dd2a718b8fd C:\Users\Images\Image Compare\IMG_2456A.JPG

unRaid bunker needs:

0c50530b7c6e9ce822aadd70a6d69af2620dc311f40d018c2e247017a08547e7|/mnt/disk1/downloads/IMG_2456.JPG
bec92581255ff043d576f3bd4f2dc530b41c75662390d14b34480dd2a718b8fd|/mnt/disk1/downloads/IMG_2456A.JPG

What is the best way to take a Windows text file and translate it into a bunker-friendly format?
bonienl (author), posted June 7, 2015:

You can use the Linux sed command to do the conversion:

sed 's! C:\\Users\\Images\\Image Compare!|/mnt/disk1/downloads!;s!\\!/!g;s!\r!!' windows_file >linux_file

The above line does the following:

1. Converts the Windows folder C:\Users\Images\Image Compare to the Linux directory /mnt/disk1/downloads (including the | character in front)
2. Converts backslashes (\) into forward slashes (/)
3. Removes the carriage return (\r) character at the end of the line

windows_file = name of the source Windows file
linux_file = name of the destination Linux file (suitable for bunker)

Adapt the names of the folders and directories as needed.
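The same three steps can also be wrapped in a small shell function, so the folder names are passed as arguments instead of being edited into the sed expression. This is only a sketch: win2bunker is a made-up name (not part of bunker), and it assumes every input line is a hash, one or more spaces, then the full Windows path, as in the sample from the question.

```shell
# win2bunker WINDOWS_DIR LINUX_DIR  (hypothetical helper, not part of bunker)
# Reads a Windows hash listing on stdin, writes "hash|path" lines to stdout.
win2bunker() {
  tr -d '\r' |                            # drop Windows carriage returns
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] || continue            # skip blank lines
    hash=${line%% *}                      # everything before the first space
    path=${line#* }                       # everything after the first space
    path=${path#"${path%%[! ]*}"}         # trim any extra leading spaces
    rest=${path#"$1"}                     # strip the Windows folder prefix
    printf '%s|%s%s\n' "$hash" "$2" "$rest"
  done |
  tr '\\' '/'                             # backslashes to forward slashes
}
```

It would be called as: win2bunker 'C:\Users\Images\Image Compare' /mnt/disk1/downloads <windows_file >linux_file. Passing the Windows folder in single quotes avoids having to double the backslashes.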
tr0910, posted June 7, 2015:

Thanks, I will start my sed learning with your quick boost. "Sed for Dummies", here we come...

One more question. I have one 3 TB drive with documents whose contents change daily, something like your source code drive for Dynamix. Each day I want to scan the drive for only the files whose file date is newer than the hash date, and update the hashes for only those few files. "bunker -u" seems to take 24 hours for this task. Is it calculating all the hashes over again? Is there a different way I should be doing this? "bunker -a" checks a 3 TB drive for new files in only a few minutes.
bonienl (author), posted June 7, 2015:

The -u option recalculates the hashes of all files, while the -a option skips files with existing hashes and only calculates hashes for new files.
BRiT, posted June 7, 2015:

Quote (bonienl): "-u option will recalculate all files, while the -a option skips existing hashes and only calculate new files."

I assume that 'new' means files without hashes, regardless of the file timestamps?

Is there an option to recalculate hashes for files that have a modified timestamp or last written timestamp after the hash calculated timestamp?
bonienl (author), posted June 7, 2015:

Quote (BRiT): "I assume that 'new' is files without hashes regardless of the file timestamps?"

Correct.

Quote (BRiT): "Is there an option to recalculate hashes for files that have a modified timestamp or last written timestamp after the hash calculated timestamp?"

See the -d option.
tr0910, posted June 7, 2015:

But what about the file saved/updated date? Can that be compared?

-d <days> optional only verify/update/remove for files scanned <days> or longer ago

I only want to update the hashes for files that have been updated since the last hash calculation. For example, a script file is opened in Notepad++, changed, and resaved under the same name. The hash for this file is no longer valid and needs to be updated. This file can be found quickly if you compare the file change date with the latest hash date. Then you can calculate a new hash for only this one file.

@bonienl (your sed script works like a charm)
bonienl (author), posted June 8, 2015:

Quote (tr0910): "For example a script file is opened in notepad++ and changed and resaved under the same name. The hash for this file is no longer valid and needs updated. This file can be found quickly if you compare the file change date with the latest hash date. Then you can calculate a new hash for only this one file."

This is not possible with bunker. A workaround is to write your own wrapper which compares file dates and sizes and calls bunker for only those files which have changed.

I'll think about an implementation in bunker itself too, but can't give timelines.
tr0910, posted June 8, 2015:

Maybe I haven't understood the workflow properly. How would you handle this situation? (It just took 23 hours to bunker -u a 3 TB drive with only 10 updates. bunker -a takes a few minutes to add 10 files.)
WeeboTech, posted June 8, 2015:

Quote (bonienl): "This is not possible with bunker, a workaround can be to write your own wrapper which compares file dates and sizes and calls bunker for only those files which have changed. I'll think about an implementation in bunker itself too, but can't give timelines."

There's gotta be a method to do this simply. If mtime > scandate, assume the file has changed and recalculate the hash.

As far as a custom local wrapper goes, there is the find command with -newer. Set the time of some semaphore file with touch, then do:

find /mnt/disk? -newer semaphorefile -print > filelist

Touch the semaphore file after you are done.

The only issue with this method is the possibility of missing a file that changes while the job runs. So it has to be more involved: run touch -r semaphorefile semaphorefile.tmp, then touch the actual semaphore file immediately. After that, use the tmp semaphore file with the -newer option, then remove it. However, with this method, if you kill the job, the semaphore file has already been updated.

It's much more involved than using the scandate and mtime comparison (which would be much less error prone when done correctly).
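The semaphore-file steps described above can be sketched roughly as follows. This is an illustration only: list_changed is a made-up name, the semaphore file should live outside the scanned directory, and the caveat from the post still applies (if the job is killed, the semaphore has already been moved forward).

```shell
# list_changed DIR STAMPFILE  (hypothetical helper, not part of bunker)
# Prints files under DIR modified since the previous run, and leaves
# STAMPFILE marking the start of this run.
list_changed() {
  dir=$1 stamp=$2 prev=$2.tmp
  if [ -e "$stamp" ]; then
    touch -r "$stamp" "$prev"      # copy the previous run time to a temp marker
    touch "$stamp"                 # mark the start of this run immediately
    find "$dir" -type f -newer "$prev" -print
    rm -f "$prev"
  else
    touch "$stamp"                 # first run: every file counts as changed
    find "$dir" -type f -print
  fi
}
```

The resulting list could then be fed to whatever recalculates the hashes, for example list_changed /mnt/disk1 /boot/config/semaphorefile > filelist (both paths are placeholders).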
bonienl (author), posted June 8, 2015:

Ok, I made an update to bunker and released version 1.5.

In this version there is a new command:

-U = same as the -u command, but only updates files which are newer than the scandate of the hash. In other words, files which have changed after the hash was calculated.

There is also a new option:

-D <time> = only include files which are newer than the specified time. The time is expressed in seconds (s), minutes (m), hours (h), days (d) or weeks (w). E.g. -D 1w means files created/modified in the last week.

For example, you can do the following:

bunker -U /mnt/user/data - this will update only the files in /mnt/user/data which have been changed since the last hash calculation
bunker -U -D 4h /mnt/user/data - the same as above, but the search is limited to files which were modified in the last four hours

See the opening post to download the new version, and let me know if it is useful.
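To make the -D time format concrete, here is a hedged sketch of how a value such as 4h or 1w maps to a number of seconds. It is not bunker's actual parsing code; duration_to_seconds is a made-up name, and the sketch deliberately sticks to the ${var%?} form of parameter expansion, which also works in older shells.

```shell
# duration_to_seconds TIME  (hypothetical helper, not bunker's real code)
# TIME is a whole number followed by one unit letter: s, m, h, d or w.
duration_to_seconds() {
  num=${1%?}                        # everything except the last character
  unit=${1#"$num"}                  # the last character (the unit letter)
  case $num in
    ''|*[!0-9]*) echo "invalid time: $1" >&2; return 1 ;;
  esac
  case $unit in
    s) mult=1 ;;
    m) mult=60 ;;
    h) mult=3600 ;;
    d) mult=86400 ;;
    w) mult=604800 ;;
    *) echo "invalid unit in: $1" >&2; return 1 ;;
  esac
  echo $(($num * $mult))
}
```

With this, duration_to_seconds 4h prints 14400 and duration_to_seconds 1w prints 604800.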
bonienl (author), posted June 8, 2015:

Yeah, it wasn't that complicated after all ...
tr0910, posted June 8, 2015:

Thanks bonienl. I just threw it at 2 servers, each with one 3 TB drive to check.

Server 1
Finished. Verified 11 files
Skipped 197499 files
Found 9 mismatches.

Server 2
Finished. Verified 2 files
Skipped 209471 files
Found 0 mismatches.

Run time was about 2 hr 15 min on each server, one with a Xeon 1220 and the other with an AMD Phenom X6 1055T.
bonienl (author), posted June 9, 2015:

When you do a periodic check, say every day, then using the -D 1d option can speed up the process considerably, as it will make the list of files to verify a lot shorter.
tr0910, posted June 14, 2015:

Quote (bonienl): "When you do a periodic check, say every day, then using the -D 1d option can speed up the process considerably, as it will make the list of files to verify a lot shorter."

But doesn't your new code make it only look at files that have changed anyway? I tried it as follows:

bunker -U -l -D 1w /mnt/disk2

I am getting an error:

/boot/config/bunker: line 152: -1: substring expression < 0
date: invalid date `@'
bonienl (author), posted June 14, 2015:

The -D option acts as a filter in the find command. When it is used there are fewer files to process, resulting in less execution time. You are right that the same result is obtained without the option, but it may take longer to execute.

Strange, I copied and pasted your command above and it runs fine. The error says that the value '1w' after the -D option is somehow incorrect, but it isn't ...
sincero, posted July 8, 2015:

I'm running unRAID 5. I ran:

time bunker -U -md5 /mnt/user/Media/

and it finished in about 3 minutes and 40 seconds.

I don't know if I have the sha256 package or MD5, really. Is there anything I need to do before running that? It seemed a little too quick. I don't see a .log file in temp, either.

Additional question: how does this handle deleted files with -U / exporting?
bonienl (author), posted July 8, 2015:

When using the -u or -U command, it will only update those files which have previously been added by the -a command. This means that any files which don't have a checksum value in their extended attributes are skipped. If you want to ensure all files have a checksum, then run the -a command first.

Mismatches can be logged in the syslog (-l option) or in a designated file (-f option).

The program will abort execution immediately with an error message when the sha256 or md5 executables are not present.

If you want to see missing files, then you need to make use of the export (-e) and check (-c) commands. This allows you to store the checksums in an external file, which can be checked later for any mismatching or missing files. You need version 1.6 to do this.
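Conceptually, checking an external checksum file for mismatching or missing files involves something like the sketch below. It is not bunker's own code: check_export is a made-up name, the "hash|path" line format is assumed from the import example earlier in the thread, and SHA256 via the sha256sum tool is assumed.

```shell
# check_export EXPORTFILE  (hypothetical helper, not bunker's real code)
# EXPORTFILE holds one "hash|path" entry per line.
# Prints missing and mismatching files; returns 1 if any were found.
check_export() {
  status=0
  while IFS='|' read -r want path; do
    [ -n "$path" ] || continue                      # skip malformed lines
    if [ ! -f "$path" ]; then
      echo "MISSING  $path"
      status=1
    else
      have=$(sha256sum < "$path" | cut -d' ' -f1)   # current hash of the file
      if [ "$have" != "$want" ]; then
        echo "MISMATCH $path"
        status=1
      fi
    fi
  done < "$1"
  return $status
}
```

A file that was deleted after the export shows up as MISSING, which is something the extended attributes alone cannot reveal, since they disappear together with the file.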
sincero, posted July 8, 2015:

How does one add files that have not been indexed yet, then? Do I need to run -a over the entire mount again, then follow up with -U?

Sorry... just one more question. How will this protect against corruption? I'm trying to come to terms with this situation:

1) I put a file on my drive, call it A, and get the checksum
2) A is corrupted silently
3) I run the checksum again and it's marked as "changed" / failed to verify.

But what if:

1) I put a file on my drive, call it A, and get the checksum
2) I run a checksum
3) I change the file
4) It gets corrupted
5) I run verify and it's marked as changed, but I knew I changed it.

How does that work?
BRiT, posted July 8, 2015:

None of these tools "prevent corruption"; they detect it by comparing against a checksum that was known at some moment in time. If you change the file but fail to update the checksum, it will always report as not matching. It's as simple as that.
bonienl (author), posted July 8, 2015:

Quote (sincero): "How does one add files that have not been indexed yet, then? Do I need to run -a over the entire mount again? Then follow up with -U?"

You only need to run the -a command to add new files; it will skip any files which already have a checksum.

Quote (sincero): "How will this prevent against corruptions? [...] I run verify / it's marked as changed but I knew I changed it. How does that work?"

Using the verify (-v) command will tell you which files have a changed checksum. This could be because the file itself has changed or because some corruption occurred. The program can't tell the difference, only you can. So you have to decide whether to (a) mark the file as bad or (b) update its checksum.
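One rough way to help with that decision is the same comparison the -U command is based on: if the file's modification time is later than the time the hash was taken, the mismatch is probably an intentional change; if not, it deserves a closer look. The sketch below is only a heuristic and not part of bunker. classify is a made-up name, and the stored hash and scan time are passed in as arguments because the thread does not say how bunker stores them. It relies on GNU stat. Note that it cannot catch the second scenario above (a file that was edited and then corrupted), since that file would simply be reported as modified.

```shell
# classify FILE STORED_HASH SCAN_EPOCH  (hypothetical helper, not part of bunker)
# Prints "ok", "modified" or "suspect" for a single file.
classify() {
  have=$(sha256sum < "$1" | cut -d' ' -f1)       # current SHA256 of the file
  if [ "$have" = "$2" ]; then
    echo ok
  elif [ "$(stat -c %Y "$1")" -gt "$3" ]; then
    echo modified     # written after the scan: probably an intentional change
  else
    echo suspect      # content differs but mtime does not: possible corruption
  fi
}
```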
sincero, posted July 8, 2015:

Thanks a lot! That's a shame for some of my more frequently changing files. I'll have to think about it; most of the media should be fine, though.
bonienl (author), posted July 8, 2015:

You can use the -u command in combination with the -D <time> option. This allows you to update only files which have been modified in the last <time> period. E.g.

bunker -u -D 1h /mnt/user/files

will update those files modified in the last hour.

Note: the files must have initially been added using the -a command.