bunker - yet another utility for file integrity checks



  • 1 month later...

I have a text file from SHA256 for Windows that needs to be translated for Linux as part of a repetitive Win->unRaid workflow.

 

Image files are created in Windows and have their SHA256 hashes calculated. They are then rsynced to unRaid via cwrsync along with the hash text file, the hashes are verified, and bunker puts the hashes into the extended attributes.  The first problem is Windows EOL vs Unix EOL.  Note the difference between what we have from Windows and what we need for the bunker import:

 

0c50530b7c6e9ce822aadd70a6d69af2620dc311f40d018c2e247017a08547e7  C:\Users\Images\Image Compare\IMG_2456.JPG
bec92581255ff043d576f3bd4f2dc530b41c75662390d14b34480dd2a718b8fd  C:\Users\Images\Image Compare\IMG_2456A.JPG

unRaid bunker needs:

 

0c50530b7c6e9ce822aadd70a6d69af2620dc311f40d018c2e247017a08547e7|/mnt/disk1/downloads/IMG_2456.JPG
bec92581255ff043d576f3bd4f2dc530b41c75662390d14b34480dd2a718b8fd|/mnt/disk1/downloads/IMG_2456A.JPG

 

What is the best way to take a Windows text file and translate it into an unRaid bunker-friendly format?

 

 

Link to comment

You can use the linux sed command to do a conversion:

 

sed 's! *C:\\Users\\Images\\Image Compare!|/mnt/disk1/downloads!;s!\\!/!g;s!\r$!!' windows_file >linux_file

 

The above line does the following:

1. Convert the windows folder C:\Users\Images\Image Compare to the linux directory /mnt/disk1/downloads (including the | character in the front)

2. Convert backslashes (\) into forward slashes (/)

3. Remove the carriage return (\r) character at the end of the line

 

windows_file = name of the source windows file

linux_file = name of the destination linux file (suitable for bunker)

 

Adapt the names of the folders and directories as needed.
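
As a quick check, the conversion can be tried on a two-line sample before running it on the real file. This is only a sketch: the temporary directory and the convert_hashes function name are made up for illustration, the " *" at the start of the first expression is an addition that absorbs however many spaces separate the hash from the path, and it assumes GNU sed (which understands \r).

```shell
#!/bin/bash
# Sample input with Windows (CRLF) line endings; hashes and paths are the ones from the question.
workdir=$(mktemp -d)
printf '%s  %s\r\n' \
  '0c50530b7c6e9ce822aadd70a6d69af2620dc311f40d018c2e247017a08547e7' 'C:\Users\Images\Image Compare\IMG_2456.JPG' \
  'bec92581255ff043d576f3bd4f2dc530b41c75662390d14b34480dd2a718b8fd' 'C:\Users\Images\Image Compare\IMG_2456A.JPG' \
  > "$workdir/windows_file"

# convert_hashes is a hypothetical helper name wrapping the sed call.
convert_hashes() {
  # 1. swap the Windows folder (and the spaces before it) for "|" plus the Linux directory
  # 2. turn the remaining backslashes into forward slashes
  # 3. drop the carriage return at the end of the line
  sed -e 's! *C:\\Users\\Images\\Image Compare!|/mnt/disk1/downloads!' \
      -e 's!\\!/!g' \
      -e 's!\r$!!' "$1"
}

convert_hashes "$workdir/windows_file" > "$workdir/linux_file"
cat "$workdir/linux_file"
```

If the two printed lines match the "unRaid bunker needs" sample from the question, the same expressions should be safe to run on the full file.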

 

 

Link to comment

Thanks, will start my sed learning with your quick boost up.  "Sed for Dummies" here we come....

 

One more question: I have one 3 TB drive with documents whose contents change daily, something like your source code drive for Dynamix.  Each day I want to scan the drive for only the files whose file date is newer than the hash date, and update the hashes for only those few files.

 

 "bunker -u" seems to want to take 24 hours for this task.  

 

Is it calculating all the hashes over again?  Is there a different way I should be doing this? 

 

  "bunker -a" will check a 3tb drive for new files in only a few minutes.

Link to comment

The -u option recalculates the hashes of all files, while the -a option skips files with existing hashes and only calculates hashes for new files.

 

I assume that 'new' is files without hashes regardless of the file timestamps?

 

Is there an option to recalculate hashes for files that have a modified timestamp or last written timestamp after the hash calculated timestamp?

Link to comment

I assume that 'new' is files without hashes regardless of the file timestamps?

 

Correct

 

Is there an option to recalculate hashes for files that have a modified timestamp or last written timestamp after the hash calculated timestamp?

 

See the -d option

 

Link to comment

But what about the file's saved / updated date?  Can that be compared?

 

 -d <days>   optional only verify/update/remove for files scanned <days> or longer ago

 

I only want to update the hashes for files that have been updated since the last hash calculation.

 

For example, a script file is opened in Notepad++, changed, and resaved under the same name.  The hash for this file is no longer valid and needs to be updated.  This file can be found quickly if you compare the file change date with the latest hash date.  Then you can calculate a new hash for only this one file.

 

@bonienl (your sed script works like a charm)

Link to comment


This is not possible with bunker. A workaround is to write your own wrapper which compares file dates and sizes and calls bunker only for those files which have changed.

 

I'll think about an implementation in bunker itself too, but can't give timelines.

 

Link to comment


There's gotta be a method to do this simply.

If mtime > scandate, assume file has changed and recalculate hash.

 

As far as a custom local wrapper, there is the find command with -newer.

 

Set some semaphore file time with touch.

Do a find /mnt/disk? -newer semaphorefile -print > filelist

 

Touch the semaphore file after you are done.

 

The only issue with this method is the possibility to miss a file.

So it has to be more involved with a touch -r semaphorefile semaphorefile.tmp, then touch the actual semaphore file immediately.

After that use the tmp semaphore file with the newer option, then remove it.

However with this method, if you kill the job, the semaphore file has already been updated.

It's much more involved than using the scandate and mtime comparison. (Which would be much less error prone when done correctly).
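
For anyone who wants to try the semaphore-file idea anyway, here is a rough sketch of a slight variation on the steps above: the new timestamp is written to a separate "next" file before the scan and only renamed over the real semaphore file once processing is done, so a killed job leaves the old timestamp in place. The names list_changed, stamp and stamp.next are made up for this example, and it runs against a scratch directory rather than /mnt/disk?.

```shell
#!/bin/bash
# list_changed DIR STAMP (hypothetical helper): print regular files under DIR
# whose modification time is newer than that of the file STAMP.
list_changed() {
  find "$1" -type f -newer "$2" -print
}

# Demo in a scratch directory so nothing on the array is touched.
demo=$(mktemp -d)
mkdir "$demo/data"
echo old > "$demo/data/old.txt"
touch -t 200001010000 "$demo/data/old.txt"   # last modified long before the previous run
touch -t 200101010000 "$demo/stamp"          # stands in for the time of the previous run
echo new > "$demo/data/new.txt"              # modified after the previous run

touch "$demo/stamp.next"                     # record the start of this run before scanning
list_changed "$demo/data" "$demo/stamp" > "$demo/filelist"
# ... process the files named in "$demo/filelist" here ...
mv "$demo/stamp.next" "$demo/stamp"          # commit the new time only after processing

cat "$demo/filelist"
```

A file modified while the scan is running ends up newer than the committed timestamp, so it is picked up again on the next run instead of being missed. The cost is that such a file may be processed twice.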

Link to comment

Ok, I made an update to bunker and released version 1.5

 

In this version there is a new command

 

-U = same as the -u command, but only updates files which are newer than the scandate of the hash. In other words, files which have changed after the hash was calculated.

 

There is also a new option

 

-D <time> = only include files which are newer than the specified time. The time is expressed in seconds (s), minutes (m), hours (h), days (d) or weeks (w). E.g. -D 1w means files created/modified in the last week.

 

For example you can do the following:

 

bunker -U /mnt/user/data  - this will update only the files in /mnt/user/data which have been changed since the last hash calculation

 

bunker -U -D 4h /mnt/user/data - the same as above but search is limited to files which are modified in the last four hours
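
To picture the rule behind -U (update only when the file is newer than the scandate), the comparison can be sketched in a few lines of shell. This is not bunker's code: needs_rehash is a hypothetical helper, the scan time is simply passed in as epoch seconds rather than read from wherever bunker stores it, and stat -c %Y assumes GNU stat.

```shell
#!/bin/bash
# needs_rehash FILE SCAN_EPOCH (hypothetical helper, not bunker code):
# returns success when FILE was modified after SCAN_EPOCH (seconds since the epoch).
needs_rehash() {
  local mtime
  mtime=$(stat -c %Y "$1") || return 2   # GNU stat: last modification time in epoch seconds
  [ "$mtime" -gt "$2" ]
}

demo=$(mktemp -d)
scan_epoch=1262304000                        # pretend the hashes were stored on 2010-01-01 UTC
echo v1 > "$demo/unchanged.txt"
touch -t 200001010000 "$demo/unchanged.txt"  # last modified before the scan
echo v2 > "$demo/edited.txt"                 # modified now, i.e. after the scan

for f in "$demo/unchanged.txt" "$demo/edited.txt"; do
  if needs_rehash "$f" "$scan_epoch"; then
    echo "rehash: ${f##*/}"
  else
    echo "skip: ${f##*/}"
  fi
done
```

Only edited.txt should be flagged, which mirrors the case of a script resaved under the same name after its hash was stored.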

 

See OT to download the new version, and let me know if it is useful.

 

 

Link to comment

Thanks bonienl.  Just threw it at 2 servers both with one 3 tb drive to check.

 

Server 1 Finished. Verified 11 files Skipped 197499 files Found 9 mismatches.

Server 2 Finished. Verified 2 files Skipped 209471 files Found 0 mismatches.

 

Run time was about 2 hr 15 min on each server, one with a Xeon 1220 and the other with an AMD Phenom X6 1055T.

Link to comment


When you do a periodic check, say every day, then using the -D 1d option can speed up the process considerably, as it will make the list of files to verify a lot shorter.
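
A daily job along those lines might look like the sketch below. The daily_bunker wrapper name and the BUNKER variable are made up for this example, the default path is the one that appears in an error message quoted in this thread, and running -a before -U is an assumption based on the advice that files must first be added with -a.

```shell
#!/bin/bash
# Location of the bunker script; override BUNKER if yours lives elsewhere.
BUNKER=${BUNKER:-/boot/config/bunker}

# daily_bunker DIR (hypothetical wrapper name): add hashes for files that have none,
# then refresh hashes of files changed in the last day, logging to the syslog.
daily_bunker() {
  if ! command -v "$BUNKER" >/dev/null 2>&1; then
    echo "bunker not found at $BUNKER" >&2
    return 1
  fi
  "$BUNKER" -a "$1"
  "$BUNKER" -U -l -D 1d "$1"
}

# Example invocation, e.g. from a daily cron job:
# daily_bunker /mnt/user/data
```

Keep in mind that with -D 1d a skipped day means files changed on that day fall outside the window, so an occasional run without -D is a sensible safety net.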

 

Link to comment

 


But doesn't your new code make it only look at files that have changed anyway?  I tried it as follows.

 

  bunker -U -l -D 1w  /mnt/disk2

 

I am getting an error

 

  /boot/config/bunker: line 152: -1: substring expression < 0

  date: invalid date `@'

 

Link to comment

 


The -D option acts as a filter in the find command. When it is used there are fewer files to 'process', resulting in a shorter execution time. You are right that the same result is obtained without the option, but it may take longer to execute.

 

Strange, I copied/pasted your command above and it runs fine. The error says that the value '1w' after the -D option is somehow incorrect, but it isn't ...

 

Link to comment
  • 4 weeks later...

I'm running unraid 5. I ran...

 

time bunker -U -md5 /mnt/user/Media/

 

and it finished in about 3 minutes and 40s. I don't know if I have the sha256 package or MD5, really. Is there anything I need to do before running that? It seemed a little too quick. I don't see a .log file in temp, either.

 

Additional question: How does this handle deleted files with -U / exporting?

Link to comment


When using the -u or -U command it will only update those files which have been previously added by the -a command. This means that any files which don't have a checksum value in their extended attributes are skipped. If you want to ensure all files have a checksum then run the -a command first.

 

Mismatches can be logged in the syslog (-l option) or in a designated file (-f option).

 

The program will abort execution immediately with an error message when sha256 or md5 executables are not present.

 

If you want to see missing files then you need to make use of the export (-e) and check (-c) commands. This allows you to store the checksums in an external file, which can be checked later for any mismatching or missing files. You need version 1.6 to do this.
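
The idea of an external checksum file can be illustrated with plain sha256sum. Note that this is not bunker's own -e / -c syntax, just the same principle shown with a standard tool on a scratch directory. The file names are made up for the example.

```shell
#!/bin/bash
# Illustration with plain sha256sum, NOT bunker's own -e/-c syntax.
demo=$(mktemp -d)
printf 'one\n'   > "$demo/a.txt"
printf 'two\n'   > "$demo/b.txt"
printf 'three\n' > "$demo/c.txt"

# "Export": store the checksums in an external file.
(cd "$demo" && sha256sum a.txt b.txt c.txt > checksums.txt)

printf 'edited\n' > "$demo/a.txt"   # simulate a changed (or corrupted) file
rm "$demo/b.txt"                    # simulate a deleted file

# "Check": compare the current files against the stored list.
report=$(cd "$demo" && LC_ALL=C sha256sum -c checksums.txt 2>/dev/null)
echo "$report"
```

In the report the edited file shows up as FAILED and the deleted one as "FAILED open or read". A missing file can only be noticed this way because the list lives outside the files themselves, whereas hashes kept in extended attributes disappear together with the file.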

 

Link to comment


How does one add files that have not been indexed yet, then? Do I need to run -a over the entire mount again? Then follow up with -U?

 

Sorry... just one more question. How will this prevent against corruptions? I'm trying to come to terms with this situation:

 

1) I put a file on my drive, call it A, get the checksum

2) A is corrupted silently

3) I run the checksum again and it's marked as "changed" / failed to verify. But...

 

What if

 

1) I put a file on my drive, call it A, get the checksum

2) I run a checksum

3) I change the file

4) It gets corrupted

5) I run verify / it's marked as changed but I knew I changed it.

 

How does that work?

Link to comment

None of these tools "prevent corruption", they detect it by comparing against a known checksum at some moment in time.

 

If you change the file but fail to update the checksum, it will always be reported as not matching. It's as simple as that.

Link to comment

How does one add files that have not been indexed yet, then? Do I need to run -a over the entire mount again? Then follow up with -U?

 

You only need to run the -a command to add new files; it will skip any files which already have a checksum.

 


Using the verify (-v) command will tell you which files have a checksum that no longer matches. This could be because the file itself has changed or because some corruption occurred. The program can't tell the difference, only you can. So you have to decide whether to (a) mark the file as bad or (b) update its checksum.

 

Link to comment

Thanks a lot! That's a shame for some of my more frequently changing files. I'll have to think about it, most of the media should be fine, though.

 

You can use the -u command in combination with the -D <time> option. This allows you to update only files which have been modified in the last <time> period, e.g.

 

bunker -u -D 1h /mnt/user/files

 

Will update those files modified in the last hour.

 

Note: files must have initially been added using the -a command

 

 

Link to comment
