bunker - yet another utility for file integrity checks

MortenSchmidt · April 27, 2015

Are there other differences I've missed??

Yes, Blake2 support in Bunker. Much less CPU intensive. A must if you want to check several disks at the same time, or if you want Plex transcoding while using it.

tr0910 · June 7, 2015

I have a text file from SHA256 for Windows that needs to be translated into Linux format for a repetitive Win->unRaid workflow.

Image files get created in Windows, have their SHA256 hashes calculated, and are rsynced to unRaid via cwrsync along with the hash text file. There the hashes are verified, and bunker then puts them into the extended attributes. The first problem is Windows EOL vs Unix EOL. Note the difference between what we have from Windows and what we need for bunker import:

0c50530b7c6e9ce822aadd70a6d69af2620dc311f40d018c2e247017a08547e7  C:\Users\Images\Image Compare\IMG_2456.JPG
bec92581255ff043d576f3bd4f2dc530b41c75662390d14b34480dd2a718b8fd  C:\Users\Images\Image Compare\IMG_2456A.JPG

unRaid bunker needs:

0c50530b7c6e9ce822aadd70a6d69af2620dc311f40d018c2e247017a08547e7|/mnt/disk1/downloads/IMG_2456.JPG
bec92581255ff043d576f3bd4f2dc530b41c75662390d14b34480dd2a718b8fd|/mnt/disk1/downloads/IMG_2456A.JPG

What is the best way to take a Windows text file and translate it into an unRaid bunker friendly format?

bonienl · June 7, 2015

You can use the linux sed command to do a conversion:

sed 's!  *C:\\Users\\Images\\Image Compare!|/mnt/disk1/downloads!;s!\\!/!g;s!\r!!' windows_file >linux_file

The above line does the following:

1. Convert the windows folder C:\Users\Images\Image Compare to the linux directory /mnt/disk1/downloads (including the | character in the front)

2. Convert backslashes (\) into forward slashes (/)

3. Remove the carriage return (\r) character at the end of the line

windows_file = name of the source windows file

linux_file = name of the destination linux file (suitable for bunker)

Adapt the names of the folders and directories as needed.
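
As a quick check, the sketch below runs the three substitutions on a two-line sample built from the hashes in the question. The working directory and the file names windows_hashes.txt and bunker_hashes.txt are placeholders, the pattern accepts one or more spaces between hash and path, and \r in the pattern assumes GNU sed.

```shell
#!/bin/sh
# Build a small Windows-style sample: CRLF line endings, backslash paths.
# Directory and file names are placeholders for illustration only.
work=$(mktemp -d)
printf '%s  %s\r\n' \
  '0c50530b7c6e9ce822aadd70a6d69af2620dc311f40d018c2e247017a08547e7' \
  'C:\Users\Images\Image Compare\IMG_2456.JPG' \
  'bec92581255ff043d576f3bd4f2dc530b41c75662390d14b34480dd2a718b8fd' \
  'C:\Users\Images\Image Compare\IMG_2456A.JPG' > "$work/windows_hashes.txt"

# 1. swap the Windows folder (and the spaces before it) for "|" + Linux dir
# 2. turn the remaining backslashes into forward slashes
# 3. strip the carriage return (\r is understood by GNU sed)
sed 's!  *C:\\Users\\Images\\Image Compare!|/mnt/disk1/downloads!;s!\\!/!g;s!\r!!' \
  "$work/windows_hashes.txt" > "$work/bunker_hashes.txt"

cat "$work/bunker_hashes.txt"
```

Each output line should have the hash|/path shape shown in the question, with no trailing carriage return.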

tr0910 · June 7, 2015

Thanks, will start my sed learning with your quick boost up. "Sed for Dummies" here we come....

One more question. I have one 3tb drive with documents whose contents change daily. It would be something like your source code drive for Dynamix. Each day I want to scan the drive for only the files where the file date is newer than the hash date, and update the hashes for only these few files.

 "bunker -u" seems to want to take 24 hours for this task.

Is it calculating all the hashes over again? Is there a different way I should be doing this?

  "bunker -a" will check a 3tb drive for new files in only a few minutes.

bonienl · June 7, 2015

The -u option will recalculate all files, while the -a option skips existing hashes and only calculates new files.

BRiT · June 7, 2015

I assume that 'new' is files without hashes regardless of the file timestamps?

Is there an option to recalculate hashes for files that have a modified timestamp or last written timestamp after the hash calculated timestamp?

bonienl · June 7, 2015

I assume that 'new' is files without hashes regardless of the file timestamps?

Correct

Is there an option to recalculate hashes for files that have a modified timestamp or last written timestamp after the hash calculated timestamp?

See the -d option

tr0910 · June 7, 2015

But what about file saved / updated date. Can that be compared??

 -d <days>   optional only verify/update/remove for files scanned <days> or longer ago

I only want to update the hashes for files that have been updated since the last hash calculation.

For example, a script file is opened in notepad++, changed, and resaved under the same name. The hash for this file is no longer valid and needs to be updated. This file can be found quickly if you compare the file change date with the latest hash date. Then you can calculate a new hash for only this one file.

@bonienl (your sed script works like a charm)

bonienl · June 8, 2015

This is not possible with bunker. A workaround is to write your own wrapper which compares file dates and sizes and calls bunker only for those files which have changed.

I'll think about an implementation in bunker itself too, but can't give timelines.

tr0910 · June 8, 2015

Maybe I don't have the workflow properly understood. How would you handle this situation?

(It just took 23 hours to bunker -u a 3tb drive with only 10 updates. Bunker -a is done in a few minutes to add 10 files)

WeeboTech · June 8, 2015

There's gotta be a method to do this simply.

If mtime > scandate, assume file has changed and recalculate hash.

As far as a custom local wrapper, there is the find command with -newer.

Set some semaphore file time with touch.

Do a find /mnt/disk? -newer semaphorefile -print > filelist

Touch the semaphore file after you are done.

The only issue with this method is the possibility to miss a file.

So it has to be more involved with a touch -r semaphorefile semaphorefile.tmp, then touch the actual semaphore file immediately.

After that use the tmp semaphore file with the newer option, then remove it.

However with this method, if you kill the job, the semaphore file has already been updated.

It's much more involved than using the scandate and mtime comparison (which would be much less error prone when done correctly).
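
The semaphore steps above can be sketched roughly as follows. This is only an illustration: the function name changed_since_last_scan, the stamp file and the demo directory are made up, and the resulting list would still have to be handed to whatever recalculates the hashes.

```shell
#!/bin/sh
# Sketch of the semaphore-file approach. Function and file names are
# placeholders, not part of bunker.

# Print files under $1 modified since the previous run; $2 is the semaphore file.
changed_since_last_scan() {
    dir=$1 stamp=$2
    # First run: no semaphore yet, so date it far in the past to catch everything.
    [ -e "$stamp" ] || touch -t 200001010000 "$stamp"
    # Keep the old timestamp in a temp file, then stamp "now" right away so
    # files modified while the scan runs are picked up next time.
    touch -r "$stamp" "$stamp.tmp"
    touch "$stamp"
    find "$dir" -type f -newer "$stamp.tmp" -print
    rm -f "$stamp.tmp"
}

# Demo on a throwaway directory.
demo=$(mktemp -d)
mkdir "$demo/data"
echo one > "$demo/data/a.txt"
changed_since_last_scan "$demo/data" "$demo/stamp"   # first run lists a.txt
changed_since_last_scan "$demo/data" "$demo/stamp"   # nothing changed since: lists nothing
```

As noted above, the semaphore is already updated if the job is killed halfway, so an interrupted run can leave changed files unlisted until they are modified again.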

bonienl · June 8, 2015

Ok, I made an update to bunker and released version 1.5

In this version there is a new command

-U = same as the -u command, but only updates files which are newer than the scandate of the hash; in other words, files which have changed after the hash was calculated.

There is also a new option

-D <time> = only include files which are newer than the specified time. The time is expressed in seconds (s), minutes (m), hours (h), days (d) or weeks (w). E.g. -D 1w means files created/modified in the last week.

For example you can do the following:

bunker -U /mnt/user/data - this will update only the files in /mnt/user/data which have been changed since the last hash calculation

bunker -U -D 4h /mnt/user/data - the same as above but search is limited to files which are modified in the last four hours

See OT to download the new version, and let me know if it is useful.

WeeboTech · June 8, 2015

Nice job.

bonienl · June 8, 2015

Yeah, it wasn't that complicated after all ...

tr0910 · June 8, 2015

Thanks bonienl. Just threw it at 2 servers both with one 3 tb drive to check.

Server 1 Finished. Verified 11 files Skipped 197499 files Found 9 mismatches.

Server 2 Finished. Verified 2 files Skipped 209471 files Found 0 mismatches.

Run time about 2hr 15min each server, one with a Xeon 1220 and the other with an AMD Phenom X6 1055T.

bonienl · June 9, 2015

When you do a periodic check, say every day, then using the -D 1d option can speed up the process considerably, as it will make the list of files to verify a lot shorter.

tr0910 · June 14, 2015

But doesn't your new code make it only look at files that have changed anyway? I tried it as follows.

bunker -U -l -D 1w /mnt/disk2

I am getting an error

/boot/config/bunker: line 152: -1: substring expression < 0

date: invalid date `@'

bonienl · June 14, 2015

The -D option acts as a filter in the find command. When it is used there are fewer files to process, resulting in a shorter execution time. You are right that the same result is obtained without the option, but it may take longer to execute.
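
To illustrate why a time filter shortens the run, here is a small sketch using find -mmin on a throwaway directory. Whether bunker builds its file list with this exact predicate is an assumption; the point is only that files outside the time window never reach the hashing step. The directory and file names are made up, and touch -d with a relative date assumes GNU touch.

```shell
#!/bin/sh
# Illustration only: a time filter in find, similar in spirit to -D 4h.
# The directory and file names are made up for the demo.
demo=$(mktemp -d)
echo old > "$demo/old.txt"
echo new > "$demo/new.txt"
touch -d '2 days ago' "$demo/old.txt"    # GNU touch: backdate the mtime

# Without a filter every file is a candidate.
all=$(find "$demo" -type f | wc -l)
# With the filter only files modified in the last 240 minutes (4h) remain.
recent=$(find "$demo" -type f -mmin -240)

echo "candidates without filter: $all"
echo "candidates with filter:    $recent"
```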

Strange, I copied/pasted your command above and it runs fine. The error says that the value '1w' after the -D option is somehow incorrect, but it isn't ...

sincero · July 8, 2015

I'm running unraid 5. I ran...

time bunker -U -md5 /mnt/user/Media/

and it finished in about 3 minutes and 40s. I don't know if I have the sha256 package or MD5, really. Is there anything I need to do before running that? It seemed a little too quick. I don't see a .log file in temp, either.

Additional question: How does this handle deleted files with -U / exporting?

bonienl · July 8, 2015

When using the -u or -U command it will only update those files which have been previously added by the -a command. This means that any files which don't have a checksum value in their extended attributes are skipped. If you want to ensure all files have a checksum then run the -a command first.

Mismatches can be logged in the syslog (-l option) or in a designated file (-f option).

The program will abort execution immediately with an error message when sha256 or md5 executables are not present.

If you want to see missing files then you need to make use of the export (-e) and check (-c) commands. This allows you to store the checksums in an external file, which can be checked later for any mismatching or missing files. You need version 1.6 to do this.
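
The idea behind export and check can be sketched with plain sha256sum from coreutils rather than bunker itself; bunker's own file format and exact invocation are not shown here. A checksum list kept outside the files can reveal a file that has disappeared, because the list still names it. All names below are placeholders.

```shell
#!/bin/sh
# Concept sketch with coreutils sha256sum, not bunker. Names are placeholders.
demo=$(mktemp -d)
cd "$demo" || exit 1
echo alpha > a.txt
echo beta  > b.txt

# "Export": store the checksums in an external file.
sha256sum a.txt b.txt > checksums.sha256

# Simulate one deleted file and one altered file.
rm a.txt
echo gamma > b.txt

# "Check": every listed file is re-hashed; missing and mismatching files
# are both reported. LC_ALL=C keeps the messages in English.
report=$(LC_ALL=C sha256sum -c checksums.sha256 2>&1) || true
echo "$report"
```

In the report the deleted file shows up as an open or read failure and the altered file as a plain checksum failure, so the two cases can be told apart.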

sincero · July 8, 2015

How does one add files that have not been indexed yet, then? Do I need to run -a over the entire mount again? Then follow up with -U?

Sorry... just one more question. How will this prevent against corruptions? I'm trying to come to terms with this situation:

1) I put a file on my drive, call it A, get the checksum

2) A is corrupted silently

3) I run the checksum again and it's marked as "changed" / failed to verify. But...

What if

1) I put a file on my drive, call it A, get the checksum

2) I run a checksum

3) I change the file

4) It gets corrupted

5) I run verify / it's marked as changed but I knew I changed it.

How does that work?

BRiT · July 8, 2015

None of these tools "prevent corruption"; they detect it by comparing against a known checksum from some moment in time.

If you change the file but fail to update the checksum it will always report as non-matching. It's as simple as that.

bonienl · July 8, 2015

How does one add files that have not been indexed yet, then? Do I need to run -a over the entire mount again? Then follow up with -U?

You only need to run the -a command to add new files. It will skip any files which already have a checksum.

Sorry... just one more question. How will this prevent against corruptions? I'm trying to come to terms with this situation:

1) I put a file on my drive, call it A, get the checksum

2) A is corrupted silently

3) I run the checksum again and it's marked as "changed" / failed to verify. But...

What if

1) I put a file on my drive, call it A, get the checksum

2) I run a checksum

3) I change the file

4) It gets corrupted

5) I run verify / it's marked as changed but I knew I changed it.

How does that work?

Using the verify (-v) command will tell you which files have a changed checksum. This could be because the file itself has changed or because some corruption occurred. The program can't tell the difference, only you can. So you have to decide whether to (a) mark the file as bad or (b) update its checksum.

sincero · July 8, 2015

Thanks a lot! That's a shame for some of my more frequently changing files. I'll have to think about it, most of the media should be fine, though.

bonienl · July 8, 2015

You can use the -u command in combination with the -D <time> option. This allows you to update only files which have been modified in the last <time> period. E.g.

bunker -u -D 1h /mnt/user/files

Will update those files modified in the last hour.

Note: files must have initially been added using the -a command
