bunker - yet another utility for file integrity checks


Thanks a lot! That's a shame for some of my more frequently changing files. I'll have to think about it, most of the media should be fine, though.

 

You can use the -u command in combination with the -D <time> option; this lets you update only the files that have been modified within the last <time> period. E.g.:

 

bunker -u -D 1h /mnt/user/files

 

Will update those files modified in the last hour.

 

Note: files must have initially been added using the -a command

 

My worry was more along these lines: say I changed 50 documents in the last week. Don't I now need to manually verify that each one is OK before updating its checksum? I can't just let it auto-update, because a file might have gone bad since I edited it; it would then get the "bad" checksum and I might not know until it's too late.

Link to comment

Well, you have to work out a protection strategy for yourself, and this really depends on how you are using the system.

 

A possible scenario in your case (and you may want to automate this in cron):

 

1. At regular intervals, e.g. once a day run the -a command to add any new files

2. Perform the -u -D <time> command to update files after your own modifications

3. Run the -v command say every week to detect any corruption
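The three steps above could be wrapped in a small script and scheduled, roughly as sketched below. Only the -a, -u -D <time> and -v commands come from this thread; the function name run_step, the BUNKER variable, the default 1h window and the crontab times are assumptions made for illustration. Since -u accepts any mismatch as the new truth, the sketch leaves the update step to be run by hand after known edits rather than from cron.

```shell
#!/bin/sh
# Sketch only: a thin wrapper around the three bunker commands discussed above.
# BUNKER can be overridden (e.g. for a dry run); it defaults to "bunker".
BUNKER="${BUNKER:-bunker}"

# run_step <add|update|verify> <path> [window]
run_step() {
    step="$1"
    target="$2"
    window="${3:-1h}"   # assumed default; the thread uses values like 1h, 2h, 4h
    case "$step" in
        add)    "$BUNKER" -a "$target" ;;               # step 1: add new files
        update) "$BUNKER" -u -D "$window" "$target" ;;  # step 2: refresh recently edited files
        verify) "$BUNKER" -v "$target" ;;               # step 3: report mismatches
        *)      echo "unknown step: $step" >&2; return 1 ;;
    esac
}

# Hypothetical crontab entries (times are arbitrary):
#   0 3 * * *   bunker -a /mnt/user/files     # daily: add new files
#   0 4 * * 0   bunker -v /mnt/user/files     # weekly: verify
# Run the update step manually, right after your own edits:
#   run_step update /mnt/user/files 2h
```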

 

Link to comment

Have you had any thoughts of turning this into a plugin that would integrate with the v6 web GUI and give you visual options to schedule bunker to update files and add new files? It could also let you set up periodic verification checks via the GUI.

 

Bunker really started as a side project where I needed some protection 'scheme' for myself. I did not have the intention to build a GUI front-end. It looks like only a select few people are using it.

 

Link to comment

No worries, thought I would inquire. I've still been using Corz, but my end goal would be to have a scheduled periodic check installed on unRAID. Bunker seems like it will meet those requirements of adding new files, updating modified files, and checking for corruption. I wonder if only a select few are using it because the rest have no idea they should have a way to verify data...

Link to comment

 

Well you have to make a strategy for yourself how to protect and this really depends on how you are using the system.

 

A possible scenario in your case (and you may want to automate this in cron):

 

1. At regular intervals, e.g. once a day run the -a command to add any new files

2. Perform the -u -D <time> command to update files after your own modifications

3. Run the -v command say every week to detect any corruption

 

I have a question on step #2. Does it only update the checksum if the file date has been modified, or does it update whenever the checksum has changed? If it is based only on the modified date, then hypothetically, if corruption occurred, the modified date would stay the same and we would find out once we ran the -v command. If it is based on a changed checksum, would the only way to tell whether a file is corrupt be to open it?

Link to comment

-D <time> acts as a filter, which means that only files modified within the period specified by <time> will be processed. When combined with the -u command, these files get their checksum updated if a mismatch is found (e.g. because the file content has changed).

 

Link to comment

OK, I think I understand, but to clarify: if I use -u, it will update the stored checksum of any file whose checksum has changed, regardless of whether the file is corrupt, correct?

Link to comment

I think he means that it uses the file timestamp to decide whether a file has been modified since the time specified in -D.

 

If it was based instead on whether the checksum had changed then it would have to actually do the checksum on every file to know whether it had changed or not.

 

Link to comment

OK, I think I understand, but to clarify: if I use -u, it will update the stored checksum of any file whose checksum has changed, regardless of whether the file is corrupt, correct?

 

Yes, any mismatch will get corrected. Bunker cannot know whether a mismatch comes from an intended change or from corruption.

 

Link to comment

I think he means that it uses the file timestamp to decide whether a file has been modified since the time specified in -D.

 

If it was based instead on whether the checksum had changed then it would have to actually do the checksum on every file to know whether it had changed or not.

 

Not quite. The -u (update) command recalculates the checksum and updates the stored value for any file whose stored checksum differs.

 

Using the -D option will limit the scope of files which are going to be processed.

 

Say you have been working on several files for the past 2 hours and you KNOW their content has changed, so a new checksum needs to be calculated for them. Issuing the command "bunker -u -D 2h /path/to/files" then ensures that only those files get updated.

 

Alternatively you can do "bunker -u /specific/file/name" and update each file individually.
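To picture what the -D filter selects, the idea is close to a time-limited find. The sketch below is only an illustration of filtering by modification time; it is not bunker's code, and the function name recently_modified is made up for this example.

```shell
#!/bin/sh
# Illustration only, not bunker's implementation: list regular files under a
# directory whose modification time falls within the last N minutes.
# recently_modified <dir> <minutes>
recently_modified() {
    dir="$1"
    minutes="$2"
    # -mmin -N matches files modified less than N minutes ago
    find "$dir" -type f -mmin "-$minutes"
}
```

For example, `recently_modified /path/to/files 120` lists roughly the set of files that a 2h window would hand to the update step. Files outside the window are never looked at, so their stored checksums stay untouched.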

 

Link to comment

Yes, any mismatch will get corrected. Bunker can not know whether this mismatch is because of an intended or a corrupted file change.

 

Understood  ;D Next question: is it possible to store the date the checksum was created in the file's extended attributes, and then give bunker a flag that compares that date against the file's modified date? I believe this is how Corz tells whether a file was changed by the user or has become corrupted, although Corz keeps that information in a separate file; perhaps it is not possible to store it in the extended attributes?

Link to comment

Not quite. The -u (update) command will recalculate the checksum and updates any file which has a different checksum stored.

 

Using the -D option will limit the scope of files which are going to be processed.

 

Say for the past 2 hours you have been working on several files and you KNOW their content has changed, so subsequently a new checksum needs to be calculated for those files, then issuing the command "bunker -u -D 2h /path/to/files" ensures that only those files get updated.

It uses the timestamp to limit the scope or not? I think we are talking past each other.
Link to comment

Sorry, I misunderstood you. Indeed, it uses the timestamp (the file's modified date stamp) to include or exclude files.

 

Link to comment

I believe using -u -D 4h will find all the files that have been modified in the last 4 hours and recalculate the checksums of all those files. I'm assuming that it will recalculate the checksums of everything modified in the last 4 hours, regardless of whether the checksum is different or not?

Link to comment

Understood  ;D Next question, is it possible to add the date the checksum was created into the extended attributes of the file and then have bunker flag to compare the date the checksum was created vs modified file date? I believe this is how Corz keeps track of if the file has been changed by the user or if corruption has occurred but Corz keeps that information in a separate file, which may not be possible to store that information in the extended attributes file?

 

The scan date is stored together with the checksum. However, I didn't write a function to compare the stored scan date against the file's modified date. An interesting idea though!

 

Link to comment

Awesome! That would certainly be a useful flag to include! That is really the only thing stopping me from jumping ship from Corz. Any chance you may look into that in the future?

Link to comment

I believe using -u -D 4h will find all the files that have been modified in the last 4 hours and recalculate the checksums of all those files. I'm assuming that it will recalculate all the checksums in the last 4 hours regardless if the checksum is different or not?

 

Correct.

Link to comment

Maybe I don't understand the question, but how could it know if the checksum is different without recalculating the checksum?
Link to comment

Hmm, that is interesting. If you look at the OP:

-u          update mismatched hash keys with correct hash key attribute (may use -f)

So, if I am understanding that flag correctly, -u checks whether the checksum is mismatched and recalculates it if it has changed? The flaw with this method is that it doesn't distinguish between user changes and corruption.

 

EDIT: unless "updating mismatched keys" really means it just recalculates everything?

Link to comment

When comparing for a mismatch, what is it going to compare against if not the calculation?
Link to comment

There is a difference between -v (verify) and -u (update). If the intention is to find files with possible corruption, then -v must be used. This will list all files which have a mismatch between the checksum stored in the extended attributes and the recalculated checksum. Next, YOU have to decide which files have an expected mismatch, because they were changed, and which ones are unexpected (possible corruption).

 

The -u command implicitly assumes that files with a different checksum (again, comparing the stored checksum with the recalculated one) need to be updated. This means the recalculated checksum is written to the extended attributes, overwriting the previous value.
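The difference between the two commands can be sketched as follows. This is a stand-in, not bunker's code: bunker keeps its checksums in extended attributes, whereas this sketch uses a ".sum" sidecar file and sha256sum purely so it runs without xattr tools. The function names are invented for the example.

```shell
#!/bin/sh
# Sketch only. A ".sum" sidecar file stands in for the extended attribute
# and sha256sum stands in for whatever hash bunker is configured to use.

checksum_of() { sha256sum "$1" | cut -d ' ' -f 1; }

# Like -a: record the current checksum for a file.
add_file() { checksum_of "$1" > "$1.sum"; }

# Like -v: report a mismatch, never change the stored value.
verify_file() {
    if [ "$(checksum_of "$1")" = "$(cat "$1.sum")" ]; then
        echo "OK $1"
    else
        echo "MISMATCH $1"
        return 1
    fi
}

# Like -u: on mismatch, overwrite the stored value with the new checksum,
# whether the change was intended or not.
update_file() {
    new="$(checksum_of "$1")"
    if [ "$new" != "$(cat "$1.sum")" ]; then
        echo "$new" > "$1.sum"
        echo "UPDATED $1"
    fi
}
```

Note that once update_file has run, a later verify_file reports OK again, which is exactly why an update should only be applied to files you know you changed.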

 

Several options exist to shorten the search list, resulting in faster execution (fewer files need to be recalculated), and these options may be combined with the above commands.

 

Hope this makes sense  ;D

Link to comment

I think what some are hoping here though is that if you are storing a copy of the file modification date, then there is an opportunity to determine with some accuracy whether a checksum mismatch is a user modification or a corruption.  If you detect a timestamp mismatch, then you can report "File was modified, updating checksum".  If the timestamp matches but the checksum does not, then you can report "Probable file corruption".
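As a rough sketch of that decision rule (a hypothetical helper, not an existing bunker feature), assuming the stored and current checksums and modification times are already available as plain values:

```shell
#!/bin/sh
# Hypothetical helper illustrating the proposal above; bunker does not
# provide this. Arguments:
#   $1 stored checksum   $2 current checksum
#   $3 stored mtime      $4 current mtime  (seconds since the epoch)
classify_file() {
    stored_sum="$1"
    current_sum="$2"
    stored_mtime="$3"
    current_mtime="$4"

    if [ "$stored_sum" = "$current_sum" ]; then
        echo "OK"
    elif [ "$stored_mtime" != "$current_mtime" ]; then
        # content and timestamp both changed: most likely a user edit
        echo "File was modified, updating checksum"
    else
        # content changed but timestamp did not: suspicious
        echo "Probable file corruption"
    fi
}
```

On a system with GNU coreutils, the current modification time could be read with `stat -c %Y <file>`. The result is still only a probability: anything that rewrites a file and restores its timestamp would be reported as corruption, and corruption that coincides with a real edit would pass as a modification.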

Link to comment

Hit the nail on the head I think! :)

 

+1

Link to comment
