Dynamix File Integrity plugin


bonienl

Recommended Posts

Thank you. I think it may have something to do with my QDirStat docker. It only seems to happen with that docker running.

 

After a reboot and disabling that docker, my scans seem to be working.

 

What would you recommend as my next steps to find out why that docker is giving me OOM errors (or whether something else is)?

Link to comment

It's happening again without QDirStat running. So far, Disk 2 is the only disk on which files have been found after multiple scan attempts on all disks. I know there are tons of files on the other disks that are not being found for some reason.

Link to comment

I just had the plugin run on schedule for the first time. I got a notification that it started but I can't seem to find any form of progress indicator anywhere. Am I missing something?

 

Edit: Now I got a notification that it finished and a log of files with "BLAKE2 hash key corruption" but they're all files that were deleted already. Is that intended behavior?

Edited by KnifeFed
Link to comment
  • 2 weeks later...
  • 1 month later...

I would like to ask the question from December 2020 again.

What about BLAKE3 integration?

 

https://github.com/BLAKE3-team/BLAKE3/releases/tag/1.0.0

 

Is anybody working on that, or should I create a pull request for that?

By now, enough people should have CPUs with AVX2 support to use it.

 

Performance could be even better than hardware-accelerated SHA https://www.phoronix.com/forums/forum/software/programming-compilers/1175724-blake3-cryptographic-hashing-function-sees-experimental-vulkan-implementation?p=1176083#post1176083 which is only supported on the latest CPUs https://en.wikipedia.org/wiki/Intel_SHA_extensions

Quote

If all that's correct, I'd interpret these figures to mean that on the Ryzen 5 3400G, single-threaded BLAKE3 is slightly faster than hardware-accelerated SHA-256.

 

Edited by Falcosc
Link to comment
On 6/24/2021 at 3:47 AM, sunbear said:

Just FYI, the find command created by this plugin is hanging and preventing my array from stopping. It gives me an unclean shutdown every time I reboot so I had to uninstall it.

There are 2 find commands:

  1. one is executed once a day
  2. one is executed every 2 minutes to monitor active drives while you watch the File Integrity Control page

I had to fine-tune the 2nd one because it was hurting my check performance.

I may be able to resolve your issue as well. Could you try to describe when these hangs occurred, so I can remove them?

Link to comment
On 6/9/2021 at 6:29 PM, Supershocker said:

MD5 hash key mismatch, /stats/combnk.html is corrupted
MD5 hash key mismatch, box/fixedpoint/ui/nedVisuals/fxpHistogram/release/images/toolstrip/liveeditor/left-align-24.png is corrupted
MD5 hash key mismatch, box/matlab/datatypes/duration/+matlab/+internal/+duration/getDetectionFormats.m is corrupted
MD5 hash key mismatch (updated), /mnt/disk1/system/docker/docker.img was modified
MD5 hash key mismatch (updated), /mnt/disk1/system/libvirt/libvirt.img was modified

 

I got the above output from an automated check. I understand that docker and libvirt normally change, but I don't understand the corruptions. I can't figure out what these files are or how to resolve this.

Depending on how you use it you can get false positives. I got 2 as well but was able to fix them.

If you are still interested, you can get in touch with me.

Link to comment
On 4/22/2021 at 11:06 PM, touz said:

Did you ever find out why file names/paths are truncated? It also happens to me, and it can be very annoying to find the exact file which is "supposed" to be corrupted.

I can fix this issue. During which action did it happen?

Link to comment
On 12/17/2020 at 8:51 AM, Fizz said:

My scheduled verify job reported "bunker verify command Found 1 file with BLAKE2 hash key corruption" but doesn't tell me what file is corrupted. I had error logging set to syslog in settings, but I don't see the file listed there either. How can I see what file is causing the problem?

 

Here is the relevant part of syslog:


bunker: error: BLAKE2 hash key mismatch,  is corrupted
bunker: Verify task for disk1 finished, duration: 12 hr, 2 min, 0 sec.
bunker: verified 455797 files from /mnt/disk1. Found: 0 mismatches, 1 corruption. Duration: 12:02:00. Average speed: 98.3 MB/s

 

 

On 12/20/2020 at 8:24 PM, Fizz said:

I posted about a similar problem above. For me, syslog does not say what the corrupted file is.

 

On 12/28/2020 at 3:59 PM, CS01-HS said:

Was this ever resolved? I'm seeing it now.

 

On 1/6/2021 at 8:33 PM, vakilando said:

same here:

two of the corrupt files are not mentioned and I don't know what/where the third one is


Jan  6 03:31:59  bunker: error: BLAKE2 hash key mismatch,  is corrupted
Jan  6 03:51:41  bunker: error: BLAKE2 hash key mismatch,  is corrupted
Jan  6 03:58:50  bunker: error: BLAKE2 hash key mismatch, ng/fs.json is corrupted
Jan  6 04:02:59  bunker: Verify task for disk1 finished, duration: 1 hr, 2 min, 58 sec.
Jan  6 04:02:59  bunker: verified 198104 files from /mnt/disk1. Found: 0 mismatches, 3 corruptions. Duration: 01:02:58. Average speed: 36.1 MB/s

 

 

On 2/13/2021 at 1:06 PM, mani3321 said:

Is there a solution regarding false positives?

 

On 3/1/2021 at 10:23 AM, servidude said:

I'm seeing similar notifications about 3 supposedly corrupted files, but no names!


BLAKE2 hash key mismatch,  is corrupted
BLAKE2 hash key mismatch,  is corrupted
BLAKE2 hash key mismatch,  is corrupted

 

The corresponding syslog entries are also missing file info, for example:


Feb 28 18:17:44 Tower bunker: error: BLAKE2 hash key mismatch, is corrupted

 

What is the cause of this (apparently I'm not the only one seeing it), and how can I fix it - is there a way to reset the file hashes?

 

 

On 4/22/2021 at 11:06 PM, touz said:

Did you ever find out why file names/paths are truncated? It also happens to me, and it can be very annoying to find the exact file which is "supposed" to be corrupted.

 

I found a possible issue: something can go wrong while parsing the response from getting the extended file attributes.

I can fix it if any of you are still around. You just need to execute the following commands to create something that can be checked manually:

path=/mnt/disk1
hash=blake2
find "$path" -type f -name "*" -exec getfattr -n user.$hash --absolute-names "{}" 2>/dev/null 1>/tmp/debugfiles_raw.txt +
sed -n "/^$/d;s/^# file: //;h;n;s/^user.$hash=\"//;s/\"$//;G;s/\n/ \*/;p" /tmp/debugfiles_raw.txt 1>/tmp/debugfiles_sed.txt

In /tmp/debugfiles_sed.txt you will find your truncated line, an empty line, or a line where the key is present but no file name follows it.
I would like to see your results so I can fix this.

I was unable to construct a broken file that makes this fail. Maybe there is an unexpected response from getfattr on stdout.
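To spot the malformed entries without eyeballing the whole file, here is a small grep sketch of mine. The expected line format "&lt;hex digest&gt; */absolute/path" is my assumption based on the sed output above, not something the plugin documents:

```shell
#!/bin/bash
# Sketch: print malformed entries in the sed debug output. Assumption:
# well-formed lines look like "<hex digest> */absolute/path"; anything
# else (empty line, missing hash, missing file name) gets printed.
scan_debug() {
    grep -vE '^[0-9a-fA-F]+ \*/.+' "$1"
}

# Example: scan_debug /tmp/debugfiles_sed.txt
```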

Edited by Falcosc
  • Thanks 1
Link to comment
12 hours ago, Falcosc said:

I found a possible issue: something can go wrong while parsing the response from getting the extended file attributes.

I can fix it if any of you are still around. You just need to execute the following commands to create something that can be checked manually:

path=/mnt/disk1
hash=blake2
find "$path" -type f -name "*" -exec getfattr -n user.$hash --absolute-names "{}" 2>/dev/null 1>/tmp/debugfiles_raw.txt +
sed -n "/^$/d;s/^# file: //;h;n;s/^user.$hash=\"//;s/\"$//;G;s/\n/ \*/;p" /tmp/debugfiles_raw.txt 1>/tmp/debugfiles_sed.txt

In /tmp/debugfiles_sed.txt you will find your truncated line, an empty line, or a line where the key is present but no file name follows it.
I would like to see your results so I can fix this.

I was unable to construct a broken file that makes this fail. Maybe there is an unexpected response from getfattr on stdout.

I appreciate this script for checking whether anything abnormal shows up in the extended attributes, although I haven't had this problem myself.

Edited by Vr2Io
Link to comment
1 hour ago, Vr2Io said:

No updates for a long time; the creator seems to have stopped development.

Community comes to the rescue!

 

4 hours ago, looop said:

will the app keep being updated?

This weekend some or all of my pull requests will get tested, and some time later they will be released.

 

I made these updates to speed up my own process, because the old tool can't handle small files:

Old version: Average speed: 62 MB/s

New version: Average speed: 167 MB/s

Edited by Falcosc
  • Like 3
Link to comment
1 hour ago, Falcosc said:

Community comes to the rescue!

Could you add a feature letting the user choose md5 / blake2 / blake3, and an option to generate multiple hashes in parallel? To minimize changes, verification could check only the hash selected in the settings.

Link to comment
20 minutes ago, Vr2Io said:

Could you add a feature letting the user choose md5 / blake2 / blake3, and an option to generate multiple hashes in parallel? To minimize changes, verification could check only the hash selected in the settings.

I thought about it, but it is disruptive.

Each hash has its own field:
user.md5
user.blake2
and so on
So for the hashes themselves we are already on track :)

 

But the following fields are shared:
user.filesize (file size at hash creation)
user.filedate (file date at hash creation)
user.scandate (last scan date)

So if you change the hash method, the plugin will get confused because the shared fields don't match the hash creation.

To solve it we would have to use this structure:
user.blake2="hash filesize filedate scandate"

I have tried this structure; it improves creation time a bit because it needs three fewer syscalls. But this change would invalidate existing attributes.

I will consider it for a later release, for example by adding a migration script to avoid slow and complex bash handling of backwards compatibility. For now, it is better to use only one hash method.
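The combined-value idea can be sketched as a pair of pack/unpack helpers. These function names and the exact field order are my assumptions, not the plugin's code; the real plugin would store the packed value with setfattr and read it back with getfattr:

```shell
#!/bin/bash
# Sketch: pack hash, filesize, filedate, and scandate into one string for a
# single user.blake2 attribute, so storing/reading it is one syscall each.
pack_attr() {
    # pack_attr HASH FILESIZE FILEDATE SCANDATE -> space-separated value
    printf '%s %s %s %s' "$1" "$2" "$3" "$4"
}
unpack_field() {
    # unpack_field VALUE INDEX -> Nth field (1=hash, 2=size, 3=date, 4=scan)
    printf '%s' "$1" | awk -v n="$2" '{print $n}'
}

# The real storage step would look roughly like (not executed here):
# setfattr -n user.blake2 -v "$(pack_attr "$h" "$s" "$d" "$t")" "$file"
```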

By the way, b2sum is a bit unstable: it returns no key in one out of 400.000 calls on my host. That's the reason why so many people have issues with truncated file names. If the key is missing, the file name gets truncated by the length of the missing key.

blake3 will fix this because it comes with a new CLI, b3sum, and I have already called b3sum 1.200.000 times without a single crash.
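As a stopgap while still on blake2, a defensive wrapper could retry whenever the digest comes back empty. This is a sketch of mine, not the plugin's actual code; the HASH_CMD variable is my addition so the wrapper can be tried with any coreutils hasher where b2sum is absent:

```shell
#!/bin/bash
# Sketch (not the plugin's code): retry hashing when the CLI returns an
# empty digest, as described for b2sum above. HASH_CMD is an assumption
# so this can be exercised with md5sum/sha256sum too.
hash_with_retry() {
    local file=$1 cmd=${HASH_CMD:-b2sum} tries=3 digest=""
    while [ "$tries" -gt 0 ]; do
        # the hasher prints "<digest>  <name>"; keep only the digest field
        digest=$("$cmd" "$file" 2>/dev/null | awk '{print $1}')
        if [ -n "$digest" ]; then
            printf '%s\n' "$digest"
            return 0
        fi
        tries=$((tries - 1))
    done
    echo "error: no digest for $file after retries" >&2
    return 1
}
```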

 

 

Edited by Falcosc
Link to comment
2 minutes ago, Falcosc said:

But the following fields are shared:
user.filesize (file size at hash creation)
user.filedate (file date at hash creation)
user.scandate (last scan date)

So if you change the hash method, the plugin will get confused because the shared fields don't match the hash creation.

To solve it we would have to use this structure:
user.blake2="hash filesize filedate scandate"

I missed this situation. According to the hash result file, there is no filesize / filedate / scandate info in it, so I thought FIP just checks whether the hash exists in the extended attributes and, if so, ignores the file.

 

10 minutes ago, Falcosc said:

By the way, b2sum is a bit unstable: it returns no key in one out of 500.000 calls on my host. That's the reason why so many people have issues with truncated file names. If the key is missing, the file name gets truncated by the length of the missing key.

I trigger FIP manually, and I'm not sure whether that's why I've had almost no trouble.

 

11 minutes ago, Falcosc said:

blake3 will fix this, because it comes with a new CLI, and I have already called b3sum 1.200.000 times without a single crash.

Nice.

Link to comment

I don't know why b2sum does not return a key in very rare cases, because I did not log stderr and the signal code of all 800.000 b2sum calls. But it happened twice on my host, and because b3sum is still without any calculation failure after 1.200.000 calls on 200.000 different files, I am confident that our b2sum binary is a bit unstable and may need an update.
Testing such a rare issue is complicated because my 1.200.000 hashes needed to be validated as well.

 

If anybody still wants to use the old blake2, we may have to check whether the old b2sum CLI has received any kind of fixes.

But blake3 outperforms blake2 anyway, so we won't bother with b2sum. And if you need tools without blake3 support, you should consider SHA-256 with hardware acceleration: https://en.wikipedia.org/wiki/Intel_SHA_extensions
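To pick a hash for your own hardware, a quick-and-dirty timing sketch can help. Tool availability (b3sum in particular) and the test file path are assumptions about your system; this is just a way to measure, not an endorsement of specific numbers:

```shell
#!/bin/bash
# Sketch: compare hash CLIs on a file of your choosing using bash's
# built-in `time` keyword (its report goes to stderr, so we merge it).
bench() {
    local tool=$1 file=$2
    printf '%s: ' "$tool"
    { time "$tool" "$file" >/dev/null; } 2>&1 | grep '^real'
}

# Example (paths/tools are placeholders):
# bench sha256sum /mnt/disk1/bigfile
# bench b3sum     /mnt/disk1/bigfile
```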

 

It's cool that software-calculated blake3 is even faster than the current Intel SHA extensions :D That shows just how efficient blake3 really is.

Edited by Falcosc
Link to comment
4 minutes ago, Falcosc said:

I don't know why b2sum does not return a key in very rare cases, because I did not log stderr and the signal code of all 800.000 b2sum calls. But it happened twice on my host, and because b3sum is still without any calculation failure after 1.200.000 calls on 200.000 different files, I am confident that our b2sum binary is a bit unstable and may need an update.
Testing such a rare issue is complicated because my 1.200.000 hashes need to be validated as well.

Sorry for my misunderstanding. I put all the small files on one disk, and it only has 148.700 files, so that's far fewer than in your test.

 


Link to comment

I use 500 GB of test data with 180.000 files, and I have already run through it about 20 times to get all these improvements tested and benchmarked.

 

I got 2 builds with 1 missing key each, and it was not the same file both times. So b2sum simply did not return a key, and I don't know under which conditions this happened. My test runs don't terminate on errors, so I was really unsure what was happening. But the user reports about truncated files prove that something is unstable.

 

Did you notice that all the truncated-file reports used blake2? Maybe it's just a coincidence, but maybe it's a clue.

Edited by Falcosc
  • Like 1
Link to comment
On 6/15/2021 at 12:28 AM, KnifeFed said:

I just had the plugin run on schedule for the first time. I got a notification that it started but I can't seem to find any form of progress indicator anywhere. Am I missing something?

This bothered me, too.
You can see it on the circle, but a progress bar is much better. So I added one:

At some point we lost our documentation; it was still available but kind of hidden. I added a help text toggle to the disk table and button bar to give new users an easier start:

On 12/7/2020 at 3:00 PM, CS01-HS said:

I recently changed Error Logging from Syslog Only to Syslog and Output file.

 


 

After a check I see the entries in syslog but can't find the "output file" - any idea where it's stored? (I saw a couple of posters ask but didn't see an answer.)

 


 

On 12/20/2020 at 4:50 PM, pman said:

I received an unraid alert for "Found 1 file with SHA256 hash key corruption". How do I tell which file is corrupt? No log file was created. My last dynamix log file is from 2019. Thanks.

 

EDIT: I ran this cmd to find it, but surely there has to be a better way? Why is it not logging? "cat /var/log/syslog | grep corrupt"

Which process did not appear under dynamix.file.integrity/logs?

 

For now, I added a small non-downloadable error popup that can be opened while a check is running, but it only appears if there is at least one skip/warning/error:

Link to comment
On 2/18/2021 at 5:33 PM, shEiD said:

I've just installed and started using the File Integrity plugin today.

  • I manually started a `Build` process on 7 (out of 28) drives in my array
  • disk1 had least amount of files, so it has already finished

But, the UI shows some nonsense 🤔

  • it shows disk1 as a circle, not green checkmark, even though it has just finished the build, and is up-to-date
  • it shows disks 4, 5, 9 and 10 with a green checkmark, even though the builds are clearly still running and aren't finished
  • it shows disks 7, 12, 17, 18, 19, 22, 23, 24, 26, 27 and 28 with a green checkmark, even though the build process has never been run on these disk...

I mean, WAT? 😳

Can it be that I'm really not understanding what the circle/checkmark/cross means?

 

Or is this a bug?

 


 

Not a bug, more a presentation issue. Let me describe what is happening:

 

  • it shows disk1 as a circle, not green checkmark, even though it has just finished the build, and is up-to-date
    • The problem is the "just finished" part: the find command that searches for untagged files is only executed every 2 minutes
    • this find is very expensive, so repeating it every 2 minutes is already too much for my taste
    • it is only executed every 2 minutes while you are looking at the page
    • If you look at the page, everything goes slower because the search for new files slows down spinning disks. I fixed that problem: avoid indexing of the filesystem while watching progress https://github.com/bergware/dynamix/pull/45
  • it shows disks 4, 5, 9 and 10 with a green checkmark, even though the builds are clearly still running and aren't finished
    • maybe the find command did not finish because your build process is slowing it down too much
  • it shows disks 7, 12, 17, 18, 19, 22, 23, 24, 26, 27 and 28 with a green checkmark, even though the build process has never been run on these disks...
    • Depending on how many files you have there, the check may still be running

Does this feature need to be improved?

  • Like 1
Link to comment
On 6/24/2021 at 3:47 AM, sunbear said:

Just FYI, the find command created by this plugin is hanging and preventing my array from stopping. It gives me an unclean shutdown every time I reboot so I had to uninstall it.

I just found your bug.

I wonder how anybody with more than 3 disks can actually watch the progress page without crashing the array performance :D

 

One issue was that the disk currently being checked was disrupted by a full filesystem indexing running at the same time. I fixed that at the beginning of the week, and I didn't have any more issues, because idle disks don't get checked every 2 minutes anyway.

 

But today I had 3 disks online, and I really suffered from bad array performance.

It turns out that the script with the find command doesn't check whether it is already running. That means if the find commands for all disks, run sequentially (one disk at a time), cannot finish a full filesystem scan within 2 minutes, a second execution gets started.

 

After some time, so many find commands stack on top of each other that the whole array nearly stops working.

 

Here is the fix https://github.com/bergware/dynamix/pull/45/commits/ebe6e770d1c46064064f9b3f4434dbc204b161e5
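The general pattern behind such a fix (a sketch under names of my own choosing, not the linked commit itself) is to serialize runs with flock so an overlapping invocation bails out instead of stacking:

```shell
#!/bin/bash
# Sketch: guard a periodic command with a lock file so a new run is
# skipped while the previous one is still working. Function and lock
# names are illustrative, not the plugin's.
guarded_run() {
    local lock=$1; shift
    (
        # Non-blocking: if another instance holds the lock, skip this run
        flock -n 9 || { echo "previous run still active, skipping" >&2; exit 0; }
        "$@"
    ) 9>"$lock"
}

# Example invocation (the find arguments are placeholders):
# guarded_run /var/run/fip-find.lock find /mnt/disk1 -type f
```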

Link to comment

I added disk status check icons to fix the confusing presentation of our disk status.

 

Maybe the disk status was broken: there was a regex that was unable to delete the build status if build was not the first entry.

 


 

It scans only one disk at a time and takes a while if you have many files. Disk 3 gets skipped because it still has work to do. I thought about parallel execution but rejected the idea because it would hurt array performance, so we keep it sequential.

Edited by Falcosc
  • Like 1
Link to comment

Latest Unraid (6.9.2), all plugins updated.

 

I'm not sure that I have this utility set up properly, or perhaps I just don't fully understand the outputs. After running through the various functions (Build, Export, Check), I saw a bunch of "bunker: error: no export of file: /mnt/..." errors in the syslog (attached). It seems like it's flagging these files? What should I do about these errors?

 

Thanks in advance!

hunternas-syslog-20210808-1534.zip

Link to comment
