Dynamix File Integrity plugin


bonienl


Good that you reported this before my 10 bug fixes and improvements got through the test phase and were pushed to the public. If you had told us this one day later, I would have thought that I broke it.

 

You can ignore these; they just tell you that you did not build a hash sum for your new or changed files.
Your disk should get a yellow circle on the file integrity page; if not, you may need to wait for the bug fixes.
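
If you want to check whether a single file already has a hash, you can inspect its extended attributes directly; a minimal sketch, assuming getfattr is available and using a placeholder path (the exact attribute names depend on your configured hash method):

# dump the user-namespace extended attributes of one file; the plugin
# stores one attribute per hash method plus its bookkeeping data
getfattr -d /mnt/disk1/share/somefile.bin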

 

But don't stay too long on the file integrity status page; it has a bug which schedules an infinite number of full file system indexing tasks.

 

These are the fixes and improvements which are coming soon (see the change log in the next post):

 

If you wonder why so much is in the works: I started using the plugin myself, was unsatisfied with the performance, and put 50 hours of development and testing into it.

Edited by Falcosc
  • Thanks 1
Link to comment

The release got delayed a bit because the maintainer is busy with Unraid 6.10 preparations.

 

Does anyone have questions about the change log before the release?
- added blake3 hash support, with 2-6 times the hash rate of blake2 (reduces CPU load up to 4 times at the same data rate)
- improved build, verify and check speed for all hash methods on small files
- fixed full file system find commands for build status checks while watching the control page
- fixed starting of multiple build status checks while watching the control page
- added monitoring of background verify processes and all manual bunker script executions
- fixed rare process status detection bug "Operation aborted"
- fixed file name truncation if key is broken
- made inline help for the disk table and commands more accessible
- fixed multi-language support for buttons
- added watching of error and warning messages of running or finished processes
- added disk status icon for running build status check

Edited by Falcosc
  • Like 2
Link to comment
13 hours ago, Falcosc said:

Does anyone have questions about the change log before the release?

Last time I ran an integrity check, I got a log full of files with "BLAKE2 hash key corruption", and they were all files that had been deleted prior to the check. Do any of the new commits remedy this behavior?

Link to comment

Corruption is not reported for non-existent files during a check, neither before nor after this change.

 

But to use the check button, you need to export the hashes first.

- made inline help for the disk table and commands more accessible

 

If you deleted the wrong files because you had to guess them from truncated file names, then it is related to a rare b2sum issue that leads to empty hashes:

- fixed file name truncation if key is broken

 

 

Link to comment

On 8/5/2021 at 7:04 PM, Falcosc said:

It's cool that software-calculated blake3 is even faster than the current Intel SHA extensions :D That shows just how efficient blake3 really is.

 

Does blake3 execute single-threaded or multi-threaded, and will it bring a significant performance improvement for big files? I really hope blake3 is a breakthrough in performance, so it will increase efficiency for some workloads. In the past, hashing only ran in a single thread or used the SSE2 instruction set (?), so the improvement was small even if you changed the CPU.

 

Currently, I get ~725MB/s hash throughput for big files.

 

Hash times for a 23GB file, FYR:

 

time md5sum 7074E568EB95CAC4312C6B20C88C56D5.rar
7074e568eb95cac4312c6b20c88c56d5  7074E568EB95CAC4312C6B20C88C56D5.rar

real    0m31.835s
user    0m28.471s
sys     0m3.364s

time b2sum 7074E568EB95CAC4312C6B20C88C56D5.rar
124237f1c72aa51da3a561674ded10263aacfa91227551c1a291a272b78fa04e6827902ba6cf4e07304d78f3ea4ef1ff8c8160647e7fec87e4cbe376adc13783  7074E568EB95CAC4312C6B20C88C56D5.rar

real    0m26.089s
user    0m23.392s
sys     0m2.698s

time sha256sum 7074E568EB95CAC4312C6B20C88C56D5.rar
b8313452b2173c43951b4ed58efa48679eec62548db9289732f8946fda52c45e  7074E568EB95CAC4312C6B20C88C56D5.rar

real    1m28.502s
user    1m25.675s
sys     0m2.827s

time sha1sum 7074E568EB95CAC4312C6B20C88C56D5.rar
f914e2324be4a9c355b62acfd8572fa1c2371ac6  7074E568EB95CAC4312C6B20C88C56D5.rar

real    0m35.882s
user    0m33.131s
sys     0m2.751s
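
The same comparison can be scripted as a single loop; a minimal sketch, assuming the tools above are installed and reusing the same file:

# time each hash tool on the same file, one after another
for h in md5sum b2sum sha256sum sha1sum; do
    echo "== $h =="
    time $h 7074E568EB95CAC4312C6B20C88C56D5.rar
done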

 


Link to comment

It depends on your CPU. Without AVX it uses SSE4.1 and is only 2 times faster than blake2. But with AVX it is just crazy:

[speed.svg: blake3 throughput comparison chart]

 

1 thread: 6GB/s.

Because disk speeds beyond 10-60GB/s are really uncommon even in the far future of Unraid, I forced b3sum to run in single-thread mode, and I forced it to not use mmap.

 

mmap is an expensive syscall that maps your file into userspace memory. Users already complained that mmap reduces performance on small files too much, because the syscall takes much longer than hashing the whole file. For that reason, recent versions of the b3sum CLI skip mmap on files smaller than 16KB.
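
You can reproduce the small-file effect yourself; a minimal sketch, assuming a b3sum build old enough to still mmap small files, with b3sum on the PATH and a placeholder directory:

# hash a directory full of small files with and without mmap;
# without the 16KB cutoff, the --no-mmap run should be faster
time find /mnt/disk1/manysmallfiles -type f -exec b3sum {} +
time find /mnt/disk1/manysmallfiles -type f -exec b3sum --no-mmap {} +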

 

Multithreading has overhead, too. So disabling multithreading gives even better small-file performance.

 

And if you use single threading, you don't benefit from mmap: you don't need to map your file into memory because you read it sequentially anyway.

If you have multiple disks in your array, you don't need multithreading at all. So for the Unraid use case, it doesn't even make sense to make the b3sum parameters configurable in the UI.
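
In the Unraid case, the parallelism comes from hashing disks concurrently rather than from threads; a minimal sketch of that idea, with placeholder disk and output paths and b3sum assumed to be on the PATH:

# one single-threaded, no-mmap b3sum per array disk, all disks in parallel
for d in /mnt/disk1 /mnt/disk2 /mnt/disk3; do
    find "$d" -type f -exec b3sum --num-threads=1 --no-mmap {} + \
        > "/tmp/$(basename "$d").b3" &
done
wait    # each disk is hashed concurrently, each on its own single thread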

 

I temporarily have a 2-thread G4400T in use, which is already faster at blake3 than my array, despite being nearly the worst-case CPU for Unraid.

 

On some CPUs with the Intel SHA extensions, single-threaded blake3 is even 2 times faster than hardware-accelerated SHA:
https://github.com/xoofx/Blake3.NET#results-with-sha-cpu-extensions

 

Running multithreaded is just unnecessary computing overhead; nobody has storage fast enough to benefit from 10 times the speed of hardware-accelerated SHA: https://news.ycombinator.com/item?id=22236507

 

But you can just test it yourself:

wget https://github.com/BLAKE3-team/BLAKE3/releases/download/1.0.0/b3sum_linux_x64_bin
chmod +x b3sum_linux_x64_bin
time ./b3sum_linux_x64_bin largefile
time ./b3sum_linux_x64_bin --num-threads=1 --no-mmap largefile

 

Edited by Falcosc
Link to comment
16 minutes ago, Falcosc said:

Remember, with AVX-512 you can calculate 8 64-bit integers in one go. So basically, 8 times the speed on a single thread. That's just cheating if correctly implemented ;)


That is more than enough if we can practically get several GB/s in a single thread with AVX support. My 9800X supports AVX-512, so let's have some fun.

Edited by Vr2Io
Link to comment

This ridiculous speed is the reason why the addition of blake3 is more about reducing CPU load than about raw throughput.

 

But I managed to get a 2-3 times speed improvement by reducing the number of expensive CLI calls per file in the build, check and verification processes. I avoided putting numbers into the change log because the script speed improvement depends only on the file count.

With an average of 2.8MB per file (180,000 files at 500GB), I went from 65MB/s to 170MB/s with every hash method.
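
The plugin scripts themselves aren't shown here, but the underlying technique is simply to batch many files into one hasher invocation instead of spawning one CLI process per file; a minimal sketch of the difference, with placeholder paths:

# slow: one process spawn per file
find /mnt/disk1/share -type f -exec b2sum {} \;

# faster: batch as many files as possible into each b2sum call
find /mnt/disk1/share -type f -print0 | xargs -0 b2sum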

Link to comment
5 hours ago, Falcosc said:

But you can just test it yourself:

wget https://github.com/BLAKE3-team/BLAKE3/releases/download/1.0.0/b3sum_linux_x64_bin
chmod +x b3sum_linux_x64_bin
time ./b3sum_linux_x64_bin largefile
time ./b3sum_linux_x64_bin --num-threads=1 --no-mmap largefile

 

 

The result is just crazy fast on the same 23GB file, excellent 👍 even though I have set PL2 (the long-term power limit) on the 9800X.

 

time b3sum_linux_x64_bin 7074E568EB95CAC4312C6B20C88C56D5.rar
904b1aaf320bb5d8856cdb23c6db7b74b8f2104a34dd6894aa51f5d9200de226  7074E568EB95CAC4312C6B20C88C56D5.rar

real    0m1.034s
user    0m7.953s
sys     0m1.774s

time b3sum_linux_x64_bin --num-threads=1 --no-mmap 7074E568EB95CAC4312C6B20C88C56D5.rar
904b1aaf320bb5d8856cdb23c6db7b74b8f2104a34dd6894aa51f5d9200de226  7074E568EB95CAC4312C6B20C88C56D5.rar

real    0m5.801s
user    0m3.150s
sys     0m2.650s

 

Edited by Vr2Io
Link to comment

I don't know how well hash calculations should scale, but spending more than twice the CPU time on multithreaded blake3 (8s multi vs. 3s single) sounds a bit too wasteful.

I mean, it is good that they at least offer this possibility in case somebody needs the 23GB/s throughput, even if it is less efficient. But this just confirmed my feeling that we should keep it single-threaded to improve efficiency.

 

Maybe memory access time counts as user time? Because perfect scaling should result in 60GB/s, which would already be the limit of your memory performance. We need more memory channels and DDR5 🤓

 

On huge files it would be most efficient to run single-threaded but with mmap, because on your 23GB no-mmap run you spend an additional 900ms of sys time on all the little file access calls. But I don't want b3sum to pop up in the memory usage graph as a big resource hog and create wrong perceptions of this wonderful tool, so I would sacrifice the sys time saving for the invisible memory footprint of no-mmap.

 

Edited by Falcosc
Link to comment

 

55 minutes ago, Falcosc said:

On huge files it would be most efficient to run single-threaded but with mmap, because on your 23GB no-mmap run you spend an additional 900ms of sys time on all the little file access calls. But I don't want b3sum to pop up in the memory usage graph as a big resource hog and create wrong perceptions of this wonderful tool, so I would sacrifice the sys time saving for the invisible memory footprint of no-mmap.

 

Agreed, we should balance the pros and cons. Thanks again.

 

 

55 minutes ago, Falcosc said:

Maybe memory access time counts as user time?

 

Actually it completed in ~1s of real time; maybe the work was spread across multiple threads, so the user time got multiplied by the thread count.

Edited by Vr2Io
Link to comment

Yes, but it should have finished in 0.4s real time and 3.5s user time if it scaled better (user time is the sum of all thread times).

Because the single-threaded run finished in 3s user time, and 3s of work spread perfectly over 8 threads is roughly 0.4s of real time.

So I thought maybe it is not a scaling issue but a memory speed issue, because for a 60GB/s hashing rate (23GB in 0.4s) you may need 80-160GB/s of memory bandwidth, which your platform doesn't have.

But at these insane speeds, many things could be limiting the scalability of the multithreaded execution. Your CPU is just too fast for blake3 ;)

Edited by Falcosc
  • Like 1
Link to comment

Each hash method gets its own extended file attribute, so a build is enough if you want to keep your blake2 keys.

 

But because the last scan date and the file stats are shared, and we don't have a button to clean just the last scan date, it would be better if you remove all attributes.

 

Just wanted to make clear that you don't have to. On the other hand, the shared scan date isn't tested very well, because changing the hash method is an uncommon use case.
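
If you do want to remove all attributes, they can be stripped per file; a minimal sketch, where the attribute name user.blake2 is an assumption, so check the getfattr -d output of a hashed file for the names the plugin actually uses:

# strip one hash attribute from every file on a disk; files that never
# had the attribute just report an error, which is suppressed here
find /mnt/disk1 -type f -exec setfattr -x user.blake2 {} \; 2>/dev/null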

Link to comment

I'm getting entries like this one:

 

SHA256 hash key has unexpected length, /mnt/disk2/Time Machine/iMac.sparsebundle/bands/2cea

 

How do I fix the length? Could it be that, because I used a different hashing method before, it's not actually checking against the correct hash type?

How do I fix it? Should I completely rebuild to be on the safe side? Hopefully not, given only two entries (so far).

Also, maybe I missed it, but is there any proper advice on how to handle Nextcloud paths? Nextcloud seems to love modifying stuff without changing the modified date & time values, which really messes with the plugin.

 

PS: Really loving the new Integrity tab UI, with the current process displayed along with its progress; overall, I feel like I finally understand the plugin better. A much-welcomed change! Thank you so much for this invaluable tool!

Edited by Glassed Silver
Added PS part
Link to comment

You need to remove your SHA256 export file.


In my last post I said that you can keep your SHA hashes in your extended file attributes, but you cannot use the "Check Export" button if you have configured blake3 and still have SHA hashes in your export folder.

 

If you want to keep your SHA exports, just rename them so they don't get picked up by the "Check Export" button.
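
A rename sketch; the flash device is mounted at /boot on Unraid, and the export file name below is an example, so check the folder for the real names:

# park the old SHA256 export under a name the "Check Export" button ignores
cd /boot/config/plugins/dynamix.file.integrity/export
mv disk1.export.hash disk1.export.hash.sha256.bak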

Edited by Falcosc
Link to comment
19 hours ago, Falcosc said:

If you want to keep your SHA exports, just rename them so they don't get picked up by the "Check Export" button.

 

I have made an update which includes the hashing method in the name of the export file.

 

E.g. disk1.export.blake3.hash

or   disk1.export.sha256.hash

 

To work with the new names, the user has to manually rename the existing export files; they are located on the flash device in /config/plugins/dynamix.file.integrity/export

 

Or run a new "Export" action for all disks to create files with the new names.

 

This allows users to keep export files for different hashing methods, though only one hashing method is active/used at a time.

Hashing results are stored in the extended attributes of a file, and a file may have different hashing results associated with it.

 

  • Like 1
Link to comment
