Dynamix File Integrity plugin


bonienl

Recommended Posts

I thought about making the export files hash-specific myself, but after discovering all the places where the names are used I did not dare to change the export file names. @bonienl, did you forget some of them?

 

I remember there were multiple places where these filenames are used: export file creation, export file checks, and the export file status. And I remember we have an additional hash file definition in the exportrotate script and another one in the watcher script.

 

At least for the export status, you can find the error here: https://github.com/bergware/dynamix/blob/master/source/file-integrity/include/ProgressInfo.php#L57
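For anyone who wants to track these definitions down themselves, a recursive grep over the plugin sources is one way to list every script that references the export file names. This is only a sketch: the plugin path and the search terms are assumptions, not a confirmed layout.

# hypothetical plugin location; adjust to wherever the plugin is actually installed
grep -rn --include='*.php' --include='*.sh' -i 'export' /usr/local/emhttp/plugins/dynamix.file.integrity/ | grep -i 'hash'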

Link to comment
11 minutes ago, bastl said:

Pressing the update button gives me the following error.

 

You caught me while upgrading the plugin. Didn't expect that.

There is now version 24a to rectify the installation error.

 

Hmm, something is not going right ... checking!

Edited by bonienl
Link to comment
5 hours ago, bonienl said:

Please install latest version 24e. Thx

Thanks for the update.

 

There is a minor bug. I have applied a few "exclude folder" and "exclude file" settings, and during export it always reports 2 skipped files in the syslog.

 

Finished - exported 148699 files, skipped 2 files. Duration: 00:00:45

 

Below are the attributes of the two files:

 

getfattr -d *.md5

# file: HASSIO.md5
user.DOSATTRIB=0sAAAEAAQAAAARAAAAIAAAAAAAAAAAAAAAQDLrLcJl1wE=

# file: Hassio snapshots.md5
user.DOSATTRIB=0sAAAEAAQAAAARAAAAIAAAAAAAAAAAAAAA649GN8Jl1wE=

 

I have tried renaming HASSIO.md5 to HASSIO.md to verify whether the .md5 extension causes the problem, but it still shows those two files as skipped and generates the error log.

 

Lastly, I removed the remaining attribute from HASSIO.md5, and FIP then reported 1 skipped file in the result. So it seems an extra attribute on a file is wrongly counted as a skip.

 

Finished - exported 148699 files, skipped 1 file. Duration: 00:00:45
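For reference, a stray extended attribute like this can be removed with setfattr from the attr package; this is just a sketch using the file name from the getfattr output above.

getfattr -d 'HASSIO.md5'                          # list the user.* attributes on the file
setfattr -x user.DOSATTRIB 'HASSIO.md5'           # remove the attribute that was not set by FIP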

 

Edited by Vr2Io
Link to comment
8 minutes ago, bonienl said:

If you have updated the list of excluded folders or files, you need to run the "Clean" command on your disks to remove an existing hash. This will prevent skipped files.

 

 

I have executed "Clean", but those attributes were not generated by FIP, so it won't clear them (as expected). The attribute name was "user.DOSATTRIB".

 

Thanks

Edited by Vr2Io
Link to comment

Short report about something weird that happened to me. The scheduled verify task for 2 disks started this morning; one disk finished and a corruption was reported on disk 2, while the verify task on disk 1 was still running. The corrupted file in my case was the nextcloud.log file, which was getting flooded. I cleared the log file (verify on disk 1 still running), selected disk 2, started an export task, and the following happened in the UI:

 

[screenshot: the UI showing the full file list being scanned on disk 1]

 

The UI shows what looks like the full list of files being scanned on disk 1. Refreshing the page doesn't help. The counter for processed files is still counting up, and constant read speeds on disk 1 show that the process still appears to be working. Only the page is messed up.

Link to comment
26 minutes ago, bastl said:

Short report about something weird that happened to me. The scheduled verify task for 2 disks started this morning; one disk finished and a corruption was reported on disk 2, while the verify task on disk 1 was still running. The corrupted file in my case was the nextcloud.log file, which was getting flooded. I cleared the log file (verify on disk 1 still running), selected disk 2, started an export task, and the following happened in the UI:

 

[screenshot: the UI showing the full file list being scanned on disk 1]

 

The UI shows what looks like the full list of files being scanned on disk 1. Refreshing the page doesn't help. The counter for processed files is still counting up, and constant read speeds on disk 1 show that the process still appears to be working. Only the page is messed up.

Please share your /tmp/*disk2* files and /var/tmp/*disk2* files. I did add errors to them, and it looks like there is at least one layout function which does not ignore this new content.

Is a disk2 process running? You should not see any disk2 information if no disk2 process is running. After a process is finished, a reload will remove your disk2 progress info.

Edited by Falcosc
Link to comment

@Falcosc I have no /tmp/disk2* files in that folder

 

[screenshot: listing of /tmp showing no disk2 files]

 

and I can't really share the /var/tmp/disk2* file list with you. I have a lot of personal and work content on that disk, some files named with my real name or the name of the company I work for. I can try to remove the personal stuff if it helps.
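If sharing the file is still an option, the path components can be redacted before posting. A rough sketch, assuming the status file is plain text; the exact file name under /var/tmp is an assumption here:

ls -la /tmp/*disk2* /var/tmp/*disk2* 2>/dev/null
# replace everything after /mnt/disk2/ with a placeholder before sharing
sed 's|/mnt/disk2/[^[:space:]]*|/mnt/disk2/<redacted>|g' /var/tmp/disk2.tmp > disk2.redacted.txt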

 

14 minutes ago, Falcosc said:

Is a disk2 process running?

I guess not. On the Main tab I only see reads happening on disk 1, where the verify process is still active. It looks like the export for disk2 is already finished. The timestamp for disk2 is shortly after the time I started the new export.

[screenshot: Main tab showing read activity on disk 1 and the export timestamp for disk2]
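For reference, one quick way to confirm whether anything is still hashing disk2 in the background is to check the process list. "bunker" is the helper script mentioned in the changelog further down; the pattern here is only a guess at its process name.

ps aux | grep -i '[b]unker'        # the [b] keeps grep from matching its own process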

 

Link to comment

@Falcosc Maybe it helps to note that I am from Germany and some of the file names include the German letters Ää Öö Üü and ß, for example:

 

*/mnt/disk2/company/tmp/Präsentation/industrie-medium.png

 

I also have a folder where backups from my Android phone are placed, some with names that include an @, for example:

 

*/mnt/disk2/nextcloud/user/files_versions/.SeedVaultAndroidBackup/1678426041580/kv/@pm@/Y2ibnQudnVXN1cn9tLZleQ.v1614280421
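If unusual characters in file names are a suspect, a quick way to enumerate such paths is sketched below; the disk path is taken from the examples above, the rest is just an illustration.

find /mnt/disk2 -print | LC_ALL=C grep '[^ -~]'    # paths containing bytes outside printable ASCII (umlauts, ß, ...)
find /mnt/disk2 -type d -name '*@*'                # directories whose names contain an @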

 

Link to comment

@Falcosc The verify on disk 1 is finished. The File Integrity control page still shows the files from disk2, like in the screenshot of my first post.

 

I have a lot of *.tmp.end files in /var/tmp, including a disk2.tmp.end. Some are from August, some from today. I renamed the disk2.tmp.end, refreshed the page, and the disk2 export showed "canceled with error". I refreshed the page again and the error was gone. I started a new export of disk2 with no errors. I'm kind of confused right now.
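For anyone hitting the same stale status, the marker files can be inspected and set aside without deleting anything; a small sketch based on the file names mentioned above:

ls -la /var/tmp/*.tmp.end                               # check which markers are left over and how old they are
mv /var/tmp/disk2.tmp.end /var/tmp/disk2.tmp.end.bak    # move the stale marker aside instead of deleting it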

Link to comment

Actually you can't see how fast blake3 is, because a single thread can handle 4-6 GB/s, which is way beyond spinning disk performance.
So we configured it to run single-threaded to be more efficient.
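A rough way to see this for yourself with the official b3sum CLI; the file path is just a placeholder:

time b3sum --num-threads 1 /mnt/disk1/some_large_file   # single thread, as the plugin is configured
time b3sum /mnt/disk1/some_large_file                    # default: all cores, which a spinning disk cannot feed anyway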

 

The main goal of introducing blake3 was to reduce CPU load.

 

I did benchmark memory and thread settings to get the best small file performance.

 

The biggest issue was the number of syscalls in the old script, but we had other issues which slowed down spinning disk IO as well.

 

- added blake3 hash support, with a 2-6 times higher hash rate than blake2
- reduced CPU load by up to 4 times at the same data rate when using blake3
- improved build, verify and check speed for all hash methods on small files
- fixed stacking find commands that prevented clean shutdowns while watching the control page
- fixed starting multiple build status checks while watching the control page
- added monitoring of background verify processes and of all manual bunker script executions
- fixed rare process status detection bug "Operation aborted"
- fixed file name truncation if a key is broken
- made inline help for the disk table and commands more accessible
- fixed multi-language support for buttons
- added watching of error and warning messages of running or finished processes
- added disk status icon for a running build status check

Edited by Falcosc
Link to comment

Now that it has settled down, I'm seeing one thread per disk when it's running through my larger files. My disks are very old 3TB drives that struggle to hit above 140 MB/s for most of the disk surface. I like what you've done there with the plugin. Usually developers throw the whole system at every file, and then it just sits there at 50 MB/s with 24 cores trying to shatter the platter and kill the head. I always seem to have to throttle these things or tune them to my setup, but you guys did it right by default. If you're hashing an SSD, the more cores the better, but I'm rarely hashing my SSDs.

 

Link to comment

It was a bit tricky to remove the bottleneck because it was already tuned to process sequentially with one thread per disk.

 

The issue was around expensive binary calls with a lot of syscalls, like file stat and get file attribute, plus unnecessary subshells for each file, and so on.
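As an illustration of the kind of per-file overhead meant here (not the actual bunker code, and the user.blake3 attribute name is an assumption):

# slow pattern: a command substitution (subshell) and a getfattr process for every single file
for f in /mnt/disk1/some_share/*; do
  stored=$(getfattr --absolute-names --only-values -n user.blake3 "$f" 2>/dev/null)
  [ -n "$stored" ] || echo "no stored hash: $f"
done
# cheaper pattern: one hashing process works through the whole batch
b3sum --num-threads 1 /mnt/disk1/some_share/*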

 

After fixing these, I found all the other issues, like unwanted parallel IO caused by monitoring.

 

I also experimented with offloading the still-needed but expensive syscalls into asynchronous subshells. But XFS isn't good enough at handling asynchronous file metadata operations without hurting spinning disk IO too much. If the XFS metadata were completely cached, you would reach nearly peak disk read performance on small files (I did tests without metadata, which achieved this).
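A stripped-down sketch of that experiment, reusing the hypothetical attribute name from above: the metadata reads are pushed into background subshells so the main loop can keep the sequential read stream going, which is exactly the kind of access pattern XFS metadata handling ends up disturbing on a spinning disk.

for f in /mnt/disk1/some_share/*; do
  ( getfattr --absolute-names --only-values -n user.blake3 "$f" >/dev/null 2>&1 ) &   # offloaded metadata syscall
done
wait   # collect the background subshells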

 

But the export check skips all of this because it doesn't need to read or write file attributes if the exported hash matches. For that reason, this process is the fastest one. It basically passes the exported hash list directly as a CLI argument and only starts custom error handling if the CLI response stream contains verification errors.
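Sketched with the blake3 CLI, assuming a hypothetical export file name, that fast path looks roughly like this: the exported hash list goes straight to the checker, and only lines reporting a mismatch are post-processed.

b3sum --check disk2.export.hash | grep -v ': OK$'   # keep only the lines that report a failed verification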

Edited by Falcosc
Link to comment