ksarnelli Posted March 8, 2017

Just now, bonienl said: FIP has some logic to start concurrent "hashing" sessions depending on the number of available cores in the system. Perhaps I should make that a variable which can be adjusted through the GUI; it would allow people to experiment and see what works best on their system.

That would be a great addition, and I think I could definitely get around the issue by setting it to one or two. It would also be great to be able to limit concurrent hashing sessions per physical disk, though, as that is the real bottleneck in my system. My disk just couldn't keep up with all of the concurrent reads/writes, whereas if the files had been on multiple disks I think I would have been fine. Anyway, thanks for the response!
JorgeB Posted March 8, 2017

4 minutes ago, ksarnelli said: My disk just couldn't keep up with all of the concurrent reads/writes, whereas if the files had been on multiple disks I think I would have been fine.

IMO this is the more important point, i.e. even with multiple cores, hashing multiple files on the same disk concurrently will always be slower than hashing them one at a time.
garycase Posted March 8, 2017

Definitely agree => you do NOT want multiple simultaneous hashing of different files on the same disk. It's FAR faster to do them one at a time in this case, regardless of the number of CPU cores you have.
JonathanM Posted March 8, 2017

54 minutes ago, johnnie.black said: even with multiple cores hashing multiple files on the same disk concurrently will always be slower than one at a time.

Depends on the hashing algorithm. If disk I/O and access time make up only 1/20 of the total time required to hash and record the results, it would be faster to assign multiple cores to the same disk. It's all about processor math vs. I/O speed. Granted, the current profile of hashing algorithms, disk I/O, and processor speed is suited to one disk per thread, but that's not an absolute.
garycase Posted March 8, 2017

I'd disagree with that => a single seek on a modern disk typically takes 10-15 ms (seek plus rotational latency); that's 10,000-15,000 µs, which is a LONG time at modern CPU speeds. Thrashing between multiple files on the same disk will incur a LOT of those seek delays. The problem is that even if disk I/O and access time is 1/20th of the total time, UNLESS the entire file is read in a single access, that fraction would expand dramatically as the heads had to re-seek to read the next piece of each file. Granted, if the code were modified to read an entire file, THEN hash it, it might be advantageous to have multiple threads operating on the same disk (as long as there's enough memory to buffer the files being processed), but I think it's simply best to limit any given disk to a single thread.
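The one-thread-per-disk approach garycase recommends can be sketched as follows. This is a minimal illustration using Python's hashlib (the plugin actually uses a bunker shell script, not this code):

```python
import hashlib
import os

def hash_file(path, chunk_size=1 << 20):
    """BLAKE2 hash of one file, read sequentially in 1 MiB chunks so the
    drive streams the data instead of seeking between interleaved reads."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def hash_disk(root):
    """One disk, one thread: hash files strictly one at a time, so the
    heads never thrash between two half-read files on the same spindle."""
    return {path: hash_file(path)
            for dirpath, _, names in os.walk(root)
            for path in (os.path.join(dirpath, n) for n in names)}
```

Reading each file to completion before starting the next is what keeps a spinning disk at its sequential transfer rate; the hash computation itself is cheap enough on modern CPUs that the drive, not the core, is the bottleneck.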
boxer74 Posted March 30, 2017

On 1/9/2017 at 1:25 PM, bonienl said: Doing a manual Build will add any files missed previously. It is recommended to exclude folders/files which change frequently; here the normal error detection of the disk will take care of things. Detection of bitrot is more meaningful on folders/files which are not accessed or modified frequently.

So I should exclude document folders containing working documents, and only use this for integrity checking of largely static data, such as photos, home videos, media collections, etc.?
Krzaku Posted April 14, 2017

On 3/8/2017 at 7:59 PM, bonienl said: FIP has some logic to start concurrent "hashing" sessions depending on the number of available cores in the system. Perhaps I should make that a variable which can be adjusted through the GUI; it would allow people to experiment and see what works best on their system.

Any news on this? It is especially painful when a mover operation is running. Not only does FIP try to calculate checksums for multiple files at a time on one disk, the mover is also trying to copy more files to that same disk. This increases the move time significantly and basically makes the array unusable while it is running.
FreeMan Posted May 17, 2017

I am getting notifications of hash key mismatches from directories that are in my excluded custom folder list. I want to include my /mnt/user/Backups share because all my machines at home back up to it, but I want to exclude a subset of directories under that share because my server is a backup location for several family members' machines. My structure is:

/mnt/user/Backups/...
+ Local machine 1 backup
+ Local machine 2 backup
+ Local machine 3 backup
+ CommunityApplicationAppdataBackup
+ Crashplan machine 1 backup
+ Crashplan machine 2 backup
+ Crashplan machine n backup

In the excluded files and folders Custom Folders box, I have this:

619375375455617289,619380445513515330,619452463559020549,622829716619395332,622831926866608387,682140704451330318,712875340537760924,CommunityApplicationsAppdataBackup

The numbered directories are the ones CrashPlan creates when an incoming backup is created. I recall that when I first excluded the folders there was some sort of drop-down picker and that the format above was created by it, but now I'm not so certain of my recollection and I may have just made it up.

Two-part question, then: 1) Why am I still getting notifications of file mismatches in these folders? 2) If the answer to 1) is "because the exclude folder format is wrong", what is the correct format for this box? Do I need the full path, and if so, from root (/)?
JorgeB Posted June 1, 2017

22 minutes ago, zin105 said: Is this plugin fine to run if my cache drive is formatted to BTRFS?

The plugin only runs on array devices; also, btrfs already checksums all data, so it's not needed there.
JorgeB Posted June 1, 2017

15 minutes ago, zin105 said: Does it checksum and automatically repair on unRAID?

Any cache files are not parity protected, if that's what you mean, and unRAID can't fix corrupt files; it can fix a failed array device.
StarsLight Posted June 4, 2017

Has anyone encountered a disk that completes the "build" process with a green tick, but then shows a "circle" again the same day? With "Automatically protect new and modified files" enabled, the green tick should mean the integrity build is complete, so why do I get a circle later on?
FreeMan Posted June 11, 2017

I've been rebuilding all my hashes one disk at a time to try to eliminate the constant warnings I was getting, because I think I improperly excluded some paths. I came home this evening to a very unusual display. It's telling me that my Disk11 is 100% complete with an ETA of 00:00:00; however, it's reporting that it's working on file 664,413 of 2,248,611, and the current file number is incrementing. Is there possibly an issue with the size of the counter variable overflowing and causing the percentage to think it's done?
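Whether this is actually what FIP does is pure speculation, but a 32-bit intermediate in the percentage math would produce exactly this kind of nonsense. A file-count overflow alone doesn't fit (664,413 × 100 still fits in 31 bits), but if the percentage is computed from byte totals, a 32-bit product wraps immediately on a multi-TB disk. A sketch of that failure mode (hypothetical; not FIP's actual code):

```python
def to_int32(n):
    """Wrap an integer the way a signed 32-bit C int would."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

def percent_broken(done_bytes, total_bytes):
    """(done * 100) / total with a 32-bit intermediate product: the
    multiplication wraps, so the result bears no relation to progress."""
    return to_int32(done_bytes * 100) // total_bytes

def percent_ok(done_bytes, total_bytes):
    """Python's arbitrary-precision ints (or a C uint64) get it right."""
    return done_bytes * 100 // total_bytes
```

For example, 1 TB done of 4 TB should report 25%, but the wrapped version reports a meaningless small number instead.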
Perforator Posted July 5, 2017

I have 17 disks and I set up the schedule with one task for each disk, daily at 00:00. Yesterday disk 5 ran fine. Today disk 6 should have run, but I did not get a notification that it started and I do not see the bunker script running. Is there any way to find out why it didn't run this morning?
Perforator Posted July 5, 2017

I looked on the console and saw these messages; not sure what they mean:

grep: /boot/config/plugins/dynamix.file.integrity/disks.ini: No such file or directory
grep: /boot/config/plugins/dynamix.file.integrity/disks.ini: No such file or directory
/usr/local/emhttp/plugins/dynamix.file.integrity/scripts/bunker: line 347: 24875 Terminated  $exec $argv "$file" > $tmpfile.0
superderpbro Posted July 5, 2017

Why is it skipping files? Also, I have a feeling it isn't hashing every newly added file. When it starts a scheduled scan (I only do one disk a month), it reports the amount of data it is scanning, and it is always a few hundred gigabytes less than is on the drive being scanned.
DazedAndConfused Posted July 22, 2017 (edited)

Can someone explain what the "Task" checkboxes are supposed to be for? I'm trying to learn more about how to use this plugin after I found out some of my media files start acting crazy at random sections. I suspect some of the files got corrupted; I've had a few bad disks over the years and no way to tell if any files got damaged. Also, I have just ONE ReiserFS disk remaining that I am planning to replace. Should I replace it before getting started with this plugin? I did notice the warning on the first page, but I was curious whether I could just exclude the disk that is ReiserFS-formatted. I'm guessing the answer is no.

Edited July 22, 2017 by DazedAndConfused
FreeMan Posted July 23, 2017

16 hours ago, DazedAndConfused said: just exclude the disk that is ReiserFS format.

That is exactly what I did. When I got my first disk converted from ReiserFS to XFS, I started FIP running against that one XFS drive. Each time I got another drive converted, I built & exported that disk and added it to the check schedule. Eventually I got the whole server converted to XFS, and now all drives are being checked on a regular basis.
FreeMan Posted July 23, 2017

I've noticed that one disk takes an extreme amount of time to be checked by FIP. By extreme, I mean 30+ hours. I've got a dozen data disks:

root@NAS:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/md1        932G  914G   18G  99% /mnt/disk1
/dev/md2        1.9T  1.8T   22G  99% /mnt/disk2
/dev/md3        932G  929G  2.9G 100% /mnt/disk3
/dev/md4        1.9T  1.9T  9.5G 100% /mnt/disk4
/dev/md5        3.7T  3.7T   33G 100% /mnt/disk5
/dev/md6        1.9T  1.9T   16G 100% /mnt/disk6
/dev/md7        1.9T  1.3T  571G  70% /mnt/disk7
/dev/md8        1.9T  1.9T  151M 100% /mnt/disk8
/dev/md9        932G  932G   55M 100% /mnt/disk9
/dev/md10       2.8T  2.8T   25G 100% /mnt/disk10
/dev/md11       3.7T  2.1T  1.7T  56% /mnt/disk11
/dev/md12       3.7T  1.9T  1.9T  51% /mnt/disk12

Of the 4 TB drives (md5, md11 & md12), disks 5 and 12 run to completion in 6-10 hours (I don't recall exactly off the top of my head), but disk 11 regularly takes 30+ hours. Using jbartlett's excellent drive performance test, all three of those disks are performing at right about the same speeds. (FIP is currently processing disks 11 & 12, so there were some files open that probably slowed them down a bit.)

Disk 5:  HGST HDN724040ALE640 PK1338P4GT2D9B 4 TB 133 MB/sec avg
Disk 11: HGST HMS5C4040ALE640 PL2331LAG6W5WJ 4 TB 100 MB/sec avg
Disk 12: HGST HMS5C4040ALE640 PL1331LAHE2R0H 4 TB 102 MB/sec avg

So I chalk this up to the very large number of files on disk 11:

root@NAS:/mnt# ls disk5 -ARl | egrep -c '^-'
74328
root@NAS:/mnt# ls disk11 -ARl | egrep -c '^-'
2311532
root@NAS:/mnt# ls disk12 -ARl | egrep -c '^-'
380921

Which brings me back to the question I asked here - how do I properly exclude directories that I don't want to have checked?

Since asking that question, I changed my directory exclude settings to use full paths from root, then executed 'Clear', 'Remove', 'Build', 'Export' for each and every disk in turn in an effort to update FIP's understanding of what it's supposed to do, but I'm still getting bunker reports of hash key mismatches in directories that should be excluded. I've set the "Exclude" paths from /mnt/user; do I need to exclude /mnt/diskX instead? Doing that would be a major pain, since I'm writing to user shares that can easily span multiple drives - to begin with I'd need to exclude the paths on every existing disk, and then I would need to remember to update my FIP settings every time I add a new disk. (Granted, I don't do that often, but it's still a royal pain.) I've confirmed that disk 11 does contain a large portion of the files I'd like to exclude from FIP scanning. Is this an issue with how FIP skips the paths in the "Exclude" setting or with how I'm defining them, or is there something else I'm missing completely?
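FreeMan's numbers are consistent with per-file overhead, not raw throughput, dominating the check time. A rough back-of-the-envelope model (the 40 ms per-file figure is an assumption for illustration, not a measured value):

```python
def estimated_check_hours(num_files, data_bytes, mb_per_sec, per_file_s=0.04):
    """Rough model of a full-disk hash check: sequential-read time plus a
    fixed per-file cost (open/close, inter-file seeks, hash bookkeeping)."""
    read_s = data_bytes / (mb_per_sec * 1e6)
    return (read_s + num_files * per_file_s) / 3600

# Disk 5:  74,328 files,   ~3.7 TB used, 133 MB/s -> read time dominates
# Disk 11: 2,311,532 files, ~2.1 TB used, 100 MB/s -> per-file cost dominates
```

With those inputs the model gives roughly 8.5 hours for disk 5 but over 30 hours for disk 11, despite disk 11 holding less data, matching the observed behavior.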
deusxanime Posted September 7, 2017 (edited)

Just building my first unRAID server and copying TBs of data to it. Would it be best to wait until all the data is copied over and then run the Build? Or should I just enable the plugin right off the bat and let it compute checksums as stuff is copied to the array? Is there any major performance hit from turning it on now and having it start doing its thing while I copy data? Also, I'm copying the files in an SSH screen session, using unassigned devices mounts and cp, directly from my previous NTFS drives to the /mnt/diskN/share locations. Does it still compute the checksums when moving files around this way, rather than via SMB shares and such?

Edited September 7, 2017 by deusxanime
more info/questions
ksarnelli Posted September 7, 2017

1 hour ago, deusxanime said: Just building my first unRAID server and copying TBs of data to it. Would it be best to wait until all the data is copied over and then run the Build? Or should I just enable the plugin right off the bat and let it compute checksums as stuff is copied to the array? Is there any major performance hit from turning it on now and having it start doing its thing while I copy data?

The issue I brought up about 6 months ago (concurrent checksum calculations on a single disk) still hasn't been corrected, so I would recommend not having the automatic option enabled during your initial copy. I actually had to leave it disabled permanently, because every time my mover ran, the checksum calculations would crush one of my drives.
deusxanime Posted September 7, 2017

57 minutes ago, ksarnelli said: The issue I brought up about 6 months ago (concurrent checksum calculations on a single disk) still hasn't been corrected, so I would recommend not having the automatic option enabled during your initial copy. I actually had to leave it disabled permanently, because every time my mover ran, the checksum calculations would crush one of my drives.

Ouch, thanks for the heads up. Guess I'll leave it off for now and turn it on later once things have settled in.
FreeMan Posted October 14, 2017

On 7/23/2017 at 10:30 AM, FreeMan said: Which brings me back to the question I asked here - how do I properly exclude directories that I don't want to have checked? Since asking that question, I changed my directory exclude settings to have full paths from root, then executed 'Clear', 'Remove', 'Build', 'Export' for each and every disk in turn in an effort to update FIP's understanding of what it's supposed to do, but I'm still getting bunker reports of hash key mismatches on directories that should be excluded. I've set the "Exclude" paths from /mnt/users, do I need to exclude /mnt/diskx instead? I would think doing this would be a major pain since I'm writing to user shares that can easily span multiple drives - to begin with I'd need to exclude the paths from every existing disk, then I would need to remember to update my FIP settings every time I add a new disk. (Granted, I don't do it that often, but that's still a royal pain.)

Sadly, I'm still getting notifications of errors on files that should be excluded, so either the exclude logic is broken or I don't understand how to use it. It seems the "how to use it" part is a big secret, since I've asked about it twice and nobody's felt it was appropriate to share. I'm also getting two types of notifications from FIP:

Quote
Event: unRAID file corruption
Subject: Notice [NAS] - bunker verify command
Description: Found 14 files with BLAKE2 hash key mismatch
Importance: warning

and

Quote
Event: unRAID file corruption
Subject: Notice [NAS] - bunker verify command
Description: Found 6 files with BLAKE2 hash key corruption
Importance: alert

English is difficult (even though it's my native language) and the semantics can be particular. I'm not sure whether a "warning" is more or less important than an "alert", though I would think that "corruption" is worse than a "mismatch". However, I'm not sure what the difference between the two really is, either. In what way can the hash key be mismatched if not due to corruption?
bonienl Posted October 14, 2017 Author

Try the following steps:

1. Settings -> FIP -> Automatically protect = disabled
2. Settings -> FIP -> Disk verification schedule = disabled
3. Settings -> FIP -> Fill in all folders + files which need to be excluded
4. Apply
5. Tools -> FIP -> Select all disks
6. Tools -> FIP -> Clear
7. Settings -> FIP -> Automatically protect = enabled
8. Settings -> FIP -> Disk verification schedule = enabled (select schedule)
9. Apply

Files with a "key mismatch" have been updated (their modification time changed), but their key is incorrect. This usually happens when applications make changes to files without proper open/close notifications. Best practice: exclude these folders or files.

Files with a "key corruption" have not been modified, and the key signifies the file has corrupted content. Sometimes this can be a false positive, but in general these files need to be checked against a backup copy to verify their content (manual action).
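The mismatch/corruption distinction bonienl describes can be sketched in a few lines: compare both the stored hash and the stored modification time against the file's current state. This is an illustrative sketch only, not the plugin's actual bunker logic:

```python
import hashlib
import os

def classify(path, stored_hash, stored_mtime):
    """Classify a verify result:

    - "mismatch":   the file's mtime changed since the hash was recorded,
                    so the content legitimately changed (warning).
    - "corruption": the mtime is unchanged but the content hashes
                    differently, pointing at silent data corruption (alert).
    """
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    if h.hexdigest() == stored_hash:
        return "ok"
    if os.path.getmtime(path) != stored_mtime:
        return "mismatch"   # modified without the plugin re-hashing it
    return "corruption"     # untouched file whose bits changed
```

This also explains the advice to exclude frequently changing folders: files that applications rewrite constantly will keep generating "mismatch" warnings even though nothing is wrong.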
FreeMan Posted October 14, 2017

Thanks, @bonienl. I've followed your instructions and I've got my verifications scheduled again. I'll continue to monitor to see if I get any additional issues. If anything, I would guess it's the CA Backup plugin that isn't properly opening/closing files, because all the issues are in my Backups share, on the CA backups path. I had the directory excluded as "/mnt/user/Backups/CommunityApplicationsAppdataBackup", and that doesn't seem to have excluded these files from verification, so I just changed it to "/mnt/user/Backups/CommunityApplicationsAppdataBackup/*" (with the /* at the end) in the hope that would do it. I've excluded other directories as well, comma-separated - is that the proper way of doing it? I believe I also had spaces after the commas between the paths; maybe that was breaking things - I've eliminated those now, too.
bonienl Posted October 15, 2017 Author

To exclude a custom folder you simply use its name (in your case CommunityApplicationsAppdataBackup), but perhaps it is even better to exclude the complete Backups folder. Multiple folder or file names are separated by a comma; don't use spaces.
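One plausible reading of this format (a sketch under assumptions, not the plugin's actual parser) is that the Custom Folders box holds bare folder names, comma-separated with no spaces, and a file is excluded when any of those names appears as a component of its path:

```python
import os

def parse_custom_folders(setting):
    """Split the Custom Folders box on commas; no spaces around the commas.
    (Illustrative sketch of the format bonienl describes.)"""
    return [name for name in setting.split(",") if name]

def is_excluded(path, custom_folders):
    """A file is excluded if any excluded folder name appears as a path
    component, regardless of which disk or user share it lives on."""
    parts = path.split(os.sep)
    return any(folder in parts for folder in custom_folders)
```

If the matching works this way, it would explain why a bare name is enough: the same exclusion applies automatically on /mnt/user and on every /mnt/diskN, with no full paths needed.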