Dynamix File Integrity plugin


bonienl

Really excited for this - initial check nearly done on my i5 server (three 4 TB data disks). Is it normal for a lower-spec'd server to become fully unresponsive during the initial check? (Specs in signature.)

 

If you started all three disks simultaneously, it can put a full load on your processor. Alternatively, you can build one disk at a time to lower the load, but it will take longer to complete all disks.

 

Perhaps adding a nice/ionice tunable to the process would keep it from dragging down the system?
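A minimal sketch of what such a tunable could look like, assuming the hashing job is launched as an external command. The helper name `low_priority_command` and the example command line are hypothetical and not taken from the plugin; `nice -n 19` (lowest CPU priority) and `ionice -c 3` (idle I/O class) are the standard Linux tools.

```python
import shlex


def low_priority_command(cmd, niceness=19, io_class=3):
    """Prefix a command so it runs at low CPU and I/O priority.

    niceness: value passed to `nice -n` (19 is the lowest CPU priority).
    io_class: value passed to `ionice -c` (3 is the idle class).
    """
    return ["nice", "-n", str(niceness), "ionice", "-c", str(io_class), *cmd]


# Hypothetical hashing command; the plugin's real command line may differ.
wrapped = low_priority_command(["sha256sum", "/mnt/disk1/example.bin"])
print(shlex.join(wrapped))
```

The resulting list can be handed to `subprocess.run` as-is. Exposing `niceness` and `io_class` as settings would let slower systems trade build time for responsiveness.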

Just to confirm, after the initial build & export:

Hashing of new or date-modified files is done automatically on all disks (daily) with no configuration required.

The scheduled disk hash rechecks all the files on each disk for potential corruption.

 

...

 

Will this use more than 1 thread per disk? It appears that an Atom D525 (1.8 GHz dual core) can handle 1 disk check plus 1 additional task (writing data, streaming data, or parity check) before being overloaded and locking up.

 

Cheers,

Just to confirm, after the initial build & export:

Hashing of new or date-modified files is done automatically on all disks (daily) with no configuration required.

 

Any newly created file automatically gets its hash value set in the extended attributes, and any modified file automatically gets a verification/hash update to detect silent corruption. This happens transparently to the user and in real time.

 

Will this use more than 1 thread per disk? It appears that an Atom D525 (1.8 GHz dual core) can handle 1 disk check plus 1 additional task (writing data, streaming data, or parity check) before being overloaded and locking up.

 

The scheduled verification is separate functionality. Each task determines how many disks are verified concurrently. If your processor isn't capable of handling concurrency, select only one disk per task and create as many tasks as needed to have all disks verified. See also the online help.
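To illustrate the one-disk-per-task idea, here is a small sketch that splits a set of disks into tasks of a chosen size. It only models the configuration choice described above; the function name `plan_tasks` and the disk names are assumptions, and this is not the plugin's scheduler.

```python
def plan_tasks(disks, disks_per_task=1):
    """Split a list of disks into tasks of at most `disks_per_task` disks.

    Disks within one task would be verified concurrently; separate tasks
    are meant to be scheduled at different times.
    """
    if disks_per_task < 1:
        raise ValueError("disks_per_task must be at least 1")
    return [
        disks[i:i + disks_per_task]
        for i in range(0, len(disks), disks_per_task)
    ]


disks = ["disk1", "disk2", "disk3"]
print(plan_tasks(disks, disks_per_task=1))  # weak CPU: one disk per task
print(plan_tasks(disks, disks_per_task=3))  # strong CPU: all disks in one task
```

With `disks_per_task=1` only a single hash process runs at any time, at the cost of needing one scheduled task per disk.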

 

2016.01.04a, running on a single disk: it takes 67% of the dual-core hyperthreaded CPU with no other tasks currently running.

 

So far, so good :)

Version 2016.01.05 is now available; this version marks the official release of this plugin. See OT. Thanks, everyone, for testing.

 

Please upgrade to this version if you are still on an older version.

 

Hope this is a useful extension to the Dynamix family. Enjoy!

 

One small request: Would it be possible to show what action is currently underway above the progress bar?

 

I have a relatively underpowered CPU so I set a task in motion and then check back over the day and I am likely to forget exactly what is running.

 

Attached is a crude representation of what I am suggesting!!

 

Many Thanks,

 

The Capt.

 

EDIT: attachment updated to slightly improve representation of idea. (If single or multiple discs active, show what tasks are running)

[Attachment: screenshot2.png]

Small issue, but I believe it should be an easy fix: when doing multiple checks or builds, if you change the WebGUI page (e.g. to Main) and some disks have finished, the finished disks appear as aborted when you go back to the plugin page. See the screenshots for a better example. Latest version 2016.01.05a used.

[Attachments: 1.png, 2.png]

I see this as well!

 

I wonder if it would be better to simply remove these from the list of tasks instead of trying to work out how to mark them as completed rather than aborted?

I believe this is related to the bug I just fixed; you may want to retry with version 2016.01.05b.

 

The intent is to show the final result, which stays there as long as you don't leave the page. Once you go to another page and then revisit, anything completed is no longer shown.

 

I just checked with the latest update and it does seem to be fixed.  The behaviour is now exactly what I wanted.

Working Great!  Thank you :)

 

SMALL feature request for future bumps... a per-disk flag indicating whether an initial build has been done, and another indicating whether an export has been done.

 

Two more feature requests/wish list items  ;) Not sure how much trouble they would be to implement but I figure it can not hurt to ask and see what sticks.

 

1. It would be awesome if there was some sort of icon letting you know that all items (besides excluded folders) on disk1, 2, 3, etc. have a checksum. It would be nice for peace of mind. Maybe if not all files had a checksum, you could see a list of them?

 

2. It might also be nice if there was a way to check individual folders or files for valid checksums. People with slower CPUs especially might be concerned about a specific folder, but a full 8 TB drive takes a while to calculate.

Once the initial build is done and automatic protection is ON, everything should stay up to date automatically. For your peace of mind you can occasionally do a manual build; it should report that 0 files are added. If not, the missing extended attributes are created and the new checksums are automatically added to the export file.

 

Checking all files for the presence of the extended attributes and detecting anything missing (this is what build actually does) is time-consuming when many files are involved.
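As a rough illustration of why this is slow, the check boils down to visiting every file and asking for its attribute. The sketch below is Linux-only and assumes a hypothetical attribute name `user.sha256`; the attribute names the plugin actually uses are not stated in this thread.

```python
import os


def files_missing_attribute(root, attribute="user.sha256"):
    """Return paths under `root` that have no `attribute` set.

    `user.sha256` is a made-up attribute name used for illustration.
    Any OSError (attribute absent, or extended attributes unsupported
    on this file system) is treated as "missing".
    """
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                os.getxattr(path, attribute)
            except OSError:
                missing.append(path)
    return missing
```

Calling `files_missing_attribute("/mnt/disk1")` would walk the whole disk, so the run time grows with the number of files rather than with their size.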

 

I'll keep all requests in mind though, but no promises.

 

Again, fantastic plugin Bonienl!

 

My request is based on the need to do each disk separately due to the slow Atom CPU... each 4 TB disk takes about 20 hours to complete. If I had more disks, remembering which one I was on could be a bit bothersome. I am also thinking the flag would be a nice feature for when I add a disk in the future and, 2 months later, wonder whether I did the initial build on that disk or not.

I installed this earlier and it almost immediately triggered the reiser issue described in https://lime-technology.com/forum/index.php?topic=39911.0, which I've never seen before. By "almost immediately" I mean "within a minute or so".

 

emhttp was then hung and wa times were at ~50% or so. It seems this was due to a bs2sum process being stuck in D state (hence unkillable even by kill -9) as it attempted to calculate the hash for some particular file. The offending disk was thus unmountable, and a hard reset was the only way out. The system itself was still responsive through all this; only emhttp and that stuck process were the (big) problem.

 

The disk in question has been running solidly for a shade over 2 years, with no other disk issues noted until now on that SATA controller. reiserfsck has just finished on the offending disk with no issues noted (some journal replay action but nothing else to report), and the SMART report is also completely clear.

 

It's not obvious how your plugin could trigger this (as opposed to some previously unexploited weakness in my system) but thought I'd mention it anyway. 
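For anyone wanting to check for the same symptom, the process state mentioned above can be read from /proc. This is a minimal sketch assuming a Linux system; the function names and the sample line are made up for illustration. In /proc/<pid>/stat the field after the parenthesised command name is the state letter, where D means uninterruptible sleep (typically waiting on I/O), during which signals such as SIGKILL are not acted on.

```python
def process_state(stat_line):
    """Extract the state letter from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so the split happens after the last ')'.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def read_process_state(pid):
    """Read and parse the state of a running process (Linux only)."""
    with open(f"/proc/{pid}/stat") as handle:
        return process_state(handle.read())


# Made-up sample line for a process stuck in uninterruptible sleep.
sample = "1234 (bs2sum) D 1 1234 1234 0 -1 4194304"
print(process_state(sample))
```

A process that stays in state D for minutes while reading one file points at the storage or file system layer rather than at the hashing tool itself.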

Thanks for your observation.

 

I have seen in the past that RFS can have issues in V6 when writing extended attributes and once in a blue moon completely hangs. I am going to put a BIG warning that using this plugin in combination with RFS may lead to instability.

 

WARNING: USING THIS PLUGIN ON DISKS FORMATTED IN REISERFS MAY LEAD TO SYSTEM INSTABILITY. IT IS ADVISED TO USE XFS.

 

Is this due to the use of extended attributes (I haven't read every post yet)? Is there an option to turn that part off and just create hash files? Is adding par2 to the plugin being considered? Thanks

The essence of this plugin is to use the extended attributes, which allows the checksum information to be stored together with the file instead of in separate files.
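A minimal sketch of the idea, assuming Linux, a file system with user extended attributes enabled, SHA-256 as the hash, and a hypothetical attribute name `user.sha256`. None of these names come from the plugin; they only show how a checksum can travel with the file.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_checksum(path, attribute="user.sha256"):
    """Compute the hash and store it in an extended attribute of the file."""
    value = file_sha256(path)
    os.setxattr(path, attribute, value.encode("ascii"))
    return value


def verify_checksum(path, attribute="user.sha256"):
    """Return True if the stored hash still matches the file contents."""
    stored = os.getxattr(path, attribute).decode("ascii")
    return stored == file_sha256(path)
```

Because the attribute lives in the file's metadata, the checksum follows the file on a move or rename within the same file system, and a later `verify_checksum` call returns False if the contents changed without the attribute being updated.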

 

I've put out the warning so that people are aware of this potential instability when ReiserFS is used as the file system. Personally, I moved over to XFS a long time ago and find it absolutely stable.

 

Sorry, par2 is not being considered; the intention of this plugin is detection only.

 
