July 4, 2015

I would LOVE a docker that maintains a table of fileLocation/File/MD5.

Runs for all NEW files on a schedule, just like the mover.

Runs a check of all files' checksums prior to the scheduled parity check, comparing against the original checksums, and notifies the user if there is a problem.

Option to disable the scheduled parity check should a problem be found.

Would need an exclusion list, with fileLocation/file and fileLocation/* capability.

Option to run a checksum check post parity too.
July 4, 2015

"... I would LOVE a docker that maintains a table of fileLocation/File/MD5. Runs for all NEW files on schedule, just like the mover ..."

This would indeed be a nice feature ... although it needs to do this not only for NEW files but also for MODIFIED files (since the stored MD5s will no longer be correct).

"... Runs a check on all files' Checksums prior to scheduled parity check for a comparison to the original checksum, and if a problem notifies the user."

Why? Unless a parity check finds errors, you don't need to spend the time to check all the MD5s. I presume you're aware that a complete check of all the MD5s would take FAR longer than a parity check (several times as long). A parity check reads all disks at once and simply does an XOR of the data; an MD5 check has to read EVERY file and compute the MD5 ... not counting the added computational overhead, the disk operations alone will take roughly N times as long as a parity check, where N = the number of data disks in your array.

"Option to run a Checksum check post parity too"

Running a checksum validation on demand is clearly a necessary feature; otherwise the utility to create the checksums wouldn't be of much use. You need to be able to select which files to check, which disk to check, etc. Doing a complete check of the entire array isn't something you'd want to do very often [I do mine once a year]. In general, you'd want to check a rebuilt disk, a disk that had reported errors, or any file(s) that you suspect may have been corrupted for some reason (perhaps by an errant program).
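To put rough numbers on the parity-versus-MD5 comparison above, here is a minimal Python sketch. It is only an illustration, not how unRAID actually implements parity: `xor_parity` and `estimate_hours` are made-up names, and the time estimate assumes equally sized, full disks that the MD5 pass reads one after another.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def estimate_hours(hours_per_disk, data_disks):
    """Return (parity_check_hours, md5_check_hours) under the stated assumptions.

    Parity check: every disk is read in parallel, so it costs one disk pass.
    MD5 check: assumed to read the data disks one after another, so N passes.
    """
    return hours_per_disk, hours_per_disk * data_disks


# Three tiny "data disks" and the parity block computed from them.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_parity(data)

# A parity check XORs data and parity together and expects all zero bytes.
assert xor_parity(data + [parity]) == bytes(len(parity))
```

With these assumptions, `estimate_hours(8, 5)` gives 8 hours for the parity check against 40 hours of disk reads for a full MD5 pass, before counting any hashing overhead.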
July 7, 2015 (Author)

@Gary

1 – You are right: new and modified files.

2 – Good point. Set up a scheduler to run every X days and leave it at that.

3 – So, 3 parts:

1 – Calculate checksums on new/modified files on a schedule.
2 – Calculate validation checksums on all files every X days.
3 – Check validation checksums by share/folder on demand.
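The three parts above could be sketched roughly like this in Python. This is a hedged illustration only: the function names, the in-memory `manifest` dict, and the use of size plus mtime to spot new or modified files are all assumptions, and a real tool would persist the table and hook into a scheduler.

```python
import fnmatch
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large media files are never loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def update_manifest(root, manifest, exclude=()):
    """Part 1: hash files that are new or whose size/mtime has changed.

    `exclude` holds glob patterns such as "/mnt/user/tmp/*" (hypothetical path).
    Returns the list of paths that were (re)hashed.
    """
    changed = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            if any(fnmatch.fnmatch(path, pattern) for pattern in exclude):
                continue
            info = os.stat(path)
            stamp = (info.st_size, info.st_mtime_ns)
            entry = manifest.get(path)
            if entry is None or entry["stamp"] != stamp:
                manifest[path] = {"stamp": stamp, "md5": md5_of(path)}
                changed.append(path)
    return changed


def verify(manifest, prefix=""):
    """Parts 2 and 3: re-hash recorded files under `prefix`, return mismatches.

    An empty prefix checks everything; a share or folder path checks on demand.
    """
    return [
        path
        for path, entry in manifest.items()
        if path.startswith(prefix) and md5_of(path) != entry["md5"]
    ]
```

A scheduled job would call `update_manifest` often and `verify(manifest)` every X days, while `verify(manifest, prefix="/mnt/disk3/")` (path made up) covers the on-demand case. Note that `verify` as written raises an error for files deleted since they were recorded.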
July 11, 2015 (Author)

It appears work has already been started on a file integrity checker, bunker: http://lime-technology.com/forum/index.php?topic=37290.0
July 11, 2015

One tricky issue with automatic updating of the checksums: if a file is written to by an errant process and the checksum is automatically updated as a result, you'll have no way of knowing that the contents have been inadvertently altered ... indeed, a validation check would PASS. On the other hand, if you're updating a file, you DO want the checksum updated.

That's one reason I like the Corz utility (unfortunately it's effectively Windows only ... there's been some work on a Linux version, but it's not complete): when you select "Create checksums", if it finds existing ones YOU have to tell it whether or not to update them. So if you didn't make a change to the file, you likely don't want to recalculate the checksum. How convenient (or not) this is depends on how volatile your files are, but for media collections where you rarely change the files, I find this very convenient.
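The "ask before overwriting an existing checksum" behaviour described above can be sketched as a small policy function. This is not Corz's actual code or interface; `refresh_checksum` and the `confirm` callback are hypothetical names used only to illustrate the idea in Python.

```python
def refresh_checksum(path, stored, current, confirm):
    """Decide which checksum to keep for `path`.

    stored  -- checksum already on record, or None for a new file
    current -- checksum just computed from the file on disk
    confirm -- hypothetical callback; returns True only if the user says the
               change to `path` was intentional
    """
    if stored is None:
        return current  # new file: nothing to overwrite
    if stored == current:
        return stored  # unchanged: nothing to do
    # Contents differ from the record: never overwrite silently.
    return current if confirm(path) else stored
```

If `confirm` returns False, the old checksum stays on record, so a later validation still flags the file instead of passing an unintended change.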