
Docker Request: Checksum - Catch bitrot and identify files that failed parity


jumperalex

Recommended Posts

Checksum (http://corz.org/windows/software/checksum/) is an awesome utility that lets you create, manage, and verify checksums of all your files, so you can identify bitrot and corrupt files after a parity check failure.  It has both a command-line and a GUI interface.

 

The problem is that it is Windows-only. Technically there is a Linux version that works from the unRAID CLI, but it is out of date and feature-incomplete (it cannot update changed files or remove missing ones).  That means running it from your Windows desktop over your network.

 

Comparing a run on a single instance against a 32 GB / 6,600-file folder, unRAID (that CLI I mentioned) vs. my desktop, showed a 25% reduction in time.  After growing to three or more instances (say, one per disk in the array), any network would be more than saturated, so a native run should be much, much faster than running it from the desktop.  Whether it's a checksum or a verification run, doing it over the network means moving your entire array across the wire ... I leave it to you to figure out how long that might take right after a parity failure :o

 

So the solution seems to be a docker container with WINE and checksum.  Anyone wanna take up that challenge?

 

As best I can tell, the license might allow it: http://corz.org/public/machine/source/beta/windows/checksum/license.txt, the salient part being:

You may not distribute this software on the web. You may not charge for this
software under any circumstances. You may distribute this software (in its
original zipped form) as part of compilation, so long as the compilation is not
sold for profit, and a valid link back to the checksum home page is provided.

I'm not sure how feasible it is for the container to retain the .zip and then expand it on start (a WINE boot script?).  That should stay within the spirit of the license and also retain his notices asking for "shirtware" compensation.  Perhaps a conversation with the author could work something out.  After all, being posted to Docker's registry would certainly increase his user base, right? [shrug]  He seems like a reasonable guy, and he certainly deserves to have his work respected and compensated.
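On the "expand the zip on start" question, here's a minimal entrypoint sketch of what I mean. All paths and the .exe name are my own assumptions, not the real package layout; the point is just that the original archive stays intact in the image and only gets unpacked on first boot:

```shell
#!/bin/sh
# Hypothetical Docker entrypoint: keep checksum's original .zip inside the
# image (as the license asks) and unpack it on the first container start.
# APP_ZIP / APP_DIR paths and checksum.exe name are assumptions.
APP_ZIP="${APP_ZIP:-/opt/checksum.zip}"
APP_DIR="${APP_DIR:-/opt/checksum}"

# First boot only: expand the archive, but leave the original zip in place.
if [ -f "$APP_ZIP" ] && [ ! -d "$APP_DIR" ]; then
    mkdir -p "$APP_DIR"
    unzip -q "$APP_ZIP" -d "$APP_DIR"
fi

# Hand off to the Windows binary under WINE when WINE is available.
if command -v wine >/dev/null 2>&1; then
    exec wine "$APP_DIR/checksum.exe" "$@"
fi
```

That keeps the "original zipped form" around permanently, which seems to be what the license cares about.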

 

I don't even have Docker set up yet (an Arch VM is working great for me), but this is one thing that would certainly get me to consider it, and I really think it could be worthwhile for the community.

 

 

Link to comment

Very interesting idea! Corz checksum is a fantastic program that everyone should be using, IMHO! I couldn't make this into a Docker myself, but I would consider adding it if someone did. Question for you: have you installed Corz via WINE on your VM? I currently have an Ubuntu server KVM setup and may give installing Corz a shot if you have had success with it.

Link to comment

I have not, but I suppose I should [derp] ... I'm just so averse to bloating my Arch VM, I swear it didn't even occur to me.  I also didn't want to mess with getting a GUI up via VNC, but I'd probably only care about the CLI anyway, so it might be a decent experiment [shrug].  I wish I could snapshot my Xen Arch VM, but I guess I'll just make a copy of it (on my cache drive, since my SSD is full) and test there.

Link to comment

Cloned my VM and ... hmmm, well, it looks like I'll have to learn about X and get passthrough working, etc.  Not sure I'm up to the task, or care enough at the moment.  Oh yeah, and I should be sending out resumes anyway :o

 

I remember pounding my head against the wall trying to get a remote desktop working and failing, but I was trying to do it headless with forwarding of something or other.  That was before I had a GPU / motherboard even capable of passthrough, and I'm not sure I have one now.

 

That's why I wanted someone smarter than me to play about with it as a docker.

Link to comment

I'll just quote what I posted here: http://lime-technology.com/forum/index.php?topic=27051.msg322702#msg322702 and follow up with this http://lime-technology.com/forum/index.php?topic=13033.msg155780#msg155780 which is the "someone else figured out" I refer to below.

 

I started down the path as I said, but I decided to play around a bit by installing it in my Arch VM and doing what I was trying to avoid: sharing and mounting all my disks (all = three ... but still).  With that done, I started playing with hashdeep and quickly realized what someone else on the forums had already figured out (found after some more specific searching): hashdeep really is not what "we" want if our intent is to quickly and easily:

 

1) Generate initial checksums (OK, it is actually good at this)

 

Synchronize:

2) Add checksums for new files (technically it can, but only by regenerating ALL of them)

3) Remove checksums of deleted files (no file name match)

 

Verify:

4) Identify missing files (no name match, checksum exists)

5) Identify bad files (file name match, checksum fail, no date change)

6) Identify modified files (name match, checksum fail, date change)
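For what it's worth, operations 1-5 can be roughed out with plain md5sum against one folder.md5 per directory. This is only a sketch under an assumed file layout (the names are my own invention, and it skips the date comparison that separates 5 from 6), not a replacement for checksum:

```shell
#!/bin/sh
# Sketch: per-folder sync + verify with plain md5sum.
sync_and_verify() {
    dir=$1
    (
        cd "$dir" || exit 1
        touch folder.md5
        # (1)/(2) add checksums only for files with no entry yet
        # (md5sum lines are 32 hex chars + 2 spaces + name, so cut col 35+)
        for f in *; do
            [ -f "$f" ] && [ "$f" != folder.md5 ] || continue
            cut -c35- folder.md5 | grep -qxF -- "$f" \
                || md5sum "$f" >> folder.md5
        done
        # (3)/(4) drop entries for deleted files, reporting them as missing
        tmp=$(mktemp)
        while IFS= read -r line; do
            name=${line#*  }
            if [ -f "$name" ]; then
                printf '%s\n' "$line" >> "$tmp"
            else
                echo "MISSING: $name"
            fi
        done < folder.md5
        mv "$tmp" folder.md5
        # (5)/(6) verify what remains; comparing mtimes against a stored
        # date would distinguish bitrot from a deliberate edit
        md5sum -c --quiet folder.md5 && echo "all files OK"
    )
}

# tiny demo folder
mkdir -p md5demo
echo hello > md5demo/a.txt
echo world > md5demo/b.txt
sync_and_verify md5demo
```

Re-running it after adding or deleting files only touches the changed entries, which is the behaviour hashdeep lacks.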

 

The app that can do all of that really exists only on Windows, which means running it over the network like I am right now :( or creating a Windows VM / Docker container.  Technically he has an alpha version that runs on Linux under KDE, but no command-line version ... well, I don't think there is one.  I am going to dig into it and see what I can find.

 

checksum simply provides more functionality. Sure, all it is really doing is scripting a hash function, work that could in principle be recreated around hashdeep, but I'm not the guy to recreate that scripting.  I have neither the time nor the skills.

Link to comment

I could be wrong, but I just couldn't replicate the functionality of Checksum with md5sum or other tools.

 

My Windows machine currently runs Checksum weekly against the entire array of both my servers.  Checksum creates a separate .hash (or .md5) file for each folder.  If a new file is added, it only computes the checksum of that additional file and appends it to the .md5 file.  (I wanted one .md5 file per folder, rather than one per file.)

 

When I was playing with md5sum, I couldn't manage to get it to do the same thing, but it may be possible with some fancy scripting.
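The "fancy scripting" might look something like find + md5sum: give every directory in a tree its own folder.md5, and hash only files that have no entry yet, so re-runs are incremental. A sketch, not anyone's actual setup; the file names are made up:

```shell
#!/bin/sh
# Walk a tree and maintain one folder.md5 per directory with plain md5sum.
make_folder_md5() {
    find "$1" -type d | while IFS= read -r d; do
        (
            cd "$d" || exit
            touch folder.md5
            for f in *; do
                [ -f "$f" ] && [ "$f" != folder.md5 ] || continue
                # skip files already listed (md5sum output is 32 hex chars
                # + two spaces + name, so the name starts at column 35)
                cut -c35- folder.md5 | grep -qxF -- "$f" \
                    || md5sum "$f" >> folder.md5
            done
        )
    done
}

# demo tree
mkdir -p tree_demo/sub
echo alpha > tree_demo/top.txt
echo beta > tree_demo/sub/deep.txt
make_folder_md5 tree_demo
```

A second run over the same tree adds nothing, so only new files ever get hashed, which matches the per-folder incremental behaviour described above.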

 

That being said, I found md5sum to be a LOT faster (roughly 10x) at creating the initial checksums for each folder.

Link to comment

Archived

This topic is now archived and is closed to further replies.
