MD5Recurse hash program built for unRAID



I have created a recursive hashing program in Scala (running on the JVM), and I have written scripts around it that fit the way I use unRAID.

 

Md5Recurse

  • Hash algorithm: MD5 (no other algorithms, such as SHA-1)
  • Hash storage options:
    • File attributes and/or
    • Global md5data text file and/or
    • Local per-directory .md5 files (with comments containing the last-modified timestamp; thanks to corz.org for that idea)
  • By default it hashes all new files and rehashes all files whose last-modified timestamp has changed.
  • It can check all files whose last-modified timestamp is unchanged against the last seen hash (bitrot detection).
  • It can print missing and modified files.
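To make the rescan and bitrot rules above concrete, here is a minimal sketch of the per-file decision. It is written in Python rather than Scala and is not Md5Recurse's actual code; the names `Action`, `StoredEntry`, `plan`, `is_bitrot` and `missing_files` are hypothetical, and it assumes that what is stored per file is its MD5 plus the last-modified timestamp recorded when it was hashed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Action(Enum):
    HASH_NEW = "hash new file"
    REHASH_MODIFIED = "rehash, timestamp changed"
    SKIP = "skip, timestamp unchanged"
    VERIFY = "rehash and compare with stored hash (bitrot check)"


@dataclass
class StoredEntry:
    md5: str              # hex digest recorded at the last scan
    last_modified: float  # file mtime (epoch seconds) recorded with that digest


def plan(stored: Optional[StoredEntry], current_mtime: float, check_bitrot: bool) -> Action:
    """Decide what to do with one file, given what was stored for it last time."""
    if stored is None:
        return Action.HASH_NEW
    if stored.last_modified != current_mtime:
        return Action.REHASH_MODIFIED
    return Action.VERIFY if check_bitrot else Action.SKIP


def is_bitrot(stored: StoredEntry, current_mtime: float, current_md5: str) -> bool:
    """Content changed while the timestamp did not: points at corruption, not a normal write."""
    return stored.last_modified == current_mtime and stored.md5 != current_md5


def missing_files(stored: Dict[str, StoredEntry], present: List[str]) -> List[str]:
    """Paths that have a stored hash but no longer exist on disk."""
    return sorted(set(stored) - set(present))
```

A normal daily run would call `plan(..., check_bitrot=False)` and only hash new or modified files; an occasional bitrot run passes `True` and rehashes everything else too.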

 

Thoughts on storage options

  • File attributes are kept when moving and renaming files in Linux, so Md5Recurse knows the previous hash of a file even after it has been moved or renamed.
  • File attributes and Windows file shares (Samba): file attributes are kept when a file is moved within the same share via Samba, but not when it is moved via Samba from one share to another. In the latter case the file is rescanned.
  • Global md5data: the global md5data file is a plain text file sorted by directory. This makes it possible to check changes manually by diffing it against an older version of the file, should you be so inclined. I have found that useful.
  • Local .md5 files with comments: the files use the standard format, so they can be used to verify checksums in other programs, such as Total Commander.
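A sketch of a local .md5 file with timestamp comments, under stated assumptions: the hash lines use the common `<md5> *<file name>` layout, while the comment layout `#md5#<file name>#<UTC timestamp>` and the helpers `format_md5_file`/`parse_md5_file` are hypothetical and may not match what Md5Recurse or corz.org actually write. The point is that a scanner can read the timestamp back and tell whether a file changed since it was hashed, even when file attributes were lost.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional, Tuple

STAMP = "%Y-%m-%dT%H:%M:%SZ"  # hypothetical timestamp layout, second precision, UTC


def format_md5_file(entries: Iterable[Tuple[str, str, float]]) -> str:
    """entries: (file name, md5 hex digest, last-modified epoch seconds)."""
    lines = []
    for name, digest, mtime in sorted(entries):
        stamp = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime(STAMP)
        lines.append(f"#md5#{name}#{stamp}")  # hypothetical comment carrying the mtime
        lines.append(f"{digest} *{name}")     # common "<hash> *<name>" checksum line
    return "".join(line + "\n" for line in lines)


def parse_md5_file(text: str) -> Dict[str, Tuple[str, Optional[float]]]:
    """Map file name -> (digest, recorded mtime or None if no timestamp comment)."""
    stamps: Dict[str, float] = {}
    result: Dict[str, Tuple[str, Optional[float]]] = {}
    for line in text.splitlines():
        if line.startswith("#md5#"):
            # rpartition keeps file names that themselves contain '#'
            name, _, stamp = line[len("#md5#"):].rpartition("#")
            parsed = datetime.strptime(stamp, STAMP).replace(tzinfo=timezone.utc)
            stamps[name] = parsed.timestamp()
        elif line and not line.startswith("#"):
            digest, rest = line.split(" ", 1)
            # "<hash> *name" (binary mode) or "<hash>  name" (text mode)
            name = rest[1:] if rest.startswith(("*", " ")) else rest
            result[name] = (digest.lower(), stamps.get(name))
    return result
```

Sub-second parts of the mtime are dropped by this layout, so a comparison against the live mtime would have to round to whole seconds.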

 

Thoughts on hash-algorithm

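  • Md5Recurse has been built to detect corruption with non-malicious causes, such as failing disks, a bad restore from backup and similar. For this purpose MD5 is a good fit, because it is fast and the probability of an accidental collision is negligible.
  • For protection against deliberate tampering with files, for example by someone who wants to steal your Dash or Bitcoins, a stronger hash would be better. SHA-1 is stronger than MD5, but it is no longer considered collision-resistant either, so a SHA-2 hash such as SHA-256 is the safer choice today.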
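Switching between MD5 and a stronger hash is mostly a matter of which algorithm the file contents are fed to. A minimal Python sketch (the function name `file_digest` and the 1 MiB chunk size are illustrative choices, not taken from Md5Recurse) that streams a file through `hashlib` in chunks, so large media files never have to fit in memory:

```python
import hashlib


def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    h = hashlib.new(algorithm)  # "md5" for corruption checks; "sha256" if tampering matters
    with open(path, "rb") as f:
        # read() returns b"" at end of file, which stops the iterator
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

For example, a file containing `abc` gives `900150983cd24fb0d6963f7d28e17f72` with the default MD5 and `ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad` with `algorithm="sha256"`; the chunk size does not affect the result.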

 

unRAID script (md5recurse_unraid)

The release contains an unRAID-specific bash script that:

  • Scans /mnt/cache and /mnt/disk* in parallel and writes the md5data of each disk to its own global file
  • Scans /mnt/user and writes its md5data to a global md5data file for /mnt/user. The /mnt/user scan picks up the hashes from the preceding cache and disk* scans via file attributes (and via local per-directory .md5 files, if enabled)
  • Makes daily and weekly backups of the latest global md5data files and stores them on two different disks
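The parallel per-disk scan can be sketched as follows. This is a hypothetical Python illustration of the idea, not the bash script from the release: the helper names, the `<disk name>.md5data` output naming and the `<md5> *<relative path>` line layout are assumptions, and skipping unchanged files and the backups are left out. Each disk gets its own worker, since every unRAID data disk is a separate device that can be read independently. The output directory should lie outside the scanned roots.

```python
import glob
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def md5_of(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def scan_root(root: str) -> Dict[str, str]:
    """Hash every file below root, keyed by path relative to root."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            hashes[os.path.relpath(full, root)] = md5_of(full)
    return hashes


def write_md5data(hashes: Dict[str, str], out_path: str) -> None:
    # Hypothetical line layout "<md5> *<relative path>", sorted by path so the
    # file is stable between runs and can be compared with diff.
    # surrogateescape lets undecodable file-name bytes round-trip instead of failing.
    with open(out_path, "w", encoding="utf-8", errors="surrogateescape") as out:
        for rel in sorted(hashes):
            out.write(f"{hashes[rel]} *{rel}\n")


def default_roots() -> List[str]:
    """unRAID cache and data-disk mount points; only those that exist are returned."""
    candidates = ["/mnt/cache"] + sorted(glob.glob("/mnt/disk[0-9]*"))
    return [p for p in candidates if os.path.isdir(p)]


def scan_disks_in_parallel(roots: List[str], out_dir: str) -> List[str]:
    """One worker per disk, each writing its own <disk name>.md5data file."""
    def job(root: str) -> str:
        out_path = os.path.join(out_dir, os.path.basename(root.rstrip("/")) + ".md5data")
        write_md5data(scan_root(root), out_path)
        return out_path

    with ThreadPoolExecutor(max_workers=max(1, len(roots))) as pool:
        return list(pool.map(job, roots))  # results come back in the order of roots
```

On a server this would be called as `scan_disks_in_parallel(default_roots(), out_dir)`; threads are enough here because the work is dominated by disk reads.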

 

Thoughts on usages in case of data-loss

  • If a disk crashes, I might restore the data from a remote backup, from unRAID parity, or by repairing the filesystem. In that case I would like to be able to verify only that disk, not the entire user folder. Therefore it is valuable to have a separate scan for each disk.
    • If I restore from a remote backup, I would probably use the local per-directory .md5 files to check the files, because they reflect the state at the time of the backup. I might compare that with the current md5data in the global file.
  • If I accidentally delete a folder and restore it, I would like to verify the checksums of all files in that folder, and for that I need the scan of the user folder.
  • If I want to verify the contents of a folder manually, I use the local .md5 files.
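Verifying a restored folder against its local .md5 file could look like the sketch below. It assumes the common `<md5> *<name>` or `<md5>  <name>` line format and skips lines starting with `#`; the .md5 file name is a parameter because the name Md5Recurse uses is not assumed here, and `verify_folder` is a hypothetical helper, not part of the release.

```python
import hashlib
import os
from typing import Dict, List, Tuple


def read_md5_file(path: str) -> Dict[str, str]:
    """Map file name -> expected md5, ignoring blank lines and '#' comments."""
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")  # tolerate files written with Windows line endings
            if not line or line.startswith("#"):
                continue
            digest, rest = line.split(" ", 1)
            name = rest[1:] if rest.startswith(("*", " ")) else rest
            entries[name] = digest.lower()
    return entries


def verify_folder(folder: str, md5_name: str) -> List[Tuple[str, str]]:
    """Return (file name, "OK" | "FAILED" | "MISSING") for every entry in the .md5 file."""
    results = []
    for name, expected in sorted(read_md5_file(os.path.join(folder, md5_name)).items()):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            results.append((name, "MISSING"))
            continue
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        results.append((name, "OK" if h.hexdigest() == expected else "FAILED"))
    return results
```

Anything other than `OK` in the result is a file worth comparing against the global md5data or the backup.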

 

How I use it

  • I have a daily cron job that scans all disks
  • I have a quarterly cron job that checks for bitrot
  • These cron jobs are included in the release but need to be modified to fit your environment

 

Any comments are welcome. I'm not going to make it into a plugin, but others are welcome to help with that. Currently it requires some manual configuration of the cron scripts and md5recurse_unraid.cfg (and maybe more that I am not aware of).

 

GitHub: https://github.com/arberg/md5recurse

Releases: https://github.com/arberg/md5recurse/releases

 

Alternative solutions and threads (other people's solutions to the same problem)

 

Edited by Alex R. Berg