[SOLVED] Help with de-duplication / hardlinking


airy52

Like many of you, I use my Unraid server for Plex and media downloading/management. Recently I discovered that hardlinks weren't working properly, and found out it was because I was downloading to a different folder mapping than the storage mapping. I set everything to the same /Media path and it's working now, but I have a LOT of old data that is now duplicated between the downloads folder (/Media/Sonarr/Downloads) and the places Sonarr organized it into once it finished (/Media/TV and /Media/Anime). I've read about some tools like Czkawka (https://github.com/qarmin/czkawka) and DupeGuru (https://github.com/arsenetar/dupeguru) that will help me find the duplicate files, hardlink (or symlink? softlink?) them, and remove the duplicates.

I want to do this, but I only have enough Linux knowledge to do the basics or follow instructions. My main concern is that some files in the downloads folder might be duplicates but no longer sit on the same drive as the organized copy (I have 2 data drives + 1 parity + cache), and since hardlinks can't span drives I think that will be an issue? Also, I'm not familiar with the inner workings of Unraid and how it presents multiple drives as one folder in /mnt/user/, and I don't want to break it by running something on it that wasn't intended for this configuration.
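For reference, Unraid mounts each data drive individually under /mnt/disk1, /mnt/disk2, and so on, alongside the merged /mnt/user view, and a hardlink can only join two paths that live on the same physical drive. A quick way to check where things actually are (the file name here is just a placeholder):

ls /mnt/disk*/Media/TV/example.mkv

shows which disk holds a given file, and

stat -c '%i %h %n' /mnt/disk1/Media/Sonarr/Downloads/example.mkv /mnt/disk1/Media/TV/example.mkv

prints each path's inode number and hardlink count; if two paths report the same inode, they are already the same file on disk.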

So my question: can any of you help me figure out how to do this properly with any of these (or other) tools?

Edited by airy52 (reason: solved)

For anyone who finds this in a search, I used jdupes, which is included in nerdpack/nerdtools. The command I used was:

jdupes -QrL -X size+=:100000k /mnt/user/Media/

 

Get rid of -Q (quick mode, which matches on file hashes instead of doing a direct binary comparison) if you don't mind it taking longer, or if your data is nuclear launch codes or something.
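If you'd rather see what it will do before letting it loose, the same command without -L just prints the duplicate sets instead of hardlinking them, and adding -m prints a summary of how much is duplicated (both are standard jdupes flags; adjust the path to your own share):

jdupes -Qr -X size+=:100000k /mnt/user/Media/

jdupes -Qrm -X size+=:100000k /mnt/user/Media/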

2 years later...

Apologies for the necro post, but this was one of the few threads I found with a solution for deduplication, and I wanted to expand on the command above from OP. Mainly just explaining the flags from their command so you know what it's doing instead of blindly running it.

 

jdupes Usage: https://codeberg.org/jbruchon/jdupes#usage

 

jdupes -QrL -X size+=:100000k /mnt/user/<share>

-Q --quick
    skip byte-by-byte duplicate verification. WARNING:
    this may delete non-duplicates! Read the manual first!

-r --recurse
    for every directory, process its subdirectories too

-R --recurse:
    for each directory given after this option, follow
    subdirectories encountered within (note the ':' at
    the end of the option; see the manpage for details).
    Not used in the command above, but easy to confuse
    with the lowercase -r that is.

-L --link-hard
    hard link all duplicate files without prompting

-X --ext-filter=x:y
    filter files based on specified criteria
    Use '-X help' for detailed extfilter help

    size[+-=]:size[suffix]
        Only include files matching size criteria
        Size specs: + larger, - smaller, = equal to
        Specs can be mixed, e.g. size+=:100k will
        only include files 100KiB or more in size.

(So the size+=:100000k in the command above limits matching to files of roughly 100 MB and larger.)
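Once jdupes has run, you can sanity-check the result with plain find (standard find, nothing jdupes-specific; running it against a disk mount like /mnt/disk1 rather than /mnt/user makes sure you're reading real link counts rather than the FUSE view):

find /mnt/disk1/<share> -type f -links +1 | head

-links +1 matches files whose hardlink count is greater than one, i.e. files that now share their data with at least one other path. Comparing df -h output from before and after the run shows how much space was actually reclaimed.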

 

