[SOLVED] Help with de-duplication / hardlinking


airy52

Like many of you, I use my Unraid server for Plex and media downloading/management. Recently I discovered that hardlinks weren't working properly, and found out it was because I was downloading to a different folder mapping than the storage mapping. I set everything to the same /Media path and it's working now, but I have a LOT of old data that is now duplicated between the downloads folder (/Media/Sonarr/Downloads) and the places Sonarr organized it into once it finished (/Media/TV and /Media/Anime). I've read about some tools like Czkawka (https://github.com/qarmin/czkawka) and DupeGuru (https://github.com/arsenetar/dupeguru) that will help me find the duplicate files, hardlink (or symlink? softlink?) them, and remove the duplicates.

I want to do this, but I only have enough Linux knowledge to do the basics or follow instructions. My main concern is that some files in the downloads folder might be duplicates but no longer sit on the same drive as the organized copy (I have 2 data drives + 1 parity + cache), and since hardlinks can't span drives I think that will be an issue? Also, I'm not familiar with the inner workings of Unraid and how it presents multiple drives as one folder in /mnt/user/, and I don't want to break it by running something on it that wasn't intended for this configuration.
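For reference, Unraid mounts each data drive individually under /mnt/disk1, /mnt/disk2, and so on, alongside the merged /mnt/user view, and a hardlink can only join two paths that live on the same physical drive. A quick way to check where things actually are (the file name here is just a placeholder):

ls /mnt/disk*/Media/TV/example.mkv

shows which disk holds a given file, and

stat -c '%i %h %n' /mnt/disk1/Media/Sonarr/Downloads/example.mkv /mnt/disk1/Media/TV/example.mkv

prints each path's inode number and hardlink count; if two paths report the same inode, they are already the same file on disk.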

So my question: can any of you help me figure out how to do this properly with any of these (or other) tools?

Edited by airy52 (reason: solved)

For anyone who finds this in a search, I used jdupes, which is included in nerdpack/nerdtools. The command I used was:

jdupes -QrL -X size+=:100000k /mnt/user/Media/

 

Get rid of -Q (quick mode, which matches on file hashes instead of doing a direct binary comparison) if you don't mind it taking longer, or if your data is nuclear launch codes or something.
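If you'd rather see what it will do before letting it loose, the same command without -L just prints the duplicate sets instead of hardlinking them, and adding -m prints a summary of how much is duplicated (both are standard jdupes flags; adjust the path to your own share):

jdupes -Qr -X size+=:100000k /mnt/user/Media/

jdupes -Qrm -X size+=:100000k /mnt/user/Media/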

2 years later...

Apologies for the necro post, but this was one of the few threads I found with a solution for deduplication, and I wanted to expand on the command above from OP. Mainly just explaining the flags from their command so you know what it's doing instead of blindly running it.

 

jdupes Usage: https://codeberg.org/jbruchon/jdupes#usage

 

jdupes -QrL -X size+=:100000k /mnt/user/<share>

-Q --quick
    skip byte-by-byte duplicate verification. WARNING:
    this may delete non-duplicates! Read the manual first!

-r --recurse
    for every directory, process its subdirectories too

-R --recurse:
    for each directory given after this option, follow
    subdirectories encountered within (note the ':' at
    the end of the option; see the manpage for details).
    Not used in the command above, but easy to confuse
    with the lowercase -r that is.

-L --link-hard
    hard link all duplicate files without prompting

-X --ext-filter=x:y
    filter files based on specified criteria
    Use '-X help' for detailed extfilter help

    size[+-=]:size[suffix]
        Only include files matching size criteria
        Size specs: + larger, - smaller, = equal to
        Specs can be mixed, e.g. size+=:100k will
        only include files 100KiB or more in size.

(So the size+=:100000k in the command above limits matching to files of roughly 100 MB and larger.)
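Once jdupes has run, you can sanity-check the result with plain find (standard find, nothing jdupes-specific; running it against a disk mount like /mnt/disk1 rather than /mnt/user makes sure you're reading real link counts rather than the FUSE view):

find /mnt/disk1/<share> -type f -links +1 | head

-links +1 matches files whose hardlink count is greater than one, i.e. files that now share their data with at least one other path. Comparing df -h output from before and after the run shows how much space was actually reclaimed.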

 

