airy52 Posted November 12, 2021 (edited)

Like many of you, I use my Unraid server for Plex and media downloading/management. Recently I discovered that hardlinks weren't working properly, and it turned out to be because I was downloading to a different folder mapping than the storage mapping. I set everything to the same /Media path and it's working now, but I have a LOT of old data that is now duplicated between the downloads folder (/Media/Sonarr/Downloads) and the place Sonarr organized it into afterwards (/Media/TV and /Media/Anime).

I've read about tools like Czkawka (https://github.com/qarmin/czkawka) and DupeGuru (https://github.com/arsenetar/dupeguru) that will help me find the duplicate files, hardlink (or symlink? softlink?) them, and remove the duplicates. I want to do this, but I only have enough Linux knowledge to do basics or follow instructions.

My main concerns: some files in the downloads folder might be duplicates but no longer sit on the same drive as the organized copy (I have 2 drives + 1 parity + cache), and I think that will be an issue? Also, I'm not familiar with the inner workings of Unraid and how it presents multiple drives as one folder in /mnt/user/, and I don't want to break it by running something not intended for this configuration.

So my question: can any of you help me figure out how to do this properly with any of these (or other) tools?

Edited November 19, 2021 by airy52 (solved)
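On the cross-drive concern: a hardlink can only exist within a single filesystem, so a pair of copies that ended up on different physical disks can't be collapsed into one link; those would have to be handled another way (delete one copy, or move both onto the same disk first). A quick way to check whether two paths are already hardlinked is to compare their inode numbers and link counts. A minimal sketch, assuming hypothetical file names under the share layout from the post above:

    # Print inode number, hard-link count, and name for both copies.
    # Matching inode numbers (on the same underlying disk) mean the two
    # paths are already hardlinks of each other; a link count above 1
    # also hints that a file has another name somewhere.
    stat -c '%i %h %n' \
      "/mnt/user/Media/Sonarr/Downloads/episode.mkv" \
      "/mnt/user/Media/TV/Show/Season 01/episode.mkv"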
airy52 Posted November 19, 2021 (Author)

For anyone who finds this in a search: I used jdupes, included in NerdPack/NerdTools. The command I used was:

    jdupes -QrL -X size+=:100000k /mnt/user/Media/

Drop the -Q (quick mode, which compares file hashes instead of doing a direct binary comparison) if you don't mind it taking longer, or if your data is nuclear launch codes or something.
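If you're nervous about -L rewriting things, one cautious approach (a sketch, not something from the posts above) is to run the same match criteria without -L first, so jdupes only prints the duplicate sets, and re-run with -L once the preview looks right:

    # Preview: list duplicate sets of ~100 MB and up, change nothing
    jdupes -r -X size+=:100000k /mnt/user/Media/ > /tmp/dupes-preview.txt

    # Review /tmp/dupes-preview.txt, then hard link for real
    jdupes -rL -X size+=:100000k /mnt/user/Media/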
SkilledAlpaca Posted April 4, 2024

Apologies for the necro post, but this is one of the few threads I've found with a solution for deduplication, and I wanted to expand on OP's command above, mainly spelling out its flags so you know what it's doing without blindly running it.

jdupes usage: https://codeberg.org/jbruchon/jdupes#usage

    jdupes -QrL -X size+=:100000k /mnt/user/<share>

-Q --quick           skip byte-by-byte duplicate verification. WARNING: this may delete non-duplicates! Read the manual first!
-r --recurse         for every directory, process its subdirectories too
-R --recurse:        for each directory given after this option, follow subdirectories encountered within (note the ':' at the end of the option; see the manpage for details)
-L --link-hard       hard link all duplicate files without prompting
-X --ext-filter=x:y  filter files based on the specified criteria; use '-X help' for detailed extfilter help

    size[+-=]:size[suffix]   only include files matching the size criteria.
                             Size specs: + larger, - smaller, = equal to.
                             Specs can be mixed, e.g. size+=:100k includes only files 100 KiB or larger.
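Once a run finishes, you can sanity-check the result with standard tools. A hardlinked duplicate no longer consumes extra space, and its link count rises above 1. A minimal sketch (not from this thread), run against an individual disk path such as /mnt/disk1 rather than the merged /mnt/user view, since the FUSE user share layer sits between you and the real inodes:

    # Count files that now carry more than one hard link
    find /mnt/disk1/Media -type f -links +1 | wc -l

    # du counts hardlinked copies only once, so disk usage should now
    # be well below the sum of the individual file sizes
    du -sh /mnt/disk1/Media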