How do I find duplicates on multiple discs?



I used unbalance to move some files, and it failed to delete the old files, so now I have duplicates on multiple discs. How do I find and remove the duplicates?

 

I installed dupeguru, but it only looks for duplicates on shares, and these don't show up there. The share shows only one copy of each file, no duplicates.

 

I have started using MC to move files from one disc to another, which works, but it would mean literally moving every file from every disc to another disc to catch the duplicates, and that is just not practical.

 

Any help is much appreciated. I've got terabytes of duplicates now, and need to free up the space.


If you look into Dupeguru a bit more I think you'll find it can do what you need.

I have used it successfully for exactly this. I believe the trick is to set your /storage/ mount point to, say, /mnt/, which gives you access to all the disks. Then remember to address them through /mnt/disk# within the app.

Another way you could approach this: let's say I have two directories, /mnt/disk1/movies and /mnt/disk2/movies, and I need to show duplicate file names. I'd do this:

find /mnt/disk1/movies -type f >> /root/duplicates

find /mnt/disk2/movies -type f >> /root/duplicates

 

What these steps do is list ALL files in the two directories and append them to a text file at /root/duplicates. Because >> appends, delete that file first if it is left over from an earlier run, or every entry will show up as a duplicate. You might need to put this file on a disk instead, depending on your RAM situation and how big it gets. Then it's a simple matter of sorting and finding the duplicates. The pitfall is that we need to strip the disk part of each path or nothing will match. There are many ways to do this, but for my simple situation I'll use cut (this keeps '/'-separated fields 4 to 10; if your paths go deeper than that, increase the 10 below, or use -f 4- to keep everything to the end of the line), so:

cut -d'/' -f 4-10 /root/duplicates | sort | uniq -d

 

That'll output a list like:

movies/my duplicate file here.mkv

movies/another duplicate.mkv
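
If it's unclear why the cut step uses field 4, here is a small sketch you can run anywhere without touching your disks. The three sample paths are made up for illustration and stand in for the contents of /root/duplicates; it uses the open-ended -f 4- form, which behaves like -f 4-10 without the depth limit.

```shell
# Hypothetical sample paths standing in for the contents of /root/duplicates.
sample_paths='/mnt/disk1/movies/a.mkv
/mnt/disk2/movies/a.mkv
/mnt/disk2/movies/only on disk2.mkv'

# The path starts with '/', so field 1 is empty, field 2 is "mnt" and
# field 3 is "diskN". Fields 4 onward are the path relative to the disk.
dupes=$(printf '%s\n' "$sample_paths" | cut -d'/' -f 4- | sort | uniq -d)

# uniq -d keeps only lines that occur more than once after sorting.
printf '%s\n' "$dupes"    # prints: movies/a.mkv
```

Only the path that exists on both disks survives; the file that lives on disk2 alone is dropped from the list.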

 

Now this is the part that can go horribly wrong if you don't think about what you're doing.

If this list looks right and you've got good backups, you can just delete the duplicates via a script.

FAIR WARNING: test this thoroughly with echo in place of rm before proceeding. First change to the disk you want to remove the duplicates from (this will limit your deletions to disk1 for now):

cd /mnt/disk1

 

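Notice that the paths output by the cut command above are relative, so they should 'work' from here. If they don't, or that doesn't make sense, stop and reassess. Once the paths make sense and look correct we can script the deletions, but first we verify it all using echo. Reading the list line by line and quoting "$a" keeps file names with spaces in one piece (a plain for loop over $(...) would split them into separate words):

cut -d'/' -f 4-10 /root/duplicates | sort | uniq -d | while IFS= read -r a; do echo rm -f "$a"; done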

 

This should output something similar to:

rm -f movies/my duplicate file here.mkv

rm -f movies/another duplicate.mkv

 

Stop here and assess whether this will do what you think it will. You can try one of the lines by hand (put the path in quotes if it contains spaces) and see if it works as intended. If so, excellent! Remove the 'echo' from the command above and, instead of printing, it'll just delete.
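
One more check worth considering before deleting: a matching relative path only tells you the names match, not the contents, and a failed move can leave a partial file behind. The sketch below is one way to filter the list down to files whose two copies are byte-identical, using cmp. The function name verified_dupes and its two arguments are my own invention for illustration, not part of any tool mentioned here.

```shell
# Reads relative paths on stdin (one per line) and prints only those whose
# copies under both disk roots exist and are byte-for-byte identical.
verified_dupes() {
    keep_root=$1    # disk whose copy you intend to keep, e.g. /mnt/disk2
    drop_root=$2    # disk whose copy you intend to remove, e.g. /mnt/disk1
    while IFS= read -r rel; do
        if [ -f "$keep_root/$rel" ] && [ -f "$drop_root/$rel" ] &&
           cmp -s "$keep_root/$rel" "$drop_root/$rel"; then
            printf '%s\n' "$rel"
        fi
    done
}

# Example use (assumes the list built earlier in this post):
#   cut -d'/' -f 4-10 /root/duplicates | sort | uniq -d | verified_dupes /mnt/disk2 /mnt/disk1
```

cmp reads both files in full, so this will be slow across terabytes, but anything it prints is safe to treat as a true duplicate.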

For ultimate safety you could change the rm to, say, mv. If you had a backup or unassigned disk available, the command could be:

cut -d'/' -f 4-10 /root/duplicates | sort | uniq -d | while IFS= read -r a; do mv "$a" /mnt/disks/myunassigneddisk/ ; done
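
Be aware that the mv command above drops every file straight into the top of the target directory, so two files with the same name from different folders would collide. If you'd rather keep the folder layout, something along these lines might work. It is only a sketch: move_keep_tree is a made-up helper name, and it skips anything that already exists at the destination rather than overwriting it.

```shell
# Reads relative paths on stdin (one per line) and moves each file from
# src_root to dest_root, recreating its parent directories on the way.
move_keep_tree() {
    src_root=$1     # e.g. /mnt/disk1
    dest_root=$2    # e.g. /mnt/disks/myunassigneddisk
    while IFS= read -r rel; do
        [ -f "$src_root/$rel" ] || continue      # nothing to move
        [ -e "$dest_root/$rel" ] && continue     # never overwrite
        mkdir -p "$dest_root/$(dirname "$rel")"
        mv "$src_root/$rel" "$dest_root/$rel"
    done
}

# Example use (assumes the list built earlier in this post):
#   cut -d'/' -f 4-10 /root/duplicates | sort | uniq -d | move_keep_tree /mnt/disk1 /mnt/disks/myunassigneddisk
```

As with the rm version, try it on one or two files first and check the result before running it over the whole list.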

 

Just a few ideas - you may need to adjust some of the commands to meet your use case.

Hope it helps,

Del
