CowboyRedBeard

1 TB Of File Mess


So... My wife, a photographer on the side, has over the years taken thousands and thousands of pictures. She's got folder after folder of iPhone camera roll backups, photo shoots, etc. And she's made a total mess with nested copies of folders, sometimes with the same file in like 3 different formats (CR2, RAW, JPG, etc).

 

It's finally filled up her computer, to the tune of 1TB of this stuff. Literally.

 

I've backed it all up to my unRAID server.

 

There's no way she'll ever be able to go through all of it manually; there are literally over 250,000 files.

 

Is there a decent "duplicate file" or even "duplicate image" docker out there that would make life easier for me to get rid of stuff?

 

My first pass at this will be to consolidate her 8 years or more of iPhone photo backups, which I'm sure contain multiple copies of things. But after that, I'm wondering if there are any good tools to help me get started straightening this mess out once and for all?


I'd like to be diligent about it, since pictures of our kids are mixed in there with other people's kids I don't even know.

 

Ughh...

 

Thoughts?

 

Posted (edited)
59 minutes ago, CowboyRedBeard said:

Is there a decent "duplicate file" or even "duplicate image" docker out there that would make life easier for me to get rid of stuff?

 

Yup, dupeGuru is a good place to start -- it's a duplicate picture finder with options to delete. One caveat - I don't know how many different picture formats it supports; it's been a while since I've used it.

 

There's an unRAID Docker template for it. Template URL: https://raw.githubusercontent.com/jlesage/docker-templates/master/jlesage/dupeguru.xml

Support thread: https://forums.lime-technology.com/topic/56392-support-dupeguru/ 

 

The URLs are just copy/pasted from my template, but I'm sure I got it from the Community Apps plug-in.

 

As for a metadata album and cataloging application, sorry, I don't have a good answer there.

Edited by Jcloud


But the question here is whether she should look for duplicate pictures, as in "showing the same image", or just look for duplicated binary files.

 

If she has both JPEG and RAW image files of the same photo, then she is likely to want to keep both formats.

 

But if she post-processes RAW files and has made 10 different backups while she gradually processes an image, then it's quite likely that she does not want to keep the half-edited photos. So then it would be good to have software that compares the image as displayed instead of doing a binary compare of the files.
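For the plain "duplicated binary files" case, here is a minimal Python sketch, assuming nothing beyond the standard library. It only finds byte-identical copies (files are grouped by size first, then by SHA-256), so it will not match a RAW against its JPEG or a half-edited version against the final one. The function names are made up for illustration and this is not how dupeGuru or any other tool works internally.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_binary_duplicates(root):
    """Return groups (sorted lists of paths) of byte-identical files under root."""
    # Pass 1: bucket by size, which is cheap and rules out most files.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have an identical twin
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Calling find_binary_duplicates on the top of the photo share returns the groups without touching anything, so the list can be reviewed before any file is deleted.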

Posted (edited)

All good questions/points. I don't know if I have a good response, other than to try dupeGuru and see if it's useful to you. It has a number of detection methods and an option to just send dupes to a "trashcan" if done right (which gives you a chance to recover from a "whoops, wrong category"). I'm trying to recall off the top of my head, but it has EXIF compare, same/similar file names, and picture blocks.

 

Also the program steps through everything so you'll get a chance to review files before they're nuked by the app.

Edited by Jcloud


Perhaps your problem is less severe than you think? I have about 10 TB of my dear wife's images on my server. Many millions taken in RAW, JPG and edited versions in PSD and TIFF.

 

Trying to delete duplicates is an exercise in futility. Space is cheap.

 

Trying to organize it is also a waste of time. Every way you try to do it is wrong in some way. What I need is an image search engine that makes it easy to find what you are looking for. Face recognition helps, but it would be nice to be able to query "Show me all the images of the cousins together".

 

I am still looking....

 

 

 

 

 

 


Thanks, I'll take a look at that. I think it'd be good for the iPhone image directories for sure... You make good points about the formats, but half of these are a decade old at this point, so I doubt she'll be editing them again. We'll probably just keep the JPEG and RAW in case someone wants a copy of something redone.


OK, reporting in that dupeGuru is A W E S O M E

 

Also, the Krusader disk usage tool is VERY handy to quickly visualize where the biggest files are.


3 hours ago, CowboyRedBeard said:

Also, the Krusader disk usage tool is VERY handy to quickly visualize where the biggest files are.

If you're looking for a visualization of the biggest files, take a look at QDirStat, a WinDirStat/GrandPerspective clone - if you think that will help you.

Glad to read the solution set is working for you.

 

Have a good one.

3 minutes ago, Jcloud said:
3 hours ago, CowboyRedBeard said:

Also, the Krusader disk usage tool is VERY handy to quickly visualize where the biggest files are.

If you're looking for visualization of biggest files, take a look at QDirStat a WinDirStat/GrandPerspective clone - if you think that will help you. 

Potato, potahto. The Krusader disk usage tool IS a visual tree size viewer. No real need for others when it's built right in. No, it doesn't show the entire volume at once, but it shows the relative size of each item in the current path, and allows you to drill down.

 

I tend to use the other tree viewer tools exactly the same way, find the first largest area I'm curious about, and drill down.
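If anyone wants the same "size of each item in the current path, then drill down" view from a script, here is a rough Python sketch using only the standard library. It is an illustration of the idea, not how Krusader or QDirStat compute their numbers, and the function names are invented for this example.

```python
import os


def tree_size(path):
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for dirpath, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def child_sizes(path):
    """Return (name, bytes) for each entry directly under path, largest first."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_symlink():
                continue  # do not follow links out of the tree
            if entry.is_dir():
                sizes.append((entry.name, tree_size(entry.path)))
            elif entry.is_file():
                sizes.append((entry.name, entry.stat().st_size))
    # Largest first; ties broken by name so the order is stable.
    return sorted(sizes, key=lambda item: (-item[1], item[0]))
```

Run child_sizes on the top folder, pick the largest entry, and call it again on that subfolder to drill down.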

Posted (edited)
16 minutes ago, jonathanm said:

Potato, potahto. The Krusader disk usage tool IS a visual tree size viewer

I'll have to go looking for that. I didn't know it was there, hence my suggestion. Your other post, where a guy was looking for WinDirStat, makes a whole lot more sense to me now.

I guess we suggest the tools we know and use. Some people just have a better tool-kit/knowledge base (suggesting yours is >> mine). :D

 

Edited by Jcloud


A million photos??! This means like 100 photos a day for 2.5 years.

 

I would imagine this is what she does for a living. I would think there are high-end image cataloging database software packages. But I assume that she would have to be diligent about entering metadata. Otherwise it would be like finding needles in a haystack.

 

I recall trying out Piwigo and Razuna to attempt cataloging my family photos. It was not easy/reliable and I gave up. But I didn’t try that hard either.

 

H.

 

 

33 minutes ago, hernandito said:

A million photos??! This means like 100 photos a day for 2.5 years.

100 photos/day for 2.5 years would only be just over 90,000 photos. But if the camera produces JPEG+RAW and there are 3-5 edits/image, then it can exceed 500,000 files. And for people taking photos of animals or humans, it's common to set the camera to take multiple shots - it helps a lot to get images where people don't blink or where the animal/baby looks most goofy. So it's not much of a problem to reach millions of image files even if the actual output might be 1-10 images/day.
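A quick back-of-the-envelope version of that arithmetic in Python. One assumption that is mine and not a measured figure: each edit is saved as its own file on top of the RAW and JPEG pair, so a shot with 3 edits is 5 files.

```python
shots_per_day = 100
years = 2.5

# 100 shots/day * 365 days * 2.5 years
shots = int(shots_per_day * 365 * years)  # 91250


def files_per_shot(edits, raw_plus_jpeg=True):
    """Files on disk per shot: RAW+JPEG (or one file) plus one file per saved edit."""
    base = 2 if raw_plus_jpeg else 1
    return base + edits


low = shots * files_per_shot(3)   # 3 edits per shot -> 456250 files
high = shots * files_per_shot(5)  # 5 edits per shot -> 638750 files

print(shots, low, high)
```

Burst shooting multiplies the shot count itself, which is how the total climbs toward the millions.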

 

40 minutes ago, hernandito said:

I would think there are high end image cataloging database type software packages.

Lots of people use Lightroom just because it contains a good database for handling the photos. The software is cheap, and it isn't necessary to use Lightroom for actually editing the photos.

