1 TB Of File Mess


So... my wife, a photographer on the side, has taken thousands and thousands of pictures over the years. She's got folder after folder of iPhone camera roll backups, photo shoots, etc. And she's made a total mess: nested copies of folders, and sometimes the same file in like 3 different formats (CR2, RAW, JPG, etc.).

It's finally filled up her computer, to the tune of 1TB of this stuff. Literally.

I've backed it all up to my Unraid server.

There's no way she'll ever be able to go through all of it manually; there are literally over 250,000 files.

Is there a decent "duplicate file" or even "duplicate image" docker out there that would make life easier for me to get rid of stuff?

My first pass will be to consolidate her 8+ years of iPhone photo backups, which I'm sure contain multiple copies of things. After that, are there any good tools to help me get started straightening this mess out once and for all?
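
For the iPhone backup consolidation, byte-identical copies are the easy first target. Below is a minimal Python sketch, assuming the backups are reachable as ordinary files (for example over an Unraid share). It buckets files by size and only hashes same-size candidates with SHA-256. `find_exact_duplicates` and `file_digest` are hypothetical helper names, not part of any tool mentioned in this thread, and the sketch only reports groups; it deletes nothing.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so big RAW files don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: str) -> list[list[Path]]:
    """Return groups of files under `root` whose bytes are identical.

    Files are first bucketed by size (cheap); only same-size files get hashed.
    Symlinks are skipped so a linked file is not reported as its own duplicate.
    """
    by_size: dict[int, list[Path]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            p = Path(dirpath) / name
            if p.is_file() and not p.is_symlink():
                by_size[p.stat().st_size].append(p)

    groups: list[list[Path]] = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have an exact duplicate
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for p in paths:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Usage would be something like `find_exact_duplicates("/mnt/user/photos")`, where the share name is hypothetical; review each group and decide which copy to keep before removing anything.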


I'd like to be diligent about it, since pictures of our kids are mixed in there with people's kids I don't even know.

Ughh...

Thoughts?

59 minutes ago, CowboyRedBeard said:

Is there a decent "duplicate file" or even "duplicate image" docker out there that would make life easier for me to get rid of stuff?

Yup, DupeGuru is a good place to start; it's a duplicate picture finder with options to delete. One caveat: I don't know how many different picture formats it supports, as it's been a while since I've used it.

There's an unRAID Docker template for it. Template URL: https://raw.githubusercontent.com/jlesage/docker-templates/master/jlesage/dupeguru.xml

Support thread: https://forums.lime-technology.com/topic/56392-support-dupeguru/

The URLs are just copy/pasted from my template, but I'm sure I got it from the Community Apps plug-in.

As for a metadata/album cataloging application, sorry, I don't have a good answer there.

Edited by Jcloud

But the question here is whether she should look for duplicate pictures, as in "showing the same image", or just look for duplicated binary files.

If she has both JPEG and RAW files of the same photo, then she will likely want to keep both formats.
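
Before any dedupe pass, it can help to recognize JPEG+RAW pairs so they are treated as one photo in two formats rather than as duplicates. A minimal sketch, assuming the camera gives both files the same base name in the same folder (e.g. `IMG_0001.CR2` and `IMG_0001.JPG`). The extension sets and the function names are hypothetical and should be adjusted to her actual gear.

```python
from collections import defaultdict
from pathlib import PurePath

# Assumed extension sets; adjust to whatever her cameras actually produce.
RAW_EXTS = {".cr2", ".cr3", ".nef", ".arw", ".dng", ".raw"}
JPEG_EXTS = {".jpg", ".jpeg"}


def group_format_siblings(paths):
    """Map (folder, filename stem) -> set of lowercase extensions seen.

    IMG_0001.CR2 and IMG_0001.JPG in the same folder land under one key,
    so they read as one photo in two formats, not as two duplicates.
    """
    groups = defaultdict(set)
    for p in map(PurePath, paths):
        groups[(str(p.parent), p.stem)].add(p.suffix.lower())
    return dict(groups)


def raw_plus_jpeg(groups):
    """Keys of photos that exist as both a RAW file and a JPEG."""
    return sorted(k for k, exts in groups.items()
                  if exts & RAW_EXTS and exts & JPEG_EXTS)
```

A duplicate finder can then be pointed at files across folders while leaving each same-folder RAW/JPEG pair alone.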

But if she post-processes RAW files and has made 10 different backups while gradually working on an image, then she quite likely doesn't want to keep the half-edited photos. In that case it would be good to have software that compares the images as displayed instead of doing a binary compare of the files.
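
One way to "compare the image as displayed" is a perceptual hash. Below is a minimal, stdlib-only sketch of a difference hash (dHash); it is a swapped-in technique for illustration, not necessarily what DupeGuru does internally. It assumes each image has already been decoded, converted to grayscale and shrunk to 9×8 pixels (for JPEGs, Pillow's `Image.open(path).convert("L").resize((9, 8))` is one way); that decoding step is left out. Files whose hashes differ in only a few bits are probably the same picture, possibly re-saved or lightly edited; the threshold is something you would tune.

```python
def dhash(gray_rows):
    """Difference hash of an 8-row x 9-column grayscale grid (values 0-255).

    Each bit records whether a pixel is brighter than its right-hand
    neighbour, so the hash follows the picture's structure, not its bytes.
    """
    if len(gray_rows) != 8 or any(len(row) != 9 for row in gray_rows):
        raise ValueError("expected 8 rows of 9 pixels")
    bits = 0
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits  # 64-bit integer


def hamming(a, b):
    """Number of differing bits; a small distance suggests similar images."""
    return bin(a ^ b).count("1")
```

Because only neighbour-to-neighbour brightness comparisons are kept, a uniform brightness change leaves the hash untouched, while a byte-level compare would call the two files different.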

All good questions/points. I don't know if I have a good response, other than to try DupeGuru and see if it's useful to you. It does have a number of detection methods, and an option to just send dupes to a "trashcan", which, if done right, gives you a chance to recover from a "whoops, wrong category". Trying to recall off the top of my head, but it has EXIF compare, same/similar file names, and picture blocks.

Also, the program steps through everything, so you'll get a chance to review files before they're nuked by the app.
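
The same "trashcan" idea can be applied to any home-made cleanup script: move files into a quarantine folder that mirrors the original layout instead of deleting them. A minimal sketch; `quarantine` is a hypothetical helper, not a DupeGuru feature. Keeping `trash_root` on the same filesystem lets `shutil.move` rename rather than copy.

```python
import shutil
from pathlib import Path


def quarantine(path, root, trash_root):
    """Move `path` into `trash_root`, keeping its location relative to `root`.

    Nothing is deleted, so a "whoops, wrong category" is undone by moving the
    file back. Raises ValueError if `path` is not under `root`, and
    FileExistsError rather than overwriting something already quarantined.
    Returns the new location.
    """
    src = Path(path).resolve()
    rel = src.relative_to(Path(root).resolve())
    dest = Path(trash_root) / rel
    if dest.exists():
        raise FileExistsError(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    return dest
```

Once the quarantine folder has sat untouched for a while, emptying it is the only step that actually loses data.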

Edited by Jcloud

Perhaps your problem is less severe than you think? I have about 10 TB of my dear wife's images on my server: many millions, taken in RAW and JPG, plus edited versions in PSD and TIFF.

Trying to delete duplicates is an exercise in futility. Space is cheap.

Trying to organize it is also a waste of time. Every way you try to do it is wrong in some way. What I need is an image search engine that makes it easy to find what you are looking for. Face recognition helps, but it would be nice to be able to query "Show me all the images of the cousins together".

I am still looking...

Thanks, I'll take a look at that; I think it'd be good for the iPhone image directories for sure... You make good points about the formats, but half of these are a decade old at this point, so I doubt she'll be editing them again. We'll probably just keep the JPEG and RAW in case someone wants a copy of something redone.

3 hours ago, CowboyRedBeard said:

Also, the Krusader disk usage tool is VERY handy to quickly visualize where the biggest files are.

If you're looking to visualize the biggest files, take a look at QDirStat, a WinDirStat/GrandPerspective clone, if you think that will help you.

Glad to read the solution set is working for you.

Have a good one.

3 minutes ago, Jcloud said:
3 hours ago, CowboyRedBeard said:

Also, the Krusader disk usage tool is VERY handy to quickly visualize where the biggest files are.

If you're looking to visualize the biggest files, take a look at QDirStat, a WinDirStat/GrandPerspective clone, if you think that will help you.

Potato, potahto. The Krusader disk usage tool IS a visual tree size viewer. No real need for others when it's built right in. No, it doesn't show the entire volume at once, but it shows the relative size of each item in the current path and allows you to drill down.

I tend to use the other tree viewer tools exactly the same way: find the largest area I'm curious about, and drill down.
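
The drill-down approach (look at the relative size of each item in the current path, then descend into the biggest one) is easy to reproduce in a script when no GUI is handy. A minimal Python sketch; the function names are hypothetical, and symlinks are skipped rather than followed.

```python
import os
from pathlib import Path


def tree_size(path: Path) -> int:
    """Total bytes of regular files under `path`; symlinks are not followed."""
    if path.is_symlink():
        return 0
    if path.is_file():
        return path.stat().st_size
    total = 0
    for dirpath, _dirs, files in os.walk(path):  # os.walk skips symlinked dirs by default
        for name in files:
            fp = os.path.join(dirpath, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def largest_children(path: str, top: int = 10):
    """(size_bytes, name) for each entry directly inside `path`, largest first."""
    sizes = [(tree_size(child), child.name) for child in Path(path).iterdir()]
    return sorted(sizes, reverse=True)[:top]
```

Calling `largest_children` again on the biggest entry it reports is the equivalent of drilling down one level.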

16 minutes ago, jonathanm said:

Potato, potahto. The Krusader disk usage tool IS a visual tree size viewer

I'll have to go looking for that; I didn't know it was there, hence my suggestion. Your other post, where a guy was looking for WinDirStat, makes a whole lot more sense to me.

I guess we suggest the tools we know and use. Some people just have a better toolkit/knowledge base (suggesting yours is >> mine). :D

Edited by Jcloud

A million photos??! This means like 100 photos a day for 2.5 years.

I would imagine this is what she does for a living. I would think there are high-end image cataloging database software packages, but I assume she would have to be diligent about entering metadata. Otherwise it would be like finding a needle in a haystack.

I recall trying out Piwigo and Razuna to catalog my family photos. It was not easy/reliable and I gave up, but I didn't try that hard either.

H.

33 minutes ago, hernandito said:

A million photos??! This means like 100 photos a day for 2.5 years.

100 photos/day for 2.5 years would only be just over 90,000 photos. But if the camera produces JPEG+RAW and there are 3-5 edits per image, then it's over 500,000. And for people photographing animals or humans, it's common to set the camera to take multiple shots; that helps a lot in getting images where people don't blink, or where the animal/baby looks the goofiest. So it's not much of a problem to end up with millions of image files, even if the actual output might be 1-10 images/day.
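
The back-of-the-envelope arithmetic, as a tiny Python sketch. It assumes a simple multiplicative model (shots × files per shot × saved versions per file), which is one reading of "JPEG+RAW and 3-5 edits/image" rather than a formula from the post; `estimated_files` is a hypothetical name.

```python
def estimated_files(shots_per_day: float, years: float,
                    files_per_shot: int = 1, versions_per_file: int = 1) -> int:
    """Rough file count under an assumed multiplicative model:
    shots/day x 365 days/year x years x files per shot x saved versions per file.
    Leap days are ignored; this is an estimate, not a census.
    """
    return int(shots_per_day * 365 * years * files_per_shot * versions_per_file)


print(estimated_files(100, 2.5))        # 91250: "just over 90,000"
print(estimated_files(100, 2.5, 2, 3))  # 547500: JPEG+RAW, 3 versions each
print(estimated_files(100, 2.5, 2, 5))  # 912500: JPEG+RAW, 5 versions each
```

Burst shooting multiplies the shot count itself, which is how the total can reach millions even at a modest daily output.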

40 minutes ago, hernandito said:

I would think there are high end image cataloging database type software packages.

Lots of people use Lightroom just because it contains a good database for handling photos. The software is cheap, and it isn't necessary to use Lightroom for actually editing the photos.