January 15, 2023 I recently moved to a new array with new drives and used unbalance to copy data onto them. I now notice that there are duplicate files on different disks, as shown in the attachments. Is there a script or plugin to sort this out, or do I have to go in manually, find them, and delete them from the disks? Thanks!
January 18, 2023 Author No, that's not my issue. I used it, and it will find duplicates within the shares. Unraid has several copies of the same files spread across the actual disks, but the share shows only one file. For example, 'file1' is on Disk 1 and Disk 2 under Videos. Looking at the share, it shows only 'file1', yet it is taking 2x the disk space. Does that make sense?
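For anyone trying to picture the 'file1' case above, here is a minimal sketch of the check, not an official Unraid tool. It assumes the array disks are mounted at /mnt/disk1, /mnt/disk2, and so on, and the function name find_cross_disk_duplicates is made up for illustration. It compares relative paths only, so it reports files that a share would merge into one entry; it does not compare contents and does not delete anything.

```python
import glob
import os
from collections import defaultdict


def find_cross_disk_duplicates(disk_roots):
    """Return {relative_path: [disk roots]} for files that exist at the
    same relative path on more than one disk."""
    seen = defaultdict(list)
    for root in disk_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                # Path relative to the disk root, e.g. "Videos/file1"
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                seen[rel].append(root)
    return {rel: roots for rel, roots in seen.items() if len(roots) > 1}


if __name__ == "__main__":
    # Assumption: array disks are mounted as /mnt/disk1, /mnt/disk2, ...
    disks = sorted(glob.glob("/mnt/disk[0-9]*"))
    for rel, roots in sorted(find_cross_disk_duplicates(disks).items()):
        print(rel, "->", ", ".join(roots))
```

Run it read-only first and review the list by hand before removing anything from a disk.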
January 18, 2023 Community Expert Solution https://forums.unraid.net/topic/74760-how-do-i-find-duplicates-on-multiple-discs/?do=findComment&comment=688886
January 18, 2023 Author Well, that's not gonna happen lol. I will work on my plan B: move the directories with unbalance to a new drive and overwrite. Thanks again though.
January 18, 2023 That link contained info for dupeGuru, and there was also a second link (one I was looking for to give you and couldn't find) for a script itimpi created, which you can find here: https://forums.unraid.net/topic/33535-unraidfindduplicatessh/
January 19, 2023 Author 19 hours ago, klepel said: "That link contained info for dupeGuru, and there was also a second link (one I was looking for to give you and couldn't find) for a script itimpi created, which you can find here: https://forums.unraid.net/topic/33535-unraidfindduplicatessh/" Thanks! It helped me find the ones I had missed. Appreciate it!
March 10, 2023 On 1/18/2023 at 11:23 AM, JorgeB said: "https://forums.unraid.net/topic/74760-how-do-i-find-duplicates-on-multiple-discs/?do=findComment&comment=688886" As an FYI, I find Czkawka from Docker Hub works best.
February 18, 2024 Thanks for the clarification on shares vs. disks usage. It helped validate that this was the tool I wanted to use for this, rather than an external one, which would have inherent limitations.

I thought I didn't need to do this and was just checking. My process is very clean, so I shouldn't have needed to. Despite that, I still found a fair number of dupes. This was a good reminder of the old accountant's saying: 99% correct is 1% wrong. It doesn't mean your process doesn't work. It's a reminder of why there are checks and balances. I definitely think it's worth running occasionally, probably first on shares and then on disks.

Observations: this takes a few gigs of memory with larger data sets when matching on filenames alone, and can probably consume even more with more complicated data sets or filters. It takes some time to process, so it's probably best to fire it off (before you run out of space) and return later. Dedupe with care. Good luck.

Edit: Most of these were simple name-collision dupes (e.g. "Book 1") 😂🤦♂️ but enough were real to be worth the effort.

Edited February 18, 2024 by ixit
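On the name-collision point: one way to separate files that merely share a name from real duplicates is to compare content hashes before deleting anything. The sketch below is only an illustration under that assumption and is not how any of the tools mentioned in this thread work internally. The helper names file_digest and split_real_dupes are made up, and the input is assumed to be a list of candidate paths that some filename-based scan already flagged.

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def split_real_dupes(paths):
    """Group candidate paths by content digest and return only the groups
    with more than one file, i.e. the byte-identical copies."""
    by_digest = defaultdict(list)
    for p in paths:
        by_digest[file_digest(p)].append(p)
    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Files that share a name but end up in no group are name collisions only and should be left alone. Hashing reads every byte, so on large media files this will take a while.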