unRAIDFindDuplicates.sh



Did the test duplicate file that you created exist at the same path on different disks?

 

So if you have a duplicate file at /mnt/disk1/TV/series/file and /mnt/disk2/TV/series/file (just the name has to match, not the contents) then this script will report it as a duplicate.   This often happens if you re-organise the data in a share by copying rather than moving the files between disks.

 

Because the contents of the two disks in the above example are merged together under /mnt/user/TV to create the TV share, only one version of the file is visible at /mnt/user/TV/series/file. That is why it is useful to identify which files are duplicated and remove the duplicates.
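To make the "same relative path on more than one disk" idea concrete, here is a minimal sketch. It is not how unRAIDFindDuplicates.sh is implemented; `find_same_path_dupes` is a hypothetical helper, and it assumes the disks are mounted as disk1, disk2, ... under one parent directory and that no file name contains a newline.

```shell
#!/bin/bash
# Sketch only: list relative paths that exist on more than one array disk.
# find_same_path_dupes is a hypothetical name, not part of unRAIDFindDuplicates.sh.
find_same_path_dupes() {
    local root="${1:-/mnt}" disk   # parent of the disk1, disk2, ... mount points
    for disk in "$root"/disk*/; do
        [ -d "$disk" ] || continue
        # Print each file's path relative to its disk, e.g. TV/series/file
        (cd "$disk" && find . -type f | sed 's|^\./||')
    done | sort | uniq -d          # keep only paths seen on two or more disks
}
```

Calling `find_same_path_dupes /mnt` would print one line per duplicated path. Only names are compared, never contents, which matches the behaviour described above.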

 

I suspect that you were expecting it to report files which have the same name and/or content in random locations. Unfortunately, this script does not address that problem.

  • 7 months later...
6 hours ago, EvilSpice said:

Thanks for this script. After a week-long effort to convert several drives to XFS I found myself with multiple copies of many of my files. This tool made taking care of this issue simple and painless.

 

Thanks!

That was exactly the reason I wrote it in the first place as I ended up in the same situation :)

  • 5 months later...

Thanks for this.

 

I ended up creating a real problem for myself with the unbalance plugin (cancelling it in progress causes it to leave copies of files in place) and I needed a real utility to deal with the mess.

 

Unfortunately for me, while the output of this app is useful, I had far, far too many duplicates to deal with by hand.  So I wrote this little Windows (sorry) utility to automate the process for me:

 

Loading file... 
Buffy S01E10.mkv size mismatch!
Chef S01E04.mkv size mismatch!
Read 1515 / 2589 lines.
Finding size mismatches...
/mnt/disk11/media/series/Buffy the Vampire Slayer/Buffy S01E10.mkv : 99,975,168
/mnt/disk12/media/series/Buffy the Vampire Slayer/Buffy S01E10.mkv : 1,981,632,213
/mnt/disk6/media/series/Buffy the Vampire Slayer/Buffy S01E10.mkv : 1,981,632,213
/mnt/disk12/media/series/Chef/Chef S01E04.mkv : 487,030,784
/mnt/disk6/media/series/Chef/Chef S01E04.mkv : 1,045,681,530
Generating list of files that are safe to delete...
Script will delete 955 out of 1,515 files. Total space saved: 1,317,440,808,558 bytes (1.2 TB)
File C:\zzz\delete.txt successfully created.

Essentially, it parses the script's log file, flags which files are safe to leave in place and which are risky, and creates a script file that you can execute from the UnRAID console. The generated script looks like this:

echo "Keeping /mnt/disk11/media/series/Angel/Season 1/Angel S01E08.mkv"
rm "/mnt/disk12/media/series/Angel/Season 1/Angel S01E08.mkv"
rm "/mnt/disk6/media/series/Angel/Season 1/Angel S01E08.mkv"
echo "Keeping /mnt/disk11/media/series/Angel/Season 1/Angel S01E09.mkv"
rm "/mnt/disk12/media/series/Angel/Season 1/Angel S01E09.mkv"
rm "/mnt/disk6/media/series/Angel/Season 1/Angel S01E09.mkv"
echo "Keeping /mnt/disk11/media/series/Angel/Season 1/Angel S01E10.mkv"
rm "/mnt/disk12/media/series/Angel/Season 1/Angel S01E10.mkv"
rm "/mnt/disk6/media/series/Angel/Season 1/Angel S01E10.mkv"
echo "Keeping /mnt/disk11/media/series/Angel/Season 1/Angel S01E11.mkv"
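For anyone without Windows, the same idea can be sketched in bash. This is not the UnRAIDdeDupe source; `gen_delete_script` is a hypothetical helper. It assumes the listing is in long format (permissions, links, owner, group, size, date, path), that the first copy listed is the one to keep, and that bash 4 or later is available. Matching sizes do not prove matching contents, so review the generated script before running it.

```shell
#!/bin/bash
# Sketch only: turn a long-format duplicate listing (on stdin) into a
# reviewable delete script (on stdout). Hypothetical helper, not UnRAIDdeDupe.
gen_delete_script() {
    local line size path rel
    local -A first_path first_size mismatch extras
    local -a order=()
    while IFS= read -r line; do
        [[ $line == -* && $line == */mnt/* ]] || continue  # only regular-file entries
        read -r _ _ _ _ size _ <<< "$line"                  # 5th field is the size
        path="/mnt/${line#*/mnt/}"                          # path may contain spaces
        rel="${path#/mnt/*/}"                               # strip /mnt/diskN/
        if [[ -z ${first_path[$rel]+x} ]]; then
            first_path[$rel]=$path; first_size[$rel]=$size; order+=("$rel")
        elif [[ $size == "${first_size[$rel]}" ]]; then
            extras[$rel]+="$path"$'\n'                      # same size: candidate for removal
        else
            mismatch[$rel]=1                                # sizes differ: needs a human
        fi
    done
    for rel in "${order[@]}"; do
        if [[ -n ${mismatch[$rel]+x} ]]; then
            printf '# SKIPPED, size mismatch: %s\n' "$rel"
            continue
        fi
        [[ -n ${extras[$rel]+x} ]] || continue
        printf 'echo Keeping %q\n' "${first_path[$rel]}"
        while IFS= read -r path; do
            if [[ -n $path ]]; then printf 'rm -- %q\n' "$path"; fi
        done <<< "${extras[$rel]}"
    done
}
```

Used as `gen_delete_script < duplicates.txt > delete.sh`, where both file names are placeholders. Any path with a size mismatch is skipped entirely and left for manual inspection.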

In the hope that I can save others some time, I have included the app (binary and source) in this post.

 

Be careful, folks! I sincerely hope that no one loses their data with this thing.

UnRAIDdeDupe.zip

Edited by Excessus
  • 2 months later...
On 3/11/2019 at 2:57 PM, Excessus said:

So I wrote this little Windows (sorry) utility to automate the process for me: [...]

UnRAIDdeDupe.zip

Can you provide a simple guide on how to use this? I have the same issue thanks to unbalance and have 500+ GB of duplicated Steam games. I have downloaded your zip file, but am unsure what to do next.

 

EDIT:

 

I was able to successfully import the code into VS, and get it to run.

 

Thanks for this, just waiting on my first log file to finish. May have questions then!

Edited by mihcox
  • 4 months later...
On 3/11/2019 at 8:57 PM, Excessus said:

So I wrote this little Windows (sorry) utility to automate the process for me: [...]

UnRAIDdeDupe.zip

I can't get it to work.
When I open "UnRAIDdeDupe\bin\Debug\UnRAIDdeDupe.exe" and import the file, I get an error message:

************** Exception Text **************
System.ArgumentOutOfRangeException: Index was out of range. Must be non-negative and less than the size of the collection.

How exactly should I run the script to generate the right output file for your program?
 

Edited by Wuast94
31 minutes ago, Wuast94 said:

I can't get it to work.
When I open "UnRAIDdeDupe\bin\Debug\UnRAIDdeDupe.exe" and import the file, I get an error message:

************** Exception Text **************
System.ArgumentOutOfRangeException: Index was out of range. Must be non-negative and less than the size of the collection.

How exactly should I run the script to generate the right output file for your program?
 

Your source file should look like this:

 

Quote

Duplicate Files
---------------

-rw-rw-rw- 1 nobody users 2050156830 Feb  7  2017 /mnt/disk11/media/series/Angel/Season 1/Angel S01E08.mkv
-rw-rw-rw- 1 nobody users 2050156830 Feb  7  2017 /mnt/disk12/media/series/Angel/Season 1/Angel S01E08.mkv

 

and not like this:

 

Quote

Duplicate Files
---------------

media/series/Angel/Season 1/Angel S01E08.mkv


media/series/Angel/Season 1/Angel S01E09.mkv


media/series/Angel/Season 1/Angel S01E10.mkv
 

 

It's been a while since I've needed to use the script, so I've forgotten which options I had to use to generate the former.

  • 5 months later...
On 10/10/2017 at 3:41 AM, itimpi said:

Have you given the script execute permission?     If you downloaded it to the flash drive this should be automatic (because it is FAT32 format) but will not be the case if put elsewhere.  Alternatively run it using the ‘sh’ command which does not require the script to have ‘execute’ permission.

I'm having the same issue today. I have this script on my flash drive, but I also get a permission denied error.

 

root@media:/boot/scripts# unRAIDFindDuplicates.sh -v
-bash: ./unRAIDFindDuplicates.sh: Permission denied

 

What am I doing wrong, and how do I fix it?

 

1 hour ago, JustinChase said:

I'm having the same issue today. I have this script on my flash drive, but I also get a permission denied error.

 

root@media:/boot/scripts# unRAIDFindDuplicates.sh -v
-bash: ./unRAIDFindDuplicates.sh: Permission denied

 

What am I doing wrong, and how do I fix it?

 

This is due to a security change that came in with the Unraid 6.8.x series, where files on the flash drive are no longer allowed to have execute permission. You can get around this by preceding the script name with the ‘bash’ command, e.g.
 

bash unRAIDFindDuplicates.sh -v
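If you would rather not remember which form to use, a small wrapper can decide for you. This is only a sketch: `run_script` is a hypothetical helper name, and it simply falls back to running the file through bash whenever the file is not executable.

```shell
#!/bin/bash
# Sketch only: run a script directly when it is executable, otherwise via bash.
# run_script is a hypothetical helper, not something Unraid provides.
run_script() {
    local script="$1"
    shift
    if [ -x "$script" ]; then
        "$script" "$@"          # execute permission is available
    else
        bash "$script" "$@"     # no execute permission needed, bash only reads the file
    fi
}
```

For example, `run_script /boot/scripts/unRAIDFindDuplicates.sh -v` would take the second branch on a flash drive where execute permission is not allowed.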

 

 

15 minutes ago, itimpi said:

This is due to a security change that came in with the Unraid 6.8.x series, where files on the flash drive are no longer allowed to have execute permission. You can get around this by preceding the script name with the ‘bash’ command, e.g.
 

bash unRAIDFindDuplicates.sh -v

 

 

That makes sense, thanks for letting me know.

 

Could you please add this to the first post? I'll have forgotten it in a year or two when I try to do this again, and I always look to the first post for instructions, so that would save me having to search the thread for this tidbit.

 

Thanks again, the script works great (once I get it to run).

  • 1 year later...

Hi,

I stumbled across this, and it looks like what I need to detect duplicates across disks under the logical share, so thanks.

When I run it without options, I get a "no duplicates" message after about 30s, so all good (I assume!).

I then went to run it with -c to double check:

 

bash unRAIDFindDuplicates.sh -c

 

And it immediately responds with the help text.

 

Q1: Is this a defect or am I doing something wrong?

 

I then tried:

 

bash unRAIDFindDuplicates.sh -z

 

It said no duplicates (again after about 30s), then sat there reporting nothing for a few minutes and eventually came back with lots of errors such as these (this is a subset):

 

ls: cannot access '/mnt/disk*//appdata/ESPHome/hot_water_system/.piolibdeps/hot_water_system/ESPAsyncTCP-esphome/examples/SyncClient/.esp31b.skip': No such file or directory
ls: cannot access '/mnt/disk*//appdata/FileBot/log/nginx/error.log': No such file or directory
ls: cannot access '/mnt/disk*//appdata/FileBot/xdg/cache/openbox/openbox.log': No such file or directory
ls: cannot access '/mnt/disk*//appdata/FileBot/.licensed_version': No such file or directory
ls: cannot access '/mnt/disk*//appdata/FileBot/error.log': No such file or directory
ls: cannot access '/mnt/disk*//appdata/Grafana-Unraid-Stack/data/influxdb/wal/telegraf/autogen/250/_01402.wal': No such file or directory
ls: cannot access '/mnt/disk*//appdata/Grafana-Unraid-Stack/data/influxdb/wal/_internal/monitor/258/_00094.wal': No such file or directory
ls: cannot access '/mnt/disk*//appdata/Grafana-Unraid-Stack/data/influxdb/wal/home_assistant/autogen/255/_00003.wal': No such file or directory
ls: cannot access '/mnt/disk*//appdata/Grafana-Unraid-Stack/data/loki/index/index_2573': No such file or directory
ls: cannot access '/mnt/disk*//appdata/Grafana-Unraid-Stack/data/loki/index/index_2520': No such file or directory
ls: cannot access '/mnt/disk*//appdata/Grafana-Unraid-Stack/data/loki/index/index_2525': No such file or directory
ls: cannot access '/mnt/disk*//appdata/Grafana-Unraid-Stack/data/loki/index/index_2551': No such file or directory
ls: cannot access '/mnt/disk*//appdata/Grafana-Unraid-Stack/data/loki/index/index_2552': No such file or directory
ls: cannot access '/mnt/disk*//appdata/Grafana-Unraid-Stack/data/loki/index/index_2579': No such file or directory
ls: cannot access '/mnt/disk*//appdata/Grafana-Unraid-Stack/data/loki/index/index_2609': No such file or directory

 

Q2: What is it trying to do when checking for zero length dupes that it isn't when running with no options?
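One observation on those errors: the path shown to `ls` still contains the literal `/mnt/disk*/`, which is what bash passes through when a glob matches nothing. That would happen if the file exists on no array disk at that moment, for example because it lives only on a cache pool or was removed between the scan and the check. The sketch below shows one way to guard against that. It is not the script's actual code; `list_copies` is a hypothetical helper.

```shell
#!/bin/bash
# Sketch only: list every copy of a relative path across the array disks,
# quietly skipping paths that no disk currently holds.
# list_copies is a hypothetical helper, not part of unRAIDFindDuplicates.sh.
list_copies() {
    local root="$1" rel="$2"
    local -a copies
    shopt -s nullglob                      # unmatched globs expand to nothing
    copies=("$root"/disk*/"$rel")
    shopt -u nullglob                      # note: also turns it off if it was already on
    [ "${#copies[@]}" -gt 0 ] || return 0  # nothing on any disk, so nothing to report
    ls -l -- "${copies[@]}"
}
```

With this guard, `list_copies /mnt "appdata/FileBot/error.log"` prints nothing when no disk holds the file, instead of an `ls: cannot access` error.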

 

I then ran it in verbose mode out of interest:

bash unRAIDFindDuplicates.sh -v

 

I noticed two things:

 

1 - this error half way through:

List duplicate files
unRAIDFindDuplicates.sh: line 373: verbose_to_bpth: command not found
checking /mnt/disk1

 

Q3: Will this error affect the actual dupe check? - I assume not.

 

2 - it doesn't seem to take into consideration the additional cache drives that can be defined in v6.9 (I have a second one called 'scratch')

 

Q4: Would you be willing to add something that can dynamically check for additional cache drive config and include in the no option execution?

 

I then tried the -D option to add the additional cache drive (/mnt/scratch) to be treated as an array drive, and it went a bit screwy!

 

bash unRAIDFindDuplicates.sh -v -D /mnt/scratch

 

Output (killed with ctrl-c in the end)

 

============= STARTING unRAIDFIndDuplicates.sh ===================

Included disks:
   /mnt/disk/mnt/scratch
   /mnt/disk1
   ...
...
List duplicate files
unRAIDFindDuplicates.sh: line 373: verbose_to_bpth: command not found
unRAIDFindDuplicates.sh: line 404: cd: /mnt/disk/mnt/scratch: No such file or directory
checking /mnt/disk/mnt/scratch
    [SHARENAMEREDACTED]
	...
Duplicate Files
---------------
**Looks like it's now listing every file below here (these may be genuine - TBC)**
...
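The "Included disks" line above suggests that whatever is passed to -D gets appended to /mnt/disk, which is how /mnt/scratch became /mnt/disk/mnt/scratch. As a sketch of how an argument like this could be normalised (a guess at a fix, not the script's actual code; `resolve_disk_arg` is a hypothetical name):

```shell
#!/bin/bash
# Sketch only: accept either a bare disk suffix or an absolute path.
# resolve_disk_arg is a hypothetical helper, not part of unRAIDFindDuplicates.sh.
resolve_disk_arg() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;            # already an absolute path, use as-is
        *)  printf '/mnt/disk%s\n' "$1" ;;   # bare suffix such as 1 or 12
    esac
}
```

Here `resolve_disk_arg /mnt/scratch` gives /mnt/scratch and `resolve_disk_arg 3` gives /mnt/disk3.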

 

I'm running 6.9.2 if that helps in any way.

 

Thanks!

John

Edited by johner
further tests, and clear questions call out, extra update on 2nd cache drive
  • 1 year later...
12 hours ago, ssean said:

Is this script still the best solution for finding duplicate files?

 

Please note that we are talking about files with the same name that are present in more than one location and are thus wasting space. 

Be interested in the response you get :)

 

I wrote that script a long time ago, but if there is still interest in it and its capabilities are not superseded by something else, I may look at reworking it into a plugin, which should make it friendlier to use.

  • 4 months later...
6 hours ago, flyize said:

Since that uses /mnt/user by default, how can it be set up to search each disk? One of the first replies in that thread seems to suggest that it won't, but maybe I'm missing something?

Change the /mnt/user mapping to /mnt, but be VERY careful not to go into the /mnt/user folder for finding dupes. It's either / or, and if you open up /mnt you are setting yourself up for a world of hurt if you allow it to search for dupes in /mnt/user AND /mnt/diskX or /mnt/poolname at the same time.
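To illustrate the "either / or" point, here is a minimal sketch that lists the top-level directories under /mnt while leaving out the merged user-share views, so a duplicate finder can be pointed at the physical locations only. `physical_roots` is a hypothetical helper, and it excludes only `user` and `user0`; anything else mounted under /mnt on your system is listed and may need excluding by hand.

```shell
#!/bin/bash
# Sketch only: print the directories under /mnt that are not user-share views.
# physical_roots is a hypothetical helper; adjust the exclusions for your system.
physical_roots() {
    local root="${1:-/mnt}" dir
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        case "$(basename "$dir")" in
            user|user0) continue ;;   # merged views of the same files, skip them
        esac
        printf '%s\n' "${dir%/}"
    done
}
```

Feeding only these paths to the search avoids every file being reported as a duplicate of its own /mnt/user view.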

  • 9 months later...
  • 3 months later...

Just wanted to do two things:

 

1. A big thank you to itimpi for the unRAIDFindDuplicates script. I have had a few copy/move errors over the last decade, and itimpi's script just found nearly 400GB of dupes scattered over my 42GB Unraid array.

 

2. I banged together a little script that reads the output of itimpi's script and deletes the dupes. Note that you must clean up itimpi's output file first: delete everything except the file paths. That is, remove any file size warnings (and the lines for the files those warnings refer to), along with the lines at the beginning of duplicates.txt that look like this:

COMMAND USED:  ./unRAIDFindDuplicates.sh

Duplicate Files
---------------

 

Here is my script - I called it 'delete-dupes.sh'. Execute it like this: bash ./delete-dupes.sh '/boot/duplicates.txt'

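#!/bin/bash
# Usage: bash ./delete-dupes.sh /boot/duplicates.txt
# Each non-empty line of the list must be a path relative to /mnt/user.

# Check that the list file was given and exists
if [ ! -f "$1" ]; then
    echo "File not found!" >&2
    exit 1
fi

# Read the file line by line (IFS= and -r keep spaces and backslashes intact)
while IFS= read -r line; do
    # Skip empty lines
    if [ -n "$line" ]; then
        # Prepend "/mnt/user/" to the line
        path="/mnt/user/$line"
        # Delete the file and report what was removed
        rm -v "$path"
    fi
done < "$1"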

 

Be careful. If you execute the delete-dupes script twice in a row it will delete the remaining (now unique) files.
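One way to make a second run harmless is to look at the disks directly and only remove a copy while at least two copies still exist. The following is a sketch under assumptions, not a drop-in replacement: `prune_copies` is a hypothetical helper, it keeps whichever copy sorts first in glob order (disk1, disk10, disk11, disk2, ...), it only looks under disk* directories, and it defaults to a dry run. It also compares contents with `cmp`, which is slow on large files but avoids deleting a truncated or different copy.

```shell
#!/bin/bash
# Sketch only: remove extra copies of one relative path, never the last copy.
# prune_copies is a hypothetical helper. Third argument: 1 = dry run (default), 0 = delete.
prune_copies() {
    local root="$1" rel="$2" dry_run="${3:-1}" copy
    local -a copies
    shopt -s nullglob
    copies=("$root"/disk*/"$rel")          # every physical copy, in glob order
    shopt -u nullglob
    [ "${#copies[@]}" -ge 2 ] || return 0  # zero or one copy left: nothing to do
    echo "Keeping ${copies[0]}"
    for copy in "${copies[@]:1}"; do
        if ! cmp -s "${copies[0]}" "$copy"; then
            echo "Contents differ, leaving $copy"
        elif [ "$dry_run" = 1 ]; then
            echo "Would remove $copy"
        else
            rm -v -- "$copy"
        fi
    done
}
```

A loop such as `while IFS= read -r line; do prune_copies /mnt "$line"; done < /boot/duplicates.txt` would print what it intends to do. Pass 0 as the third argument only once you are happy with that output.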

 

I had thousands of files that were duplicated; without the script I would have been manually deleting duplicate files for weeks.

 

Thanks again, itimpi!
