
Project: Duplicate file handling tool



Forked from a thread that was going off-topic.


I predict Joe L. will post a clever sed script any time now :)


Ok, copied from the other thread...


I split it over two lines for readability. You can put it all on one line.

grep "duplicate object" /var/log/syslog | cut -d" " -f8- | 
    sed -e "s/^\/[^\/]*\/[^\/]*\/\(.*\)/ls -l \/*\/*\/'\1'/" | sort -u | sh -


This should list every copy of each file that the user shares reported as a duplicate, i.e. files with the same relative path in parallel folders on the /mnt/disk?? drives.
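To see what the sed step actually produces, you can run it on a single path by hand. This is a minimal sketch that assumes cut has already reduced the log line to just the path; the Terminator path is the hypothetical example used later in this thread.

```shell
#!/bin/sh
# Hypothetical input: what "cut -d' ' -f8-" is assumed to leave of one log line.
path='/mnt/disk3/Movies/Action/Terminator.mpg'

# Same sed expression as in the pipeline above: drop "/mnt/diskN/",
# wrap the rest in single quotes and prefix it with "ls -l /*/*/".
cmd=$(printf '%s\n' "$path" |
  sed -e "s/^\/[^\/]*\/[^\/]*\/\(.*\)/ls -l \/*\/*\/'\1'/")

# This is the text that "sh -" would execute.
printf '%s\n' "$cmd"
```

It prints `ls -l /*/*/'Movies/Action/Terminator.mpg'`. The unquoted `/*/*/` is then expanded by the shell, so on a typical unRAID server it may also match the user-share view under /mnt/user and show one extra line per file. Because the path is wrapped in single quotes, a file name that itself contains a single quote will produce a broken command.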


If your syslog is HUGE, you may want to search just the tail end of it, like this:

tail -10000 /var/log/syslog | grep "duplicate object" | cut -d" " -f8- | 
    sed -e "s/^\/[^\/]*\/[^\/]*\/\(.*\)/ls -l \/*\/*\/'\1'/" | sort -u | sh -
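One caveat on `cut -d" " -f8-`: it counts single spaces, and classic syslog pads a one-digit day of the month with an extra space (e.g. `Jan  3`), which may shift the fields by one on the 1st through the 9th. Below is a minimal sketch of a variant that keys on the message text instead. It assumes each entry contains "duplicate object" followed by a full /mnt/diskN/... path; the function name `list_dupes` and the `MNT` variable are hypothetical, added for illustration and testing.

```shell
#!/bin/sh
# Sketch: list every disk copy of each file logged as a "duplicate object".
# Assumptions (not from the original post): the entry contains the text
# "duplicate object" followed by a full /mnt/diskN/... path, and the data
# disks are mounted as $MNT/disk1, $MNT/disk2, ... (MNT defaults to /mnt).
list_dupes() {
  log=${1:-/var/log/syslog}
  mnt=${MNT:-/mnt}
  # tail:       only the recent part of a large log.
  # sed -n ...p: print only matching lines, reduced to the path after /mnt/diskN/.
  # sort -u:    report each relative path once.
  # while/read: list that path on every data disk; quoting "$rel" keeps
  #             spaces and single quotes in names intact, and nothing is piped to sh.
  tail -n 10000 "$log" |
    sed -n 's|.*duplicate object[^/]*/mnt/[^/]*/\(.*\)|\1|p' |
    sort -u |
    while IFS= read -r rel; do
      ls -l "$mnt"/disk*/"$rel"
    done
}

# Usage: list_dupes /var/log/syslog
```

Restricting the glob to `disk*` keeps the user-share view out of the listing; if you also want a cache drive checked, it would have to be added to the pattern.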


The trick to regular expressions is all in knowing where to put the backslashes.
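One way to need fewer of them: sed's `s` command accepts almost any character as its delimiter, so choosing `|` instead of `/` means the slashes in the path no longer have to be escaped. A minimal sketch of the same substitution written that way (the sample path is hypothetical):

```shell
#!/bin/sh
# Same substitution as the pipeline above, but with "|" as the delimiter,
# so "/" is an ordinary character and needs no backslash.
# Only the \( \) group and the \1 back-reference still need backslashes
# (POSIX basic regular expressions).
path='/mnt/disk3/Movies/Action/Terminator.mpg'   # hypothetical example path

cmd=$(printf '%s\n' "$path" |
  sed -e "s|^/[^/]*/[^/]*/\(.*\)|ls -l /*/*/'\1'|")

printf '%s\n' "$cmd"
```

It prints the same `ls -l /*/*/'Movies/Action/Terminator.mpg'` command as the slash-delimited version.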


Joe L.


Added this to the UnRAID Add Ons wiki page, here.  Feel free to edit.


I think this needs more instruction, and an example, for new users. It could use a link to the original thread too.


This topic originally began here, in the "Spin down timers - are they in HDD firmware or stored in slackware" thread.


The easiest way to identify duplicates is to install the UnMENU addon, and use its Dupe files plugin.


If you don't install UnMENU, the syslog gives you *some* information, enough to figure out where the duplicates are. To locate them manually, find a file listed in the syslog as a "duplicate object", note its drive and path, then search the syslog for other copies of it on other drives. That gives you every copy except the first. The first copy has the same path, but is on a drive numbered LOWER than the lowest drive you found listed in the syslog. An example:


  /mnt/disk2/Movies/Action/Terminator.mpg  (the first copy is never flagged as a duplicate, so it will not be in the syslog)

  /mnt/disk3/Movies/Action/Terminator.mpg  (found in syslog as "duplicate object")

  /mnt/disk6/Movies/Action/Terminator.mpg  (found in syslog as "duplicate object")


The syslog will show that Terminator is duplicated twice, with copies on Disk 3 and Disk 6. From that you can conclude that there is a third copy, with the same path as the others, on either Disk 1 or Disk 2.
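The same reasoning can be scripted: strip the /mnt/diskN/ prefix from the path the syslog gives you, then check that relative path on every disk. A minimal sketch; the function name `find_copies` and the `MNT` variable are hypothetical, and it assumes data disks are mounted as /mnt/disk1, /mnt/disk2, and so on.

```shell
#!/bin/sh
# Sketch: given one "duplicate object" path from the syslog, print every disk
# copy of that relative path, lowest-numbered disk first.
# Assumption: data disks are mounted as $MNT/disk1 ... (MNT defaults to /mnt).
find_copies() {
  mnt=${MNT:-/mnt}
  # Remove the shortest leading "/x/y/" match, i.e. "/mnt/disk3/".
  rel=${1#/*/*/}
  for d in "$mnt"/disk*; do
    if [ -e "$d/$rel" ]; then
      # Prefix the disk number so the list can be sorted numerically
      # (a plain glob would put disk10 before disk2).
      printf '%s %s\n' "${d##*/disk}" "$d/$rel"
    fi
  done | sort -n | cut -d' ' -f2-
}

# Usage: find_copies /mnt/disk3/Movies/Action/Terminator.mpg
```

In the Terminator example this would list the Disk 2, Disk 3 and Disk 6 paths in that order, so the first line is the copy the user share actually serves.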


How does the duplicate handling work? Just checking file names? Or some kind of hash check?

It just checks the names. If files with the same name are in parallel folders on different disks, only the copy on the lowest-numbered disk is accessible through the user share. The others are logged to the syslog, but the log entries only tell you where the subsequent copies are, not where the first one is located. The script earlier in this thread finds the file with the same name in the parallel path on each of the disks.


The files can be completely different, or identical... It is up to you to figure out what to do with them.
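Since only names are compared, checking the contents is left to you. One way is `cmp`, which compares two files byte by byte. A minimal sketch, where the helper name `same_contents` is hypothetical and the copies are passed lowest-numbered disk first:

```shell
#!/bin/sh
# Sketch: report whether each copy of a duplicated file is byte-for-byte
# identical to the first (lowest-numbered disk) copy.
# Assumption: the copies are passed as arguments, first copy first.
same_contents() {
  first=$1
  shift
  for other in "$@"; do
    # cmp -s prints nothing; its exit status says whether the files match.
    if cmp -s "$first" "$other"; then
      echo "identical: $other"
    else
      echo "DIFFERENT: $other"
    fi
  done
}

# Usage: same_contents /mnt/disk2/Movies/Action/Terminator.mpg \
#          /mnt/disk3/Movies/Action/Terminator.mpg /mnt/disk6/Movies/Action/Terminator.mpg
```

`cmp -s` stops at the first differing byte, but for identical large media files it has to read every byte of both, so it can take a while.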

