Duplicate file logging



Can someone sanity-check this request, as I don't have a 5-series server at hand to check.

 

When unRAID detects a duplicate file it creates a log entry. This is very useful. However, the log entry only contains one side of the duplicate, not both.

 

Request: have the log entry show both duplicate paths for easy correction.


Great idea...

 

but as it stands you can only delete the one it gives you and hope the other (invisible) one exists and is the same quality...

that's why I would also like to see both locations...

so you can compare and check whether the other one is even good...

I think you have it backwards; from my recollection, it will list in the syslog the location of the duplicate that is not visible. (The file on the lowest-numbered disk is the one visible in the user share.)

 

 


Joe.

 

I am not the one to doubt you, as you have much more experience than I have,

but I think it is the latest-added copy that shows up in your logs.

I had a small hiccup due to my PSU, and while I was rebuilding my disk with a reiserfs rebuild-tree, my program SickBeard had updated all the missing folder.jpg / banner.jpg / tvshow.jpg files on the other disks.

When the disk was rebuilt and brought back online, the syslog literally had dozens of duplicates on the disk I brought back...

Now, it was disk 4... so you might be right... Anyway, I deleted all those jpg files, as I knew SickBeard would put the missing ones back :P

 

 


Good to know.

 

In a perfect world we would md5sum-compare them and just delete the 100%-match dupes.

What I know is that it's a complete pain in the ass to do manually, so there has to be a slicker way.

No. Just no. I do not want my array automatically deleting files for me. It would be nice if it did a binary compare and informed me whether or not they match, but leave the management decision to me.
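A minimal sketch of that compare-and-report approach, assuming nothing beyond the standard `cmp` utility; all paths and file names below are invented for the demo, and it only reports, never deletes:

```shell
# Compare two candidate duplicate copies byte-for-byte and report the
# result; never delete anything. In real use the two arguments would be
# the disk-level copies, e.g. /mnt/disk3/backup/afile and
# /mnt/disk5/backup/afile (hypothetical paths).
same_file() {
    if cmp -s "$1" "$2"; then
        echo "identical: $1 == $2"
    else
        echo "differ: $1 != $2"
    fi
}

# Demo on throwaway files standing in for the two disk copies:
tmp=$(mktemp -d)
printf 'same bytes' > "$tmp/a"
printf 'same bytes' > "$tmp/b"
printf 'other bytes' > "$tmp/c"
same_file "$tmp/a" "$tmp/b"    # reports "identical: ..."
same_file "$tmp/a" "$tmp/c"    # reports "differ: ..."
rm -rf "$tmp"
```

The deletion step stays with the user, which is the point of the objection above.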

Each to their own, but as far as I am concerned a file with the same md5sum, filename and path is just noise.

 

I estimate I now have over 2000 duplicate files, for some reason.

Noise to you, backup duplication for me. It provides a way to keep a copy of a file hidden from a user share so it won't get altered or deleted, and it means I'd have to lose 3 physical disks to lose that data. Like you said, to each their own. It's a useful function to me.

Good to know.

 

In a perfect world we would md5sum-compare them and just delete the 100%-match dupes.

What I know is that it's a complete pain in the ass to do manually, so there has to be a slicker way.

No. Just no. I do not want my array automatically deleting files for me. It would be nice if it did a binary compare and informed me whether or not they match, but leave the management decision to me.

 

 

It should be a separate plugin... but which file should get deleted?

 

 

Perhaps we should lobby for shfs to be altered so that the source file AND its duplicate are listed.

 

 

I would rather it showed me which files were duplicated and let me delete what's needed.


What about: "ls /mnt/disk*/path/filename"?

or the only slightly  ;) less intuitive:

tail -30000 /var/log/syslog | grep "duplicate object" | cut -d":" -f5-  | sed "s/^ //" | sort -u |  sed -e "s/^\\/[^\\/]*\\/[^\\/]*\\/\\(.*\\)/ls -lad \\/mnt\\/*\\/\"\\1\"/" | sh - |  grep -v "/mnt/user/" 

Enter it all as one line...
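For anyone picking that one-liner apart, here is the same pipeline one stage per line, wrapped in a function that reads the log text from stdin (so it can be tried on a pasted sample) and prints the generated `ls` commands instead of piping them to `sh`; the `cut` field numbers assume the stock syslog line layout:

```shell
# The duplicate-hunting one-liner, one stage per line. Reads syslog text
# on stdin and prints the generated `ls` commands rather than running them.
list_dupe_cmds() {
    grep "duplicate object" |    # keep only the duplicate warnings
        cut -d":" -f5- |         # drop the timestamp/daemon fields
        sed "s/^ //" |           # trim the space left behind by cut
        sort -u |                # one line per unique path
        sed -e "s/^\/[^\/]*\/[^\/]*\/\(.*\)/ls -lad \/mnt\/*\/\"\1\"/"
}                                # ^ strip the leading /mnt/diskN/ and
                                 #   rebuild an ls across every disk

# Example with a fabricated syslog line:
printf '%s\n' \
    'Dec  1 10:00:00 Tower shfs/user: duplicate object: /mnt/disk2/TV/show.mkv' |
    list_dupe_cmds
# prints: ls -lad /mnt/*/"TV/show.mkv"
```

The original one-liner then feeds those commands to `sh -` and filters out the `/mnt/user/` match, leaving only the disk-level copies.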


How about a method of including the duplicate files in a directory listing, e.g., by adding either a prefix or a suffix of some sort to the file name?  For example, suppose we have:

 

disk3/backup/afile

disk5/backup/afile

 

User share directory listing of 'backup' could show this:

 

afile

afile~disk5~

 

In addition, I can permit all file operations on 'afile~disk5~' such as rename, copy, delete, etc.

 

Sound reasonable?  Should there be a different nomenclature for the suffix?
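Purely as an illustration of the proposal, a userspace sketch of how such a listing could be produced, assuming the lowest-numbered disk's copy stays unsuffixed and later copies pick up a `~diskN~` suffix. The directory layout is invented for the demo; real shfs would do this internally, and the naive `seen` list here would need hardening for names with spaces:

```shell
# Emulate the proposed listing: the first copy of a name is shown plainly,
# later copies get a ~diskN~ suffix. Arguments are per-disk directories
# standing in for /mnt/disk*/backup.
suffix_listing() {
    seen=""
    for dir in "$@"; do
        disk=$(basename "$(dirname "$dir")")         # e.g. "disk5"
        for f in "$dir"/*; do
            [ -e "$f" ] || continue
            name=$(basename "$f")
            case " $seen " in
                *" $name "*) echo "$name~$disk~" ;;      # duplicate: suffixed
                *) echo "$name"; seen="$seen $name" ;;   # first copy: plain
            esac
        done
    done
}

# Demo matching the example above:
tmp=$(mktemp -d)
mkdir -p "$tmp/disk3/backup" "$tmp/disk5/backup"
: > "$tmp/disk3/backup/afile"
: > "$tmp/disk5/backup/afile"
suffix_listing "$tmp/disk3/backup" "$tmp/disk5/backup"
# prints:
#   afile
#   afile~disk5~
rm -rf "$tmp"
```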


So there are two issues here:

 

1. Making it easy for users to locate and deal with duplicates. The suffix makes this very easy and at the same time doesn't cause issues for things like XBMC. Nice.

2. Log load. For cache_dirs users these duplicates create a load on logging.

 

root@TOWER:/mnt/user/TV/D# grep -i duplicate /var/log/syslog* | wc -l

14726

 

That's from one day.

 

I am not sure we have a solution to that yet.

  • 5 months later...

What about: "ls /mnt/disk*/path/filename"?

or the only slightly  ;) less intuitive:

tail -30000 /var/log/syslog | grep "duplicate object" | cut -d":" -f5-  | sed "s/^ //" | sort -u |  sed -e "s/^\\/[^\\/]*\\/[^\\/]*\\/\\(.*\\)/ls -lad \\/mnt\\/*\\/\"\\1\"/" | sh - |  grep -v "/mnt/user/" 

Enter it all as one line...

A small edit to stop the user0 share being output:

tail -30000 /var/log/syslog | grep "duplicate object" | cut -d":" -f5-  | sed "s/^ //" | sort -u |  sed -e "s/^\\/[^\\/]*\\/[^\\/]*\\/\\(.*\\)/ls -lad \\/mnt\\/*\\/\"\\1\"/" | sh - |  grep -v "/mnt/user" 

 

I wonder how difficult it would be to find the duplicates which sit on a disk alone, or at least with the least other data. My application is media files, where I expect all files related to a movie or TV season to be on the same disk.

I'm thinking a small mod to the mover script would be able to achieve this, albeit beyond my feeble abilities  :-[
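Short of modding the mover, one rough way to spot the odd-one-out copy is to count how many files each disk holds under the directory in question. The function below takes the per-disk directories as explicit arguments so the demo is self-contained; on a live server you would pass something like `/mnt/disk*/"TV/Some Show"` (an invented path):

```shell
# Count files per disk-level copy of a directory, busiest disk first,
# so a copy sitting nearly alone on a disk stands out at the bottom.
files_per_disk() {
    for d in "$@"; do
        [ -d "$d" ] || continue
        n=$(find "$d" -type f | wc -l | tr -d ' ')   # tr: strip BSD wc padding
        printf '%s %s\n' "$n" "$d"
    done | sort -rn
}

# Demo: disk1 holds most of the season, disk4 holds a stray duplicate.
tmp=$(mktemp -d)
mkdir -p "$tmp/disk1/TV" "$tmp/disk4/TV"
: > "$tmp/disk1/TV/e01.mkv"
: > "$tmp/disk1/TV/e02.mkv"
: > "$tmp/disk4/TV/e02.mkv"
files_per_disk "$tmp/disk1/TV" "$tmp/disk4/TV"   # 2 ...disk1..., then 1 ...disk4...
rm -rf "$tmp"
```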

 


 

User share directory listing of 'backup' could show this:

 

afile

afile~disk5~

 

In addition, I can permit all file operations on 'afile~disk5~' such as rename, copy, delete, etc.

 

Sound reasonable?  Should there be a different nomenclature for the suffix?

I would love an easy way to find out which disk a file in a user share has been placed on, regardless of whether it is a duplicate. At the moment I might have to search all my disks (20 in total) if I want to find the physical location of a file.
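In the meantime, the `ls /mnt/disk*/...` trick from earlier in the thread answers exactly this. A tiny wrapper, with the disk roots passed in explicitly so the example is self-contained (all paths invented):

```shell
# Print every disk-level location of a share-relative path. On a live
# server, plain `ls -d /mnt/disk*/"Movies/film.mkv"` does the same job.
where_is() {
    rel=$1
    shift                          # remaining arguments are the disk roots
    for root in "$@"; do
        if [ -e "$root/$rel" ]; then
            echo "$root/$rel"
        fi
    done
}

# Demo with throwaway stand-ins for /mnt/disk*:
tmp=$(mktemp -d)
mkdir -p "$tmp/disk7/Movies"
: > "$tmp/disk7/Movies/film.mkv"
where_is "Movies/film.mkv" "$tmp/disk3" "$tmp/disk7"   # prints the disk7 path
rm -rf "$tmp"
```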

