Possible sources of duplicate files on my unRAID?



I ran the CA Fix Common Problems plug-in and it found thousands of duplicate files, almost exclusively on disk5 and disk6 of my 8 disk array.

I've been going through and deleting them from disk6, since I read that the file on the lowest-numbered disk is the one that will be used (and any others will be ignored).

 

I do not do any work at the disk level, so I'm trying to track down the source of these duplicate files.

The vast majority of the duplicates are music files. My normal way of organizing music is to upload the files to a temp share on unRAID using an SFTP client. Then I run MusicBrainz Picard to tag and move the music files to my Music share. All I can think of is that the files are getting put on separate disks in this process. Does any of this make sense?

 

My diagnostics are attached, if they are helpful in any way.

 

An old thread is linked below - it wasn't that helpful in finding a source for this problem, but I did just post how I am dealing with the duplicates.

 

 

tower-diagnostics-20220103-0914.zip

Link to comment

Like others in the thread that you linked to, as a long-time user of Unraid (almost 11 years), I have never seen unexpected duplicates. Given that you are certain that you have not been manipulating files at the /mnt/disk level, this points to a problem in how some other tool is being used or applied. I have no experience with MusicBrainz Picard, so I cannot comment on whether it may have been responsible.

 

As for which copy of a file you choose to delete: are the duplicates that you see on multiple drives in the same share and in the same folder locations? I would check that before deciding which copy to remove. If the paths to the files are the same on multiple disks, I would still do binary comparisons before deleting, although it should not then matter which copy is deleted.
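For anyone who wants to script that check, here is a minimal Python sketch. It assumes the array disks are mounted as /mnt/disk1, /mnt/disk2 and so on; the function names are made up for illustration and are not part of any Unraid tool. It only reports, and never deletes anything.

```python
import filecmp
import re
from collections import defaultdict
from pathlib import Path


def find_cross_disk_duplicates(mnt_root="/mnt"):
    """Map each share-relative path to the disks that hold a copy of it."""
    copies = defaultdict(list)
    for disk in sorted(Path(mnt_root).glob("disk*")):
        # Only consider array disk mount points such as disk1 or disk12.
        if not (disk.is_dir() and re.fullmatch(r"disk\d+", disk.name)):
            continue
        for path in disk.rglob("*"):
            if path.is_file():
                copies[path.relative_to(disk)].append(disk)
    # Keep only the paths that exist on more than one disk.
    return {rel: disks for rel, disks in copies.items() if len(disks) > 1}


def identical_content(rel, disks):
    """Byte-for-byte comparison of every copy against the first one."""
    first = disks[0] / rel
    return all(filecmp.cmp(first, d / rel, shallow=False) for d in disks[1:])


if __name__ == "__main__":
    for rel, disks in sorted(find_cross_disk_duplicates().items()):
        status = "identical" if identical_content(rel, disks) else "DIFFERENT"
        names = ", ".join(d.name for d in disks)
        print(f"{status}: {rel} on {names}")
```

Anything reported as DIFFERENT deserves a manual look before either copy is removed.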

 

One other question - do you have a backup on a separate device that you can go back to if you delete a duplicated file and for some reason its copy also then disappears (not sure how that would happen, but just in case...).  

Link to comment

Thank you for this.

 

I don't have a backup for files that are not mission-critical (FLACs, for instance, or the .nfo and .jpg art files for movies and TV shows). I have several backups for things like family photos and videos, and none of the dups are of those critical files.

 

I am doing a binary compare with Czkawka, and I am confirming that the path for each is identical on both disks, and then I'm deleting the one on the higher-numbered disk. For the most part, it has been whole albums that have been duplicated, making me think that this is some combination of my SFTP client and Picard and how it moves music files to another share. 
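As a rough illustration of the "keep the copy on the lowest-numbered disk" rule, here is a small dry-run sketch in Python. The helper names and the example album path are hypothetical, and it assumes disk mount points named like /mnt/disk5. It prints a plan and deletes nothing.

```python
import re
from pathlib import Path


def disk_number(disk):
    """Return 6 for a mount point named 'disk6'."""
    match = re.fullmatch(r"disk(\d+)", Path(disk).name)
    if match is None:
        raise ValueError(f"not an array disk path: {disk}")
    return int(match.group(1))


def plan_removals(rel_path, disks):
    """Split the copies of one file into (path to keep, paths to remove)."""
    # Sort numerically so that disk10 comes after disk9, not after disk1.
    ordered = sorted(disks, key=disk_number)
    keep = Path(ordered[0]) / rel_path
    remove = [Path(d) / rel_path for d in ordered[1:]]
    return keep, remove


if __name__ == "__main__":
    # Hypothetical example path; substitute a real duplicate.
    keep, remove = plan_removals(
        "Music/Artist/Album/01.flac", ["/mnt/disk6", "/mnt/disk5"]
    )
    print("keep:  ", keep)
    for path in remove:
        print("remove:", path)  # dry run only; nothing is deleted
```

Reviewing a printed plan like this before deleting makes it harder to remove the wrong copy when whole albums are involved.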

 

Maybe I should take a closer look at how the shares are set up. Maybe the mover is doing something wonky with the files.

I assume that any time I copy a file to a share, whether from an SFTP client or via SMB, it goes first to the cache drive and is then handled by the mover, right?

 

Thanks for your help with this.

Link to comment

Again, I have never seen this, and I use the standard Mover almost daily (today I am ripping Christmas present CDs and Blu-rays, so plenty going on). 

 

When you copy a new file to a share, it will go to the cache if that share is set to use the cache: "Cache = Yes" in the share settings. (Do not confuse that with "Cache = Prefer", which will try to keep files on the cache.) The Mover will then move the file to the array according to its scheduler settings, and parity will be updated as the move operation progresses; you can also run the Mover manually, of course. If you are applying updates to an existing file, the updates will be applied directly to the disk in the array that holds the file, and parity will be updated at the same time as the file.

 

I would double-check that none of your plugins, dockers, etc. are specifying /mnt/diskx/sharename instead of /mnt/user/sharename. In particular, it is very important never to move or copy between one path structure and the other.
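One way to look for such references is to search the relevant configuration files for /mnt/diskN paths. The Python sketch below is only a starting point: the function names are invented for illustration, the file extensions are a guess, and you have to supply the directories to scan yourself, since where your plugin and container settings live depends on your setup.

```python
import re
import sys
from pathlib import Path

# Matches /mnt/disk1, /mnt/disk12/Music/... but not /mnt/user/... paths.
DISK_PATH = re.compile(r"/mnt/disk\d+(?:/[^\s\"'<>,:]*)?")


def find_disk_paths(text):
    """Return the distinct disk-level paths mentioned in a block of text."""
    return sorted(set(DISK_PATH.findall(text)))


def scan_configs(config_dir, patterns=("*.xml", "*.cfg", "*.conf")):
    """Map each config file under config_dir to the disk paths it mentions."""
    hits = {}
    for pattern in patterns:
        for cfg in Path(config_dir).rglob(pattern):
            try:
                text = cfg.read_text(errors="ignore")
            except OSError:
                continue  # unreadable file or directory; skip it
            found = find_disk_paths(text)
            if found:
                hits[str(cfg)] = found
    return hits


if __name__ == "__main__":
    # Usage: python scan_disk_paths.py <config directory> [more directories]
    for directory in sys.argv[1:]:
        for cfg, paths in sorted(scan_configs(directory).items()):
            print(cfg)
            for p in paths:
                print("   ", p)
```

A hit is not necessarily a problem, but any tool that writes through a /mnt/diskN path while another uses /mnt/user for the same share is worth a closer look.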

Link to comment

Thank you - this is very helpful. I will investigate further.

 

Link to comment
