JonathanM Posted March 7, 2016

Is it possible the size being compared is the actual size on disk? If so, a sparse file will be incorrectly reported as failing the size comparison, even though the content of the file, if read out or checksummed, is identical.

Possible. The size is being obtained using a command of the form du -s filename, so you could see if they are reported the same on your system? If that is the case then an alternative command could be used. Try du -sb filename
itimpi (Author) Posted March 7, 2016

Quoting the exchange above: "Is it possible the size being compared is the actual size on disk? … Try du -sb filename"

If you change the script on your system (it should be easy enough to find the du -s command in the script), does that fix the issue? As I do not currently have an example of it going wrong, it is a bit harder to test here. Actually, looking at the du options, I am not sure why I used the -s option; it looks as if simply using the -b option instead would work?

I find it intriguing that the script has been available for over a year and this is the first time this issue has come up. It just shows how difficult it can be to allow for all the edge cases.
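JonathanM's sparse-file theory is easy to demonstrate. A minimal sketch (the file names below are hypothetical, created only for this demonstration): `du -s` reports blocks actually allocated on disk, while `du -sb` reports apparent size in bytes, so two byte-identical files, one of them sparse, agree only under `-b`.

```shell
# Create two byte-identical 1 MiB files: one sparse, one fully allocated.
truncate -s 1M sparse.bin                               # all holes: few blocks on disk
dd if=/dev/zero of=dense.bin bs=1M count=1 status=none  # every byte written out

du -s  sparse.bin dense.bin   # on-disk usage: the two numbers differ
du -sb sparse.bin dense.bin   # apparent size: both report 1048576

rm -f sparse.bin dense.bin
```

This is why a checksum comparison passes while the `du -s` size comparison fails: the contents are identical, only the allocation differs.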
rob Posted March 8, 2016

du -b

That seems to solve the issue I was having with this script. Thank you!
itimpi (Author) Posted March 8, 2016

Quoting rob: "du -b … That seems to solve the issue I was having with this script."

Thanks for confirming that works. I will do a bit of testing at my end and, assuming nothing shows up, I will upload an updated version of the script to the first post in this thread.
spall Posted May 30, 2016

First, thanks for this script. I found it because I stumbled upon an empty directory in one of my shares and was looking for something to locate those.

The good news: I found about 250GB of duplicate files I had no idea existed. So again, thank you.

The bad news: I cannot for the life of me get the -f/-F args to produce anything. I created a duplicate folder on the chance that the one I stumbled on was the only one, but it is not being reported by the script.

Any help appreciated. Thanks!
mgladwin Posted January 17, 2017

Sorry for reviving an old thread, but I just tried this out today and realised the script doesn't like apostrophes in file names. Not sure if it's worth fixing or not?

ls: cannot access '/mnt/disk*/STORAGE/laptop backup/Dropbox/EMPLOYEES/example': No such file or directory

The full directory is /mnt/disk*/STORAGE/laptop backup/Dropbox/EMPLOYEES/example's/
Squid Posted January 17, 2017

Quoting mgladwin: "…the script doesn't like apostrophes in file names…"

Not sure if the script is still being maintained, but the Fix Common Problems plugin also checks for dupes in its extended tests.

Sent from my LG-D852 using Tapatalk
mgladwin Posted January 17, 2017

Yeah, I guess it's not. I have Fix Common Problems, but my issue was duplicate empty directories on random disks, which this script picked up very well and I manually deleted. I had no duplicate files.

Sent from my SM-G930F using Tapatalk
itimpi (Author) Posted January 17, 2017

Quoting mgladwin: "…the script doesn't like apostrophes in file names. Not sure if it's worth fixing or not?"

I'll look to see if this can be fixed simply and, if so, I will apply an update. However, if it is going to be hard to do without major changes to the script, I probably will not bother.
mgladwin Posted January 17, 2017

No worries. I didn't expect anything; I just thought it was worth mentioning in case others were having issues now or in the future. Thanks itimpi, and great script!

Sent from my SM-G930F using Tapatalk
itimpi (Author) Posted January 19, 2017

I've worked out which line the message comes from. It appears to be a quirk of the way bash handles wildcard expansion. I will have to look up my bash special-character handling to see if I can find a way of avoiding the issue.
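For reference, the quirk can be reproduced outside the script. In this sketch, temporary directories stand in for /mnt/disk1 and /mnt/disk2 (all names are hypothetical): quoting the whole path protects the apostrophe but also stops the disk* wildcard from expanding, which produces exactly the "No such file or directory" error reported above. Quoting only the fixed tail of the path avoids both problems.

```shell
# Stand-in layout for /mnt/disk*/... (all names hypothetical).
mkdir -p /tmp/dupdemo/disk1/share /tmp/dupdemo/disk2/share
touch "/tmp/dupdemo/disk1/share/example's" "/tmp/dupdemo/disk2/share/example's"

# Quoting the entire word protects the apostrophe, but the shell
# then treats disk* literally and the glob never expands:
ls "/tmp/dupdemo/disk*/share/example's" 2>/dev/null \
  || echo "glob did not expand"

# Leave the wildcard unquoted and quote only the fixed tail:
ls /tmp/dupdemo/disk*/"share/example's"    # finds both copies

rm -rf /tmp/dupdemo
```

In bash, quoted and unquoted segments can be mixed inside one word, so the pattern part stays live while the literal part, apostrophe and all, is protected.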
mgladwin Posted January 19, 2017

Thanks itimpi. Please don't waste your time if it's too much work, though. It's only my OCD that makes me want a clean, error-free log file. Cheers

Sent from my SM-G930F using Tapatalk
jmello Posted September 19, 2017

Does anyone know a good way to automatically delete the duplicate files that this script finds? I had a 6TB drive become unmountable, recovered over 5TB of files from the emulated drive, then realized I hadn't tried repairing the filesystem and got my drive back. So now I've got 5+TB of duplicate files spread all across my array.
mbc0 Posted October 9, 2017

When trying to run this script I get "permission denied", even though I am logged on as root?
itimpi (Author) Posted October 10, 2017 (edited)

Have you given the script execute permission? If you downloaded it to the flash drive this should be automatic (because the drive is FAT32 format), but it will not be the case if the script is put elsewhere. Alternatively, run it using the 'sh' command, which does not require the script to have execute permission.
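Both options can be shown with a throwaway stand-in script (the name and location below are hypothetical):

```shell
# Create a trivial stand-in script.
printf '#!/bin/bash\necho ok\n' > /tmp/demo.sh

sh /tmp/demo.sh          # runs even without the execute bit, prints "ok"
chmod +x /tmp/demo.sh    # grant execute permission...
/tmp/demo.sh             # ...and now it can be run directly as well

rm -f /tmp/demo.sh
```

On the FAT32 flash drive the execute bit is effectively always set by the mount options, which is why a script stored there runs directly without any chmod step.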
mbc0 Posted October 10, 2017

Many thanks, that worked :-)
Joseph Posted January 21, 2018

On 9/16/2014 at 3:03 PM, itimpi said: "I had thought to add an option to automatically delete duplicates, but certainly did not want it in an initial iteration of the tool."

+1 to automatically remove duplicates.

Hi, I just discovered your tool, and it turns out I have numerous duplicates because one of my disks went offline and emby rebuilt the metadata elsewhere. Because it's metadata, the filenames match, but they may not match at the binary level. I know which disk contains the duplicate data, so it would be great if you could add a switch to your script to specify which disk to remove the duplicates from, if/when you ever add a delete option. Thanks.
JustinChase Posted January 21, 2018 (edited)

Ha, funnily enough, I'm working on finding duplicates right now. I noticed the File Integrity plugin has an option to do so, and I didn't realize how many of my files have gotten duplicated (not sure how). I checked Community Applications for "duplicate", found the dupeguru docker, and it just finished installing. I'm running it for the first time right now, but so far it doesn't seem to be doing much of anything. I'd prefer if this tool would do it for me instead; less to install/maintain.

**It only looks for duplicate files in the shares, not duplicates on multiple disks, so it is not a solution anyway.
itimpi (Author) Posted January 21, 2018

19 minutes ago, JustinChase said: "**It only looks for duplicate files in the shares, not duplicates on multiple disks, so it is not a solution anyway."

Not quite sure what you mean by this? It looks for duplicates with the same relative path (i.e. the same name) that exist on more than one disk. This can happen if you have been copying files between disks on unRAID and have not deleted the source files. That can be a bit confusing on unRAID, as when you look at the User Share level you will only see one instance. It does NOT look for files with different paths/names that have the same contents, if that is what you are looking for.
itimpi (Author) Posted January 21, 2018

36 minutes ago, Joseph said: "+1 to automatically remove duplicates. … it would be great if you could add a switch to your script to specify which disk to remove the duplicates from…"

I am afraid I am unlikely to add a delete switch, as that is too dangerous and could easily result in data loss if you are not very careful. I thought about it when developing the script but decided against it, as it could easily delete the wrong copy if the files are not actually identical at the binary level.

The easiest way to handle this would be to output the results to a file and then edit them to make a shell script with 'rm' commands for the files that are not wanted. Depending on how your duplicates are located, you might be able to use an 'rm -r' type command on the containing directories.
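That workflow can be sketched as follows against a throwaway file (every path here is hypothetical; in practice the report would come from the duplicate-finder script, and you would inspect the edited list very carefully before executing it):

```shell
# Stand-in for the script's report: one duplicate path per line.
mkdir -p /tmp/rmdemo/disk2
touch "/tmp/rmdemo/disk2/film.mkv"
printf '/tmp/rmdemo/disk2/film.mkv\n' > /tmp/dupes.txt

# After editing the report down to only the unwanted copies,
# wrap each remaining line in a quoted rm command:
sed 's/^/rm -- "/; s/$/"/' /tmp/dupes.txt > /tmp/delete_dupes.sh

cat /tmp/delete_dupes.sh   # always review before running!
sh /tmp/delete_dupes.sh

rm -rf /tmp/rmdemo /tmp/dupes.txt /tmp/delete_dupes.sh
```

The double quotes added by sed keep paths with spaces or apostrophes intact; this simple wrapping does assume the paths themselves contain no double-quote characters.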
JustinChase Posted January 21, 2018

11 minutes ago, itimpi said: "Not quite sure what you mean by this? It looks for duplicates with the same relative path (i.e. the same name) that exist on more than one disk. …"

It looks like it only showed duplicate files with different paths/names, with no reference to the disk they are on. The File Integrity plugin shows dupes on different disks, as in the 2nd screenshot. This is what I was looking for. It just seems this tool doesn't update in real time, as some of the dupes it reported have since been corrected, but it's still showing them. Oh well.
Squid Posted January 21, 2018

44 minutes ago, JustinChase said: "**It only looks for duplicate files in the shares, not duplicates on multiple disks, so it is not a solution anyway."

FYI, FCP running the extended tests looks for duplicated files (the same file name existing on multiple disks in the same folder).
itimpi (Author) Posted January 21, 2018

12 minutes ago, JustinChase said: "It looks like it only showed duplicate files with different paths/names, with no reference to the disk they are on. …"

Not sure what makes you think this: it definitely shows which disks duplicated files are on! I developed it and used it to identify where I had duplicate files on my own disks. It was developed a long time ago (on unRAID v5, although it also works on v6), before plugins like the File Integrity plugin existed, so feel free to use that if you prefer.
JustinChase Posted January 21, 2018

My bad, I didn't realize which thread this was in; I thought I was in a different thread. My mistake, sorry.
Joseph Posted January 22, 2018

2 hours ago, itimpi said: "The easiest way to handle this would be to output the results to a file and then edit them to make it into a shell script with 'rm' commands for the files that are not wanted. …"

Not sure how to create a shell script, but I'll look around to figure it out and try that... thanks for the tip!