Everything posted by itimpi

  1. Fair enough. At least once you have the output you can look at the scale of the problem and decide on an appropriate course of action. I had thought to add an option to automatically delete duplicates, but certainly did not want it in an initial iteration of the tool. Deleting data is dangerous, as anything going wrong could lead to data loss. If I do decide to add it, I will enforce the binary check on such duplicates before doing a delete, and also make the user confirm it at tool start-up. However I am only going to add such an option if users think it would be useful enough and safe enough to do so. At the moment I am considering that having the list of files is a good start.
  2. Thanks for pointing that out. Appears to be a missing space. However it just means that a -d option is not validated properly, so it has minimal side-effects. I must admit I was also surprised at first by how fast it runs if you have not used the -b option (which can REALLY slow things down), as it is working purely off directory information. If you have cache_dirs running, this is probably mostly already cached in memory. I do not really expect the -b option to be used very often; I added it to help me check whether file corruption appeared to be happening in light of the bug in the v6 beta 7/8. I thought about providing an option to exclude the cache drive. My initial thinking was that if mover is not running then it probably makes little difference. However, since you mentioned it, I will probably make including the cache drive optional. Good - it is doing its job. Regarding "I stopped the scan and restarted, rerouting the results to a file to analyse later; really a lot of dupes here": it should by default write a 'duplicates.txt' file on the flash drive. You can provide your own filename instead using the -o option. As one of the first to use this in anger (besides myself) please continue to provide feedback. I am particularly interested in knowing whether the amount of detail output appears about right (both with and without the -v option). Tweaking that should be easy enough.
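     For anyone wondering what the -b binary compare boils down to, it is essentially a byte-for-byte comparison of two same-named files, which is why it has to read every byte and slows things down so much. A minimal shell sketch of the idea (the paths are hypothetical examples, and this is not the exact code used in the script):

         # cmp -s is silent and returns 0 only if the two files are byte-for-byte identical
         if cmp -s "/mnt/disk1/Movies/film.mkv" "/mnt/disk2/Movies/film.mkv"; then
             echo "Contents identical - a true duplicate"
         else
             echo "Same name but different contents - investigate before deleting either copy"
         fi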
  3. Let me know what results you get. It helped me get rid of duplicates that had crept onto my own system.
  4. It is possible in unRAID to end up with files with the same name on more than one array disk. In particular this could happen when moving files around the system between disk shares (typically at the command line level) if you make a copy and forget to delete the source. This can be an issue if you are using unRAID user shares (as I expect most unRAID users would be) because unRAID only shows the first occurrence in such a case, so it may not be obvious that you have duplicate files on the system. As well as potentially wasting space, this can lead to unexpected behaviour such as deleting a file on a user share and finding it appears to still be there (because unRAID is now showing you the other copy that was previously hidden).

     Please note that we are talking about files with the same name that are present in more than one location and are thus wasting space. This utility does not try to detect files with different names that have the same content. If you want to try to detect such files then Joe L. (of pre-clear and cache-dirs fame) has developed a script that will do this, as described in this post.

     It is possible to see that such duplicate filenames exist by browsing via the GUI, but this has to be done on a case-by-case basis and there is no easy way to get a consolidated list of all duplicates. To get around this I created the attached script for my own use; it is reasonably comprehensive and others may find it useful. The script runs very quickly (as it is working purely off directory information) so it is not much of a chore to run it at regular intervals as part of your system housekeeping. LimeTech have on the Roadmap an item to include a duplicate checking utility as a standard feature at some point. I thought that this script might be a useful stopgap (NOTE: I am more than happy if Limetech want to include this script (or a derivative) in any standard unRAID release). I modelled this on the style of script that is used in cache-dirs. I hope that Joe L. does not mind that I borrowed some of the coding techniques that he used.

     The following shows the usage information built into the script. Hopefully it is enough to get anyone interested started successfully. I would recommend that you try the -v (verbose) option at least initially.

     NOTE: If using this script with Unraid 6.8.0 or later then, due to a tightening of the security on the flash drive, you need to precede the script with the 'bash' command, e.g. bash ./unRAIDFindDuplicates.sh -v

     Usage: ./unRAIDFindDuplicates.sh [-v] [-q] [-b] [-c] [-d exclude_disk] [-o filename] [-i dirname] [-e dirname] [-f|-F] [-z|-Z]

       -b              If duplicate names are found, do a binary compare of the files as well.
                       If omitted, then only a compare on file name is done.
                       NOTE: Using this option slows things down A LOT as it needs to read every
                       byte of files whose names match to compare them.
       -c              Ignore case when checking names. This can help with the fact that Linux is
                       case sensitive on filenames, whereas Windows, and by implication Samba, is
                       case independent. This can lead to unexpected filename collisions.
       -d exclude_disk Exclude a specific disk from the checks (may be repeated as many times as
                       desired). The default behaviour is to include all disks in the checks.
       -D path         Treat the given path as if it was an array disk (may be repeated as many
                       times as necessary). Can be useful to test if files on an extra disk
                       already exist in the array.
       -e exclude_dir  Exclude a particular share/directory from the checks (may be repeated as
                       many times as desired).
       -f              List any empty folders (directories) that are duplicates of a non-empty
                       folder on another disk. These can be left behind when you remove duplicate
                       files but not their containing folder. However empty folders are also
                       possible in normal operation, so finding these is not necessarily an issue.
       -F              List any empty folders even if they are not duplicated on another drive.
                       This may be perfectly valid, but at least this helps you decide if that is so.
       -i include_dir  Include a particular share/directory in the checks (may be repeated as many
                       times as desired). If omitted, then all top level folders on each disk
                       (except for those specifically excluded via the -e option(s)) will be
                       included in the checks.
       -o filename     Specify the output filename to which a copy of the results should be sent.
                       If omitted then the results are sent to the file duplicates.txt on the root
                       of the flash drive, e.g. /boot/duplicates.txt from Linux.
       -q              Quiet mode. No console output while running; you need to look at the
                       results file for output.
       -v              Verbose mode. Additional details are produced as progress proceeds.
       -V              Print program version.
       -x              Only report file mismatches on time/size (default) or content (if -b is
                       also used). Does not simply report the fact that there is a duplicate if
                       the copies appear identical.
       -X path         Check the array against the given disk and report if files on the array
                       are either missing or appear to be a different size. Use the -b option as
                       well if you want the file contents checked too. Useful for checking whether
                       you have files on a backup disk that are not also on the main array. It is
                       assumed that the path specified contains files in the same folder structure
                       as is present on the array.
       -z              Report zero length files that are also duplicates. These are not
                       necessarily wrong, but could be a remnant of some previous issue or a
                       failed copy.
       -Z              Report zero length files even when they are not duplicates.

     EXAMPLES:
       To check all shares on all disks except disk 9:   ./unRAIDFindDuplicates.sh -d 9
       To check just the TV share:                       ./unRAIDFindDuplicates.sh -i TV
       To check all shares except the Archives share:    ./unRAIDFindDuplicates.sh -e Archives

     TIP: This program runs much faster if all drives are spun up first.

     Note: This script still works unchanged on the newest Unraid releases such as the 6.12.x releases.

     # CHANGE HISTORY
     # ~~~~~~~~~~~~~~
     # Version 1.0  09 Sep 2014  First version.
     # Version 1.1  10 Sep 2014  Got the -b option working to identify files where the names are
     #                           the same but the contents differ.
     #                           Added -q option to suppress all console output while running.
     #                           Added warning if file sizes differ.
     # Version 1.2  13 Sep 2014  Added the -D option to check an extra disk.
     # Version 1.3  01 Oct 2014  Added -f and -F options to list empty (duplicated) directories.
     #                           Added -z and -Z options to list zero length (duplicated) files.
     #                           Fix: Allow for shares that have spaces in their names.
     # Version 1.4  07 Oct 2014  Fix: Use the process ID in /tmp filenames to allow multiple
     #                           copies of the script to be run in parallel without interfering
     #                           with each other.
     # Version 1.5  07 Mar 2016  Fix: Incorrect reporting of file size mismatches when sparse
     #                           files are involved.

     If you find any issues with the script or have suggestions for improvement please let me know and I will see if I can incorporate the feedback.

     unRAIDFindDuplicates_v1.5.zip
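     For anyone curious about the general principle behind this kind of name-only duplicate detection, the core idea is simply to list each file path relative to its disk mount point on every array disk and report any relative path that occurs more than once. The following is a minimal illustrative sketch of that idea only, not the script itself, and it assumes the standard /mnt/diskN mount points:

         #!/bin/bash
         # List every file path relative to its disk, then report paths seen on more than one disk
         for disk in /mnt/disk[0-9]*; do
             [ -d "$disk" ] || continue
             ( cd "$disk" && find . -type f -print )
         done | sort | uniq -d > /boot/duplicates-sketch.txt

         echo "Possible duplicate names written to /boot/duplicates-sketch.txt"

     The various options described above refine this basic idea with exclusions, case handling, size checks and the optional binary compare.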
  5. Quote: "After reading this post and having a moment of lucidity I think it has dawned on me how this can happen. If new files are being written to a cached share, and those files are beyond the 'split level' in the directory hierarchy, the code that prevents splitting kicks in and forces new files to stay on the device where the parent exists, in this case the cache. The free space is not taken into consideration in this case - but with the cache disk it should be - that's the bug. Sound like this is your scenario? And I was just about to release -beta8...." Sounds like that could be it. Get b8 out and we can look at this for b9. Nice one - looks as though the fix was easy once the cause was identified, as it appears to have made beta 8.
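     To make the logic being described a little more concrete, here is a small illustrative model in shell of the corrected allocation decision for a cached share. This is purely a sketch of the behaviour as described above, not Limetech's actual code; the buggy behaviour amounts to the same function with the free-space test missing:

         # Decide where a new file for a cached share should be written (illustrative model only)
         choose_target() {
             local parent_on_cache=$1   # "yes" if the parent directory already exists on the cache
             local cache_free_kb=$2     # current free space on the cache drive
             local min_free_kb=$3       # the share's Min Free Space setting

             if [ "$parent_on_cache" = "yes" ] && [ "$cache_free_kb" -gt "$min_free_kb" ]; then
                 echo "cache"           # split level keeps the file with its parent while space remains
             else
                 echo "array"           # otherwise fall back to the array data drives
             fi
         }

         choose_target yes 500000 2000000   # prints "array" - cache is below Min Free Space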
  6. I believe that there is a bug in the current beta 7 GUI code to do with recognizing suffixes when trying to set the min free space value. Not sure if that is relevant if the value is already set (or you set it manually via a text editor).
  7. As long as you have Min Free Space set to be more than the largest file you want to copy to the share, then unRAID already handles this. Once the free space falls below the Min Free Space value unRAID starts writing directly to the array data drives.
  8. That is a nice set of icons! They combine both shape and color indications of what they are, which is a good thing. Just as importantly, I see they are free for commercial use.
  9. Quote: "RobJ - great job! Great info so far, and once complete this will be an outstanding resource!!! I will be adding a link from myMain." I agree. I am hoping that the descriptions for reallocated sectors and pending sectors get added ASAP, as these are the ones that are probably of most importance to unRAID users. I would think that some sort of colour coding to indicate the ones that are of particular interest would help.
  10. Sounds correct! Assuming that you are running the pre_clear script from /boot, that is where the reports will be placed on completion. A suggestion is to run preclear_disk.sh -l before doing the actual preclear to get the list of devices that are available. This helps avoid any accidents.
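     As an illustration of that workflow (the device name below is an example only - always confirm it against the -l listing before committing):

         # List the candidate devices first so you are sure which one you are about to clear
         ./preclear_disk.sh -l

         # Then run the pre-clear against the intended device (example device name only!)
         ./preclear_disk.sh /dev/sdX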
  11. I have never seen (or heard of) anything like this. cache_dirs is a read only process so it never creates files/folders. Are you running any other logins? If so, it is more likely that one of these is the culprit.
  12. You do not want to use a disk in unRAID that is showing any pending sectors. The good sign is that the number went down during the pre-clear without forcing any re-allocation. More worrying is why they were not all cleared! I would suggest you put it through another pre_clear cycle to see what happens. If you cannot clear the pending sectors then you need to consider an RMA.
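     If you want to keep an eye on the relevant SMART attributes yourself between pre-clear cycles, smartctl can show them directly (the device name is an example only):

         # Show the SMART attribute table and pick out the reallocation-related counters
         smartctl -A /dev/sdX | grep -iE 'Reallocated_Sector|Current_Pending_Sector|Offline_Uncorrectable'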
  13. There is no supported cache_dirs plugin available at the moment. The one that was available at one point was withdrawn at the request of Joe L. (the author of cache_dirs).
  14. Yes. There is nothing on a pre-cleared disk that ties it to a particular system. I have heard of people having a separate system specifically targeted at allowing them to run pre-clears on disks without disturbing the system running the production servers.
  15. That will almost certainly be a permissions issue! The easiest way to rectify the permissions is to log in via telnet and then run the 'newperms' command, providing the path as a parameter.
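     As an example of the command (the share path shown is hypothetical - point it at whatever folder is giving the problem):

         # Reset ownership and permissions to the unRAID defaults for everything under the given path
         newperms /mnt/user/YourShare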
  16. Have you tried booting in safe mode to ensure no plugins are being loaded?
  17. The error message you are getting suggests something has overwritten the libc library. That is why I am suggesting you should try with no plugins loaded.
  18. I would comment out the line that tries to load from /boot/packages. I would guess that you are loading a package that is not 64-bit compatible and is messing up the system.
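     For anyone unsure what that looks like in practice, disabling the line is just a matter of prefixing it with # in /boot/config/go. The exact wording of the line varies from system to system, so the excerpt below is only a hypothetical example:

         #!/bin/bash
         # /boot/config/go (excerpt)
         # for pkg in /boot/packages/*.tgz; do installpkg "$pkg"; done   <- commented out while testing 64-bit compatibility
         /usr/local/sbin/emhttp &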
  19. I wonder if it is something simpler - such as the fact that 64-bit systems do not have the low-memory constraint, so they can keep more entries cached in memory because Linux is not being forced to push entries out of the cache. Does anyone have any idea of how one might investigate this?
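     On the question of how to investigate it, the kernel does expose the caches involved, so one possible starting point (standard Linux tools, nothing unRAID specific, and only a suggestion) would be:

         # Size of the dentry and inode slab caches - the entries cache_dirs is trying to keep hot
         grep -E 'dentry|inode_cache' /proc/slabinfo

         # How aggressively the kernel reclaims those caches (the default value is 100)
         cat /proc/sys/vm/vfs_cache_pressure

         # On a 32-bit build, the low/high memory split can be seen with:
         free -l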
  20. Quote: "Yes, I did see that suggestion earlier in this thread and had done that in the script before adding it to the go file and rebooting; it still caused a nasty OOM and crash. I take it you are running it now? If so, what flags are you using, and are you running this via the go file or manually starting it after unRAID has booted?" I start mine from the go file using a command line of the form:
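     Purely as a hypothetical illustration (the path and flags below are examples, not necessarily the settings actually used; check the options supported by your copy of the script), a go-file entry for cache_dirs generally takes a form along these lines:

         # /boot/config/go (excerpt) - start cache_dirs as unRAID boots
         /boot/cache_dirs -w -e "SomeShareToExclude"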
  21. I found that I had to comment out the ulimit line to get cache_dirs to run reliably in the 64-bit environment.
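     For clarity, the change being referred to is commenting out the memory-limiting ulimit line inside the cache_dirs script itself. The exact line and value vary between versions of the script, so the snippet below is illustrative only:

         # Inside cache_dirs (illustrative - the line and value in your copy may differ):
         # ulimit -v 50000    <- commented out so the 64-bit build is not starved of virtual memory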
  22. There are options that can be passed to the pre-clear script to run just parts of the pre-clear process if you know roughly where it got to, which can be used to save time. However, running from the start will not do any damage, so that might be the easiest thing to do.
  23. The /dev/sdd device is looking problematical as the number of 'sectors pending reallocation' seems to be continually going up. You do not want to use in unRAID any drive that does not finish the pre-clear with 0 pending sectors. The only anomaly is that no sectors are actually being re-allocated, so it is always possible that there is an external factor causing the pending sectors, such as bad cabling at the power/SATA level. The large number of sectors pending reallocation is a good enough reason to RMA the drive. The other two drives look fine; the key attributes relating to reallocated sectors are all 0, which is what you want.
  24. The pre-clear tries to put the drive through the same sort of load as is involved in parity rebuilds and/or normal use. If at the end of that there are no signs of any problems, you have reasonable confidence that at this point the drive is showing no problems. That is actually better than you would have for new drives - a significant proportion of those fail when put through their first stress test via pre-clear. I actually have a caddy I can plug in externally via eSATA or USB on demand to do this. You could also use another system, as there is no requirement that the pre-clear run on the system where the drive is to be used.
  25. I handle this by having a spare drive that has been previously put through a thorough pre-clear to check it out. Using this disk I go through the process of rebuilding the failed drive onto this spare as its replacement. If the rebuild fails for any reason I still have the 'red-balled' disk untouched to attempt data recovery. If the rebuild works I then put the disk that had 'red-balled' through a thorough pre-clear test. I use the results of this to decide if the disk is OK or whether the drive really needs replacing. If the drive appears OK it becomes my new spare disk.