Everything posted by itimpi

  1. It is possible in unRAID to end up with files with the same name on more than one array disk. In particular this can happen when moving files around between disk shares (typically at the command line level) if you make a copy and forget to delete the source. This can be an issue if you are using unRAID user shares (as I expect most unRAID users would be), because unRAID only shows the first occurrence in such a case, so it may not be obvious that you have duplicate files on the system. As well as potentially wasting space, this can lead to unexpected behaviour such as deleting a file on a user share and finding it appears to still be there (because unRAID is now showing you the other copy that was previously hidden).

     Please note that we are talking about files with the same name that are present in more than one location and are thus wasting space. This utility does not try to detect files with different names that have the same content. If you want to detect such files then Joe L (of pre-clear and cache-dirs fame) has developed a script that will do this, as described in this post.

     It is possible to see that such duplicate filenames exist by browsing via the GUI, but this has to be done on a case-by-case basis and there is no easy way to get a consolidated list of all duplicates. To get around this I created the attached script for my own use; it is reasonably comprehensive and others may find it useful. The script runs very quickly (as it works purely off directory information) so it is not much of a chore to run it at regular intervals as part of your system housekeeping. LimeTech have an item on the Roadmap to include a duplicate checking utility as a standard feature at some point, so I thought that this script might be a useful stopgap (NOTE: I am more than happy for LimeTech to include this script, or a derivative, in any standard unRAID release). I modelled this on the style of script used in cache-dirs, and I hope that Joe L. does not mind that I borrowed some of the coding techniques he used.

     The following shows the usage information built into the script. Hopefully it is enough to get anyone interested started successfully. I would recommend that you try the -v (verbose) option, at least initially.

     NOTE: If using this script with Unraid 6.8.0 or later then, due to a tightening of the security on the flash drive, you need to precede the script with the 'bash' command, e.g.

         bash ./unRAIDFindDuplicates.sh -v

     Usage: ./unRAIDFindDuplicates.sh [-v] [-q] [-b] [-c] [-d exclude_disk] [-o filename] [-i dirname] [-e dirname] [-f|-F] [-z|-Z]

     -b   If duplicate names are found, do a binary compare of the files as well.
          If omitted, then only a compare on file name is done.
          NOTE: Using this option slows things down A LOT as it needs to read every
          byte of files whose names match in order to compare them.

     -c   Ignore case when checking names. This can help with the fact that Linux is
          case sensitive on filenames, whereas Windows, and by implication Samba, is
          case independent. This can lead to unexpected filename collisions.

     -d exclude_disk (may be repeated as many times as desired)
          The default behaviour is to include all disks in the checks.
          Use this to exclude a specific disk from being included in the checks.

     -D path
          Treat the given path as if it was an array disk (may be repeated as many
          times as necessary). Can be useful to test if files on an extra disk
          already exist in the array.
     -e exclude_dir (may be repeated as many times as desired)
          Use this to exclude a particular share/directory from being included in
          the checks.

     -f   List any empty folders (directories) that are duplicates of a non-empty
          folder on another disk. These can be left behind when you remove duplicate
          files but not their containing folder. However empty folders are also
          possible in normal operation, so finding these is not necessarily an issue.

     -F   List any empty folders even if they are not duplicated on another drive.
          This may be perfectly valid, but at least this helps you decide if it is so.

     -i include_dir (may be repeated as many times as desired)
          Use this to include a particular share/directory in the checks.
          If omitted, then all top level folders on each disk (except for those
          specifically excluded via the -e option(s)) will be included in the checks.

     -o filename
          Specify the output filename to which a copy of the results should be sent.
          If omitted then the results are sent to the file duplicates.txt on the root
          of the flash drive, e.g. /boot/duplicates.txt from Linux.

     -q   Quiet mode. No console output while running. You need to see the results
          file for output.

     -v   Verbose mode. Additional details are produced as progress proceeds.

     -V   Print the program version.

     -x   Only report file mismatches on time/size (default) or content (if -b is
          also used). Does not simply report the fact that there is a duplicate if
          the copies appear identical.

     -X path
          Check the array against the given disk and report if files on the array
          are either missing or appear to be a different size. Use the -b option as
          well if you want the file contents checked too. Useful for checking
          whether you have files on a backup disk that are not also on the main
          array. It is assumed that the path specified contains files in the same
          folder structure as is present on the array.

     -z   Report zero length files that are also duplicates. These are not
          necessarily wrong, but could be a remnant of some previous issue or a
          failed copy.

     -Z   Report zero length files even when they are not duplicates.

     EXAMPLES:
         To check all shares on all disks except disk 9:
             ./unRAIDFindDuplicates.sh -d 9
         To check just the TV share:
             ./unRAIDFindDuplicates.sh -i TV
         To check all shares except the Archives share:
             ./unRAIDFindDuplicates.sh -e Archives

     TIP: This program runs much faster if all drives are spun up first.

     Note: This script still works unchanged on the newest Unraid releases such as the 6.12.x releases.

     # CHANGE HISTORY
     # ~~~~~~~~~~~~~~
     # Version 1.0  09 Sep 2014  First version
     # Version 1.1  10 Sep 2014  Got the -b option working to identify files where
     #                           the names are the same but the contents differ.
     #                           Added -q option to suppress all console output while running.
     #                           Added warning if file sizes differ.
     # Version 1.2  13 Sep 2014  Added the -D option to check an extra disk.
     # Version 1.3  01 Oct 2014  Added -f and -F options to list empty (duplicated) directories.
     #                           Added -z and -Z options to list zero length (duplicated) files.
     #                           Fix: Allow for shares that have spaces in their names.
     # Version 1.4  07 Oct 2014  Fix: Use process ID in /tmp filenames to allow multiple copies of
     #                           the script to be run in parallel without interfering with each other.
     # Version 1.5  07 Mar 2016  Fix: Incorrect reporting of file size mismatches when sparse
     #                           files are involved.

     If you find any issues with the script or have suggestions for improvement please let me know and I will see if I can incorporate the feedback.

     unRAIDFindDuplicates_v1.5.zip
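     For anyone curious about the underlying idea, here is a minimal shell sketch (not the script itself) of how duplicate relative paths across array disks can be spotted; the /mnt/disk* mount points are the standard unRAID array disk mounts and the output filename is just an illustrative choice:

         for d in /mnt/disk[1-9]*; do
             # list every file on this disk as a path relative to the disk root
             ( cd "$d" && find . -type f ) | sed 's|^\./||'
         done | sort | uniq -d > /tmp/duplicate_names.txt
         # any path listed in the output file exists on more than one array disk
         cat /tmp/duplicate_names.txt

     The real script adds the option handling, size/content comparison and reporting described above; the sketch only shows the core name-matching idea.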
  2. After reading this post and having a moment of lucidity I think it's dawned on me how this can happen. If new files are being written to a cached share, and those files are beyond the 'split level' in the directory hierarchy, the code that prevents splitting kicks in and forces new files to stay on the device where the parent exists, in this case the cache. The free space is not taken into consideration in this case - but with the cache disk it should be - that's the bug. Sound like this is your scenario? And I was just about to release -beta8.... Sounds like that could be it. Get b8 out and we can look at this for b9. Nice one. Looks as though the fix was easy once the cause was identified, as it appears to have made beta 8.
  3. I believe that there is a bug in the current beta 7 GUI code to do with recognizing suffixes when trying to set the min free space value. Not sure if that is relevant if the value is already set (or you set it manually via a text editor).
  4. As long as you have Min Free Space set to be more than the largest file you want to copy to the share, then unRAID already handles this. Once the free space falls below the Min Free Space value unRAID starts writing directly to the array data drives.
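     As a rough illustration of the idea (this is not part of unRAID itself), a check like the following shows whether the cache's free space has dropped below a chosen threshold; the 20 GiB figure and the /mnt/cache mount point are assumptions for the example:

         THRESHOLD_KB=$((20 * 1024 * 1024))                    # 20 GiB expressed in 1K blocks
         FREE_KB=$(df -P /mnt/cache | awk 'NR==2 {print $4}')  # available space on the cache
         if [ "$FREE_KB" -lt "$THRESHOLD_KB" ]; then
             echo "Cache is below the Min Free Space threshold - new writes go straight to the array"
         fi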
  5. That is a nice set of icons! They combine both shape and colour indications of what they are, which is a good thing. Just as important, I see they are free for commercial use.
  6. RobJ - great job! Great info so far, and once complete this will be an outstanding resource!!! I will be adding a link from myMain. I agree. I am hoping that the descriptions for reallocated sectors and pending sectors get added ASAP as these are the ones that are probably of most importance to unRAID users. I would think that some sort of colour coding to indicate ones that are of particular interest would help.
  7. Sounds correct! Assuming that you are running the pre_clear script from /boot, that is where the reports will be placed on completion. A suggestion is to run preclear_disk.sh -l before doing the actual preclear to get the list of devices that are available. It helps avoid any accidents.
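     For example (the device name /dev/sdX below is a placeholder - always double-check it against the -l listing before pre-clearing anything):

         ./preclear_disk.sh -l          # list candidate devices that are not part of the array
         ./preclear_disk.sh /dev/sdX    # then run the pre-clear against the chosen device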
  8. Never seen (or heard) of anything like this. cache_dirs is a read-only process so it never creates files/folders. Are you running any other logins? If so, it is more likely that one of these is the culprit.
  9. You do not want to use a disk in unRAID that is showing any pending sectors. The good sign is that the number went down during the pre-clear without forcing any re-allocation. More worrying is why they were not all cleared! I would suggest you put it through another pre_clear cycle to see what happens. If you cannot clear the pending sectors then you need to consider an RMA.
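     If you want to watch the relevant attribute directly between pre-clear cycles, smartctl reports it (again /dev/sdX is a placeholder for the drive in question):

         # attribute 197 (Current_Pending_Sector) should end up at 0 on a healthy drive
         smartctl -A /dev/sdX | grep -i pending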
  10. There is no supported cache_dirs plugin available at the moment. The one that was available at one point was withdrawn at the request of Joe L. (the author of cache-dirs).
  11. Yes. There is nothing on a pre-cleared disk that ties it to a particular system. I have heard of people having a separate system specifically targeted at allowing them to run pre-clears on disks without disturbing the system running the production servers.
  12. That will almost certainly be a permissions issue! The easiest way to rectify the permissions is to log in via telnet and then run the 'newperms' command, providing the path as a parameter.
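     For example (the path shown is only an illustration - point it at the share or folder that has the problem):

         newperms /mnt/user/MyShare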
  13. Have you tried booting in safe mode to ensure no plugins are being loaded?
  14. The error message you are getting suggests something has overwritten the libc library. That is why I am suggesting you should try with no plugins loaded.
  15. I would comment out the line that tries to load from /boot/packages. I would guess that you are loading a package that is not 64-bit compatible and is messing up the system.
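     Purely as a hypothetical illustration (the exact wording of the package-loading line varies from system to system, so treat this as a sketch rather than your actual go file), the edit would look something like:

         #!/bin/bash
         # /boot/config/go (excerpt)
         # temporarily disabled while testing 64-bit compatibility:
         # installpkg /boot/packages/*.tgz
         /usr/local/sbin/emhttp &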
  16. I wonder if it is something simpler - such as the fact that 64-bit systems do not have the constraint on low memory, so they can keep more entries cached in memory because Linux is not forced to push entries out of the cache. Does anyone have any idea on how one might investigate this?
  17. Yes, I did see that suggestion earlier in this thread and had done that in the script before adding it to the go file and rebooting; it still caused a nasty OOM and crash. I take it you are running it now? If so, what flags are you using, and are you running this via the go file or manually starting it after unRAID has booted? I start mine from the go file using a command line of the form
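     (the flags and excluded share shown below are only an example of the general shape - the cache_dirs usage output lists the options available on your version):

         # added near the end of /boot/config/go
         /boot/cache_dirs -w -e "Archives"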
  18. I found that I had to comment out the ulimit line to get the cache_dirs to run reliably on the 64-bit environment.
  19. There are options that can be passed to the pre-clear script to just run parts of the pre-clear process if you know roughly where it got to, which can be used to save time. However, running from the start will not do any damage so it might be the easiest thing to do.
  20. The /dev/sdd device is looking problematical as the number of 'sectors pending reallocation' seems to be continually going up. You do not want to use in unRAID any drive that does not finish the pre-clear with 0 pending sectors. The only anomaly is that no sectors are actually being re-allocated, so it is always possible that there is an external factor causing the pending sectors, such as bad cabling at the power/SATA level. The large number of sectors pending reallocation is a good enough reason to RMA the drive. The other two drives look fine. The key attributes relating to reallocated sectors are all 0, which is what you want.
  21. The pre-clear tries to put the drive through the same sort of load as is involved in parity rebuilds and/or normal use. If at the end of that there are no signs of any problems you have reasonable confidence that at this point the drive is healthy. That is actually better than you would have for new drives - a significant proportion of those fail when put through their first stress test via pre-clear. I actually have a caddy I can plug in externally via eSATA or USB on demand to do this. You could also use another system, as there is no requirement that the pre-clear run on the system where the drive is to be used.
  22. I handle this by having a spare drive that has been previously put through a thorough pre-clear to check it out. Using this disk I go through the process of rebuilding the failed drive onto this spare as its replacement. If the rebuild fails for any reason I still have the 'red-balled' disk untouched to attempt data recovery. If the rebuild works I then put the disk that had 'red-balled' through a thorough pre-clear test. I use the results of this to decide if the disk is OK or whether the drive really needs replacing. If the drive appears OK it becomes my new spare disk.
  23. The speed will vary depending on where on the disks the heads are positioned. Speeds will be fastest at the outer edge and progressively slow down as you move inwards. My rule of thumb is about 10 hours per TB for modern drives, which means pre-clearing a 4TB drive takes about 40 hours. This can vary depending on your system specs, in particular how the disks are connected, as controller throughput is an important factor.
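     Expressed as a quick back-of-the-envelope calculation using that rule of thumb:

         DRIVE_TB=4
         echo "Estimated pre-clear time: roughly $((DRIVE_TB * 10)) hours"   # about 40 hours for a 4TB drive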
  24. I think your statement is too broad! I think the issue is not that a parity check is being started, but that it is a correcting parity check, which can result in writes to the parity disk. If an unclean shutdown is detected (whatever the reason) and the parity check is a non-correcting one, then most users would not notice anything much happening if the shutdown was caused by something like a power failure, but their array would still be checked for integrity. What I do agree with is that a correcting parity check should not be auto-started outside user control. As has been mentioned, this can lead to data loss under certain (albeit rare) circumstances. You also do not want an automatic parity check if any disk has been red-balled due to a write failure, for the same reasons.
  25. If you run the script in a console/telnet session without any parameters it will list all the command line options that are available.