Everything posted by Joe L.

  1. I think you'll be fine. It appears as if your disks did not respond at all at that point in the post-read verify, possibly due to an intermittent cable. The separate command you issued did work as expected: it showed that set of blocks as all zeros. You can always re-run just the post-read-verify phase to confirm, but I suspect the cabling issue is the root cause:

        preclear_disk.sh -V /dev/sdX

     Joe L.
  2. "Linux ... feh!! ... FYI, I first used, and did kernel development on, Unix before it even had dd (v4, 1973)."

     Were you one of those who used "adb -w" on the kernel while it was loaded in memory? I fully respect anyone with that skill.

     "The description of the count= option has not changed in its entire 38+ year lifetime (except in a negligibly semantic sense). May 1974: copy only n input records. August 2012: copy only N input blocks. The fact that it is a copy precludes any concern about buffering. It isn't really keeping it (in an active sense); it has just not yet overwritten it with anything else."

     True. I understand the concepts involved. My early involvement with computers was at roughly the same time as yours, but I was fixing them, at the hardware level, and running hand-coded machine-code routines to test them. There was no "motherboard" back then on the TSPS system I was working on... it was all DTL logic. (ICs were not yet in common use.) My involvement with UNIX did not begin until 1979/1980. It was a version of PWB-Unix... and prior to the Bourne shell. (Its "Mashey" shell actually had labels and "goto".) Years later (late 80s) I was also involved in writing custom kernel-level device-driver code in SVR3 UNIX (on a 3B2) for a very special interface to hardware manufactured by a supplier of "smart-phones", back when I was working on a project for AT&T. Their customer at that time had specified the hardware, I needed to communicate with it, and it was not possible through any existing interface, so I wrote the device driver. It was years later that I first ran Linux, and re-wrote the scsi/sound-card driver on it to work on my hardware. (So I could play "doom".)

     I'll agree with you there. (On both... the politicians and the kernel developers.)

     I have a feeling, like you, that one of the cache systems is not configured to be able to handle a large traversal of files using "find" while at the same time performing a "dd" of zeros to an entire disk. Since the conditions needed to experience the issue are rare, nobody on the kernel-dev team has fixed it (if they even know of it). The issue seems to be related to "low memory" exhaustion (a concept we never had to worry about on true UNIX, with a single linear address space and swap space available if memory was insufficient). It seems not to occur if smaller block sizes are used with the "dd" command (see the sketch below). I never encounter "low memory" issues, since all I have on both of my servers is "low" memory. (512 MB on one, 4 GB on the other.)

     As you said, the "dd" operation does not involve anything but the disk-buffer cache, but running it concurrently with a "find" of a large hierarchy certainly does involve the dentry and inode cache systems. I'm sure the user-share file system is complicating the issue, it being entirely in memory. In an ideal world, we would not need to deal with any of this. Linux has its quirks when allocating memory for cache and processes, even more in SMP environments. I suspect it only shows on some hardware/drivers. As Linux/unRAID changes, we'll just have to adapt to the environment. Five years ago I never had to think about concurrently writing to multiple 3TB disks using large buffer sizes. I guess I will eventually just change the preclear script to use a fixed buffer size, and hope it will work in all situations.

     Joe L.

     PS. Great to see another old-time Unix geek on here.
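     A minimal sketch of what "smaller block sizes" looks like in practice, assuming you just want to stream through a disk the way the pre-read does; /dev/sdX is a placeholder and the 2 MiB size is an arbitrary modest choice, not a tested recommendation:

        # Read the entire disk with a fixed 2 MiB request size, discarding
        # the data. A small bs= keeps each read's memory footprint tiny
        # compared to the geometry-derived sizes discussed above.
        dd if=/dev/sdX of=/dev/null bs=2M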
  3. Granted. I was younger and far more innocent when I wrote the preclear utility. FYI, I first used "dd" many, many years before Linux was created. It has not changed much over the years... (It was on version 1.0 of CB-Unix, barely out of Bell Labs.) I knew it issued reads to the OS sized at the block size; I never gave much thought to the "count" and its buffering prior to output. I'll have to look at the Linux source to see what it does these days. Regardless of what "dd" is doing, the disk buffer cache will be keeping much of what it had recently accessed, simply because it was most recently accessed. In the same way, cache_dirs is just using the "find" command, and it will force a lot of the disk buffer cache to be involved if you have a deep/extensive directory hierarchy. Between them, you can run out of "low" memory (one way to watch that is sketched below). Joe L.
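     A hedged way to watch the caches and "low" memory being discussed, assuming a 32-bit kernel (which is what exposes the LowTotal/LowFree lines in /proc/meminfo); the exact slab-cache names vary with the filesystem in use:

        # Remaining "low" memory (these lines exist only on 32-bit
        # kernels built with highmem support):
        grep -i 'low' /proc/meminfo
        # Size of the dentry and inode slab caches that large "find"
        # traversals grow:
        grep -i -e dentry -e inode_cache /proc/slabinfo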
  4. "Cylinder? As a design criterion (such as you allude), the cylinder has been obsolete for ~20 years, even moreso in the last 5-8 years. Just change your default to something that 'feels' right. And, totally forget about 'disk geometry'--all that does for you is create a chunk size that is NOT a multiple of 4K. --UhClem 'The times they are a'changing.'"

     I fully understand that "cylinders" have not been used for 20 years (probably much more). The issue I faced when originally writing the preclear script was selecting an appropriate "block size" when reading and writing the disks. I used the output of the "fdisk" command as my guide, figuring the reported disk geometry would probably be a size the disk could handle. "fdisk" presented a line like this (from a sample IDE disk on one of my servers):

        Units = cylinders of 16065 * 512 = 8225280 bytes

     The preclear script then read, by default, 200 "units" of data at a time, with a "dd" command looking something like this (for that disk):

        dd if=/dev/sdX bs=8225280 count=200 .........

     The amount of memory used for a single read request was then 8225280 * 200 = 1,645,056,000 bytes (the arithmetic is spelled out below). Now, with larger 3TB disks and a much larger "Unit", you can easily run out of memory. The use of Units worked for many years, with disk sizes from 6 GB upwards; it is only now, with 3TB drives, that the sizes are outgrowing the available RAM. I agree there needs to be a limit, but a multiple of 4K makes no practical difference at all when you are asking for 8225280 bytes or more at a time. In the interim, use the "-r" and "-w" options as I previously indicated, and you'll probably not run out of memory. Joe L.
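     To make that memory arithmetic concrete, here is the same calculation done with the shell's own arithmetic; the numbers come straight from the post above and are not a recommendation:

        # One default read request: 200 "units" of 8225280 bytes each
        echo $(( 8225280 * 200 ))    # 1645056000 bytes, roughly 1.5 GiB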
  5. Basically, you ran out of free memory. Apparently, on 3TB drives you will need to use the -r, -w, and -b options to limit the memory used by the preclear process. Those defaults were originally sized when 1TB drives were common; at that time the preclear script was designed to read and write a cylinder at a time. With larger disks, the geometry has grown large enough that it may not leave enough memory for other processes. Try something like this:

        preclear_disk.sh -w 65536 -r 65536 -b 200 /dev/sdX
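     For a sense of scale, assuming -r and -w set the read/write block size and -b the number of blocks per request (as the surrounding posts describe), the options above shrink each request to roughly 12.5 MiB:

        echo $(( 65536 * 200 ))    # 13107200 bytes, about 12.5 MiB per request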
  6. "Thanks Joe! Also, I suppose there's no difference if I do one cycle at a time and repeat three times, rather than 3 cycles in one go?"

     Three at once is quicker, since the post-read phase of the first acts as the pre-read phase of the second, and the post-read phase of the second acts as the pre-read phase of the third. You save two "read" phases, which with large disks probably saves 6 to 8 hours or more.
  7. No, for those parameters the manufacturer simply has set the failure threshold high. (only a few failures to spin up in a timely manner would cause the drive to fail SMART tests) The "normalized" values are still at their starting values from the factory and the raw value has no errors. Some disks are faster, others slower... (same for disk controller ports)
  8. ALL drives have raw read errors. Every one... some report them, some just retry. You can ignore the "raw" read-errors parameter UNLESS the normalized value reaches the affiliated failure threshold.
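     One way to make that comparison yourself, assuming the smartmontools package is available (/dev/sdX is a placeholder for your drive):

        # Print the SMART attribute table; a parameter is only a concern
        # when its normalized VALUE column falls to or below its THRESH
        # column for that row.
        smartctl -A /dev/sdX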
  9. I have no idea why you are including the "diff" for our analysis. There is a report that summarizes the effect of the preclear. It shows how many sectors were pending re-allocation and already re-allocated, both before and after the preclear. It also shows those attributes that are failing, or near their failure threshold. Basically, if it shows no re-allocated sectors, no sectors pending re-allocation, no attributes marked FAILING_NOW, and the post-read was successful, the disk should be fine. The reports from the preclear are in /boot/preclear_reports; look there for the results, not in the syslog (a quick way to skim them is sketched below). Joe L.
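     A quick, hedged way to skim those finished reports from the console; the grep patterns below are guesses at the report's wording, not exact strings:

        ls /boot/preclear_reports/
        grep -i -e alloc -e pending /boot/preclear_reports/*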
  10. That would occur if the disks are not yet mounted while their transactions in their journals are replayed. The disks should appear online once that step is complete. With large disks and lots of file activity it could take 30 minutes or more. (of course, during that time, the parity check is also occurring, although that would not prevent file access.)
  11. Basically, you are looking for no sectors pending re-allocation at the end of the process, and few, if any, sectors that have been re-allocated. Also, you are looking for no other parameters where the normalized value has reached its affiliated failure threshold. (Those would show FAILING_NOW on their report line.) No.
  12. I'd say the disk is bad... I think 9168 is the highest number of re-allocated sectors I've ever seen on a disk without it being marked as failed. I'd strongly advise not using it in the unRAID array.

        5 Reallocated_Sector_Ct 0x0033 087 087 036 Pre-fail Always - 9168
  13. UNC media errors are sectors where the data on the sector does not match the checksum at the end of that sector. It might be a defective sector, or it might be written poorly. Either way, let the process complete. It will re-allocate the sector if it needs to when it gets to the writing phase, or, re-write it in place (it tries that first) in an attempt to not re-allocate the sector. Notice there were 39 sectors already re-allocated prior to the start of the process. Can't tell you about the time counters... Joe L.
  14. No, there is no guarantee the manufacturer does what is needed. The risk is too high, so rather than risk someone's data, I have elected not to add that option.
  15. If you were using a physical timer/clock, I would put it BETWEEN the wall outlet and the UPS. Then set up unRAID to cleanly stop itself when power is lost while running on the UPS (set it to power down after 15 seconds or so). Set it to power down the UPS too, so its batteries are not exhausted. Then, when the timer restores power to the UPS, it will power up the server. That will work as long as you are able to cleanly stop all processes using the disks when powering down. (A hedged example of the UPS settings is sketched below.) Joe L.
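     If you happen to be using apcupsd for UPS monitoring (an assumption here, though it was a common choice on unRAID), the relevant settings in /etc/apcupsd/apcupsd.conf look roughly like this:

        # Shut the server down after 15 seconds on battery power
        TIMEOUT 15
        # Keep apcupsd running after the shutdown request and cut the UPS
        # output about a minute later, so the batteries are not run flat
        # (check the apcupsd documentation for your UPS model first)
        KILLDELAY 60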
  16. It depends on your BIOS. Some BIOS have the ability to turn on at a certain time, others do not; it depends entirely on your motherboard. As you said, you can set up a cron task to stop the array and power down (a rough sketch follows). Joe L.
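     A rough crontab sketch of that idea. The "powerdown" path below is a placeholder, not a script that ships with unRAID; substitute whatever clean-shutdown script your setup uses (one that stops the array before halting), and adjust the schedule to suit:

        # m   h   dom mon dow   command
          30  23  *   *   *     /boot/custom/powerdown >/dev/null 2>&1
        # /boot/custom/powerdown is a hypothetical name for your own
        # clean-shutdown script.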
  17. If it is unRAID you are referring to as your NAS, type: ethtool eth0
  18. Looks fine. It had one sector which had been re-allocated before the start of the pre-clear, and no additional sectors were identified by the SMART firmware during the pre-clear. The drive should be fine. You are looking for sectors pending re-allocation and re-allocated sectors (and changes in those parameters during the process). Most disks have anywhere from several hundred to several thousand spare sectors that can be used by the disk's SMART firmware to re-allocate un-readable sectors. Your disk is doing fine. Other than that, you are looking for ANY "normalized VALUE" parameter that has reached or fallen below the affiliated failure THRESHOLD. Joe L.
  19. Looks fine. It had three sectors which had been re-allocated before the start of the pre-clear, and no additional sectors were identified by the SMART firmware during the pre-clear. The drive should be fine.
  20. A. It writes slower than it reads?
      B. You are doing something else that is keeping the same disk controller busy?
      C. Slow electrons?
      D. Otherwise, no... you did NOT provide a syslog for analysis, so all we can do is guess. Did you look there for any signs of errors?
  21. I did consider cases like this when I added that "-d type" option. You are one of the first to report being able to use it on their hardware. Very nice disk controller card... (very expensive too, but I assume it was something you already had). I'm guessing your hardware is far more than a usual desktop machine. (How much RAM do you have?) Yes, you need a really good single-12-volt-rail, high-capacity power supply once you get up there in disk count.

      Look at the report the preclear creates when it ends (copies of the reports will be in /boot/preclear_reports). Your report shows one attribute that looks odd:

        192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 285

      For many disks, this attribute represents emergency retracts of the disk heads when power is lost. It seems high, unless in previous usage the disk simply had power cut off rather than having its heads parked in an orderly shutdown.

      /dev/sdb and /dev/sdg will never be the same disk. You can type:

        ls -l /dev/disk/by-id/*

      to see a listing of all your disks and disk partitions by model and serial number. The preclear script was written to not allow you to clear a disk that is assigned to the array, or mounted and in use.

      Have fun, Joe L.
  22. Many of the folders and files will not have their original name, or even their original "extension". You basically go through all the files and folders, using hints such as their size and parent folder to identify them. Try re-naming them with the correct extension and opening them; using different programs to attempt to open them might help identify what they originally were (the "file" command sketched below can also guess a file's type from its contents). Some files might never be recoverable (if you've overwritten them), and you might find that others were on a different disk (you'll need to use the -S option on /dev/md2, etc., if that is the case). When you've identified all you can, or want, you can delete the lost+found folder.
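     A small aid for that sorting work: the standard "file" utility guesses each file's type from its contents rather than its name. The lost+found path below is a placeholder; adjust it to the disk you checked:

        # Report the detected type of every recovered file, one per line
        file /mnt/disk1/lost+found/* | less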