November 22, 2012

Can someone tell me what this output from a failed preclear means?

    Disk /dev/sdu has NOT been precleared successfully
    skip=241800 count=200 bs=8225280 returned 27322 instead of 00000
    skip=242000 count=200 bs=8225280 returned 20555 instead of 00000
    skip=242200 count=200 bs=8225280 returned 11874 instead of 00000
    skip=242400 count=200 bs=8225280 returned 36822 instead of 00000

I don't know what skip, count and bs mean, but I think preclear was reading back the zero bytes and found four instances where they came back as non-zero. If so, does this mean four bad sectors that SMART should have re-mapped and ignored? Did SMART not do its work, or is preclear warning me well in advance that, even with SMART re-mapping, this is going to be an unreliable drive that ought not to be used?

Thanks!
November 22, 2012

No, worse than that.

It does mean that the "dd" command attempted to read a block of the disk and, instead of all zeros, something else was returned. It has nothing to do with re-mapping; the disk is returning values different from what you wrote to it. It could be that the "write" did not work, or that the read returns garbage at times. The cause could be the disk, the disk controller, RAM, the motherboard, or the power supply (although RAM, disk, and power supply are the most common).

You can run the same "dd" command as in the preclear script to see if the values returned are randomly bad or consistently the same.

"skip", "count", and "bs" are all arguments to the "dd" command. They describe the block size (bs), how many blocks to skip before starting to read (skip), and how many blocks to read (count).

These types of errors are typically VERY HARD to diagnose, especially once you assign the disk to the array, as they show up as constant random parity errors (but you cannot tell which disk is the cause, if it even is a disk). Often there are no other indications of problems, other than the data being returned differing from what you stored.

Try

    dd if=/dev/sdu count=200 skip=241800 bs=8225280 conv=noerror | sum | awk '{print $1}'

and see what it prints. Try it multiple times: is the value always the same? Try the other "skip" values too.

If other disks also fail in the same way, it is probably memory or power supply related (not the disks). If not, you just have a bad disk.

It should then be tested as a wheel-chock. That test involves placing the disk behind the wheel of a car and repeatedly driving over it, testing its ability to impede the motion of the car. Once flattened to where there is a barely noticeable bump, the remains will probably fail any subsequent preclear, and could be RMA'd (you can try to blame the new form factor on UPS or FedEx).

Joe L.
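To make the skip/count/bs arithmetic concrete, here is a minimal sketch that turns one line of the preclear report into the byte and sector range it covers. The function name `dd_range` is made up for illustration, and the sector numbers assume 512-byte logical sectors, which may not match every drive. Note that 8225280 is 255 x 63 x 512, i.e. one cylinder in the classic CHS geometry.

```shell
#!/bin/bash
# Sketch: translate dd's skip/count/bs into the byte and sector range read.
# dd_range is a hypothetical helper, not part of the preclear script.
# Assumes 512-byte logical sectors.

dd_range() {
    local skip=$1 count=$2 bs=$3
    local start=$((skip * bs))                 # first byte read
    local end=$(((skip + count) * bs - 1))     # last byte read
    echo "bytes $start-$end, sectors $((start / 512))-$((end / 512))"
}

# First failing range from the preclear report
dd_range 241800 200 8225280
```

Each reported line therefore covers 200 x 8225280 bytes (about 1.6 GB), so a non-zero checksum only says that at least one byte somewhere in that range was not zero, not which sector it was.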
November 22, 2012, Author

Hah - I am not ready to make this one a wheel-chock yet.

I ran the command you gave me, using all four of the skip values, and for each skip value I ran it three times. It always returned the exact same information for each skip value. For example:

    dd if=/dev/sdu count=200 skip=242200 bs=8225280 conv=noerror | sum | awk '{print $1}'

produced this three times:

    11874
    200+0 records in
    200+0 records out
    [snip time information]

which is to be expected, I think, because it's the same number that was produced during preclear.

Is there a way to test writing 0000 to just this area and re-testing with the dd command?

By the way, I ran three preclears at the same time using a new controller. The other two passed, while this drive failed. This drive has failed multiple times on preclears on two other controllers. So I am pretty certain it's the drive, but I am just trying to understand why.

Thanks!
November 22, 2012

Happy Thanksgiving... I'm getting ready to go out for a few hours to deliver meals to the needy.

First, the data you are reading again and again is probably still in the disk buffer cache. So it is unlikely you are reading the disk again and again; you are reading the cached value. You'll need to clear the cache to be sure. Clear it by typing:

    sync
    echo 3 > /proc/sys/vm/drop_caches

in between each read of the disk.

You might also be able to add iflag=direct to the "dd" command, like this, to have it bypass the buffer cache:

    dd if=/dev/sdu iflag=direct count=200 skip=242200 bs=8225280 conv=noerror | sum | awk '{print $1}'

Yes, you can write to just those sectors. I'll provide details later this afternoon when I return.

Joe L.
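The repeat-and-compare check can be scripted roughly as below. This is only a sketch: the helper names (`read_checksum`, `drop_caches`, `repeat_check`) and the `DROP_CACHES` and `DD_FLAGS` variables are invented for illustration. The demo runs against a small scratch file so it is safe to try; for the real disk you would pass /dev/sdu and the preclear values, run as root with DROP_CACHES=1, and optionally set DD_FLAGS=iflag=direct.

```shell
#!/bin/bash
# Sketch: read the same dd range several times and compare the checksums.
# All helper and variable names here are hypothetical.

read_checksum() {
    # $1=target $2=skip $3=count $4=bs; extra dd flags come from DD_FLAGS
    dd if="$1" skip="$2" count="$3" bs="$4" conv=noerror $DD_FLAGS 2>/dev/null \
        | sum | awk '{print $1}'
}

drop_caches() {
    # Flush and empty the buffer cache, only when asked to (needs root).
    if [ "${DROP_CACHES:-0}" = 1 ]; then
        sync
        echo 3 > /proc/sys/vm/drop_caches
    fi
}

repeat_check() {
    local target=$1 skip=$2 count=$3 bs=$4 runs=$5
    local first="" cur i
    for ((i = 1; i <= runs; i++)); do
        drop_caches
        cur=$(read_checksum "$target" "$skip" "$count" "$bs")
        echo "run $i: $cur"
        if [ -z "$first" ]; then
            first=$cur
        elif [ "$cur" != "$first" ]; then
            echo "inconsistent"
            return 1
        fi
    done
    echo "consistent"
}

# Demo on a scratch file: 16 zeroed 4 KiB blocks with one stray byte in block 4.
scratch=$(mktemp)
dd if=/dev/zero of="$scratch" bs=4096 count=16 2>/dev/null
printf 'X' | dd of="$scratch" bs=1 seek=20000 conv=notrunc 2>/dev/null
repeat_check "$scratch" 4 2 4096 3
rm -f "$scratch"
```

A "consistent" non-zero result suggests the bad data is really on the platter; values that change from run to run point more toward the read path (controller, RAM, power).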
November 22, 2012

Following up on writing to just those sectors:

    dd if=/dev/zero of=/dev/sdu count=200 bs=8225280 seek=242200

This time we read from /dev/zero, write to the disk being cleared, and provide the same count and block size. Instead of "skip" you need to use "seek" when writing to the disk, to advance to the correct starting point, as shown in the example above.

Then, again, you need to clear the disk cache before attempting to read the data back. Otherwise it will just read the disk buffer cache. To flush and empty the disk buffer cache, type:

    sync
    echo 3 > /proc/sys/vm/drop_caches

Joe L.
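The write-then-verify cycle can be rehearsed on a scratch file before touching the real drive. This is a sketch under a few assumptions: the helper names `zero_range` and `range_checksum` are made up, and `conv=notrunc` is added only because the target here is a regular file (without it dd would truncate the file at the end of the written range); on a block device it should make no difference.

```shell
#!/bin/bash
# Sketch: zero one dd range with seek=, then read the same range back with skip=.
# zero_range and range_checksum are hypothetical helpers.

zero_range() {
    # $1=target $2=seek $3=count $4=bs
    dd if=/dev/zero of="$1" seek="$2" count="$3" bs="$4" conv=notrunc 2>/dev/null
    sync   # push the zeros out of the write cache
}

range_checksum() {
    # $1=target $2=skip $3=count $4=bs
    dd if="$1" skip="$2" count="$3" bs="$4" conv=noerror 2>/dev/null \
        | sum | awk '{print $1}'
}

# Scratch "disk": 16 zeroed 4 KiB blocks with stray bytes in blocks 5 and 12.
scratch=$(mktemp)
dd if=/dev/zero of="$scratch" bs=4096 count=16 2>/dev/null
printf 'X' | dd of="$scratch" bs=1 seek=$((5 * 4096 + 10)) conv=notrunc 2>/dev/null
printf 'X' | dd of="$scratch" bs=1 seek=$((12 * 4096 + 10)) conv=notrunc 2>/dev/null

echo "before: $(range_checksum "$scratch" 4 4 4096)"
zero_range "$scratch" 4 4 4096
echo "after:  $(range_checksum "$scratch" 4 4 4096)"
echo "block 12 (not rewritten): $(range_checksum "$scratch" 12 1 4096)"
rm -f "$scratch"
```

On the real disk the read-back only means something after the buffer cache has been dropped; if the range still returns a non-zero checksum after being rewritten, the drive is not storing what it was given.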