PeteAron Posted November 22, 2011

I am preclearing 2 new drives and an older WD drive using the latest 5.0 beta 13. My WD went through the entire process, but at the end returned these results:

================================================================== 1.13
= unRAID server Pre-Clear disk /dev/sda
= cycle 1 of 1, partition start on sector 64
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Verifying if the MBR is cleared.              DONE
= Disk Post-Clear-Read completed                                DONE
Disk Temperature: 29C, Elapsed Time: 14:16:08
======================================================================== 1.13
== WDC WD10EADS-00M2B0   WD-WMAV50229330
== Disk /dev/sda has NOT been precleared successfully
==
skip=113800 count=200 bs=8225280 returned 53564 instead of 00000
skip=114600 count=200 bs=8225280 returned 51645 instead of 00000
skip=116600 count=200 bs=8225280 returned 59265 instead of 00000
skip=119400 count=200 bs=8225280 returned 37918 instead of 00000
============================================================================
** Changed attributes in files: /tmp/smart_start_sda  /tmp/smart_finish_sda
    ATTRIBUTE      NEW_VAL  OLD_VAL  FAILURE_THRESHOLD  STATUS  RAW_VALUE
    Spin_Up_Time       191      113                 21  ok           3433
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 1.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
0 sectors are pending re-allocation at the end of the preclear;
  the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear;
  the number of sectors re-allocated did not change.

Can anyone suggest my next step? Should I just preclear again, and is there any hope for the drive? I have not tested the memory in this new box yet; I will do that when the other two finish tonight.
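For context, the `skip=... returned 53564 instead of 00000` lines appear to be sum(1) checksums: the post-read checksums each dd chunk, and an all-zero chunk sums to 00000. You can spot-check a flagged region by hand the same way. A minimal sketch, run here against a scratch file for safety (on the server you would point DISK at /dev/sda, read-only, and reuse the reported skip/count/bs values):

```shell
# Demo "disk": 1 MiB of zeros with a few bytes deliberately corrupted,
# standing in for the real device (on the server: DISK=/dev/sda).
DISK=/tmp/preclear-demo.img
dd if=/dev/zero of="$DISK" bs=1M count=1 2>/dev/null
printf 'XYZ' | dd of="$DISK" bs=1 seek=8192 conv=notrunc 2>/dev/null

# Same style of check the post-read reports: checksum a region with sum.
# An all-zero region prints 00000; anything else was read back non-zero.
dd if="$DISK" bs=4096 skip=0 count=1 2>/dev/null | sum   # clean region
dd if="$DISK" bs=4096 skip=2 count=1 2>/dev/null | sum   # corrupted region
```

The file, offsets, and injected corruption above are purely illustrative; only the dd-piped-to-sum pattern mirrors what the preclear report shows.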
Interstellar Posted November 23, 2011

Run it again. I had a drive which threw up a few sector errors the first time through; I ran it again and it was OK, as it had re-allocated them.
WeeboTech Posted November 23, 2011

Agreed, run it again. There is also a tool called badblocks that will write- and read-test a number of patterns against every sector on the hard drive. That should force a remap if there actually is a bad sector.
Joe L. Posted November 23, 2011

Quoting Interstellar: "Run it again. I had a drive which threw up a few sector errors first time through, ran it again and it was ok as it had re-allocated them."

Those errors were not unreadable sectors. They were sectors that did not return zeros when read back; they had no other error. That type of error causes hair loss: you will pull your hair out trying to figure out why parity is constantly finding errors. There is a good possibility it is the memory, the disk controller port, or a poorly regulated power supply rather than the disk, but if a memory test passes, the odds are it is the disk.
Interstellar Posted November 23, 2011

Quoting Joe L.: "Those errors were not unreadable sectors. They were sectors that did not return zeros when read back [...] odds are it is the disk if a memory test passes."

It's the cache drive, so given that it has had two runs and no new SMART errors have been logged since then, I'm quite happy it's OK. I've had no disk-related parity errors in 8 months, so I'm quite happy about that. (The other errors were me messing about with voltages.)
Joe L. Posted November 23, 2011

In my personal opinion, if you do not have an explanation for why non-zero values were detected when reading back zeros in the post-read, then you are (pardon the expression) a complete idiot for using that disk as the cache drive. What you are basically saying is that it is OK if the files you write to it are INCORRECTLY copied to the protected array. The corruption of an occasional byte here and there in the process does not bother you, as long as the SMART reports on the disk do not display any error. After all, it was just 4 different spots on the cache drive that were read incorrectly, and who knows, those spots might not be important to the files or media you are writing to the drive.
vca Posted November 23, 2011

Quoting Joe L.: "There is a good possibility it is the memory, or disk controller port, and not the disk, or a poorly regulated power supply, but odds are it is the disk if a memory test passes."

Sounds like it might be the same issue I ran into a while ago on a 1.5TB WD drive: http://lime-technology.com/forum/index.php?topic=11515.msg109840#msg109840 where sometimes the drive would return different data from a sector and yet not signal any form of read error. It took me a long time to isolate the problem to the one drive. After removing the drive from the array and pre-clearing it, the problem went away, but I'm not going to use that drive for anything important again.

Regards, Stephen
Interstellar Posted November 23, 2011

Quoting Joe L.: "In my personal opinion, if you do not have an explanation for why non-zero values were detected when reading back zeros in the post-read, then you are (pardon the expression) a complete idiot for using that disk as the cache drive. [...]"

To be honest, my cache drive doesn't deal with anything important, only my backed-up DVD/Blu-ray collections; anything actually important goes straight to the protected array. I think I'd be an idiot if I spent £100 replacing the drive because once upon a time it threw some bad sectors. Hard disks sometimes throw bad sectors. I have a 500GB WD drive here which is 6 years old and has about 50 bad sectors. When was the last time this number changed, even after yearly clears? Three years ago. So for three years running, this 6-year-old disk has produced no errors whatsoever. Add to that the fact that you have no idea whether any of the disks in your protected array have bad bits.

This is akin to stability testing: how long before you conclude it's actually stable? Failure, in my mind, is when a drive repeatedly produces read/write errors, and this drive does not.

Edit: Is there a way to prove my point by running something akin to pre-clear, but on a formatted device with data on it?
Joe L. Posted November 23, 2011

I agree with you. A few reallocated sectors are no issue at all, especially if they do not continue to grow at a rapid rate. If an occasional copy of a bad bit does not bother you, fine.

But the errors detected by the preclear script were not reallocated sectors. The disk just did not read back the zeros that were written. What it read passed all of its internal checksums and the CRC check on the SATA link; it just silently corrupted your file.

Yes, you can run repeated dd commands over the same set of blocks, and they should always return the same checksum.
WeeboTech Posted November 24, 2011

Quoting Joe L.: "who knows, those spots might not be important to the files or media you are writing to the drive."

Or the superblock... In my case recently, the bad sectors popped up right in the superblock. The bad sectors were so bad that I could not read the drive sequentially forward to recover. I had to read forward until the drive went offline, reset it, and continue, then read the drive backwards until the bad sector. See my thread on ddrescue; it took two days to recover from a bad block that caused the firmware to time out, thus causing Linux to take the drive offline.

I would highly recommend a badblocks test. It tests every sector with multiple patterns and logs the bad blocks to a file. It's better than a dd read test. In addition, the bad-block list can be used as input to mkreiserfs, so that reiserfs will mark those blocks as bad too, thus avoiding them in the future.
WeeboTech Posted November 24, 2011

Quoting Interstellar: "Edit: Is there a way to prove my point by running something akin to pre-clear but on a formatted device with data on it?"

badblocks has a read-only mode, a non-destructive read/write/read mode, and a destructive write mode (be careful).
JonathanM Posted November 24, 2011

Joe, how much trouble would it be to add an optional step to the preclear script to write all ones instead of zeros for the first pass? That way we could catch a bit stuck at zero, as well as the current case where it fails on a random bit stuck at one. It could be an additional command-line switch for thoroughness: after the initial SMART test and read cycle, it would write all ones and read them back, then write all zeros and read them back, then write the preclear magic and get the final SMART status. If my logic is sound, it should just be a copy-and-paste of a couple lines of code, along with the edits to change the zeros to ones in the write and read cycle, plus the extra progress output and command-line switch parsing. Easy peasy: an hour of coding and a week of testing, right?
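An all-ones pass like JonathanM describes can be sketched outside the script by translating the zero stream. This is a minimal demo against a 1 MiB scratch file; a real pass would target the whole device and be destructive, and the file path and sizes here are illustrative only:

```shell
# Write a 1 MiB region of all-ones (0xFF) by mapping the zero stream
# through tr, then read it back and byte-compare.  DISK is a scratch
# file here; on a real drive it would be /dev/sdX (destructive!).
DISK=/tmp/ones-pass.img
tr '\0' '\377' < /dev/zero | dd of="$DISK" bs=64k count=16 iflag=fullblock 2>/dev/null

# Verify: regenerate the expected all-ones stream and compare.  cmp is
# silent on a match, so reaching the echo means every byte came back 0xFF.
tr '\0' '\377' < /dev/zero | head -c 1048576 | cmp - "$DISK" && echo "ones pass verified"
```

A stuck-at-zero bit would make cmp report the first differing byte instead of staying silent, which is exactly the failure this extra pass is meant to surface.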
WeeboTech Posted November 24, 2011

It would be far better to upgrade preclear to do a badblocks test and save the bad-block list to a file, then use that file as input to mkreiserfs. It was designed that way from the early days, when drive reliability wasn't as good as it is today. In fact it's damn good now, but still not perfect, and when a corruption occurs these days, more data is at stake. I'm tellin' ya, you don't want a bad block coming up in the middle of your superblock, LOL! I've been there and it was tough getting past it.

badblocks does a 4-pass test: 0xAA, 0x55, 0xFF, 0x00. Any bad blocks are put in a list if the option is enabled, and it can be run in verbose mode. It's not as cool as the current dd status with MB/s, but you do get a clear indication of the disk's state. There are other tools, such as scrub and shred, but badblocks is designed to write patterns, read them back while recording the bad blocks, and assist in skipping them during a higher-level format. I think this is what SpinRite used to do also.

I plan to do a monthly badblocks read on each disk just for insurance now; I've seen that even a monthly parity check doesn't ensure every drive is in good condition. badblocks in the preclear, instead of a plain 0x00 pass, is the best way to go. Unfortunately it takes a really long time.
Joe L. Posted November 24, 2011

I think the addition of badblocks as an initial (optional... and very, very long) phase is a good idea.
WeeboTech Posted November 24, 2011

Quoting Joe L.: "I think the addition of badblocks as an initial (optional... and very, very long) phase is a good idea."

Agreed, but consider saving the bad blocks to a file with the -o option, then using that as input to the mkreiserfs stage. There is also an option to specify a pattern, which might be useful for faster preclears, i.e. do a one-pass 0x00 pattern write while saving the bad blocks. The thing is, badblocks does the same write-and-re-read that the preclear is doing, only it saves what's questionable to be used in the next step, the high-level format. What I like about the dd method employed now is that you can see how fast you are writing to and reading from the drive; that's the only thing missing from badblocks. Perhaps we could adjust it, or use a piped value to determine the actual transfer speed.
WeeboTech Posted November 24, 2011

Quoting JonathanM: "It could be an additional command line switch for thoroughness, after the initial smart test and read cycle it would write all ones and read them back, then write all zeros and read them back [...]"

Just for added information and clarity, badblocks does a 4-pass test in write mode with the following patterns:

0xAA = 10101010
0x55 = 01010101
0xFF = 11111111
0x00 = 00000000
Interstellar Posted November 24, 2011

I think I have found the source of confusion regarding my cache drive. I had a change in *reallocated sectors* on the first pass, with no changes to sectors on the second pass. That error is different from the one on the OP's drive; I think I must have had a brain fart.

kimifelipe, did you run pre-clear again?

@WeeboTech, if my cache drive is /dev/sda and the array is in maintenance mode, how would I run a badblocks test? i.e. what is the command line?
WeeboTech Posted November 24, 2011

Here's what I would do:

mkdir -p /boot/log
smartctl -a /dev/sda > /boot/log/smartctl-1.sda
badblocks -vs -o /boot/log/badblocks.sda /dev/sda
smartctl -a /dev/sda > /boot/log/smartctl-2.sda

The badblocks run above is a read-only test. Then diff the two smartctl logs to see the changes. I might even do a smartctl -t short test on the drive before and after, just to be informed.

If you are sure there is nothing important on the drive, you can add the -n option for a non-destructive read-write mode, but if you have any power issues this could cause a problem, as your drive is being read, tested, then re-written with the original data. If you are OK with erasing the cache drive and laying down a new format/filesystem, you can use the -w option, which does a write-mode test. See the manpage here: http://linux.die.net/man/8/badblocks
PeteAron (Author) Posted December 9, 2011

Quoting Interstellar: "kimifelipe, did you run pre-clear again?"

I did. I was using the new hardware as a free server at the time of the post, with only three drives, preclearing all of them at the same time. I rebuilt the server and all three passed. IIRC I posted the reports to ask for advice and was told the drives looked okay; I'll check. In any event, the two 2TB drives are now over half full, and the WD EARS in the OP now has about 200GB on it. Everything seems okay. Thanks for asking.