November 15, 2017
Attached is an image of the SMART page in unRAID for this drive.
So I am wondering how this is possible? I have an old WD BLACK 2TB hard drive that is clearly having problems. I have run two extended SMART tests on the drive, both of which failed at 60%. The first test failed with the LBA of the first bad sector at 2149769622, and the second failed with the first bad LBA at 2149731100. So clearly this drive is failing.
To run the tests, I have the drive attached to unRAID via "Unassigned devices." After the first SMART test, I thought I would run a preclear to see if the drive could remap the bad sector(s) and was potentially salvageable. However, both preclear operations completed with 100% success and a reallocated sector count of 0. For preclear, I used gfjardim - 0.9.4 with the full "clear" option.
How on earth is it possible that this drive can pass preclear when it is clearly failing according to SMART? The drive also gets stuck on reads or writes at times, so I am pretty certain it is bad. This has really shaken my confidence in preclear.
The drive is eight years old but has fewer than 9000 power-on hours. Pretty poor for a WD BLACK. I have six 3TB GREEN drives, one of which has failed, but all the rest have nearly 42,000 hours on them. I also have six 4TB GREEN drives, none of which have failed, three 5TB GREEN drives, and two 8TB RED drives. None of those have any reallocated sectors.
Thanks,
craigr
Edited November 15, 2017 by craigr
November 15, 2017 - Community Expert
6 minutes ago, craigr said:
How on earth is it possible that this drive can pass preclear when it is clearly failing according to SMART?
Disks are much more likely to fail on read than on write, i.e., no error is reported during the write (preclear) but one is reported during the read (SMART test). The disk should be replaced.
November 15, 2017 - Author
2 minutes ago, johnnie.black said:
Disks are much more likely to fail on read than on write, i.e., there's no reported error during the write (preclear) but there is during the read (smart test), disk should be replaced.
Thanks for the input. But I am confused, because with preclear every sector is first read, then written as 0, then read again. So shouldn't preclear run into this problem during a read? What am I missing?
...oh yeah, this drive is going in the garbage for sure. I just want to understand how preclear can't identify this drive as bad.
Best,
craigr
Edited November 15, 2017 by craigr
November 15, 2017 - Community Expert
That is strange. I assumed you were skipping both the pre and post reads on preclear. Still, if the extended SMART test fails, the drive is no good.
November 15, 2017 - Author
16 minutes ago, johnnie.black said:
That is strange, I assumed you were skipping both pre and post reads on preclear, still if the extended SMART test fails drive is no good.
Yes, I did run the pre and post reads. OK, so I am not the only one who finds this strange then!
The drive is definitely failing and is garbage. It was actually a storage drive in my Windows machine, and I noticed that it was periodically locking up the computer while trying to read or write. I pulled it out of the desktop and put it into the unRAID machine because it's easier to run SMART tests there and because I wanted to see if preclear could remap the sectors. Remapping the bad sectors is now a moot point given how many seem to be bad based on the SMART test. But, like I said, I just want to understand why preclear is missing this failure.
Usually when I get new drives I just run two preclears and don't do an extended SMART test until the drive has a thousand hours or so. With this revelation, simply running preclear may very well not be enough to prove a drive is good. It really may be necessary to run an extended SMART test as well.
I wonder if there is something wrong with gfjardim - 0.9.4 and we should use "Joe L - 1.15" instead... I may try that and see if I get the same results with preclear.
craigr
Edited November 15, 2017 by craigr
November 15, 2017 - Author
Strange, when I try to run Joe L - 1.15, I get this error and no preclear:
/boot/config/plugins/preclear.disk/preclear_disk.sh /dev/sdt
root@Tower:/usr/local/emhttp# /boot/config/plugins/preclear.disk/preclear_disk.sh /dev/sdt
sfdisk: invalid option -- 'R'
November 15, 2017 - Community Expert
16 minutes ago, craigr said:
Strange, when I try and run Joe L - 1.15 I am getting this error and no preclear.
You need to patch the old script:
November 15, 2017 - Community Expert
Here is a link to the actual patch for the JoeL script:
https://forums.lime-technology.com/topic/12391-re-preclear_disksh-a-new-utility-to-burn-in-and-pre-clear-disks-for-quick-add/?page=53#comment-460592
November 15, 2017 - Community Expert
19 minutes ago, Frank1940 said:
Here is a link to the actual patch for JoeL script:
That's the same link??
November 15, 2017 - Community Expert
22 minutes ago, johnnie.black said:
That's the same link??
There is a problem with those 'Expanded' links. There are actually TWO links inside the information that is displayed. One will take you to this post:
https://forums.lime-technology.com/topic/12391-re-preclear_disksh-a-new-utility-to-burn-in-and-pre-clear-disks-for-quick-add/?page=53&tab=comments#comment-460592
and the other link will take you to the thread:
https://forums.lime-technology.com/topic/12391-re-preclear_disksh-a-new-utility-to-burn-in-and-pre-clear-disks-for-quick-add/
Needless to say, this can be quite confusing to those who don't use the forum as much as you do. (To be completely honest, I click on the wrong one about half of the time!) For this reason, I have opted to post just the link rather than the 'fancy' graphic with its embedded links, so there is no confusion.
Edited November 15, 2017 by Frank1940
November 15, 2017 - Author
Well, the patch is not working for me. No errors when I apply the patch, but Joe L - 1.15 still doesn't run while gfjardim - 0.9.4 does run. WTH?
craigr
November 15, 2017 - Community Expert
3 minutes ago, craigr said:
Well, the patch is not working for me. No errors when I apply the patch, but Joe L - 1.15 still doesn't run while gfjardim - 0.9.4 does run. WTH? craigr
It sounds like the script was not patched. Note that you have to cd to the directory on the flash drive where that script is actually located before you run the sed command. And did you copy and paste the sed command line? Retyping it is not for the faint of heart! When it runs, it will tell you how many changes it made. IF it does not do that, something is wrong!
November 15, 2017 - Author
1 hour ago, Frank1940 said:
It sounds like the script was not patched. Note that you will have to cd to directory on the flash drive where that scrip is actually at, before you run the sed command. And did you copy and paste the sed command line? Retyping it is not for the faint of heart! When it runs it will tell how many changes it made. IF it does not do that, something is wrong!
OK, I did not get any feedback from the patch when I ran it, so I guess something is wrong. I ran the patch at the command prompt in PuTTY:
sed -i -e "s/print \$9 /print \$8 /" -e "s/sfdisk -R /blockdev --rereadpt /" preclear_disk.sh
As you can see, I did cd into my root directory, where I placed preclear_disk.sh way back in the day. I also copied and pasted the patch line into PuTTY rather than typing it by hand.
That said, I am using the preclear GUI now, so perhaps preclear_disk.sh is someplace else in unRAID 6.x? I looked through the directories on my flash drive and did not see it anywhere else. Is there a way to patch the GUI version, or just the command-prompt one?
FWIW, preclear_disk.sh in my root shows that it was indeed modified today. I installed it a long time ago, before there was a GUI version, so I think the one in my root is the new version with the GUI interface.
I could try running it at the command prompt to see if it goes, but I don't know the command parameters that specify whether it uses "gfjardim - 0.9.4" or "Joe L - 1.15."
This seems as though it should be simple...
Thanks for any help,
craigr
November 15, 2017 - Author
OK, I found the new preclear at:
cd /boot/config/plugins/preclear.disk
I ran
sed -i -e "s/print \$9 /print \$8 /" -e "s/sfdisk -R /blockdev --rereadpt /" preclear_disk.sh
but still got no feedback from the command. However, Joe L - 1.15 is now indeed working in the GUI.
I'll post what I find after it's done running in about six hours... or tomorrow.
Thanks for the help,
craigr
Edited November 15, 2017 by craigr
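A note on the missing feedback: `sed -i` edits the file in place and prints nothing when it succeeds, so silence alone does not show whether the patch applied. Below is a minimal sketch of one way to check, assuming GNU sed and grep. The helper name `patch_and_report` and the demo file contents are hypothetical; only the two substitutions are taken from the thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of preclear): apply the two substitutions from
# the thread, then count matching lines so the result is visible.
patch_and_report() {
    script="$1"
    # Single quotes stop the shell from expanding $9; sed sees a literal "$9".
    sed -i -e 's/print \$9 /print \$8 /' \
           -e 's/sfdisk -R /blockdev --rereadpt /' "$script"
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    old=$(grep -c 'sfdisk -R ' "$script" || true)
    new=$(grep -c 'blockdev --rereadpt ' "$script" || true)
    echo "sfdisk -R lines left: $old, blockdev --rereadpt lines: $new"
}

# Demo on a throwaway file with made-up contents, not the real script.
demo=$(mktemp)
printf '%s\n' 'sfdisk -R /dev/sdX' '{ print $9 }' > "$demo"
patch_and_report "$demo"
rm -f "$demo"
```

Against the real script, the equivalent check would be to run `grep -c 'sfdisk -R ' preclear_disk.sh` before and after the patch and confirm the count drops to 0.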
November 18, 2017 - Author
So the drive also passed preclear with Joe L - 1.15. I forgot how long Joe L - 1.15 takes to run compared to gfjardim - 0.9.4.
So the moral of the story is that preclear cannot be relied upon to find all bad disks. Moving forward, I think I may skip preclear when replacing drives and just do an extended SMART test. For adding drives, I will probably do extended SMART tests and then a preclear without pre and post reads.
I still don't understand HOW preclear doesn't force the drive to reallocate sectors that SMART finds bad. Very strange.
craigr
November 18, 2017
Note that a preclear writes to all sectors. But the disk doesn't know if there is a sector error. It just assumes that the sectors are OK and that the writes worked. It isn't practical for an HDD to read back all written content, since that requires yet another revolution of the disk before the drive can seek and do something on another track.
At most, some drives may catch some types of mishaps, such as a strong knock on the drive during the write, and redo the write. The drive may check the servo information directly after the write to make sure the head is still centered on the next sector rotating in under it.
The long SMART test is a read test: it reads every sector and verifies that the data is read back without any problem.
It's only when reading the sector content that the disk can know if the content is bad. And only by rewriting and directly reading back can the disk know whether the sector is physically damaged or whether the read error was caused by some other issue and the sector may be reused.
This is why it is important to always have some routine where the entire disk surface gets read regularly, so that the disk can spot problematic sectors and have the data either rewritten or relocated to a spare sector before there are too many bit errors for the error correction code to correct.
The same goes for external USB drives: they should be powered up regularly and scanned.
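The regular full-surface read described above can be approximated from the command line. Below is a minimal sketch, assuming GNU dd on Linux. `read_scan` is a hypothetical helper name and `/dev/sdX` in the note below is a placeholder. It only reads, so it does not modify the target, and it is a host-side read rather than the drive's internal self-test.

```shell
#!/bin/sh
# Hypothetical helper: read every block of a file or device and discard the
# data. dd exits non-zero if a read fails, for example on an unreadable sector.
read_scan() {
    target="$1"
    if dd if="$target" of=/dev/null bs=1M 2>/dev/null; then
        echo "read OK: $target"
    else
        echo "read FAILED: $target"
        return 1
    fi
}

# Demo on a small temporary file instead of a real disk.
demo=$(mktemp)
printf 'sample data\n' > "$demo"
read_scan "$demo"
rm -f "$demo"
```

On a real drive the call would look like `read_scan /dev/sdX`, run as root. A full pass over a multi-terabyte disk takes hours.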
November 19, 2017 - Author
20 hours ago, pwm said:
Note that a preclear writes to all sectors. But the disk doesn't know if there is a sector error. It just assumes that the sectors are ok and that the writes did work ok. It isn't practically possible for a HDD to read back all written content since that requires yet another revolution of the disk before the disk can perform a disk seek and do something on another track. At the most, some drives may catch some types of mishaps - such as a strong knock on the drive during the write - and redo the write. The drive may check the servo information directly after the write and make sure it's still centered on the next sector rotating in under the head. The long SMART test is a read test - it reads every sector and verifies that the data is read back without any problem. It's only when reading the sector content that the disk can know if the content is bad. And only by rewriting and directly reading back can the disk know if the sector is physically damaged or if the read error was caused by some other issues and the sector may be reused. This is why it is important to always have some routine where all disk surface gets read regularly, to make sure that the disk can spot problematic sectors and have the data either rewritten or relocated to a spare sector before there are too many bit errors so the error correction code is not enough to correct the bit errors. Same with external USB drives - they should be powered up regularly and scanned.
Please note from the posts above that I had pre and post read enabled on all preclear operations. So as far as I know, all sectors should have been read twice during each preclear operation.
Thus I still have the same question: how can preclear NOT have hit bad sectors when the extended SMART tests report specific bad LBAs?
craigr
Edited November 19, 2017 by craigr
November 19, 2017
What I find strange is that the SMART log mentions sector failures (the second run fails slightly earlier), but the image of all SMART parameters doesn't show a single sector reallocation. Parameters 5, 196, 197, and 198 are all zero.
With the exception of the log entries, there is no counter in the SMART data that indicates any problems.
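For anyone checking the same four counters from a shell rather than the unRAID SMART page, below is a minimal sketch that filters them out of `smartctl -A` style output. It assumes the usual ten-column attribute table with the raw value in the last column. `sector_counters` is a hypothetical helper name, the sample rows are made up for illustration (they are not this drive's data), and attribute names can vary by drive.

```shell
#!/bin/sh
# Hypothetical helper: keep only attributes 5, 196, 197 and 198 and print
# the ID, the attribute name and the raw value (10th column).
sector_counters() {
    awk '$1 == 5 || $1 == 196 || $1 == 197 || $1 == 198 { print $1, $2, $10 }'
}

# Demo with made-up rows in the smartctl -A table layout.
sector_counters <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       8900
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
EOF
```

With a real drive the input would come from `smartctl -A /dev/sdX | sector_counters`, where `/dev/sdX` is a placeholder for the device.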
November 19, 2017 - Author
21 minutes ago, pwm said:
What I find strange is that the SMART log mentions sector failures - the second run fails slightly earlier. But the image of all SMART parameters doesn't mention a single sector reallocation. Parameters 5, 196, 197, 198 are all zero. With the exception for the log entries, there is no counter in the SMART data that indicates any problems.
It is very strange. I assumed that the earlier LBA on the second SMART test was due to more sectors going bad after the first SMART test. Perhaps that's not the case, and something else is failing in the drive (as you mentioned) that manifests as a bad sector in the SMART report. That said, I find it strange that both extended SMART tests failed close together and at 60%.
The drive is clearly having a problem, because it was occasionally causing my Windows PC to lock up with a solid HD LED.
craigr
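For scale, the two failing LBAs reported in the first post can be compared directly. Below is a minimal sketch of the arithmetic, assuming 512-byte logical sectors, which the thread does not state. `lba_gap` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: distance between two LBAs, in sectors and in bytes.
# Assumes 512-byte logical sectors; adjust the factor for 4096-byte sectors.
lba_gap() {
    first="$1"
    second="$2"
    sectors=$(( second - first ))
    bytes=$(( sectors * 512 ))
    echo "$sectors sectors, $bytes bytes"
}

# Second test's first bad LBA, then the first test's first bad LBA.
lba_gap 2149731100 2149769622
# prints "38522 sectors, 19723264 bytes"
```

Under that assumption the two reported failures sit roughly 18.8 MiB apart, which is consistent with both tests stopping in the same small region of the disk.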