Preclear.sh results - Questions about your results? Post them here.



I went back and did some more analysis of my syslog. On the first preclear run, all of the errors were isolated to two sectors (1682918047 and 1683029711). It's a little strange that the post-clear SMART report showed a raw value of 3 current pending sectors.
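A minimal sketch of that kind of syslog analysis in Python. It assumes the kernel logs each failed read as a line containing `I/O error, dev sdb, sector N`, the form 2.6-era kernels print in their `end_request` messages. That format is an assumption, so adjust the pattern to match what your syslog actually shows. The function collects the distinct failing sectors per device, so repeated errors on one sector collapse into a single entry:

```python
import re
from collections import defaultdict

# ASSUMED message format, e.g.
#   "kernel: end_request: I/O error, dev sdb, sector 1682918047"
# Change this pattern if your kernel reports read errors differently.
ERROR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def failing_sectors(lines):
    """Map each device name to the sorted list of distinct sectors that had errors."""
    found = defaultdict(set)
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            found[match.group(1)].add(int(match.group(2)))
    return {device: sorted(sectors) for device, sectors in found.items()}
```

For example, `failing_sectors(open("/var/log/syslog"))` would scan the current syslog (the path is an assumption for a typical unRAID setup). Running it on the part of the log that covers each preclear pass gives the per-run sector lists.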

 

The second preclear run produced errors on five sectors (1682918047, 1683018743, 1683029711, 1683035192, and 1683062151). So the two suspect sectors from the first pass repeated and three new ones were added. This time the post-clear SMART report showed a raw value of 7 current pending sectors. Again, I'm not sure why it says 7 when there were errors on only 5.

 

The third preclear run is not complete yet (it is about 50% through the post-read), but so far errors have been reported against four sectors (1683018743, 1683035191, 1683062152, and 1683090639). No errors have been reported yet against the repeats from the first two runs; maybe they've been reassigned. That is one repeat of a sector that was new in the second run, plus three new sectors. But the funny thing is that the SMART report at the beginning of this run showed 8 current pending sectors, when it reported only 7 at the end of the last run. How can that be? This drive is not assigned to my array, so nothing should be reading from it. How could more pending sectors be identified between preclear runs?
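The run-to-run comparison is easy to get wrong by eye, so here is a small Python sketch that does it with set operations, using only the sector numbers listed in these posts (run 3 is partial, since it was still in its post-read):

```python
# Sectors with errors in each preclear run, as listed in the posts above.
# Run 3 was still in its post-read when these were collected.
runs = {
    1: {1682918047, 1683029711},
    2: {1682918047, 1683018743, 1683029711, 1683035192, 1683062151},
    3: {1683018743, 1683035191, 1683062152, 1683090639},
}


def compare_runs(runs):
    """For each run, split its sectors into repeats of earlier runs and new ones."""
    seen = set()
    report = {}
    for run in sorted(runs):
        sectors = runs[run]
        report[run] = {"repeated": sorted(sectors & seen), "new": sorted(sectors - seen)}
        seen |= sectors
    return report, seen


report, all_sectors = compare_runs(runs)
```

With these numbers, run 2 repeats both run 1 sectors and adds three, and run 3 repeats 1683018743 and adds three, for 8 distinct sectors overall. Note that a plain set comparison counts 1683035191 and 1683062152 as new, even though each is one LBA away from a sector flagged in run 2.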

 

Another strange occurrence. Before starting the preclear process, I used the WDIDLE3 utility to disable the head parking feature on this drive. I used the WDIDLE3 /D command and got a response that the head park time was set to something like 64.7 minutes, IIRC. Then I tried the WDIDLE3 /S0 command and got a response that said head parking was disabled. Well, looking at the results, it is clearly not disabled. The load cycle count started the first run at 8 and ended it at 9. In the hour and 25 minutes between preclear runs, the load cycle count went from 9 to 96. It ended the second run at 97. In the 3 hours and 17 minutes between the 2nd and 3rd runs, it went from 97 to 295. So clearly it has not been disabled, even though it reported that it was. I'm really beginning to not like this drive.
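Turning those counts into a rate makes the pattern easier to see. A quick Python check, using only the counts and intervals quoted above:

```python
def cycles_per_minute(start_count, end_count, hours, minutes):
    """Average Load_Cycle_Count increase per minute over an interval."""
    return (end_count - start_count) / (hours * 60 + minutes)


# Idle intervals between preclear runs, from the post above.
between_runs_1_2 = cycles_per_minute(9, 96, 1, 25)    # 87 cycles in 85 minutes
between_runs_2_3 = cycles_per_minute(97, 295, 3, 17)  # 198 cycles in 197 minutes
```

Both intervals come out at roughly one load cycle per minute (87/85 ≈ 1.02 and 198/197 ≈ 1.01), a steady pattern rather than occasional bursts.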


I'm pretty sure, in fact I'm positive, that I did a power cycle. I know I did because I put the DOS-bootable USB drive in place of my unRAID USB drive to change the setting. After changing the IDLE3 setting, I powered down the server, swapped USB drives, then powered up in unRAID. But just to be sure, I'm going to check the setting again tonight. If that doesn't work, I'll try the WDIDLE3 /D command and see how that works. I tried that the first time, but it did not report that head parking was disabled (as the command is supposed to), just that the timer was set to a really long time (~64 minutes).


To me, the same sectors repeating would indicate a physical issue with those sectors, not a noise-sensitive issue as I first suspected. Those sectors apparently should be reallocated, but the drive keeps re-using them since it is able to read them back when the write phase occurs. They are bad, but not bad enough... argh!
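A toy model of the reasoning above, in Python. It is a deliberate simplification (real drive firmware is more involved and vendor-specific): a failed read marks a sector pending, rewriting it clears it from the pending list, and it is only reallocated if the write itself fails. Under that model a weak sector can bounce between pending and cleared on every preclear pass without the reallocated count ever moving:

```python
class ToyDrive:
    """Simplified pending/reallocated bookkeeping, for illustration only."""

    def __init__(self):
        self.pending = set()      # sectors that failed a read and await a rewrite
        self.reallocated = set()  # sectors remapped to spares

    def read(self, sector, ok):
        """A failed read of a sector that has not been remapped marks it pending."""
        if not ok and sector not in self.reallocated:
            self.pending.add(sector)

    def write(self, sector, ok):
        """Rewriting a pending sector clears it; only a failed write remaps it."""
        if sector in self.pending:
            self.pending.discard(sector)
            if not ok:
                self.reallocated.add(sector)


drive = ToyDrive()
drive.read(1682918047, ok=False)  # pre-read fails: sector goes pending
drive.write(1682918047, ok=True)  # zeroing succeeds: cleared, no reallocation
drive.read(1682918047, ok=False)  # a later read fails again: pending again
```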

Well, the third run is complete and no additional errors were reported, so a total of four sectors had errors in that run. Ironically, the current pending sector count went down from 8 to 2. Still no reallocated sectors. Maybe my prayers to the HDD gods are being answered. I swapped the power connection with one from another drive just to see what happens. I also checked, and somehow the IDLE3 timer had been set back to the factory default of 8 seconds. That would explain why the load cycle counts were incrementing between preclear cycles. I disabled it again using the WDIDLE3 /S0 command, rebooted, and verified with WDIDLE3 /R that head parking was indeed disabled, and it was. Perhaps I do have a power issue, as the load cycle count went up by two during the last preclear cycle; on the previous two runs it only changed by one, which is what I would expect. I launched another preclear cycle and I'm interested to see what happens with this one. I'll find out in about 30 hours.


Thought these unusual results might be of interest to Joe and/or others.

 

I ran a preclear on a disk in a port I had not used before on my backplane. The drive seemed to be recognized, but I noticed that SMART reports were failing (see the "Error SMART Status command failed" lines in the log below) while the disk was being precleared. Sometimes it worked, sometimes it got this error. It was moving along at a good clip and finished this morning. Preclear is reporting non-zero values on the drive. Finding issues like this is why we run preclear scripts! I am going to experiment further to see if I have a loose cable or something, or if the drive itself is bad.

 

Question about the non-zero values ... would preclear continue to search the entire drive before reporting non-zero values on the drive, or stop immediately when it hit one?  Since it appeared to go all the way through the entire disk, is there any way to know how many or where these non-zero values are?

 

Thanks Joe for this great tool.  Saved me from a nightmare if this had been added to the array!

 

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: Home page is http://smartmontools.sourceforge.net/

Sep 30 07:54:32 Tower preclear_disk-finish[6363]:

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: === START OF INFORMATION SECTION ===

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: Device Model: Hitachi HDS722020ALA330

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: Serial Number: JK11A5YAKDWW3X

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: Firmware Version: JKAOA3EA

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: User Capacity: 2,000,398,934,016 bytes

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: Device is: Not in smartctl database [for details use: -P showall]

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ATA Version is: 8

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ATA Standard is: ATA-8-ACS revision 4

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: Local Time is: Thu Sep 30 07:54:31 2010 EDT

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: SMART support is: Available - device has SMART capability.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: SMART support is: Enabled

Sep 30 07:54:32 Tower preclear_disk-finish[6363]:

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: === START OF READ SMART DATA SECTION ===

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: SMART overall-health self-assessment test result: PASSED

Sep 30 07:54:32 Tower preclear_disk-finish[6363]:

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: General SMART Values:

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: Offline data collection status: (0x84)^IOffline data collection activity

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ^I^I^I^I^Iwas suspended by an interrupting command from host.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ^I^I^I^I^IAuto Offline Data Collection: Enabled.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: Self-test execution status: ( 0)^IThe previous self-test routine completed

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ^I^I^I^I^Iwithout error or no self-test has ever

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ^I^I^I^I^Ibeen run.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: Total time to complete Offline

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: data collection: ^I^I (23212) seconds.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: Offline data collection

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: capabilities: ^I^I^I (0x5b) SMART execute Offline immediate.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ^I^I^I^I^IAuto Offline data collection on/off support.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ^I^I^I^I^ISuspend Offline collection upon new

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ^I^I^I^I^Icommand.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ^I^I^I^I^IOffline surface scan supported.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ^I^I^I^I^ISelf-test supported.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ^I^I^I^I^INo Conveyance Self-test supported.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ^I^I^I^I^ISelective Self-test supported.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: SMART capabilities: (0x0003)^ISaves SMART data before entering

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ^I^I^I^I^Ipower-saving mode.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ^I^I^I^I^ISupports SMART auto save timer.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: Error logging capability: (0x01)^IError logging supported.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ^I^I^I^I^IGeneral Purpose Logging supported.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: Short self-test routine

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: recommended polling time: ^I ( 1) minutes.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: Extended self-test routine

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: recommended polling time: ^I ( 255) minutes.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: SCT capabilities: ^I (0x003d)^ISCT Status supported.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ^I^I^I^I^ISCT Feature Control supported.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ^I^I^I^I^ISCT Data Table supported.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]:

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: SMART Attributes Data Structure revision number: 16

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: Vendor Specific SMART Attributes with Thresholds:

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 2 Throughput_Performance 0x0005 100 100 054 Pre-fail Offline - 0

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 3 Spin_Up_Time 0x0007 100 100 024 Pre-fail Always - 0

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 3

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 8 Seek_Time_Performance 0x0005 100 100 020 Pre-fail Offline - 0

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 33

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 3

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 3

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 3

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0

Sep 30 07:54:32 Tower preclear_disk-finish[6363]:

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: SMART Error Log Version: 0

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: No Errors Logged

Sep 30 07:54:32 Tower preclear_disk-finish[6363]:

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: SMART Self-test log structure revision number 1

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: No self-tests have been logged. [To run self-tests, use: smartctl -t]

Sep 30 07:54:32 Tower preclear_disk-finish[6363]:

Sep 30 07:54:32 Tower preclear_disk-finish[6363]:

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: SMART Selective self-test log data structure revision number 1

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 1 0 0 Not_testing

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 2 0 0 Not_testing

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 3 0 0 Not_testing

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 4 0 0 Not_testing

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: 5 0 0 Not_testing

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: Selective self-test flags (0x0):

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: After scanning selected spans, do NOT read-scan remainder of disk.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]: If Selective self-test is pending on power-up, resume after 0 minute delay.

Sep 30 07:54:32 Tower preclear_disk-finish[6363]:

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: ============================================================================

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: ==

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: == Disk /dev/sdb has NOT been successfully precleared

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: == Postread detected un-expected non-zero bytes on disk==

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: == Ran 1 preclear-disk cycle

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: ==

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: == Using :Read block size = 8225280 Bytes

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: == Last Cycle's Pre Read Time : 6:34:23 (84 MB/s)

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: == Last Cycle's Zeroing time : 5:45:35 (96 MB/s)

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: == Last Cycle's Post Read Time : 20:32:46 (27 MB/s)

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: == Last Cycle's Total Time : 32:53:57

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: ==

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: == Total Elapsed Time 32:53:57

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: ==

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: == Disk Start Temperature: 34C

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: ==

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: == Current Disk Temperature: 32C,

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: ==

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: ============================================================================

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: S.M.A.R.T. error count differences detected after pre-clear

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: note, some 'raw' values may change, but not be an indication of a problem

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: 15,25c15,85

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: < Error SMART Status command failed

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: < Please get assistance from http://smartmontools.sourceforge.net/

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: < Register values returned from SMART Status command are:

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: < ST =0x50

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: < ERR=0x00

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: < NS =0x08

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: < SC =0xa0

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: < CL =0x88

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: < CH =0xe0

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: < SEL=0x40

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: < A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: ---

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > === START OF READ SMART DATA SECTION ===

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > SMART overall-health self-assessment test result: PASSED

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: >

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > General SMART Values:

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > Offline data collection status: (0x84)^IOffline data collection activity

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ^I^I^I^I^Iwas suspended by an interrupting command from host.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ^I^I^I^I^IAuto Offline Data Collection: Enabled.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > Self-test execution status: ( 0)^IThe previous self-test routine completed

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ^I^I^I^I^Iwithout error or no self-test has ever

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ^I^I^I^I^Ibeen run.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > Total time to complete Offline

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > data collection: ^I^I (23212) seconds.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > Offline data collection

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > capabilities: ^I^I^I (0x5b) SMART execute Offline immediate.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ^I^I^I^I^IAuto Offline data collection on/off support.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ^I^I^I^I^ISuspend Offline collection upon new

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ^I^I^I^I^Icommand.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ^I^I^I^I^IOffline surface scan supported.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ^I^I^I^I^ISelf-test supported.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ^I^I^I^I^INo Conveyance Self-test supported.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ^I^I^I^I^ISelective Self-test supported.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > SMART capabilities: (0x0003)^ISaves SMART data before entering

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ^I^I^I^I^Ipower-saving mode.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ^I^I^I^I^ISupports SMART auto save timer.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > Error logging capability: (0x01)^IError logging supported.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ^I^I^I^I^IGeneral Purpose Logging supported.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > Short self-test routine

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > recommended polling time: ^I ( 1) minutes.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > Extended self-test routine

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > recommended polling time: ^I ( 255) minutes.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > SCT capabilities: ^I (0x003d)^ISCT Status supported.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ^I^I^I^I^ISCT Feature Control supported.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ^I^I^I^I^ISCT Data Table supported.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: >

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > SMART Attributes Data Structure revision number: 16

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > Vendor Specific SMART Attributes with Thresholds:

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 2 Throughput_Performance 0x0005 100 100 054 Pre-fail Offline - 0

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 3 Spin_Up_Time 0x0007 100 100 024 Pre-fail Always - 0

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 3

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 8 Seek_Time_Performance 0x0005 100 100 020 Pre-fail Offline - 0

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 3

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 3

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 3

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: >

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > SMART Error Log Version: 0

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > No Errors Logged

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: >

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > SMART Self-test log structure revision number 1

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > No self-tests have been logged. [To run self-tests, use: smartctl -t]

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: >

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: >

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > SMART Selective self-test log data structure revision number 1

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 1 0 0 Not_testing

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 2 0 0 Not_testing

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 3 0 0 Not_testing

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 4 0 0 Not_testing

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > 5 0 0 Not_testing

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > Selective self-test flags (0x0):

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > After scanning selected spans, do NOT read-scan remainder of disk.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: > If Selective self-test is pending on power-up, resume after 0 minute delay.

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: >

Sep 30 07:54:32 Tower preclear_disk-diff[6376]: ============================================================================

Sep 30 07:54:32 Tower preclear_disk-diff[6376]:
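The throughput figures in the preclear summary above can be reproduced from the elapsed times and the disk's User Capacity. This sketch assumes each phase covers the whole disk and that MB means 10^6 bytes; both are assumptions, but together they reproduce the logged 84, 96, and 27 MB/s:

```python
DISK_BYTES = 2_000_398_934_016  # "User Capacity" from the SMART report above


def hms_to_seconds(text):
    """Convert an "H:MM:SS" duration (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def mb_per_second(duration):
    """Average throughput over the whole disk, in MB (10**6 bytes) per second."""
    return DISK_BYTES / hms_to_seconds(duration) / 1_000_000


# Phase durations from the "Last Cycle's ..." lines of the summary.
phases = {"pre-read": "6:34:23", "zeroing": "5:45:35", "post-read": "20:32:46"}
rates = {name: int(mb_per_second(t)) for name, t in phases.items()}
```

The post-read ran at roughly a third of the pre-read speed over the same disk. That is worth noting alongside the intermittent SMART failures in the same log, although the timing alone does not show the cause.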

 


I ran a preclear on a disk in a port I had not used before on my backplane. The drive seemed to be recognized, but I noticed that SMART reports were failing (see the "Error SMART Status command failed" lines in the log below) while the disk was being precleared. Sometimes it worked, sometimes it got this error. It was moving along at a good clip and finished this morning. Preclear is reporting non-zero values on the drive. Finding issues like this is why we run preclear scripts! I am going to experiment further to see if I have a loose cable or something, or if the drive itself is bad.

Unlikely to be a cabling issue, but I can't predict what an intermittent connection would do.

 

Question about the non-zero values ... would preclear continue to search the entire drive before reporting non-zero values on the drive, or stop immediately when it hit one? 

It continues to the end.
Since it appeared to go all the way through the entire disk, is there any way to know how many or where these non-zero values are?
Yes, in the /tmp directory you'll find a file named: /tmp/postread_errors$disk_basename

where disk_basename is your disk under test.  Check it out for the specific blocks and offsets.

 

The test for non-zero bytes is fairly crude: it is just a sum of all the values returned when it reads a block of data. The block size is set to the size, in bytes, of a cylinder, as reported by

fdisk -l /dev/sdX

If the sum of the bytes is zero, then all the bytes read in that set of blocks were zero.

I do not know which specific byte/sector was non-zero.
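A minimal Python sketch of the check described above; it is not the script's actual code. With the 255-head, 63-sectors-per-track geometry that fdisk commonly reports for large disks (an assumption here), one cylinder is 255 × 63 × 512 = 8225280 bytes, the same as the "Read block size" in the preclear summary earlier in this thread. Because bytes are never negative, a zero sum means every byte was zero. The sketch also records where the first non-zero byte is, which the sum alone cannot tell you:

```python
import io


def cylinder_bytes(heads=255, sectors_per_track=63, sector_size=512):
    """Bytes in one cylinder; the default geometry is an assumption."""
    return heads * sectors_per_track * sector_size


def scan_for_nonzero(stream, block_size, blocks_per_read=1):
    """Yield (first_block, first_nonzero_byte) for each read that is not all zeros.

    first_block counts in block_size units (like dd's skip=); first_nonzero_byte
    is an absolute byte offset from the start of the stream.
    """
    read_size = block_size * blocks_per_read
    first_block = 0
    while True:
        data = stream.read(read_size)
        if not data:
            break
        # Bytes are never negative, so "sum is non-zero" is the same test as
        # "something is left after stripping leading zero bytes".
        rest = data.lstrip(b"\x00")
        if rest:
            yield first_block, first_block * block_size + len(data) - len(rest)
        first_block += blocks_per_read


# Example on a small in-memory "disk" with one stray byte at offset 37.
example = bytearray(64)
example[37] = 5
hits = list(scan_for_nonzero(io.BytesIO(bytes(example)), block_size=8, blocks_per_read=2))
```

Against a real disk, the stream would be the device opened read-only in binary mode (for example `open("/dev/sdb", "rb")`, which needs root), with `block_size=cylinder_bytes()`.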

Thanks Joe for this great tool.  Saved me from a nightmare if this had been added to the array!

Yes. Disks that occasionally return random values show up as parity errors when parity is checked, and unless you are doing a NOCORRECT check, the check also updates parity to reflect the bad data reported by the drive.

 

They cause hair loss, because you'll pull your hair out trying to figure out the cause of the random parity errors. (You'll have no idea which disk is the cause, because these disks do not report the bad reads as errors; they think they are reading the platter correctly.)
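To see why the parity check cannot point at the guilty disk, here is a small Python sketch of single XOR parity, the scheme unRAID's parity disk uses, over three toy "disks" of three bytes each. The disk contents are made up for illustration:

```python
from functools import reduce


def parity(disks):
    """Byte-wise XOR across equal-length data disks (single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def mismatches(disks, parity_disk):
    """Byte offsets where stored parity disagrees with parity recomputed from the data."""
    expected = parity(disks)
    return [i for i, (p, e) in enumerate(zip(parity_disk, expected)) if p != e]


data = [b"\x10\x20\x30", b"\x01\x02\x03", b"\xaa\xbb\xcc"]
stored = parity(data)

# Disk 2 silently returns a wrong byte at offset 1 (no read error is reported)...
bad_disk2 = [data[0], b"\x01\xff\x03", data[2]]
# ...which looks exactly like disk 1 going wrong at the same offset.
bad_disk1 = [b"\x10\x99\x30", data[1], data[2]]

# A correcting check rewrites parity from whatever the disks returned.
corrected = parity(bad_disk2)
```

Both corruptions produce the same single mismatch at offset 1, so the check can report where parity disagrees but not which disk caused it. After a correcting check the mismatch disappears, and the bad value is now covered by parity.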

 

If it's not an obviously loose cable, RMA the drive.

 

Joe L.


Contents of /tmp/postread_errorssdb:

 

skip=135200 count=200 returned  instead of 00000

skip=149000 count=200 returned  instead of 00000

 

Also, I forgot to report that the disk was occasionally reporting standby (not spinning) while the preclear was running.

A return of "" (blank) instead of 00000 is more interesting. I'm glad you took the time to give me feedback. That would probably indicate the drive did not respond at all.

 

Interesting... It could then be a cabling issue, rather than a random errant bit or a drive that occasionally likes to not respond. (It needs to respond for its temperature or spin-up/down status to be read.)
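A Python sketch for reading that file. It assumes each line looks exactly like the two quoted above (`skip=N count=M returned VALUE instead of 00000`, with VALUE empty when nothing came back), and that `skip` and `count` are in units of the 8225280-byte read block shown in the preclear summary. Both are assumptions. Under them, each entry converts to a range of 512-byte LBAs, and an empty value, which the reply above reads as the drive not responding, is kept distinct from real non-zero data:

```python
import os
import re

READ_BLOCK_BYTES = 8225280  # "Read block size" from the preclear summary (assumed unit)
SECTOR_BYTES = 512
LINE_RE = re.compile(r"skip=(\d+) count=(\d+) returned\s*(\S*)\s+instead of")


def postread_path(device):
    """Error file for a device, e.g. /dev/sdb -> /tmp/postread_errorssdb."""
    return "/tmp/postread_errors" + os.path.basename(device)


def parse_postread(lines):
    """Return (first_lba, last_lba, value) per entry; value "" means nothing was returned."""
    per_block = READ_BLOCK_BYTES // SECTOR_BYTES  # 16065 sectors per block
    entries = []
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        skip, count, value = int(match.group(1)), int(match.group(2)), match.group(3)
        first = skip * per_block
        entries.append((first, first + count * per_block - 1, value))
    return entries
```

For example, `parse_postread(open(postread_path("/dev/sdb")))` would read the file for sdb.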

 

Joe L.


OK, so I decided to run a short and a long SMART test.

 

The short test ran and completed.  The long test seemed to start, but when I checked on progress, it seemed as though it had forgotten about the request ...

 

 

 

 

root@Tower:~# smartctl -d ata -tlong /dev/sdb

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

 

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===

Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".

Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.

Testing has begun.

Please wait 255 minutes for test to complete.

Test will complete after Thu Sep 30 16:01:42 2010

 

Use smartctl -X to abort test.

 

<about 20 minutes passed>

 

root@Tower:~# smartctl -a -d ata /dev/sdb

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

 

=== START OF INFORMATION SECTION ===

Device Model:    Hitachi HDS722020ALA330

Serial Number:    JK11A5YAKDWW3X

Firmware Version: JKAOA3EA

User Capacity:    2,000,398,934,016 bytes

Device is:        Not in smartctl database [for details use: -P showall]

ATA Version is:  8

ATA Standard is:  ATA-8-ACS revision 4

Local Time is:    Thu Sep 30 11:58:57 2010 EDT

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

 

General SMART Values:

Offline data collection status:  (0x85) Offline data collection activity

                                        was aborted by an interrupting command from host.

                                        Auto Offline Data Collection: Enabled.

Self-test execution status:      ( 249) Self-test routine in progress...

                                        90% of test remaining.

Total time to complete Offline

data collection:                (23212) seconds.

Offline data collection

capabilities:                    (0x5b) SMART execute Offline immediate.

                                        Auto Offline data collection on/off support.

                                        Suspend Offline collection upon new

                                        command.

                                        Offline surface scan supported.

                                        Self-test supported.

                                        No Conveyance Self-test supported.

                                        Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

                                        power-saving mode.

                                        Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

                                        General Purpose Logging supported.

Short self-test routine

recommended polling time:        (  1) minutes.

Extended self-test routine

recommended polling time:        ( 255) minutes.

SCT capabilities:              (0x003d) SCT Status supported.

                                        SCT Feature Control supported.

                                        SCT Data Table supported.

 

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   131   131   054    Pre-fail  Offline      -       109
  3 Spin_Up_Time            0x0007   100   100   024    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       3
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   121   121   020    Pre-fail  Offline      -       35
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       37
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0002   181   181   000    Old_age   Always       -       33 (Lifetime Min/Max 25/36)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

 

SMART Error Log Version: 0
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        37         -
<Shouldn't there be a row here saying the Long test was running??>

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

 


:-[

 

Thanks!  Will check back tonight and see how it did.  I am doing a big copy operation on the array right now.  I hope it will be done tonight so I can take the server down and check all of the connections.

 

Trouble is, I don't have a very reliable way to tell if the problem is fixed.


I ran a preclear on a Hitachi 2 TB drive yesterday, and when it came towards the end I got this message:

Sorry /dev/sdj mbr was not precleared (or something close to that).  It made it through all but the final step.

 

This is actually my third time trying to run preclear on this drive.  The first time, my system totally locked up: I couldn't connect over HTTP or telnet, and the console was frozen, so I couldn't get a syslog.  I was running 2 preclears at the same time, both on the SATA card ports.

 

I ran it again, this time on just this drive, and it did the same thing.  Again no syslog, because everything froze.

 

So I thought third time's a charm.  I moved this drive off the SATA card and onto a motherboard port, thinking that might help.  I also ran the preclear with the -n option, as in preclear_disk.sh -n /dev/sdj.  My thinking was that it had always made it up to the end, so if it worked this way, I would run it again.  Well, it did and it didn't work: it failed, but this time nothing locked up, so I have a syslog.

 

I have shortened the syslog at the point where it starts repeating similar info.  Otherwise it would be thousands of lines long and about 44 MB.
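Another way to shrink a log like this, rather than truncating it, is to collapse the repeats into counts. The sketch below uses a hypothetical helper, `summarize`, which is not part of preclear_disk.sh or unRAID. It assumes the classic `Mon DD HH:MM:SS host` syslog prefix seen in this thread and counts any two messages that differ only in their numbers (sector, block, tag) as the same message.

```python
import re
from collections import Counter

# Leading "Oct  1 01:00:12 Tower " of a classic syslog line:
# month, day, time, hostname (assumed format; adjust for other loggers).
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+")
# Hex values such as 0x3f80000, or any run of decimal digits.
NUMBER = re.compile(r"0x[0-9a-fA-F]+|\d+")


def summarize(lines, top=10):
    """Return the `top` most frequent message shapes as (shape, count) pairs."""
    shapes = Counter()
    for line in lines:
        message = PREFIX.sub("", line.rstrip("\n"))
        if message:
            # Replace every number with "N" so that lines differing only
            # by sector or block number are counted together.
            shapes[NUMBER.sub("N", message)] += 1
    return shapes.most_common(top)


# Usage (the path is an assumption):
# with open("/var/log/syslog") as log:
#     for shape, count in summarize(log):
#         print(f"{count:8d}  {shape}")
```

A few dozen lines of counts like this usually say as much as the full 44 MB, because the repeated entries differ only in the block number.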

 

 

I figure that I have a bad, brand-new hard drive, but I would like someone to take a look if possible.

syslog.txt


Since you've already moved the drive from one disk controller to another, that eliminates the disk controller as a possibility.

 

The drive initially responds when the SMART report is first performed, and then it times out.  The OS resets it and tries again, but it still fails to respond.  All the subsequent writes to it fail, with errors written to the syslog.  Eventually, the syslog would grow until it uses all memory, and your server would become unresponsive, as you've discovered.

 

The only other possibility, besides the drive itself, would be a poor or intermittent power connection to the drive (try an alternate power connection).  Other than that... I'd say RMA the drive.  Be thankful it was discovered before you tried using it in your array.

Oct  1 00:52:28 Tower kernel: ata9.00: exception Emask 0x0 SAct 0x3f80000 SErr 0x80000 action 0x6 frozen
Oct  1 00:52:28 Tower kernel: ata9: SError: { 10B8B }
Oct  1 00:52:28 Tower kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Oct  1 00:52:28 Tower kernel: ata9.00: cmd 61/00:98:20:9d:24/04:00:de:00:00/40 tag 19 ncq 524288 out
Oct  1 00:52:28 Tower kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct  1 00:52:28 Tower kernel: ata9.00: status: { DRDY }
Oct  1 00:52:28 Tower kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Oct  1 00:52:28 Tower kernel: ata9.00: cmd 61/00:a0:20:a1:24/04:00:de:00:00/40 tag 20 ncq 524288 out
Oct  1 00:52:28 Tower kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct  1 00:52:28 Tower kernel: ata9.00: status: { DRDY }
Oct  1 00:52:28 Tower kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Oct  1 00:52:28 Tower kernel: ata9.00: cmd 61/00:a8:20:a5:24/04:00:de:00:00/40 tag 21 ncq 524288 out
Oct  1 00:52:28 Tower kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct  1 00:52:28 Tower kernel: ata9.00: status: { DRDY }
Oct  1 00:52:28 Tower kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Oct  1 00:52:28 Tower kernel: ata9.00: cmd 61/00:b0:20:a9:24/04:00:de:00:00/40 tag 22 ncq 524288 out
Oct  1 00:52:28 Tower kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct  1 00:52:28 Tower kernel: ata9.00: status: { DRDY }
Oct  1 00:52:28 Tower kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Oct  1 00:52:28 Tower kernel: ata9.00: cmd 61/00:b8:20:ad:24/04:00:de:00:00/40 tag 23 ncq 524288 out
Oct  1 00:52:28 Tower kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct  1 00:52:28 Tower kernel: ata9.00: status: { DRDY }
Oct  1 00:52:28 Tower kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Oct  1 00:52:28 Tower kernel: ata9.00: cmd 61/00:c0:20:b1:24/04:00:de:00:00/40 tag 24 ncq 524288 out
Oct  1 00:52:28 Tower kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct  1 00:52:28 Tower kernel: ata9.00: status: { DRDY }
Oct  1 00:52:28 Tower kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Oct  1 00:52:28 Tower kernel: ata9.00: cmd 61/00:c8:20:b5:24/04:00:de:00:00/40 tag 25 ncq 524288 out
Oct  1 00:52:28 Tower kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct  1 00:52:28 Tower kernel: ata9.00: status: { DRDY }
Oct  1 00:52:28 Tower kernel: ata9: hard resetting link
Oct  1 00:52:36 Tower kernel: ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct  1 00:52:36 Tower kernel: ata9.00: configured for UDMA/133
Oct  1 00:52:36 Tower kernel: ata9.00: device reported invalid CHS sector 0
Oct  1 00:52:36 Tower last message repeated 6 times
Oct  1 00:52:36 Tower kernel: ata9: EH complete
Oct  1 01:00:12 Tower kernel: sd 4:0:0:0: [sdj] Unhandled error code
Oct  1 01:00:12 Tower kernel: sd 4:0:0:0: [sdj] Result: hostbyte=0x00 driverbyte=0x06
Oct  1 01:00:12 Tower kernel: sd 4:0:0:0: [sdj] CDB: cdb[0]=0x2a: 2a 00 de ab 6d 20 00 04 00 00
Oct  1 01:00:12 Tower kernel: end_request: I/O error, dev sdj, sector 3735776544
Oct  1 01:00:12 Tower kernel: Buffer I/O error on device sdj, logical block 466972068
Oct  1 01:00:12 Tower kernel: lost page write due to I/O error on sdj
Oct  1 01:00:12 Tower kernel: Buffer I/O error on device sdj, logical block 466972069
Oct  1 01:00:12 Tower kernel: lost page write due to I/O error on sdj
Oct  1 01:00:12 Tower kernel: Buffer I/O error on device sdj, logical block 466972070
Oct  1 01:00:12 Tower kernel: lost page write due to I/O error on sdj
Oct  1 01:00:12 Tower kernel: Buffer I/O error on device sdj, logical block 466972071
Oct  1 01:00:12 Tower kernel: lost page write due to I/O error on sdj
Oct  1 01:00:12 Tower kernel: Buffer I/O error on device sdj, logical block 466972072
Oct  1 01:00:12 Tower kernel: lost page write due to I/O error on sdj
Oct  1 01:00:12 Tower kernel: Buffer I/O error on device sdj, logical block 466972073
Oct  1 01:00:12 Tower kernel: lost page write due to I/O error on sdj
Oct  1 01:00:12 Tower kernel: Buffer I/O error on device sdj, logical block 466972074
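For anyone trying to read an excerpt like this one, the numeric fields can be decoded by hand. What follows is a minimal sketch with hypothetical function names, resting on three assumptions: libata's `cmd` field order is command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device; for NCQ commands the tag sits in bits 7:3 of nsect and the sector count in the feature registers; and the SCSI CDB uses the standard WRITE(10) layout.

```python
# Hypothetical helpers for decoding the fields in a libata / SCSI error excerpt.
SECTOR_BYTES = 512  # logical sector size; assumed for this drive


def active_ncq_tags(sact):
    """List the NCQ tags whose bits are set in a libata SAct value."""
    return [tag for tag in range(32) if (sact >> tag) & 1]


def decode_libata_ncq_cmd(field):
    """Decode a libata 'cmd' field for an NCQ (FPDMA QUEUED) command.

    Assumed layout: command/feature:nsect:lbal:lbam:lbah/
                    hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    For NCQ the sector count is carried in the feature registers and
    the queue tag in bits 7:3 of nsect.
    """
    command, low, high, _device = field.split("/")
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in high.split(":"))
    # 48-bit LBA: the "hob" (high order byte) registers hold bits 24-47.
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    sectors = ((hob_feature << 8) | feature) or 65536  # a count of 0 means 65536
    return {"command": int(command, 16), "tag": nsect >> 3, "lba": lba,
            "sectors": sectors, "bytes": sectors * SECTOR_BYTES}


def decode_write10_cdb(cdb_hex, block_bytes=4096):
    """Decode a SCSI WRITE(10) CDB such as '2a 00 de ab 6d 20 00 04 00 00'."""
    cdb = bytes.fromhex(cdb_hex)
    if cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: starting LBA
    sectors = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    # First block number as seen by a layer using block_bytes-sized blocks.
    return {"lba": lba, "sectors": sectors,
            "first_block": lba * SECTOR_BYTES // block_bytes}
```

When applied to the excerpt, these decoders give self-consistent numbers. SAct 0x3f80000 covers tags 19 through 25, which are the seven queued writes that timed out. Each write is 1024 sectors, matching the `ncq 524288` bytes. Consecutive commands advance by exactly 1024 sectors, which is a sequential write. The CDB's LBA is the `sector 3735776544` in the `end_request` line. Dividing that sector by 8 gives logical block 466972068, which suggests the buffer layer was using 4096-byte blocks here.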


I will try it again tonight with a different power connection and see what it does.  Right now it is on the last of the 4 connectors on the Corsair power cable.  Hopefully this is it.

 

Thank you


I changed my power cable to one that I have been using on another drive.  I wish I could report that it made the difference, but it didn't.  It was totally frozen at 96% of step 2.

Time to RMA this drive.

Sorry you have a bad drive, but it's better to learn that now than after you added it to your array.

The long smart test ran successfully.

 

I moved the disk to a different controller and ran preclear successfully (so the disk is okay).

 

The SMART report error I was getting suggested running with the "-T permissive" option.  When I added that to the command, it worked.  So I think the SMART error has more to do with the controller than with the preclear errors (I confirmed this on other ports).  My other controllers don't require this permissive option.

 

Joe L., you might want to add the permissive option to your unmenu and preclear scripts.
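A script could adopt this without affecting drives that don't need it by retrying with the option only when smartctl asks for it. This is a sketch, not how unmenu or preclear_disk.sh actually behave. `smartctl_report` and `run_smartctl` are hypothetical names. The sketch also assumes that smartctl prints the '-T permissive' hint on standard output when a mandatory command fails, as in the output quoted further down. Because `-T permissive` tells smartctl to carry on past failed mandatory commands, the resulting report may be incomplete.

```python
import subprocess

PERMISSIVE_HINT = "-T permissive"  # text smartctl prints when it wants the option


def run_smartctl(args):
    """Run smartctl and return its standard output.

    check=False because smartctl reports many conditions through
    non-zero exit-status bits, and we want the text either way.
    """
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    return result.stdout


def smartctl_report(device, run=run_smartctl):
    """Return `smartctl -a` output, retrying with '-T permissive' only if asked."""
    output = run(["smartctl", "-a", device])
    if PERMISSIVE_HINT in output:
        # Only drives/controllers that trip the mandatory-command check get the option.
        output = run(["smartctl", "-a", "-T", "permissive", device])
    return output
```

Passing `run` as a parameter keeps the retry logic testable without a real drive.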

 

I still don't know why I got those 2 errors while preclearing the disk.  But I've now reseated all of the cables and plan to keep running preclear tests.

 

Here is the error I was seeing, in case someone searches the forum for this error:

 

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     Hitachi HDS722020ALA330
Serial Number:    JK11A5YAKDBxxx
Firmware Version: JKAOA3EA
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sat Oct  2 13:52:13 2010 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Error SMART Status command failed
Please get assistance from http://smartmontools.sourceforge.net/
Register values returned from SMART Status command are:
ST =0x50
ERR=0x00
NS =0x00
SC =0xc8
CL =0x43
CH =0x3b
SEL=0x40
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
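The register dump shows why smartctl gave up. Under the ATA SMART RETURN STATUS command, a drive reports "OK" by returning 0x4F/0xC2 in the LBA Mid/High registers, which smartctl 5.x prints as CL/CH. It reports "threshold exceeded" with 0xF4/0x2C. Here CL=0x43 and CH=0x3b match neither, so smartctl cannot determine the health status and exits unless told to be permissive. That is consistent with the error following the controller rather than the disk. The helper names below are hypothetical; this is a sketch for checking a pasted dump, not smartctl's own code.

```python
import re

# SMART RETURN STATUS signatures as (CL, CH) = (LBA Mid, LBA High).
SMART_PASSED = (0x4F, 0xC2)
SMART_THRESHOLD_EXCEEDED = (0xF4, 0x2C)

# Matches register lines such as "ST =0x50" or "SEL=0x40".
REGISTER = re.compile(r"^\s*([A-Z]+)\s*=\s*0x([0-9a-fA-F]+)\s*$")


def parse_registers(text):
    """Collect NAME=0xHH register lines from smartctl output into a dict."""
    regs = {}
    for line in text.splitlines():
        match = REGISTER.match(line)
        if match:
            regs[match.group(1)] = int(match.group(2), 16)
    return regs


def smart_status(regs):
    """Classify the CL/CH pair returned by SMART RETURN STATUS."""
    pair = (regs.get("CL"), regs.get("CH"))
    if pair == SMART_PASSED:
        return "passed"
    if pair == SMART_THRESHOLD_EXCEEDED:
        return "threshold exceeded"
    return "unrecognized"  # what smartctl refuses to interpret without -T permissive
```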

 


I recently purchased a couple of new 2TB Hitachi drives and tried replacing both my 1TB parity and disk1 drives. I followed the instructions from the "official" unRAID manual, which doesn't include several steps recommended by knowledgeable users like Joe L., including parity checks and running pre-clear on the new drives. After replacing the parity drive, everything seemed fine, but after replacing the disk1 data drive, 151 errors were reported for the parity drive. More info is available in this thread: http://lime-technology.com/forum/index.php?topic=8096.msg78237#msg78237

 

Joe explained how to get my array back to its previous state with the 1TB drives. And, yesterday, I finished pre-clearing both the new 2TB Hitachi drives. However, before I put them in my array, I wanted to ask a few questions about the results of the pre-clear.

 

I ran the pre-clears from 2 PuTTY session windows. At the end of the session for one of the disks, there were several errors listed. I don't know how to interpret these errors, so I tried copying & pasting the text to a txt file. Unfortunately, I didn't know that common Windows copy/paste techniques could result in pasting back into the PuTTY window. Afterwards, I also discovered the command for copying everything in the PuTTY window to the clipboard. Because of my sloppy copy/paste mess, it is hard to interpret my pre-clear results. I would post a syslog, but my server (and every other computer in my home) shut down last night during a very rare power failure in our neighborhood.

 

I've attached the session text from both of the drives to this message in case someone can decipher these and help me out. The "sda" drive doesn't seem to have any errors. However, the "sdg" drive might have some problems. I'd like to know whether this drive needs to be returned to Newegg because I want to make sure I do so within the terms of their return policy (not sure if it is 15 or 30 days).

Pre-Clear_session_for_sdg_drive.txt

Pre-Clear_session_for_sda_drive.txt


Drive /dev/sdg

Prior to the pre-clear there was 1 re-allocated sector.

<  5 Reallocated_Sector_Ct  0x0033  100  100  005    Pre-fail Always  -      1

---

After the pre-clear, there were 4 re-allocated sectors.

>  5 Reallocated_Sector_Ct  0x0033  100  100  005    Pre-fail Always  -      4

 

Prior to the pre-clear there was 1 re-allocated "event"  (the one sector it had re-allocated)

< 196 Reallocated_Event_Count 0x0032  100  100  000    Old_age Always  -  1

 

Prior to the pre-clear there were 3 sectors pending re-allocation when next written.

< 197 Current_Pending_Sector  0x0022  100  100  000    Old_age  Always -  3

---

After the pre-clear there were 4 re-allocated events (the 4 sectors it re-allocated)

> 196 Reallocated_Event_Count 0x0032  100  100  000    Old_age  Always -  4

After the pre-clear there are no more sectors pending re-allocation.

> 197 Current_Pending_Sector  0x0022  100  100  000    Old_age  Always  -  0

 

Notice that in all cases the normalized value of 100 is unchanged and nowhere near the normalized failure threshold (the THRESH column, which is 5 for Reallocated_Sector_Ct).
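The check described above can be written down as a small parser. This sketch makes three assumptions. First, rows follow the smartctl 5.x column order shown in this thread: ID, name, flag, VALUE, WORST, THRESH, type, updated, when_failed, RAW_VALUE. Second, a leading `<` or `>` from `diff` is ignored. Third, an attribute counts as failing only when its THRESH is non-zero and its normalized VALUE has dropped to or below it. `parse_attribute_row` and `failing_now` are hypothetical names.

```python
def parse_attribute_row(line):
    """Parse one smartctl attribute row, tolerating a leading '<' or '>' from diff."""
    parts = line.lstrip("<> ").split()
    return {
        "id": int(parts[0]),
        "name": parts[1],
        "value": int(parts[3]),   # normalized current value
        "worst": int(parts[4]),   # lowest normalized value seen
        "thresh": int(parts[5]),  # normalized failure threshold
        "raw": int(parts[9]),     # first token of the vendor-specific raw value
    }


def failing_now(attr):
    """True when a non-zero threshold has been reached by the normalized value."""
    return attr["thresh"] > 0 and attr["value"] <= attr["thresh"]
```

For the sdg drive, the raw Reallocated_Sector_Ct goes from 1 to 4. The verdict, however, comes from the normalized value, which stays at 100 against a threshold of 5.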

 

Since most large disks have several thousand spare sectors, this is expected.  There is nothing really wrong with the drive, and the seller or manufacturer would have every right to consider it not failing.  An RMA is not in order unless you see the re-allocated sector count growing over the next months/years.

 

I would run a few more pre-clear cycles on the drive.  If the re-allocated sector count continues to increment, then you have the ammunition to RMA the drive and argue that it is defective.  If the re-allocated sector count stays unchanged, you'll probably be fine for a very long time.
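The advice to run more cycles and watch the count can be sketched as a comparison of successive readings. The helpers below are hypothetical. Their input is the Reallocated_Sector_Ct raw value recorded after each preclear cycle, oldest first. Start from the reading after the first cycle, because the jump from 1 to 4 on this drive is the 3 pending sectors being reallocated, which is treated above as expected rather than as growth. This encodes the rule of thumb from the post, not a manufacturer's RMA criterion.

```python
def cycle_deltas(counts):
    """Change in the raw count between successive preclear cycles."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


def reallocation_verdict(counts):
    """'growing' if the count rose in the latest cycle, else 'stable'.

    counts: Reallocated_Sector_Ct raw values after each cycle, oldest first.
    """
    deltas = cycle_deltas(counts)
    if not deltas:
        raise ValueError("need readings from at least two cycles")
    return "growing" if deltas[-1] > 0 else "stable"
```

With hypothetical readings of [4, 4, 4], the verdict is "stable". With [4, 5, 7], it is "growing", and a record like that is the kind of evidence described above for an RMA.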

 

Joe L.


Does preclear write a log anywhere?  I've got what it says on the screen for the 5 drives I just ran, but it's a lot to copy out by hand ;)  I did it from the root console.  Got errors (at least I think they are errors) and wanted to check on what the heck they were.

It writes its output to the syslog.  In addition, the SMART reports for the drives are all in the /tmp directory.  Both are available until you reboot.
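To pull one drive's messages out of the syslog, filtering on the kernel device names is usually enough. This sketch assumes the syslog lives at /var/log/syslog, where unRAID normally keeps it (check your system). `lines_for_devices` is a hypothetical name. Low-level ATA errors are logged under the port name rather than the sdX name, as in the earlier excerpt where the failure shows up under both ata9 and sdj, so pass both.

```python
import re


def lines_for_devices(lines, names):
    """Yield syslog lines that mention any of the given kernel device names.

    A name must not be followed by a letter or digit, so "sdj" does not
    match "sdja" or the partition "sdj1", and "ata9" does not match "ata90".
    """
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in names) + r")(?![0-9a-z])")
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")


# Usage (the path and device names are assumptions):
# with open("/var/log/syslog") as log:
#     for line in lines_for_devices(log, ("sdj", "ata9")):
#         print(line)
```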

 

Joe L.
