Preclear.sh results - Questions about your results? Post them here.



They all look great.  Enjoy your server.

 

Is it normal for the Post Read on all his drives to be nearly half the Preread and Zeroing speeds?

 

Yes.  In rough terms, the pre-read takes 1/4th of the time for a cycle, the actual pre-clear takes 1/4th of the time; and the post-read takes 1/2 of the overall time.
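
The 1/4, 1/4, 1/2 split follows from the speeds: each phase covers the whole disk once, so the time a phase takes is inversely proportional to its throughput. A rough sketch of that arithmetic in Python (the function name is made up for illustration, and the round numbers are only an example, not measurements):

```python
def phase_time_shares(speeds_mb_s):
    """Return each phase's share of the total time, given per-phase throughput.

    Every phase touches the same number of bytes, so time ~ 1 / speed.
    """
    inverse = [1.0 / s for s in speeds_mb_s]
    total = sum(inverse)
    return [t / total for t in inverse]


# Pre-read and zeroing at full speed, post-read at half that speed.
shares = phase_time_shares([100, 100, 50])
print([round(s, 2) for s in shares])  # [0.25, 0.25, 0.5]
```

Feeding in the real MB/s figures from a preclear report gives the actual split for that drive, which will only approximate these fractions.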

 

Link to comment

They all look great.  Enjoy your server.

 

Is it normal for the Post Read on all his drives to be nearly half the Preread and Zeroing speeds?

Yes, perfectly normal. The post-read is where the verification of the written zeros is performed, and that takes the additional time.
Link to comment
  • 2 weeks later...

RMA'd a Seagate ES2 drive and got back a repaired one.

Ran preclear 3 times and ... see the results.

What about those failures in the SMART log?  ???

 

My guess is you received a drive with some problems, that had been 'repaired' by clearing the SMART tables, masking the problems, so the first test runs re-exposed the problems.  The first 2 Preclears seemed to have dealt with most of them, and the third looks much better, but I'm not confident you've uncovered ALL of the marginal sectors yet.  I'd run 2 or 3 more Preclears, and I'd only feel more confident if I had at least 2 passes with NO further changes, no more Current Pending sectors at any phase, no additional Uncorrectables, no additional Reallocated sectors.  If interested and have time, you might also try a full badblocks run with the -w option.

 

The other possibility is that it's a bad drive, and it's going to continue getting worse.  I suspect that after another Preclear, you will either know it's bad or may decide that you aren't willing to trust the drive, even if it starts behaving, has clean reports.
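
The "at least 2 passes with NO further changes" rule above can be written down as a small check. This is only a sketch: `PassResult` and `looks_stable` are hypothetical names, the numbers have to be copied by hand from each Preclear report, and it only looks at end-of-pass values, not at every phase within a pass.

```python
from dataclasses import dataclass


@dataclass
class PassResult:
    pending: int        # Current_Pending_Sector raw value after the pass
    reallocated: int    # Reallocated_Sector_Ct raw value after the pass
    uncorrectable: int  # uncorrectable-sector raw value after the pass


def looks_stable(passes, clean_passes_required=2):
    """True if the last N passes show no pending sectors and no growth."""
    if len(passes) < clean_passes_required + 1:
        return False  # need one baseline pass plus N passes to compare
    recent = passes[-(clean_passes_required + 1):]
    baseline = recent[0]
    for p in recent[1:]:
        if p.pending != 0:
            return False
        if p.reallocated != baseline.reallocated:
            return False
        if p.uncorrectable != baseline.uncorrectable:
            return False
    return True
```

A result of `True` only says the counters stopped moving; whether to trust the drive is still a judgment call.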

Link to comment

RMA'd a Seagate ES2 drive and got back a repaired one.

Ran preclear 3 times and ... see the results.

What about those failures in the SMART log?  ???

 

My guess is you received a drive with some problems, that had been 'repaired' by clearing the SMART tables, masking the problems, so the first test runs re-exposed the problems.  The first 2 Preclears seemed to have dealt with most of them, and the third looks much better, but I'm not confident you've uncovered ALL of the marginal sectors yet.  I'd run 2 or 3 more Preclears, and I'd only feel more confident if I had at least 2 passes with NO further changes, no more Current Pending sectors at any phase, no additional Uncorrectables, no additional Reallocated sectors.  If interested and have time, you might also try a full badblocks run with the -w option.

 

The other possibility is that it's a bad drive, and it's going to continue getting worse.  I suspect that after another Preclear, you will either know it's bad or may decide that you aren't willing to trust the drive, even if it starts behaving, has clean reports.

 

I would suggest keeping all the pre-clear SMART results, just in case Seagate questions why you want to return a drive they just shipped you. From what I've read around here, they don't normally question it, but returning a just shipped drive might raise an eyebrow somewhere. What's a few K of disk space for a few weeks, just to be on the safe side...

Link to comment

Thank you for your feedback.

 

Of course I will keep the logs, they are saved on the unRAID website.  ;D

 

In fact, the bad blocks don't give me headaches.

I'm more worried about these log entries:

Error 418 occurred at disk power-on lifetime: 55 hours (2 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00   1d+02:05:45.236  READ FPDMA QUEUED
  27 00 00 00 00 00 e0 00   1d+02:05:45.209  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00   1d+02:05:45.207  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   1d+02:05:45.194  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   1d+02:05:45.166  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3] 

 

Meanwhile I tested a second drive - the same model - and it logs the same errors as this one.

 

I've opened a ticket with Seagate support.

I'm curious about their answers.

 

Link to comment

LOL

 

Just received word from Seagate.

 

Claudia (that's her name) tells me that she couldn't open the file attached to the ticket.

Since she doesn't know what tool I used, she makes clear that only results from SeaTools are valid to her (Seagate).

If SeaTools doesn't indicate any problem, the drive is OK.

 

Wow, that's the skill level that we're faced with...  :(

(FYI, I attached the preclear_finish...)

 

 

Then she tells me if I refer to bad blocks, there would be no problem because they can be "replaced and repaired" in large quantities.

 

Cool eh!? My drives CAN DO THAT!  :P

 

In my answer I copied the content of the preclear file into the mail, but

I guess there won't be an enlightening answer from Seagate on this topic...

 

A perfect start to the weekend.

 

Link to comment

In fact, the bad blocks don't give me headaches.

I'm more worried about these log entries:

Error 418 occurred at disk power-on lifetime: 55 hours (2 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00   1d+02:05:45.236  READ FPDMA QUEUED
  27 00 00 00 00 00 e0 00   1d+02:05:45.209  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00   1d+02:05:45.207  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   1d+02:05:45.194  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   1d+02:05:45.166  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3] 

 

Meanwhile I tested a second drive - the same model - and it logs the same errors as this one.

 

Those ARE the bad blocks, in more detail and only the last 5.  UNC is short for UNCorrectable, so "Error: UNC at LBA = 0x0fffffff = 268435455" roughly means "bad block at 268435455".

 

You probably got a typical response, from any manufacturing rep.  But I would expect SeaTools to provide a similar report.

Link to comment

In fact, the bad blocks don't give me headaches.

I'm more worried about these log entries:

...

 

Meanwhile I tested a second drive - the same model - and it logs the same errors as this one.

 

Those ARE the bad blocks, in more detail and only the last 5.  UNC is short for UNCorrectable, so "Error: UNC at LBA = 0x0fffffff = 268435455" roughly means "bad block at 268435455".

Not this time ... Look at the fuller "picture"-- first, always be a little suspicious of numbers that are "all ones" (ie, 0x0fffffff); then look carefully at the preceding commands in the error log for the conclusive clue.

 

"The devil is in the details."

 

--UhClem

 

Link to comment

Those ARE the bad blocks, in more detail and only the last 5.  UNC is short for UNCorrectable, so "Error: UNC at LBA = 0x0fffffff = 268435455" roughly means "bad block at 268435455".

Not this time ... Look at the fuller "picture"-- first, always be a little suspicious of numbers that are "all ones" (ie, 0x0fffffff); then look carefully at the preceding commands in the error log for the conclusive clue.

 

--UhClem

 

Oops, you are absolutely right.  I didn't recognize that number in decimal.

 

Not sure what to make of it though, need a lot more context, more of the code path to here.  If you check his last SMART report, the 5 last errors show alternating like the above, then simple reads, then repeat.  If I had to guess (and that is all I can do here), I would say there is a firmware issue.  The LBA, even if it is 0x0fffffff, appears to be valid, about at the 137GB point.  But in this small context, is probably a mask, and appears to be a part of an internal reset, possibly an internal crash?  If my 'guess' is correct, then you cannot trust this drive.  UhClem, I'd like to hear your opinion.
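
For anyone who wants to check the number themselves, here is a small sketch that rebuilds the LBA from the SN/CL/CH/DH registers in the error entry and tests whether it is the 28-bit "all ones" value. The function names are made up for illustration, and the 512-byte sector size is an assumption.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def lba28_from_registers(sn, cl, ch, dh):
    """Combine the SN/CL/CH/DH registers into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def is_all_ones_lba28(lba):
    """True if the LBA is the largest value a 28-bit address can hold."""
    return lba == (1 << 28) - 1


# Registers from the error entry: SN CL CH DH = ff ff ff 0f
lba = lba28_from_registers(0xFF, 0xFF, 0xFF, 0x0F)
print(lba, is_all_ones_lba28(lba))
print(lba * SECTOR_BYTES / 1e9, "GB")
```

The value comes out as exactly 2^28 - 1, which sits at roughly the 137GB point. That is why it looks more like a placeholder or mask than the address of one particular bad sector, though the sketch by itself cannot prove which.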

Link to comment

Hi all, I'm having a problem with my preclear, please help. I already have 2 data drives and 1 parity drive in my unRAID server. I just got the Plus license today and started the preclear for my 4th drive, a WD Red NAS drive. I got the following errors on the first run:

 

Jan 26 00:55:29 Manitower kernel: ata5.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Jan 26 00:55:29 Manitower kernel: ata5.00: irq_stat 0x40000008
Jan 26 00:55:29 Manitower kernel: ata5.00: failed command: READ FPDMA QUEUED
Jan 26 00:55:29 Manitower kernel: ata5.00: cmd 60/08:00:40:c1:da/00:00:03:00:00/40 tag 0 ncq 4096 in
Jan 26 00:55:29 Manitower kernel:          res 41/40:00:40:c1:da/00:00:03:00:00/40 Emask 0x409 (media error) <F>
Jan 26 00:55:29 Manitower kernel: ata5.00: status: { DRDY ERR }
Jan 26 00:55:29 Manitower kernel: ata5.00: error: { UNC }
Jan 26 00:55:29 Manitower kernel: ata5.00: configured for UDMA/133
Jan 26 00:55:29 Manitower kernel: ata5: EH complete
Jan 26 00:55:33 Manitower kernel: ata5.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Jan 26 00:55:33 Manitower kernel: ata5.00: irq_stat 0x40000008
Jan 26 00:55:33 Manitower kernel: ata5.00: failed command: READ FPDMA QUEUED
Jan 26 00:55:33 Manitower kernel: ata5.00: cmd 60/08:00:40:c1:da/00:00:03:00:00/40 tag 0 ncq 4096 in
Jan 26 00:55:33 Manitower kernel:          res 41/40:00:40:c1:da/00:00:03:00:00/40 Emask 0x409 (media error) <F>

 

Then I replaced the cable, connected the drive to a different SATA port, and ran the preclear again. Again within the first 1% I started getting errors, and the speed dropped to 15-25 MB/s.

 

Jan 26 01:10:00 Manitower kernel: ata5.00: irq_stat 0x40000008
Jan 26 01:10:00 Manitower kernel: ata5.00: failed command: READ FPDMA QUEUED
Jan 26 01:10:00 Manitower kernel: ata5.00: cmd 60/40:00:c0:36:a4/00:00:03:00:00/40 tag 0 ncq 32768 in
Jan 26 01:10:00 Manitower kernel:          res 41/40:00:c0:36:a4/00:00:03:00:00/40 Emask 0x409 (media error) <F>
Jan 26 01:10:00 Manitower kernel: ata5.00: status: { DRDY ERR }
Jan 26 01:10:00 Manitower kernel: ata5.00: error: { UNC }
Jan 26 01:10:00 Manitower kernel: ata5.00: configured for UDMA/133
Jan 26 01:10:00 Manitower kernel: sd 4:0:0:0: [sdd] Unhandled sense code
Jan 26 01:10:00 Manitower kernel: sd 4:0:0:0: [sdd]  Result: hostbyte=0x00 driverbyte=0x08
Jan 26 01:10:00 Manitower kernel: sd 4:0:0:0: [sdd]  Sense Key : 0x3 [current] [descriptor]
Jan 26 01:10:00 Manitower kernel: Descriptor sense data with sense descriptors (in hex):
Jan 26 01:10:00 Manitower kernel:         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
Jan 26 01:10:00 Manitower kernel:         03 a4 36 c0 
Jan 26 01:10:00 Manitower kernel: sd 4:0:0:0: [sdd]  ASC=0x11 ASCQ=0x4
Jan 26 01:10:00 Manitower kernel: sd 4:0:0:0: [sdd] CDB: cdb[0]=0x28: 28 00 03 a4 36 c0 00 00 40 00
Jan 26 01:10:00 Manitower kernel: end_request: I/O error, dev sdd, sector 61093568
Jan 26 01:10:00 Manitower kernel: Buffer I/O error on device sdd, logical block 7636696
Jan 26 01:10:00 Manitower kernel: Buffer I/O error on device sdd, logical block 7636697
Jan 26 01:10:00 Manitower kernel: Buffer I/O error on device sdd, logical block 7636698
Jan 26 01:10:00 Manitower kernel: Buffer I/O error on device sdd, logical block 7636699
Jan 26 01:10:00 Manitower kernel: Buffer I/O error on device sdd, logical block 7636700
Jan 26 01:10:00 Manitower kernel: Buffer I/O error on device sdd, logical block 7636701
Jan 26 01:10:00 Manitower kernel: Buffer I/O error on device sdd, logical block 7636702
Jan 26 01:10:00 Manitower kernel: Buffer I/O error on device sdd, logical block 7636703
Jan 26 01:10:00 Manitower kernel: ata5: EH complete

 

Can someone please help me? I am a newbie to unRAID and really need some help. The preclears of my last 3 drives went without any issues.

 

This is the command I used

 

preclear_disk.sh -A -M 4 /dev/sda

 

I am attaching the SMART report and syslog, please help !!!!

syslog.zip

SMART_Report.txt

Link to comment

Hey!

 

Sorry to bother you guys. A friend of mine gave me a 3TB hard drive, which has abnormal SMART values. That's why I started a three-cycle-preclear to test the drive. Here is the preclear report:

========================================================================1.14
== invoked as: ./preclear_disk.sh -A -c 3 /dev/sdd
== ST3000DM001-9YN166   S1F14GED
== Disk /dev/sdd has been successfully precleared
== with a starting sector of 1 
== Ran 3 cycles
==
== Using :Read block size = 8388608 Bytes
== Last Cycle's Pre Read Time  : 6:22:11 (130 MB/s)
== Last Cycle's Zeroing time   : 5:33:40 (149 MB/s)
== Last Cycle's Post Read Time : 14:56:28 (55 MB/s)
== Last Cycle's Total Time     : 20:31:10
==
== Total Elapsed Time 68:28:52
==
== Disk Start Temperature: 26C
==
== Current Disk Temperature: 29C, 
==
============================================================================
** Changed attributes in files: /tmp/smart_start_sdd  /tmp/smart_finish_sdd
                ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
      Raw_Read_Error_Rate =   114      97            6        ok          60561760
         Spin_Retry_Count =   100     100           97        near_thresh 0
         End-to-End_Error =   100     100           99        near_thresh 0
       Reported_Uncorrect =     1       1            0        near_thresh 3320
  Airflow_Temperature_Cel =    71      74           45        In_the_past 29
      Temperature_Celsius =    29      26            0        ok          29
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 3.
0 sectors were pending re-allocation after post-read in cycle 1 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 2 of 3.
0 sectors were pending re-allocation after post-read in cycle 2 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 3 of 3.
0 sectors are pending re-allocation at the end of the preclear,
    the number of sectors pending re-allocation did not change.
2136 sectors had been re-allocated before the start of the preclear.
2144 sectors are re-allocated at the end of the preclear,
    a change of 8 in the number of sectors re-allocated. 
============================================================================

 

I've attached the SMART reports below. It would be nice if someone could take a look at the reports.

 

Would you still trust the hard drive and save data on it? Or is it more a case for disposal?  :'(

Thanks in advance.  :D

 

preclear_start_SMART.txt

preclear_finish_SMART.txt

Link to comment

Hi all, I'm having a problem with my preclear, please help. I already have 2 data drives and 1 parity drive in my unRAID server. I just got the Plus license today and started the preclear for my 4th drive, a WD Red NAS drive. I got the following errors on the first run:

...

Then I replaced the cable, connected the drive to a different SATA port, and ran the preclear again. Again within the first 1% I started getting errors, and the speed dropped to 15-25 MB/s.

 

I am attaching the SMART report and syslog, please help !!!!

 

You have a series of bad sectors on this drive, very early on it too.  The SMART report was not very useful, as it was truncated on the right side at 80 columns, cutting off the RAW numbers.  Not sure what did that.
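
To see how early on the drive these sectors are, the failing LBA can be pulled out of the kernel's `cmd` line. The sketch below assumes the libata field order is command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device; that is my understanding of the format rather than something stated in this thread, but the result does agree with the "sector 61093568" line in the same log. The function name is hypothetical.

```python
import re

# Matches e.g. "cmd 60/40:00:c0:36:a4/00:00:03:00:00/40"
CMD_RE = re.compile(
    r"cmd ([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)


def lba48_from_libata_cmd(line):
    """Extract the 48-bit LBA from a libata 'cmd' log line."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata cmd field found")
    # Low-order group: feature, nsect, lbal, lbam, lbah
    low = [int(b, 16) for b in m.group(2).split(":")]
    # High-order group: hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah
    high = [int(b, 16) for b in m.group(3).split(":")]
    lbal, lbam, lbah = low[2], low[3], low[4]
    hob_lbal, hob_lbam, hob_lbah = high[2], high[3], high[4]
    return (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
            | lbah << 16 | lbam << 8 | lbal)


line = "ata5.00: cmd 60/40:00:c0:36:a4/00:00:03:00:00/40 tag 0 ncq 32768 in"
print(lba48_from_libata_cmd(line))  # 61093568
```

With 512-byte sectors that is about 31 GB into a 2 TB drive, so somewhere between 1 and 2 percent of the way through.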

Link to comment

Hey!

 

Sorry to bother you guys. A friend of mine gave me a 3TB hard drive, which has abnormal SMART values. That's why I started a three-cycle-preclear to test the drive. Here is the preclear report:

...

 

I've attached the SMART reports below. It would be nice if someone could take a look at the reports.

 

Would you still trust the hard drive and save data on it? Or is it more a case for disposal?  :'(

Thanks in advance.  :D

 

I would run another Preclear or 2 on it, check for additional reallocated sectors.  If you can obtain clean results, no further adverse numbers, then it is usable.  With that many reallocated sectors, I know that some users would prefer to reserve the drive for secondary uses, such as holding backups.
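
When comparing several runs, it can help to pull the reallocation counts out of each saved report automatically. A minimal sketch, assuming the report wording stays exactly as in the report above (the function name is hypothetical):

```python
import re


def reallocated_change(report_text):
    """Return (before, after, change) from a preclear report's summary lines."""
    before = re.search(r"(\d+) sectors had been re-allocated before", report_text)
    after = re.search(r"(\d+) sectors are re-allocated at the end", report_text)
    if before is None or after is None:
        raise ValueError("re-allocation summary lines not found")
    b, a = int(before.group(1)), int(after.group(1))
    return b, a, a - b


sample = """2136 sectors had been re-allocated before the start of the preclear.
2144 sectors are re-allocated at the end of the preclear,
    a change of 8 in the number of sectors re-allocated."""
print(reallocated_change(sample))  # (2136, 2144, 8)
```

A change of 0 across the next run or two is the "clean result" to look for.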

Link to comment

Thanks for the reply RobJ. I am attaching the report again; can you please check it and let me know if I have to RMA the drive? Can you also please explain how you came to this conclusion? I have been reading the SMART report but am still not able to understand it.

 

root@Manitower:~# smartctl -a /dev/sdd
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EFRX-68AX9N0
Serial Number:    WD-WCC300664700
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   9
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Jan 26 12:13:44 2014 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (27540) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   155   081   051    Pre-fail  Always       -       1164
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       5
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       4
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       5
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1
194 Temperature_Celsius     0x0022   119   118   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       81
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

 

Link to comment

Thanks for the reply RobJ. I am attaching the report again; can you please check it and let me know if I have to RMA the drive? Can you also please explain how you came to this conclusion? I have been reading the SMART report but am still not able to understand it.

 

root@Manitower:~# smartctl -a /dev/sdd
...
Device Model:     WDC WD20EFRX-68AX9N0
...
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       81
...

 

I was too tired to take much time with it, sorry.  This SMART report is intact, thanks.  The line above shows 81 Current Pending sectors, a very ominous sign, especially when the drive has only 4 power-on hours (Power_On_Hours in your report).  It means the drive has already found 81 sectors that are probably bad.  As near as I can tell, you have started Preclear 3 times, then aborted it quite early, probably because of the errors and how long it was taking.  These short passes make it even more ominous: you shouldn't find even one error over the entire drive, and you found 81 in just the first 1 or 2 percent.
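
If reading the attribute table by eye is hard, a short script can pick out the sector-related counters. This is only a sketch: it assumes the ten-column `smartctl -a` layout shown in the report above, the choice of attributes 5, 197 and 198 is my own shortlist, and the function names are hypothetical.

```python
WATCHED_IDS = {5, 197, 198}  # reallocated, current pending, offline uncorrectable


def parse_attribute_line(line):
    """Parse one attribute row; return None for any other kind of line."""
    parts = line.split()
    if len(parts) < 10 or not parts[0].isdigit():
        return None
    return {"id": int(parts[0]), "name": parts[1], "raw": parts[9]}


def nonzero_watched(report_text):
    """Return {attribute name: raw value} for watched counters above zero."""
    hits = {}
    for line in report_text.splitlines():
        attr = parse_attribute_line(line)
        if attr and attr["id"] in WATCHED_IDS and attr["raw"].isdigit():
            raw = int(attr["raw"])
            if raw > 0:
                hits[attr["name"]] = raw
    return hits


report = """ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       81
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0"""
print(nonzero_watched(report))  # {'Current_Pending_Sector': 81}
```

Anything this prints for a nearly new drive deserves a closer look.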

 

What I based my opinion on was the syslog you attached and the 2 syslog excerpts you posted.  They all show a series of errors logged by the exception handler.  All of them are noted as 'media error' which means a problem with a physical sector on the drive surface.  More specifically, the error flag raised for each of those sectors is 'UNC' (short for 'UNCorrectable'), which means the sector was found to be corrupted so much that even the embedded error correction info could not fix it.  Because these first Preclear passes are just read passes, we CANNOT conclude for sure that the drive is bad yet, until the drive attempts to fix them, by rewriting them correctly.  At that point, the drive will determine if the magnetic media under the sector is good or bad, and either return the sector to service, validly rewritten, or remap it elsewhere (as a reallocated sector).  The drive MAY be bad, but the SMART report is not showing any mechanical issues, so far, so it's possible the magnetic surface is good but has been scrambled some how???  Not likely, but possible.
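
The `status: { DRDY ERR }` and `error: { UNC }` lines are just the two hex bytes at the start of the `res` field spelled out bit by bit. A sketch of that decoding follows; the bit names use traditional ATA naming as I understand it, some of the lower bits are named differently in newer revisions of the standard, and the function names are hypothetical.

```python
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "SENSE", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "AMNF"}


def decode_bits(value, table):
    """List the names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & bit]


def decode_res(res_field):
    """Decode a libata 'res' field such as '41/40:00:40:c1:da/...'."""
    groups = res_field.split("/")
    status = int(groups[0], 16)
    error = int(groups[1].split(":")[0], 16)
    return decode_bits(status, STATUS_BITS), decode_bits(error, ERROR_BITS)


print(decode_res("41/40:00:40:c1:da/00:00:03:00:00/40"))
# (['DRDY', 'ERR'], ['UNC'])
```

The same tables apply to the ER and ST columns of a SMART error log entry, for example ER = 04 decodes to ABRT.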

 

An immediate zeroing pass would probably be a good next step, skipping the Preclear Pre-Read, and forcing writes to all sectors.  It should rather quickly help you decide if the drive is worth further effort or not.  Syntax I believe would be "preclear_disk.sh -W /dev/sdd".

Link to comment

Thanks a lot RobJ, this is very informative; thank you for taking the time to write such a detailed explanation. Meanwhile I took the HDD out of the array, dropped it into my Windows 7 PC and ran HD Tune. Not sure if this was a good idea, but as you predicted it showed a bunch of bad sectors at the very beginning. It has around 0.6% damaged blocks according to HD Tune. Should I go ahead and RMA the drive, or run a zeroing-pass preclear?

Link to comment

Hi all, I'm having a problem with my preclear, please help. I already have 2 data drives and 1 parity drive in my unRAID server. I just got the Plus license today and started the preclear for my 4th drive, a WD Red NAS drive. I got the following errors on the first run:

...


I am attaching the SMART report and syslog, please help !!!!

UNC = uncorrectable

media error = unreadable sector (the sector contents do not match the associated checksum at the end of the sector)

 

Let the preclear continue.
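
For anyone reading the syslog excerpt above, the numbers in it are consistent with each other, and that can be checked with a little arithmetic. The following is a minimal Python sketch, not part of preclear_disk.sh. It assumes the standard READ(10) CDB layout (opcode 0x28, big-endian LBA in bytes 2-5, transfer length in bytes 7-8), 512-byte sectors, and 4096-byte buffer blocks; the function name is made up for illustration.

```python
# Sketch: decode the READ(10) CDB from the syslog line
#   CDB: cdb[0]=0x28: 28 00 03 a4 36 c0 00 00 40 00
# Assumptions (not stated in the log itself): 512-byte sectors and
# 4096-byte buffer blocks, i.e. 8 sectors per logical block.

SECTORS_PER_BLOCK = 4096 // 512


def decode_read10(cdb_hex):
    """Return (lba, sector_count) from a READ(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: starting LBA
    count = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, count


lba, count = decode_read10("28 00 03 a4 36 c0 00 00 40 00")
first_block = lba // SECTORS_PER_BLOCK
last_block = (lba + count - 1) // SECTORS_PER_BLOCK

print("start sector:", lba)
print("sectors requested:", count)
print("logical blocks:", first_block, "to", last_block)
```

Under those assumptions the start sector comes out as 61093568 and the blocks as 7636696 to 7636703, matching the "I/O error" and "Buffer I/O error" lines. That suggests the eight block errors all stem from one failed 64-sector read rather than from eight separately detected bad spots.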

Link to comment

Hi Joe,

Thank you for the latest check on my drives.

Now I have precleared my WD AV-GP 2TB disk and have a question about ATA errors. I guess the SMART attributes look OK? There are no sectors pending re-allocation.

But the preclear start and finish reports show ATA errors. As far as I could find out, these are cable problems, right? Both errors occurred on day 159 of use, and the disk currently has 408 days of power-on time, which would mean that the errors occurred quite some time ago (when the disk was used in my HTPC).

What I don’t understand is:

 

SMART Error Log Version: 1
ATA Error Count: 2
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 3834 hours (159 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00      00:22:17.056  SMART WRITE LOG
  ec 00 00 00 00 00 a0 00      00:22:17.054  IDENTIFY DEVICE
  ef 02 00 00 00 00 00 00      00:22:16.633  SET FEATURES [Enable write cache]
  ec 00 00 00 00 00 a0 00      00:22:16.630  IDENTIFY DEVICE
  b0 d6 01 e0 4f c2 a0 00      00:22:06.756  SMART WRITE LOG

Error 1 occurred at disk power-on lifetime: 3834 hours (159 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00      00:22:06.756  SMART WRITE LOG
  ec 00 00 00 00 00 a0 00      00:22:06.752  IDENTIFY DEVICE
  ef 02 00 00 00 00 00 00      00:22:06.722  SET FEATURES [Enable write cache]
  ec 00 00 00 00 00 a0 00      00:22:06.719  IDENTIFY DEVICE
  ec 00 00 00 00 00 a0 00      00:21:35.933  IDENTIFY DEVICE

 

What does "Error: ABRT" mean?

Attached start, finish and rpt files. Thanks!

WD_AV-GP_2TB.zip

Link to comment

I did some research and found only 2 possibilities: either the drive's SMART firmware tried to write to the SMART log while SMART was not enabled at that instant (possibly at drive startup), or there is a bug in the SMART firmware on that drive.  It is not something to worry about, as it only happened once (the second error is a retry of the first), and that was a long time ago.
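
The register values in the error log can also be decoded by hand. Below is a small Python sketch, assuming the conventional ATA bit assignments for the Error and Status registers (the same names that smartctl and the Linux kernel print). The tables and the function are written for illustration and cover only the commonly reported bits.

```python
# Sketch: name the bits set in the ATA Error (ER) and Status (ST) registers.
# Bit names follow the conventional ATA assignments; only the commonly
# reported bits are listed here.

ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error
    0x40: "UNC",   # uncorrectable data error
    0x10: "IDNF",  # requested sector ID not found
    0x04: "ABRT",  # command aborted
}

STATUS_BITS = {
    0x80: "BSY",   # busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # seek complete (legacy meaning)
    0x08: "DRQ",   # data request
    0x01: "ERR",   # error, details are in the Error register
}


def decode(value, table):
    """Return the names of all bits in `table` that are set in `value`."""
    return [name for mask, name in table.items() if value & mask]


# SMART error log entry from the WD AV-GP: ER=04 ST=51
print(decode(0x04, ERROR_BITS), decode(0x51, STATUS_BITS))

# Kernel line from the earlier syslog: res 41/40 (status/error)
print(decode(0x40, ERROR_BITS), decode(0x41, STATUS_BITS))
```

ER=04 decodes to ABRT alone: the drive rejected the command, and none of the media-related bits such as UNC are set. That fits the reading that these two entries were not a surface problem, unlike the 41/40 result in the earlier syslog, which decodes to DRDY ERR with UNC.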

Link to comment
  • 2 weeks later...

I had a red ball on a 2TB WD Red drive this week, which was successfully replaced, and all is well with the array.

 

However, I have now run Pre-Clear on the drive that had the read errors, for 4 passes, with the following results. Does that make it safe to reuse in the array?

 

 

================================================================== 1.13

=                unRAID server Pre-Clear disk /dev/sda

=              cycle 3 of 3, partition start on sector 64

=

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Verifying if the MBR is cleared.              DONE

= Disk Post-Clear-Read completed                                DONE

Disk Temperature: 19C, Elapsed Time:  53:14:13

========================================================================1.13

==  WDC WD20EFRX-68AX9N0    WD-WMC301458571

== Disk /dev/sda has been successfully precleared

== with a starting sector of 64

============================================================================

** Changed attributes in files: /tmp/smart_start_sda  /tmp/smart_finish_sda

                ATTRIBUTE  NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE

      Temperature_Celsius =  128    127            0        ok          19

No SMART attributes are FAILING_NOW

 

0 sectors were pending re-allocation before the start of the preclear.

0 sectors were pending re-allocation after pre-read in cycle 1 of 3.

0 sectors were pending re-allocation after zero of disk in cycle 1 of 3.

0 sectors were pending re-allocation after post-read in cycle 1 of 3.

0 sectors were pending re-allocation after zero of disk in cycle 2 of 3.

0 sectors were pending re-allocation after post-read in cycle 2 of 3.

0 sectors were pending re-allocation after zero of disk in cycle 3 of 3.

0 sectors are pending re-allocation at the end of the preclear,

    the number of sectors pending re-allocation did not change.

0 sectors had been re-allocated before the start of the preclear.

0 sectors are re-allocated at the end of the preclear,

    the number of sectors re-allocated did not change.

root@Tower:/boot#

 

****************

Link to comment
