
To be not defective or not?


opentoe


I have a hard drive that was logging read errors in my syslog. The drive is only several months old, so I took it out and installed it in a Windows computer, where I ran a 36-hour break-in test and a full surface scan. The break-in was a constant read/write. Before all this I confirmed that the SATA cable was good and that my Seasonic power supply is in working order. All the tests I performed on this drive passed with flying colors. Why would unRAID keep giving me read errors? Also, if it is NOT a hardware issue, does that mean unRAID is just producing corrupt files or something? I'm just trying to get to the bottom of this.

 

Is there an equivalent of CHKDSK in unRAID? I remember running reiserfsck or something like that previously, but the last time I did that I somehow lost some important files. I don't want to do that again if I don't have to.

 

 

Link to comment


 

 

Read errors are frequently caused by bad sectors on a disk.

 

Have you gotten a SMART report on the disk? Does it show any reallocated sectors, or sectors pending reallocation?

 

If the file system is corrupted, then you MUST run reiserfsck to fix it, or it will just keep corrupting itself even worse.

 

The equivalent to CHKDSK in unRAID is

reiserfsck --check /dev/sdX1

 

You must also look at

smartctl -a /dev/sdX

 

to learn the true health of the drive.
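As a rough sketch of what to look for in that report, the function below reads smartctl attribute output on stdin and prints the raw values of the three attributes most often tied to bad sectors (IDs 5, 197 and 198). The function name smart_sector_summary is made up for illustration, and the sketch assumes the usual ten-column attribute table layout with the raw value in the last column.

```shell
# Hypothetical helper: summarize the bad-sector attributes from
# `smartctl -A` or `smartctl -a` output supplied on stdin.
smart_sector_summary() {
    awk '
        # Attribute rows have at least 10 columns, a numeric ID in column 1
        # and a hex flag such as 0x0033 in column 3. The flag check keeps
        # other tables in the report from matching.
        NF >= 10 && $3 ~ /^0x[0-9a-fA-F]+$/ &&
        ($1 == 5 || $1 == 197 || $1 == 198) {
            printf "%s=%s\n", $2, $NF    # name=raw value
        }
    '
}
```

It could be used as `smartctl -a /dev/sdX | smart_sector_summary`. Non-zero raw values are a warning sign, but zeros do not prove the disk is healthy, so the self-test log and error log in the full report are still worth reading.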

But... since you seem to like windows tools and how they keep you in the dark, have fun....  

 

Joe L.

 

Link to comment

There is definitely something going on with this drive when it is used in unRAID. I can't even run reiserfsck; it just comes back and tells me no superblock was found. When I try to rebuild the superblock, it keeps telling me I have the wrong version of reiserfsck or something.

 

smartctl -a -d ata /dev/sdh

smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)

Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

 

=== START OF INFORMATION SECTION ===

Device Model:    WDC WD2002FAEX-007BA0

Serial Number:    WD-WMAY01287329

Firmware Version: 05.01D05

User Capacity:    2,000,398,934,016 bytes

Device is:        Not in smartctl database [for details use: -P showall]

ATA Version is:  8

ATA Standard is:  Exact ATA specification draft version not indicated

Local Time is:    Wed Oct  5 00:07:04 2011 EDT

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

 

General SMART Values:

Offline data collection status:  (0x84) Offline data collection activity

was suspended by an interrupting command from host.

Auto Offline Data Collection: Enabled.

Self-test execution status:      ( 118) The previous self-test completed having

the read element of the test failed.

Total time to complete Offline

data collection: (30480) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: (  2) minutes.

Extended self-test routine

recommended polling time: ( 255) minutes.

Conveyance self-test routine

recommended polling time: (  5) minutes.

SCT capabilities:       (0x3037) SCT Status supported.

SCT Feature Control supported.

SCT Data Table supported.

 

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       3
  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       8858
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       22
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5073
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       21
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       19
194 Temperature_Celsius     0x0022   116   105   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       14

 

SMART Error Log Version: 1

ATA Error Count: 44 (device log contains only the most recent five errors)

CR = Command Register [HEX]

FR = Features Register [HEX]

SC = Sector Count Register [HEX]

SN = Sector Number Register [HEX]

CL = Cylinder Low Register [HEX]

CH = Cylinder High Register [HEX]

DH = Device/Head Register [HEX]

DC = Device Command Register [HEX]

ER = Error register [HEX]

ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.

 

Error 44 occurred at disk power-on lifetime: 4931 hours (205 days + 11 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 50 c8 e7 00 e0  Error: UNC 80 sectors at LBA = 0x0000e7c8 = 59336

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 50 87 e7 00 e0 08  30d+05:19:51.557  READ DMA

  ef 10 02 00 00 00 a0 08  30d+05:19:51.557  SET FEATURES [Reserved for Serial ATA]

  ec 00 00 00 00 00 a0 08  30d+05:19:51.553  IDENTIFY DEVICE

  ef 03 46 00 00 00 a0 08  30d+05:19:51.553  SET FEATURES [set transfer mode]

 

Error 43 occurred at disk power-on lifetime: 4931 hours (205 days + 11 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 50 ca e7 00 e0  Error: UNC 80 sectors at LBA = 0x0000e7ca = 59338

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 50 87 e7 00 e0 08  30d+05:19:49.400  READ DMA

  ef 10 02 00 00 00 a0 08  30d+05:19:49.400  SET FEATURES [Reserved for Serial ATA]

  ec 00 00 00 00 00 a0 08  30d+05:19:49.396  IDENTIFY DEVICE

  ef 03 46 00 00 00 a0 08  30d+05:19:49.396  SET FEATURES [set transfer mode]

 

Error 42 occurred at disk power-on lifetime: 4931 hours (205 days + 11 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 50 c8 e7 00 e0  Error: UNC 80 sectors at LBA = 0x0000e7c8 = 59336

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 50 87 e7 00 e0 08  30d+05:19:47.677  READ DMA

  ef 10 02 00 00 00 a0 08  30d+05:19:47.677  SET FEATURES [Reserved for Serial ATA]

  ec 00 00 00 00 00 a0 08  30d+05:19:47.673  IDENTIFY DEVICE

  ef 03 46 00 00 00 a0 08  30d+05:19:47.673  SET FEATURES [set transfer mode]

 

Error 41 occurred at disk power-on lifetime: 4931 hours (205 days + 11 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 50 c8 e7 00 e0  Error: UNC 80 sectors at LBA = 0x0000e7c8 = 59336

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 50 87 e7 00 e0 08  30d+05:19:45.954  READ DMA

  ef 10 02 00 00 00 a0 08  30d+05:19:45.954  SET FEATURES [Reserved for Serial ATA]

  ec 00 00 00 00 00 a0 08  30d+05:19:45.950  IDENTIFY DEVICE

  ef 03 46 00 00 00 a0 08  30d+05:19:45.950  SET FEATURES [set transfer mode]

 

Error 40 occurred at disk power-on lifetime: 4931 hours (205 days + 11 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 50 c8 e7 00 e0  Error: UNC 80 sectors at LBA = 0x0000e7c8 = 59336

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 50 87 e7 00 e0 08  30d+05:19:44.233  READ DMA

  ef 10 02 00 00 00 a0 08  30d+05:19:44.231  SET FEATURES [Reserved for Serial ATA]

  ec 00 00 00 00 00 a0 08  30d+05:19:44.227  IDENTIFY DEVICE

  ef 03 46 00 00 00 a0 08  30d+05:19:44.227  SET FEATURES [set transfer mode]

 

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline      Completed: read failure      60%      4931        59336

# 2  Short offline      Completed: read failure      60%      4931        59336

# 3  Short offline      Completed without error      00%      1630        -

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
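
For anyone wanting to cross-check the "LBA = 0x0000e7c8 = 59336" figures in the error log above, here is a small sketch of the arithmetic. It assumes 28-bit LBA addressing (the failing command is READ DMA, c8), where SN, CL and CH carry LBA bits 0-23 and the low nibble of DH carries bits 24-27. The byte-offset helper further assumes 512-byte logical sectors. Both function names are made up for illustration.

```shell
# Rebuild a 28-bit LBA from the SN, CL, CH and DH register bytes
# (hex digits without the 0x prefix, in that order).
lba_from_regs() {
    sn=$1 cl=$2 ch=$3 dh=$4
    echo $(( ((0x$dh & 0x0f) << 24) | (0x$ch << 16) | (0x$cl << 8) | 0x$sn ))
}

# Byte offset of a sector from the start of the disk,
# assuming 512-byte logical sectors.
lba_to_bytes() {
    echo $(( $1 * 512 ))
}
```

With the registers from Error 44, `lba_from_regs c8 e7 00 e0` gives 59336, matching both the error log and the LBA_of_first_error column in the self-test log. If the sector size assumption holds, a read-only retry of that region could look like `dd if=/dev/sdh of=/dev/null bs=512 skip=59336 count=80`.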

 

Link to comment


What device are you running reiserfsck on?

 

You must run it on the first partition, not the entire drive:

reiserfsck --check /dev/sdh1

(note the "1" on the end of the device name, indicating partition 1)
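
To make that mistake harder to repeat, a tiny guard along these lines could be used. This is only a sketch: the function name first_partition is made up, and it assumes the /dev/sdX naming used in this thread, where a device name ending in a digit is a partition and one ending in a letter is the whole drive.

```shell
# Hypothetical helper: return the partition to check for a given device.
first_partition() {
    case "$1" in
        *[0-9]) echo "$1" ;;      # already ends in a digit: treat as a partition
        *)      echo "${1}1" ;;   # whole drive such as /dev/sdh: append "1"
    esac
}
```

For example, `reiserfsck --check "$(first_partition /dev/sdh)"` would run the check against /dev/sdh1 rather than the whole drive.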

Link to comment
