
drive stripe errors. failing disc?


One of my drives keeps getting errors in the system log. The SMART report seems fine, I think. I never quite figured out how to do the long SMART test, though.

I attached the syslog, and here is the SMART report:

 smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD2002FYPS-01U1B0
Serial Number:    WD-WCAVY0105385
Firmware Version: 04.05G04
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Nov 18 16:28:20 2009 GMT+8
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
				was suspended by an interrupting command from host.
				Auto Offline Data Collection: Enabled.
Self-test execution status:      (  41)	The self-test routine was interrupted
				by the host with a hard or soft reset.
Total time to complete Offline 
data collection: 		 (40200) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 255) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x303f)	SCT Status supported.
				SCT Feature Control supported.
				SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
 3 Spin_Up_Time            0x0027   157   155   021    Pre-fail  Always       -       9108
 4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       267
 5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
 7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
 9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       3706
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       52
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       39
193 Load_Cycle_Count        0x0032   175   175   000    Old_age   Always       -       75837
194 Temperature_Celsius     0x0022   121   100   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       185
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       116
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       3
200 Multi_Zone_Error_Rate   0x0008   195   193   000    Old_age   Offline      -       1176

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      90%      3634         -
# 2  Short offline       Interrupted (host reset)      10%      3596         -
# 3  Short offline       Aborted by host               10%      3596         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1        0        0  Not_testing
   2        0        0  Not_testing
   3        0        0  Not_testing
   4        0        0  Not_testing
   5        0        0  Not_testing
Selective self-test flags (0x0):
 After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

 

I frequently get network access errors when trying to copy files. I hope it's not failing, as it's a newer disk, and an enterprise storage one at that. Any ideas would be greatly appreciated.

 

 

 

The "smart" report shows the drive is probably just fine.  The errors seem to be in communicating with the drive.

 

To do a "long" test you will need to disable the spin-down for that drive, as the long test will probably be aborted if unRAID issues a spin-down command while it is running for several hours or more.

 

Once spin-down is disabled, type:

smartctl -t long /dev/sdX

 

where sdX = the device for your disk.

 

Then wait for the recommended time shown as the "Extended self-test routine recommended polling time" (your SMART report showed 255 minutes).

 

Then, just get another SMART status report.
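
The steps above could be scripted roughly as follows. This is only a sketch: run_long_test is a hypothetical helper name, /dev/sdX and the wait time are placeholders, and it assumes spin-down has already been disabled for the drive.

```shell
# Hypothetical helper: start an extended self-test, wait, then print the self-test log.
# Usage: run_long_test /dev/sdX 255
run_long_test() {
    local dev="$1"       # device node for the disk, e.g. /dev/sdX
    local minutes="$2"   # "Extended self-test routine recommended polling time"

    smartctl -t long "$dev" || return 1   # start the long (extended) self-test
    sleep "$((minutes * 60))"             # wait out the recommended polling time
    smartctl -l selftest "$dev"           # print only the self-test log section
}
```

Running smartctl -a instead of smartctl -l selftest in the last step would give the full report rather than just the self-test log.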

 

The section at the bottom looking like this will let you know if it completed or is still running.  It will also let you know of the result.

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      90%      3634         -
# 2  Short offline       Interrupted (host reset)      10%      3596         -
# 3  Short offline       Aborted by host               10%      3596         -

It seems all the tests you requested have been aborted before they were completed so far. 

 

The lines in the SMART report you will be looking for look like this (the "# 1" line is the most recent test):

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     10537         -
# 2  Short offline       Completed without error       00%      8589         -
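
If you would rather not read through the whole report each time, the status of the newest entry can be pulled out with awk. A minimal sketch, assuming the log layout shown above; latest_selftest_status is a hypothetical name, not part of smartctl.

```shell
# Hypothetical helper: print the status of the newest self-test log entry ("# 1").
# Reads a SMART report (e.g. from `smartctl -l selftest /dev/sdX`) on stdin.
latest_selftest_status() {
    # Columns are separated by runs of two or more spaces, so a status such as
    # "Completed without error" (single spaces inside) stays in one field.
    awk -F'  +' '/^# *1 / { print $3; exit }'
}

# Usage (device name is a placeholder):
#   smartctl -l selftest /dev/sdX | latest_selftest_status
```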

Problems like you are having are frequently caused by cabling...

 

Loose connections, poor-quality SATA cables, and power splitters are frequently the problem.  Start there.  First, re-seat them or replace them.  If you don't have a spare cable, swap the one on the bad disk with one on a disk that has no errors.  If the errors continue, try putting the disk on a different port on the disk controller (swap it with a disk that is working).  The array will notice you swapped them and reset its display once you confirm it should start the array.  It could be a bad port on the controller card.

 

Joe L.

Joe helps so many users, he can't see everything!  It looks like a couple of problems here, with Disk 10.  In addition to what Joe said about the communications problem, there appears to have been a bad sector found too, that caused errors, plus the SMART report is reporting a Current_Pending_Sector count of 185.  The SMART long test is a good idea.
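
To keep an eye on counts like Current_Pending_Sector between reports, the RAW_VALUE column can be extracted by attribute ID. This is a rough sketch that assumes the ten-column attribute table layout shown in the report above; smart_raw_value is a hypothetical helper name.

```shell
# Hypothetical helper: print the RAW_VALUE of one SMART attribute, by ID.
# Reads the attribute table (e.g. from `smartctl -A /dev/sdX`) on stdin.
smart_raw_value() {
    # Field 1 is the ID#, field 10 is RAW_VALUE; NF >= 10 skips unrelated
    # short lines that happen to start with the same number.
    awk -v id="$1" '$1 == id && NF >= 10 { print $10; exit }'
}

# Usage (device name is a placeholder):
#   smartctl -A /dev/sdX | smart_raw_value 197   # Current_Pending_Sector
#   smartctl -A /dev/sdX | smart_raw_value 198   # Offline_Uncorrectable
```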

 

The communications problem is a little strange and new to me; it does not raise any of the usual communications error flags.  It consistently raised only the SATA error flag UnrecovData, which means a data integrity error that could not be recovered from.  I believe the kernel repeated the operation a number of times, with the same error flag returned each time, before giving up, and then unRAID reported read errors because of the failed I/O.
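
A quick way to see how often a flag like this shows up is to count the matching lines in the syslog. A minimal sketch: count_log_matches is a hypothetical helper name, and the /var/log/syslog path in the usage line is an assumption about where the log lives on your system.

```shell
# Hypothetical helper: count the lines in a log file that contain a pattern.
# Usage: count_log_matches PATTERN FILE
count_log_matches() {
    local pattern="$1"
    local file="$2"
    grep -c -- "$pattern" "$file"   # -c prints the number of matching lines
}

# Usage (log path is an assumption):
#   count_log_matches UnrecovData /var/log/syslog
```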

 

I note that this is a modern WD 2TB drive attached to an older controller, a Promise combo card that supports both IDE and SATA150 ports.  I wonder if it would be safer to swap this drive with an older drive (such as the WD 400GB), of the same generation as the Promise card.  I *think* it is this Promise card that is reacting with the UnrecovData error flag, for unknown reasons.  It was designed long before these new MUCH larger drives, and before SATA II.

I forgot to mention that I don't see any network issues, to explain the network access errors you mentioned.  However, you are operating at 100Mbps, which will be significantly slower than gigabit speed.  The kernel found 2 network chipsets, a Yukon one first (using the skge driver) that you are using at 100Mbps, and then a Realtek gigabit chipset, which uses the r8169 driver.
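
The negotiated link speed can be checked from the console with ethtool, if it is installed. A small sketch that assumes the usual "Speed: ..." line in ethtool's output; link_speed is a hypothetical helper name and eth0 is a placeholder interface name.

```shell
# Hypothetical helper: print the negotiated speed from `ethtool <iface>` output on stdin.
link_speed() {
    # ethtool prints a line like "Speed: 100Mb/s"; split on ": " and keep the value.
    awk -F': ' '/Speed:/ { print $2; exit }'
}

# Usage (interface name is a placeholder):
#   ethtool eth0 | link_speed
```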

  • Author

 

Yes, I have been trying (somewhat half-assedly) to get gigabit speeds, but I don't know enough about unRAID to do that. I did notice that drive 10 is the only SATA drive plugged into that SATA controller. This hardware is so old and has been repaired so much. I tried a different controller, but the BIOS doesn't want to see it. I think it's about had it. I've been thinking about upgrading, so now seems like a good time. I'll post back when the parts arrive. I appreciate all the help.  ;)

  • 2 weeks later...
  • Author

OK, so I changed the cables and got all new hardware. It's still giving read errors. Here is the latest SMART report, taken after a long test:

 smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD2002FYPS-01U1B0
Serial Number:    WD-WCAVY0105385
Firmware Version: 04.05G04
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Dec  4 12:59:54 2009 GMT+8
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
				was suspended by an interrupting command from host.
				Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
				without error or no self-test has ever 
				been run.
Total time to complete Offline 
data collection: 		 (40200) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 255) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x303f)	SCT Status supported.
				SCT Feature Control supported.
				SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0027   156   155   021    Pre-fail  Always       -       9158
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       304
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       3769
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       79
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       59
193 Load_Cycle_Count        0x0032   175   175   000    Old_age   Always       -       75977
194 Temperature_Celsius     0x0022   129   100   000    Old_age   Always       -       23
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       188
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       100
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       3
200 Multi_Zone_Error_Rate   0x0008   195   193   000    Old_age   Offline      -       1182

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      3758         38393859
# 2  Extended offline    Interrupted (host reset)      90%      3634         -
# 3  Short offline       Interrupted (host reset)      10%      3596         -
# 4  Short offline       Aborted by host               10%      3596         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
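
Self-test entries that ended in a failure, like the "# 1" line in the report above, can be listed together with the LBA of the first error. This is a sketch only: failed_selftests is a hypothetical helper name, and it assumes the status text of a failed test contains the word "failure", as it does in this report.

```shell
# Hypothetical helper: list self-test log entries whose status mentions a failure.
# Reads a SMART report (e.g. from `smartctl -l selftest /dev/sdX`) on stdin.
failed_selftests() {
    # Fields (split on 2+ spaces): "# N", description, status, remaining,
    # lifetime hours, LBA_of_first_error.
    awk -F'  +' '/^# *[0-9]+ / && $3 ~ /failure/ {
        print $2 ": " $3 ", first error at LBA " $6
    }'
}

# Usage (device name is a placeholder):
#   smartctl -l selftest /dev/sdX | failed_selftests
```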

 

Archived

This topic is now archived and is closed to further replies.
