November 19, 2009

One of my drives keeps getting errors in the system log. The SMART report seems fine, I think. I never quite figured out how to do the long SMART test, though. I attached the syslog, and here is the SMART report:

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD2002FYPS-01U1B0
Serial Number:    WD-WCAVY0105385
Firmware Version: 04.05G04
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Nov 18 16:28:20 2009 GMT+8
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84)
    Offline data collection activity was suspended by an interrupting command from host.
    Auto Offline Data Collection: Enabled.
Self-test execution status: ( 41)
    The self-test routine was interrupted by the host with a hard or soft reset.
Total time to complete Offline data collection: (40200) seconds.
Offline data collection capabilities: (0x7b)
    SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003)
    Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01)
    Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x303f)
    SCT Status supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0027   157   155   021    Pre-fail  Always       -       9108
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       267
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       3706
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       52
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       39
193 Load_Cycle_Count        0x0032   175   175   000    Old_age   Always       -       75837
194 Temperature_Celsius     0x0022   121   100   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       185
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       116
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       3
200 Multi_Zone_Error_Rate   0x0008   195   193   000    Old_age   Offline      -       1176

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                     Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      90%          3634         -
# 2  Short offline       Interrupted (host reset)      10%          3596         -
# 3  Short offline       Aborted by host               10%          3596         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I frequently get network access errors when trying to copy files. I hope it's not failing, as it's a newer disk, and an enterprise storage one at that. Any ideas would be greatly appreciated.
November 19, 2009

The "smart" report shows the drive is probably just fine. The errors seem to be in communicating with the drive.

To do a "long" test you will need to disable the spin-down for that drive, as the long test will probably be aborted if unRAID issues a spin-down command during the several hours or more that the test runs.

Once spin-down is disabled, type:

smartctl -t long /dev/sdX

where sdX = the device for your disk.

Then wait for the recommended time shown as the "Extended self-test" polling time (your SMART report showed 255 minutes).

Then just get another SMART status report. The section at the bottom that looks like this will let you know whether the test completed or is still running. It will also let you know the result.

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Interrupted (host reset) 90% 3634 -
# 2 Short offline Interrupted (host reset) 10% 3596 -
# 3 Short offline Aborted by host 10% 3596 -

So far, it seems all the tests you requested were aborted before they completed.

The lines in the SMART report you will be looking for look like this:

[b]SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
[color=blue]# 1 Extended offline Completed without error 00% 10537 -[/color]
# 2 Short offline Completed without error 00% 8589 -
[/b]

Problems like the ones you are having are frequently caused by cabling. Loose connections, poor-quality SATA cables, and power splitters are frequently the problem. Start there.

First, re-seat the cables or replace them. If you don't have a spare cable, swap the one on the bad disk with one on a disk that has no errors.

If the errors continue, try putting the disk on a different port on the disk controller (swap it with a disk that is working). The array will notice you swapped them and will reset its display once you confirm that it should start the array. It could be a bad port on the controller card.

Joe L.
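For anyone who would rather not eyeball the self-test log, the check described above can be sketched in Python. This is only a rough sketch, assuming the text of a SMART report has already been saved to a string. The function names (`parse_selftest_log`, `latest_extended_test`) are made up for illustration, and the pattern only knows the two test descriptions that appear in this thread ("Extended offline" and "Short offline").

```python
import re
from typing import Dict, List, Optional

# One row of the "SMART Self-test log" table, for example:
#   "# 1 Extended offline Completed without error 00% 10537 -"
# Assumption: only the test descriptions seen in this thread are matched.
SELFTEST_ROW = re.compile(
    r"#\s*(?P<num>\d+)\s+"
    r"(?P<test>(?:Extended|Short) offline)\s+"
    r"(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>-|\d+)"
)


def parse_selftest_log(report: str) -> List[Dict]:
    """Return the self-test log rows found in a SMART report, newest first."""
    entries = []
    for m in SELFTEST_ROW.finditer(report):
        entries.append({
            "num": int(m["num"]),
            "test": m["test"],
            "status": m["status"].strip(),
            "remaining_pct": int(m["remaining"]),
            "lifetime_hours": int(m["hours"]),
            # "-" in the last column means no failing LBA was recorded
            "first_error_lba": None if m["lba"] == "-" else int(m["lba"]),
        })
    return entries


def latest_extended_test(report: str) -> Optional[Dict]:
    """Return the most recent long (extended) test entry, or None if absent."""
    for entry in parse_selftest_log(report):
        if entry["test"] == "Extended offline":
            return entry
    return None


# Rows copied from the self-test logs quoted in this thread.
SAMPLE_LOG = """\
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 3758 38393859
# 2 Extended offline Interrupted (host reset) 90% 3634 -
# 3 Short offline Interrupted (host reset) 10% 3596 -
# 4 Short offline Aborted by host 10% 3596 -
"""
```

A status that starts with "Completed" means the test ran to a result (good or bad), while "Interrupted" or "Aborted" means it needs to be run again, most likely with spin-down disabled.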
November 19, 2009

Joe helps so many users, he can't see everything! It looks like there are a couple of problems here with Disk 10. In addition to the communications problem Joe described, it appears that a bad sector was found too, which caused errors, and the SMART report shows a Current_Pending_Sector count of 185. The SMART long test is a good idea.

The communications problem is a little strange and new to me; it does not raise any of the usual communications error flags. It consistently raised only the SATA error flag UnrecovData, which means a data integrity error that could not be recovered from. I believe the kernel repeated the operation a number of times, with the same error flag returned each time, before giving up, and then unRAID reported read errors because of the failed I/O.

I note that this is a modern WD 2TB drive attached to an older controller, a Promise combo card that supports both IDE and SATA150 ports. I wonder if it would be safer to swap this drive with an older drive (such as the WD 400GB) of the same generation as the Promise card. I *think* it is this Promise card that is reacting with the UnrecovData error flag, for unknown reasons. It was designed long before these new MUCH larger drives, and before SATA II.
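The pending-sector count mentioned above is easy to miss in a long report, so here is a rough Python sketch that pulls it out of the attribute table. Assumptions: the report text is already in a string, the function names are invented for this example, and the choice of attributes to watch (5, 197, 198) is mine rather than anything unRAID or smartctl prescribes. As far as I understand, the overall-health result only turns bad when a normalized VALUE drops to or below its THRESH, which would explain why a drive with pending sectors can still say PASSED.

```python
import re
from typing import Dict, List, Tuple

# One row of "Vendor Specific SMART Attributes with Thresholds", for example:
#   "197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 185"
ATTR_ROW = re.compile(
    r"\b(?P<id>\d{1,3})\s+(?P<name>[A-Za-z][\w-]*)\s+(?P<flag>0x[0-9a-fA-F]{4})\s+"
    r"(?P<value>\d{3})\s+(?P<worst>\d{3})\s+(?P<thresh>\d{3})\s+"
    r"(?P<type>Pre-fail|Old_age)\s+(?P<updated>Always|Offline)\s+"
    r"(?P<when_failed>\S+)\s+(?P<raw>\d+)"
)

# Assumption for this sketch: raw counts above zero on these IDs deserve a look.
WATCHED_IDS = (5, 197, 198)


def parse_attributes(report: str) -> List[Dict]:
    """Return every attribute row found in the report as a dict."""
    rows = []
    for m in ATTR_ROW.finditer(report):
        rows.append({
            "id": int(m["id"]),
            "name": m["name"],
            "value": int(m["value"]),
            "worst": int(m["worst"]),
            "thresh": int(m["thresh"]),
            "raw": int(m["raw"]),
        })
    return rows


def flag_suspect_attributes(report: str) -> List[Tuple[int, str, int]]:
    """List (id, name, raw) for watched attributes with a nonzero raw value."""
    return [(r["id"], r["name"], r["raw"])
            for r in parse_attributes(report)
            if r["id"] in WATCHED_IDS and r["raw"] > 0]


def at_or_below_threshold(report: str) -> List[str]:
    """Names of attributes whose normalized VALUE has reached THRESH."""
    return [r["name"] for r in parse_attributes(report)
            if r["thresh"] > 0 and r["value"] <= r["thresh"]]


# Rows copied from the first SMART report in this thread.
SAMPLE_ATTRS = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 185
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 116
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 3
"""
```

On the sample rows, attributes 197 and 198 are flagged while nothing has reached its threshold, which matches the PASSED result sitting next to a pending-sector count of 185.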
November 19, 2009

I forgot to mention that I don't see any network issues that would explain the network access errors you mentioned. However, you are operating at 100Mbps, which will be significantly slower than gigabit speed. The kernel found 2 network chipsets: first a Yukon one (using the skge driver), which you are using at 100Mbps, and then a Realtek gigabit chipset, which uses the r8169 driver.
November 22, 2009 (Author)

[quote]I forgot to mention that I don't see any network issues, to explain the network access errors you mentioned. However, you are operating at 100Mbps, which will be significantly slower than gigabit speed. The kernel found 2 network chipsets, a Yukon one first (using the skge driver) that you are using at 100Mbps, and then a Realtek gigabit chipset, which uses the r8169 driver.[/quote]

Yes. I have been trying (somewhat half-assedly) to get gigabit speeds, but I don't know enough about unRAID to do that. I did notice that drive 10 is the only SATA drive plugged into that SATA controller. This hardware is so old and has been repaired so much. I tried a different controller, but the BIOS doesn't want to see it. I think it's about had it. I've been thinking about upgrading, so now seems like a good time. I'll post back when the parts arrive. I appreciate all the help.
December 5, 2009 (Author)

OK, so I changed the cables and got all new hardware. It is still giving read errors. Here is the latest SMART report, after the long test:

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD2002FYPS-01U1B0
Serial Number:    WD-WCAVY0105385
Firmware Version: 04.05G04
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Dec 4 12:59:54 2009 GMT+8
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84)
    Offline data collection activity was suspended by an interrupting command from host.
    Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0)
    The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (40200) seconds.
Offline data collection capabilities: (0x7b)
    SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003)
    Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01)
    Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x303f)
    SCT Status supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0027   156   155   021    Pre-fail  Always       -       9158
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       304
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       3769
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       79
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       59
193 Load_Cycle_Count        0x0032   175   175   000    Old_age   Always       -       75977
194 Temperature_Celsius     0x0022   129   100   000    Old_age   Always       -       23
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       188
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       100
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       3
200 Multi_Zone_Error_Rate   0x0008   195   193   000    Old_age   Offline      -       1182

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                     Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%          3758         38393859
# 2  Extended offline    Interrupted (host reset)      90%          3634         -
# 3  Short offline       Interrupted (host reset)      10%          3596         -
# 4  Short offline       Aborted by host               10%          3596         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.