Posted September 5, 2011

Hi there. I've had a new unRAID server up and running for 2 days. It contains 5 x 2TB Seagate LP drives. No cache drive (as yet). These drives are of various vintages, salvaged from a bunch of Seagate Expansion USB drives I had lying around.

Copying media onto the server went fine until the drives reached around 18% capacity (I'm splitting ISO files equally across the disks), when suddenly my file copying failed and I noticed a bunch of error messages in the console. Investigating the syslog, I see lots of the following (full file attached):

Sep 5 19:02:37 UNRAID-01 kernel: md: disk4 read error
Sep 5 19:02:37 UNRAID-01 kernel: handle_stripe read error: 30064/4, count: 1

[Screenshot from unMENU]

A SMART status report for disk4 gives the status listed at the bottom (sorry, but this means nothing to me).

Any suggestions please? Is it definitely Disk 4 (ST32000540AS_9WM037SE) which is at fault? Should I just replace it, or is it worth running more tests? Should I run a parity check? If I should replace it, can you point me to a link describing the process?

Many thanks.

Statistics for /dev/sdd ST32000540AS_9WM037SE
smartctl -a -d ata /dev/sdd
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST32000540AS
Serial Number:    9WM037SE
Firmware Version: CC83
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Mon Sep 5 19:24:45 2011 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 609) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103b) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   094   090   006    Pre-fail  Always       -       92471940
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   093   093   020    Old_age   Always       -       7686
  5 Reallocated_Sector_Ct   0x0033   074   074   036    Pre-fail  Always       -       1080
  7 Seek_Error_Rate         0x000f   037   037   030    Pre-fail  Always       -       14499830388675
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       12328
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       37
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       150
188 Command_Timeout         0x0032   100   001   000    Old_age   Always       -       8989503719470
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   062   030   045    Old_age   Always   In_the_past 38 (9 169 38 26)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       13
193 Load_Cycle_Count        0x0032   097   097   000    Old_age   Always       -       7694
194 Temperature_Celsius     0x0022   038   070   000    Old_age   Always       -       38 (0 9 0 0)
195 Hardware_ECC_Recovered  0x001a   049   026   000    Old_age   Always       -       92471940
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       195579925760904
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       3018733366
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1448457851

SMART Error Log Version: 1
ATA Error Count: 108 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 108 occurred at disk power-on lifetime: 12328 hours (513 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 1f 72 00 00  Error: UNC at LBA = 0x0000721f = 29215

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 07 72 00 e0 00   1d+22:03:36.879  READ DMA EXT
  27 00 00 00 00 00 e0 00   1d+22:03:36.878  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   1d+22:03:36.876  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   1d+22:03:36.876  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   1d+22:03:36.854  READ NATIVE MAX ADDRESS EXT

Error 107 occurred at disk power-on lifetime: 12328 hours (513 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 1f 72 00 00  Error: UNC at LBA = 0x0000721f = 29215

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 07 72 00 e0 00   1d+22:03:34.065  READ DMA EXT
  27 00 00 00 00 00 e0 00   1d+22:03:34.063  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   1d+22:03:34.062  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   1d+22:03:34.061  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   1d+22:03:34.040  READ NATIVE MAX ADDRESS EXT

Error 106 occurred at disk power-on lifetime: 12328 hours (513 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 1f 72 00 00  Error: UNC at LBA = 0x0000721f = 29215

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 07 72 00 e0 00   1d+22:03:29.982  READ DMA EXT
  27 00 00 00 00 00 e0 00   1d+22:03:29.960  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   1d+22:03:29.959  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   1d+22:03:29.875  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   1d+22:03:29.874  READ NATIVE MAX ADDRESS EXT

Error 105 occurred at disk power-on lifetime: 12328 hours (513 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 1f 72 00 00  Error: UNC at LBA = 0x0000721f = 29215

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 07 72 00 e0 00   1d+22:03:25.405  READ DMA EXT
  27 00 00 00 00 00 e0 00   1d+22:03:25.403  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   1d+22:03:25.402  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   1d+22:03:25.401  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   1d+22:03:25.380  READ NATIVE MAX ADDRESS EXT

Error 104 occurred at disk power-on lifetime: 12328 hours (513 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 1f 72 00 00  Error: UNC at LBA = 0x0000721f = 29215

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 07 72 00 e0 00   1d+22:03:18.876  READ DMA EXT
  27 00 00 00 00 00 e0 00   1d+22:03:18.875  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   1d+22:03:18.873  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   1d+22:03:18.873  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   1d+22:03:18.852  READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

syslog-2011-09-05_nopwd.txt
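
For anyone who, like the poster, finds the attribute table hard to read, here is a minimal Python sketch that pulls the rows out of `smartctl -a` text and lists the ones that look worrying. The `WATCH` set and the "raw value above zero" rule are assumptions chosen for illustration; they are not a smartmontools feature, and the names `parse_attributes` and `flag_rows` are hypothetical. The sample rows are copied from the report above.

```python
import re

# One row of the "Vendor Specific SMART Attributes with Thresholds" table:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

# Assumption: attributes treated as sector-health indicators in this sketch.
WATCH = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

def parse_attributes(text):
    """Return one dict per attribute row found in smartctl output."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(4)),
                "worst": int(m.group(5)),
                "thresh": int(m.group(6)),
                "when_failed": m.group(9),
                "raw": int(m.group(10)),  # leading number of RAW_VALUE only
            })
    return rows

def flag_rows(rows):
    """Return (name, reason) pairs for rows this sketch considers suspicious."""
    flagged = []
    for r in rows:
        if r["name"] in WATCH and r["raw"] > 0:
            flagged.append((r["name"], "raw value %d" % r["raw"]))
        elif r["when_failed"] != "-":
            flagged.append((r["name"], "WHEN_FAILED is " + r["when_failed"]))
    return flagged

SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   074   074   036    Pre-fail  Always       -       1080
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       150
190 Airflow_Temperature_Cel 0x0022   062   030   045    Old_age   Always   In_the_past 38 (9 169 38 26)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    for name, reason in flag_rows(parse_attributes(SAMPLE)):
        print(name, "->", reason)
```

The sketch reads only the leading number of RAW_VALUE, so vendor-packed raw values such as Seek_Error_Rate or Command_Timeout are deliberately left out of `WATCH`.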

September 6, 2011, Author

"replace the drive, there is a very high number of reallocated sectors."

I have another drive in the array with even more (1318), but that one's not giving any read errors yet. I think I'll replace drive 4 and see what happens with the other. Thanks.
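
Keeping an eye on the other drive could be done by counting the `md: diskN read error` lines per disk in the syslog. The following is a minimal Python sketch under that assumption; the function name `count_read_errors` is hypothetical, and the pattern is based only on the two syslog lines quoted in the first post.

```python
import re
from collections import Counter

# Matches lines such as "... kernel: md: disk4 read error" and captures the disk number.
MD_READ_ERROR = re.compile(r"kernel: md: disk(\d+) read error")

def count_read_errors(lines):
    """Count md read-error lines per disk number in an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        m = MD_READ_ERROR.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts

# The two lines quoted in the first post.
SAMPLE = [
    "Sep 5 19:02:37 UNRAID-01 kernel: md: disk4 read error",
    "Sep 5 19:02:37 UNRAID-01 kernel: handle_stripe read error: 30064/4, count: 1",
]

if __name__ == "__main__":
    for disk, n in sorted(count_read_errors(SAMPLE).items()):
        print("disk%d: %d read error(s)" % (disk, n))
```

To run it against a saved log, pass an open file object instead of `SAMPLE`, since a text file iterates line by line. Only the `md:` lines are counted; the `handle_stripe` lines are skipped because the post does not say how to interpret their fields.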