September 29, 2016

Hey, I've started finding the server unresponsive more and more often, and I'm having to do a hard reset as I can't remote into it. I also seem to get lag when streaming to Kodi: it freezes for about 10 secs, then seems to fast-forward to the relevant point, playing everything at high speed. It's been happening more and more.

I've attached the diagnostics file. Usually I get nothing in the syslog, but this time it is showing disk 6 read errors. This is one of the older drives in the array (circa 5-6 years), though there are older. If it's this one going, would it affect everything though?

Disk 6 is WDC_WD20EARX-00PASB0_WD-WCAZAH738536-20160929-1125.txt

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.1.7-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model:     WDC WD20EARX-00PASB0
Serial Number:    WD-WCAZAH738536
LU WWN Device Id: 5 0014ee 2b209029b
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Sep 29 11:30:52 2016 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (38580) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 372) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   199   199   051    Pre-fail  Always   -           30228
  3 Spin_Up_Time            0x0027   164   161   021    Pre-fail  Always   -           6800
  4 Start_Stop_Count        0x0032   096   096   000    Old_age   Always   -           4460
  5 Reallocated_Sector_Ct   0x0033   145   145   140    Pre-fail  Always   -           2327
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always   -           0
  9 Power_On_Hours          0x0032   059   059   000    Old_age   Always   -           29996
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always   -           0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           117
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always   -           46
193 Load_Cycle_Count        0x0032   171   171   000    Old_age   Always   -           89615
194 Temperature_Celsius     0x0022   115   099   000    Old_age   Always   -           35
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always   -           434
197 Current_Pending_Sector  0x0032   200   198   000    Old_age   Always   -           174
198 Offline_Uncorrectable   0x0030   198   198   000    Old_age   Offline  -           754
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0008   162   162   000    Old_age   Offline  -           10145

SMART Error Log Version: 1
ATA Error Count: 7 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 7 occurred at disk power-on lifetime: 29909 hours (1246 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 90 0b 67 ea  Error: UNC 8 sectors at LBA = 0x0a670b90 = 174525328
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ---------------- --------------------
c8 00 08 90 0b 67 ea 08  00:50:57.339     READ DMA
c8 00 08 f8 2f 66 ea 08  00:50:55.490     READ DMA
c8 00 08 48 3b 64 ea 08  00:50:54.319     READ DMA
c8 00 08 b8 3f 64 ea 08  00:50:52.490     READ DMA
c8 00 08 e8 cb 64 ea 08  00:50:51.089     READ DMA

Error 6 occurred at disk power-on lifetime: 29438 hours (1226 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 a8 09 28 ef  Error: UNC 8 sectors at LBA = 0x0f2809a8 = 254282152
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ---------------- --------------------
c8 00 08 a8 09 28 ef 08  01:24:24.444     READ DMA
ca 00 08 a0 09 28 ef 08  01:24:24.444     WRITE DMA
ef 10 02 00 00 00 a0 08  01:24:24.444     SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08  01:24:24.438     IDENTIFY DEVICE

Error 5 occurred at disk power-on lifetime: 29438 hours (1226 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 a0 09 28 ef  Error: UNC 8 sectors at LBA = 0x0f2809a0 = 254282144
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ---------------- --------------------
c8 00 08 a0 09 28 ef 08  01:24:21.168     READ DMA
c8 00 08 98 09 28 ef 08  01:24:21.168     READ DMA
c8 00 08 90 09 28 ef 08  01:24:21.168     READ DMA
c8 00 08 88 09 28 ef 08  01:24:21.168     READ DMA
c8 00 08 80 09 28 ef 08  01:24:21.168     READ DMA

Error 4 occurred at disk power-on lifetime: 29437 hours (1226 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 28 90 76 2f e2  Error: UNC 40 sectors at LBA = 0x022f7690 = 36664976
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ---------------- --------------------
c8 00 28 90 76 2f e2 08  00:11:46.743     READ DMA
ca 00 08 88 6f 2f e2 08  00:11:46.742     WRITE DMA
ca 00 08 90 6f 2f e2 08  00:11:46.742     WRITE DMA
ca 00 08 98 6f 2f e2 08  00:11:46.742     WRITE DMA
ca 00 08 a0 6f 2f e2 08  00:11:46.742     WRITE DMA

Error 3 occurred at disk power-on lifetime: 28284 hours (1178 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 10 88 0d 6f ef  Error: UNC 16 sectors at LBA = 0x0f6f0d88 = 258936200
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ---------------- --------------------
c8 00 10 88 0d 6f ef 00  9d+05:21:54.038  READ DMA
c8 00 08 80 0d 6f ef 00  9d+05:21:54.038  READ DMA
c8 00 08 78 0d 6f ef 00  9d+05:21:54.038  READ DMA
c8 00 08 70 0d 6f ef 00  9d+05:21:54.038  READ DMA
c8 00 08 68 0d 6f ef 00  9d+05:21:54.038  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        14007            -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1        0        0  Not_testing
   2        0        0  Not_testing
   3        0        0  Not_testing
   4        0        0  Not_testing
   5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Disk 3 WDC_WD20EARS-00MVWB0_WD-WMAZA3020880-20160929-1130 is showing some errors, but again not in the syslog!
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.1.7-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WMAZA3020880
LU WWN Device Id: 5 0014ee 600b6eb71
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Thu Sep 29 11:30:52 2016 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (35760) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 345) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always   -           3101
  3 Spin_Up_Time            0x0027   186   171   021    Pre-fail  Always   -           5683
  4 Start_Stop_Count        0x0032   095   095   000    Old_age   Always   -           5037
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always   -           0
  9 Power_On_Hours          0x0032   037   037   000    Old_age   Always   -           46487
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always   -           0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           249
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always   -           82
193 Load_Cycle_Count        0x0032   143   143   000    Old_age   Always   -           172365
194 Temperature_Celsius     0x0022   114   102   000    Old_age   Always   -           36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always   -           0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always   -           3
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always   -           2
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline  -           46

SMART Error Log Version: 1
ATA Error Count: 1
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 45415 hours (1892 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 68 30 da ee  Error: UNC 8 sectors at LBA = 0x0eda3068 = 249180264
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ---------------- --------------------
c8 00 08 68 30 da ee 08  3d+16:45:45.619  READ DMA
c8 00 08 60 30 da ee 08  3d+16:45:45.619  READ DMA
c8 00 08 58 30 da ee 08  3d+16:45:45.619  READ DMA
c8 00 08 50 30 da ee 08  3d+16:45:45.619  READ DMA
c8 00 08 48 30 da ee 08  3d+16:45:45.619  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        30488            -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1        0        0  Not_testing
   2        0        0  Not_testing
   3        0        0  Not_testing
   4        0        0  Not_testing
   5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Can anyone please help me decode these? I'm assuming it means the 2 older drives are on the way out, Disk 6 being more terminal. Disk 3 and the Parity stump me a bit, though, since I'm not seeing errors for them in the syslog. Apologies for embedding it all, I couldn't attach them all due to size! Thanks in advance!
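For anyone else trying to decode reports like these, here is a minimal Python sketch that pulls the raw values out of the attribute table and flags the ones usually watched first. The watch list (IDs 5, 197, 198, 199) and the function name `flag_attributes` are assumptions made for illustration, not an official smartmontools or unRAID rule; the sample rows are Disk 6's values from the report above.

```python
import re

# Attribute IDs often checked first on spinning disks. This watch list is an
# assumption for illustration, not an official smartmontools or unRAID rule.
WATCHED = {5, 197, 198, 199}

# One attribute row: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-f]{4}\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def flag_attributes(report: str) -> dict:
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # not an attribute row
        attr_id, name, raw = int(match.group(1)), match.group(2), int(match.group(3))
        if attr_id in WATCHED and raw > 0:
            flagged[name] = raw
    return flagged


# Rows copied from the Disk 6 report in this post.
DISK6 = """\
  5 Reallocated_Sector_Ct   0x0033   145   145   140    Pre-fail  Always   -   2327
197 Current_Pending_Sector  0x0032   200   198   000    Old_age   Always   -   174
198 Offline_Uncorrectable   0x0030   198   198   000    Old_age   Offline  -   754
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always   -   0
"""

print(flag_attributes(DISK6))
```

For Disk 6 this flags the reallocated, pending and offline-uncorrectable counts, while the CRC count of 0 is left out.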
September 29, 2016 Author

Syslog1

Sep 26 04:40:01 Wintermute rsyslogd: [origin software="rsyslogd" swVersion="8.6.0" x-pid="1144" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Sep 26 04:40:01 Wintermute logger: Community Applications Auto Update Running
Sep 26 04:44:20 Wintermute kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep 26 04:44:20 Wintermute kernel: ata8.00: irq_stat 0x40000001
Sep 26 04:44:20 Wintermute kernel: ata8.00: failed command: READ DMA EXT
Sep 26 04:44:20 Wintermute kernel: ata8.00: cmd 25/00:40:30:f6:2e/00:05:3a:00:00/e0 tag 18 dma 688128 in
Sep 26 04:44:20 Wintermute kernel: res 51/40:3f:30:f7:2e/00:04:3a:00:00/e0 Emask 0x9 (media error)
Sep 26 04:44:20 Wintermute kernel: ata8.00: status: { DRDY ERR }
Sep 26 04:44:20 Wintermute kernel: ata8.00: error: { UNC }
Sep 26 04:44:20 Wintermute kernel: ata8.00: configured for UDMA/133
Sep 26 04:44:20 Wintermute kernel: sd 8:0:0:0: [sdi] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Sep 26 04:44:20 Wintermute kernel: sd 8:0:0:0: [sdi] tag#18 Sense Key : 0x3 [current] [descriptor]
Sep 26 04:44:20 Wintermute kernel: sd 8:0:0:0: [sdi] tag#18 ASC=0x11 ASCQ=0x4
Sep 26 04:44:20 Wintermute kernel: sd 8:0:0:0: [sdi] tag#18 CDB: opcode=0x28 28 00 3a 2e f6 30 00 05 40 00
Sep 26 04:44:20 Wintermute kernel: blk_update_request: I/O error, dev sdi, sector 976156464
Sep 26 04:44:20 Wintermute kernel: ata8: EH complete
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156400
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156408
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156416
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156424
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156432
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156440
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156448
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156456
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156464
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156472
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156480
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156488
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156496
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156504
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156512
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156520
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156528
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156536
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156544
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156552
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156560
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156568
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156576
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156584
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156592
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156600
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156608
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156616
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156624
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156632
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156640
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156648
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156656
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156664
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156672
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156680
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156688
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156696
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156704
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156712
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156720
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156728
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156736
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156744
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156752
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156760
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156768
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156776
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156784
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156792
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156800
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156808
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156816
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156824
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156832
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156840
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156848
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156856
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156864
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156872
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156880
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156888
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156896
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156904
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156912
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156920
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156928
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156936
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156944
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156952
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156960
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156968
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156976
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156984
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156992
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976157000
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976157008
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976157016
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976157024
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976157032
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976157040
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976157048
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976157056
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976157064
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976157072
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976157080

It goes on like that for a while, always disk 6.
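A long run of identical `md: diskN read error` lines is easier to read as a per-disk summary. The following is a rough Python sketch, assuming the message format shown in the excerpt above; the function name `summarize_read_errors` is made up for illustration and only three of the logged lines are used as sample input.

```python
import re
from collections import defaultdict

# Matches the unRAID md driver messages shown in the syslog excerpt.
MD_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def summarize_read_errors(log_text: str) -> dict:
    """Map each disk to (error_count, lowest_sector, highest_sector)."""
    sectors = defaultdict(list)
    for disk, sector in MD_ERROR.findall(log_text):
        sectors[disk].append(int(sector))
    return {disk: (len(s), min(s), max(s)) for disk, s in sectors.items()}


# Three lines taken from the excerpt: the first two and the last one logged.
SAMPLE = """\
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156400
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976156408
Sep 26 04:44:20 Wintermute kernel: md: disk6 read error, sector=976157080
"""

print(summarize_read_errors(SAMPLE))
```

Run against a full syslog, this gives one entry per affected disk, which makes it quick to confirm that only disk6 is reporting read errors and that they fall in one contiguous sector range.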
September 29, 2016 Community Expert

You didn't post the SMART report for disk3; better yet, post the full diagnostics. Disk6 is on its way out. Parity had some issues in the past, but it should be OK for now. I guess you'll know after trying to rebuild disk6.
September 29, 2016 Author

Hey, yeah, sorry, I overdid the max characters so ended up chopping the wrong bit out! Disk 6 looks a goner! Thanks for confirming. Everything else should be here now!

My Parity is also showing an error, though I'm stumped as to what it means. And that's a fairly new 6TB WD Red (2 years tops).

WDC_WD60EFRX-68MYMN1_WD-WX51D6427226-20160929-1130.txt

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.1.7-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD60EFRX-68MYMN1
Serial Number:    WD-WX51D6427226
LU WWN Device Id: 5 0014ee 260205560
Firmware Version: 82.00A82
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5700 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Sep 29 11:30:53 2016 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 5204) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 705) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x303d) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always   -           0
  3 Spin_Up_Time            0x0027   197   193   021    Pre-fail  Always   -           9141
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always   -           1277
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always   -           0
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always   -           17156
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always   -           0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           69
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always   -           29
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always   -           4379
194 Temperature_Celsius     0x0022   118   108   000    Old_age   Always   -           34
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always   -           0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always   -           1
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline  -           0

SMART Error Log Version: 1
ATA Error Count: 34 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 34 occurred at disk power-on lifetime: 13493 hours (562 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 90 d8 00 e0  Error: UNC 8 sectors at LBA = 0x0000d890 = 55440
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ---------------- --------------------
c8 00 08 90 d8 00 e0 08  2d+05:06:02.778  READ DMA
ca 00 80 10 d8 00 e0 08  2d+05:06:02.777  WRITE DMA
ca 00 08 08 d8 00 e0 08  2d+05:06:02.777  WRITE DMA
ca 00 08 00 d8 00 e0 08  2d+05:06:02.628  WRITE DMA
c8 00 80 10 d8 00 e0 08  2d+05:06:02.628  READ DMA

Error 33 occurred at disk power-on lifetime: 13493 hours (562 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 f8 d7 00 e0  Error: UNC 8 sectors at LBA = 0x0000d7f8 = 55288
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ---------------- --------------------
c8 00 08 f8 d7 00 e0 08  2d+05:05:04.113  READ DMA
ca 00 40 b8 d7 00 e0 08  2d+05:05:04.112  WRITE DMA
ca 00 08 b0 d7 00 e0 08  2d+05:05:04.112  WRITE DMA
ca 00 08 a8 d7 00 e0 08  2d+05:05:03.963  WRITE DMA
c8 00 40 b8 d7 00 e0 08  2d+05:05:03.963  READ DMA

Error 32 occurred at disk power-on lifetime: 13493 hours (562 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 28 d6 00 e0  Error: UNC 8 sectors at LBA = 0x0000d628 = 54824
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ---------------- --------------------
c8 00 08 28 d6 00 e0 08  2d+05:03:55.097  READ DMA
ef 10 02 00 00 00 a0 08  2d+05:03:55.079  SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08  2d+05:03:55.079  IDENTIFY DEVICE

Error 31 occurred at disk power-on lifetime: 13493 hours (562 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
10 51 08 20 d6 00 e0  Error: IDNF at LBA = 0x0000d620 = 54816
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ---------------- --------------------
ca 00 08 20 d6 00 e0 08  2d+05:03:32.474  WRITE DMA
c8 00 08 20 d6 00 e0 08  2d+05:03:32.040  READ DMA
ca 00 70 b0 d5 00 e0 08  2d+05:03:32.039  WRITE DMA

Error 30 occurred at disk power-on lifetime: 13493 hours (562 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 50 d5 00 e0  Error: UNC 8 sectors at LBA = 0x0000d550 = 54608
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- --  ---------------- --------------------
c8 00 08 50 d5 00 e0 08  2d+05:02:38.969  READ DMA
ca 00 68 e8 d4 00 e0 08  2d+05:02:38.968  WRITE DMA
ca 00 08 e0 d4 00 e0 08  2d+05:02:38.803  WRITE DMA
c8 00 68 e8 d4 00 e0 08  2d+05:02:38.765  READ DMA
ca 00 08 d8 d4 00 e0 08  2d+05:02:38.765  WRITE DMA

SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed without error  00%        1258             -
# 2  Short offline     Completed without error  00%        1206             -
# 3  Short offline     Completed without error  00%        1206             -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1        0        0  Not_testing
   2        0        0  Not_testing
   3        0        0  Not_testing
   4        0        0  Not_testing
   5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
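The register rows in these error logs can be decoded by hand. As a small Python sketch: for the 28-bit READ DMA / WRITE DMA commands shown here, the LBA is built from the low nibble of DH followed by CH, CL and SN, and the ER byte carries the error flags (0x40 is printed as UNC and 0x10 as IDNF in the reports above). The function names are invented for illustration, and only the two flags that appear in this thread are mapped.

```python
# Error-register flags seen in this thread; other bits are left undecoded.
ERROR_FLAGS = {0x40: "UNC", 0x10: "IDNF"}


def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine the 28-bit LBA: DH low nibble, then CH, CL, SN."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def parse_register_row(row: str) -> dict:
    """Decode one 'ER ST SC SN CL CH DH' row of hex bytes from the error log."""
    er, st, sc, sn, cl, ch, dh = (int(byte, 16) for byte in row.split())
    return {
        "flags": [name for bit, name in ERROR_FLAGS.items() if er & bit],
        "sector_count": sc,
        "lba": lba28_from_registers(sn, cl, ch, dh),
    }


# Error 34 (UNC) and Error 31 (IDNF) from the parity report above.
print(parse_register_row("40 51 08 90 d8 00 e0"))
print(parse_register_row("10 51 08 20 d6 00 e0"))
```

The first row decodes to LBA 55440 with the UNC flag, matching the "Error: UNC 8 sectors at LBA = 0x0000d890 = 55440" text that smartctl prints next to it.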
September 29, 2016 Community Expert

Disk3 doesn't look very good, but it could be a false positive, and disk6 looks worse, so I would replace that one first. After the rebuild, look at the error counters: if there are any errors on disk3, parity or any other disk, there could be some corrupt files.
September 29, 2016 Author

OK, thanks for confirming it all. Disk 3 threw me as it wasn't in the syslog, but I was combing through them all trying to work it out. I really wish SMART reports gave more detail for dummies, ha ha. Well, I've been looking for an excuse for another 6TB... oh well. I'll redo the scans after fixing Disk 6. I do wonder if the parity errors were down to how often the thing had been crashing recently? Thanks again!
September 29, 2016 Community Expert

Parity should be OK: there were some errors, but they were about 5 months ago, and there are no pending sectors. Disk3 shows some pending sectors and some read errors from 45 days ago, so during the rebuild there could be some read errors that result in some corrupt files on the rebuilt disk.
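The "5 months" and "45 days" figures can be reproduced from the reports: subtract the power-on lifetime of the most recent logged error from the current Power_On_Hours raw value. A minimal Python sketch follows, assuming both numbers come from the same hour counter and that the drive was powered on more or less continuously; the function name is made up for illustration.

```python
def days_since_last_error(power_on_hours: int, error_lifetime_hours: int) -> float:
    """Powered-on days between the most recent logged error and the report."""
    return (power_on_hours - error_lifetime_hours) / 24


# (Power_On_Hours raw value, lifetime hours of the newest logged error),
# copied from the three SMART reports in this thread.
REPORTS = {
    "parity": (17156, 13493),
    "disk3": (46487, 45415),
    "disk6": (29996, 29909),
}

for disk, (poh, err) in REPORTS.items():
    days = days_since_last_error(poh, err)
    print(f"{disk}: last logged error about {days:.0f} powered-on days ago")
```

This gives roughly 153 days for parity, 45 days for disk3 and 4 days for disk6. These are powered-on days, so the calendar time is at least that long if the server was ever switched off in between.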
September 29, 2016 Community Expert

Here are 2 things for future reference.

1) Always go to Tools - Diagnostics and post the complete diagnostics zip. That one file would have included everything needed, instead of embedding individual SMART and syslog excerpts across several posts.

2) Set up Notifications. It looks like disk6 had probably been an issue for a long time, but you weren't aware of it. Notifications would have told you.
September 29, 2016 Author

Yeah, I'm reading up on notifications at the minute. I got lazy after the initial setup; I should have covered my back earlier. I tried the zip, but it was too large, apparently!