September 19, 2011 Server running great for months. Can one of the pros check out the errors in the syslog and try to point me in the right direction? Running 4.7.

Sep 16 03:52:45 sun kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Sep 16 03:52:45 sun kernel: ata9.00: irq_stat 0x40000001 (Drive related)
Sep 16 03:52:45 sun kernel: ata9.00: failed command: READ DMA EXT (Minor Issues)
Sep 16 03:52:45 sun kernel: ata9.00: cmd 25/00:00:07:46:00/00:04:00:00:00/e0 tag 0 dma 524288 in (Drive related)
Sep 16 03:52:45 sun kernel: res 51/40:1f:de:48:00/00:01:00:00:00/e0 Emask 0x9 (media error) (Errors)
Sep 16 03:52:45 sun kernel: ata9.00: status: { DRDY ERR } (Drive related)
Sep 16 03:52:45 sun kernel: ata9.00: error: { UNC } (Errors)
Sep 16 03:52:45 sun kernel: ata9.00: configured for UDMA/133 (Drive related)
Sep 16 03:52:45 sun kernel: ata9: EH complete (Drive related)
Sep 16 03:52:47 sun kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Sep 16 03:52:47 sun kernel: ata9.00: irq_stat 0x40000001 (Drive related)
Sep 16 03:52:47 sun kernel: ata9.00: failed command: READ DMA EXT (Minor Issues)
Sep 16 03:52:47 sun kernel: ata9.00: cmd 25/00:00:07:46:00/00:04:00:00:00/e0 tag 0 dma 524288 in (Drive related)
Sep 16 03:52:47 sun kernel: res 51/40:1f:de:48:00/00:01:00:00:00/e0 Emask 0x9 (media error) (Errors)
Sep 16 03:52:47 sun kernel: ata9.00: status: { DRDY ERR } (Drive related)
Sep 16 03:52:47 sun kernel: ata9.00: error: { UNC } (Errors)
Sep 16 03:52:47 sun kernel: ata9.00: configured for UDMA/133 (Drive related)
Sep 16 03:52:47 sun kernel: ata9: EH complete (Drive related)
Sep 16 03:52:49 sun kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Sep 16 03:52:49 sun kernel: ata9.00: irq_stat 0x40000001 (Drive related)
Sep 16 03:52:49 sun kernel: ata9.00: failed command: READ DMA EXT (Minor Issues)
Sep 16 03:52:49 sun kernel: ata9.00: cmd 25/00:00:07:46:00/00:04:00:00:00/e0 tag 0 dma 524288 in (Drive related)
Sep 16 03:52:49 sun kernel: res 51/40:1f:de:48:00/00:01:00:00:00/e0 Emask 0x9 (media error) (Errors)
Sep 16 03:52:49 sun kernel: ata9.00: status: { DRDY ERR } (Drive related)
Sep 16 03:52:49 sun kernel: ata9.00: error: { UNC } (Errors)
Sep 16 03:52:49 sun kernel: ata9.00: configured for UDMA/133 (Drive related)
Sep 16 03:52:49 sun kernel: ata9: EH complete (Drive related)
Sep 16 03:52:50 sun kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Sep 16 03:52:50 sun kernel: ata9.00: irq_stat 0x40000001 (Drive related)
Sep 16 03:52:50 sun kernel: ata9.00: failed command: READ DMA EXT (Minor Issues)
Sep 16 03:52:50 sun kernel: ata9.00: cmd 25/00:00:07:46:00/00:04:00:00:00/e0 tag 0 dma 524288 in (Drive related)
Sep 16 03:52:50 sun kernel: res 51/40:1f:de:48:00/00:01:00:00:00/e0 Emask 0x9 (media error) (Errors)
Sep 16 03:52:50 sun kernel: ata9.00: status: { DRDY ERR } (Drive related)
Sep 16 03:52:50 sun kernel: ata9.00: error: { UNC } (Errors)
Sep 16 03:52:50 sun kernel: ata9.00: configured for UDMA/133 (Drive related)
Sep 16 03:52:50 sun kernel: ata9: EH complete (Drive related)
Sep 16 03:52:52 sun kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Sep 16 03:52:52 sun kernel: ata9.00: irq_stat 0x40000001 (Drive related)
Sep 16 03:52:52 sun kernel: ata9.00: failed command: READ DMA EXT (Minor Issues)
Sep 16 03:52:52 sun kernel: ata9.00: cmd 25/00:00:07:46:00/00:04:00:00:00/e0 tag 0 dma 524288 in (Drive related)
Sep 16 03:52:52 sun kernel: res 51/40:1f:de:48:00/00:01:00:00:00/e0 Emask 0x9 (media error) (Errors)
Sep 16 03:52:52 sun kernel: ata9.00: status: { DRDY ERR } (Drive related)
Sep 16 03:52:52 sun kernel: ata9.00: error: { UNC } (Errors)
Sep 16 03:52:52 sun kernel: ata9.00: configured for UDMA/133 (Drive related)
Sep 16 03:52:52 sun kernel: ata9: EH complete (Drive related)
Sep 16 03:52:54 sun kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Sep 16 03:52:54 sun kernel: ata9.00: irq_stat 0x40000001 (Drive related)
Sep 16 03:52:54 sun kernel: ata9.00: failed command: READ DMA EXT (Minor Issues)
Sep 16 03:52:54 sun kernel: ata9.00: cmd 25/00:00:07:46:00/00:04:00:00:00/e0 tag 0 dma 524288 in (Drive related)
Sep 16 03:52:54 sun kernel: res 51/40:1f:de:48:00/00:01:00:00:00/e0 Emask 0x9 (media error) (Errors)
Sep 16 03:52:54 sun kernel: ata9.00: status: { DRDY ERR } (Drive related)
Sep 16 03:52:54 sun kernel: ata9.00: error: { UNC } (Errors)
Sep 16 03:52:54 sun kernel: ata9.00: configured for UDMA/133 (Drive related)
Sep 16 03:52:54 sun kernel: sd 4:0:0:0: [sdh] Unhandled sense code (Drive related)
Sep 16 03:52:54 sun kernel: sd 4:0:0:0: [sdh] Result: hostbyte=0x00 driverbyte=0x08 (System)
Sep 16 03:52:54 sun kernel: sd 4:0:0:0: [sdh] Sense Key : 0x3 [current] [descriptor] (Drive related)
Sep 16 03:52:54 sun kernel: Descriptor sense data with sense descriptors (in hex):
Sep 16 03:52:54 sun kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Sep 16 03:52:54 sun kernel: 00 00 48 de
Sep 16 03:52:54 sun kernel: sd 4:0:0:0: [sdh] ASC=0x11 ASCQ=0x4 (Drive related)
Sep 16 03:52:54 sun kernel: sd 4:0:0:0: [sdh] CDB: cdb[0]=0x28: 28 00 00 00 46 07 00 04 00 00 (Drive related)
Sep 16 03:52:54 sun kernel: end_request: I/O error, dev sdh, sector 18654 (Errors)
Sep 16 03:52:54 sun kernel: ata9: EH complete (Drive related)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18584/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18592/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18600/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18608/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18616/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18624/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18632/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18640/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18648/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18656/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18664/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18672/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18680/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18688/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18696/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18704/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18712/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18720/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18728/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18736/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18744/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18752/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18760/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18768/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18776/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18784/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18792/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18800/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18808/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18816/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18824/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18832/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18840/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18848/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18856/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18864/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18872/3, count: 1 (Errors)
Sep 16 03:52:54 sun kernel: md: disk3 read error (Errors)
Sep 16 03:52:54 sun kernel: handle_stripe read error: 18880/3, count: 1 (Errors)
Sep 17 13:44:51 sun mountd[1886]: authenticated mount request from 192.168.1.107:798 for /mnt/user/movies (/mnt/user/movies)
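For anyone trying to read the "res" lines in a log like this, here is a minimal Python sketch that pulls the failing sector out of the result taskfile. The field order used (status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) is my understanding of how the kernel's libata error handler prints it, so treat it as an assumption; the function and variable names are made up for illustration.

```python
import re

# Assumed libata "res" layout (not confirmed anywhere in this thread):
# status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
RES_PATTERN = re.compile(
    r"res ([0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/([0-9a-f]{2})"
)

# One of the result lines from the syslog in the first post.
SAMPLE = "Sep 16 03:52:54 sun kernel: res 51/40:1f:de:48:00/00:01:00:00:00/e0 Emask 0x9 (media error)"


def decode_res(line):
    """Return status, error and the 48-bit LBA from a libata 'res' line."""
    match = RES_PATTERN.search(line)
    if match is None:
        raise ValueError("no libata 'res' taskfile found in line")
    status = int(match.group(1), 16)
    error, nsect, lbal, lbam, lbah = (int(x, 16) for x in match.group(2).split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in match.group(3).split(":")
    )
    # Low three LBA bytes come from the first group, high three from the HOB group.
    lba = (
        hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
        | lbah << 16 | lbam << 8 | lbal
    )
    return {"status": status, "error": error, "lba": lba}


if __name__ == "__main__":
    print(decode_res(SAMPLE))
```

Under that assumed layout the decoded LBA for the sample line is 0x48de, which lines up with the "end_request: I/O error, dev sdh, sector 18654" entry in the same log.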
September 19, 2011 It seems like a drive might be failing. We need the whole system log and a smartctl report on disk3.
September 20, 2011 (Author) Here is the drive info. There certainly is something there, but it is very hard to decipher.

Statistics for /dev/sdh WDC_WD2002FAEX-0_WD-WMAY01287329

smartctl -a -d ata /dev/sdh
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD2002FAEX-007BA0
Serial Number:    WD-WMAY01287329
Firmware Version: 05.01D05
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Sep 19 20:37:07 2011 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (30480) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3037) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       3
  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       8858
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       22
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4711
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       21
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       19
194 Temperature_Celsius     0x0022   116   105   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       16

SMART Error Log Version: 1
Warning: ATA error count 20 inconsistent with error log pointer 4

ATA Error Count: 20 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 20 occurred at disk power-on lifetime: 4709 hours (196 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 41 c2 00 e0  Error: UNC 8 sectors at LBA = 0x0000c241 = 49729

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 3f c2 00 e0 08  20d+22:43:34.575  READ DMA
  ef 10 02 00 00 00 a0 08  20d+22:43:34.575  SET FEATURES [Reserved for Serial ATA]
  ec 00 00 00 00 00 a0 08  20d+22:43:34.571  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08  20d+22:43:34.570  SET FEATURES [set transfer mode]

Error 19 occurred at disk power-on lifetime: 4709 hours (196 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 41 c2 00 e0  Error: UNC 8 sectors at LBA = 0x0000c241 = 49729

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 3f c2 00 e0 08  20d+22:43:32.859  READ DMA
  ca 00 88 b7 c1 00 e0 08  20d+22:43:32.842  WRITE DMA
  ca 00 08 af c1 00 e0 08  20d+22:43:32.842  WRITE DMA
  c8 00 90 af c1 00 e0 08  20d+22:43:32.809  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1630         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Attachment: syslog-2011-09-19.txt
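Since the report is hard to read by eye, here is a rough Python sketch of how the attribute table could be scanned for the counters people usually look at first when a disk throws read errors (IDs 5, 196, 197 and 198). It assumes the ten-column layout shown in this particular report and a plain integer in the RAW_VALUE column; other drives or smartctl versions may format rows differently. The helper names are hypothetical.

```python
# A few rows copied from the smartctl report in this thread.
SAMPLE_ROWS = """\
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       3
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       16
"""

# Attribute IDs chosen for illustration; this is a common rule of thumb,
# not an official health criterion.
WATCHED_IDS = (5, 196, 197, 198)


def parse_attributes(report):
    """Map attribute ID -> (name, raw value) for rows with the 10-column layout."""
    attributes = {}
    for line in report.splitlines():
        parts = line.split()
        # Expect: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(parts) >= 10 and parts[0].isdigit():
            attributes[int(parts[0])] = (parts[1], int(parts[9]))
    return attributes


def nonzero_watched(attributes):
    """Return the watched attributes whose raw value is not zero."""
    return {
        attr_id: attributes[attr_id]
        for attr_id in WATCHED_IDS
        if attr_id in attributes and attributes[attr_id][1] != 0
    }


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE_ROWS)
    print(nonzero_watched(attrs) or "no reallocated or pending sectors reported")
```

For the rows above, none of the watched counters is non-zero, even though the drive's own error log further down records UNC errors.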
September 20, 2011 The drive looks fine. What PSU? These were not "Media Errors"; instead, the drive did not respond to the read request. The error was DRDY ERR (Drive Ready Error). I'd suspect cabling first, then the power supply.
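A note on the notation, for readers following along: in the kernel lines "status: { DRDY ERR }" and "error: { UNC }", the braces list individual flag bits that are set in the ATA status and error registers, so DRDY and ERR are two separate flags. The Python sketch below decodes the raw values 0x51 and 0x40 from the "res" line in the syslog. The bit names follow the ATA register layout as I understand it, the tables are deliberately partial, and the helper name is hypothetical.

```python
# Partial bit tables for the ATA status and error registers.
# Assumed from the ATA register layout; obsolete or rarely used bits are omitted.
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 4: "DSC", 3: "DRQ", 0: "ERR"}
ERROR_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}


def decode_flags(value, bit_names):
    """List the names of the known bits set in value, highest bit first."""
    return [
        name
        for bit, name in sorted(bit_names.items(), reverse=True)
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    # Values taken from "res 51/40:..." in the syslog.
    print("status 0x51:", decode_flags(0x51, STATUS_BITS))
    print("error  0x40:", decode_flags(0x40, ERROR_BITS))
```

Read this way, status 0x51 has the device-ready and error flags set (plus DSC, which the kernel message does not print), and the error register carries UNC, the flag for an uncorrectable read.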
September 20, 2011 (Author) The power supply is a Seasonic X 660-watt unit. It is probably one of the best power supplies out there, so I doubt it could be that. It is also hard to believe that a SATA cable is all of a sudden faulty. The server never moves and it is closed up, so there is no movement inside at all. They are all locking cables too. EITHER WAY, I'll play the game and check the cable connections anyway. Problems like this are what scare me about using the unRAID system. I've run RAID systems for many years and never came across this type of anomaly. Time to open up my closed server and check some cables. Thanks!
September 20, 2011 It could be the power OR the data cable (or any power splitters). All it takes is for a connection to be temperature sensitive. The "locking" helps, but it is not always a guarantee of a good connection. Connections can migrate over time, especially if there is any tension on the cables.
September 20, 2011 I looked more closely at the SMART report. It had these lines in it:

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 41 c2 00 e0  Error: UNC 8 sectors at LBA = 0x0000c241 = 49729

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 3f c2 00 e0 08  20d+22:43:34.575  READ DMA
  ef 10 02 00 00 00 a0 08  20d+22:43:34.575  SET FEATURES [Reserved for Serial ATA]
  ec 00 00 00 00 00 a0 08  20d+22:43:34.571  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08  20d+22:43:34.570  SET FEATURES [set transfer mode]

Error 19 occurred at disk power-on lifetime: 4709 hours (196 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 41 c2 00 e0  Error: UNC 8 sectors at LBA = 0x0000c241 = 49729

It sure looks like an uncorrectable read error, but the SMART firmware did not count it as one. I reverse my prior statement. My bet is that it is just a flaky sector on the disk (and SMART firmware that may be equally flaky).
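The LBA that smartctl prints on the "Error: UNC" line can be reproduced from the register dump next to it. This small Python sketch assumes 28-bit LBA addressing, where the low nibble of DH holds bits 24-27 and CH, CL and SN hold the remaining three bytes; that fits the READ DMA (0xc8) commands in the log, but it is my reading of the dump rather than something stated in the thread. The function name is hypothetical.

```python
def lba28(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from the SN, CL, CH and DH register values."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


if __name__ == "__main__":
    # Registers after the failed command: SC=08 SN=41 CL=c2 CH=00 DH=e0
    failed = lba28(0x41, 0xC2, 0x00, 0xE0)
    # Registers of the READ DMA that triggered it: SC=08 SN=3f CL=c2 CH=00 DH=e0
    start = lba28(0x3F, 0xC2, 0x00, 0xE0)
    print(f"read started at LBA {start}, error reported at LBA {failed} (0x{failed:08x})")
```

Under that assumption the 8-sector read started at LBA 49727 and the drive reported the error at LBA 49729, matching the 0x0000c241 = 49729 figure in the report.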