July 15, 2012
I am still on RC4 at this stage. I had a few disks in my array spun up and a few spun down when I decided to kick off a parity check. Anyway, I got these errors in the log straight away, so I stopped the parity check and then restarted it, and there were no errors. Was it just a disk spin-up issue, or should I be worried? No other errors occurred during the rest of the parity check, and the disk passed all SMART and self tests.

Jul 15 23:03:53 Tower kernel: mdcmd (83): check CORRECT
Jul 15 23:03:53 Tower kernel: md: recovery thread woken up ...
Jul 15 23:03:53 Tower kernel: md: recovery thread checking parity...
Jul 15 23:03:53 Tower kernel: md: using 1152k window, over a total of 2930266532 blocks.
Jul 15 23:04:08 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 15 23:04:08 Tower kernel: ata3.00: irq_stat 0x40000001
Jul 15 23:04:08 Tower kernel: ata3.00: failed command: READ DMA EXT
Jul 15 23:04:08 Tower kernel: ata3.00: cmd 25/00:00:40:1f:00/00:04:00:00:00/e0 tag 0 dma 524288 in
Jul 15 23:04:08 Tower kernel: res 51/40:9f:a0:1f:00/00:03:00:00:00/e0 Emask 0x9 (media error)
Jul 15 23:04:08 Tower kernel: ata3.00: status: { DRDY ERR }
Jul 15 23:04:08 Tower kernel: ata3.00: error: { UNC }
Jul 15 23:04:08 Tower kernel: ata3.00: configured for UDMA/133
Jul 15 23:04:08 Tower kernel: ata3: EH complete
Jul 15 23:04:11 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 15 23:04:11 Tower kernel: ata3.00: irq_stat 0x40000001
Jul 15 23:04:11 Tower kernel: ata3.00: failed command: READ DMA EXT
Jul 15 23:04:11 Tower kernel: ata3.00: cmd 25/00:00:40:1f:00/00:04:00:00:00/e0 tag 0 dma 524288 in
Jul 15 23:04:11 Tower kernel: res 51/40:9f:a0:1f:00/00:03:00:00:00/e0 Emask 0x9 (media error)
Jul 15 23:04:11 Tower kernel: ata3.00: status: { DRDY ERR }
Jul 15 23:04:11 Tower kernel: ata3.00: error: { UNC }
Jul 15 23:04:11 Tower kernel: ata3.00: configured for UDMA/133
Jul 15 23:04:11 Tower kernel: ata3: EH complete
Jul 15 23:04:13 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 15 23:04:13 Tower kernel: ata3.00: irq_stat 0x40000001
Jul 15 23:04:13 Tower kernel: ata3.00: failed command: READ DMA EXT
Jul 15 23:04:13 Tower kernel: ata3.00: cmd 25/00:00:40:1f:00/00:04:00:00:00/e0 tag 0 dma 524288 in
Jul 15 23:04:13 Tower kernel: res 51/40:9f:a0:1f:00/00:03:00:00:00/e0 Emask 0x9 (media error)
Jul 15 23:04:13 Tower kernel: ata3.00: status: { DRDY ERR }
Jul 15 23:04:13 Tower kernel: ata3.00: error: { UNC }
Jul 15 23:04:13 Tower kernel: ata3.00: configured for UDMA/133
Jul 15 23:04:13 Tower kernel: ata3: EH complete
Jul 15 23:04:16 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 15 23:04:16 Tower kernel: ata3.00: irq_stat 0x40000001
Jul 15 23:04:16 Tower kernel: ata3.00: failed command: READ DMA EXT
Jul 15 23:04:16 Tower kernel: ata3.00: cmd 25/00:00:40:1f:00/00:04:00:00:00/e0 tag 0 dma 524288 in
Jul 15 23:04:16 Tower kernel: res 51/40:9f:a0:1f:00/00:03:00:00:00/e0 Emask 0x9 (media error)
Jul 15 23:04:16 Tower kernel: ata3.00: status: { DRDY ERR }
Jul 15 23:04:16 Tower kernel: ata3.00: error: { UNC }
Jul 15 23:04:16 Tower kernel: ata3.00: configured for UDMA/133
Jul 15 23:04:16 Tower kernel: ata3: EH complete
Jul 15 23:04:18 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 15 23:04:18 Tower kernel: ata3.00: irq_stat 0x40000001
Jul 15 23:04:18 Tower kernel: ata3.00: failed command: READ DMA EXT
Jul 15 23:04:18 Tower kernel: ata3.00: cmd 25/00:00:40:1f:00/00:04:00:00:00/e0 tag 0 dma 524288 in
Jul 15 23:04:18 Tower kernel: res 51/40:9f:a0:1f:00/00:03:00:00:00/e0 Emask 0x9 (media error)
Jul 15 23:04:18 Tower kernel: ata3.00: status: { DRDY ERR }
Jul 15 23:04:18 Tower kernel: ata3.00: error: { UNC }
Jul 15 23:04:18 Tower kernel: ata3.00: configured for UDMA/133
Jul 15 23:04:18 Tower kernel: ata3: EH complete
Jul 15 23:04:21 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 15 23:04:21 Tower kernel: ata3.00: irq_stat 0x40000001
Jul 15 23:04:21 Tower kernel: ata3.00: failed command: READ DMA EXT
Jul 15 23:04:21 Tower kernel: ata3.00: cmd 25/00:00:40:1f:00/00:04:00:00:00/e0 tag 0 dma 524288 in
Jul 15 23:04:21 Tower kernel: res 51/40:9f:a0:1f:00/00:03:00:00:00/e0 Emask 0x9 (media error)
Jul 15 23:04:21 Tower kernel: ata3.00: status: { DRDY ERR }
Jul 15 23:04:21 Tower kernel: ata3.00: error: { UNC }
Jul 15 23:04:21 Tower kernel: ata3.00: configured for UDMA/133
Jul 15 23:04:21 Tower kernel: sd 3:0:0:0: [sdc] Unhandled sense code
Jul 15 23:04:21 Tower kernel: sd 3:0:0:0: [sdc] Result: hostbyte=0x00 driverbyte=0x08
Jul 15 23:04:21 Tower kernel: sd 3:0:0:0: [sdc] Sense Key : 0x3 [current] [descriptor]
Jul 15 23:04:21 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Jul 15 23:04:21 Tower kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Jul 15 23:04:21 Tower kernel: 00 00 1f a0
Jul 15 23:04:21 Tower kernel: sd 3:0:0:0: [sdc] ASC=0x11 ASCQ=0x4
Jul 15 23:04:21 Tower kernel: sd 3:0:0:0: [sdc] CDB: cdb[0]=0x28: 28 00 00 00 1f 40 00 04 00 00
Jul 15 23:04:21 Tower kernel: end_request: I/O error, dev sdc, sector 8096
Jul 15 23:04:21 Tower kernel: ata3: EH complete
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8032/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8040/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8048/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8056/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8064/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8072/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8080/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8088/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8096/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8104/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8112/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8120/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8128/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8136/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8144/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8152/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8160/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8168/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8176/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8184/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8192/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8200/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8208/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8216/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8224/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8232/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8240/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8248/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8256/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8264/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8272/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8280/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8288/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8296/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8304/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8312/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8320/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8328/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8336/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8344/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8352/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8360/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8368/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8376/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8384/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8392/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8400/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8408/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8416/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8424/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8432/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8440/1, count: 1
Jul 15 23:04:21 Tower kernel: md: disk1 read error
Jul 15 23:04:21 Tower kernel: handle_stripe read error: 8448/1, count: 1
July 16, 2012 Author
Sorry, I posted in the RC forum because I thought it might be something to do with the RC I was running, something that may have been changed or upgraded in the next RC, i.e. the hdparm changes or a newer driver.
July 16, 2012
"Sorry, I posted in the RC forum because I thought it might be something to do with the RC I was running, something that may have been changed or upgraded in the next RC, i.e. the hdparm changes or a newer driver."
No, UNC errors are uncorrectable media errors (unreadable sectors on the disk). That is why you are being asked to get a SMART report for that disk. It has nothing to do with the "rc" release.
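For anyone reading along, here is a rough Python sketch (not from the original posts) of how the "res" line in the syslog from the first post can be decoded to see the UNC flag and the failing sector. It assumes the usual libata result-taskfile layout of status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device; the helper name `decode_res` is made up for illustration.

```python
import re

# Assumed libata layout of a "res" (result taskfile) line:
#   status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
RES_RE = re.compile(r"res\s+((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")

# ATA error register bits (only the common ones are listed here)
ERROR_BITS = {0x04: "ABRT", 0x10: "IDNF", 0x40: "UNC", 0x80: "ICRC"}


def decode_res(line):
    """Hypothetical helper: extract status, error flags and LBA from one 'res' log line."""
    m = RES_RE.search(line)
    if m is None:
        return None
    fields = [int(x, 16) for x in re.split(r"[/:]", m.group(1))]
    (status, error, _nsect, lbal, lbam, lbah,
     _hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah, _device) = fields
    # 48-bit LBA: the "hob" (high order byte) fields carry bits 24..47
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    flags = [name for bit, name in sorted(ERROR_BITS.items()) if error & bit]
    return {"status": status, "error_flags": flags, "lba": lba}


line = ("Jul 15 23:04:08 Tower kernel: res 51/40:9f:a0:1f:00/00:03:00:00:00/e0 "
        "Emask 0x9 (media error)")
result = decode_res(line)
print(result)
```

Under that assumed layout, the decoded LBA comes out as 8096, which lines up with the "end_request: I/O error, dev sdc, sector 8096" entry in the same log, and the error byte 0x40 is the UNC bit.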
July 16, 2012 Author
OK, so when I saw the errors after starting the parity check, I stopped the parity check. The main array status showed 116 errors for drive 1. I checked the SMART report (all good) and ran a self test (all good). Now that all disks are spun up, I re-ran the parity check and there were no more errors, either in the log or on the array status page, which is still at 116 errors for disk 1.

Jul 16 08:49:16 Tower kernel: md: sync done. time=34380sec
Jul 16 08:49:16 Tower kernel: md: recovery thread sync completion status: 0

Statistics for /dev/sdc WDC_WD30EZRX-00MMMB0_WD-WCAWZ1155791
smartctl -a -d ata /dev/sdc
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD30EZRX-00MMMB0
Serial Number:    WD-WCAWZ1155791
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Jul 16 19:43:00 2012 WST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (51000) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   154   148   021    Pre-fail  Always       -       9266
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       131
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7700
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       62
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       28
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       116
194 Temperature_Celsius     0x0022   131   108   000    Old_age   Always       -       21
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      7679         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
July 17, 2012
It's odd that the syslog reports media errors but the SMART report doesn't. Run a long SMART test. The power-off retract count indicates a power issue, so check the power connections. What PSU are you using?
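To make that comparison concrete, here is a small Python sketch (not from the original posts) that pulls the raw values out of the attribute table in the SMART report above. The helper name `raw_values` and the choice of which attributes to watch are assumptions for illustration, and the sample rows are copied from the posted report.

```python
# Attributes often checked for media problems (an assumption for this sketch,
# not a rule stated anywhere in the thread).
WATCH = ("Reallocated_Sector_Ct", "Reallocated_Event_Count",
         "Current_Pending_Sector", "Offline_Uncorrectable")


def raw_values(report):
    """Hypothetical helper: map attribute name -> raw value from `smartctl -a` text."""
    out = {}
    for row in report.splitlines():
        f = row.split()
        # Attribute rows look like:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(f) >= 10 and f[0].isdigit() and f[2].startswith("0x") and f[9].isdigit():
            out[f[1]] = int(f[9])
    return out


# Rows copied from the report posted above
sample = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

values = raw_values(sample)
nonzero_media = {name: values[name] for name in WATCH if values.get(name, 0) > 0}
print("media-related attributes above zero:", nonzero_media)
print("Power-Off_Retract_Count:", values.get("Power-Off_Retract_Count"))
```

On the posted report, none of the watched media attributes is above zero while Power-Off_Retract_Count is 28, which is the mismatch with the syslog being pointed out here.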
July 17, 2012
It would also be worth running a manufacturer test on the drive. Either of the following should be fine:
WD Data Lifeguard http://support.wdc.com/product/download.asp?groupid=608&lang=en
SeaTools (the Seagate utility also works for WD drives) http://www.seagate.com/support/downloads/seatools/
Sometimes a drive is suspected of being faulty but still passes SMART and manufacturer tests just fine. If this is the case, try a benchmark tool (something like HDTune); you can often tell from large dips in read/write performance that a drive is on its way out, even if it hasn't quite failed yet.