something fishy Posted March 19, 2013

Hi,

I had a drive redballed yesterday. The syslog contains the following lines (the redballed drive is sdb):

Mar 18 15:51:41 shortie kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Mar 18 15:51:41 shortie kernel: ata2.00: failed command: READ DMA EXT
Mar 18 15:51:41 shortie kernel: ata2.00: cmd 25/00:c0:c8:e7:f3/00:00:5a:01:00/e0 tag 0 dma 98304 in
Mar 18 15:51:41 shortie kernel: res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Mar 18 15:51:41 shortie kernel: ata2.00: status: { DRDY }
Mar 18 15:51:41 shortie kernel: ata2: hard resetting link
Mar 18 15:51:51 shortie kernel: ata2: softreset failed (device not ready)
Mar 18 15:51:51 shortie kernel: ata2: hard resetting link
Mar 18 15:52:01 shortie kernel: ata2: softreset failed (device not ready)
Mar 18 15:52:01 shortie kernel: ata2: hard resetting link
Mar 18 15:52:12 shortie kernel: ata2: link is slow to respond, please be patient (ready=0)
Mar 18 15:52:36 shortie kernel: ata2: softreset failed (device not ready)
Mar 18 15:52:36 shortie kernel: ata2: limiting SATA link speed to 1.5 Gbps
Mar 18 15:52:36 shortie kernel: ata2: hard resetting link
Mar 18 15:52:41 shortie kernel: ata2: softreset failed (device not ready)
Mar 18 15:52:41 shortie kernel: ata2: reset failed, giving up
Mar 18 15:52:41 shortie kernel: ata2.00: disabled
Mar 18 15:52:41 shortie kernel: ata2.00: device reported invalid CHS sector 0
Mar 18 15:52:41 shortie kernel: ata2: EH complete
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] CDB: cdb[0]=0x88: 88 00 00 00 00 01 5a f3 e7 c8 00 00 00 c0 00 00
Mar 18 15:52:41 shortie kernel: end_request: I/O error, dev sdb, sector 5820901320
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] CDB: cdb[0]=0x88: 88 00 00 00 00 01 5a f3 e8 88 00 00 00 48 00 00
Mar 18 15:52:41 shortie kernel: end_request: I/O error, dev sdb, sector 5820901512
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] CDB: cdb[0]=0x8a: 8a 00 00 00 00 01 5a f3 e7 b8 00 00 00 10 00 00
Mar 18 15:52:41 shortie kernel: end_request: I/O error, dev sdb, sector 5820901304
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901256/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901264/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901272/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901280/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901288/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901296/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901304/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901312/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901320/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901328/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901336/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901344/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901352/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901360/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901368/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901376/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901384/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901392/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901400/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901408/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901416/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901424/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901432/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901440/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901448/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901456/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901464/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901472/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901480/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901488/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901496/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901504/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 read error
Mar 18 15:52:41 shortie kernel: handle_stripe read error: 5820901512/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901240/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901248/1, count: 1
Mar 18 15:52:41 shortie kernel: md: recovery thread woken up ...
Mar 18 15:52:41 shortie kernel: md: recovery thread has nothing to resync
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Mar 18 15:52:41 shortie kernel: sd 1:0:0:0: [sdb] CDB: cdb[0]=0x8a: 8a 00 00 00 00 01 5a f3 e7 c8 00 00 01 08 00 00
Mar 18 15:52:41 shortie kernel: end_request: I/O error, dev sdb, sector 5820901320
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901256/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901264/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901272/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901280/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901288/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901296/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901304/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901312/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901320/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901328/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901336/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901344/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901352/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901360/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901368/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901376/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901384/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901392/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901400/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901408/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901416/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901424/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901432/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901440/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901448/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901456/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901464/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901472/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901480/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901488/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901496/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901504/1, count: 1
Mar 18 15:52:41 shortie kernel: md: disk1 write error
Mar 18 15:52:41 shortie kernel: handle_stripe write error: 5820901512/1, count: 1

When I tried to run a SMART test on the drive, it failed with ioctl error: -5

After a clean powerdown and a quick reseat of the drives, the drive is accessible but obviously still redballed.
The SMART results look clean to me (but I'm a rookie):

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       86
  3 Spin_Up_Time            0x0007   126   126   024    Pre-fail  Always       -       615 (Average 615)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       967
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   135   135   020    Pre-fail  Offline      -       26
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       14099
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       63
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1085
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       1085
194 Temperature_Celsius     0x0002   176   176   000    Old_age   Always       -       34 (Min/Max 22/47)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

I've bought a couple of replacement drives (one to rebuild onto tonight; one to preclear and keep as a hot spare). I'll obviously stress test the redballed drive when the array is protected again, but given the above, is this likely a drive issue or a problem with the PC/controller?

The server is built in an HP ProLiant MicroServer (36L); the disks are Hitachi 7K3000s. Unraid is 5.0 beta 8. Full syslog attached as a zip file.

Thanks
Eric

syslog-20130318-203007.zip
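For anyone reading the log above: the `CDB:` lines and the `end_request ... sector` lines describe the same requests. A minimal Python sketch, assuming the standard 16-byte SCSI READ(16)/WRITE(16) layout (opcode 0x88/0x8a, 8-byte big-endian LBA in bytes 2-9, 4-byte transfer length in bytes 10-13) and 512-byte logical sectors, recovers the sector number and request size from the hex dump. The function name `decode_rw16_cdb` is made up for this illustration.

```python
# Sketch only: decodes READ(16)/WRITE(16) CDBs as printed in the syslog.
# Assumes 512-byte logical sectors; the helper name is hypothetical.

OPCODES = {0x88: "READ(16)", 0x8A: "WRITE(16)"}
SECTOR_BYTES = 512  # assumption, consistent with "dma 98304 in" for 0xc0 sectors


def decode_rw16_cdb(cdb_hex: str) -> dict:
    """Decode a space-separated hex dump of a 16-byte read/write CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16:
        raise ValueError(f"expected 16 bytes, got {len(cdb)}")
    if cdb[0] not in OPCODES:
        raise ValueError(f"unsupported opcode 0x{cdb[0]:02x}")
    lba = int.from_bytes(cdb[2:10], "big")      # starting sector
    blocks = int.from_bytes(cdb[10:14], "big")  # number of sectors
    return {
        "op": OPCODES[cdb[0]],
        "lba": lba,
        "blocks": blocks,
        "bytes": blocks * SECTOR_BYTES,
    }


if __name__ == "__main__":
    # First failed command from the log above.
    print(decode_rw16_cdb("88 00 00 00 00 01 5a f3 e7 c8 00 00 00 c0 00 00"))
    # → {'op': 'READ(16)', 'lba': 5820901320, 'blocks': 192, 'bytes': 98304}
```

The decoded LBA 5820901320 matches the `end_request: I/O error, dev sdb, sector 5820901320` line, and 192 sectors of 512 bytes is the 98304 bytes shown in the `dma 98304 in` line of the failed READ DMA EXT.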
dgaschk Posted March 19, 2013

Run a long SMART test on the drive. Based on the Power-Off_Retract_Count, it could be a power issue.
something fishy Posted March 20, 2013

Thanks, I did a couple of long SMART tests and the results looked identical to me.

I've swapped out the drive (for a Toshiba 3TB equivalent) and the array is fault tolerant again. I'm now preclearing a second new 3TB disk to keep in a box as a spare, and the original is on a 4x preclear stress test in my backup server.

As an aside, I'm noticing how much quicker the new 1TB-platter drives are at sequential reads and writes than my Hitachi 7K3000s; it's really visible on the preclears, which I started at about the same time.

What is an ioctl error? Google doesn't seem to help much.

Eric
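On the ioctl question: `ioctl` is the system call that tools such as smartctl use to send commands to a device, and Linux kernel code conventionally reports failures as a negated errno value. Assuming the "-5" in "ioctl error: -5" follows that convention (the thread does not confirm it), it can be looked up with a short Python sketch; `describe_errno` is a made-up helper name.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Map an errno value (positive or negated) to its symbolic name and message."""
    n = abs(code)
    name = errno.errorcode.get(n, "UNKNOWN")
    return f"{name} ({n}): {os.strerror(n)}"


if __name__ == "__main__":
    # Assumption: -5 is a negated errno, as kernel interfaces usually return.
    print(describe_errno(-5))
```

Under that assumption, 5 is EIO, the generic input/output error, which would fit the syslog above: the kernel had already disabled ata2.00, so any command sent to the drive would fail until the link came back after the reseat.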