Xerol Posted February 14, 2012

Hi All,

Looking for a little advice on my situation, but I am pretty sure I have a failed parity drive. I noticed a few errors within the past week on my parity drive, so I decided to run a check yesterday. After quite a while, it failed and my parity drive was red. Stupidly, I shut down my server without grabbing a log.

Upon my return home last night, I checked all cables and booted the server back up and it said the parity drive was new. I brought the array up instructing it to re-build the parity based on the data on the disks. At some point overnight, it failed again and the parity drive was red again. This time however, I grabbed a log before shutting it down.

Here are the relevant parts from when it went down:

Feb 14 07:20:55 Media kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 14 07:20:55 Media kernel: ata10.00: failed command: WRITE DMA EXT
Feb 14 07:20:55 Media kernel: ata10.00: cmd 35/00:00:c7:df:af/00:04:2e:00:00/e0 tag 0 dma 524288 out
Feb 14 07:20:55 Media kernel: res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 14 07:20:55 Media kernel: ata10.00: status: { DRDY }
Feb 14 07:21:00 Media kernel: ata10: link is slow to respond, please be patient (ready=0)
Feb 14 07:21:05 Media kernel: ata10: device not ready (errno=-16), forcing hardreset
Feb 14 07:21:05 Media kernel: ata10: soft resetting link
Feb 14 07:21:10 Media kernel: ata10: link is slow to respond, please be patient (ready=0)
Feb 14 07:21:15 Media kernel: ata10: SRST failed (errno=-16)
Feb 14 07:21:15 Media kernel: ata10: soft resetting link
Feb 14 07:21:20 Media kernel: ata10: link is slow to respond, please be patient (ready=0)
Feb 14 07:21:25 Media kernel: ata10: SRST failed (errno=-16)
Feb 14 07:21:25 Media kernel: ata10: soft resetting link
Feb 14 07:21:30 Media kernel: ata10: link is slow to respond, please be patient (ready=0)
Feb 14 07:22:00 Media kernel: ata10: SRST failed (errno=-16)
Feb 14 07:22:00 Media kernel: ata10: soft resetting link
Feb 14 07:22:05 Media kernel: ata10: SRST failed (errno=-16)
Feb 14 07:22:05 Media kernel: ata10: reset failed, giving up
Feb 14 07:22:05 Media kernel: ata10.00: disabled
Feb 14 07:22:05 Media kernel: ata10: EH complete
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] Unhandled error code
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] Result: hostbyte=0x04 driverbyte=0x00
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] CDB: cdb[0]=0x2a: 2a 00 2e af df c7 00 04 00 00
Feb 14 07:22:05 Media kernel: end_request: I/O error, dev sdh, sector 783278023
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] Unhandled error code
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] Result: hostbyte=0x04 driverbyte=0x00
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] CDB: cdb[0]=0x2a: 2a 00 2e af e3 c7 00 02 98 00
Feb 14 07:22:05 Media kernel: end_request: I/O error, dev sdh, sector 783279047
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] Unhandled error code
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] Result: hostbyte=0x04 driverbyte=0x00
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] CDB: cdb[0]=0x2a: 2a 00 2e af e6 5f 00 02 68 00
Feb 14 07:22:05 Media kernel: end_request: I/O error, dev sdh, sector 783279711

Followed by this error repeating itself hundreds of times:

Feb 14 07:22:05 Media kernel: md: disk0 write error
Feb 14 07:22:05 Media kernel: handle_stripe write error: 783277960/0, count: 1

Then I get this error repeating itself:

Feb 14 07:35:50 Media emhttp: mdcmd: write: Input/output error
Feb 14 07:35:50 Media kernel: mdcmd (26): spindown 0
Feb 14 07:35:50 Media kernel: md: disk0: ATA_OP_STANDBYNOW1 ioctl error: -5

I'm pretty sure it is going to result in a failed drive, but is there anything else that I can or should do before purchasing a new drive? Any steps that I am missing?

Thanks in advance!
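For readers trying to make sense of the CDB lines in the log above, here is a minimal Python sketch that decodes them. It assumes the standard 10-byte SCSI WRITE(10) layout (opcode 0x2a, a 4-byte big-endian LBA in bytes 2 to 5, a 2-byte transfer length in bytes 7 to 8) and 512-byte sectors. The function name decode_write10 is made up for this illustration and is not part of any tool mentioned in the thread.

```python
def decode_write10(cdb_hex, sector_size=512):
    """Decode a WRITE(10) CDB given as space-separated hex bytes.

    sector_size=512 is an assumption; it is consistent with the
    "dma 524288 out" figure in the log (1024 blocks * 512 bytes).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # starting logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return {"lba": lba, "blocks": blocks, "bytes": blocks * sector_size}


# The three CDBs copied from the log in the post above.
cdbs = [
    "2a 00 2e af df c7 00 04 00 00",
    "2a 00 2e af e3 c7 00 02 98 00",
    "2a 00 2e af e6 5f 00 02 68 00",
]

for cdb in cdbs:
    info = decode_write10(cdb)
    print(info["lba"], info["blocks"], info["bytes"])
```

The decoded starting LBAs are the same sector numbers reported in the end_request lines, and each write begins where the previous one ends, so these look like sequential writes that were still queued when the link to sdh was disabled. That pattern fits a parity rebuild in progress, but on its own it does not say whether the drive, the cable, or the port is at fault.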
mbryanr Posted February 14, 2012

Capture a SMART test.

http://lime-technology.com/wiki/index.php?title=Troubleshooting#Obtaining_a_SMART_report
whiteatom Posted February 14, 2012

I know you checked the cables, but I'd replace the SATA cable before I replaced the drive. I have had 2 "bad drives" that a $0.99 cable "fixed". I have bought a small supply of data cables from Monoprice and always try that first. A crack or even a slight warp in the fitting on the cables can cause this issue. If your case is anything like mine, the SATA fittings are all pretty squished in.

Cheers,
whiteatom
Xerol Posted February 14, 2012

Hi guys,

Thanks for your help! I came home this evening, tried a different SATA cable as I did happen to have one lying around, then ran a short SMART test. I'm not sure quite how to read it, so I should probably post the whole thing, but this seemed to be the relevant bit to me.

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure  10%        8900             19897350
# 2  Extended offline    Completed without error  00%        268              -
# 3  Short offline       Completed without error  00%        263              -

I find it a little confusing that it says no errors logged, but then further down it says it had a read failure during the short test.

Any help would be appreciated. Thanks!
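On the confusion above: the output pasted in the post contains two separate sections, a SMART error log and a SMART self-test log, so "No Errors Logged" in the first does not contradict a failed self-test in the second. The Python sketch below pulls the self-test rows out of pasted text and flags any that report a failure. The row pattern is an assumption based only on the three rows in the post (it recognizes just the "Short offline" and "Extended offline" test names seen there), and parse_selftest_log is a hypothetical helper, not part of smartmontools.

```python
import re

# Pattern inferred from the rows in the post; other smartctl
# versions or test types may print rows this does not match.
SELFTEST_ROW = re.compile(
    r"#\s*(?P<num>\d+)\s+"
    r"(?P<test>Short offline|Extended offline)\s+"
    r"(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\d+|-)"
)


def parse_selftest_log(text):
    """Return one dict per self-test row found in the pasted text."""
    rows = []
    for m in SELFTEST_ROW.finditer(text):
        rows.append({
            "num": int(m.group("num")),
            "test": m.group("test"),
            "status": m.group("status"),
            "remaining_pct": int(m.group("remaining")),
            "hours": int(m.group("hours")),
            # "-" means the test reported no failing LBA
            "lba": None if m.group("lba") == "-" else int(m.group("lba")),
        })
    return rows


# Rows copied from the post above.
pasted = """
# 1 Short offline Completed: read failure 10% 8900 19897350
# 2 Extended offline Completed without error 00% 268 -
# 3 Short offline Completed without error 00% 263 -
"""

rows = parse_selftest_log(pasted)
failed = [r for r in rows if "failure" in r["status"]]
for r in failed:
    print(r["num"], r["test"], r["status"], r["hours"], r["lba"])
```

Run against the rows in the post, this flags only test #1, the short test at 8900 power-on hours, which stopped with a read failure at LBA 19897350. The two passing tests were recorded at 268 and 263 hours, so they are far older than the failing one.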