Failed parity drive?



Hi All,

Looking for a little advice on my situation, but I am pretty sure I have a failed parity drive.

 

I noticed a few errors on my parity drive within the past week, so I decided to run a parity check yesterday. After quite a while, the check failed and the parity drive showed red. Stupidly, I shut down my server without grabbing a log. When I got home last night, I checked all the cables and booted the server back up, and it reported the parity drive as new. I started the array and told it to rebuild parity from the data on the disks. At some point overnight it failed again, and the parity drive was red again. This time, however, I grabbed a log before shutting it down.

 

Here are the relevant parts from when it went down:

 

Feb 14 07:20:55 Media kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 14 07:20:55 Media kernel: ata10.00: failed command: WRITE DMA EXT
Feb 14 07:20:55 Media kernel: ata10.00: cmd 35/00:00:c7:df:af/00:04:2e:00:00/e0 tag 0 dma 524288 out
Feb 14 07:20:55 Media kernel:          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 14 07:20:55 Media kernel: ata10.00: status: { DRDY }
Feb 14 07:21:00 Media kernel: ata10: link is slow to respond, please be patient (ready=0)
Feb 14 07:21:05 Media kernel: ata10: device not ready (errno=-16), forcing hardreset
Feb 14 07:21:05 Media kernel: ata10: soft resetting link
Feb 14 07:21:10 Media kernel: ata10: link is slow to respond, please be patient (ready=0)
Feb 14 07:21:15 Media kernel: ata10: SRST failed (errno=-16)
Feb 14 07:21:15 Media kernel: ata10: soft resetting link
Feb 14 07:21:20 Media kernel: ata10: link is slow to respond, please be patient (ready=0)
Feb 14 07:21:25 Media kernel: ata10: SRST failed (errno=-16)
Feb 14 07:21:25 Media kernel: ata10: soft resetting link
Feb 14 07:21:30 Media kernel: ata10: link is slow to respond, please be patient (ready=0)
Feb 14 07:22:00 Media kernel: ata10: SRST failed (errno=-16)
Feb 14 07:22:00 Media kernel: ata10: soft resetting link
Feb 14 07:22:05 Media kernel: ata10: SRST failed (errno=-16)
Feb 14 07:22:05 Media kernel: ata10: reset failed, giving up
Feb 14 07:22:05 Media kernel: ata10.00: disabled
Feb 14 07:22:05 Media kernel: ata10: EH complete
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] Unhandled error code
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] Result: hostbyte=0x04 driverbyte=0x00
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] CDB: cdb[0]=0x2a: 2a 00 2e af df c7 00 04 00 00
Feb 14 07:22:05 Media kernel: end_request: I/O error, dev sdh, sector 783278023
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] Unhandled error code
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] Result: hostbyte=0x04 driverbyte=0x00
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] CDB: cdb[0]=0x2a: 2a 00 2e af e3 c7 00 02 98 00
Feb 14 07:22:05 Media kernel: end_request: I/O error, dev sdh, sector 783279047
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] Unhandled error code
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] Result: hostbyte=0x04 driverbyte=0x00
Feb 14 07:22:05 Media kernel: sd 10:0:0:0: [sdh] CDB: cdb[0]=0x2a: 2a 00 2e af e6 5f 00 02 68 00
Feb 14 07:22:05 Media kernel: end_request: I/O error, dev sdh, sector 783279711
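As a cross-check on the log above, the CDB bytes in each `[sdh] CDB:` line can be decoded by hand. Opcode 0x2a is a SCSI WRITE(10), which carries a 4-byte big-endian LBA in bytes 2-5 and a 2-byte block count in bytes 7-8. The sketch below is only an illustration: the function name `decode_write10` is made up for this example, and the 512-byte sector size is an assumption about this drive, not something the log states.

```python
def decode_write10(cdb_hex: str) -> tuple[int, int]:
    """Return (lba, block_count) for a WRITE(10) CDB given as hex bytes.

    Hypothetical helper for illustration; it is not part of any unRAID tool.
    """
    cdb = bytes.fromhex(cdb_hex)  # spaces between the hex bytes are skipped
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("expected a 10-byte WRITE(10) CDB (opcode 0x2a)")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: starting LBA
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: number of blocks
    return lba, count


if __name__ == "__main__":
    # CDB copied from the first "[sdh] CDB:" line in the log above.
    lba, count = decode_write10("2a 00 2e af df c7 00 04 00 00")
    SECTOR_BYTES = 512  # assumption: 512-byte logical sectors
    print(lba, count, count * SECTOR_BYTES)  # prints: 783278023 1024 524288
```

The decoded LBA matches the `end_request: I/O error, dev sdh, sector 783278023` line, and 1024 blocks at the assumed 512 bytes each matches the `dma 524288 out` size on the failed WRITE DMA EXT command. The three failed writes are also contiguous: each one starts where the previous one ended.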

 

Followed by this error repeating itself hundreds of times:

Feb 14 07:22:05 Media kernel: md: disk0 write error
Feb 14 07:22:05 Media kernel: handle_stripe write error: 783277960/0, count: 1

 

Then I get this error repeating itself:

Feb 14 07:35:50 Media emhttp: mdcmd: write: Input/output error
Feb 14 07:35:50 Media kernel: mdcmd (26): spindown 0
Feb 14 07:35:50 Media kernel: md: disk0: ATA_OP_STANDBYNOW1 ioctl error: -5

 

I'm pretty sure it is going to result in a failed drive, but is there anything else that I can or should do before purchasing a new drive?  Any steps that I am missing?

 

Thanks in advance!


I know you checked the cables, but I'd replace the SATA cable before I replaced the drive. I have had 2 "bad drives" that a $0.99 cable "fixed". I have bought a small supply of data cables from Monoprice and always try that first. A crack or even a slight warp in the fitting on a cable can cause this issue. If your case is anything like mine, the SATA fittings are all pretty squished in.

 

Cheers,

 

whiteatom

 

 


Hi guys,

Thanks for your help! I came home this evening, swapped in a different SATA cable (I happened to have one lying around), and then ran a short SMART test. I'm not quite sure how to read the output, so I should probably post the whole thing, but this seemed to be the relevant bit.

 

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       10%      8900         19897350
# 2  Extended offline    Completed without error       00%       268         -
# 3  Short offline       Completed without error       00%       263         -

 

I find it a little confusing that it says no errors logged, but then further down it says it had a read failure during the short test.  Any help would be appreciated.
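For what it's worth, the two sections in that output are separate logs: as far as I understand it, the SMART error log records ATA command errors the drive has logged, while the self-test log records the outcome of each self-test, so a failed self-test can appear even when the error log is empty. A rough sketch for pulling the failed rows out of a self-test table like the one above follows. The function name `failed_self_tests` and the regular expression are my own assumptions about this column layout, not an official parser, and matching on the word "failure" is only an approximation of the possible status strings.

```python
import re

# Assumed layout: "# N  description  status  NN%  hours  LBA-or-dash",
# with columns separated by two or more spaces, as in the table above.
ROW = re.compile(
    r"^#\s*(\d+)\s{2,}(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def failed_self_tests(log_text: str) -> list[tuple[int, str, str, int, str]]:
    """Return (num, description, status, lifetime_hours, lba) for failed tests.

    Hypothetical helper: a row counts as failed when its status contains the
    word "failure", which is a heuristic rather than an exhaustive rule.
    """
    failures = []
    for line in log_text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header line or unrelated text
        num, desc, status, _remaining, hours, lba = match.groups()
        if "failure" in status.lower():
            failures.append((int(num), desc, status, int(hours), lba))
    return failures


if __name__ == "__main__":
    sample = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       10%      8900         19897350
# 2  Extended offline    Completed without error       00%       268         -
# 3  Short offline       Completed without error       00%       263         -
"""
    for row in failed_self_tests(sample):
        print(row)
```

On the table posted above, this reports only test #1, the short offline test at 8900 power-on hours with its first error at LBA 19897350.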

 

Thanks!
