Disk disabled - happens after a while. Cabling problem?



Hi all, I've been having this problem for a while (ever since moving cases and adding another controller card to my mobo). Every so often my parity disk fails to write and gets disabled by unRAID.

Sometimes when I boot up, all the drives connected to one card error at once; other times everything runs for an hour or so before they slowly start to error. Now it usually lasts a few days, but eventually a drive (usually parity, since it's the most used) will error. I've attached the relevant part of the syslog below. When the drive is active it shows no health problems and runs fine. I keep reseating the card in its slot, but I'm running out of ideas as to why the card seems to temporarily lose connection. The cards are an AOC-SAS2LP-MV8 and an AOC-SASLP-MV8. I believe it's the newer SAS2 card that keeps giving way, although I have had the other card do the same thing; with that one it's normally noticeable on bootup, so any issue is apparent instantly.

All the hard drives are in a backplane. I've tried moving the drives around and don't think the backplane has anything to do with it; I'm almost convinced it's the card just temporarily "losing connection".
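If a card really is dropping out, the errors should cluster on the ports of one controller rather than follow one disk around the backplane. In the excerpt below, `sas: ataN: end_device-H:P` lines tie each libata port to SAS host H (usually one host per card). A minimal Python sketch, assuming that log format; `tally_errors` and the regexes are hypothetical helpers, and the sample lines are copied from the syslog excerpt in this post:

```python
import re
from collections import Counter, defaultdict

# Lines copied from the syslog excerpt in this thread. On a live system you
# might instead read the whole syslog (path /var/log/syslog is an assumption).
SAMPLE_LOG = """\
Mar 24 10:21:31 jbox kernel: sas: ata7: end_device-1:0: cmd error handler
Mar 24 10:21:31 jbox kernel: sas: ata8: end_device-1:1: dev error handler
Mar 24 10:21:31 jbox kernel: ata7.00: failed command: READ DMA EXT
Mar 24 10:21:31 jbox kernel: ata7: hard resetting link
Mar 24 10:21:36 jbox kernel: ata7: hard resetting link
Mar 24 10:21:37 jbox kernel: ata7.00: disabled
"""

# "sas: ataN: end_device-H:P:" maps libata port N to SAS host H, phy P.
PORT_MAP_RE = re.compile(r"sas: ata(\d+): end_device-(\d+):(\d+):")
# Events treated here (an assumption) as signs of a link or device problem.
EVENT_RE = re.compile(
    r"\bata(\d+)(?:\.\d+)?: (failed command|hard resetting link|disabled)"
)


def tally_errors(log_text):
    """Return {sas_host: Counter({ata_port: event_count})}."""
    port_to_host = {}
    events = Counter()
    for line in log_text.splitlines():
        m = PORT_MAP_RE.search(line)
        if m:
            port_to_host[int(m.group(1))] = int(m.group(2))
        m = EVENT_RE.search(line)
        if m:
            events[int(m.group(1))] += 1
    per_host = defaultdict(Counter)
    for port, count in events.items():
        # Ports never named in a "sas:" line are likely on a non-SAS controller.
        per_host[port_to_host.get(port, "unknown")][port] = count
    return per_host


if __name__ == "__main__":
    for host, ports in tally_errors(SAMPLE_LOG).items():
        print(f"SAS host {host}: {dict(ports)}")
```

Run over a few days of logs, errors spread across many ports of one host would point at that card or its slot, while errors confined to a single port that moves with the disk would point at the disk.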

What kinds of things should I check?

Mar 24 10:21:31 jbox kernel: sas: Enter sas_scsi_recover_host busy: 1 failed: 1
Mar 24 10:21:31 jbox kernel: sas: ata7: end_device-1:0: cmd error handler
Mar 24 10:21:31 jbox kernel: sas: ata7: end_device-1:0: dev error handler
Mar 24 10:21:31 jbox kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Mar 24 10:21:31 jbox kernel: ata7.00: failed command: READ DMA EXT
Mar 24 10:21:31 jbox kernel: ata7.00: cmd 25/00:00:b8:0a:d7/00:02:cc:01:00/e0 tag 19 dma 262144 in
Mar 24 10:21:31 jbox kernel:         res 51/40:00:c8:0a:d7/00:02:cc:01:00/e0 Emask 0x9 (media error)
Mar 24 10:21:31 jbox kernel: ata7.00: status: { DRDY ERR }
Mar 24 10:21:31 jbox kernel: ata7.00: error: { UNC }
Mar 24 10:21:31 jbox kernel: sas: ata8: end_device-1:1: dev error handler
Mar 24 10:21:31 jbox kernel: sas: ata9: end_device-1:2: dev error handler
Mar 24 10:21:31 jbox kernel: sas: ata10: end_device-1:3: dev error handler
Mar 24 10:21:31 jbox kernel: sas: ata11: end_device-1:4: dev error handler
Mar 24 10:21:31 jbox kernel: sas: ata12: end_device-1:5: dev error handler
Mar 24 10:21:31 jbox kernel: sas: ata13: end_device-1:6: dev error handler
Mar 24 10:21:31 jbox kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x1)
Mar 24 10:21:31 jbox kernel: ata7.00: revalidation failed (errno=-5)
Mar 24 10:21:31 jbox kernel: ata7: hard resetting link
Mar 24 10:21:31 jbox kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x1)
Mar 24 10:21:31 jbox kernel: ata7.00: revalidation failed (errno=-5)
Mar 24 10:21:36 jbox kernel: ata7: hard resetting link
Mar 24 10:21:37 jbox kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x1)
Mar 24 10:21:37 jbox kernel: ata7.00: revalidation failed (errno=-5)
Mar 24 10:21:37 jbox kernel: ata7.00: disabled
Mar 24 10:21:37 jbox kernel: sd 1:0:0:0: [sdc]
Mar 24 10:21:37 jbox kernel: Result: hostbyte=0x00 driverbyte=0x08
Mar 24 10:21:37 jbox kernel: sd 1:0:0:0: [sdc]
Mar 24 10:21:37 jbox kernel: Sense Key : 0x3 [current] [descriptor]
Mar 24 10:21:37 jbox kernel: Descriptor sense data with sense descriptors (in hex):
Mar 24 10:21:37 jbox kernel:        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 01
Mar 24 10:21:37 jbox kernel:        cc d7 0a c8
Mar 24 10:21:37 jbox kernel: sd 1:0:0:0: [sdc]
Mar 24 10:21:37 jbox kernel: ASC=0x11 ASCQ=0x4
Mar 24 10:21:37 jbox kernel: sd 1:0:0:0: [sdc] CDB:
Mar 24 10:21:37 jbox kernel: cdb[0]=0x88: 88 00 00 00 00 01 cc d7 0a b8 00 00 02 00 00 00
Mar 24 10:21:37 jbox kernel: blk_update_request: I/O error, dev sdc, sector 7731612360
Mar 24 10:21:37 jbox kernel: ata7: EH complete
Mar 24 10:21:37 jbox kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
Mar 24 10:21:37 jbox kernel: md: disk0 read error, sector=7731612296
Mar 24 10:21:37 jbox kernel: md: disk0 read error, sector=7731612304
Mar 24 10:21:37 jbox kernel: md: disk0 read error, sector=7731612312
Mar 24 10:21:37 jbox kernel: md: disk0 read error, sector=7731612320

log.zip
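The sense data and CDB near the end of the log can be decoded to see what the drive actually reported for this failure. A minimal Python sketch, assuming SPC descriptor-format sense data (response code 0x72) and a READ(16) CDB (opcode 0x88); the helper names are hypothetical and only a few common sense keys are named:

```python
# Bytes copied from the "Descriptor sense data" and "CDB" lines of the log.
SENSE_HEX = "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 01 cc d7 0a c8"
CDB_HEX = "88 00 00 00 00 01 cc d7 0a b8 00 00 02 00 00 00"

SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
              0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION",
              0xB: "ABORTED COMMAND"}


def parse_hex(text):
    """Turn space-separated hex pairs into bytes."""
    return bytes(int(b, 16) for b in text.split())


def decode_descriptor_sense(sense):
    """Decode the header and any Information descriptor of 0x72/0x73 sense data."""
    if (sense[0] & 0x7F) not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    result = {"sense_key": sense[1] & 0x0F, "asc": sense[2], "ascq": sense[3]}
    end = 8 + sense[7]  # byte 7 = additional sense length
    offset = 8
    while offset + 2 <= end:
        desc_type, desc_len = sense[offset], sense[offset + 1]
        if desc_type == 0x00:  # Information descriptor: 8-byte field at bytes 4..11
            result["information"] = int.from_bytes(sense[offset + 4:offset + 12], "big")
        offset += 2 + desc_len
    return result


def decode_read16(cdb):
    """Extract starting LBA (bytes 2..9) and block count (bytes 10..13)."""
    if cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    return {"lba": int.from_bytes(cdb[2:10], "big"),
            "blocks": int.from_bytes(cdb[10:14], "big")}


if __name__ == "__main__":
    sense = decode_descriptor_sense(parse_hex(SENSE_HEX))
    cmd = decode_read16(parse_hex(CDB_HEX))
    key = SENSE_KEYS.get(sense["sense_key"], hex(sense["sense_key"]))
    print(key, f"ASC/ASCQ={sense['asc']:#04x}/{sense['ascq']:#04x}")
    print("read of", cmd["blocks"], "blocks from LBA", cmd["lba"],
          "failed at LBA", sense["information"])
```

For this excerpt it reports MEDIUM ERROR with ASC/ASCQ 0x11/0x04 (unrecovered read error) at LBA 7731612360, inside the 512-block (262144-byte) read starting at 7731612344 and matching the `blk_update_request` sector. Because that status originates from the drive rather than the link, it may be worth checking the disk's SMART pending and reallocated sector counts alongside the cabling.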


One thing that is not always obvious is to check how well the SAS card is seated in the motherboard slot. I have found that sometimes the bracket at the back is not perfectly aligned, so when you screw it in, it ends up putting pressure on the card that pushes it slightly out of the slot.
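A card that is not fully seated may still be detected but negotiate a narrower PCIe link than it supports. A minimal Python sketch, assuming a Linux host that exposes the standard sysfs attributes `current_link_width`, `max_link_width`, `current_link_speed` and `max_link_speed` under `/sys/bus/pci/devices` (the helper names are hypothetical), that lists PCIe devices running narrower than their maximum width:

```python
from pathlib import Path

SYSFS_PCI = "/sys/bus/pci/devices"


def link_status(root=SYSFS_PCI):
    """Return (address, cur_width, max_width, cur_speed, max_speed) per PCIe device."""
    results = []
    for dev in sorted(Path(root).iterdir()):
        try:
            cur_w = int((dev / "current_link_width").read_text().strip())
            max_w = int((dev / "max_link_width").read_text().strip())
            cur_s = (dev / "current_link_speed").read_text().strip()
            max_s = (dev / "max_link_speed").read_text().strip()
        except (OSError, ValueError):
            continue  # not a PCIe device, or attribute missing/unreadable
        results.append((dev.name, cur_w, max_w, cur_s, max_s))
    return results


def narrow_links(root=SYSFS_PCI):
    """Devices whose negotiated link width is below the device's maximum."""
    return [r for r in link_status(root) if r[1] < r[2]]


if __name__ == "__main__":
    for addr, cur_w, max_w, cur_s, max_s in narrow_links():
        print(f"{addr}: x{cur_w} of x{max_w}, {cur_s} (max {max_s})")
```

A reduced width is only a hint: it can also mean the slot is wired narrower than the card. `lspci -vv` shows the same information in its LnkCap and LnkSta lines.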
