LSI 9207-8e Randomly Disables Disks



Hello, 

 

I have run into an issue with my SATA card randomly disabling disks attached to it, and I cannot figure out why. Below is the syslog output I could find. SMART reports the disk as healthy. Any ideas what could cause this? I have had to rebuild the data onto the same disk twice in two days now.

 

May  3 02:54:08 kernel: sd 7:0:0:0: attempting task abort!scmd(0x000000007cd90eff), outstanding for 15497 ms & timeout 15000 ms
May  3 02:54:08 kernel: sd 7:0:0:0: [sdh] tag#2356 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
May  3 02:54:08 kernel: scsi target7:0:0: handle(0x000a), sas_address(0x5001e677bbe6dfe0), phy(0)
May  3 02:54:08 kernel: scsi target7:0:0: enclosure logical id(0x5001e677bbe6dfff), slot(0) 
May  3 02:54:08 kernel: sd 7:0:0:0: device_block, handle(0x000a)
May  3 02:54:10 kernel: sd 7:0:0:0: device_unblock and setting to running, handle(0x000a)
May  3 02:54:10 kernel: sd 7:0:0:0: [sdh] Synchronizing SCSI cache
May  3 02:54:10 kernel: sd 7:0:0:0: [sdh] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
May  3 02:54:10 rc.diskinfo[8086]: SIGHUP received, forcing refresh of disks info.
May  3 02:54:12 kernel: scsi 7:0:0:0: [sdh] tag#7232 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=19s
May  3 02:54:12 kernel: scsi 7:0:0:0: [sdh] tag#7232 CDB: opcode=0x88 88 00 00 00 00 02 02 51 49 00 00 00 00 40 00 00
May  3 02:54:12 kernel: blk_update_request: I/O error, dev sdh, sector 8628816128 op 0x0:(READ) flags 0x0 phys_seg 8 prio class 0
May  3 02:54:12 kernel: md: disk4 read error, sector=8628816064
May  3 02:54:12 kernel: md: disk4 read error, sector=8628816072
May  3 02:54:12 kernel: md: disk4 read error, sector=8628816080
May  3 02:54:12 kernel: md: disk4 read error, sector=8628816088
May  3 02:54:12 kernel: md: disk4 read error, sector=8628816096
May  3 02:54:12 kernel: md: disk4 read error, sector=8628816104
May  3 02:54:12 kernel: md: disk4 read error, sector=8628816112
May  3 02:54:12 kernel: md: disk4 read error, sector=8628816120
May  3 02:54:12 kernel: scsi 7:0:0:0: task abort: SUCCESS scmd(0x000000007cd90eff)
May  3 02:54:12 kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x5001e677bbe6dfe0)
May  3 02:54:12 kernel: mpt2sas_cm0: removing handle(0x000a), sas_addr(0x5001e677bbe6dfe0)
May  3 02:54:12 kernel: mpt2sas_cm0: enclosure logical id(0x5001e677bbe6dfff), slot(0)
May  3 02:54:13 kernel: scsi 7:0:2:0: Direct-Access     ATA      ST8000VN004-2M21 SC60 PQ: 0 ANSI: 6
May  3 02:54:13 kernel: scsi 7:0:2:0: SATA: handle(0x000a), sas_addr(0x5001e677bbe6dfe0), phy(0), device_name(0x0000000000000000)
May  3 02:54:13 kernel: scsi 7:0:2:0: enclosure logical id (0x5001e677bbe6dfff), slot(0) 
May  3 02:54:13 kernel: scsi 7:0:2:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
May  3 02:54:13 kernel: scsi 7:0:2:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
May  3 02:54:13 kernel: sd 7:0:2:0: Attached scsi generic sg7 type 0
May  3 02:54:13 kernel: end_device-7:0:2: add: handle(0x000a), sas_addr(0x5001e677bbe6dfe0)
May  3 02:54:13 kernel: sd 7:0:2:0: Power-on or device reset occurred
May  3 02:54:13 kernel: sd 7:0:2:0: [sdi] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
May  3 02:54:13 kernel: sd 7:0:2:0: [sdi] 4096-byte physical blocks
May  3 02:54:13 kernel: sd 7:0:2:0: [sdi] Write Protect is off
May  3 02:54:13 kernel: sd 7:0:2:0: [sdi] Mode Sense: 7f 00 10 08
May  3 02:54:13 kernel: sd 7:0:2:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA
May  3 02:54:13 kernel: sdi: sdi1
May  3 02:54:13 kernel: sd 7:0:2:0: [sdi] Attached SCSI disk
May  3 02:54:13 rc.diskinfo[8086]: SIGHUP received, forcing refresh of disks info.
May  3 02:54:14 unassigned.devices: Disk with serial 'ST8000VN004-2M2101_WRD0FLA6', mountpoint 'ST8000VN004-2M2101_WRD0FLA6' is not set to auto mount.
May  3 02:54:15 kernel: br0: received packet on bond0 with own address as source address (addr:38:d5:47:aa:c8:f3, vlan:0)
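The CDBs in the log above can be decoded to see what the disk was being asked to do when it dropped out. The Python sketch below handles only the two opcodes that appear in these excerpts: 0x88 (READ(16), LBA in bytes 2-9, block count in bytes 10-13) and 0x85 (ATA PASS-THROUGH(16), ATA command code in byte 14). `decode_cdb` and its one-entry ATA name table are hypothetical helpers written for this post, not part of any existing tool.

```python
# ATA command names for the codes this sketch recognizes (deliberately tiny table)
ATA_COMMANDS = {0xE5: "CHECK POWER MODE"}


def decode_cdb(hex_bytes: str) -> dict:
    """Decode a 16-byte CDB as printed by the kernel, e.g. '88 00 00 ...'.

    Only READ(16) and ATA PASS-THROUGH(16) are handled; anything else is
    reported by opcode only.
    """
    cdb = bytes.fromhex(hex_bytes)  # fromhex skips the spaces between bytes
    if len(cdb) != 16:
        raise ValueError(f"expected 16 bytes, got {len(cdb)}")
    opcode = cdb[0]
    if opcode == 0x88:
        # READ(16): bytes 2-9 are the big-endian LBA, bytes 10-13 the block count
        return {
            "command": "READ(16)",
            "lba": int.from_bytes(cdb[2:10], "big"),
            "blocks": int.from_bytes(cdb[10:14], "big"),
        }
    if opcode == 0x85:
        # ATA PASS-THROUGH(16): byte 14 holds the ATA command code
        ata = cdb[14]
        return {
            "command": "ATA PASS-THROUGH(16)",
            "ata_command": ata,
            "ata_name": ATA_COMMANDS.get(ata, "unknown"),
        }
    return {"command": f"opcode 0x{opcode:02x} (not decoded)"}


if __name__ == "__main__":
    # CDBs copied from the syslog above (tag#2356 and tag#7232)
    print(decode_cdb("85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00"))
    print(decode_cdb("88 00 00 00 00 02 02 51 49 00 00 00 00 40 00 00"))
```

Decoded this way, the command that timed out (tag#2356) is ATA 0xE5, CHECK POWER MODE, a query of the disk's power state rather than a data transfer. The failed read (tag#7232) is LBA 8628816128 for 64 blocks, which matches the sector in the `blk_update_request` line; the `md` errors start 64 sectors lower, which would be consistent with the data partition beginning at sector 64 (an assumption, not something the log states).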

 


Well, it wasn't the cable. I replaced it and the issue persists. The disk log shows this happening while synchronizing the SCSI cache. However, the disk was originally mounted as sdd, and now the kernel is trying to sync sdi. I am not sure why this is happening, but it is making me like Unraid more and more. I just want to set it and forget it.

 

May 8 02:05:00 kernel: sd 7:0:5:0: [sdi] tag#9555 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
May 8 02:05:03 kernel: sd 7:0:5:0: [sdi] Synchronizing SCSI cache
May 8 02:05:03 kernel: sd 7:0:5:0: [sdi] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
May 8 02:05:04 kernel: scsi 7:0:5:0: [sdi] tag#9554 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=20s
May 8 02:05:04 kernel: scsi 7:0:5:0: [sdi] tag#9554 CDB: opcode=0x88 88 00 00 00 00 02 85 d1 6a 80 00 00 01 00 00 00
May 8 02:05:04 kernel: blk_update_request: I/O error, dev sdi, sector 10835028608 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
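The name change from sdd to sdi matches what the first excerpt shows: after the task abort, the disk at SCSI address 7:0:0:0 (sdh) is removed and re-attached as 7:0:2:0 with the name sdi. The Python sketch below scans syslog lines and records which kernel names each SCSI address carried, plus a few event types in order. The function name `summarize` and the list of event strings are assumptions chosen for these particular logs, not an Unraid or kernel tool.

```python
import re

# "sd 7:0:0:0: [sdh] ..." / "scsi 7:0:2:0: ..." -> SCSI address, optional kernel name
ADDR_RE = re.compile(r"\b(?:sd|scsi) (\d+:\d+:\d+:\d+):(?: \[(sd[a-z]+)\])?")
# "blk_update_request: I/O error, dev sdh, ..." carries only the kernel name
IOERR_RE = re.compile(r"I/O error, dev (sd[a-z]+)")

# Log substrings treated as events of interest (illustrative, non-exhaustive)
EVENTS = [
    ("attempting task abort", "task abort"),
    ("Synchronize Cache(10) failed", "cache sync failed"),
    ("Attached SCSI disk", "attached"),
]


def summarize(lines):
    """Return (address -> set of kernel names seen, ordered (event, address, name) list)."""
    names = {}
    events = []
    for line in lines:
        addr, name = None, None
        m = ADDR_RE.search(line)
        if m:
            addr, name = m.group(1), m.group(2)
            if name:
                names.setdefault(addr, set()).add(name)
        io = IOERR_RE.search(line)
        if io:
            # I/O error lines have no SCSI address, only the device name
            events.append(("I/O error", None, io.group(1)))
        for needle, label in EVENTS:
            if needle in line:
                events.append((label, addr, name))
    return names, events
```

Passing it every line of a saved syslog shows whether a disk keeps reappearing under a new address and name after each abort, which is easier to spot than reading the raw log.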

 

 

Also, I can't even re-mount the disk to move the files onto one of the disks that isn't cabled through the SFF-8088 enclosure. I thought that if a data disk fails, we could just mount it and access the files?

Now it seems I need to wait another 11 hours for the parity rebuild to finish, then hope it doesn't fail again while I move the files to more stable drives using unbalance, and then ditch the 8-bay external enclosure.

Edited by edrohler
