March 20, 2025
A few days ago a drive had some errors and got disabled. I ran a SMART test and the drive appears to report okay. Today another drive was disabled with errors. Both are on the same controller card (LSI SAS 9207-8i), so I think the problem is the card rather than the drives. From what I've read, these cards can overheat and cause issues, so I have now added a 40mm fan to the card's heat sink. One of the disabled drives is a data drive and the other is a parity drive (the array runs 2 parity and 7 data drives). What's the safest way to try to recover these drives? The system log for the drives and the SMART results are attached.

Mar 17 17:19:28 Chaos kernel: mpt2sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
Mar 17 17:19:28 Chaos kernel: sd 4:0:5:0: [sdq] tag#3181 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=4s
Mar 17 17:19:28 Chaos kernel: sd 4:0:5:0: [sdq] tag#3181 Sense Key : 0x2 [current]
Mar 17 17:19:28 Chaos kernel: sd 4:0:5:0: [sdq] tag#3181 ASC=0x4 ASCQ=0x0
Mar 17 17:19:28 Chaos kernel: sd 4:0:5:0: [sdq] tag#3181 CDB: opcode=0x88 88 00 00 00 00 01 a5 a1 52 a0 00 00 00 18 00 00
Mar 17 17:19:28 Chaos kernel: I/O error, dev sdq, sector 7073780384 op 0x0:(READ) flags 0x0 phys_seg 3 prio class 0
Mar 17 17:19:28 Chaos kernel: md: disk6 read error, sector=7073780320
Mar 17 17:19:28 Chaos kernel: md: disk6 read error, sector=7073780328
Mar 17 17:19:28 Chaos kernel: md: disk6 read error, sector=7073780336
Mar 17 17:19:30 Chaos kernel: sd 4:0:5:0: [sdq] tag#3186 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=0s
Mar 17 17:19:30 Chaos kernel: sd 4:0:5:0: [sdq] tag#3186 Sense Key : 0x2 [current]
Mar 17 17:19:30 Chaos kernel: sd 4:0:5:0: [sdq] tag#3186 ASC=0x4 ASCQ=0x0
Mar 17 17:19:30 Chaos kernel: sd 4:0:5:0: [sdq] tag#3186 CDB: opcode=0x8a 8a 00 00 00 00 01 a5 a1 52 a0 00 00 00 18 00 00
Mar 17 17:19:30 Chaos kernel: I/O error, dev sdq, sector 7073780384 op 0x1:(WRITE) flags 0x0 phys_seg 3 prio class 0
Mar 17 17:19:30 Chaos kernel: md: disk6 write error, sector=7073780320
Mar 17 17:19:30 Chaos kernel: md: disk6 write error, sector=7073780328
Mar 17 17:19:30 Chaos kernel: md: disk6 write error, sector=7073780336
Mar 17 22:01:27 Chaos kernel: DMAR: DRHD: handling fault status reg 2
Mar 17 22:01:28 Chaos kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0xffffb80223fc4000 [fault reason 0x07] Next page table ptr is invalid
Mar 17 22:01:28 Chaos kernel: DMAR: DRHD: handling fault status reg 3
Mar 17 22:01:28 Chaos kernel: DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0xffffb80223e24000 [fault reason 0x07] Next page table ptr is invalid
Mar 17 22:01:28 Chaos kernel: DMAR: DRHD: handling fault status reg 3
Mar 17 22:01:28 Chaos kernel: DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0xffffb80223e2f000 [fault reason 0x07] Next page table ptr is invalid
Mar 17 22:01:28 Chaos kernel: DMAR: DRHD: handling fault status reg 2
Mar 20 11:18:01 Chaos kernel: sd 4:0:7:0: device_block, handle(0x0010)
Mar 20 11:18:02 Chaos kernel: mpt2sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
Mar 20 11:18:03 Chaos kernel: sd 4:0:7:0: device_unblock and setting to running, handle(0x0010)
Mar 20 11:18:03 Chaos kernel: md: disk29 read error, sector=6994635184
Mar 20 11:18:03 Chaos kernel: md: disk29 read error, sector=6994635192
Mar 20 11:18:03 Chaos kernel: md: disk29 read error, sector=6994635200
Mar 20 11:18:03 Chaos kernel: md: disk29 read error, sector=6994635208
Mar 20 11:18:03 Chaos kernel: md: disk29 read error, sector=6994635216
Mar 20 11:18:03 Chaos kernel: md: disk29 read error, sector=6994635224
Mar 20 11:18:03 Chaos kernel: md: disk29 read error, sector=6994635232
Mar 20 11:18:03 Chaos kernel: md: disk29 read error, sector=6994635240
Mar 20 11:18:03 Chaos kernel: unraidd6: attempt to access beyond end of device
Mar 20 11:18:03 Chaos kernel: sds: rw=1, sector=6994635248, nr_sectors = 8 limit=0
Mar 20 11:18:03 Chaos kernel: md: disk29 write error, sector=6994635184
Mar 20 11:18:03 Chaos kernel: unraidd6: attempt to access beyond end of device
Mar 20 11:18:03 Chaos kernel: sds: rw=1, sector=6994635256, nr_sectors = 8 limit=0
Mar 20 11:18:03 Chaos kernel: md: disk29 write error, sector=6994635192
Mar 20 11:18:03 Chaos kernel: unraidd6: attempt to access beyond end of device
Mar 20 11:18:03 Chaos kernel: sds: rw=1, sector=6994635264, nr_sectors = 8 limit=0
Mar 20 11:18:03 Chaos kernel: md: disk29 write error, sector=6994635200
Mar 20 11:18:03 Chaos kernel: unraidd6: attempt to access beyond end of device
Mar 20 11:18:03 Chaos kernel: sds: rw=1, sector=6994635272, nr_sectors = 8 limit=0
Mar 20 11:18:03 Chaos kernel: md: disk29 write error, sector=6994635208
Mar 20 11:18:03 Chaos kernel: unraidd6: attempt to access beyond end of device
Mar 20 11:18:03 Chaos kernel: sds: rw=1, sector=6994635280, nr_sectors = 8 limit=0
Mar 20 11:18:03 Chaos kernel: md: disk29 write error, sector=6994635216
Mar 20 11:18:03 Chaos kernel: unraidd6: attempt to access beyond end of device
Mar 20 11:18:03 Chaos kernel: sds: rw=1, sector=6994635288, nr_sectors = 8 limit=0
Mar 20 11:18:03 Chaos kernel: md: disk29 write error, sector=6994635224
Mar 20 11:18:03 Chaos kernel: unraidd6: attempt to access beyond end of device
Mar 20 11:18:03 Chaos kernel: sds: rw=1, sector=6994635296, nr_sectors = 8 limit=0
Mar 20 11:18:03 Chaos kernel: md: disk29 write error, sector=6994635232
Mar 20 11:18:03 Chaos kernel: unraidd6: attempt to access beyond end of device
Mar 20 11:18:03 Chaos kernel: sds: rw=1, sector=6994635304, nr_sectors = 8 limit=0
Mar 20 11:18:03 Chaos kernel: md: disk29 write error, sector=6994635240
Mar 20 11:18:03 Chaos kernel: sd 4:0:7:0: [sds] Synchronizing SCSI cache
Mar 20 11:18:03 Chaos kernel: sd 4:0:7:0: [sds] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=DRIVER_OK
Mar 20 11:18:03 Chaos kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221107000000)
Mar 20 11:18:03 Chaos kernel: mpt2sas_cm0: removing handle(0x0010), sas_addr(0x4433221107000000)
Mar 20 11:18:03 Chaos kernel: mpt2sas_cm0: enclosure logical id(0x500605b006a8c290), slot(4)
Mar 20 11:18:05 Chaos emhttpd: offline: TOSHIBA_MG09ACA18TE_83D0A00LFJDH (sds) 512 35156656128
Mar 20 11:18:14 Chaos kernel: mpt2sas_cm0: handle(0x10) sas_address(0x4433221107000000) port_type(0x1)
Mar 20 11:18:14 Chaos kernel: scsi 4:0:8:0: Direct-Access ATA TOSHIBA MG09ACA1 0105 PQ: 0 ANSI: 6
Mar 20 11:18:14 Chaos kernel: scsi 4:0:8:0: SATA: handle(0x0010), sas_addr(0x4433221107000000), phy(7), device_name(0x0000000000000000)
Mar 20 11:18:14 Chaos kernel: scsi 4:0:8:0: enclosure logical id (0x500605b006a8c290), slot(4)
Mar 20 11:18:14 Chaos kernel: scsi 4:0:8:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Mar 20 11:18:14 Chaos kernel: scsi 4:0:8:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
Mar 20 11:18:14 Chaos kernel: sd 4:0:8:0: Attached scsi generic sg18 type 0
Mar 20 11:18:14 Chaos kernel: end_device-4:8: add: handle(0x0010), sas_addr(0x4433221107000000)
Mar 20 11:18:14 Chaos kernel: sd 4:0:8:0: Power-on or device reset occurred
Mar 20 11:18:14 Chaos kernel: sd 4:0:8:0: [sdt] 35156656128 512-byte logical blocks: (18.0 TB/16.4 TiB)
Mar 20 11:18:14 Chaos kernel: sd 4:0:8:0: [sdt] 4096-byte physical blocks
Mar 20 11:18:14 Chaos kernel: sd 4:0:8:0: [sdt] Write Protect is off
Mar 20 11:18:14 Chaos kernel: sd 4:0:8:0: [sdt] Mode Sense: 7f 00 10 08
Mar 20 11:18:14 Chaos kernel: sd 4:0:8:0: [sdt] Write cache: enabled, read cache: enabled, supports DPO and FUA
Mar 20 11:18:14 Chaos kernel: sdt: sdt1
Mar 20 11:18:14 Chaos kernel: sd 4:0:8:0: [sdt] Attached SCSI disk
Mar 20 11:18:16 Chaos unassigned.devices: Partition '/dev/sdt1' does not have a file system and cannot be mounted.
Mar 20 11:18:17 Chaos emhttpd: online: TOSHIBA_MG09ACA18TE_83D0A00LFJDH (sdt) 512 35156656128
Mar 20 11:18:17 Chaos emhttpd: read SMART /dev/sdt

Attachments: chaos-smart-20250320-1604.zip, chaos-smart-20250320-1602.zip
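The SCSI fields in the sdq entries above can be decoded to see what the drive actually reported. Below is a minimal Python sketch written against the exact line formats in this log; the helper names are hypothetical, and only the few codes needed here are tabulated. It unpacks the 16-byte CDB (opcode 0x88 is READ(16), 0x8a is WRITE(16)) and maps the sense key and ASC/ASCQ to their standard SCSI names.

```python
import re

# Opcodes that appear in the posted log (SCSI block commands).
OPCODES = {0x88: "READ(16)", 0x8A: "WRITE(16)"}
# A few common sense keys; 0x2 is NOT READY, 0x3 would indicate a medium error.
SENSE_KEYS = {0x2: "NOT READY", 0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR"}
# Only the ASC/ASCQ pair seen in this log is listed here.
ASC_TEXT = {(0x04, 0x00): "LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE"}

CDB_RE = re.compile(r"CDB: opcode=0x[0-9a-f]+ ((?:[0-9a-f]{2} ?)+)")
SENSE_RE = re.compile(r"Sense Key : 0x([0-9a-f]+)")
ASC_RE = re.compile(r"ASC=0x([0-9a-f]+) ASCQ=0x([0-9a-f]+)")


def decode_cdb16(line: str) -> dict:
    """Decode a 16-byte READ(16)/WRITE(16) CDB from a kernel log line."""
    m = CDB_RE.search(line)
    if m is None:
        raise ValueError("no CDB in line")
    cdb = bytes(int(tok, 16) for tok in m.group(1).split())
    if len(cdb) != 16:
        raise ValueError(f"expected 16 CDB bytes, got {len(cdb)}")
    return {
        "command": OPCODES.get(cdb[0], hex(cdb[0])),
        # Bytes 2-9: starting logical block address, big-endian.
        "lba": int.from_bytes(cdb[2:10], "big"),
        # Bytes 10-13: number of logical blocks to transfer, big-endian.
        "blocks": int.from_bytes(cdb[10:14], "big"),
    }


def decode_sense(sense_line: str, asc_line: str) -> tuple:
    """Translate the 'Sense Key' and 'ASC/ASCQ' log lines into text."""
    key = int(SENSE_RE.search(sense_line).group(1), 16)
    asc, ascq = (int(g, 16) for g in ASC_RE.search(asc_line).groups())
    return (
        SENSE_KEYS.get(key, hex(key)),
        ASC_TEXT.get((asc, ascq), f"ASC {asc:#04x} / ASCQ {ascq:#04x}"),
    )


if __name__ == "__main__":
    print(decode_cdb16("CDB: opcode=0x88 88 00 00 00 00 01 a5 a1 52 a0 00 00 00 18 00 00"))
    print(decode_sense("Sense Key : 0x2 [current]", "ASC=0x4 ASCQ=0x0"))
```

For the sdq entries this gives READ(16) and WRITE(16) at LBA 7073780384 for 24 blocks, matching the kernel's "sector 7073780384" line. The three md sectors, 8 apart, cover the same 24 blocks but are 64 lower, which is consistent with the array partition starting at sector 64. A NOT READY sense key means the drive (or the HBA's SATA translation layer) reported it could not service the command, rather than reporting an unreadable sector (MEDIUM ERROR), so it fits a connection or power problem better than failing media, though it does not prove either.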
March 21, 2025 · Community Expert
The diags are from after the reboot, so the errors are no longer in them. Based on the snippet posted before, this looks more like a power/connection issue, but it would be good to have the full log.
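One way to look for the power/connection pattern in a full log is to find drives that go offline and come back within seconds under a new device name, as the Toshiba drive did here (sds, then sdt). That is a heuristic, not a diagnosis: a drop and re-detect points at the link, cable, backplane or power rather than the platters, but does not say which. A minimal Python sketch, assuming emhttpd lines in the format shown in the first post; the function name and the `year` parameter are hypothetical (syslog timestamps carry no year, so one is supplied).

```python
import re
from datetime import datetime

# Matches Unraid emhttpd lines such as:
#   Mar 20 11:18:05 Chaos emhttpd: offline: TOSHIBA_MG09ACA18TE_83D0A00LFJDH (sds) 512 35156656128
EVENT_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ emhttpd: (offline|online): (\S+) \((sd[a-z]+)\)"
)


def find_reconnects(lines, year=2025):
    """Return (drive_id, old_dev, new_dev, seconds_offline) for every drive
    that went offline and later came back online."""
    offline = {}  # drive_id -> (time it went offline, device name at that time)
    found = []
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue
        stamp, state, drive_id, dev = m.groups()
        # The syslog timestamp has no year, so the given one is assumed.
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if state == "offline":
            offline[drive_id] = (when, dev)
        elif drive_id in offline:
            gone_at, old_dev = offline.pop(drive_id)
            found.append((drive_id, old_dev, dev, (when - gone_at).total_seconds()))
    return found
```

Passing the lines of a saved syslog, e.g. `find_reconnects(open("syslog.txt"))`, on the posted excerpt yields the Toshiba drive moving from sds to sdt after 12 seconds offline.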
March 21, 2025 · Author
What is the best way to add these drives back, given that one is parity and one is data? Is it best to rebuild the parity drive first, or to rebuild both at the same time?
March 21, 2025 · Community Expert · Solution
You can do both at the same time, but make sure the emulated disk mounts and its contents look correct before rebuilding on top of it.
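The first part of that check can be scripted. A minimal Python sketch, assuming Unraid's usual /mnt/diskN mount points; /mnt/disk6 corresponds to md "disk6" in the log but is an assumption, and `check_emulated_disk` is a hypothetical helper. It only confirms the emulated disk is mounted and lists its top-level entries for comparison with what you expect to be there; it cannot verify file contents. It applies only to the data disk, since a parity disk carries no file system to mount.

```python
import os
import sys


def check_emulated_disk(path: str) -> tuple:
    """Return (is_mount_point, sorted top-level entry names) for `path`."""
    if not os.path.isdir(path):
        # Missing directory: nothing is mounted there.
        return False, []
    return os.path.ismount(path), sorted(os.listdir(path))


if __name__ == "__main__":
    # Assumed mount point for data disk 6; pass another path as an argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/mnt/disk6"
    mounted, entries = check_emulated_disk(target)
    print(f"{target}: mounted={mounted}, {len(entries)} top-level entries")
    for name in entries[:20]:
        print("  " + name)
```

If it reports `mounted=False`, or the listing is empty or missing folders you know were on that disk, hold off on the rebuild and post the diagnostics first.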