March 31, 2019

I've had this problem occur twice now. I get a fault on the 9211 card, and then all my devices start erroring out. The error looks like this:

Mar 31 05:24:59 Tower kernel: mpt2sas_cm0: fault_state(0x2622)!
Mar 31 05:24:59 Tower kernel: mpt2sas_cm0: sending diag reset !!
Mar 31 05:25:00 Tower kernel: mpt2sas_cm0: diag reset: SUCCESS
Mar 31 05:25:00 Tower kernel: mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
Mar 31 05:25:00 Tower kernel: mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
Mar 31 05:25:00 Tower kernel: mpt2sas_cm0: Protocol=(
Mar 31 05:25:00 Tower kernel: Initiator
Mar 31 05:25:00 Tower kernel: ,Target
Mar 31 05:25:00 Tower kernel: ),
Mar 31 05:25:00 Tower kernel: Capabilities=(
Mar 31 05:25:00 Tower kernel: TLR
Mar 31 05:25:00 Tower kernel: ,EEDP
Mar 31 05:25:00 Tower kernel: ,Snapshot Buffer
Mar 31 05:25:00 Tower kernel: ,Diag Trace Buffer
Mar 31 05:25:00 Tower kernel: ,Task Set Full
Mar 31 05:25:00 Tower kernel: ,NCQ
Mar 31 05:25:00 Tower kernel: )
Mar 31 05:25:00 Tower kernel: mpt2sas_cm0: sending port enable !!

After this occurs, all my devices have I/O errors:

Mar 31 05:26:20 Tower kernel: print_req_error: I/O error, dev sdr, sector 3264152816
Mar 31 05:26:20 Tower kernel: sd 7:0:4:0: [sdf] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
Mar 31 05:26:20 Tower kernel: sd 7:0:17:0: [sds] tag#50 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
Mar 31 05:26:20 Tower kernel: sd 7:0:4:0: [sdf] tag#2 CDB: opcode=0x88 88 00 00 00 00 02 2f 8c 76 c8 00 00 00 08 00 00
Mar 31 05:26:20 Tower kernel: sd 7:0:17:0: [sds] tag#50 CDB: opcode=0x88 88 00 00 00 00 00 c2 8f 04 f0 00 00 00 08 00 00

The motherboard is an X9SCL in a Supermicro SC846 (SAS2 backplane) with ECC RAM. The 9211-8i card has the P20 firmware.

Has anyone had this occur? I've attached the diagnostics as well.

tower-diagnostics-20190331-1416.zip
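For anyone trying to confirm the same pattern in their own syslog, here is a rough Python sketch that looks for the controller fault and then counts the I/O errors per device logged after it. The two regular expressions are based only on the log lines quoted above, and the function name `summarize` is made up for illustration; other kernel or driver versions may word these messages differently, so treat it as a starting point rather than a reliable detector.

```python
import re
from collections import Counter

# Patterns derived from the log excerpt in this post (an assumption:
# other kernel/driver versions may format these messages differently).
FAULT_RE = re.compile(r"(mpt\dsas_cm\d+): fault_state\((0x[0-9a-fA-F]+)\)")
IO_ERR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def summarize(lines):
    """Return (faults, io_errors).

    faults    -- list of (controller, fault_code) tuples in log order
    io_errors -- Counter of I/O errors per device seen after the first fault
    """
    faults = []
    io_errors = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            faults.append((match.group(1), match.group(2)))
            continue
        match = IO_ERR_RE.search(line)
        # Only count errors that follow a controller fault.
        if match and faults:
            io_errors[match.group(1)] += 1
    return faults, io_errors


if __name__ == "__main__":
    # Two lines copied from the excerpt above, used as sample input.
    sample = [
        "Mar 31 05:24:59 Tower kernel: mpt2sas_cm0: fault_state(0x2622)!",
        "Mar 31 05:26:20 Tower kernel: print_req_error: I/O error, dev sdr, sector 3264152816",
    ]
    faults, io_errors = summarize(sample)
    print("faults:", faults)
    print("I/O errors after fault:", dict(io_errors))
```

To run it against a real log, pass an open file object to `summarize` instead of the sample list. If many different devices show errors right after a single fault, that points at the controller (or its slot, cooling, or cabling) rather than at the individual disks.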
April 1, 2019 Community Expert
Try a different PCIe slot, and make sure there is some airflow around the card. If that doesn't help, it might be a bad or failing controller.
April 21, 2021
@unraid_chris did you ever figure out what the problem was exactly?
Edited April 21, 2021 by Marc_G2
April 21, 2021
My card is giving the same code, but it started happening right after I switched to another motherboard, so I'm not sure the card is at fault. Also, in both instances the card was under hardly any load, so overheating doesn't seem likely either.
https://forums.unraid.net/topic/106631-disk-read-errors-on-multiple-disk-need-help-diagnosing
Edited April 21, 2021 by Marc_G2
April 22, 2021 Author
I'm trying to recall exactly what I did. I do remember putting a fan in a bracket above the card to provide more airflow, and I think that helped.

I ended up switching away from unRAID and back to ZFS. I like the idea and features of unRAID, but situations like this, or improper shutdowns, completely invalidating the array and requiring a parity check made me a bit uneasy about data loss. After switching to Ubuntu and ZFS, I haven't had any issues.

Good luck!
This topic is now archived and is closed to further replies.