unraid_chris Posted March 31, 2019

I've had this problem occur twice now. I get a fault on the 9211 card, and then all my devices start erroring out. The error looks like this --

Mar 31 05:24:59 Tower kernel: mpt2sas_cm0: fault_state(0x2622)!
Mar 31 05:24:59 Tower kernel: mpt2sas_cm0: sending diag reset !!
Mar 31 05:25:00 Tower kernel: mpt2sas_cm0: diag reset: SUCCESS
Mar 31 05:25:00 Tower kernel: mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
Mar 31 05:25:00 Tower kernel: mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
Mar 31 05:25:00 Tower kernel: mpt2sas_cm0: Protocol=(
Mar 31 05:25:00 Tower kernel: Initiator
Mar 31 05:25:00 Tower kernel: ,Target
Mar 31 05:25:00 Tower kernel: ),
Mar 31 05:25:00 Tower kernel: Capabilities=(
Mar 31 05:25:00 Tower kernel: TLR
Mar 31 05:25:00 Tower kernel: ,EEDP
Mar 31 05:25:00 Tower kernel: ,Snapshot Buffer
Mar 31 05:25:00 Tower kernel: ,Diag Trace Buffer
Mar 31 05:25:00 Tower kernel: ,Task Set Full
Mar 31 05:25:00 Tower kernel: ,NCQ
Mar 31 05:25:00 Tower kernel: )
Mar 31 05:25:00 Tower kernel: mpt2sas_cm0: sending port enable !!

After this occurs, all my devices have I/O errors --

Mar 31 05:26:20 Tower kernel: print_req_error: I/O error, dev sdr, sector 3264152816
Mar 31 05:26:20 Tower kernel: sd 7:0:4:0: [sdf] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
Mar 31 05:26:20 Tower kernel: sd 7:0:17:0: [sds] tag#50 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
Mar 31 05:26:20 Tower kernel: sd 7:0:4:0: [sdf] tag#2 CDB: opcode=0x88 88 00 00 00 00 02 2f 8c 76 c8 00 00 00 08 00 00
Mar 31 05:26:20 Tower kernel: sd 7:0:17:0: [sds] tag#50 CDB: opcode=0x88 88 00 00 00 00 00 c2 8f 04 f0 00 00 00 08 00 00

The motherboard is an X9SCL in a Supermicro SC846 (SAS2 backplane) with ECC RAM. The 9211-8i card has the P20 firmware. Has anyone had this occur?
I've attached the diagnostics as well: tower-diagnostics-20190331-1416.zip
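For anyone reading a similar log, here is a minimal Python sketch for pulling the key facts out of the lines quoted in the post. It assumes the standard 16-byte SCSI READ(16) layout for opcode 0x88 (bytes 2-9 are the LBA, bytes 10-13 the block count) and the Linux SCSI host byte value 0x01 (DID_NO_CONNECT). The function names `decode_read16` and `summarize` are made up for illustration and do not come from any Unraid tool; the meaning of fault code 0x2622 itself is not decoded here.

```python
import re

# Patterns for the three kinds of lines quoted in the post.
FAULT_RE = re.compile(r"(mpt\dsas_cm\d+): fault_state\((0x[0-9a-fA-F]+)\)")
HOSTBYTE_RE = re.compile(r"\[(sd[a-z]+)\].*hostbyte=(0x[0-9a-f]{2})")
CDB_RE = re.compile(r"\[(sd[a-z]+)\].*CDB: opcode=0x88((?: [0-9a-f]{2}){16})")

DID_NO_CONNECT = 0x01  # Linux SCSI host byte: could not connect to the device


def decode_read16(cdb_hex):
    """Return (lba, block_count) from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, blocks


def summarize(lines):
    """Collect controller faults, unreachable devices, and failed reads."""
    faults, no_connect, reads = [], set(), []
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            faults.append((m.group(1), m.group(2)))
        m = HOSTBYTE_RE.search(line)
        if m and int(m.group(2), 16) == DID_NO_CONNECT:
            no_connect.add(m.group(1))
        m = CDB_RE.search(line)
        if m:
            reads.append((m.group(1),) + decode_read16(m.group(2)))
    return faults, sorted(no_connect), reads


SAMPLE = """\
Mar 31 05:24:59 Tower kernel: mpt2sas_cm0: fault_state(0x2622)!
Mar 31 05:26:20 Tower kernel: sd 7:0:4:0: [sdf] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
Mar 31 05:26:20 Tower kernel: sd 7:0:17:0: [sds] tag#50 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
Mar 31 05:26:20 Tower kernel: sd 7:0:4:0: [sdf] tag#2 CDB: opcode=0x88 88 00 00 00 00 02 2f 8c 76 c8 00 00 00 08 00 00
Mar 31 05:26:20 Tower kernel: sd 7:0:17:0: [sds] tag#50 CDB: opcode=0x88 88 00 00 00 00 00 c2 8f 04 f0 00 00 00 08 00 00
"""

faults, no_connect, reads = summarize(SAMPLE.splitlines())
print("controller faults:", faults)
print("devices reporting DID_NO_CONNECT:", no_connect)
for dev, lba, blocks in reads:
    print(f"{dev}: failed read of {blocks} blocks at LBA {lba:#x}")
```

On these lines, the failed commands are small reads that came back with a "no connect" host status right after the controller reset. That fits the disks dropping off behind the HBA rather than each disk failing on its own, though the log alone does not prove it.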
JorgeB Posted April 1, 2019

Try a different PCIe slot, and make sure there is some airflow around the card. If that doesn't help, it might be a bad or failing controller.
Marc_G2 Posted April 21, 2021 (edited)

@unraid_chris did you ever figure out what the problem was exactly?
Marc_G2 Posted April 21, 2021 (edited)

My card is giving the same code, but it started happening right after I switched to another motherboard, so I'm not sure the card is at fault. Also, in both instances the card was under hardly any load, so overheating doesn't seem likely either. https://forums.unraid.net/topic/106631-disk-read-errors-on-multiple-disk-need-help-diagnosing
unraid_chris (Author) Posted April 22, 2021

I'm trying to recall exactly what I did. I do remember putting a fan in a bracket above the card to provide more airflow, and I think that helped. I ended up switching away from unRAID and back to ZFS. I like the idea and features of unRAID, but situations like this, or improper shutdowns, completely invalidating the array and requiring a parity check made me a bit uneasy about data loss. After switching to Ubuntu and ZFS, I haven't had any issues. Good luck!