kernel: mpt2sas_cm0: fault_state(0x2622)!



I've had this problem occur twice now.


I get a fault on the 9211 card, and then all my devices start erroring out.

 

The error looks like this --

 

Mar 31 05:24:59 Tower kernel: mpt2sas_cm0: fault_state(0x2622)!
Mar 31 05:24:59 Tower kernel: mpt2sas_cm0: sending diag reset !!
Mar 31 05:25:00 Tower kernel: mpt2sas_cm0: diag reset: SUCCESS
Mar 31 05:25:00 Tower kernel: mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
Mar 31 05:25:00 Tower kernel: mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
Mar 31 05:25:00 Tower kernel: mpt2sas_cm0: Protocol=(
Mar 31 05:25:00 Tower kernel: Initiator
Mar 31 05:25:00 Tower kernel: ,Target
Mar 31 05:25:00 Tower kernel: ), 
Mar 31 05:25:00 Tower kernel: Capabilities=(
Mar 31 05:25:00 Tower kernel: TLR
Mar 31 05:25:00 Tower kernel: ,EEDP
Mar 31 05:25:00 Tower kernel: ,Snapshot Buffer
Mar 31 05:25:00 Tower kernel: ,Diag Trace Buffer
Mar 31 05:25:00 Tower kernel: ,Task Set Full
Mar 31 05:25:00 Tower kernel: ,NCQ
Mar 31 05:25:00 Tower kernel: )
Mar 31 05:25:00 Tower kernel: mpt2sas_cm0: sending port enable !!
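For anyone comparing their own syslog against this one, here is a rough Python sketch that pulls the controller fault code and the diag reset result out of lines shaped like the ones above. The function name and regexes are my own assumptions based only on this log excerpt, not anything provided by the driver or unRAID, and other kernel versions may format these messages differently.

```python
import re

# Patterns assumed from the log excerpt in this post only.
FAULT_RE = re.compile(r"(mpt\dsas_cm\d+): fault_state\((0x[0-9a-fA-F]+)\)")
RESET_RE = re.compile(r"(mpt\dsas_cm\d+): diag reset: (\w+)")


def summarize_hba_faults(lines):
    """Return one dict per fault_state line, with the following reset result if seen."""
    events = []
    for line in lines:
        fault = FAULT_RE.search(line)
        if fault:
            events.append({
                "timestamp": line[:15],  # classic syslog prefix, e.g. "Mar 31 05:24:59"
                "controller": fault.group(1),
                "fault_code": fault.group(2),
                "reset": None,
            })
            continue
        reset = RESET_RE.search(line)
        if (reset and events
                and events[-1]["controller"] == reset.group(1)
                and events[-1]["reset"] is None):
            events[-1]["reset"] = reset.group(2)
    return events


sample = [
    "Mar 31 05:24:59 Tower kernel: mpt2sas_cm0: fault_state(0x2622)!",
    "Mar 31 05:24:59 Tower kernel: mpt2sas_cm0: sending diag reset !!",
    "Mar 31 05:25:00 Tower kernel: mpt2sas_cm0: diag reset: SUCCESS",
]
for event in summarize_hba_faults(sample):
    print(event)
```

Counting how often the same fault code shows up, and at what times, may help separate a one-off glitch from a recurring problem.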

 

After this occurs, all my devices have I/O errors --

Mar 31 05:26:20 Tower kernel: print_req_error: I/O error, dev sdr, sector 3264152816
Mar 31 05:26:20 Tower kernel: sd 7:0:4:0: [sdf] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
Mar 31 05:26:20 Tower kernel: sd 7:0:17:0: [sds] tag#50 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
Mar 31 05:26:20 Tower kernel: sd 7:0:4:0: [sdf] tag#2 CDB: opcode=0x88 88 00 00 00 00 02 2f 8c 76 c8 00 00 00 08 00 00
Mar 31 05:26:20 Tower kernel: sd 7:0:17:0: [sds] tag#50 CDB: opcode=0x88 88 00 00 00 00 00 c2 8f 04 f0 00 00 00 08 00 00
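In case it helps anyone reading these lines: opcode 0x88 is a SCSI READ(16), and as far as I know hostbyte 0x01 corresponds to DID_NO_CONNECT in the Linux SCSI layer, which would fit the drives dropping off after the controller reset. Below is a small Python sketch for decoding the CDB bytes. The helper name and the partial host-byte table are my own and are only an illustration, not an official tool.

```python
# Partial table of Linux SCSI host byte codes (assumed from the kernel's DID_* constants).
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
}


def decode_read16_cdb(cdb_hex):
    """Decode a READ(16) CDB given as space-separated hex bytes.

    Returns (starting LBA, number of blocks requested).
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, blocks


sdf = decode_read16_cdb("88 00 00 00 00 02 2f 8c 76 c8 00 00 00 08 00 00")
sds = decode_read16_cdb("88 00 00 00 00 00 c2 8f 04 f0 00 00 00 08 00 00")
print("sdf:", sdf)
print("sds:", sds)
print("hostbyte 0x01:", HOST_BYTES.get(0x01, "unknown"))
```

Both requests are 8-block reads, and the LBA decoded for sds (3264152816) is the same number as the sector in the print_req_error line above, so these look like ordinary reads that failed because the device was unreachable rather than anything unusual about the requests themselves.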


The motherboard is an X9SCL in a Supermicro SC846 chassis (SAS2 backplane), with ECC RAM.

 

The 9211-8i card is running the P20 firmware (20.00.07.00, per the log above).

 

Has anyone had this occur? 

 

I've attached the diagnostics as well

tower-diagnostics-20190331-1416.zip

  • 2 years later...

My card is giving the same code, but it started happening right after I switched to another motherboard, so I'm not sure the card is at fault. In both instances the card was under hardly any load, so overheating doesn't seem likely either.

 

https://forums.unraid.net/topic/106631-disk-read-errors-on-multiple-disk-need-help-diagnosing

 

Edited by Marc_G2

I'm trying to recall exactly what I did.  I do remember putting a fan in a bracket above the card to provide more airflow, and I think that helped.

 

I ended up switching away from unRAID and back to ZFS. I liked the idea and features of unRAID, but situations like this, or improper shutdowns, completely invalidating the array and requiring a parity check made me a bit uneasy about data loss. After switching to Ubuntu and ZFS, I haven't had any issues.

 

Good luck!

 

 

 
