[SOLVED] Possible controller error, disks going offline.


nicr4wks


Hi All,

 

I've recently migrated from 12 x 1 TB disks to 4 x 8 TB, and also moved to a completely new hardware configuration (from an HP Z620 to an HP ML350p G8). It was running great for the first couple of weeks, but now I'm getting disks going offline ('contents emulated') with what I think might be controller issues.

 

Dec 20 00:04:58 Tower kernel: mpt2sas_cm0: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Dec 20 00:04:58 Tower kernel: mpt2sas_cm0: log_info(0x31110101): originator(PL), code(0x11), sub_code(0x0101)
### [PREVIOUS LINE REPEATED 3 TIMES] ###
Dec 20 00:04:58 Tower kernel: sd 1:0:2:0: [sdd] tag#371 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00
Dec 20 00:04:58 Tower kernel: sd 1:0:2:0: [sdd] tag#371 CDB: opcode=0x8a 8a 00 00 00 00 03 43 5d 63 18 00 00 04 00 00 00
Dec 20 00:04:58 Tower kernel: print_req_error: I/O error, dev sdd, sector 14015095576
Dec 20 00:04:58 Tower kernel: md: disk0 write error, sector=14015095512
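
For anyone reading these lines later, here is a rough sketch of how the numbers in them can be pulled apart. The bit layout of the log_info word (4 bits bus type, 4 bits originator, 8 bits code, 16 bits sub code) and the IOP/PL/IR originator names are my assumption, chosen because they reproduce the breakdown the kernel prints above. The CDB is decoded as a standard SCSI WRITE(16) command (opcode 0x8a). The function names are just illustrative.

```python
def decode_log_info(value: int) -> dict:
    """Split a 32-bit mpt2sas log_info word into its bit fields.

    Assumed layout (high to low): bus_type:4, originator:4, code:8, sub_code:16.
    """
    originators = {0: "IOP", 1: "PL", 2: "IR"}  # assumed name mapping
    origin = (value >> 24) & 0xF
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": originators.get(origin, hex(origin)),
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }


def decode_write16_cdb(hex_bytes: str) -> dict:
    """Extract the LBA and transfer length from a WRITE(16) CDB hex dump."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 16 or raw[0] != 0x8A:
        raise ValueError("not a 16-byte WRITE(16) CDB")
    return {
        "lba": int.from_bytes(raw[2:10], "big"),      # bytes 2-9: logical block address
        "blocks": int.from_bytes(raw[10:14], "big"),  # bytes 10-13: transfer length
    }


if __name__ == "__main__":
    print(decode_log_info(0x31110D01))
    print(decode_write16_cdb("8a 00 00 00 00 03 43 5d 63 18 00 00 04 00 00 00"))
```

Run against the lines above, the CDB decodes to LBA 14015095576, the same sector the I/O error reports, with a transfer length of 1024 blocks. The md sector (14015095512) is 64 lower, which would fit a data partition starting at sector 64, though the log itself does not say that.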

 

I have a 4-bay drive cage connected via mini-SAS to an HP expander card, which is linked to a 9210-8i in IT mode.

I also have an 8-bay SFF cage connected to the same expander (sdb-sdk disks are unassigned).

 

I've removed all array disks and re-installed them in different slots, and I've also re-seated the power and SAS cables. This same drive cage and SAS cable were working fine on my last hardware configuration.

The issue does not follow the disk, and it does not stick with the same drive bay on the cage. Before swapping the drives around I was getting the same error on ST8000DM004-2CX188_ZCT3CWNF (sdc), sd 1:0:1:0. After swapping the disks, the errors are now on ST8000DM004-2CX188_ZCT3BKZH (sdd), sd 1:0:2:0.

 

Before swapping the disks I ran an extended SMART test on ST8000DM004-2CX188_ZCT3CWNF which came back OK.

 

Anyone know what could be causing this?

Thanks.

 

 

tower-diagnostics-20201220-1204.zip

11 hours ago, JorgeB said:

This error is happening to multiple devices:

Good catch, I only focused on the disk that dropped 🤦‍♂️

 

It seems to be all of the disks in the same cage chucking up that error. I might remove the backplane and hook up a SAS breakout cable for testing.
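
To check whether more than one device is affected without reading the whole syslog by eye, something like the following could tally the I/O error lines per device. It is a minimal sketch that assumes the lines look like the `print_req_error` line in the first post. The function name is made up, and the syslog path in the usage part is an assumption about where the log lives.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: print_req_error: I/O error, dev sdd, sector 14015095576"
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def count_io_errors(lines):
    """Return a Counter mapping device name -> number of I/O error lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Assumed log location; adjust to wherever the syslog is stored.
    with open("/var/log/syslog", errors="replace") as handle:
        for device, total in count_io_errors(handle).most_common():
            print(f"{device}: {total} I/O errors")
```

If several devices in the same cage show up in the tally, that points away from a single bad disk and towards something they share, such as the cable, backplane or expander port.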

  • 2 weeks later...
  • JorgeB changed the title to [SOLVED] Possible controller error, disks going offline.