9206-16e and disk errors



I've recently upgraded my server to use two 9206-16e cards, flashed with the latest firmware. I've got two systems with this setup and both are experiencing the same issue. One system has large disks, mostly around 8TB; the other has only 2TB disks. Whenever a parity check runs, the parity disk throws read errors. I am really at a loss. Diagnostics are attached.

 

Previously I had a 9201-16e card in without issues. I would have kept it, but I needed to move to a half-height card because the server only has one full-height and one half-height PCIe slot.

unraid-backup-diagnostics-20200101-0823.zip

Edited by icemansid
Jan  1 04:42:52 UnRAID-Backup kernel: mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Jan  1 04:42:52 UnRAID-Backup kernel: sd 4:0:1:0: Power-on or device reset occurred
Jan  1 04:42:52 UnRAID-Backup kernel: mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
### [PREVIOUS LINE REPEATED 2 TIMES] ###
Jan  1 04:42:52 UnRAID-Backup kernel: sd 4:0:1:0: Power-on or device reset occurred
Jan  1 04:42:53 UnRAID-Backup kernel: mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
### [PREVIOUS LINE REPEATED 3 TIMES] ###
Jan  1 04:42:53 UnRAID-Backup kernel: sd 4:0:1:0: Power-on or device reset occurred
Jan  1 04:42:53 UnRAID-Backup kernel: mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
### [PREVIOUS LINE REPEATED 3 TIMES] ###
Jan  1 04:42:53 UnRAID-Backup kernel: sd 4:0:1:0: Power-on or device reset occurred
Jan  1 04:42:54 UnRAID-Backup kernel: mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)

 

 

This looks more like a power/connection issue; try replacing or swapping cables, backplane, or PSU.

 

 


It seems you're running the SATA protocol over the whole cable path (8088, 8087, 8644); if so, you must change the design.

I connect the HBA (2 ports) to an external chassis with two 2-meter cables, then into an expander that fans out to 12 disks.

So HBA to expander runs the SAS protocol, and only the last mile is SATA. Never run SATA over too long a cable.

Edited by Benson
21 minutes ago, johnnie.black said:

Benson is correct; unless you're using a SAS expander, the maximum total cable length from HBA to SATA disks is 1 meter. Any longer and you'll have issues.

Interesting. I'm using 3 x SFF-8088 -> 4 SATA breakout cables from a 9201-16e at 1.5m with no problems. I guess YMMV.

3 hours ago, Benson said:

It seems you're running the SATA protocol over the whole cable path (8088, 8087, 8644); if so, you must change the design.

I connect the HBA (2 ports) to an external chassis with two 2-meter cables, then into an expander that fans out to 12 disks.

So HBA to expander runs the SAS protocol, and only the last mile is SATA. Never run SATA over too long a cable.

Can you expand on which SAS expander you are using, and whether there is any I/O limitation when accessing 12 disks over a single SAS cable? Also, what HBA are you using for this?

11 minutes ago, icemansid said:

Can you expand on which SAS expander you are using, and whether there is any I/O limitation when accessing 12 disks over a single SAS cable? Also, what HBA are you using for this?

You can refer to the post below.

 

One link may have some performance hit for 12 disks: SAS2 is 6Gb/s x4, roughly 2GB/s of bandwidth, so each disk gets about 166MB/s.
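As a sanity check of that arithmetic (a sketch; ~600MB/s usable per 6Gb/s lane after 8b/10b encoding is a nominal figure, and ~2000MB/s is the practical total quoted above):

```shell
# Per-disk bandwidth when 12 disks share one SAS2 x4 wide link.
# SAS2 runs 6Gb/s per lane; 8b/10b encoding leaves ~600MB/s usable per lane,
# so a x4 link tops out around 2400MB/s raw, roughly 2000MB/s in practice.
total_mb=2000   # practical throughput of a 6Gb/s x4 SAS2 link, in MB/s
disks=12
echo "$((total_mb / disks)) MB/s per disk"   # → 166 MB/s per disk
```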

 

The HBA was a 9211-8i, and the 2-meter cable was 8087 to 8087.

Edited by Benson

Quick update: the replacement cables are in and, so far, the issues seem to be resolved, even though I am over the 1m total length spec (currently at 1.5m). A parity check is running with no errors. I also ordered a SAS expander, so I will be implementing that later this week as well.


Update #2: for some reason, single-disk speeds are normal, but when reading from all disks at once, speeds are slow. This was apparent with a parity check and by using the DiskSpeed container to benchmark all disks on the controller. By all accounts, the new SAS card should be much faster than the old one.

diskspeed.JPG

diskspeed2.JPG


Check if you have write cache enabled on your drives:

hdparm -W 1 /dev/sdX   -> enables write cache

hdparm -W 0 /dev/sdX  -> disables write cache

 

Do this for all your drives; maybe that solves the problem. It could be related to the issues in these threads:

https://forums.unraid.net/topic/79966-enable-write-cache/

https://forums.unraid.net/topic/80074-sata-parity-write-cache-disabled/page/2/
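A quick way to apply that across all drives (a sketch; the device names are hypothetical, and it's shown as a dry run — drop the leading `echo` to actually change the setting):

```shell
# Enable the write cache on each array drive with hdparm.
# Dry run: each command is printed instead of executed.
for dev in /dev/sdb /dev/sdc /dev/sdd; do
    echo hdparm -W 1 "$dev"
done
```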

 

Edited by RedReddington
37 minutes ago, RedReddington said:

Check if you have write cache enabled on your drives:

hdparm -W 1 /dev/sdX   -> enables write cache

hdparm -W 0 /dev/sdX  -> disables write cache

 

Do this for all your drives; maybe that solves the problem. It could be related to the issues in these threads:

https://forums.unraid.net/topic/79966-enable-write-cache/

https://forums.unraid.net/topic/80074-sata-parity-write-cache-disabled/page/2/

 

No real change in multi-drive reads. It does appear to have increased single-drive read speeds, though. Also, I want to mention that the ONLY things I changed were the SAS card and cables.

40 minutes ago, johnnie.black said:

Check the HBA link speed/width. I believe DiskSpeed shows the current one; if not, you can check with lspci -vv.

I've added that to the image. Comparing that to my other system with H310 cards installed, this one should be faster.

 

diskspeed3.JPG
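For the lspci check mentioned above, something like this works (a sketch; `-d 1000:` filters on the LSI/Broadcom PCI vendor ID, and the sample LnkSta line below is roughly what a healthy PCIe 3.0 x8 card like the 2308 should negotiate — exact lspci output varies by version):

```shell
# On the server itself, compare supported vs negotiated PCIe link for the HBA:
#   lspci -d 1000: -vv | grep -E 'LnkCap:|LnkSta:'
# Parsing a sample LnkSta line of the form lspci prints:
sample='LnkSta: Speed 8GT/s, Width x8'
speed=$(echo "$sample" | grep -oE '[0-9.]+GT/s')
width=$(echo "$sample" | grep -oE 'Width x[0-9]+')
echo "$speed, $width"   # → 8GT/s, Width x8
```

If the negotiated speed or width is lower than the card's LnkCap (for example x4 instead of x8), the slot itself is the bottleneck.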

5 minutes ago, icemansid said:

Comparing that to my other system with H310 cards installed, this one should be faster.

It wouldn't be faster, since 8 disks won't use all the available bandwidth on a PERC H310, but it shouldn't be slower either. It's strange that the combined speed is so low; if you want, post the diags and there might be something visible there.

5 minutes ago, johnnie.black said:

It wouldn't be faster, since 8 disks won't use all the available bandwidth on a PERC H310, but it shouldn't be slower either. It's strange that the combined speed is so low; if you want, post the diags and there might be something visible there.

The diags are at the top of this post. I haven't rebooted since, but I can pull a new set if you like. There are a ton of drive errors, which have since been resolved by the new SAS cables, but speeds are still way off from the previous SAS card. The H310 shows a max throughput of only 4GB/s compared to 6GB/s on the 2308.

11 minutes ago, Benson said:

I am not familiar with the DiskSpeed docker, but a simultaneous bandwidth of only 347MB/s seems abnormal.

Here is the same benchmark from the H310, technically a slower card, though it has much better/faster drives attached to it. Single-disk reads are nearly identical to multi-disk reads. This is how I would expect the 2308 card to perform.

diskspeed4.JPG

12 minutes ago, icemansid said:

Here is the same benchmark from the H310, technically a slower card, though it has much better/faster drives attached to it. Single-disk reads are nearly identical to multi-disk reads. This is how I would expect the 2308 card to perform.

diskspeed4.JPG

I know. I haven't seen a speed bottleneck with 2008 or 2308 chips connecting 12 or 16 disks in a setup like the one above.

The 9206-16e has two 2308 chips; I suppose it must have a PCIe switch on the HBA to connect the two chips, so there may be some compatibility issue with the motherboard.

Or could you check whether the HBA is overheating? I am not sure whether an HBA would throttle, though, because I have never seen that.

Edited by Benson
28 minutes ago, Benson said:

I know. I haven't seen a speed bottleneck with 2008 or 2308 chips connecting 12 or 16 disks in a setup like the one above.

The 9206-16e has two 2308 chips; I suppose it must have a PCIe switch on the HBA to connect the two chips, so there may be some compatibility issue with the motherboard.

Or could you check whether the HBA is overheating? I am not sure whether an HBA would throttle, though, because I have never seen that.

It has 4 ports on it: two connect to one chip, and two connect to the other.

 

Also, on the overheating: it's unlikely, as this card is in a 1U HP enterprise server. It has enough fan power to take flight.

Edited by icemansid
more details
