RobJ

Marvell disk controller chipsets and virtualization

85 posts in this topic


Note: this is NOT an unRAID bug, but a defect in Marvell disk controller chipset modules.

 

unRAID OS Version:  Probably all v6 releases with virtualization support enabled

 

Description:  There is a bad bug in the Marvell code for certain disk controller chipsets that leaves connected drives unable to communicate when the IOMMU is enabled.  If VT-d (Intel) or AMD-Vi (AMD) is turned on, DMA reads fail and the drives are unavailable; once it is turned off, the drives appear and work fine.  These cards worked fine in v5, probably because IOMMU support was not included there, even when it was enabled in the BIOS.  Because there appears to be a patch that fixes the problem, I believe some priority should be put on integrating it (v6.0.N?): a few v5 users converting to v6 are going to discover that all drives attached to such a card have disappeared (for an example, see the support post linked under Other information below).  The workaround for now is to turn off virtualization.  (I'm not sure whether it is only necessary to turn off VT-d/AMD-Vi, or whether VT-x/AMD-V must also be turned off.)

 

How to reproduce:  Run unRAID v6 with virtualization settings turned on in the BIOS and a disk controller (card or onboard) with a Marvell chipset from this list: 9120, 9123, 9125, 9128, 9130, 9143, 9172, 9230 (usually referred to by full model number, e.g. 88SE9123).  I do not know if that list is complete.  Edit: found more: 9215, 9220, 9485, and a PCI-X card, the 88SX6081.

 

Expected results:  All connected drives working fine, and able to fully utilize virtualization.

 

Actual results:  No connected drives are available if the BIOS virtualization settings are turned on.  Drives are available and work fine if the virtualization settings are turned off (the user turned off 'SVM' in an updated Gigabyte BIOS).

 

Other information:

Bug URL:  DMA Read on Marvell 88SE9128 fails when Intel's IOMMU is on (also applies to AMD)

Patch diff URL:  https://bugzilla.kernel.org/attachment.cgi?id=124001&action=diff (there may be a better URL for this patch, and I have no idea how usable this patch is for us)

Example support post:  STLab A-520 fails with Unraid 6.0rc6 (with before and after diagnostics files)

Update:  A potential workaround: some users report success with adding iommu=pt to the append line of the syslinux.conf file on the boot flash.

  Example - change this

      append  initrd=/bzroot

  To this

      append  iommu=pt  initrd=/bzroot
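If you would rather script that edit than make it by hand, here is a minimal Python sketch. It assumes the config is plain text with one append directive per boot entry, as in the example above; the function name add_iommu_pt is hypothetical, and the exact name and location of the config file on the flash may differ between releases, so check your own boot entries and keep a backup copy first.

```python
import re

# An "append" directive: leading whitespace + keyword, the separator, then the options.
APPEND_RE = re.compile(r"^(\s*append)(\s+)(.*)$", re.IGNORECASE)


def add_iommu_pt(cfg_text: str) -> str:
    """Return cfg_text with iommu=pt added to every append line that lacks it."""
    out = []
    for line in cfg_text.splitlines():
        m = APPEND_RE.match(line)
        if m and "iommu=pt" not in m.group(3).split():
            # Put the new option first, mirroring the example above.
            line = f"{m.group(1)}{m.group(2)}iommu=pt {m.group(3)}"
        out.append(line)
    # Preserve a trailing newline if the original text had one.
    return "\n".join(out) + ("\n" if cfg_text.endswith("\n") else "")


if __name__ == "__main__":
    # Illustrative boot entry only; the labels in your file will differ.
    sample = "label unRAID OS\n  kernel /bzimage\n  append  initrd=/bzroot\n"
    print(add_iommu_pt(sample), end="")
```

To use it, read the config file's text, pass it through add_iommu_pt, and write the result back. Lines that already contain iommu=pt are skipped, so running it twice leaves the file unchanged.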

Also, try updating to the latest firmware for the card; no guarantees, but it *may* help.

 

Symptoms of failure:

For AMD, the following syslog error is logged on the first attempt to identify the connected drives (I don't know yet what appears in an Intel syslog):

Jun 13 17:45:52 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0019 address=0x000000000009ea00 flags=0x0000]

 

For each connected drive, the SATA link usually comes up, followed by one or both of the following sequences of log entries:

Either -

Jun 13 17:45:52 Tower kernel: ata12: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jun 13 17:45:52 Tower kernel: ata12.00: qc timeout (cmd 0xec)
Jun 13 17:45:52 Tower kernel: ata12.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jun 13 17:45:52 Tower kernel: ata12: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jun 13 17:45:52 Tower kernel: ata12.00: qc timeout (cmd 0xec)
Jun 13 17:45:52 Tower kernel: ata12.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jun 13 17:45:52 Tower kernel: ata12: limiting SATA link speed to 3.0 Gbps
Jun 13 17:45:52 Tower kernel: ata12.00: qc timeout (cmd 0xec)
Jun 13 17:45:52 Tower kernel: ata12.00: failed to IDENTIFY (I/O error, err_mask=0x4)

Or -

Jun 13 17:45:52 Tower kernel: ata9: link is slow to respond, please be patient (ready=0)
Jun 13 17:45:52 Tower kernel: ata9: COMRESET failed (errno=-16)
Jun 13 17:45:52 Tower kernel: ata9: link is slow to respond, please be patient (ready=0)
Jun 13 17:45:52 Tower kernel: ata9: COMRESET failed (errno=-16)
Jun 13 17:45:52 Tower kernel: ata9: link is slow to respond, please be patient (ready=0)
Jun 13 17:45:52 Tower kernel: ata9: COMRESET failed (errno=-16)
Jun 13 17:45:52 Tower kernel: ata9: limiting SATA link speed to 3.0 Gbps
Jun 13 17:45:52 Tower kernel: ata9: COMRESET failed (errno=-16)
Jun 13 17:45:52 Tower kernel: ata9: reset failed, giving up
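A rough way to check a saved syslog for these symptoms is to count the matching lines. The Python sketch below only knows the message formats quoted above (the AMD-Vi IO_PAGE_FAULT event and the failed IDENTIFY / COMRESET sequences); Intel systems log DMAR messages instead, which it does not try to match, and other kernel versions may word things differently. The name scan_syslog is hypothetical.

```python
import re
from collections import Counter

# Message formats copied from the excerpts above; other kernels may word them differently.
# IGNORECASE also catches pastes where "IO_PAGE_FAULT" was mangled to "iO_PAGE_FAULT".
PAGE_FAULT_RE = re.compile(r"AMD-Vi: Event logged \[IO_PAGE_FAULT device=(\S+)", re.IGNORECASE)
PORT_ERROR_RE = re.compile(
    r"kernel: (ata\d+)(?:\.\d+)?: (?:failed to IDENTIFY|COMRESET failed|reset failed, giving up)"
)


def scan_syslog(lines):
    """Count AMD-Vi page faults per PCI device and IDENTIFY/reset failures per ATA port."""
    faults = Counter()
    port_errors = Counter()
    for line in lines:
        m = PAGE_FAULT_RE.search(line)
        if m:
            faults[m.group(1)] += 1  # PCI address such as "01:00.0"
        m = PORT_ERROR_RE.search(line)
        if m:
            port_errors[m.group(1)] += 1  # port such as "ata12" (device suffix dropped)
    return faults, port_errors
```

For example, open a saved syslog (on unRAID usually /var/log/syslog, though treat that path as an assumption) and pass the file object to scan_syslog. A page-fault count on the controller's PCI address, together with failures on every port behind that controller, matches the failure pattern described in this thread.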

 


Wow, nice find and nice report!  This patch will not get into 6.0, but there needs to be a 6.0.1, right?


I'm going to suggest putting a low priority on this one, as so far only one unRAID user has been affected.  The problem is unmistakable, so if it were a big deal, we should have seen more reports, and we haven't.  There was one hint somewhere that the patch *may* already be included, but I couldn't confirm it.  More research is clearly needed, but until it becomes a more pressing problem, we all have more important things to do.

This affected me back in early v6 betas.  I worked around it by using a different machine... 

 

I'd be surprised if there weren't more users affected.


Works for me in 6.0.0.

 

I had been running with VT-d disabled in BIOS since the first beta I tried in order to get the onboard 88SE9172 to work in 6. Tried enabling VT-d now after RobJ's post, and all my disks are still present and functional. Yay! :-)

 

This is the second report I've seen that a card that *shouldn't* work (according to outside reports) *does* work.  The other report seemed to be a card with a newer firmware, so perhaps Marvell has quietly fixed the problem (but only available in new purchases).


...  this is ... a defect in Marvell disk controller chipset modules ...

 

I think it's a Linux driver issue, not an actual chipset defect.  I can't confirm that, but I HAD a GA-P55A-UD5 with a Core i7-870 until late 2013, and ran a BUNCH of VMware VMs on it.  I had both VT-x and VT-d enabled and never had a problem.  The VMs all ran under VMware Workstation on Windows 7 Ultimate.  That board used the Marvell 88SE9128, so I assume this issue wasn't a problem with Windows.  I also don't know whether the symptoms might be different with an AMD processor.

 

 

 

I'm more inclined to believe that the kernel now includes fixes than that someone replaced my hardware without me noticing.  :D

 

The 88SE9172 did not work properly in 6b8; drives dropped out immediately on boot after a bunch of DMAR errors, until I connected the IOMMU dots and disabled VT-d. In 6.0.0, on the other hand, the drives still work after enabling VT-d. The only hardware changes between 6b8 and 6.0.0 have been the addition of a couple more hard drives and a new NIC. No firmware updates or motherboard replacements...

 

 

 


All,

 

This issue had been affecting me without my knowing it, and thankfully I found this post.

 

I raised a separate issue about not being able to get my HighPoint 620A to work in unRAID. I followed a script that Elkay14 submitted, with no joy. I tried a BIOS update on the controller, with no joy. I tried different drives, PCIe slots, cables and machines; you name it, I have tried it. The thing that was bugging me is that the card was being seen in the Device Config in unRAID, but the drives were not appearing in the array.

 

The moment I popped into BIOS and disabled the VT-d setting my drives appeared in UnRAID, so this is very much a relevant issue.

 

 


Very odd. I have a 9120 in my HP 620, but it is about 2 years old now (I think) ... maybe that is why it still works ... an older, non-buggy BIOS.


Hello

I was going to upgrade my main unRaid Server this weekend

 

ran parity checks, moved files around this week, moved plugins to my backup server, prepared a new parity drive to add ...

 

I have two AOC-SASLP-MV8 cards in it and was curious if they were affected by this bug?

these should be 6480 chips

and what should be a Rosewill 4-port PCI-E card, or I might just be using the motherboard ports

(not in the list in the first post)

 

(and another server with two AOC-SASLP-MV8 cards in it, running under ESXi 5 Update 1) these should be 9480 chips

(not in the list in the first post)

but I am very sure I have had a card in this server at one time that had one of the listed chipsets

 

I want to move all servers from unRaid 5.05 to unRaid 6 with hardware virtualization and Docker eventually

 

I'm asking because the preclears I'm running to torture-test the drive should be finished at 5:55pm PDT tonight, when I will begin the process.

 

 

 

Linux 3.9.11p-unRAID.
root@storage:~# cd /boot/custom
root@storage:/boot/custom# lspci
00:00.0 Host bridge: ATI Technologies Inc RD890 PCI to PCI bridge (external gfx0 port B) (rev 02)
00:00.2 Generic system peripheral [0806]: ATI Technologies Inc Unknown device 5a23
00:02.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port B)
00:03.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port C)
00:09.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port H)
00:0a.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (external gfx1 port A)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode] (rev 40)
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 42)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller (rev 40)
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge (rev 40)
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:15.0 PCI bridge: ATI Technologies Inc Unknown device 43a0
00:15.1 PCI bridge: ATI Technologies Inc Unknown device 43a1
00:15.2 PCI bridge: ATI Technologies Inc Unknown device 43a2
00:16.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:16.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] Unknown device 1600
00:18.1 Host bridge: Advanced Micro Devices [AMD] Unknown device 1601
00:18.2 Host bridge: Advanced Micro Devices [AMD] Unknown device 1602
00:18.3 Host bridge: Advanced Micro Devices [AMD] Unknown device 1603
00:18.4 Host bridge: Advanced Micro Devices [AMD] Unknown device 1604
00:18.5 Host bridge: Advanced Micro Devices [AMD] Unknown device 1605
01:00.0 RAID bus controller: Unknown device 1b4b:9485 (rev c3)
02:00.0 RAID bus controller: Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B (rev 01)
03:00.0 USB Controller: Unknown device 1b6f:7023 (rev 01)
04:00.0 SATA controller: Unknown device 1b4b:9172 (rev 11)
05:06.0 VGA compatible controller: nVidia Corporation NV44A [GeForce 6200] (rev a1)
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
07:00.0 USB Controller: Unknown device 1b6f:7023 (rev 01)
08:00.0 SATA controller: Unknown device 1b4b:9172 (rev 11)
root@storage:/boot/custom# 

 

Thanks for your time and help,

Bobby

 

and if they're working without trouble I will update so others know they're working

According to your lspci report, you have 2 of the suspect cards, so it will be interesting to hear your results.  The suspect controllers usually show 1b4b: followed by the number from the list above; yours is 9172.
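That check can be automated. The Python sketch below scans lspci output for the Marvell vendor ID 1b4b and flags device IDs that match the chipset numbers in the first post. It rests on the rule of thumb above that the PCI device ID is usually the chipset number, so a miss is not proof that a card is safe; the PCI-X 88SX6081 is left out because no device ID for it appears in this thread. Newer lspci versions print chip names instead of numeric IDs, so feed it the output of lspci -nn, which appends the [vendor:device] pair to each line. The function name suspect_controllers is hypothetical.

```python
import re

# Chipset numbers from the first post, including the later additions.
# The PCI-X 88SX6081 is omitted because its device ID is not given in this thread.
SUSPECT_IDS = {
    "9120", "9123", "9125", "9128", "9130", "9143", "9172",
    "9215", "9220", "9230", "9485",
}

# Marvell vendor ID 1b4b followed by a four-hex-digit device ID, as printed by lspci.
MARVELL_RE = re.compile(r"\b1b4b:([0-9a-f]{4})\b", re.IGNORECASE)


def suspect_controllers(lspci_text):
    """Return (slot, device_id) for each Marvell device whose ID is on the suspect list."""
    hits = []
    for line in lspci_text.splitlines():
        m = MARVELL_RE.search(line)
        if m and m.group(1).lower() in SUSPECT_IDS:
            slot = line.split(maxsplit=1)[0]  # first field, e.g. "04:00.0"
            hits.append((slot, m.group(1).lower()))
    return hits
```

Run against the lspci listing above, it would flag 01:00.0 (9485) as well as the two 9172 controllers at 04:00.0 and 08:00.0; the MV64460 at 02:00.0 is not flagged because its line carries no 1b4b ID.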


Hello

 

it looks like my upgrade went smoothly so far

 

How long did it take for those errors to start popping up? 

(immediately, with missing drives?)

 

I can see and access all drives in unRaid 6.0.1,

so currently I'm not sure if the patch has already been applied

 

I rebooted a few times to see if I had missed something

I double-checked that IOMMU was enabled;

in the Gigabyte EFI I could only find one setting, IOMMU Enabled
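Because a BIOS setting alone does not show whether the kernel is actually using the IOMMU, it can help to check from the running system too. This Python sketch assumes the usual sysfs layout: /sys/class/iommu lists the IOMMU units the kernel has registered (for example dmar0 on Intel systems), and /sys/kernel/iommu_groups holds one numbered entry per IOMMU group, so both being empty suggests the IOMMU is not active. It also reports whether iommu=pt is on the kernel command line. The function name iommu_status and its parameters are hypothetical.

```python
import os


def iommu_status(sysfs_root="/sys", cmdline_path="/proc/cmdline"):
    """Summarise what the running kernel reports about the IOMMU."""
    units_dir = os.path.join(sysfs_root, "class", "iommu")
    groups_dir = os.path.join(sysfs_root, "kernel", "iommu_groups")

    # A missing directory is treated the same as an empty one.
    units = sorted(os.listdir(units_dir)) if os.path.isdir(units_dir) else []
    groups = os.listdir(groups_dir) if os.path.isdir(groups_dir) else []

    # Kernel parameters are whitespace-separated on a single line.
    try:
        with open(cmdline_path) as f:
            params = f.read().split()
    except OSError:
        params = []

    return {
        "units": units,
        "group_count": len(groups),
        "active": bool(units or groups),
        "passthrough": "iommu=pt" in params,
    }
```

Calling iommu_status() with no arguments reads the live /sys and /proc/cmdline; the parameters exist only so the function can be pointed at a copy of those files.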

 

 

I do not have Docker or VMs enabled yet, so I'm not sure if that would hide the errors

 

 

Included is my diagnostics file from this morning

 

For the upgrade

I shut down the server and made several copies of the USB key's contents in different easily accessible places

made a V5 directory on the USB Key and copied all files / folders from the root directory under it

downloaded and unzipped unRAID 6.0.1

Copied the files and folders from that directory to the root of the USB drive

ran make_bootable.bat with "Run as administrator" on the USB Key

then from the old config directory I copied the old shares directory and the disk.cfg, ident.cfg, passwd, Pro2.key, shadow, share.cfg, smb-extra.conf, smbpasswd, super.dat, super.old files all to the new config directory.

Ejected the USB key and waited for it to say it was safe to remove

inserted the USB key and booted my primary unRaid server.
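The config-copy step above can also be scripted, for example from the PC with the flash drive mounted. This Python sketch copies the items listed in the post from the old config directory into the new one and returns anything it could not find, so a missing file is noticed before booting. The function name restore_config is hypothetical; Pro2.key is the key file named in this particular post (yours will match your own license), and dirs_exist_ok requires Python 3.8 or later.

```python
import shutil
from pathlib import Path

# Items named in the post above; "shares" is a directory, the rest are files.
# Pro2.key is that poster's key file; substitute the name of your own key file.
CONFIG_ITEMS = (
    "shares", "disk.cfg", "ident.cfg", "passwd", "Pro2.key", "shadow",
    "share.cfg", "smb-extra.conf", "smbpasswd", "super.dat", "super.old",
)


def restore_config(old_config, new_config, items=CONFIG_ITEMS):
    """Copy the listed items from old_config into new_config; return the names not found."""
    old_config, new_config = Path(old_config), Path(new_config)
    new_config.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in items:
        src, dst = old_config / name, new_config / name
        if src.is_dir():
            # Merge into an existing directory; copy file contents only.
            shutil.copytree(src, dst, dirs_exist_ok=True, copy_function=shutil.copyfile)
        elif src.is_file():
            # Contents only: FAT flash drives do not keep Unix permissions anyway.
            shutil.copyfile(src, dst)
        else:
            missing.append(name)
    return missing
```

With the flash mounted as, say, E: (a hypothetical drive letter), restore_config("E:/V5/config", "E:/config") would mirror the manual step, assuming the V5 directory was created as described above.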

 

It booted and recognized all disks and started the array

 

I spent several hours looking for something that went wrong ... VFS Recycle even seems to be working smoothly.

 

An incredibly easy upgrade. Thank you to the people at Limetech

 

Hope this helps

Bobby

storage-diagnostics-20150627-0809.zip


All reports indicate the drives would not show up at all, from the very start, so you are fine!  Thanks for the diagnostics; when I have a little time, I'll take a look.


On Monday morning, if time permits, I'll try copying 6.0.0 onto the server and see if there is a difference

 

I'll video the BIOS POST to confirm which cards put up their BIOS screens, as confirmation of the IDs


I have a Supermicro SAS and a Supermicro SAS2 installed. Both have the 9480 chipset but I don't see anything like that in my syslog. I'm having a very hard time getting a parity done, so something is wrong somewhere.


Are the system details in your sig correct?

 

i.e. you're still using RC4 and the Intel DQ35MP?

 

Did this combination work okay with v5?    ... and if so, has it always had issues after you upgraded to v6?  [and if the info in your sig is correct, WHY are you using RC4 instead of the final release?]

 

 

This bug does not apply to you.  I don't know of any reports concerning the 9480, just the ones listed in the first post.  And if it did apply, you wouldn't be complaining about parity, you wouldn't see any of the attached drives at all.

I'm in the middle of a build transition, so my signature isn't correct. It always worked well on v5 and seems to be OK on v6. No errors in my syslog.

OK, thanks. Still checking why parity is so slow.

 

Hi RobJ,

I have one of the Supermicro SAS2LP-MV8 cards with the 9480 chipset, and I am getting this issue when I have IOMMU enabled. I have been running OK with IOMMU disabled on the betas since getting the card, but I just upgraded to 6.0.1, re-tested, and am still getting the problem.

I seem to recall there was meant to be a patch for this problem introduced in kernel 3.17 (based on this post: http://lime-technology.com/forum/index.php?topic=35190.0). Are there any updates on this, or a workaround for this problem on the horizon?

Thanks

syslog.txt

lspci.txt

I can confirm you do have the issue.  The chipset appears to be a 9485 (possibly in the 9480 family), and I have added it to the first post.

 

For now, you cannot turn on IOMMU.

 

I don't know what the current state of the patch is, as it's rather confusing at present.  I suspect a version of the patch is in now, added by the kernel developers, and while it may have helped some users, it did not help everyone.  Do make sure you have the latest firmware for the card.


This probably explains why my SATA card (StarTech 4-port) didn't work; I thought the card had died.  I just used a different, non-Marvell card.

 

Oddly, the onboard Marvell on my X10SBA works fine.

I'm running two SAS2LP cards with no problems. Been very stable and have IOMMU enabled. I guess it is with certain hardware and not all.


I have a Gigabyte X58A-UD3R motherboard and I have this problem.

The motherboard has 2 SATA ports on a Marvell 9128 controller.

unRAID 6.1.2

 

Any idea how to apply that patch? Or do you plan to update unRAID with this patch?

 

 

Sep 17 17:48:52 UNRAID kernel: ata11.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Sep 17 17:48:52 UNRAID kernel: ata11: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 17 17:48:52 UNRAID kernel: ata12.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Sep 17 17:48:52 UNRAID kernel: ata18.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Sep 17 17:48:52 UNRAID kernel: ata12: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 17 17:48:52 UNRAID kernel: ata18: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 17 17:48:52 UNRAID kernel: ata11.00: qc timeout (cmd 0xec)
Sep 17 17:48:52 UNRAID kernel: ata11.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Sep 17 17:48:52 UNRAID kernel: ata11: limiting SATA link speed to 3.0 Gbps
Sep 17 17:48:52 UNRAID kernel: ata11: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Sep 17 17:48:52 UNRAID kernel: ata12.00: qc timeout (cmd 0xec)
Sep 17 17:48:52 UNRAID kernel: ata18.00: qc timeout (cmd 0xa1)
Sep 17 17:48:52 UNRAID kernel: ata12.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Sep 17 17:48:52 UNRAID kernel: ata12: limiting SATA link speed to 3.0 Gbps
Sep 17 17:48:52 UNRAID kernel: ata18.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Sep 17 17:48:52 UNRAID kernel: ata18: limiting SATA link speed to 1.5 Gbps
Sep 17 17:48:52 UNRAID kernel: ata12: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Sep 17 17:48:52 UNRAID kernel: ata18: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Sep 17 17:48:52 UNRAID kernel: ata11.00: qc timeout (cmd 0xec)
Sep 17 17:48:52 UNRAID kernel: ata11.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Sep 17 17:48:52 UNRAID kernel: ata11: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Sep 17 17:48:52 UNRAID kernel: ata12.00: qc timeout (cmd 0xec)
Sep 17 17:48:52 UNRAID kernel: ata18.00: qc timeout (cmd 0xa1)
Sep 17 17:48:52 UNRAID kernel: ata12.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Sep 17 17:48:52 UNRAID kernel: ata18.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Sep 17 17:48:52 UNRAID kernel: ata12: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Sep 17 17:48:52 UNRAID kernel: ata18: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
