Missing IOMMU dma-alias-v4 patches that resolve Marvell controller issues


Hi,

 

I was wondering if you could include the dma-alias-v4 patches written by Alex Williamson (https://lkml.org/lkml/2014/5/22/685) in your next kernel, please.  The git location for the patches is: git://github.com/awilliam/linux-vfio.git dma-alias-v4

 

On Unraid 6 beta 7 with VT-d enabled, a number of Marvell controllers, including the 9172s that supply 6 SATA ports on my motherboard, hit errors on startup that leave the drives offline.  The dma-alias-v4 patches only went upstream with 3.17-rc1.  I compiled 3.17-rc2 (current at the time) for my Unraid box and the problem is now resolved.  Given the data corruption regression that has just been found with ReiserFS, I would much prefer to run the stock Unraid kernel, so that I'm confident I have any Xen/KVM/ReiserFS patches that you have added yourselves but that aren't yet upstream, such as the ACS override patches.

 

Thanks to the whole Limetech team for the recent impressive progress, Docker and KVM are simply brilliant.

 

Link to comment
  • 3 weeks later...

Wanted to write back on this since we hadn't gotten back to you yet.  I think we investigated this and found that the patch you refer to didn't actually solve the issue after all.  I'll ask Tom about this next week when he returns to see where we left this.  Keep in mind, a patch like this would be HIGHLY scrutinized by the team here.  Alex Williamson is a ROCK STAR, no doubt, but he won't be the one to have to support you if this patch caused any problems with your SATA devices ;-)

 

All of this said, we at least owe you a response on this and will do so next week.  Thanks.

Link to comment

Thanks for the reply, Jon.  If you read the changelog for 3.17-rc1 you will see that these patches were included.  Since they are in the 3.17 kernel now, I would expect them to be safe; people have been testing the patch set since its final version (v4) came out in May 2014.  I've been compiling my own kernels for Unraid to fix the issue (3.17-rc1 and 3.17-rc4), and they definitely resolved it: my Unraid box has been running beta 9 with 3.17-rc4 since the day beta 9 was released without any issues whatsoever (and beta 8 with 3.17-rc1 before that).  I have been trying to overcome the dreaded NVIDIA code 43 error, however, which is why I'd prefer to be running the Unraid kernel instead; I've obviously missed some other patch that you've included.

 

For these patches to have any effect, the specific SATA controller model that is having the issue also needs its own line in the quirks.c file.  In the 3.17-rc1+ kernels, at least, quirks.c has the required line for my Marvell 9172 controller.  If the quirks.c line for your affected controller is missing, you will get the same result as not having the patch at all.
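
For anyone who wants to check this on their own kernel tree, here is a rough Python sketch. It is only an illustration and rests on some assumptions that are not stated in this thread: that quirk entries in drivers/pci/quirks.c are registered with DECLARE_PCI_FIXUP_* macros using a hex literal for the device ID, that the DMA alias handler is named quirk_dma_func1_alias, and that Marvell controllers appear in `lspci -nn` output with vendor ID 1b4b. Verify all three against your own source and hardware before trusting the result.

```python
import re

# Assumption: PCI vendor ID under which Marvell SATA controllers are listed.
MARVELL_VENDOR_ID = "1b4b"

# Assumption: quirk entries look like
#   DECLARE_PCI_FIXUP_HEADER(<vendor macro>, 0x9172, <handler>);
# Entries that use a symbolic device ID instead of a hex literal are not matched.
FIXUP_RE = re.compile(
    r"DECLARE_PCI_FIXUP_\w+\s*\(\s*\w+\s*,\s*(0x[0-9a-fA-F]+)\s*,\s*(\w+)\s*\)"
)

# Matches the [vendor:device] pair printed by `lspci -nn`.
LSPCI_ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def marvell_device_ids(lspci_text):
    """Return sorted device IDs of Marvell devices found in `lspci -nn` output."""
    pairs = LSPCI_ID_RE.findall(lspci_text.lower())
    return sorted({dev for ven, dev in pairs if ven == MARVELL_VENDOR_ID})


def quirked_device_ids(quirks_source, handler="quirk_dma_func1_alias"):
    """Return device IDs registered for `handler` (name is an assumption)."""
    return {
        dev[2:].lower()
        for dev, func in FIXUP_RE.findall(quirks_source)
        if func == handler
    }


def missing_quirks(lspci_text, quirks_source, handler="quirk_dma_func1_alias"):
    """List Marvell device IDs present on the system but absent from quirks.c."""
    quirked = quirked_device_ids(quirks_source, handler)
    return [dev for dev in marvell_device_ids(lspci_text) if dev not in quirked]
```

To use it, pass the saved output of `lspci -nn` and the text of drivers/pci/quirks.c from the kernel you are building. Any ID that missing_quirks returns would, under the assumptions above, behave as if the patch were not applied.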

 

Given the need to ensure Unraid is stable for all, I appreciate your concerns.  Perhaps the route to take, at least until 3.17 final is released, is to compile and post a kernel with this patch added, so that affected unraiders can make their own call and use that custom kernel instead (accepting all the risks as they do).  For me the bug is a total showstopper: 6 of my 12 SATA ports are on 9172 controllers (GA-X79-UP4 motherboard), so I can't start my array without the patch, and I have too much data to drop half of my disks from the array.

 

Thanks again for the response, I appreciate knowing you guys are thinking about the issue.

Link to comment

Thank you so much for taking the time to share all this information.  I wasn't even aware that this patch had made it into 3.17-rc1 (hadn't looked either, tbh).

 

Tom M gets back on Tuesday and I will review with him then.  As an FYI, the dreaded code 43 error can be resolved with this patch from Alex Williamson:

 

http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00319.html

 

In case you feel adventurous and want to compile another custom kernel and get GPU pass through working ;-).

Link to comment

My apologies, I forgot to link back from here, to another user with the same problem, but with a 9128 board.  See here.  I included a link there to this thread, plus a link to full discussion of the issue and typical error messages and behavior, plus a link to the final patch.  I believe this patch is exactly what he needs, so long as it includes an ID for his 9128.  And I suspect there are a significant number of others with SAS controllers with Marvell chipsets that could also benefit from this patch.

Link to comment

Thanks, Rob.  I'll check with Tom on this next week then.  If a large number of users are affected and the patch has already been merged upstream, I can't see a reason why we wouldn't want to include this.

Link to comment
