Parity and ATA errors on SAS2LP-MV8 when GPU installed.


I have been loving unRAID for about 3 years now. I recently added a new drive to my array (pre-cleared, then added), and errors were corrected during the parity sync. The weekly parity sync then corrected more errors, which led me to check the logs and discover that...

I'm getting 'ata' errors on drives attached to my Supermicro SAS2LP-MV8, which are causing parity sync errors in the hundreds.

 

After a week of troubleshooting (checking cables, buying new cables, reseating the PCI-e cards), I have isolated the issue: it only happens when my GTX 670 GPU is installed in the machine. I have tried the SAS card in the PCI-e 2.0 slot with the GPU in the 3.0 slot, and also the other way round; with both cards installed I get the errors every 10 minutes or so. I have also tried with the GPU installed but its PCI-e power from the PSU not attached, which possibly rules out a bad PSU.

 

I have had my GPU out for 48 hours now with no errors, so I'm almost certain that having it installed is causing the issues.

 

Unfortunately I don't have the full diagnostics from when the errors were occurring, only the full syslog, because I only read the read-first post just now when I came to post in the forums, and I'm worried about my data getting corrupted if I put my GPU back in.

 

I have attached my full syslog from when the errors were occurring, and also my full diagnostics (taken while no errors were occurring).

 

 

Many thanks in advance for your help.

syslog-witherrors.txt.zip

ness-diagnostics-20160327-0759-noerrors.zip

Link to comment

I meant, are they both being allocated the same IRQ? But, looking at your syslog, it looks like a DMA problem:

 

Mar 23 23:58:44 Ness kernel: dmar: DMAR:[DMA Write] Request device [04:00.0] fault addr ff600000 
Mar 23 23:58:44 Ness kernel: DMAR:[fault reason 05] PTE Write access is not set
Mar 23 23:58:44 Ness kernel: dmar: DRHD: handling fault status reg 3
Mar 23 23:58:44 Ness kernel: dmar: DMAR:[DMA Write] Request device [04:00.0] fault addr ff601000 
Mar 23 23:58:44 Ness kernel: DMAR:[fault reason 05] PTE Write access is not set
Mar 23 23:58:44 Ness kernel: dmar: DRHD: handling fault status reg 3
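
To see how often these faults occur and which device raises them, the syslog can be summarised with a short script. This is only a rough sketch: the regular expression is modelled on the fault lines quoted above, the function name `count_dmar_faults` is made up for illustration, and other kernel versions may format the message differently.

```python
import re
from collections import Counter

# Modelled on lines such as:
#   dmar: DMAR:[DMA Write] Request device [04:00.0] fault addr ff600000
FAULT_RE = re.compile(
    r"DMAR:\[DMA (?P<op>Read|Write)\] "
    r"Request device \[(?P<dev>[0-9a-fA-F:.]+)\] "
    r"fault addr (?P<addr>[0-9a-fA-F]+)"
)


def count_dmar_faults(lines):
    """Count DMAR faults per (PCI device, DMA operation) pair."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            counts[(match.group("dev"), match.group("op"))] += 1
    return counts


# The six lines quoted in this post, used as sample input.
SAMPLE = [
    "Mar 23 23:58:44 Ness kernel: dmar: DMAR:[DMA Write] Request device [04:00.0] fault addr ff600000 ",
    "Mar 23 23:58:44 Ness kernel: DMAR:[fault reason 05] PTE Write access is not set",
    "Mar 23 23:58:44 Ness kernel: dmar: DRHD: handling fault status reg 3",
    "Mar 23 23:58:44 Ness kernel: dmar: DMAR:[DMA Write] Request device [04:00.0] fault addr ff601000 ",
    "Mar 23 23:58:44 Ness kernel: DMAR:[fault reason 05] PTE Write access is not set",
    "Mar 23 23:58:44 Ness kernel: dmar: DRHD: handling fault status reg 3",
]

if __name__ == "__main__":
    for (dev, op), n in count_dmar_faults(SAMPLE).items():
        print(f"{dev} DMA {op}: {n} fault(s)")
```

To run it over a whole log, pass an open file instead of `SAMPLE`. The device address it reports (04:00.0 here) can then be looked up with `lspci -s 04:00.0` to confirm which card it is.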

 

Link to comment

Those look to me like the errors that affect a few users with the SAS2LP and v6.

 

AFAIK there's no solution, as it only affects a few users and LT could not replicate the issue. The SAS2LP works well for most, myself included.

 

If it were me, I'd try to sell it on eBay and buy an LSI-based controller like the IBM M1015 or Dell H310; you can probably do it without losing any money.

 

In the meantime you can try to move all the array disks you can to the onboard ports and see if it gets better.

Link to comment

I have had my controller for about a year and a half now. I originally had it in an old Dell server, then moved my unRAID into my gaming machine, where it had been running for 6 months with no issues. I added a new drive to the system and all went well, with the parity sync fine, then a week later I was seeing these errors.

Link to comment

Okay, after some more research with my machine, I now have both my GPU and the Marvell controller installed. After doing a BIOS update, VT-d was disabled, and no errors have occurred in 48 hours.
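
For anyone wanting to confirm from the running system whether the IOMMU (VT-d) is actually active, here is a minimal sketch. It assumes a Linux kernel that exposes IOMMU groups under `/sys/kernel/iommu_groups`, and the function name `list_iommu_groups` is just for illustration. If no groups are listed, the IOMMU is most likely off.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_name: [pci_addresses]}; empty if no IOMMU groups exist."""
    groups = {}
    if not os.path.isdir(root):
        return groups
    for group in sorted(os.listdir(root)):
        devices_dir = os.path.join(root, group, "devices")
        if os.path.isdir(devices_dir):
            groups[group] = sorted(os.listdir(devices_dir))
    return groups


if __name__ == "__main__":
    groups = list_iommu_groups()
    if not groups:
        print("No IOMMU groups found; VT-d appears to be disabled.")
    else:
        for name, devices in groups.items():
            print(f"group {name}: {', '.join(devices)}")
```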

 

So I'm almost certain this is related to this bug here.

https://lime-technology.com/forum/index.php?topic=40683.0

 

Although I'm completely baffled as to why I'm only getting these errors now, and never before.

Link to comment
