VT-d causing NVMe Kernel DMAR Handling Fault



Hello! I think I have an incompatibility between my NVMe SSD and Unraid when VT-d is enabled.

 

System Specification 

MB: Asus B560M-Plus

CPU: i5-11600K

RAM: 16GB 3200MHz

NVMe SSD: Micron 2200 256GB

 

 

The NVMe device drops offline after a few minutes when VT-d is enabled. The device is just sitting as an unassigned disk.

 

Testing completed so far without issues with VT-d enabled:

- Ubuntu 21.04 Desktop in "Try Ubuntu" mode.

-- Format Drive ext4

-- Complete read and write benchmarks with Disk Utility.

-- smartctl returns valid SMART data.

-- nvme-cli tests run

- Install Windows 10 on the drive

-- Run benchmarks in Windows 10

 

With VT-d disabled:

- In Unraid I am able to see all the disk information and SMART data after clicking on the drive on the main page.

- Running the preclear script on the drive works, and it shows running status and progress.

 

With VT-d enabled:

- Clicking on the drive after boot leaves the Unraid loading icon showing in the SMART data fields

- The preclear script does not start correctly

- The log fills with the following, on repeat:

read SMART /dev/sda
UNRAID1 kernel: dmar_fault: 1406 callbacks suppressed
UNRAID1 kernel: DMAR: DRHD: handling fault status reg 3

It then goes to a timeout and a reset, and the device is disconnected. I have saved my diagnostics zip file for the full log.
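
For anyone wanting to gauge how heavy the fault flood is in a saved syslog, here is a minimal Python sketch. It is only an illustration: the two regular expressions are built from the two kernel lines quoted above, so other DMAR message variants will not be matched, and the function name `summarize_dmar` is made up for this example.

```python
import re
from collections import Counter

# Patterns derived only from the two kernel lines quoted in this post;
# other DMAR message formats are not covered (assumption).
SUPPRESSED = re.compile(r"dmar_fault: (\d+) callbacks suppressed")
FAULT = re.compile(r"DMAR: DRHD: handling fault status reg (\w+)")


def summarize_dmar(lines):
    """Count DMAR fault reports per status value and sum suppressed callbacks.

    `lines` is any iterable of log lines (a list, or an open text file).
    Returns (Counter keyed by status register value, total suppressed count).
    """
    status_counts = Counter()
    suppressed_total = 0
    for line in lines:
        match = SUPPRESSED.search(line)
        if match:
            # The kernel rate-limits the message and reports how many were skipped.
            suppressed_total += int(match.group(1))
            continue
        match = FAULT.search(line)
        if match:
            status_counts[match.group(1)] += 1
    return status_counts, suppressed_total


if __name__ == "__main__":
    sample = [
        "read SMART /dev/sda",
        "UNRAID1 kernel: dmar_fault: 1406 callbacks suppressed",
        "UNRAID1 kernel: DMAR: DRHD: handling fault status reg 3",
    ]
    counts, suppressed = summarize_dmar(sample)
    print("fault reports by status:", dict(counts))
    print("suppressed callbacks:", suppressed)
```

To run it against a real log, pass an open file instead of the sample list, for example `summarize_dmar(open(path))`, where `path` points to the syslog extracted from the diagnostics zip (the exact file name inside the zip is not given here).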

 

Other info:

 

I have updated the BIOS and Intel ME to the latest versions in an attempt to correct the issue.

 

Details from the BIOS:

ME FW Version: 15.0.21.1549

System Agent (SA) Configuration
- System Agent Bridge Name: RocketLake

- SA PCIe Code Version 11.1.55.80

unraid1-diagnostics-20210718-2104.zip


I found another setting in the BIOS. Before I updated the BIOS and ME, changing it did not correct the issue.

 

Control Iommu Pre-boot Behavior.

Default is "Disable Iommu"

I set it to "Enable Iommu during boot"

 

Unraid booted fine and the drive showed its SMART status correctly, with all values present.

 

I was able to start preclear.

 

When it was getting close to being done, I clicked the device link to go back to the SMART settings. The UI showed the same loading animation as before and the log filled with errors. The preclear script also failed and the device dropped offline.

 

I have also included the diagnostic zip for this boot. Perhaps I should just return the drive?

 

When I enter the BIOS now after a warm restart, the drive is not listed in the NVMe menu.

 

After a full power-off, the drive is back.

 

It seems a strange issue to me. Is it a hardware issue with the SMART status? How can I prove this for an RMA when the drive works fine outside Unraid?

 

 

unraid1-diagnostics-20210719-1953.zip


Thanks for letting me know.

 

I was only running the preclear script as a test of I/O and stability rather than as a requirement. In future I will be sure to stop it just before the end of the read test.

 

I don't believe temperature is the issue, because I hammered the drive with reads and writes under Windows and Ubuntu with no problems.

 

Under Unraid it fails after the system has been on for only a few minutes in total from a cold start.
