VM Crashing unRaid on ROMED8-2T Motherboard



Hello,

 

I am trying to create a Windows 10 gaming VM on my unRaid host. The VM is causing the entire host to reboot. This usually occurs when the VM reboots, but it has also happened while installing Windows.

 

Sometimes, but not every time, VM Services > Enable VMs will have switched from "Yes" to "No" after the crash/reboot.

 

  • Using unRaid 6.10.0-rc2
    • (using this version for compatibility with AMD-Vendor-Reset plugin)
  • I have uninstalled AMD-Vendor-Reset and physically removed the GPU for troubleshooting.
  • No GPU is passed thru to VM. Only VNC.
     
  • ROMED8-2T motherboard
  • Epyc 7343 processor
  • Passing thru nvme drive (details below)

 

This motherboard has a ton of BIOS options. I'm thinking there's a BIOS setting that needs changing, but there are a lot of new things in there I haven't heard of.

 

Here is an error message from the System Log. It's not always this exact message, but it's usually the same theme: a PCIe error.

Quote

May 6 19:35:40 SuperKServ kernel: BERT: Error records from previous boot:
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: event severity: fatal
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: Error 0, type: fatal
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: fru_text: PcieError
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: section_type: PCIe error
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: port_type: 4, root port
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: version: 0.2
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: command: 0x0003, status: 0x0010
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: device_id: 0000:40:01.1
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: slot: 34
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: secondary_bus: 0x41
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: vendor_id: 0x1022, device_id: 0x1483
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: class_code: 060400
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0010
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: aer_uncor_status: 0x00200000, aer_uncor_mask: 0x04004000
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: aer_uncor_severity: 0x00476030
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: TLP Header: 34000000 0000007f 00000001 08000000
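
For anyone reading the record above, the aer_uncor_status and aer_uncor_mask values are bit fields from the PCIe Advanced Error Reporting capability. Here is a minimal sketch for decoding them, assuming the standard Uncorrectable Error Status bit layout from the PCIe Base Specification. The table only lists the commonly seen bits, and the name decode_aer_uncor is made up for illustration.

```python
# Bit positions of the PCIe AER Uncorrectable Error Status register.
# Assumption: standard layout from the PCIe Base Specification; only the
# commonly seen bits are listed, anything else is reported by bit number.
AER_UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
}


def decode_aer_uncor(value):
    """Return the names of all bits set in a 32-bit AER uncorrectable field."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(AER_UNCOR_BITS.get(bit, f"bit {bit} (not in this table)"))
    return names


# Values copied from the log above.
status = decode_aer_uncor(0x00200000)
mask = decode_aer_uncor(0x04004000)
print("aer_uncor_status:", status)
print("aer_uncor_mask:  ", mask)
```

With the values from the log, only bit 21 is set in the status field, which this table maps to an ACS Violation reported by root port 0000:40:01.1. That says what the port reported, not why it happened.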

 

Here is my VM config:

Spoiler

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>win-vm1</name>
  <uuid>36288546-0904-bf4d-9e6e-aeb596fb8fa2</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='20'/>
    <vcpupin vcpu='2' cpuset='5'/>
    <vcpupin vcpu='3' cpuset='21'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='22'/>
    <vcpupin vcpu='6' cpuset='7'/>
    <vcpupin vcpu='7' cpuset='23'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-6.1'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/36288546-0904-bf4d-9e6e-aeb596fb8fa2_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='4' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/Win10_1909_English_x64.iso'/>
      <target dev='hda' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.190-1.iso'/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:d0:43:0c'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' websocket='-1' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <audio id='1' type='none'/>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x41' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>
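
To cross-check which host device the config above actually hands to the VM, the hostdev source address can be pulled out of the XML and printed in the same bus:slot.function form the kernel log uses. This is a rough sketch using only the Python standard library. The XML here is a trimmed copy of the hostdev section, and host_pci_addresses is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the domain XML above (hostdev section only).
DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x41' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </hostdev>
  </devices>
</domain>
"""


def host_pci_addresses(domain_xml):
    """Return host-side PCI addresses (dddd:bb:ss.f) of passed-through devices."""
    root = ET.fromstring(domain_xml)
    result = []
    # <source><address/> is the host device; the sibling <address/> is the guest slot.
    for addr in root.findall("./devices/hostdev[@type='pci']/source/address"):
        domain = int(addr.get("domain", "0x0"), 16)
        bus = int(addr.get("bus"), 16)
        slot = int(addr.get("slot"), 16)
        func = int(addr.get("function"), 16)
        result.append(f"{domain:04x}:{bus:02x}:{slot:02x}.{func:x}")
    return result


passed_through = host_pci_addresses(DOMAIN_XML)
print(passed_through)
```

The result is the NVMe controller on bus 0x41, which matches the secondary_bus value in the hardware error record. In other words, the passed-through drive sits directly behind the root port that logged the error.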

 

Here are my IOMMU Groups:

Spoiler

IOMMU group 0:[1022:1482] c0:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 1:[1022:1482] c0:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 2:[1022:1482] c0:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 3:[1022:1482] c0:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 4:[1022:1482] c0:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 5:[1022:1482] c0:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 6:[1022:1484] c0:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]

IOMMU group 7:[1022:1482] c0:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 8:[1022:1484] c0:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]

IOMMU group 9:[1022:148a] c1:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function

IOMMU group 10:[1022:1498] c1:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA

IOMMU group 11:[1022:1485] c2:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP

IOMMU group 12:[1022:1498] c2:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA

IOMMU group 13:[1022:1482] 80:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 14:[1022:1482] 80:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 15:[1022:1482] 80:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 16:[1022:1482] 80:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 17:[1022:1482] 80:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 18:[1022:1482] 80:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 19:[1022:1484] 80:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]

IOMMU group 20:[1022:1482] 80:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 21:[1022:1484] 80:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]

IOMMU group 22:[1022:148a] 81:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function

IOMMU group 23:[1022:1498] 81:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA

IOMMU group 24:[1022:1485] 82:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP

IOMMU group 25:[1022:1498] 82:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA

IOMMU group 26:[1022:1482] 40:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 27:[1022:1483] 40:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge

IOMMU group 28:[1022:1483] 40:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge

IOMMU group 29:[1022:1483] 40:01.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge

IOMMU group 30:[1022:1483] 40:01.5 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge

IOMMU group 31:[1022:1482] 40:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 32:[1022:1482] 40:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 33:[1022:1482] 40:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 34:[1022:1482] 40:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 35:[1022:1482] 40:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 36:[1022:1484] 40:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]

IOMMU group 37:[1022:1482] 40:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 38:[1022:1484] 40:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]

IOMMU group 39:[1022:1484] 40:08.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]

IOMMU group 40:[144d:a809] 41:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 980

This controller is bound to vfio, connected drives are not visible.

IOMMU group 41:[8086:1563] 42:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10G X550T (rev 01)

IOMMU group 42:[8086:1563] 42:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10G X550T (rev 01)

IOMMU group 43:[1b21:2142] 43:00.0 USB controller: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller

Bus 001 Device 001 Port 1-0 ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 002 Device 001 Port 2-0 ID 1d6b:0003 Linux Foundation 3.0 root hub

IOMMU group 44:[1a03:1150] 44:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)

[1a03:2000] 45:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)

IOMMU group 45:[1022:148a] 46:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function

IOMMU group 46:[1022:1498] 46:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA

IOMMU group 47:[1022:1485] 47:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP

IOMMU group 48:[1022:1486] 47:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP

IOMMU group 49:[1022:1498] 47:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA

IOMMU group 50:[1022:148c] 47:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Starship USB 3.0 Host Controller

Bus 003 Device 001 Port 3-0 ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 003 Device 002 Port 3-1 ID 05e3:0608 Genesys Logic, Inc. Hub

Bus 003 Device 003 Port 3-2 ID 046b:ff01 American Megatrends, Inc. Virtual Hub

Bus 003 Device 004 Port 3-1.1 ID 090c:1000 Silicon Motion, Inc. - Taiwan (formerly Feiya Technology Corp.) Flash Drive

Bus 003 Device 005 Port 3-2.4 ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse

Bus 003 Device 006 Port 3-2.3 ID 046b:ffb0 American Megatrends, Inc. Virtual Ethernet

Bus 004 Device 001 Port 4-0 ID 1d6b:0003 Linux Foundation 3.0 root hub

IOMMU group 51:[1022:1487] 47:00.4 Audio device: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller

IOMMU group 52:[1022:7901] 48:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)

[5:0:0:0] disk ATA SanDisk SD8TB8U2 0101 /dev/sdb 256GB

[6:0:0:0] disk ATA ST4000DM004-2CV1 0001 /dev/sdc 4.00TB

[7:0:0:0] disk ATA ST4000DM004-2CV1 0001 /dev/sdd 4.00TB

IOMMU group 53:[1022:1482] 00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 54:[1022:1482] 00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 55:[1022:1482] 00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 56:[1022:1483] 00:03.5 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge

IOMMU group 57:[1022:1482] 00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 58:[1022:1482] 00:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 59:[1022:1482] 00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 60:[1022:1484] 00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]

IOMMU group 61:[1022:1482] 00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge

IOMMU group 62:[1022:1484] 00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]

IOMMU group 63:[1022:790b] 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)

[1022:790e] 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)

IOMMU group 64:[1022:1650] 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 0

[1022:1651] 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 1

[1022:1652] 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 2

[1022:1653] 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 3

[1022:1654] 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 4

[1022:1655] 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 5

[1022:1656] 00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 6

[1022:1657] 00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 7

IOMMU group 65:[144d:a809] 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 980

This controller is bound to vfio, connected drives are not visible.

IOMMU group 66:[1022:148a] 02:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function

IOMMU group 67:[1022:1498] 02:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA

IOMMU group 68:[1022:1485] 03:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP

IOMMU group 69:[1022:1498] 03:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA

IOMMU group 70:[1022:148c] 03:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Starship USB 3.0 Host Controller

Bus 005 Device 001 Port 5-0 ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 005 Device 002 Port 5-2 ID 413c:2107 Dell Computer Corp. Dell USB Entry Keyboard

Bus 006 Device 001 Port 6-0 ID 1d6b:0003 Linux Foundation 3.0 root hub
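
A listing this long is easier to check with a few lines of script. The sketch below parses lines in the format shown above and reports which groups hold a single PCI device, which is what you want for a device bound to vfio. The regular expression is inferred only from this listing, and parse_groups is a hypothetical helper, not an Unraid tool.

```python
import re

# A few lines copied from the listing above.
LISTING = """\
IOMMU group 27:[1022:1483] 40:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
IOMMU group 40:[144d:a809] 41:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 980
IOMMU group 44:[1a03:1150] 44:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
[1a03:2000] 45:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
IOMMU group 65:[144d:a809] 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 980
"""

# Optional "IOMMU group N:" prefix, then [vendor:device], then bus:slot.function.
LINE_RE = re.compile(
    r"^(?:IOMMU group (?P<group>\d+):)?"
    r"\[(?P<ids>[0-9a-f]{4}:[0-9a-f]{4})\] "
    r"(?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
)


def parse_groups(listing):
    """Map group number -> list of (bus:slot.function, vendor:device) tuples."""
    groups = {}
    current = None
    for line in listing.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # USB/disk detail lines and blanks are skipped
        if match.group("group") is not None:
            current = int(match.group("group"))
        if current is None:
            continue
        groups.setdefault(current, []).append((match.group("bdf"), match.group("ids")))
    return groups


groups = parse_groups(LISTING)
nvme_groups = {
    group: devices
    for group, devices in groups.items()
    if any(ids == "144d:a809" for _, ids in devices)
}
for group, devices in sorted(nvme_groups.items()):
    print(group, devices, "isolated" if len(devices) == 1 else "shared")
```

In the lines sampled here, both Samsung controllers are alone in their groups, so IOMMU grouping itself does not look like the obstacle.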

 

Here is my VFIO-PCI Log:

Spoiler

Loading config from /boot/config/vfio-pci.cfg
BIND=0000:41:00.0|144d:a809 0000:01:00.0|144d:a809
---
Processing 0000:41:00.0 144d:a809
Vendor:Device 144d:a809 found at 0000:41:00.0

IOMMU group members (sans bridges):
/sys/bus/pci/devices/0000:41:00.0/iommu_group/devices/0000:41:00.0

Binding...
Successfully bound the device 144d:a809 at 0000:41:00.0 to vfio-pci
---
Processing 0000:01:00.0 144d:a809
Vendor:Device 144d:a809 found at 0000:01:00.0

IOMMU group members (sans bridges):
/sys/bus/pci/devices/0000:01:00.0/iommu_group/devices/0000:01:00.0

Binding...
Successfully bound the device 144d:a809 at 0000:01:00.0 to vfio-pci
---
vfio-pci binding complete

Devices listed in /sys/bus/pci/drivers/vfio-pci:
lrwxrwxrwx 1 root root 0 May 7 00:35 0000:01:00.0 -> ../../../../devices/pci0000:00/0000:00:03.5/0000:01:00.0
lrwxrwxrwx 1 root root 0 May 7 00:35 0000:41:00.0 -> ../../../../devices/pci0000:40/0000:40:01.1/0000:41:00.0
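
The last two lines of this log also show which bridge each bound device hangs off. Here is a small sketch that splits the BIND line and reads the parent bridge out of a sysfs link target. The BIND format is assumed only from the line printed in this log, and both function names are made up for illustration.

```python
def parse_vfio_bind(line):
    """Split a 'BIND=addr|vendor:device ...' line into (address, id) tuples."""
    key, _, value = line.strip().partition("=")
    if key != "BIND":
        raise ValueError("not a BIND line")
    entries = []
    for item in value.split():
        address, _, dev_id = item.partition("|")
        entries.append((address, dev_id))
    return entries


def parent_bridge(link_target):
    """Return the path component directly above the device in a sysfs link target."""
    parts = link_target.rstrip("/").split("/")
    return parts[-2]


# Strings copied from the log above.
bindings = parse_vfio_bind("BIND=0000:41:00.0|144d:a809 0000:01:00.0|144d:a809")
bridge = parent_bridge("../../../../devices/pci0000:40/0000:40:01.1/0000:41:00.0")
print(bindings)
print("0000:41:00.0 sits behind", bridge)
```

The parent of 0000:41:00.0 is 0000:40:01.1, the same device_id that appears in the hardware error record.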

ROMED8-2T.jpg


Hello, this seems related to the NVMe drive(s) and their power saving.

Try disabling NVMe power saving entirely by adding nvme_core.default_ps_max_latency_us=0 as a kernel parameter.

Edit your syslinux configuration and add it to the append line (for Unraid OS without GUI):

append nvme_core.default_ps_max_latency_us=0 initrd=/bzroot
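
If you would rather script the change than edit by hand, the idea is simply "insert the parameter right after append unless it is already there". This is a minimal sketch of that logic on a single line. It assumes the stock line is append initrd=/bzroot, and add_kernel_param is a hypothetical helper that does not touch the file itself.

```python
def add_kernel_param(append_line, param):
    """Insert param after 'append' unless already present; keeps indentation."""
    indent = append_line[: len(append_line) - len(append_line.lstrip())]
    tokens = append_line.split()
    if not tokens or tokens[0] != "append":
        raise ValueError("expected a line starting with 'append'")
    if param in tokens:
        return append_line  # nothing to do, avoid adding it twice
    return indent + " ".join([tokens[0], param] + tokens[1:])


PARAM = "nvme_core.default_ps_max_latency_us=0"

original = "append initrd=/bzroot"  # assumed stock line
updated = add_kernel_param(original, PARAM)
print(updated)

# Running it again leaves the line unchanged.
print(add_kernel_param(updated, PARAM) == updated)
```

After a reboot, the parameter should show up in /proc/cmdline if the edit took effect.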

 


ghost82 -

 

Thank you for your help.

 

It seems a little more stable but the whole unRaid host still rebooted when the Windows VM tried to reboot for a Windows update. I crashed it twice. The PCIe hardware error is no longer reported in the System Log. I had the VM log open but nothing was logged at the time of the crash.

 

I may be seeing a pattern emerge: I can reboot the Windows VM once, but the host crashes on the second reboot. Maybe a coincidence.

 

Here are a few other error messages I see in the System Log. I assume they aren't related.

Spoiler

May  7 10:29:30 SuperKServ kernel: [Firmware Warn]: HEST: Duplicated hardware error source ID: 4096.

May  7 10:29:30 SuperKServ kernel: ERST: Error Record Serialization Table (ERST) support is initialized.

May  7 10:29:33 SuperKServ mcelog: ERROR: AMD Processor family 25: mcelog does not support this processor.  Please use the edac_mce_amd module instead.

May  7 10:31:03 SuperKServ root: Template parsing error: template: :1:24: executing "" at <.AuxiliaryAddresses>: map has no entry for key "AuxiliaryAddresses"
May  7 10:31:03 SuperKServ root: Template parsing error: template: :1:24: executing "" at <.AuxiliaryAddresses>: map has no entry for key "AuxiliaryAddresses"
May  7 10:31:03 SuperKServ root: Template parsing error: template: :1:24: executing "" at <.AuxiliaryAddresses>: map has no entry for key "AuxiliaryAddresses"

 

I am attaching a full Diagnostics report.

 

Thanks,

-Keith

 

superkserv-diagnostics-20220507-1036.zip

19 hours ago, casperse said:

Just want to hear if you got the ROMED8-2T motherboard running VMs stably?

Are you happy with this MB & CPU? I am asking because I am in the middle of ordering the same MB as an upgrade to my stable Xeon MB.

 

@casperse - Short answer is "yes".

 

I was able to pass the NVMe drives thru another way and get pretty good performance. I'm not sure what this method is called. It looks like passing thru the raw partitions, but it's really the root device above the partitions. Using the VirtIO driver yields fast speeds at the expense of CPU cycles. Nice thing about Epyc is that we've got enough cores to afford it. (screenshot attached)

 

I will continue to tinker with passing the bare-metal NVMe thru. I have not been able to pass the bare-metal NVMe controller thru without severe crashes. I was trying to bind them to VFIO and pass them thru like a GPU. I feel like this is an issue with the motherboard or BIOS. Before this board, I was trying to use Supermicro's equivalent. I was able to pass the bare-metal NVMe drives thru on the Supermicro, but I had to RMA three of them for dead BMCs before getting the ASRock Rack ROMED8-2T.

 

The BIOS has a lot of options. It took me a few sessions to get the BIOS dialed in for my workload. It's definitely an enthusiast board. Having said that, I don't think Zen 3 requires all the tinkering/optimization that I've read about for Zen 1 & 2. I have two of my three GPUs passed thru. I expect the third to work just fine; I just haven't set it up yet. Sometimes I have to disconnect my USB devices to boot after a complete disconnect from power.

Screen Shot 2022-05-14 at 9.39.42 AM.png


Thanks for the feedback!

 

I have a Xeon E2100G and even this setup is causing problems during passthrough of the NVIDIA cards 😞

I am starting to consider a setup like yours, with plenty of room for future expansion!

But NOT running Unraid as the bare-metal hypervisor (like I always have done!). Instead, I would run Proxmox as my hypervisor and have all the VMs running there (passthrough should be much easier), with Unraid running as a VM on Proxmox with hardware passthrough of all the drives.

That way I could reboot Unraid and keep all my VMs running, and VM backups and snapshots would be built into Proxmox.

1 hour ago, casperse said:

But instead run Proxmox as my hypervisor and then having all the VM's running here! (Passthrough should be much easier)

Obviously you can run the OS you want, but take into account that it won't be easier.

Both Unraid and Proxmox are Linux systems built on QEMU and KVM. Unraid manages its VMs through libvirt, while Proxmox uses its own management layer on top of the same QEMU/KVM and VFIO stack.

You can get the same result with any Linux distribution as long as QEMU and libvirt are installed, so why should it be easier? ;)

 


Linus Tech Tips just built the BIG brother to my rig. He's using the same motherboard with a lot more CPU and RAM. I originally got the idea from his 3- and 7-Gamers 1 CPU videos. I've seen enough of his videos to walk into my project knowing it would be like building a race car in my garage. There will be fun and there will be challenges.
 

 
