keith8496 Posted May 7, 2022

Hello,

I am trying to create a Windows 10 gaming VM on my unRaid host. The VM is causing the entire host to reboot. This usually occurs when the VM reboots, but it has also happened while installing Windows. Sometimes, but not every time, VM Services > Enable VMs will have switched from "Yes" to "No" after the crash/reboot.

Using unRaid 6.10.0-rc2 (on this version for compatibility with the AMD-Vendor-Reset plugin).
I have uninstalled AMD-Vendor-Reset and physically removed the GPU for troubleshooting. No GPU is passed thru to the VM, only VNC.
ROMED8-2T motherboard.
Epyc 7343 processor.
Passing thru an NVMe drive (details below).

This motherboard has a ton of BIOS options. I'm thinking there's a BIOS setting that needs changing, but there are a lot of new things in there I haven't heard of.

Here is an error message from the System Log. It's not always this exact message, but it's usually the same theme: a PCIe error.

May 6 19:35:40 SuperKServ kernel: BERT: Error records from previous boot:
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: event severity: fatal
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: Error 0, type: fatal
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: fru_text: PcieError
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: section_type: PCIe error
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: port_type: 4, root port
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: version: 0.2
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: command: 0x0003, status: 0x0010
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: device_id: 0000:40:01.1
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: slot: 34
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: secondary_bus: 0x41
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: vendor_id: 0x1022, device_id: 0x1483
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: class_code: 060400
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: bridge:
secondary_status: 0x0000, control: 0x0010
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: aer_uncor_status: 0x00200000, aer_uncor_mask: 0x04004000
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: aer_uncor_severity: 0x00476030
May 6 19:35:40 SuperKServ kernel: [Hardware Error]: TLP Header: 34000000 0000007f 00000001 08000000

Here is my VM config:

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>win-vm1</name>
  <uuid>36288546-0904-bf4d-9e6e-aeb596fb8fa2</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='20'/>
    <vcpupin vcpu='2' cpuset='5'/>
    <vcpupin vcpu='3' cpuset='21'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='22'/>
    <vcpupin vcpu='6' cpuset='7'/>
    <vcpupin vcpu='7' cpuset='23'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-6.1'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/36288546-0904-bf4d-9e6e-aeb596fb8fa2_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='4' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/Win10_1909_English_x64.iso'/>
      <target dev='hda' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.190-1.iso'/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:d0:43:0c'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' websocket='-1' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <audio id='1' type='none'/>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x41' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>

Here are my IOMMU Groups:

IOMMU group 0: [1022:1482] c0:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 1: [1022:1482] c0:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 2: [1022:1482] c0:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 3: [1022:1482] c0:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 4: [1022:1482] c0:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 5: [1022:1482] c0:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 6: [1022:1484] c0:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
IOMMU group 7: [1022:1482] c0:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 8: [1022:1484] c0:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
IOMMU group 9: [1022:148a] c1:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function
IOMMU group 10: [1022:1498] c1:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
IOMMU group 11: [1022:1485] c2:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
IOMMU group 12: [1022:1498] c2:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
IOMMU group 13: [1022:1482] 80:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 14: [1022:1482] 80:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 15: [1022:1482] 80:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 16: [1022:1482] 80:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 17: [1022:1482] 80:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 18: [1022:1482] 80:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 19: [1022:1484] 80:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
IOMMU group 20: [1022:1482] 80:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 21: [1022:1484] 80:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
IOMMU group 22: [1022:148a] 81:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function
IOMMU group 23: [1022:1498] 81:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
IOMMU group 24: [1022:1485] 82:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
IOMMU group 25: [1022:1498] 82:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
IOMMU group 26: [1022:1482] 40:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 27: [1022:1483] 40:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
IOMMU group 28: [1022:1483] 40:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
IOMMU group 29: [1022:1483] 40:01.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
IOMMU group 30: [1022:1483] 40:01.5 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
IOMMU group 31: [1022:1482] 40:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 32: [1022:1482] 40:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 33: [1022:1482] 40:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 34: [1022:1482] 40:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 35: [1022:1482] 40:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 36: [1022:1484] 40:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
IOMMU group 37: [1022:1482] 40:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 38: [1022:1484] 40:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
IOMMU group 39: [1022:1484] 40:08.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
IOMMU group 40: [144d:a809] 41:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 980
  This controller is bound to vfio, connected drives are not visible.
IOMMU group 41: [8086:1563] 42:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10G X550T (rev 01)
IOMMU group 42: [8086:1563] 42:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10G X550T (rev 01)
IOMMU group 43: [1b21:2142] 43:00.0 USB controller: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller
  Bus 001 Device 001 Port 1-0 ID 1d6b:0002 Linux Foundation 2.0 root hub
  Bus 002 Device 001 Port 2-0 ID 1d6b:0003 Linux Foundation 3.0 root hub
IOMMU group 44: [1a03:1150] 44:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
  [1a03:2000] 45:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
IOMMU group 45: [1022:148a] 46:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function
IOMMU group 46: [1022:1498] 46:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
IOMMU group 47: [1022:1485] 47:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
IOMMU group 48: [1022:1486] 47:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP
IOMMU group 49: [1022:1498] 47:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
IOMMU group 50: [1022:148c] 47:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Starship USB 3.0 Host Controller
  Bus 003 Device 001 Port 3-0 ID 1d6b:0002 Linux Foundation 2.0 root hub
  Bus 003 Device 002 Port 3-1 ID 05e3:0608 Genesys Logic, Inc. Hub
  Bus 003 Device 003 Port 3-2 ID 046b:ff01 American Megatrends, Inc. Virtual Hub
  Bus 003 Device 004 Port 3-1.1 ID 090c:1000 Silicon Motion, Inc. - Taiwan (formerly Feiya Technology Corp.) Flash Drive
  Bus 003 Device 005 Port 3-2.4 ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse
  Bus 003 Device 006 Port 3-2.3 ID 046b:ffb0 American Megatrends, Inc. Virtual Ethernet
  Bus 004 Device 001 Port 4-0 ID 1d6b:0003 Linux Foundation 3.0 root hub
IOMMU group 51: [1022:1487] 47:00.4 Audio device: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
IOMMU group 52: [1022:7901] 48:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
  [5:0:0:0] disk ATA SanDisk SD8TB8U2 0101 /dev/sdb 256GB
  [6:0:0:0] disk ATA ST4000DM004-2CV1 0001 /dev/sdc 4.00TB
  [7:0:0:0] disk ATA ST4000DM004-2CV1 0001 /dev/sdd 4.00TB
IOMMU group 53: [1022:1482] 00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 54: [1022:1482] 00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 55: [1022:1482] 00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 56: [1022:1483] 00:03.5 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
IOMMU group 57: [1022:1482] 00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 58: [1022:1482] 00:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 59: [1022:1482] 00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 60: [1022:1484] 00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
IOMMU group 61: [1022:1482] 00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 62: [1022:1484] 00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
IOMMU group 63: [1022:790b] 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
  [1022:790e] 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
IOMMU group 64: [1022:1650] 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 0
  [1022:1651] 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 1
  [1022:1652] 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 2
  [1022:1653] 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 3
  [1022:1654] 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 4
  [1022:1655] 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 5
  [1022:1656] 00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 6
  [1022:1657] 00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Milan Data Fabric; Function 7
IOMMU group 65: [144d:a809] 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 980
  This controller is bound to vfio, connected drives are not visible.
IOMMU group 66: [1022:148a] 02:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function
IOMMU group 67: [1022:1498] 02:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
IOMMU group 68: [1022:1485] 03:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
IOMMU group 69: [1022:1498] 03:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PTDMA
IOMMU group 70: [1022:148c] 03:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Starship USB 3.0 Host Controller
  Bus 005 Device 001 Port 5-0 ID 1d6b:0002 Linux Foundation 2.0 root hub
  Bus 005 Device 002 Port 5-2 ID 413c:2107 Dell Computer Corp. Dell USB Entry Keyboard
  Bus 006 Device 001 Port 6-0 ID 1d6b:0003 Linux Foundation 3.0 root hub

Here is my VFIO-PCI Log:

Loading config from /boot/config/vfio-pci.cfg
BIND=0000:41:00.0|144d:a809 0000:01:00.0|144d:a809
---
Processing 0000:41:00.0 144d:a809
Vendor:Device 144d:a809 found at 0000:41:00.0
IOMMU group members (sans bridges):
/sys/bus/pci/devices/0000:41:00.0/iommu_group/devices/0000:41:00.0
Binding...
Successfully bound the device 144d:a809 at 0000:41:00.0 to vfio-pci
---
Processing 0000:01:00.0 144d:a809
Vendor:Device 144d:a809 found at 0000:01:00.0
IOMMU group members (sans bridges):
/sys/bus/pci/devices/0000:01:00.0/iommu_group/devices/0000:01:00.0
Binding...
Successfully bound the device 144d:a809 at 0000:01:00.0 to vfio-pci
---
vfio-pci binding complete

Devices listed in /sys/bus/pci/drivers/vfio-pci:
lrwxrwxrwx 1 root root 0 May 7 00:35 0000:01:00.0 -> ../../../../devices/pci0000:00/0000:00:03.5/0000:01:00.0
lrwxrwxrwx 1 root root 0 May 7 00:35 0000:41:00.0 -> ../../../../devices/pci0000:40/0000:40:01.1/0000:41:00.0
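For anyone reading along: the aer_uncor_status word in that first crash log (0x00200000) can be decoded against the standard bit layout of the PCIe Uncorrectable Error Status register. A quick sketch (bit names follow the PCIe AER capability as used by the Linux kernel's AER driver; this is illustrative, not an unRaid tool):

```python
# Decode a PCIe AER uncorrectable-status word into human-readable error names.
# Bit positions per the PCIe spec's Uncorrectable Error Status Register.
AER_UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
}

def decode_aer(status: int) -> list:
    """Return the names of all error bits set in an AER status word."""
    return [name for bit, name in AER_UNCOR_BITS.items() if status & (1 << bit)]

# The aer_uncor_status value from the log above:
print(decode_aer(0x00200000))  # -> ['ACS Violation']
```

Bit 21 is "ACS Violation", which is at least suggestive on a box doing VFIO/PCIe passthrough, though I can't say it's the root cause.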
ghost82 Posted May 7, 2022

Hello, it seems related to the NVMe drive(s) and their power saving. Try disabling NVMe power saving entirely with nvme_core.default_ps_max_latency_us=0 as a kernel parameter. Modify your syslinux config and add it to the append line (for unRaid OS without GUI):

append nvme_core.default_ps_max_latency_us=0 initrd=/bzroot
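For context, the full boot stanza in /boot/syslinux/syslinux.cfg would then look roughly like this (the label text and other lines vary by unRaid version; only the append line changes):

```
label unRAID OS
  menu default
  kernel /bzimage
  append nvme_core.default_ps_max_latency_us=0 initrd=/bzroot
```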
keith8496 Posted May 7, 2022

ghost82 - Thank you for your help. It seems a little more stable, but the whole unRaid host still rebooted when the Windows VM tried to reboot for a Windows update. I crashed it twice. The PCIe hardware error is no longer reported in the System Log. I had the VM log open, but nothing was logged at the time of the crash.

I may be seeing a pattern emerge: I can reboot the Windows VM once, but it crashes on the second reboot. Maybe a coincidence.

Here are a few other error messages I see in the System Log. I assume they aren't related.

May 7 10:29:30 SuperKServ kernel: [Firmware Warn]: HEST: Duplicated hardware error source ID: 4096.
May 7 10:29:30 SuperKServ kernel: ERST: Error Record Serialization Table (ERST) support is initialized.
May 7 10:29:33 SuperKServ mcelog: ERROR: AMD Processor family 25: mcelog does not support this processor. Please use the edac_mce_amd module instead.
May 7 10:31:03 SuperKServ root: Template parsing error: template: :1:24: executing "" at <.AuxiliaryAddresses>: map has no entry for key "AuxiliaryAddresses"
May 7 10:31:03 SuperKServ root: Template parsing error: template: :1:24: executing "" at <.AuxiliaryAddresses>: map has no entry for key "AuxiliaryAddresses"
May 7 10:31:03 SuperKServ root: Template parsing error: template: :1:24: executing "" at <.AuxiliaryAddresses>: map has no entry for key "AuxiliaryAddresses"

I am attaching a full Diagnostics report.

Thanks,
-Keith

superkserv-diagnostics-20220507-1036.zip
casperse Posted May 13, 2022

Just want to hear if you got the ROMED8-2T motherboard stably running VMs? Are you happy with this MB & CPU? I'm asking because I'm in the middle of ordering the same MB as an upgrade to my stable Xeon MB.
keith8496 Posted May 14, 2022

19 hours ago, casperse said:
Just want to hear if you got the ROMED8-2T motherboard stable running VM's? Are you happy with this MB & CPU?

@casperse - Short answer is "yes".

I was able to pass the NVMes thru another way and get pretty good performance. I'm not sure what this method is called: it looks like passing thru the raw partitions, but it's really the root device above the partitions. Using the VirtIO driver yields fast speeds at the expense of CPU cycles. The nice thing about Epyc is that we've got enough cores to afford it. (screenshot attached)

I will continue to tinker with passing the bare-metal NVMe thru. I have not been able to pass the bare-metal NVMe controller thru without severe crashes. I was trying to bind them to VFIO and pass them thru like a GPU. I feel like this is an issue with the motherboard or BIOS.

Before this board, I was trying to use Supermicro's equivalent. I was able to pass the bare-metal NVMes on the Supermicro, but had to RMA three of them for dead BMCs before getting the ASRock Rack ROMED8-2T.

The BIOS has a lot of options. It took me a few sessions to get the BIOS dialed in for my workload. It's definitely an enthusiast board. Having said that, I don't think Ryzen 3 requires all the tinkering/optimization that I've read about with Ryzen 1 & 2.

I have two of my three GPUs passed thru. I expect the third to work just fine, I just haven't set it up yet. Sometimes I have to disconnect my USB devices to boot after a complete disconnect from power.
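In case it helps anyone searching later, the "raw device via VirtIO" setup I'm describing typically looks roughly like this in the VM's XML. The by-id path below is a made-up example, not my actual config; substitute your own drive's stable identifier, and note you hand over the whole device, not a partition:

```xml
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
  <!-- hypothetical by-id path; use your own drive's identifier from /dev/disk/by-id/ -->
  <source dev='/dev/disk/by-id/nvme-Samsung_SSD_980_EXAMPLESERIAL'/>
  <target dev='vda' bus='virtio'/>
</disk>
```

The cache and discard attributes here are my guesses at reasonable settings, not gospel; the guest just sees a blank virtio disk and partitions it itself.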
casperse Posted May 14, 2022

Thanks for the feedback! I have a Xeon E2100G, and even this setup is causing problems during passthrough of the NVidia cards 😞

I am starting to consider a setup like yours, with plenty of possibilities for future expansion. But instead of running Unraid as a bare-metal hypervisor (like I always have done!), I would run Proxmox as my hypervisor and have all the VMs running there (passthrough should be much easier), with Unraid running as a virtual setup on Proxmox with HW passthrough of all the drives. That way I could reboot Unraid and keep all my VMs running, and backups of VMs and snapshots would be built into Proxmox.
ghost82 Posted May 14, 2022

1 hour ago, casperse said:
But instead run Proxmox as my hypervisor and then having all the VM's running here! (Passthrough should be much easier)

Obviously you can run the OS you want, but take into account that it won't be easier. Both unRaid and Proxmox are based on QEMU+KVM, and both are Linux OSes. You can obtain the same with any Linux distribution as long as QEMU and a management layer like libvirt are installed, so why should it be easier?
keith8496 Posted May 16, 2022

Linus Tech Tips just built the BIG brother to my rig. He's using the same motherboard with a lot more CPU and RAM. I originally got the idea from his 3- and 7-Gamers-1-CPU videos. I've seen enough of his videos to walk into my project knowing it would be like building a race car in my garage: there will be fun, and there will be challenges.