fae76 Posted January 29, 2018
Hi,

I just bought hardware to replace my 8-year-old machine. Until now I used a RAID controller running RAID 5. Before investing in a new RAID controller I would like to try Unraid. My question therefore is: does the current stable version of Unraid support my new hardware?

Hardware:
----------------
- Asus Crosshair VI Hero
- Ryzen 7
- 32GB DDR4
- AMD Radeon Vega 64
- 2x 500GB SATA3 SSD for caching (mirrored)
- 4x 8TB WD Red for array (parity)
- 1x NVMe SSD for Win10 VM

Use Cases:
-------------------
- NAS
- run dockers
- run Win10 virtual machine with hardware pass-through

Hardware pass-through:
---------------------------------------
- I only have 1 GPU and I would like to pass this one through to the Win10 virtual machine
- mouse & keyboard pass-through
- onboard audio pass-through
- NVMe pass-through
- USB pass-through (at least some ports)

General questions:
------------------------------
- Can I install and run Unraid with an AMD Vega GPU? I have read that proper Vega support requires kernel 4.15. Does this mean that I cannot use Unraid with my hardware until Unraid includes kernel 4.15? (Sorry for the noob question. I haven't touched Linux for years, and the last time I did I had to give up because my hardware was not yet supported.)
- Does passing through the only GPU in the system work with AMD cards, especially Vega? I only found hacks for nVidia cards that require dumping the vBIOS. Can I do the same with AMD cards?

Thank you for reading this far. Any help is appreciated.
Frank1940 Posted January 29, 2018
First, I do not run any VMs in my setup. However, as you have already surmised, unRAID does have some hardware requirements that have to be met. There are a number of folks who are running a Ryzen 1700 and using VMs, so you should be all right on that score.

You might want to read this portion of the unRAID manual, which discusses VMs: http://lime-technology.com/wiki/index.php/UnRAID_Manual_6#Using_Virtual_Machines

Be sure to go to the spreadsheet of user-tested configurations found in this portion of the manual to see whether it lists anything about your choice of hardware: http://lime-technology.com/wiki/index.php/UnRAID_Manual_6#Assigning_Graphics_Devices_to_Virtual_Machines_.28GPU_Pass_Through.29

Good luck
fae76 Posted January 29, 2018 Author
Thanks Frank.

I have already read and searched the forums quite a bit and watched a bunch of tutorial videos. My main concern is Vega: I can't find any posts about someone running Unraid with Vega, and I also can't find anything about single AMD GPU pass-through. I checked the spreadsheet before posting today and found nothing about Vega, neither positive nor negative. There is one row mentioning my motherboard, but unfortunately the important cells are left empty.

I know Vega pass-through is possible with KVM: https://forum.level1techs.com/t/threadripper-gpu-passthrough-working-with-vega/120594. That was done on Fedora with some kernel hacks, though.

So my question remains: does Vega work with Unraid?
Frank1940 Posted January 29, 2018
You could edit your first post and change the thread title to indicate that you want info on passing your AMD Vega card through to a VM.

EDIT: You could also set up an unRAID system using a 30-day trial license and see what the results are. (I seem to recall that you can get an extension for that license after the 30 days are up.)
fae76 Posted February 3, 2018 Author
I have everything up and running except for GPU passthrough. As long as I start my Win10 VM with VNC graphics, everything works fine. I can pass through my NVMe drive as well as onboard audio, mouse and keyboard. Terminating the VM reconnects mouse and keyboard to Unraid.

With GPU passthrough enabled my screen just freezes: whatever image was on it stays there. To stop the VM I have to "force stop" it. Below are my IOMMU groups and the VM XML file. What else do you need me to provide in order to help me? Thank you very much!!!

IOMMU group 0: [1022:1452] 00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
IOMMU group 1: [1022:1453] 00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
IOMMU group 2: [1022:1453] 00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
IOMMU group 3: [1022:1452] 00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
IOMMU group 4: [1022:1452] 00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
IOMMU group 5: [1022:1453] 00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
IOMMU group 6: [1022:1452] 00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
IOMMU group 7: [1022:1452] 00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
IOMMU group 8: [1022:1454] 00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
IOMMU group 9: [1022:1452] 00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
IOMMU group 10: [1022:1454] 00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
IOMMU group 11: [1022:790b] 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 59)
[1022:790e] 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
IOMMU group 12: [1022:1460] 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0
[1022:1461] 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1
[1022:1462] 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2
[1022:1463] 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3
[1022:1464] 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4
[1022:1465] 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5
[1022:1466] 00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6
[1022:1467] 00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7
IOMMU group 13: [144d:a804] 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
IOMMU group 14: [1022:43b9] 02:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43b9 (rev 02)
[1022:43b5] 02:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] Device 43b5 (rev 02)
[1022:43b0] 02:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b0 (rev 02)
[1022:43b4] 03:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02)
[1022:43b4] 03:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02)
[1022:43b4] 03:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02)
[1022:43b4] 03:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02)
[1022:43b4] 03:05.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02)
[1022:43b4] 03:06.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02)
[1022:43b4] 03:07.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02)
[1b21:1343] 04:00.0 USB controller: ASMedia Technology Inc. Device 1343
[8086:1539] 05:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
IOMMU group 15: [1022:1470] 0b:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1470 (rev c1)
IOMMU group 16: [1022:1471] 0c:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1471
IOMMU group 17: [1002:687f] 0d:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon RX Vega 64] (rev c1)
IOMMU group 18: [1002:aaf8] 0d:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf8
IOMMU group 19: [1022:145a] 0e:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 145a
IOMMU group 20: [1022:1456] 0e:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor
IOMMU group 21: [1022:145c] 0e:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller
IOMMU group 22: [1022:1455] 0f:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 1455
IOMMU group 23: [1022:7901] 0f:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
IOMMU group 24: [1022:1457] 0f:00.3 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller

Win10 VM config: win10vm.txt

VM logs:
2018-02-03 20:43:08.889+0000: starting up libvirt version: 3.8.0, qemu version: 2.10.2, hostname: RedStoneTower
LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none /usr/local/sbin/qemu -name 'guest=Win10 - Felix,debug-threads=on' -S -object 'secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-Win10 - Felix/master-key.aes' -machine pc-i440fx-2.10,accel=kvm,usb=off,dump-guest-core=off,mem-merge=off -cpu host,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vendor_id=none -drive file=/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/etc/libvirt/qemu/nvram/c0b11a24-0155-0b78-9ea3-2ff17db22c8e_VARS-pure-efi.fd,if=pflash,format=raw,unit=1 -m 20480 -realtime mlock=off -smp 8,sockets=1,cores=8,threads=1 -uuid c0b11a24-0155-0b78-9ea3-2ff17db22c8e -display none -no-user-config -nodefaults -chardev 'socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-Win10 - Felix/monitor.sock,server,nowait' -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-hpet -no-shutdown -boot strict=on -o-pci,host=0f:00.3,id=hostdev1,bus=pci.0,addr=0x5 -device vfio-pci,host=0d:00.1,id=hostdev2,bus=pci.0,addr=0x6 -device vfio-pci,host=01:00.0,id=hostdev3,bus=pci.0,addr=0x8 -device usb-host,hostbus=1,hostaddr=3,id=hostdev4,bus=usb.0,port=1 -device usb-host,hostbus=1,hostaddr=2,id=hostdev5,bus=usb.0,port=2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -msg timestamp=on
2018-02-03 20:43:08.889+0000: Domain id=1 is tainted: high-privileges
2018-02-03 20:43:08.889+0000: Domain id=1 is tainted: host-cpu
2018-02-03T20:43:08.936893Z qemu-system-x86_64: -chardev pty,id=charserial0: char device redirected to /dev/pts/1 (label charserial0)
2018-02-03T20:43:11.670147Z qemu-system-x86_64: -device vfio-pci,host=0d:00.0,id=hostdev0,bus=pci.0,addr=0x4: Failed to mmap 0000:0d:00.0 BAR 0. Performance may be slow
2018-02-03T20:43:12.951965Z qemu-system-x86_64: vfio: Cannot reset device 0000:0f:00.3, depends on group 22 which is not owned.
2018-02-03T20:43:13.063715Z qemu-system-x86_64: vfio: Cannot reset device 0000:0f:00.3, depends on group 22 which is not owned.
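For readers who want to check a listing like the one above without reading it by eye, here is a minimal Python sketch. It assumes the IOMMU listing has been saved as text with one device per line (a group header may share the line with its first device). The function names `parse_iommu_groups`, `group_of` and `is_isolated` are made up for this illustration and are not part of unRAID or any library. A device that is alone in its group, as the Vega at 0d:00.0 appears to be here, can usually be handed to a VM without dragging other devices along.

```python
import re
from collections import defaultdict

# "IOMMU group 17:" header, optionally followed by the first device on the same line.
GROUP = re.compile(r"IOMMU group (\d+):")
# "[1002:687f] 0d:00.0 VGA compatible controller: ..." device entry.
ENTRY = re.compile(
    r"\[([0-9a-f]{4}:[0-9a-f]{4})\]\s+([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.*)"
)


def parse_iommu_groups(text):
    """Return {group_number: [(vendor:device, pci_address, description), ...]}."""
    groups = defaultdict(list)
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = GROUP.match(line)
        if header:
            current = int(header.group(1))
            line = line[header.end():].strip()
        entry = ENTRY.match(line)
        if entry and current is not None:
            groups[current].append(entry.groups())
    return dict(groups)


def group_of(groups, pci_address):
    """Return the group number that contains pci_address, or None."""
    for number, devices in groups.items():
        if any(addr == pci_address for _, addr, _ in devices):
            return number
    return None


def is_isolated(groups, pci_address):
    """True if the device is the only member of its IOMMU group."""
    number = group_of(groups, pci_address)
    return number is not None and len(groups[number]) == 1


if __name__ == "__main__":
    listing = """\
IOMMU group 17: [1002:687f] 0d:00.0 VGA compatible controller: Vega 10 XT
IOMMU group 18: [1002:aaf8] 0d:00.1 Audio device: Device aaf8
"""
    parsed = parse_iommu_groups(listing)
    print(group_of(parsed, "0d:00.0"), is_isolated(parsed, "0d:00.0"))
```

Run against the full listing, the same check would show that the onboard USB, SATA and Ethernet devices in group 14 are not isolated from each other.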
Siwat2545 Posted February 4, 2018
Is your graphics card used by unRAID to boot? If it is, enable your iGPU and let unRAID grab that instead, and it will work. (I had this issue before.)
fae76 Posted February 4, 2018 Author
Yes, it is. I can't do that, because I do not have another GPU. There are multiple posts in this forum which state that this should work fine with AMD cards. The nVidia cards need a vBIOS hack but can do it too. Maybe I have to manually edit some config files to make this work, but I don't know where or what I have to add.
Siwat2545 Posted February 4, 2018
Then try setting your boot mode to legacy, and set the PCIe option ROM for your GPU to EFI or disabled (preferably EFI). This should result in unRAID not detecting the card while booting, so you can pass it through to the VMs.
Handl3vogn Posted February 4, 2018
To pass through my primary GPU on my Ryzen setup I had to do these things: boot unRAID in legacy mode (UEFI would not work), make a copy of my GPU BIOS, and pass it to the VM via the XML config. This worked for me on an AMD RX 550.
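As a rough illustration of what "pass it to the VM via the XML config" can look like, the sketch below uses Python's standard `xml.etree.ElementTree` to add a libvirt `<rom file='...'/>` element to the `<hostdev>` entry of a passed-through PCI device. The helper name `add_rom_to_hostdev`, the trimmed-down sample XML and the ROM path `/mnt/user/isos/vega64.rom` are assumptions made for this example; they are not taken from the attached VM config.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed domain XML with a single passed-through PCI device.
SAMPLE_DOMAIN = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x0d' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
  </devices>
</domain>
"""


def add_rom_to_hostdev(domain_xml, bus, slot, function, rom_path):
    """Return domain_xml with <rom file=rom_path/> set on the matching hostdev."""
    root = ET.fromstring(domain_xml)
    for hostdev in root.iter("hostdev"):
        source = hostdev.find("source")
        addr = None if source is None else source.find("address")
        if addr is None:
            continue
        found = tuple(int(addr.get(k), 16) for k in ("bus", "slot", "function"))
        if found != (bus, slot, function):
            continue
        rom = hostdev.find("rom")
        if rom is None:
            # Place the new element directly after <source>.
            rom = ET.Element("rom")
            hostdev.insert(list(hostdev).index(source) + 1, rom)
        rom.set("file", rom_path)
        return ET.tostring(root, encoding="unicode")
    raise ValueError("no hostdev with that PCI address")


if __name__ == "__main__":
    # The ROM path below is a placeholder, not a real location on the poster's server.
    print(add_rom_to_hostdev(SAMPLE_DOMAIN, 0x0D, 0x00, 0x0, "/mnt/user/isos/vega64.rom"))
```

The same change can of course be made by hand in the VM's XML view; the script only shows where the element is expected to sit.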
fae76 Posted February 4, 2018 Author
OK, thanks Siwat. I will try that as soon as the parity sync is done (it's currently at 90%, about 2h to go). In the meantime I will google how to implement your suggestions. Thanks a lot. I will post the results.
fae76 Posted February 4, 2018 Author
Thanks Handl3vogn. You're suggesting the same things as Siwat. I will definitely try this, but I have to google how first.
Siwat2545 Posted February 4, 2018
Please refer to page 84 in your motherboard manual: http://dlcdnet.asus.com/pub/ASUS/mb/SocketAM4/CROSSHAIR-VI-HERO/E12601_CROSSHAIR_VI_HERO_UM_V3_WEB.pdf
fae76 Posted February 4, 2018 Author
I am booting Unraid in legacy mode now. I also dumped my vBIOS using GPU-Z (after booting Win10 from an additional SSD) and tried a vBIOS from TechPowerUp as well. Now at least my screen flickers when I start the VM. However, the screen stays black afterwards.

I'm not sure how to verify whether the dumped vBIOS is working. According to the how-to video from Spaceinvader, at least for nVidia cards it's necessary to edit the dumped file. I cannot find the start of the real ROM file he mentions in his video in my vBIOS.

Any more hints and suggestions?

EDIT: The screen flicker already happens when I start Unraid in legacy mode. Without the vBIOS, it seems that at least one of the VMs I created with VNC is booting up (the HD LED is flickering like crazy). As soon as I add the vBIOS, the HD LED does not flicker anymore. I think the vBIOS is not valid and causes the VM to crash.
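One way to sanity-check a dumped vBIOS is to look for the standard PCI option ROM header. A ROM image starts with the bytes 0x55 0xAA (the 0xaa55 signature that kernel messages refer to when read as a little-endian word), and the 16-bit value at offset 0x18 points to a data structure that begins with the ASCII string "PCIR", followed by the vendor and device IDs. The Python sketch below scans a file for such images. The function name `find_rom_images` is made up for this example, and passing this check does not prove that a dump will work for passthrough; it only shows where a plausible image starts and which device it claims to be for.

```python
import struct
import sys


def find_rom_images(data):
    """Return [(offset, vendor_id, device_id)] for each plausible PCI ROM image."""
    results = []
    pos = data.find(b"\x55\xaa")
    while pos >= 0:
        # The pointer to the PCI data structure lives at header offset 0x18.
        if pos + 0x1A <= len(data):
            (pcir_ptr,) = struct.unpack_from("<H", data, pos + 0x18)
            pcir = pos + pcir_ptr
            if (
                pcir_ptr
                and data[pcir:pcir + 4] == b"PCIR"
                and pcir + 8 <= len(data)
            ):
                vendor, device = struct.unpack_from("<HH", data, pcir + 4)
                results.append((pos, vendor, device))
        pos = data.find(b"\x55\xaa", pos + 1)
    return results


if __name__ == "__main__":
    # Usage: python check_rom.py <dump file>  (script and file names are placeholders)
    with open(sys.argv[1], "rb") as handle:
        blob = handle.read()
    for offset, vendor, device in find_rom_images(blob):
        print(f"image at 0x{offset:x}: {vendor:04x}:{device:04x}")
```

For the card in this thread one would expect to see 1002:687f. If the first image is not at offset 0, the bytes before it are the kind of extra header that is trimmed in the nVidia how-to mentioned above.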
Siwat2545 Posted February 4, 2018
I did not even say to dump your vBIOS. Anyway, can you send me your sysdev and syslinux?
fae76 Posted February 4, 2018 Author
Handl3vogn mentioned the vBIOS. I also edited my last post and added some more info regarding the vBIOS. Please check it out.

Attached is my syslinux.cfg (the one from the syslinux folder). For sysdev I assume you mean the output of the System Devices tool, right? I dumped that into a text file. I also added my VM's config XML.

Attachments: syslinux.cfg, sysdev.txt, win10vm_vbios.xml
Siwat2545 Posted February 4, 2018
Replace your syslinux.cfg with:

default menu.c32
menu title Lime Technology, Inc.
prompt 0
timeout 50
label unRAID OS
  menu default
  kernel /bzimage
  append vfio-pci.ids=1002:687f,1002:aaf8 initrd=/bzroot
label unRAID OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
label unRAID OS Safe Mode (no plugins, no GUI)
  kernel /bzimage
  append initrd=/bzroot unraidsafemode
label unRAID OS GUI Safe Mode (no plugins)
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui unraidsafemode
label Memtest86+
  kernel /memtest
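The part of this config that matters for passthrough appears to be the extra `vfio-pci.ids=` parameter on the first `append` line, which lists the vendor:device IDs of the GPU (1002:687f) and its HDMI audio function (1002:aaf8) from the IOMMU listing. A small Python sketch of how such a line can be assembled follows; the helper names `vfio_ids_param` and `add_to_append` are invented for this example and are not an unRAID tool.

```python
import re

# vendor:device pair as printed in the IOMMU listing, e.g. "1002:687f".
PCI_ID = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{4}$")


def vfio_ids_param(ids):
    """Build the vfio-pci.ids=... kernel parameter from vendor:device IDs."""
    cleaned = [i.strip().lower() for i in ids]
    for pci_id in cleaned:
        if not PCI_ID.match(pci_id):
            raise ValueError(f"not a vendor:device ID: {pci_id!r}")
    return "vfio-pci.ids=" + ",".join(cleaned)


def add_to_append(append_line, param):
    """Put param first on a syslinux 'append' line, replacing any older value."""
    words = append_line.split()
    if not words or words[0] != "append":
        raise ValueError("expected a line starting with 'append'")
    key = param.split("=", 1)[0]
    rest = [w for w in words[1:] if w.split("=", 1)[0] != key]
    return " ".join(["append", param] + rest)


if __name__ == "__main__":
    param = vfio_ids_param(["1002:687f", "1002:aaf8"])
    print(add_to_append("append initrd=/bzroot", param))
```

Applied to the stock `append initrd=/bzroot` line, this reproduces the line used in the config above.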
fae76 Posted February 4, 2018 Author
I replaced the config, rebooted and tried without the vBIOS, then rebooted and tried with the vBIOS. Unfortunately there is no change: the screen flickers, then stays black.
Siwat2545 Posted February 4, 2018
default menu.c32
menu title Lime Technology, Inc.
prompt 0
timeout 50
label unRAID OS
  menu default
  kernel /bzimage
  append vfio-pci.ids=1002:687f,1002:aaf8 disable_vga=1 initrd=/bzroot
label unRAID OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
label unRAID OS Safe Mode (no plugins, no GUI)
  kernel /bzimage
  append initrd=/bzroot unraidsafemode
label unRAID OS GUI Safe Mode (no plugins)
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui unraidsafemode
label Memtest86+
  kernel /memtest
Siwat2545 Posted February 5, 2018
A screenshot of your UEFI/BIOS PCIe configuration would be great.

PS: I think it is in the Advanced menu.
fae76 Posted February 5, 2018 Author
Here are my current BIOS PCIe settings (sorry for the bad pics from my cell phone).

I also started to learn how to debug in this environment and have already learnt quite a bit. GPU passthrough still does not work, but I think I am getting a better understanding of what's going on. For example, syslog was full of these messages:

Feb 5 20:36:56 RedStoneTower kernel: pcieport 0000:00:01.3: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000b(Receiver ID)
Feb 5 20:36:56 RedStoneTower kernel: pcieport 0000:00:01.3: device [1022:1453] error status/mask=00000040/00006000
Feb 5 20:36:56 RedStoneTower kernel: pcieport 0000:00:01.3: [ 6] Bad TLP
Feb 5 20:36:57 RedStoneTower kernel: pcieport 0000:00:01.3: AER: Corrected error received: id=0000

After downgrading PCIe from 3.0 to 2.0 they disappeared. I'm not sure yet what's going on, but I suspect that either my mainboard has a hardware or BIOS bug (I have the newest BIOS installed) or my Vega 64 has a problem. Or maybe my motherboard applies some overclock to PCIe. Whatever the cause, it does not seem to affect GPU passthrough.

The next thing I tracked down were these:

Feb 5 22:24:55 RedStoneTower kernel: vfio-pci 0000:0d:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
Feb 5 22:24:56 RedStoneTower kernel: vfio-pci 0000:0d:00.0: BAR 0: can't reserve [mem 0xe0000000-0xefffffff 64bit pref]
Feb 5 22:24:59 RedStoneTower kernel: vfio-pci 0000:0d:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
Feb 5 22:24:59 RedStoneTower kernel: vfio-pci 0000:0d:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff

Adding the vBIOS ROM file makes the ROM header signature messages go away. Attaching multiple GPUs to the VM (VNC as primary, Vega as secondary) does boot my Linux VM, but the GPU is not passed through (not claimed by kvm, not visible in the VM).
Starting the same VM with the ROM file option results in a screen flicker and a black screen, but the VM does not boot, or it crashes really early in the cycle (no SSH, no ping, but at least the GPU gets claimed by kvm). Unfortunately I do not yet know how to follow up on and debug this hint. My gut feeling is that there is some kind of memory mapping or addressing issue.

For all the tests above I went back to the stock syslinux.cfg, since the modifications you suggested so far did not seem to have any effect.

The next thing I have to report are host crashes. Since I started with Unraid my host has crashed at least 4 times already: not just the web interface, but SSH and ping stopped working as well. I had to hard reset. I cannot find any clues at all as to why this is happening, even after I started logging syslog to my USB stick. There was no load on CPU/GPU. In each case only the parity sync was running, plus 2 or 3 SSH connections.

During the weekend I stress-tested CPU, RAM and GPU simultaneously using AIDA64's stability test under Win10 for 24h without any issues. CPU, GPU and all the other sensors reported max temps under 60°C during this 24h period, so cooling is working as well.

Booting Mint Linux from a USB stick in UEFI mode worked flawlessly, even though the image I used was quite old and still on a 4.4 kernel. Mint booted to the GUI (no driver claimed my Vega, since no AMD drivers were installed). This test shows that Vega does not necessarily need a 4.15 kernel with full Vega support just to boot.

For now I'm out of ideas on how to proceed. There seem to be multiple issues and only a few clues, if any, so any input is appreciated.
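When syslog is flooded with messages like the ones quoted in this post, a short script can help show which devices are affected and how often. The Python sketch below counts corrected PCIe AER errors per port and "Invalid PCI ROM header signature" messages per device. The regular expressions are written against the exact lines quoted above and are an assumption about the format; other kernel versions may word these messages differently. The name `summarize` and the log path in the usage example are likewise placeholders.

```python
import re
import sys
from collections import Counter

# "pcieport 0000:00:01.3: PCIe Bus Error: severity=Corrected, type=Data Link Layer, ..."
AER = re.compile(
    r"kernel: pcieport (\S+): PCIe Bus Error: severity=(\w+), type=([^,]+),"
)
# "vfio-pci 0000:0d:00.0: Invalid PCI ROM header signature: ..."
ROM_SIG = re.compile(r"kernel: vfio-pci (\S+): Invalid PCI ROM header signature")


def summarize(lines):
    """Return (aer_counts, rom_counts) as Counters keyed by PCI address."""
    aer = Counter()
    rom = Counter()
    for line in lines:
        match = AER.search(line)
        if match:
            # Key: (port address, severity, error type)
            aer[match.groups()] += 1
            continue
        match = ROM_SIG.search(line)
        if match:
            rom[match.group(1)] += 1
    return aer, rom


if __name__ == "__main__":
    # Usage: python summarize_syslog.py /var/log/syslog  (path is an example)
    with open(sys.argv[1], errors="replace") as handle:
        aer_counts, rom_counts = summarize(handle)
    for (port, severity, kind), count in aer_counts.most_common():
        print(f"{port}: {count} x {severity} {kind}")
    for device, count in rom_counts.most_common():
        print(f"{device}: {count} x invalid ROM header signature")
```

Comparing the counts before and after a BIOS change, such as the PCIe 3.0 to 2.0 switch described above, gives a quick confirmation that the messages really stopped.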
Siwat2545 Posted February 6, 2018
What are the options for PCIe x16 mode in the Advanced menu?
Archived
This topic is now archived and is closed to further replies.