[KERNEL] Custom kernel build with tweaks (2020.03.07 v6.8.3|5.5.8|4.19.108|NAVI|VEGA|NFSv4|R8125|Zen2)



12 hours ago, Critica1Err0r said:

vfio-pci 0000:0c:00.4: not ready 1023ms after FLR; waiting

 

I've had it appended like this and it still shows the same thing.

Use the latest one I've uploaded and try again. The earlier version doesn't support the pcie_no_flr kernel option.

Link to comment

I ended up getting a new board, an Aorus X570 Master, and almost everything passes through fine. Bluetooth still won't connect to my keyboard, and I get this:

 

2020-04-20T22:17:49.302385Z qemu-system-x86_64: vfio: Cannot reset device 0000:0d:00.4, depends on group 45 which is not owned.

 

IOMMU group 45:[1022:1485] 0d:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP

 

 append pcie_no_flr=1022:1487,1022:1485 video=efifb:off vfio -pci .ids=10de:1b06,10de:10ef,144d:a808,1022:1485,1022:149c,1022:149c,8086:2723,8086:1539

 

Am I doing this right?
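One way to sanity-check an append line like the one above is to split it roughly the way the kernel does (on whitespace) and look at each parameter. The helper below is a hypothetical sketch, not part of Unraid or the kernel. It only checks the two ID-list parameters used in this thread (`vfio-pci.ids`, and `pcie_no_flr`, which exists only in the patched kernel), and it flags tokens starting with `-` or `.`, which usually means a stray space has split a name like `vfio-pci.ids` into separate tokens.

```python
import re

# A PCI vendor:device pair such as "1022:1487" (four hex digits each).
ID_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{4}$")
# Parameters in this thread whose value is a comma-separated ID list.
# pcie_no_flr only exists in the patched kernel, not in mainline.
ID_LIST_PARAMS = ("vfio-pci.ids", "pcie_no_flr")


def check_append(line):
    """Return a list of likely problems in a syslinux 'append' line."""
    problems = []
    tokens = line.split()  # the kernel separates parameters by whitespace
    if tokens and tokens[0] == "append":
        tokens = tokens[1:]
    for tok in tokens:
        # Parameter names don't start with '-' or '.' ("--" is the init
        # separator), so such a token usually means a split parameter name.
        if tok != "--" and tok[0] in "-.":
            problems.append(f"suspicious fragment {tok!r} (stray space?)")
            continue
        name, _, value = tok.partition("=")
        if name not in ID_LIST_PARAMS:
            continue
        ids = [dev_id.lower() for dev_id in value.split(",")]
        for dev_id in ids:
            if not ID_RE.match(dev_id):
                problems.append(f"{name}: malformed ID {dev_id!r}")
        dupes = sorted({dev_id for dev_id in ids if ids.count(dev_id) > 1})
        if dupes:
            problems.append(f"{name}: duplicate IDs {', '.join(dupes)}")
    return problems
```

Run against the line above, it reports `-pci` and `.ids=...` as suspicious fragments; with the spaces removed, it reports only the repeated `1022:149c`. It does not check whether the IDs belong to devices you actually want to pass through.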

Link to comment

Hello, I have a problem with my 5700 XT and need some help with it.

I'm running Unraid 6.8.3 with the kernel from this topic on an X570 Aorus Elite with a 3600 CPU.

It seems to me that the GPU resets but its audio function does not. In vfio I can see that only the audio device is not resettable, and in my opinion the VM log says the same thing.

The VM crashes when a program tries to use an audio device.

If I swap the GPU for my old 7870, everything works without problems.

 

I hope someone has an idea of what I can do to solve this problem.

Fehler_VM_5700xt_1.png

Fehler_VM_5700xt_2.png

Edited by alejanson
Link to comment

@Leoyzen First of all, thanks for your work providing custom kernels. 👍

 

Let me explain the issue I have. I upgraded from a TR4 1950X to a Threadripper 3960X on a Gigabyte Aorus Extreme TRX40 and I'm seeing the same issues other people have with passing through the onboard audio. In my case it's the following device:

IOMMU group 42:	[1022:1487] 23:00.4 Audio device: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
root@UNRAID:~# lspci -v -s 23:00.4
23:00.4 Audio device: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
        Subsystem: Advanced Micro Devices, Inc. [AMD] Device d102
        Flags: bus master, fast devsel, latency 0, IRQ 154
        Memory at b1400000 (32-bit, non-prefetchable) [size=32K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [64] Express Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [2a0] Access Control Services
        Capabilities: [370] Transaction Processing Hints
        Kernel driver in use: vfio-pci

With the default 6.8.3 kernel I have the same issue as other users with 3rd-gen Ryzen: passing the onboard audio controller through to a VM and starting the VM causes the server to freeze/lock up, whether or not I bind the device to vfio. The only way to get the server back to a working state is to force a shutdown.

Here is a short snippet from the point where the server locks up. I can't pull full diagnostics at this point, neither via SSH nor via the web UI.

Apr 25 15:46:29 UNRAID kernel: vfio-pci 0000:23:00.4: not ready 1023ms after FLR; waiting
Apr 25 15:46:31 UNRAID kernel: vfio-pci 0000:23:00.4: not ready 2047ms after FLR; waiting
Apr 25 15:46:34 UNRAID kernel: vfio-pci 0000:23:00.4: not ready 4095ms after FLR; waiting
Apr 25 15:46:39 UNRAID kernel: vfio-pci 0000:23:00.4: not ready 8191ms after FLR; waiting
Apr 25 15:46:48 UNRAID kernel: vfio-pci 0000:23:00.4: not ready 16383ms after FLR; waiting
Apr 25 15:47:06 UNRAID kernel: vfio-pci 0000:23:00.4: not ready 32767ms after FLR; waiting
Apr 25 15:47:44 UNRAID kernel: vfio-pci 0000:23:00.4: not ready 65535ms after FLR; giving up
Apr 25 15:48:16 UNRAID nginx: 2020/04/25 15:48:16 [error] 11486#11486: *1680 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.0.0.7, server: , request: "POST /plugins/dynamix.vm.manager/include/VMajax.php HTTP/2.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "unraid.local", referrer: "https://unraid.local/VMs"
Apr 25 15:48:43 UNRAID kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Apr 25 15:48:43 UNRAID kernel: rcu: 	31-....: (59 ticks this GP) idle=a66/1/0x4000000000000000 softirq=131201/131201 fqs=14340 
Apr 25 15:48:43 UNRAID kernel: rcu: 	(detected by 32, t=60002 jiffies, g=412949, q=631295)
Apr 25 15:48:43 UNRAID kernel: Sending NMI from CPU 32 to CPUs 31:
Apr 25 15:48:43 UNRAID kernel: NMI backtrace for cpu 31
Apr 25 15:48:43 UNRAID kernel: CPU: 31 PID: 47678 Comm: qemu-system-x86 Tainted: G           O      4.19.107-Unraid #1
Apr 25 15:48:43 UNRAID kernel: Hardware name: Gigabyte Technology Co., Ltd. TRX40 AORUS XTREME/TRX40 AORUS XTREME, BIOS F4d 03/05/2020
Apr 25 15:48:43 UNRAID kernel: RIP: 0010:pci_mmcfg_read+0x98/0xa6
Apr 25 15:48:43 UNRAID kernel: Code: 83 fe 02 74 15 41 83 fe 04 74 1a 41 ff ce 75 1d 48 01 d8 8a 00 0f b6 c0 eb 10 48 01 d8 66 8b 00 0f b7 c0 eb 05 48 01 d8 8b 00 <89> 45 00 31 c0 5b 5d 41 5c 41 5d 41 5e c3 81 fa ff 00 00 00 41 56
Apr 25 15:48:43 UNRAID kernel: RSP: 0018:ffffc900122e3cc8 EFLAGS: 00000286
Apr 25 15:48:43 UNRAID kernel: RAX: 00000000ffffffff RBX: 0000000000000ffc RCX: 0000000000000ffc
Apr 25 15:48:43 UNRAID kernel: RDX: 000000000000007f RSI: 0000000000000023 RDI: ffffc90008000000
Apr 25 15:48:43 UNRAID kernel: RBP: ffffc900122e3cfc R08: 0000000000000004 R09: ffffc900122e3cfc
Apr 25 15:48:43 UNRAID kernel: R10: 0000000000000004 R11: 0000000000000084 R12: 0000000002304000
Apr 25 15:48:43 UNRAID kernel: R13: 0000000000000004 R14: 0000000000000004 R15: ffff888ebcb25b40
Apr 25 15:48:43 UNRAID kernel: FS:  00001524a07b4e00(0000) GS:ffff88902d5c0000(0000) knlGS:0000000000000000
Apr 25 15:48:43 UNRAID kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 25 15:48:43 UNRAID kernel: CR2: 000014f97fed62b0 CR3: 0000000f3cf9a000 CR4: 0000000000340ee0
Apr 25 15:48:43 UNRAID kernel: Call Trace:
Apr 25 15:48:43 UNRAID kernel: pci_bus_read_config_dword+0x44/0x65
Apr 25 15:48:43 UNRAID kernel: pci_find_next_ext_capability+0x9e/0xc9
Apr 25 15:48:43 UNRAID kernel: ? _raw_spin_unlock_irqrestore+0xc/0x12
Apr 25 15:48:43 UNRAID kernel: pci_restore_vc_state+0x20/0x5c
Apr 25 15:48:43 UNRAID kernel: pci_restore_state+0xd0/0x26e
Apr 25 15:48:43 UNRAID kernel: pci_dev_restore+0x18/0x34
Apr 25 15:48:43 UNRAID kernel: pci_try_reset_function+0x3f/0x4e
Apr 25 15:48:43 UNRAID kernel: vfio_pci_open+0x7e/0x3af
Apr 25 15:48:43 UNRAID kernel: vfio_group_fops_unl_ioctl+0x355/0x42e
Apr 25 15:48:43 UNRAID kernel: vfs_ioctl+0x19/0x26
Apr 25 15:48:43 UNRAID kernel: do_vfs_ioctl+0x533/0x55d
Apr 25 15:48:43 UNRAID kernel: ? __se_sys_newlstat+0x48/0x6b
Apr 25 15:48:43 UNRAID kernel: ksys_ioctl+0x37/0x56
Apr 25 15:48:43 UNRAID kernel: __x64_sys_ioctl+0x11/0x14
Apr 25 15:48:43 UNRAID kernel: do_syscall_64+0x57/0xf2
Apr 25 15:48:43 UNRAID kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 25 15:48:43 UNRAID kernel: RIP: 0033:0x1524a1fa24b7
Apr 25 15:48:43 UNRAID kernel: Code: 00 00 90 48 8b 05 d9 29 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 29 0d 00 f7 d8 64 89 01 48
Apr 25 15:48:43 UNRAID kernel: RSP: 002b:00007fff5878e818 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Apr 25 15:48:43 UNRAID kernel: RAX: ffffffffffffffda RBX: 000015241dfc14e0 RCX: 00001524a1fa24b7
Apr 25 15:48:43 UNRAID kernel: RDX: 000015241c9a1520 RSI: 0000000000003b6a RDI: 000000000000001b
Apr 25 15:48:43 UNRAID kernel: RBP: 00001524181605c0 R08: 000015241c9a1520 R09: 00007fff5878c783
Apr 25 15:48:43 UNRAID kernel: R10: 0000000000000018 R11: 0000000000000246 R12: 00001524181605c0
Apr 25 15:48:43 UNRAID kernel: R13: 000015241c9a1520 R14: 00007fff5878fa30 R15: 000015241dfc0c00

 

At this point, the server becomes basically unusable.
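The doubling in the FLR messages above (1023, 2047, ... 65535 ms) follows a fixed schedule. As far as I can tell from `pci_dev_wait()` in drivers/pci/pci.c of that kernel era, after an FLR the kernel keeps re-reading the device's config space, doubling the sleep between reads, and gives up after `PCIE_RESET_READY_POLL_MS` (60000 ms here; treat that value as an assumption). The Python model below is not kernel code, but it reproduces the seven log lines, and it shows that each printed number is simply the total time already slept, so "giving up" means the audio controller never answered within about a minute.

```python
def flr_wait_messages(timeout_ms=60000, reset_type="FLR"):
    """Messages logged while waiting for a device that never becomes ready.

    Python model (not kernel code) of the wait loop: before each sleep it
    reports `delay - 1`, which equals the total time already slept, and it
    gives up once the next sleep would exceed the timeout.
    """
    messages = []
    delay = 1
    while True:
        if delay > timeout_ms:
            messages.append(f"not ready {delay - 1}ms after {reset_type}; giving up")
            return messages
        if delay > 1000:
            messages.append(f"not ready {delay - 1}ms after {reset_type}; waiting")
        # The real loop sleeps `delay` ms here and re-reads config space;
        # a device that stays unresponsive keeps reading as all ones.
        delay *= 2


for message in flr_wait_messages():
    print(message)
```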

 

If I use your custom kernel "6.8.3-5.5.8-2" that you posted two pages back, together with the kernel argument "pcie_no_flr":

append pcie_no_flr=1022:1487 vfio-pci.ids=1022:1487,1b21:2142 isolcpus=12-23,36-47 initrd=/bzroot

I'm able to pass the device through to a Windows VM without freezing the whole server, BUT the device isn't shown in Device Manager as a new device. 1b21:2142 is one of the onboard USB controllers, and its passthrough works fine. The drivers are installed in a bare-metal Windows install on an NVMe drive, which I use in this VM. I can't get Windows to show the device in the device list. There are no errors in the VM log except for this warning:

2020-04-25T15:24:26.974831Z qemu-system-x86_64: vfio: Cannot reset device 0000:23:00.4, depends on group 39 which is not owned.
2020-04-25T15:24:29.155826Z qemu-system-x86_64: vfio: Cannot reset device 0000:23:00.4, depends on group 39 which is not owned.
IOMMU group 39:	[1022:1485] 23:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP

The "Starship/Matisse Reserved SPP" isn't bound to vfio or passed through either. Do I have to bind and pass through that device as well???
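The QEMU warning above appears when QEMU wants a bus-level ("hot") reset for the device and some other function affected by that reset belongs to an IOMMU group the VM doesn't own. The sketch below approximates that check using the group numbers from this post (23:00.4 in group 42, 23:00.0 in group 39). It assumes, as a simplification, that a bus reset affects every function on the same bus number; QEMU actually asks the kernel for the affected device list (via `VFIO_DEVICE_GET_PCI_HOT_RESET_INFO`). The sketch only explains the warning. It says nothing about whether passing group 39 through is safe or would help.

```python
def unowned_groups_for_bus_reset(target_bdf, groups, owned):
    """IOMMU groups a bus reset of target_bdf would touch but the VM doesn't own.

    Simplification (assumption): a bus reset affects every function whose
    address shares the target's bus number.
    """
    bus = target_bdf.rsplit(":", 1)[0]  # "23:00.4" -> "23"
    return sorted({
        group
        for bdf, group in groups.items()
        if bdf.rsplit(":", 1)[0] == bus and group not in owned
    })


# Group numbers from this post: 23:00.4 (HD Audio) is group 42,
# 23:00.0 (Reserved SPP) is group 39. The VM owns only group 42.
groups = {"23:00.0": 39, "23:00.4": 42}
print(unowned_groups_for_bus_reset("23:00.4", groups, owned={42}))  # [39]
```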

 

Any idea for a workaround to get the onboard audio device to show up in the Windows VM? I really want to figure this out and hopefully help @limetech implement a fix for the onboard audio issue in the next builds. I'm willing to test to find a solution. From everything I've read so far, a number of people have issues passing these devices to a VM on the latest AMD platforms. Maybe we can work out a solution that's worth implementing in future Unraid builds. I haven't tried the 6.9 beta build yet. It's the next step for me, but I don't think it will make any difference.

 

Quote

This initial -beta1 release is exactly the same code set of 6.8.3 except that the Linux kernel has been updated to 5.5.8 (latest stable as of this date).  We have only done basic testing on this release; normally we would not release 'beta' code but some users require the hardware support offered by the latest kernel along with security updates added into 6.8.

 

I'll report back soon. Thanks for the help ❤️

 

syslog_683_default_kernel.txt syslog_patched_5.5.8-2_kernel.txt Win10_VM_log_patched.txt 01_W10.xml IOMMU.txt

 

 

 

EDIT:

 

I tried the 6.9 beta 1: same issue as with the unpatched 6.8.3 kernel. Same freezes, same errors, no passthrough of the onboard audio possible. After reverting to 6.8.3 with the patched 5.5.8-2 kernel, I tried a couple of combinations of binding and passing through the device in group 39, with no success. Either the "FLR; waiting" error occurs and the server freezes, or the VM starts fine without any extra audio device showing up in Device Manager.

 

 

 

Edited by bastl
Link to comment

How could I have missed that???? WTF!!1!!!1

 

OK, so during all my testing I completely missed that the Aorus Extreme TRX40 provides two USB audio devices:

 


 

The first one is for the front-panel audio and the second for the rear 5.1/7.1 audio outputs. I had always tried to pass through the onboard "[AMD] Starship/Matisse HD Audio Controller", without success. 😑

 

Both these controllers are connected to the chipset and can be passed through to different VMs at the same time.

 

RTFM 😂

 


 

I reverted to the default 6.8.3 kernel and had no issues passing both controllers through to different VMs, running them simultaneously with 5.1 speakers at the back and a headset attached to the front. No patched kernel needed 👍

 

Link to comment

@Leoyzen Thanks for the hint, but it isn't in the same group as any other device.

IOMMU group 29:	[1022:148a] 22:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function
IOMMU group 30:	[1022:1485] 23:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
IOMMU group 31:	[1022:1486] 23:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP
IOMMU group 32:	[1022:148c] 23:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Starship USB 3.0 Host Controller
IOMMU group 33:	[1022:1487] 23:00.4 Audio device: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
IOMMU group 34:	[1022:1482] 40:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 35:	[1022:1483] 40:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
IOMMU group 36:	[1022:1483] 40:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
IOMMU group 37:	[1022:1482] 40:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 38:	[1022:1482] 40:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
IOMMU group 39:	[1022:1483] 40:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
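For anyone who wants to check their own grouping like the listing above: the kernel exposes IOMMU groups under /sys/kernel/iommu_groups/<group>/devices/, where each entry links to the PCI device's sysfs directory containing `vendor` and `device` files. A minimal Python sketch that reads that tree (it is not the script Unraid uses for its device listing and doesn't print class names):

```python
import os


def read_id(path):
    """Read a sysfs ID file such as 'vendor', whose contents look like '0x1022'."""
    with open(path) as f:
        value = f.read().strip()
    return value[2:] if value.startswith("0x") else value


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group: [(pci_address, "vendor:device"), ...]} sorted by group."""
    groups = {}
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if not (name.isdigit() and os.path.isdir(devices_dir)):
            continue
        entries = []
        for bdf in sorted(os.listdir(devices_dir)):
            # Each entry is a symlink to the PCI device's sysfs directory.
            dev_dir = os.path.join(devices_dir, bdf)
            vendor = read_id(os.path.join(dev_dir, "vendor"))
            device = read_id(os.path.join(dev_dir, "device"))
            entries.append((bdf, f"{vendor}:{device}"))
        groups[int(name)] = entries
    return dict(sorted(groups.items()))
```

On the host, `for group, devs in list_iommu_groups().items(): print(group, devs)` gives one line per group; a device is "alone" in its group when its list has a single entry.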

 

Link to comment
  • 3 weeks later...

Has anyone tried this with an Ubuntu VM? It works fine with Windows, but on Ubuntu I just get a blank screen after a restart and have to force-stop the VM.

2020-05-27T09:43:44.670228Z qemu-system-x86_64: vfio: Cannot reset device 0000:33:00.1, no available reset mechanism.
2020-05-27T09:43:44.674223Z qemu-system-x86_64: vfio: Cannot reset device 0000:33:00.1, no available reset mechanism.
2020-05-27T09:44:17.353232Z qemu-system-x86_64: vfio: Cannot reset device 0000:33:00.1, no available reset mechanism.
2020-05-27T09:44:17.357244Z qemu-system-x86_64: vfio: Cannot reset device 0000:33:00.1, no available reset mechanism.
2020-05-27T09:45:08.197995Z qemu-system-x86_64: terminating on signal 15 from pid 9026 (/usr/sbin/libvirtd)
libusb: error [do_close] Device handle closed while transfer was still being processed, but the device is still connected as far as we know
libusb: error [do_close] A cancellation hasn't even been scheduled on the transfer for which the device is closing
libusb: error [do_close] Device handle closed while transfer was still being processed, but the device is still connected as far as we know
libusb: error [do_close] A cancellation hasn't even been scheduled on the transfer for which the device is closing
libusb: error [do_close] Device handle closed while transfer was still being processed, but the device is still connected as far as we know
libusb: error [do_close] A cancellation hasn't even been scheduled on the transfer for which the device is closing
2020-05-27 09:45:10.399+0000: shutting down, reason=destroyed

After that I can't get output from either the Windows or the Ubuntu VM; I need to restart the Unraid server to get it working again.
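"No available reset mechanism" for 0000:33:00.1 (typically a GPU's HDMI audio function) suggests QEMU found no usable reset for that function. From the host you can check whether the kernel offers any reset at all: sysfs only creates /sys/bus/pci/devices/<address>/reset when the kernel has a reset method for the device, and newer kernels (5.15 and later, not the 4.19/5.5 builds in this thread) also list the methods in `reset_method`. A small sketch; a missing `reset` file would be consistent with the QEMU warning above, though it doesn't diagnose the cause.

```python
import os


def reset_info(bdf, root="/sys/bus/pci/devices"):
    """Report whether the kernel offers a reset for a PCI function.

    Returns (has_reset, methods): 'reset' exists only when the kernel found
    some reset method; 'reset_method' (newer kernels only) lists them.
    """
    dev_dir = os.path.join(root, bdf)
    has_reset = os.path.exists(os.path.join(dev_dir, "reset"))
    methods = []
    method_file = os.path.join(dev_dir, "reset_method")
    if os.path.exists(method_file):
        with open(method_file) as f:
            methods = f.read().split()
    return has_reset, methods
```

Usage on the host: `reset_info("0000:33:00.1")`.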

Link to comment

@Leoyzen Thanks for the patch! My USB passthrough issues have been resolved. However, the Navi reset patch doesn't seem to work for me. Is it vendor-specific, like the Vega patch appears to be? It looks like the HDMI audio device doesn't reset, whether I shut down the VM cleanly or force it off.

 

2020-05-29T00:18:08.955204Z qemu-system-x86_64: vfio: Cannot reset device 0000:4b:00.1, no available reset mechanism.
2020-05-29T00:18:08.987175Z qemu-system-x86_64: vfio: Cannot reset device 0000:4b:00.1, no available reset mechanism.

 

Am I missing a step? Has anyone successfully tested the reset functionality on Navi? 

Link to comment

@Leoyzen: Thank you for all of your tips here. I've used your KVM settings from github and some of your EFI configuration with success, but I'm now encountering the Navi reset bug. I really appreciate your attempts to better understand how OS X is behaving under virtualization.

 

I know that you're mostly focused on Unraid, but I'm trying to set this all up on Ubuntu 20.04 with the 5.4 kernel. You've done a great job of tracking down other issues around Hackintosh/OpenCore/QEMU/KVM, so I'm wondering if you have deeper thoughts on the reset bug.

 

I don't get any errors about VFIO being unable to reset the device like some of the other reports--do I need to enable some QEMU logging to see those messages?

 

I do occasionally see this:

[ 2963.732284] vfio-pci 0000:4d:00.1: Refused to change power state, currently in D0

 

The weirdest thing to me is the inconsistency:

Sometimes I can boot OS X, it works fine, and shutting down also works and leaves the device in an okay state. Other times it boots fine, but on shutdown the device hangs and I have to force it off, which leaves the PCI config corrupted and results in a 127 error when I try to reboot the guest. Other times the guest boots without the 127 error, but then it seems to hang and shows a black screen instead of the login screen (similar to what happened before I set agdpmod=pikera).

 

Is all of this due to the same reset bug or are there multiple things going on?

Link to comment
On 8/13/2019 at 6:23 AM, Leoyzen said:
  • Add Vega Reset Patch
  • Add Navi Reset Patch
  • Enable NFSv4 in kernel (God damned, we finally got NFSv4 to work)
  • Add R8125 out-of-tree driver.
  • AMD onboard audio/usb controller flr patch.

This is awesome work, @Leoyzen! 👍🏻 I'm trying to set up an ASRock TRX40 Creator with a Radeon VII running Catalina. Everything seems to run OK apart from the reset bug and USB passthrough. I have been using Proxmox; is there any description of how to apply these patches to the Proxmox kernel?

Link to comment
On 4/17/2020 at 12:57 AM, Leoyzen said:

Use the latest one I've uploaded and try again. The earlier version doesn't have support of pcie_no_flr kernel option.

@Leoyzen - I've never used a custom kernel before, but I am running into a Ryzen 3000 FLR error. Does the kernel you provide cover this issue? I found the commit that's [going to be??] included: https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/virtualization&id=0d14f06cd6657ba3446a5eb780672da487b068e7

 

What version of Linux kernel will that make it into?  5.7?

 

Link to comment
13 hours ago, JoeBloggs said:

Am I right in saying that to use the precompiled kernel, we simply copy it across to /boot/ ? Or have I got that completely wrong?

@JoeBloggs - I just used the kernel. Yes, just copy it to your flash drive (the /boot/ directory). Save the stock kernel files as .bak or similar in case you need them. Everything will boot as normal. Just make sure the kernel matches your Unraid version. Leoyzen attached a version for Unraid 6.8.3 earlier on this page, which is the same one I'm using.
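The copy-and-keep-a-backup step above can be scripted. This is a minimal, hypothetical helper, assuming the downloaded kernel has been unpacked into its own directory that contains only the files meant to replace the ones on the flash drive. It doesn't check versions or checksums, so matching the kernel to your Unraid release is still up to you.

```python
import os
import shutil


def install_kernel(new_dir, boot_dir="/boot"):
    """Copy every file in new_dir onto boot_dir, keeping <name>.bak of the originals.

    A backup is only made the first time, so repeated installs never
    overwrite the stock copy. Returns the names of the installed files.
    """
    installed = []
    for name in sorted(os.listdir(new_dir)):
        src = os.path.join(new_dir, name)
        if not os.path.isfile(src):
            continue  # skip subdirectories
        dst = os.path.join(boot_dir, name)
        backup = dst + ".bak"
        if os.path.exists(dst) and not os.path.exists(backup):
            shutil.copyfile(dst, backup)
        # copyfile copies contents only; permissions and timestamps are not
        # meaningful on the FAT-formatted flash drive anyway.
        shutil.copyfile(src, dst)
        installed.append(name)
    return installed
```

For example, `install_kernel("/tmp/custom-kernel")` (a hypothetical path), then reboot. To roll back, copy the `.bak` files over their originals.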

 

@Leoyzen - Wanted to say thank you for the FLR fix, I used your kernel, added the parameters I needed and am up and running with the USB controller in my VM!

 

Added `append pcie_no_flr=1022:149c,1022:1487,1022:1485` and used vfio-append to those same ones.

 

Thanks again!

Link to comment

Hi,

 

Could someone with a Gigabyte TRX40 Extreme who has onboard USB passthrough working with this patch comment? I tried both the 4.19 and the 5.x builds and unfortunately got the same VM freeze. I know the creator doesn't have a TRX40, but it looked like the same issue to me.

Link to comment
  • 3 weeks later...
On 8/13/2019 at 6:23 AM, Leoyzen said:

Hi, I just built a kernel to support X570 motherboards (mine is an MSI X570 ACE) and the latest AMD Zen 2 Ryzen 3000 family CPUs.

6.8.3 is out, here is the new kernel and some tweaks:

  • Add Vega Reset Patch
  • Add Navi Reset Patch
  • Enable NFSv4 in kernel (God damned, we finally got NFSv4 to work)
  • Add R8125 out-of-tree driver.
  • AMD onboard audio/usb controller flr patch.
  • Provide two versions (linux-5.5.8 and linux-4.19.108) in case of bugs. Note that linux-4.19.108 still doesn't have AMD Zen 2 support.

Hello,

Do you plan to make a custom firmware for 6.9beta22?

 

Thanks for your work.

Link to comment
On 6/6/2020 at 9:51 PM, Cadal said:

Hi,

 

Could someone with a Gigabyte TRX40 Extreme who has onboard USB passthrough working with this patch comment? I tried both the 4.19 and the 5.x builds and unfortunately got the same VM freeze. I know the creator doesn't have a TRX40, but it looked like the same issue to me.

 

I have both Starship USB controllers passed through; the USB controllers on the CPU (Matisse) don't work yet AFAIK. As mentioned above, I had to add pcie_no_flr=1022:148 to my config to fix the FLR issue. They work perfectly now.

Link to comment
On 6/29/2020 at 10:50 PM, rachid596 said:

Hello,

Do you plan to make a custom firmware for 6.9beta22?

 

Thanks for your work.

Also requesting a patch, if possible.

@limetech - Could you please apply this fix? Many users with Ryzen 3xxx CPUs are having this issue, and we would really appreciate a quick-fix release.

 

Thanks!

 

Link to comment
Limetech will implement it in the next releases: the patch for USB and audio on X570.


Link to comment
