February 1, 2018
Hi all,

I'm running 6.4.1-rc1. I have two VMs that each use a passthrough GPU. Initially, my VMs would start fine, and then when audio came through my USB headset, it would stop working after a few seconds. Now even my USB keyboard isn't working. The keyboard seems to work okay when controlling Unraid's command prompt, but doesn't seem to work when passed to the VM.

Looking in the syslog, I see some scary PCI bus error messages:

Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1: AER: Multiple Corrected error received: id=0000
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=4019(Transmitter ID)
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1: device [1022:1453] error status/mask=00001180/00006000
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1: [ 7] Bad DLLP
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1: [ 8] RELAY_NUM Rollover
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1: [12] Replay Timer Timeout
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1: AER: Multiple Corrected error received: id=0000
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=4019(Transmitter ID)
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1: device [1022:1453] error status/mask=00001100/00006000
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1: [ 8] RELAY_NUM Rollover
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1: [12] Replay Timer Timeout

and shortly afterward some scary USB reset messages:

Jan 31 21:32:25 storage kernel: vfio_ecap_init: 0000:41:00.0 hiding ecap 0x1e@0x258
Jan 31 21:32:25 storage kernel: vfio_ecap_init: 0000:41:00.0 hiding ecap 0x19@0x900
Jan 31 21:32:26 storage acpid: input device has been disconnected, fd 4
Jan 31 21:32:26 storage acpid: input device has been disconnected, fd 6
Jan 31 21:32:26 storage acpid: input device has been disconnected, fd 10
Jan 31 21:32:26 storage acpid: input device has been disconnected, fd 11
Jan 31 21:32:26 storage acpid: input device has been disconnected, fd 12
Jan 31 21:32:41 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:32:41 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:32:41 storage kernel: usb 1-14: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:32:42 storage kernel: usb 1-14: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:32:42 storage kernel: usb 5-2: reset full-speed USB device number 2 using xhci_hcd
Jan 31 21:32:43 storage kernel: usb 5-2: reset full-speed USB device number 2 using xhci_hcd
Jan 31 21:33:25 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:33:25 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:33:25 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:33:26 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:33:26 storage kernel: usb 1-14: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:33:26 storage kernel: usb 1-14: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:33:27 storage kernel: usb 1-14: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:33:27 storage kernel: usb 1-14: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:33:28 storage kernel: usb 5-2: reset full-speed USB device number 2 using xhci_hcd
Jan 31 21:33:28 storage kernel: usb 5-2: reset full-speed USB device number 2 using xhci_hcd
Jan 31 21:33:28 storage kernel: usb 5-2: reset full-speed USB device number 2 using xhci_hcd
Jan 31 21:33:29 storage kernel: usb 5-2: reset full-speed USB device number 2 using xhci_hcd
Jan 31 21:36:37 storage ntpd[2458]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

I'm guessing that starting the VM somehow causes the USB reset, which kills my keyboard?
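For anyone reading the "error status/mask" values in the AER lines: they are bitmasks over the PCIe Correctable Error Status register, and the bracketed numbers the kernel prints ([ 7], [ 8], [12]) are the bit positions that are set. Below is a minimal Python sketch that decodes the two values from the log. The bit names are taken from the PCIe spec as I understand it (the kernel in this log prints bit 8 as "RELAY_NUM Rollover"), and the function names are made up for illustration.

```python
# Bit positions of the PCIe AER Correctable Error Status register.
# Names follow the PCIe spec; this is a lookup sketch, not kernel code.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_correctable(value_hex: str) -> list[str]:
    """Return the names of the known bits set in a hex status or mask value."""
    value = int(value_hex, 16)
    return [
        name
        for bit, name in sorted(CORRECTABLE_BITS.items())
        if (value >> bit) & 1
    ]


def decode_status_mask(field: str) -> tuple[list[str], list[str]]:
    """Split a 'status/mask' pair like '00001180/00006000' and decode both halves."""
    status, mask = field.split("/")
    return decode_correctable(status), decode_correctable(mask)


if __name__ == "__main__":
    # The two status/mask pairs that appear in the syslog excerpt.
    for field in ("00001180/00006000", "00001100/00006000"):
        status, mask = decode_status_mask(field)
        print(field, "| status:", status, "| masked:", mask)
```

All of the bits set here are in the "corrected" class, which matches severity=Corrected in the log: the link layer retried and recovered.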
February 1, 2018
Dunno about the PCIe bus error, but the USB device resets are normal. Maybe post the entire diagnostics.zip.
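To see which ports are being reset and how often, the reset lines can be tallied per port. This is only a rough Python sketch, assuming the syslog line format shown in the first post; the function name and the sample text are for illustration.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd"
RESET_RE = re.compile(
    r"kernel: usb (?P<port>\S+): reset (?P<speed>\S+) USB device "
    r"number (?P<devnum>\d+) using (?P<driver>\S+)"
)


def count_usb_resets(log_text: str) -> Counter:
    """Count USB reset messages per bus-port identifier (e.g. '5-4')."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RESET_RE.search(line)
        if match:
            counts[match.group("port")] += 1
    return counts


# A few lines copied from the syslog excerpt in the first post.
SAMPLE = """\
Jan 31 21:32:26 storage acpid: input device has been disconnected, fd 12
Jan 31 21:32:41 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:32:41 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:32:41 storage kernel: usb 1-14: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:32:42 storage kernel: usb 1-14: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:32:42 storage kernel: usb 5-2: reset full-speed USB device number 2 using xhci_hcd
Jan 31 21:32:43 storage kernel: usb 5-2: reset full-speed USB device number 2 using xhci_hcd
"""

if __name__ == "__main__":
    for port, count in count_usb_resets(SAMPLE).most_common():
        print(port, count)
```

Running it over the full syslog instead of the sample would show whether the resets only cluster around VM start or keep happening while the VM is running.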
February 1, 2018
PCIe errors like these can sometimes be fixed by moving the offending controller to a different PCIe slot (ideally changing from a CPU slot to a PCH slot or vice versa) or with a BIOS update.
February 1, 2018 Author
Here's the diagnostics zip. Sorry, I meant to attach it before.

I vaguely remember having a similar problem, and it was due to the CPU not being seated properly. Sound plausible? I'm also wondering if maybe the USB ports aren't getting enough power. Maybe the motherboard is bad?

The only PCI cards I have are the 2 GPUs, and I was hoping to leave them in the motherboard-recommended configuration. The GPUs seem to work okay. Sounds like the USB issue might not be related anyway?

Thanks!

storage-diagnostics-20180131-2137.zip
February 1, 2018
2 minutes ago, coppit said:
The only PCI cards I have are the 2 GPUs

In that case there's not much you can do. Keep an eye out for a BIOS update; a newer kernel might also make a difference.
February 2, 2018 Author
More info...

I think the keyboards weren't working because I had put them both on the same USB hub. They're different brands, but apparently the wireless part is from the same manufacturer. (They have the same device IDs.)

The audio cuts out after a few seconds on a YouTube video, but if I stop the video, then go into Device Manager and disable/enable the device, that fixes it. While playing Overwatch, the audio will stop, then the whole game freezes for 5 seconds or so, then everything recovers, including the audio.

I checked the Windows event log and there's nothing suspicious. I double-checked that the USB selective suspend setting is disabled in the power profile. I also checked that the root hub option to sleep the device is disabled.

Is there any chance that the kernel is the culprit? I could try downgrading, but then I'd have to use the PCI config hack to get the GPU to work. (Dunno if it will even work.)

Playing audio from the GPU works fine.

With my Intel setup, I passed through my entire USB controller, but with this setup, the IOMMU groups aren't letting me do that, even with a PCI USB card. :-(
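On the IOMMU group point: a device can generally only be handed to a VM together with everything else in its group, so it helps to see exactly what shares a group with the USB controller. Here is a small Python sketch that reads the groups from sysfs. It assumes the usual Linux layout under /sys/kernel/iommu_groups; the function names are hypothetical and nothing here is Unraid-specific.

```python
from pathlib import Path


def list_iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict[int, list[str]]:
    """Map each IOMMU group number to the sorted PCI addresses it contains.

    Returns an empty dict if the directory is missing (for example when the
    IOMMU is disabled in the BIOS or on the kernel command line).
    """
    base = Path(root)
    groups: dict[int, list[str]] = {}
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return dict(sorted(groups.items()))


def shares_group(groups: dict[int, list[str]], pci_address: str) -> list[str]:
    """Return the other devices in the same group as pci_address."""
    for members in groups.values():
        if pci_address in members:
            return [m for m in members if m != pci_address]
    return []


if __name__ == "__main__":
    for number, members in list_iommu_groups().items():
        print(f"group {number}: {', '.join(members)}")
```

If shares_group() returns anything for the USB controller's address, those devices would have to go to the VM along with it, which is usually what blocks passing the whole controller through.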
February 24, 2018
Hi coppit,

A similar issue just happened to me. The VM is Ubuntu Server 16.04 LTS and I am passing a TEMPer USB device to it, plugged directly into the onboard USB (the motherboard's back I/O), so no hub is involved.

Whenever I try to access/read the TEMPer device from inside the VM, the device gets reset. In the Unraid log I get messages like this:

Feb 24 17:42:02 Towerx48 kernel: hid-generic 0003:0C45:7401.0003: input,hidraw0: USB HID v1.10 Keyboard [RDing TEMPerV1.4] on usb-0000:00:1d.2-1/input0
Feb 24 17:42:02 Towerx48 kernel: hid-generic 0003:0C45:7401.0004: hiddev96,hidraw1: USB HID v1.10 Device [RDing TEMPerV1.4] on usb-0000:00:1d.2-1/input1

The device becomes unavailable/inaccessible in the VM and no longer shows up in lsusb. A restart of the VM is required to see it again.

I hope I solved it: I changed the VM USB definition from ehci to xhci, although the motherboard does not have USB 3.0 (Rampage Formula X48). In the past I was always getting the USB reset; now, after changing to xhci, I can consistently read the device.

I will update if I get issues in the following days... But it's worth a try, in case you haven't already.

Good luck,
alex

LE: I see your device is recognized as xhci:

Jan 31 21:32:41 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd

but I'm not sure if this is because of the VM definition or how Unraid sees it (most probably the latter). Anyhow, it's worth toggling this definition in the VM...

Edited February 24, 2018 by alexciurea
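For reference, the ehci-to-xhci switch ends up as a change to the USB controller entries in the VM's libvirt domain XML. Below is a rough Python sketch of that edit. The assumptions are mine, not from the posts above: that the VM is defined as libvirt XML with USB controllers under devices, that "qemu-xhci" is an acceptable model on the QEMU version in use ("nec-xhci" is the other xhci model libvirt knows), and the function name and sample XML are hypothetical. Changing the setting in the VM's edit form is the safer route; this only shows what changes underneath.

```python
import xml.etree.ElementTree as ET


def switch_usb_to_xhci(domain_xml: str, model: str = "qemu-xhci") -> str:
    """Replace all USB controllers in a libvirt domain XML with one xhci controller.

    Assumption: dropping the old controllers' <address> children is acceptable
    because libvirt assigns addresses again when the domain is defined.
    """
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        raise ValueError("no <devices> element in domain XML")
    # Remove the EHCI controller and any companion UHCI controllers.
    for ctrl in list(devices.findall("controller")):
        if ctrl.get("type") == "usb":
            devices.remove(ctrl)
    ET.SubElement(devices, "controller", {"type": "usb", "index": "0", "model": model})
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed domain definition for illustration only.
SAMPLE_XML = """
<domain type='kvm'>
  <name>demo</name>
  <devices>
    <controller type='usb' index='0' model='ich9-ehci1'/>
    <controller type='usb' index='0' model='ich9-uhci1'/>
    <controller type='pci' index='0' model='pci-root'/>
  </devices>
</domain>
"""

if __name__ == "__main__":
    print(switch_usb_to_xhci(SAMPLE_XML))
```

Note that this only changes the emulated controller inside the guest. The xhci_hcd in the host syslog is the host's own driver for the physical port, which would fit the guess that it reflects how Unraid sees the device.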
This topic is now archived and is closed to further replies.