coppit

PCI and USB device resets

Hi all,

I'm running 6.4.1-rc1. I have two VMs that each use a passthrough GPU.

Initially, my VMs would start fine, but when audio played through my USB headset, the headset would stop working after a few seconds. Now even my USB keyboard isn't working. The keyboard seems to work okay at Unraid's command prompt, but not when it's passed through to a VM.

Looking in the syslog, I see some scary PCI bus error messages:

Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1: AER: Multiple Corrected error received: id=0000
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=4019(Transmitter ID)
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1:   device [1022:1453] error status/mask=00001180/00006000
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1:    [ 7] Bad DLLP              
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1:    [ 8] RELAY_NUM Rollover    
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1:    [12] Replay Timer Timeout  
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1: AER: Multiple Corrected error received: id=0000
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=4019(Transmitter ID)
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1:   device [1022:1453] error status/mask=00001100/00006000
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1:    [ 8] RELAY_NUM Rollover    
Jan 31 21:32:22 storage kernel: pcieport 0000:40:03.1:    [12] Replay Timer Timeout  
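The "error status/mask" values in these messages are bitmasks from the port's AER Correctable Error Status and Mask registers. As a minimal sketch (bit names taken from the PCI Express Base Specification, which calls bit 8 "REPLAY_NUM Rollover"; the kernel above prints it as RELAY_NUM), decoding 0x00001180 gives bits 7, 8 and 12, matching the three lines the kernel listed, while the mask 0x00006000 only covers bits 13 and 14, so none of the reported errors are masked. The function name decode_correctable is hypothetical.

```python
from __future__ import annotations

# Correctable Error Status register bits, named as in the PCI Express Base
# Specification (bit 8 is "REPLAY_NUM Rollover" there).
CORRECTABLE_ERRORS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_correctable(status: int, mask: int) -> list[tuple[int, str, bool]]:
    """Return (bit, name, is_masked) for every bit set in the status word."""
    decoded = []
    for bit in range(32):
        if status & (1 << bit):
            name = CORRECTABLE_ERRORS.get(bit, "Reserved")
            decoded.append((bit, name, bool(mask & (1 << bit))))
    return decoded


if __name__ == "__main__":
    # status/mask copied from the first "error status/mask" line in the log
    for bit, name, masked in decode_correctable(0x00001180, 0x00006000):
        print(f"[{bit:2}] {name}" + (" (masked)" if masked else ""))
```

Run as a script, it prints `[ 7] Bad DLLP`, `[ 8] REPLAY_NUM Rollover` and `[12] Replay Timer Timeout`. The second entry (0x00001100) decodes to bits 8 and 12 only.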

Shortly afterward, I also see some scary USB reset messages:

Jan 31 21:32:25 storage kernel: vfio_ecap_init: 0000:41:00.0 hiding ecap 0x1e@0x258
Jan 31 21:32:25 storage kernel: vfio_ecap_init: 0000:41:00.0 hiding ecap 0x19@0x900
Jan 31 21:32:26 storage acpid: input device has been disconnected, fd 4
Jan 31 21:32:26 storage acpid: input device has been disconnected, fd 6
Jan 31 21:32:26 storage acpid: input device has been disconnected, fd 10
Jan 31 21:32:26 storage acpid: input device has been disconnected, fd 11
Jan 31 21:32:26 storage acpid: input device has been disconnected, fd 12
Jan 31 21:32:41 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:32:41 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:32:41 storage kernel: usb 1-14: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:32:42 storage kernel: usb 1-14: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:32:42 storage kernel: usb 5-2: reset full-speed USB device number 2 using xhci_hcd
Jan 31 21:32:43 storage kernel: usb 5-2: reset full-speed USB device number 2 using xhci_hcd
Jan 31 21:33:25 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:33:25 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:33:25 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:33:26 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:33:26 storage kernel: usb 1-14: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:33:26 storage kernel: usb 1-14: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:33:27 storage kernel: usb 1-14: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:33:27 storage kernel: usb 1-14: reset full-speed USB device number 4 using xhci_hcd
Jan 31 21:33:28 storage kernel: usb 5-2: reset full-speed USB device number 2 using xhci_hcd
Jan 31 21:33:28 storage kernel: usb 5-2: reset full-speed USB device number 2 using xhci_hcd
Jan 31 21:33:28 storage kernel: usb 5-2: reset full-speed USB device number 2 using xhci_hcd
Jan 31 21:33:29 storage kernel: usb 5-2: reset full-speed USB device number 2 using xhci_hcd
Jan 31 21:36:37 storage ntpd[2458]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

I'm guessing that starting the VM somehow causes the USB reset, which kills my keyboard?
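One way to check that guess is to tally the reset messages per USB port and compare their timestamps with the VM start (the vfio_ecap_init lines at 21:32:25). Below is a minimal Python sketch, assuming the syslog line format shown above; the helper name tally_resets is hypothetical, and the log path is passed on the command line (on Unraid it is presumably /var/log/syslog). In the excerpt above, each of the three ports (5-4, 1-14 and 5-2) logs six resets, in two bursts: between 21:32:41 and 21:32:43, and between 21:33:25 and 21:33:29.

```python
import re
import sys
from collections import Counter

# Matches kernel lines such as:
#   Jan 31 21:32:41 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd
# Group 1 = timestamp, group 2 = USB port path (e.g. "5-4", "1-14").
RESET_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) \S+ kernel: usb ([\d.-]+): reset "
)


def tally_resets(lines):
    """Count USB reset messages per port and keep first/last timestamps."""
    counts = Counter()
    first_seen, last_seen = {}, {}
    for line in lines:
        m = RESET_RE.match(line)
        if m is None:
            continue  # not a USB reset message
        stamp, port = m.groups()
        counts[port] += 1
        first_seen.setdefault(port, stamp)
        last_seen[port] = stamp
    return counts, first_seen, last_seen


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally_resets.py /path/to/syslog
    with open(sys.argv[1], errors="replace") as log:
        counts, first_seen, last_seen = tally_resets(log)
    for port, n in counts.most_common():
        print(f"usb {port}: {n} resets, {first_seen[port]} to {last_seen[port]}")
```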

Not sure about the PCIe bus errors, but the USB device resets are normal. Maybe post the entire diagnostics.zip.

The PCIe errors can sometimes be fixed by moving the offending controller to a different PCIe slot (ideally from a CPU slot to a PCH slot or vice versa) or by a BIOS update.

Here's the diagnostics zip. Sorry, I meant to attach it before.

I vaguely remember having a similar problem once, and it was due to the CPU not being seated properly. Does that sound plausible?

I'm also wondering whether the USB ports aren't getting enough power. Maybe the motherboard is bad?

The only PCI cards I have are the 2 GPUs, and I was hoping to leave them in the motherboard's recommended configuration. The GPUs seem to work okay. It sounds like the USB issue might not be related anyway?

Thanks!

storage-diagnostics-20180131-2137.zip

coppit said:

The only PCI cards I have are the 2 GPUs

In that case there's not much you can do. Keep an eye out for a BIOS update; a newer kernel might also make a difference.

More info...

I think the keyboards stopped working because I had plugged them both into the same USB hub. They're different brands, but apparently the wireless parts come from the same manufacturer (they have the same device IDs).
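Two devices that report the same vendor:product ID are hard to tell apart when USB passthrough selects devices by ID; as far as I know, libvirt refuses an ID-only match in that case and asks for a bus/device address instead. A quick way to spot such collisions is to group `lsusb` output by ID. Minimal Python sketch, assuming lsusb's usual `Bus NNN Device NNN: ID vvvv:pppp description` line format; duplicate_ids is a hypothetical helper, and the IDs in the comment and test are placeholders, not the poster's keyboards.

```python
import re
import sys
from collections import defaultdict

# One lsusb line, e.g. "Bus 005 Device 004: ID 1234:5678 Example Receiver"
# (placeholder ID and name).
LSUSB_RE = re.compile(
    r"^Bus (\d{3}) Device (\d{3}): ID ([0-9a-f]{4}:[0-9a-f]{4})\s*(.*)$",
    re.IGNORECASE,
)


def duplicate_ids(lsusb_output):
    """Map each vendor:product ID seen more than once to [(bus, device, description)]."""
    by_id = defaultdict(list)
    for line in lsusb_output.splitlines():
        m = LSUSB_RE.match(line.strip())
        if m:
            bus, dev, usb_id, desc = m.groups()
            by_id[usb_id.lower()].append((int(bus), int(dev), desc))
    return {usb_id: devs for usb_id, devs in by_id.items() if len(devs) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: lsusb > lsusb.txt; python duplicate_ids.py lsusb.txt
    with open(sys.argv[1]) as f:
        for usb_id, devs in duplicate_ids(f.read()).items():
            print(usb_id, devs)
```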

The audio cuts out after a few seconds of a YouTube video, but if I stop the video, go into Device Manager, and disable/enable the device, that fixes it. While playing Overwatch, the audio stops, then the whole game freezes for 5 seconds or so, and then everything recovers, including the audio.

I checked the Windows event log and there's nothing suspicious. I double-checked that the USB selective suspend setting is disabled in the power profile, and that the root hub's option to let the computer turn off the device to save power is disabled.

Is there any chance that the kernel is the culprit? I could try downgrading, but then I'd have to use the PCI config hack to get the GPU to work (and I don't know if that will even work).

Playing audio from the GPU works fine.

With my Intel setup, I passed through my entire USB controller, but with this setup the IOMMU groups won't let me do that, even with a PCI USB card. :-(
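To see exactly which devices share a group with a USB controller (and so would, bridges aside, have to be passed through together), the host's IOMMU layout can be listed from sysfs. Minimal Python sketch, assuming the standard Linux layout /sys/kernel/iommu_groups/&lt;group&gt;/devices/&lt;PCI address&gt;; the helper names iommu_groups and group_of are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict[int, list[str]]:
    """Map IOMMU group number to the sorted PCI addresses of its member devices."""
    base = Path(root)
    groups: dict[int, list[str]] = {}
    if not base.is_dir():
        return groups  # IOMMU disabled, or not a Linux host
    for group_dir in base.iterdir():
        devices = group_dir / "devices"
        if group_dir.name.isdigit() and devices.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices.iterdir())
    return dict(sorted(groups.items()))


def group_of(pci_address: str, groups: dict[int, list[str]]) -> int | None:
    """Return the group containing pci_address, or None if it is not listed."""
    for number, members in groups.items():
        if pci_address in members:
            return number
    return None


if __name__ == "__main__":
    for number, members in iommu_groups().items():
        print(f"group {number}: {' '.join(members)}")
```

If the USB controller's group also contains devices the host still needs, it cannot be handed to the VM cleanly without changing slots or groupings.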

Hi coppit,

A similar issue just happened to me.

The VM is an Ubuntu Server 16.04 LTS guest, and I am passing a TEMPer USB device through to it. The device is plugged directly into the onboard USB on the motherboard's back I/O, so no hub is involved.

Whenever I try to access/read the TEMPer device from inside the VM, the device gets reset. In the Unraid log I get messages like this:

Feb 24 17:42:02 Towerx48 kernel: hid-generic 0003:0C45:7401.0003: input,hidraw0: USB HID v1.10 Keyboard [RDing TEMPerV1.4] on usb-0000:00:1d.2-1/input0
Feb 24 17:42:02 Towerx48 kernel: hid-generic 0003:0C45:7401.0004: hiddev96,hidraw1: USB HID v1.10 Device [RDing TEMPerV1.4] on usb-0000:00:1d.2-1/input1

The device then becomes inaccessible in the VM and no longer shows up in lsusb.

A restart of the VM is required to see it in the VM again.

I think I may have solved it: I changed the VM's USB controller definition from EHCI to XHCI, even though the motherboard (a Rampage Formula X48) does not have USB 3.0.

Before, I always got the USB reset; since switching to XHCI, I can read the device consistently.

I will post an update if I run into issues over the next few days...

But it's worth a try, in case you haven't already...

Good luck

alex

LE: I see your device is recognized as xhci:

Jan 31 21:32:41 storage kernel: usb 5-4: reset full-speed USB device number 4 using xhci_hcd

But I'm not sure whether that's because of the VM definition or because of how Unraid itself sees it (most probably the latter).

Anyhow, it's worth toggling this setting in the VM...
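In Unraid this is normally exposed as the USB controller option in the VM settings; the sketch below shows what the change amounts to in the libvirt domain XML. It is a minimal sketch under stated assumptions: a single USB bus, controllers with a `<master>` child are treated as UHCI companions of the EHCI controller and dropped, and the old PCI `<address>` is removed so libvirt can assign a new one. `qemu-xhci` is the default here (`nec-xhci` is another libvirt XHCI model); the function name switch_usb_to_xhci is hypothetical. The XML could be exported with `virsh dumpxml <vm>` and re-applied with `virsh define <file>`.

```python
import sys
import xml.etree.ElementTree as ET


def switch_usb_to_xhci(domain_xml: str, model: str = "qemu-xhci") -> str:
    """Return domain XML whose (single) USB controller uses an XHCI model."""
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        raise ValueError("no <devices> element in domain XML")
    primary = None
    for ctrl in devices.findall("controller[@type='usb']"):
        if ctrl.find("master") is not None:
            devices.remove(ctrl)  # UHCI companion of an EHCI controller
        elif primary is None:
            primary = ctrl  # first real USB controller
    if primary is None:
        raise ValueError("no primary USB controller in domain XML")
    primary.set("model", model)
    for addr in primary.findall("address"):
        primary.remove(addr)  # let libvirt pick a fresh PCI address
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python switch_usb.py vm.xml > vm-xhci.xml
    with open(sys.argv[1]) as f:
        print(switch_usb_to_xhci(f.read()))
```

ElementTree may rewrite namespace prefixes (for example inside `<metadata>`), so it is worth diffing the output before defining the VM again.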

Edited by alexciurea
