
1st Win10 VM pauses (stops with vfio_err_notifier_handler (address) Unrecoverable error detected) after I start 2nd one (passthrough issues)


Rive


Hello!

 

I've always had 2 Windows 10 VMs in my unRAID, let's call them SPC and HTPC. They both have had NVIDIA GPUs and USB controllers passed through from the very beginning.

 

And everything was fine until I updated from some older unRAID to 6.5.2.

 

1. Now when I start HTPC, my SPC can freeze immediately with 

qemu-system-x86_64: vfio_err_notifier_handler(hardware address) Unrecoverable error detected. Please collect any data possible and then kill the guest

In this case the device that failed is usually the video card together with its audio device (sometimes it's the Renesas USB card).

 

2. If I manage to start HTPC and then restart SPC, HTPC might freeze randomly after some time with the same error for its video card (and its audio device).

 

3. Or both of them might just freeze randomly if they are running at the same time.
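
For anyone trying to tally which device is named in each failure, here is a minimal sketch that counts `vfio_err_notifier_handler` lines per device in a log. It assumes the text inside the parentheses is the device's PCI address (the post above only shows a placeholder there), and `count_vfio_errors` is a hypothetical helper name, not part of unRAID or QEMU.

```python
import re
from collections import Counter

# Assumption: the parentheses hold a device identifier such as a PCI address.
VFIO_ERR = re.compile(r"vfio_err_notifier_handler\(([^)]+)\)")


def count_vfio_errors(lines):
    """Count vfio_err_notifier_handler hits per device string."""
    hits = Counter()
    for line in lines:
        match = VFIO_ERR.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits
```

It can be fed any iterable of log lines, for example an open file handle for a VM log extracted from the diagnostics archive; `count_vfio_errors(log).most_common()` then lists the devices named most often first.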

 

What I've noticed:

 

1. SPC works for weeks if I don't start HTPC. (I cannot test the reverse, because I need SPC every day.)

2. If HTPC freezes shortly after boot, it usually happens after I switch to full-screen video (YouTube in Chrome) or after the screensaver starts.

 

What I tried:

 

1. Reinstalling Win10 on both VMs.

2. Slightly changing VM parameters such as memory, and even recreating the VMs from scratch with new drive images.

 

Additional info:

 

1. I turn my unRAID server off when I don't need it (it's not 24/7). In fact, I upgraded from the older unRAID version to 6.5.2 because the older version failed to shut the machine down about half of the time - it stayed powered on but completely unresponsive. 6.5.2 turns the machine off properly every time.

 

2. I have a 3rd VM there - pfSense - and it's never had any issues.

 

3. I'm running unRAID on a Core i7-6850K on an Asus X99 motherboard with 32 GB of RAM. SPC: GTX 1050 Ti + Renesas USB controller; HTPC: GT 740 + onboard ASMedia USB 3.1 controller.


Ok, a small update.

I physically rearranged the cards inside the machine - moved the Renesas USB card and a network card (the one that pfSense uses) to other PCIe slots. The situation has changed a bit: now, when I start the HTPC VM, pfSense sometimes freezes with a vfio_err_notifier_handler error for its network card (I never had an issue with the pfSense VM before).
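
One thing that can change when cards move between slots is which IOMMU group each device lands in. Whether that is related to the errors here is not established in this thread; the sketch below only shows one way to list the groups on a Linux host by reading sysfs, so the layout can be compared before and after a slot change. `iommu_groups` and `shared_groups` are hypothetical helper names.

```python
import os


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group number: sorted device addresses} read from sysfs."""
    groups = {}
    if not os.path.isdir(root):
        return groups  # IOMMU disabled, or not a Linux host
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


def shared_groups(groups):
    """Keep only the groups that contain more than one device."""
    return {num: devs for num, devs in groups.items() if len(devs) > 1}
```

Calling `shared_groups(iommu_groups())` on the host lists the groups that contain several devices. A GPU and its audio function normally appear together in one group, so only unexpected neighbours are worth a closer look.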

 

BTW, testing is a bit hard, because half of the time HTPC spends a couple of minutes on the TianoCore splash screen, and sometimes its USB controller (the onboard ASMedia USB 3.1) just doesn't work - the machine seems fine, but there is no way to control it besides stopping it from the VM manager... These issues aren't new; the HTPC VM has always had them.


Can you upload a copy of your system diagnostics, preferably showing the errors you see?  I know that I've seen some of these issues with particular USB controllers, but it can be finicky at times and sometimes motherboard dependent.  I tend to lean more toward ASRock nowadays over Asus, though I still will prefer Asus over Gigabyte and MSI.


Well, another issue has happened. I started my SPC VM after booting unRAID and the whole system just froze (no error messages, nothing, and no connectivity at all).

I reset the machine and there was no VM tab in the web interface. I've never seen that before. The array is running. There is no VM tab, but I can see the Docker tab.

I don't understand what's going on. Tried to reboot several times, the tab still didn't reappear. Now doing parity check.

 

The diag info file is attached.

 

P.S. VM freeze problems started when I updated from 6.3.1 to 6.5.2. But now it's way worse. And I wish I could change hardware, but I can't. Everything was pretty stable on 6.3.1...

srv-diagnostics-20180930-0352.zip

 

UPDATE: Got the VMs back. VM functionality was disabled in the settings. Why? How? No idea.


Ok, I made one more hardware change - I added a second Renesas USB card to replace the onboard ASMedia USB 3.1 controller and assigned it to my HTPC VM.

 

I haven't done much testing yet, but that seems more stable.

 

I have only one additional question: how do I stub PCIe cards these days? pci-stub.ids? vfio-pci.ids? xen-pciback.hide? Some combination? I can easily stub all the hardware the VMs use, that's not a problem, but should I?
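
As an illustration of the ID-based option only (not a recommendation for this setup), here is a minimal sketch that builds a `vfio-pci.ids=` string from the vendor and device IDs that Linux exposes in sysfs for a list of PCI addresses. The function names are hypothetical, and the addresses passed in are whatever the host reports for the passed-through devices.

```python
import os


def read_hex_id(path):
    """Read a sysfs ID file such as 'vendor' and drop the leading 0x."""
    with open(path) as handle:
        value = handle.read().strip().lower()
    return value[2:] if value.startswith("0x") else value


def vfio_ids_param(addresses, root="/sys/bus/pci/devices"):
    """Build a vfio-pci.ids=... string for the given PCI addresses."""
    ids = []
    for addr in addresses:
        base = os.path.join(root, addr)
        vendor = read_hex_id(os.path.join(base, "vendor"))
        device = read_hex_id(os.path.join(base, "device"))
        pair = f"{vendor}:{device}"
        if pair not in ids:  # identical cards share one vendor:device pair
            ids.append(pair)
    return "vfio-pci.ids=" + ",".join(ids)
```

Note that ID-based stubbing matches every device with the same vendor:device pair, so two identical cards (for instance two USB cards built on the same controller chip) would both be claimed.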


Well, here we go - after two days of working without any issues, the HTPC VM and the pfSense VM both froze when I tried to open a webpage with video in it (Twitch).

The diag data is attached, but even without looking into the logs I can tell that HTPC stopped because of its video card (or USB controller) and pfSense because of its NIC.

 

P.S. Before I added another USB card and moved some cards to different slots, the issue involved the HTPC VM and the SPC VM (not pfSense).

srv-diagnostics-20181003-0139.zip
