Running Debian VM with GPU passthrough leads to network hang



Hi everyone, 

I am currently facing a critical issue: my Unraid server's network hangs whenever my Debian VM with GPU passthrough is running. I had a similar issue to the one described in this thread. Since I added another cache drive (M.2 SSD) and reconfigured my VM's XML (changing the addresses in the XML form so that GPU passthrough works), the Unraid server becomes inaccessible over the network after a few minutes. Unplugging and replugging the Ethernet cable solves the issue for the next few minutes. With the VM shut down, the server remains accessible.

 

Here is how I set up the VM and its XML (sorry for only providing screenshots).
 

[Screenshots: VM settings and XML]

 

I am at my wits' end...

 

I cannot see anything suspicious in the logs. Here is an excerpt covering the network cable being unplugged and plugged back in.

 

Apr 9 13:35:55 GRAViTY ntpd[2076]: no peer for too long, server running free now
Apr 9 14:11:14 GRAViTY ntpd[2076]: no peer for too long, server running free now
Apr 9 15:35:08 GRAViTY kernel: r8169 0000:04:00.0 eth0: Link is Down
Apr 9 15:35:08 GRAViTY kernel: bond0: (slave eth0): link status definitely down, disabling slave
Apr 9 15:35:08 GRAViTY kernel: device eth0 left promiscuous mode
Apr 9 15:35:08 GRAViTY kernel: bond0: now running without any active interface!
Apr 9 15:35:08 GRAViTY kernel: br0: port 1(bond0) entered disabled state
Apr 9 15:35:14 GRAViTY kernel: r8169 0000:04:00.0 eth0: Link is Up - 1Gbps/Full - flow control off
Apr 9 15:35:14 GRAViTY kernel: bond0: (slave eth0): link status definitely up, 1000 Mbps full duplex
Apr 9 15:35:14 GRAViTY kernel: bond0: (slave eth0): making interface the new active one
Apr 9 15:35:14 GRAViTY kernel: device eth0 entered promiscuous mode
Apr 9 15:35:14 GRAViTY kernel: bond0: active interface up!
Apr 9 15:35:14 GRAViTY kernel: br0: port 1(bond0) entered blocking state
Apr 9 15:35:14 GRAViTY kernel: br0: port 1(bond0) entered forwarding state
Apr 9 15:35:58 GRAViTY kernel: br0: port 2(vnet0) entered disabled state
Apr 9 15:35:58 GRAViTY kernel: device vnet0 left promiscuous mode
Apr 9 15:35:58 GRAViTY kernel: br0: port 2(vnet0) entered disabled state
Apr 9 15:35:58 GRAViTY kernel: usb 1-2: reset full-speed USB device number 3 using xhci_hcd
Apr 9 15:35:58 GRAViTY kernel: input: Microsoft Microsoft® 2.4GHz Transceiver v7.0 as /devices/pci0000:00/0000:00:01.3/0000:02:00.0/usb1/1-2/1-2:1.0/0003:045E:07B2.0004/input/input9
Apr 9 15:35:58 GRAViTY kernel: hid-generic 0003:045E:07B2.0004: input,hidraw0: USB HID v1.11 Keyboard [Microsoft Microsoft® 2.4GHz Transceiver v7.0] on usb-0000:02:00.0-2/input0
Apr 9 15:35:58 GRAViTY kernel: input: Microsoft Microsoft® 2.4GHz Transceiver v7.0 Mouse as /devices/pci0000:00/0000:00:01.3/0000:02:00.0/usb1/1-2/1-2:1.1/0003:045E:07B2.0005/input/input10
Apr 9 15:35:58 GRAViTY kernel: input: Microsoft Microsoft® 2.4GHz Transceiver v7.0 Consumer Control as /devices/pci0000:00/0000:00:01.3/0000:02:00.0/usb1/1-2/1-2:1.1/0003:045E:07B2.0005/input/input11
Apr 9 15:35:58 GRAViTY kernel: hid-generic 0003:045E:07B2.0005: input,hidraw1: USB HID v1.11 Mouse [Microsoft Microsoft® 2.4GHz Transceiver v7.0] on usb-0000:02:00.0-2/input1
Apr 9 15:35:58 GRAViTY kernel: input: Microsoft Microsoft® 2.4GHz Transceiver v7.0 Consumer Control as /devices/pci0000:00/0000:00:01.3/0000:02:00.0/usb1/1-2/1-2:1.2/0003:045E:07B2.0006/input/input12
Apr 9 15:35:58 GRAViTY kernel: input: Microsoft Microsoft® 2.4GHz Transceiver v7.0 System Control as /devices/pci0000:00/0000:00:01.3/0000:02:00.0/usb1/1-2/1-2:1.2/0003:045E:07B2.0006/input/input14
Apr 9 15:35:58 GRAViTY kernel: hid-generic 0003:045E:07B2.0006: input,hiddev96,hidraw2: USB HID v1.11 Device [Microsoft Microsoft® 2.4GHz Transceiver v7.0] on usb-0000:02:00.0-2/input2
Apr 9 15:36:00 GRAViTY kernel: vfio-pci 0000:07:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
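When going through a longer syslog than the excerpt above, it may help to pull out only the interface state changes (the eth0 link, the bond0 slave status, and the br0 port states, including the VM's vnet0 port) so their timing can be compared with the VM's activity. Below is a minimal Python sketch, assuming the syslog line layout shown above (month, day, time, hostname, process, message). The function name `link_events` and the keyword list are hypothetical choices for illustration, not part of Unraid.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumed layout, as in the excerpt: "Apr 9 15:35:08 GRAViTY kernel: <message>"
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<proc>[^:]+):\s+(?P<msg>.*)$"
)

# Hypothetical keyword list, taken from the state-change messages in the excerpt.
STATE_KEYWORDS = (
    "Link is Down",
    "Link is Up",
    "link status definitely",
    "without any active interface",
    "entered disabled state",
    "entered blocking state",
    "entered forwarding state",
)


def link_events(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (time, message) pairs for kernel lines that describe link/bond/bridge state changes."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Skip lines that do not match the layout or do not come from the kernel (e.g. ntpd).
        if not m or not m.group("proc").startswith("kernel"):
            continue
        msg = m.group("msg")
        if any(keyword in msg for keyword in STATE_KEYWORDS):
            events.append((m.group("time"), msg))
    return events


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python link_events.py <path-to-syslog-copy>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for time_str, message in link_events(fh):
            print(time_str, message)
```

Filtering on message text rather than on device names keeps the sketch usable if the interface names differ, at the cost of possibly missing state messages worded differently from those in this excerpt.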

 

I have attached the diagnostics file.

 

I'd appreciate any hints from you guys! The server being inaccessible is especially painful because my Home Assistant runs on it and I rely heavily on that. :/

 

Thanks in advance!

 

gravity-diagnostics-20220409-1549.zip

Edited by makin

I found out the following:

 

- When the VM is running, network access to the server times out, and occasionally the server becomes reachable again for a few minutes

- Unplugging and replugging the Ethernet cable makes the server available again immediately (for a few minutes)

- When the VM is shut down, the server becomes unreachable shortly afterwards and stays that way until I replug the Ethernet cable

 

I attached some screenshots of ping attempts.


[Screenshots: three ping attempts to the server]


And lastly, with the VM shut down and the cable unplugged and plugged back in:

 

[Screenshot: ping attempt with the VM shut down and the cable replugged]
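To put numbers on "reachable again for a few minutes", one option is to ping the server from another machine at a fixed interval, record each result with a timestamp, and collapse the failures into outage windows. Below is a minimal Python sketch under stated assumptions: it shells out to the system `ping` with the Unix-style `-c 1` flag (Windows uses `-n` instead), and the names `probe`, `outage_windows`, and `monitor` are hypothetical.

```python
import subprocess
import time
from typing import List, Optional, Sequence, Tuple


def probe(host: str, timeout_s: float = 2.0) -> bool:
    """Send one ping to host and return True if it answered (assumes Unix-style `ping -c 1`)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0


def outage_windows(samples: Sequence[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse (timestamp, reachable) samples into (start, end) windows of consecutive failures.

    start is the first failed sample; end is the next successful sample,
    or the last sample if the host was still down when sampling stopped.
    """
    windows: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, samples[-1][0]))
    return windows


def monitor(host: str, interval_s: float = 5.0, count: int = 120) -> List[Tuple[float, float]]:
    """Probe host `count` times, `interval_s` seconds apart, and return the outage windows."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), probe(host)))
        time.sleep(interval_s)
    return outage_windows(samples)
```

For example, `monitor("192.168.1.10")` (a placeholder address) would sample for roughly ten minutes. Running it once with the VM up and once with it shut down would show whether the outages line up with the VM's uptime.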

  • makin changed the title to Running Debian VM with GPU passthrough leads to network hang
  • 1 month later...

Hi again,

I'm not sure whether it is right to post here again instead of creating a new thread, but I have partly solved the issue with a workaround.

 

I suspect that the VM is causing some kind of network storm, so to troubleshoot I created a new VLAN (managed by my UniFi UDM Pro) in which only the VM runs. Since then, the server had not become inaccessible until today. This time, however, the cause was not my HTPC VM but the ZWaveJS2MQTT Docker container I had just installed.
 

I really have no idea how to troubleshoot this, but come on… I don't want to create artificial VLANs just to work around this problem. :/

 

Do you guys have any idea? 
