Windows VM Freezing under Heavy Internet Usage


Go to solution Solved by Aurora8orealis,


Hello everyone,

This is my first post, and also a call for help with an issue on my Unraid system.

I've been using Unraid for the past 6 months and have been very happy so far, but I am struggling with an issue that makes my Windows VM close to unusable.

My Unraid server runs one Ubuntu VM and one Windows VM.
The Windows VM has a 3070 Ti passed through, so I can use it daily like a normal computer.

Now to the problem: under heavy internet usage, the Windows VM starts to stutter, lag, or even freeze completely for a couple of seconds.
The best examples are:
- Internet speed test on any website: the Windows VM freezes completely for 5 seconds, then slowly stabilizes until the speed test is complete.
- Steam game downloads: the VM lags heavily, which makes it mostly impossible to use because everything freezes and stutters.

At first I thought the lag was caused by a new NVMe SSD I bought for this VM, but CrystalDiskMark showed no issues with read or write speeds.
In the VM settings I have already tested br0 and T1000, and both seem to have the same issue.
The Windows VM is fine when downloading at about 200 Mbit/s; at about 300-400 Mbit/s I start to notice stutter, and at about 600-800 Mbit/s the VM freezes completely.
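
To put numbers on freezes like these, a small timer-based probe can be left running inside the VM while a speed test or download is in progress. The following is only a minimal sketch: the function name `detect_stalls` and the 0.5 second threshold are illustrative choices, not something taken from this thread. It sleeps in short intervals and records every wake-up that arrives much later than requested, which is roughly what a frozen guest looks like from the inside.

```python
import time


def detect_stalls(duration_s=60.0, interval_s=0.05, threshold_s=0.5,
                  clock=time.monotonic, sleep=time.sleep):
    """Return (offset_s, gap_s) for every wake-up that came back late.

    offset_s is the time since the start of the measurement, gap_s is how
    long the loop was actually away. clock and sleep are parameters only so
    the function can be exercised without waiting in real time.
    """
    stalls = []
    start = clock()
    last = start
    while last - start < duration_s:
        sleep(interval_s)
        now = clock()
        gap = now - last
        # A gap far beyond the requested sleep means the guest did not run.
        if gap - interval_s > threshold_s:
            stalls.append((round(last - start, 3), round(gap, 3)))
        last = now
    return stalls
```

Calling `detect_stalls(duration_s=120)` during a download and printing the result gives a list of freeze times and lengths that can be compared before and after a configuration change.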

I would appreciate any help and can provide more information on request.
VM XML: https://pastebin.com/sa72wdCv

 

unraid-diagnostics-20221120-1802.zip

Edited by Aurora8orealis
Link to comment

Yeah. Sorry for you, but you have found the worst part of QEMU, Unraid's VM thing.

Using a fast network (it does not need to be internet, LAN traffic does the same) almost halts everything. I suppose the LAN drivers for the emulator range from bad to worse. So far I have only seen AMD CPUs showing this problem, but that does not help you (or me).

In the end, I killed all my VMs and moved them to a different box running Hyper-V.

 

Link to comment

 

57 minutes ago, MAM59 said:

Using a fast network (it does not need to be internet, LAN traffic does the same) almost halts everything. I suppose the LAN drivers for the emulator range from bad to worse. So far I have only seen AMD CPUs showing this problem, but that does not help you (or me).

That's not good to hear, but I don't really want to switch; I would still like to stay with Unraid.
Are there any other possible solutions that could fix this issue?
Also, I am using a Ryzen 9 5950X.

Link to comment
9 minutes ago, ghost82 said:

Check if the IRQ of the network controller is shared with some other device; if it is, switch at least one of the devices sharing that IRQ to MSI-X with the MSI utility, which you can search for and download in this forum.

Hey, thanks for your answer.
I am still a little unsure what exactly is meant by "IRQ".

Do you mean the IOMMU groups?

I have 2 virtual machines and a few Docker containers.

Could you explain in a bit more detail how to do what you suggested?

Link to comment
19 minutes ago, Aurora8orealis said:

Could you explain in a bit more detail how to do what you suggested?

By IRQ I mean interrupt requests.

I'm talking about things happening inside the same virtual machine.

On a bare-metal PC, as in a virtual machine, devices talk to the CPU (real or emulated) through IRQs.

Each IRQ has a number assigned.

Several devices may share the same IRQ. Most of the time there is nothing wrong with that, but if two busy devices such as a GPU and a network controller share the same IRQ, there will be too much traffic on it and you may experience lag.

If the device and its drivers support it, you can luckily switch from line-based IRQs to MSI (message signaled interrupts); the device then gets a unique, negative IRQ number.

Extract the attached file and run it as administrator.

As an example, this is my device list:

msi.jpg.a975bf0700d85e7571f2f8c54ca9d716.jpg

 

Let's say your network controller is my Red Hat Virtio ethernet adapter.

What you need to check:

1. Check if there is a checkmark for the network controller under msi; if there is, stop here, the issue is elsewhere.

2. If there isn't a checkmark, check in the irq column that the number is not shared with anything else; if it's not shared, stop here, the issue is elsewhere.

3. If the irq is shared and there isn't a checkmark in the msi column, add the checkmark, click apply, restart the VM, and see if things are better.

 

--

In general, check for other shared IRQs and apply MSI on devices that support it (MSI is supported if it is listed in the "support modes" column) to avoid IRQ sharing.

As you can see, I applied MSI to several devices to avoid IRQ sharing. None of the USB controllers, emulated or passed through, support MSI, but luckily they have independent IRQs: 22, 21, 20, 23.
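
The check described above can be written down as a small piece of logic. This is only a sketch of the decision rule, not of the MSI utility itself: the `Device` record and the `msi_candidates` function are hypothetical names, and the device list is illustrative, loosely modelled on the values mentioned in this thread (GPU on -31, audio and network both on 22). Whether a given device really supports MSI has to be read from the tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    irq: int              # negative numbers indicate MSI is already in use
    msi_supported: bool   # listed in the tool's supported-modes column
    msi_enabled: bool     # checkmark in the msi column


def msi_candidates(devices):
    """Names of devices that share a line-based IRQ and could move to MSI."""
    by_irq = defaultdict(list)
    for dev in devices:
        if not dev.msi_enabled:  # MSI devices already have a unique IRQ
            by_irq[dev.irq].append(dev)
    candidates = []
    for _irq, group in sorted(by_irq.items()):
        if len(group) > 1:  # only shared IRQs are a concern
            candidates.extend(d.name for d in group if d.msi_supported)
    return candidates


# Illustrative values only; read the real ones from the utility.
devices = [
    Device("NVIDIA GeForce RTX 3070 Ti", -31, True, True),
    Device("High Definition Audio Controller", 22, True, False),
    Device("Network controller", 22, True, False),
    Device("USB controller", 21, False, False),
]
print(msi_candidates(devices))
```

Switching any one of the listed devices to MSI is enough to break up that particular share.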

 

Please note that enabling MSI on unsupported devices, or on devices with buggy drivers, may cause the VM to not boot. I would think twice before applying MSI to a passed-through SATA controller to which the boot disk is attached.
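
For anyone who wants to look at the setting without changing anything: Windows documents a per-device registry value, `MSISupported`, under the device's `Interrupt Management\MessageSignaledInterruptProperties` key, as the way a device is opted in to MSI. The read-only sketch below assumes that this is the value the checkbox corresponds to, which this thread does not confirm. The helper names are hypothetical, and the device instance path has to be copied from Device Manager (Details tab, "Device instance path").

```python
import sys

MSI_SUBKEY = (r"Device Parameters\Interrupt Management"
              r"\MessageSignaledInterruptProperties")


def msi_key_path(device_instance_path):
    """Registry path (relative to HKEY_LOCAL_MACHINE) of a device's MSI settings."""
    return "\\".join([
        r"SYSTEM\CurrentControlSet\Enum",
        device_instance_path.strip("\\"),
        MSI_SUBKEY,
    ])


def read_msi_supported(device_instance_path):
    """Return the MSISupported DWORD (1 means MSI is requested), or None if unset.

    Read-only: nothing is written to the registry.
    """
    if sys.platform != "win32":
        raise OSError("the Windows registry is only available on Windows")
    import winreg  # standard library, Windows only
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                            msi_key_path(device_instance_path)) as key:
            value, _value_type = winreg.QueryValueEx(key, "MSISupported")
            return value
    except FileNotFoundError:
        return None  # key or value missing: the device was never opted in
```

Reading the value before and after ticking the box is a low-risk way to see what actually changed, and to know what to revert if the VM stops booting.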

MSI_util_v3.zip

Edited by ghost82
Link to comment
Hey, thanks for your answer, but after looking into the tool, the issue seems to be somewhere else.
I'm sending a screenshot:
image.png.dfc3b952c0e6a3ef854438a1d3c37f94.png

As you can see, the 3070 Ti is on -31 and completely on its own, so the issue seems to be somewhere else.
But thank you for your efforts so far.

Link to comment
42 minutes ago, ghost82 said:

You have the audio part of the gpu on irq 22 (same as network controller), put a checkmark on high definition audio controller, apply and restart.

Not sure it will solve the issue, worth a try, and anyway it's the proper setting to avoid any conflict and audio lags.

The box was ticked and the PC restarted; the IRQ is now on 32, so separate from the network card.
image.png.fc7efdd5e0593ad643dc8e0dc9949abc.png
I ran a speed test again to check whether it's fixed, but sadly the PC still freezes completely for 3-4 seconds.

Link to comment
  • 1 month later...
  • Solution

After reading through many posts and trying multiple different things, I finally found a solution.

Under Tools -> System Devices, I bound the NVMe controller of my SSD to VFIO.

After a restart of Unraid, I deleted my old VM and created a new one.
This time I kept the primary boot device set to none and instead passed through the entire NVMe controller under other PCIe devices.
image.png.6f7f89f4a6326ee39fa2d16f7fe0b417.png

After this step I reinstalled Windows as normal and installed the VirtIO drivers. Since then, all issues have been fixed.
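
Before binding a controller to VFIO, it can help to check which IOMMU group it sits in, since VFIO hands devices to a VM per group. On a Linux host the groups are exposed under `/sys/kernel/iommu_groups`; the sketch below just lists them. The function names are hypothetical and the PCI address in the usage note is a placeholder, not the address from this system.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    for group_dir in Path(root).iterdir():
        if not group_dir.name.isdigit():
            continue
        devices = group_dir / "devices"
        if devices.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices.iterdir())
    return groups


def group_of(pci_address, groups):
    """Return the group number containing pci_address, or None if not found."""
    for number, members in groups.items():
        if pci_address in members:
            return number
    return None
```

For example, `group_of("0000:04:00.0", iommu_groups())` returns the group of the device at that (placeholder) address; if the group contains other devices that the host still needs, passing through the controller alone may not be possible without further changes.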

Link to comment
  • 3 weeks later...
On 11/26/2022 at 4:49 PM, ghost82 said:

You have the audio part of the gpu on irq 22 (same as network controller), put a checkmark on high definition audio controller, apply and restart.

Not sure it will solve the issue, worth a try, and anyway it's the proper setting to avoid any conflict and audio lags.

Dude!!! I have been trying to figure out why I cannot stream from my VM without insane amounts of lag. I tried literally everything and was about to quit UNRAID until I saw your post. My IRQ was shared between my GPU and the audio. I tried the MSI util, checked the box to switch it, and my VM is running flawlessly now. Thank you so much.

Link to comment
  • 5 months later...
On 11/26/2022 at 11:49 AM, ghost82 said:

You have the audio part of the gpu on irq 22 (same as network controller), put a checkmark on high definition audio controller, apply and restart.

Not sure it will solve the issue, worth a try, and anyway it's the proper setting to avoid any conflict and audio lags.

Man, I think this fixed my problem too. I have been working forever to try to figure out why it locks up with high network utilization. 

Link to comment
