Windows 10 VM crashing whole server and network


korro

Hi! First post after lurking for a couple of years, so first of all I want to thank the whole forum for the great content.

Now, let's get to the problem. I've been running Unraid for a couple of years, with some hardware upgrades along the way and no major issues. Recently I finally decided to dive into VMs so I could retire the old laptop I used as an HTPC and run a Windows 10 VM instead, and that is where the problems started.

After struggling for some days with the basic setup and GPU passthrough I finally had a functioning VM, but at first the VM crashed the whole server whenever I rebooted it. After some days of trying to fix that, the VM started to crash even at first boot, so to be sure it wasn't due to my attempts I recreated it from scratch. Now it seems better, but it's still not 100% stable: it sometimes crashes, and when it does it also takes down the whole network. My guess is that it floods the network with random packets, because the network activity LED blinks like hell and no other PC on the network can reach the main router until I disconnect the Ethernet cable from the server.

Can anyone guess what is happening? Do you need anything other than the VM XML?

The XML is this:

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>GLaDOS_10</name>
  <uuid>b421838c-a199-599d-5165-ad3c8fa7f9d0</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>5242880</memory>
  <currentMemory unit='KiB'>5242880</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='6'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='7'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-3.0'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='2' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/cache/domains/GLaDOS_10_new/vdisk1.img'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/Windows10.iso'/>
      <target dev='hda' bus='ide'/>
      <readonly/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.160-1.iso'/>
      <target dev='hdb' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:6b:cb:9e'/>
      <source bridge='virbr0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/user/isos/Asus.GT610.1024.120423.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc221'/>
      </source>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc222'/>
      </source>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc51b'/>
      </source>
      <address type='usb' bus='0' port='3'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>

 

Thank you all in advance for the help!

Link to comment

So, quick update.

I tried recreating the VM again, this time starting with a fresh libvirt.

The VM started correctly, I was able to begin the OS installation, and everything went smoothly until the installer restarted the VM. The restart failed, the VM crashed, and it took down the server and flooded the network again. Now, whenever I start the VM, I always get the same crash.

Link to comment

Yesterday I read many other posts here and ran some more tests.

The network problem seems to be resolved, or at least mitigated, by using DHCP reservations and long lease times on my router. Static IPs would have been simpler and more effective, but I'm forced to use my ISP's router, which doesn't support them: the NAT range and the DHCP range are linked, so IPs outside the DHCP range get no internet.

Recreating yet another clean VM worked fine this time: I was able to install and run it without issues, and I could shut it down and start it multiple times, but it still crashed on reboots. I would like to solve that properly, but in the meantime I worked around it by changing <on_reboot>restart</on_reboot> to <on_reboot>destroy</on_reboot>, so the VM stops instead of restarting on automatic reboots, and that seemed to work fine. Unfortunately, after those tests I installed the NVIDIA drivers and the VM crashed during the installation. Could it be linked to the drivers, or was it just a random crash?
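
The on_reboot workaround described above can be sketched as follows. This is the lifecycle block from the posted domain XML with only the on_reboot line changed; in libvirt, destroy here means the domain is stopped (the VM definition and its disk are not deleted), so a reboot requested from inside the guest leaves the VM powered off until it is started again by hand.

```xml
<!-- Lifecycle actions from the posted domain XML, with only on_reboot changed -->
<on_poweroff>destroy</on_poweroff>
<!-- was: restart. With destroy, a guest-initiated reboot stops the VM instead of restarting it -->
<on_reboot>destroy</on_reboot>
<on_crash>restart</on_crash>
```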

Also, I forgot to mention that I have these problems only when passing through the GPU; with VNC I have no problems at all.
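
For comparison, a VNC-only setup of the kind that ran without problems typically replaces the GPU hostdev entries with an emulated display. The fragment below is an assumption about what such a configuration looks like, not the actual XML used in that test; the qxl model and the automatic port are illustrative choices.

```xml
<!-- Assumed VNC-only display (not from the original post): an emulated adapter instead of the passed-through GT 610 -->
<graphics type='vnc' port='-1' autoport='yes'/>
<video>
  <model type='qxl'/>
</video>
```

With a display like this no physical card is handed to the guest, so this comparison only isolates the passthrough path; it does not say which part of that path is at fault.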

 

One last thing worth noting: yesterday I saw a couple of "Machine Check Event" errors, so maybe my problems aren't only due to the VMs.

 

Any help would be much appreciated!

Link to comment

Another day, another update.

Today I managed to borrow a second video card to run some tests.

For the first test I put my own card in the second slot to rule out problems with the vBIOS, but no luck: the VM started, it didn't crash while installing the drivers like yesterday, and it shut down correctly, but when I started it again it crashed.

I also realized that the network problems are still here: the computers connected to the same switch as the server were fine, but a PC connected directly to the main router lost its connection.

After dinner I will try to switch the cards, but my hopes are pretty low.

 

EDIT:

So, I've tested the VM with the borrowed GTX 275 and for now it works flawlessly. I've booted the VM multiple times without issues and I'm currently writing from it, so maybe it really was the 610.

 

The only difference I can think of, apart from the card itself, is the PCIe slot: I was able to put the 275 in the second PCIe slot, while the 610 was in the first. I have now removed the 610 and left the 275 in the second slot. I wasn't able to fit them the other way around, so for the earlier tests and the vBIOS dump I used the 275 in the first slot and the 610 in the third, but that forced me to keep the PSU outside the case, so right afterwards I moved the 610 back to the first slot.

 

Maybe tomorrow I will try the 610 again as the only card, this time in the second slot. Besides being borrowed, the 275 is not a good long-term solution because of how power hungry it is.

Edited by korro
added results for the last test of the day
Link to comment

So, at this point I think nobody cares, but I'll post it anyway.

Yesterday I swapped the cards and the system was fine using the 610 in the second slot as the only card. I started, stopped and rebooted the VM multiple times without any issue, then I shut the VM down and went to sleep. Today I fired it up again and it crashed right at the first boot, taking the whole server with it, so the problem is obviously not linked to the slot. It's odd, though: was it just an incredibly long streak of lucky boots? To me there were far too many for it to be pure luck.

Any ideas?

Edited by korro
Link to comment
  • 1 year later...

I am having this same problem now while passing through a GeForce GT 610. It only seemed to start crashing my whole server when I added the sound card passthrough as well. So far, though, I haven't even been able to get the GPU to work correctly in the Win10 VM: Windows says it has disabled the card, with Code 43.
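
Code 43 on a passed-through NVIDIA card is often attributed to the guest driver detecting that it is running inside a VM. A commonly suggested libvirt setting, shown here as a sketch and not as something confirmed in this thread, hides the KVM signature from the guest. The vendor_id line is the one already present in the XML from the first post; the kvm block is the assumed addition.

```xml
<features>
  <acpi/>
  <apic/>
  <hyperv>
    <!-- already present in the XML posted at the top of the thread -->
    <vendor_id state='on' value='none'/>
  </hyperv>
  <kvm>
    <!-- assumption: hide the KVM hypervisor signature from the guest driver -->
    <hidden state='on'/>
  </kvm>
</features>
```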

Link to comment

Hi.

I don't know about your Code 43 error; for me the card was, and still is, running perfectly at the first start of the VM.

In the end, while testing something else, I realised that the problem was the lack of full support for the card reset feature.

It kind of works, but sometimes it fails and the whole server stops responding.

In the meantime my use case for the VM has changed slightly, so now I simply don't shut it down unless I have updates to install, and it isn't a problem for me anymore.

Link to comment
  • 4 months later...

 

1 hour ago, ccsnet said:

Hi Korro, did you manage to find a solution to the card reset issue? I think I'm having the same problem with an NVIDIA GTX 730.

 

Thanks

 

Terran

Hi. No, I didn't find any solution to this problem. As I said in my last post, luckily I don't need to shut down the VM that often anymore, so now I usually just restart the whole server whenever I shut down the VM, just to stay on the safe side. This way I don't have any problems at all.

It's not ideal if you need to shut down the VM often, but it works 100% of the time.

Link to comment
