[SOLVED] Windows 10 - Stuck in constant BSOD/Crash loop


heilage

Hey guys.

 

I was booting up my Windows VM today as usual and discovered, to my dismay, that it is now stuck in a loop. The boot crashes, it reboots, tries to repair, BSODs with SYSTEM_THREAD_EXCEPTION_NOT_HANDLED, reboots again, crashes, and so forth. Once, the loop ran enough times for Windows to apparently "fix" itself, but the whole OS then lagged so badly that even opening a menu took a second or two. Hopeless.

 

Quick rundown:

- With GPU passthrough of my Nvidia GTX 970 removed, it boots fine and I can interact with the system over VNC

- I uninstalled the Nvidia drivers over VNC, re-enabled GPU passthrough, and it booted fine

- After installing the latest drivers (I had drivers from 9/12; the latest were from 21/12), the cycle repeats on reboot

- Disabling/enabling cores or other passthrough devices appears to have no effect; the crash is tied to the GPU

- Booting into Ubuntu with the same devices appears to work just fine, so I do not think it is related to any physical components
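For anyone repeating the first test above, here is a minimal sketch of the VNC-only variant. It assumes the domain XML posted further down. The two `vfio-pci` arguments in `<qemu:commandline>` are taken out so the GTX 970 is not handed to the guest, and an emulated display is added inside `<devices>`. The `cirrus` video model and the automatic port allocation are illustrative choices, not settings from the original post.

```xml
<!-- Sketch only: emulated display for VNC access while GPU passthrough is disabled. -->
<!-- Goes inside <devices>; remove the vfio-pci arguments from <qemu:commandline> at the same time. -->
<graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'>
  <listen type='address' address='0.0.0.0'/>
</graphics>
<video>
  <model type='cirrus' vram='16384' heads='1'/>
</video>
```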

 

On Wednesday (6/1) it worked fine and I was using it. It has been off since. What the hell has happened in the meantime? If a Windows Update were causing the crash, it would presumably crash with the GPU detached as well. Of course, an update could have been issued that specifically broke compatibility with the GTX 970 on any recent drivers, but that seems unlikely, no? Remember, this happens with both the latest and the penultimate driver sets, and it worked fine as of Wednesday, so the drivers had not been updated in the meantime (when removing the drivers over VNC, I also saw that they were dated 9/12).

 

Has anyone experienced anything like this? As of right now, my VM is broken. And that sucks.

 

Settings:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>Prometheus</name>
  <uuid>32e936ec-5987-9089-029f-574bd8c1cbb2</uuid>
  <description>Spillmaskina</description>
  <metadata>
    <vmtemplate name="Custom" icon="windows.png" os="windows"/>
  </metadata>
  <memory unit='KiB'>12582912</memory>
  <currentMemory unit='KiB'>12582912</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='4' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/cache/VMer/Prometheus/vdisk1.img'/>
      <target dev='hda' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/cache/VMer/Prometheus/vdisk2.img'/>
      <target dev='hdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:4d:8f:4b'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/Prometheus.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc52b'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc318'/>
      </source>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-device'/>
    <qemu:arg value='ioh3420,bus=pci.0,addr=1c.0,multifunction=on,port=2,chassis=1,id=root.1'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='vfio-pci,host=00:1b.0,bus=root.1,addr=01.0'/>
  </qemu:commandline>
</domain>

 

Specs:

- MSI Z97S SLI Krait Edition S-1150 ATX

- Intel i5-4440

- Mixed 20GB RAM

- Samsung 840 250GB SSD

 

EDIT: I made a new, clean VM. It installed fine. I then updated Windows (which included some set of Nvidia drivers), and it broke again. I suspect a Windows Update is what is triggering the crash.

 

EDIT2: Removed all Windows updates, still crashes. Going to try rolling back Nvidia drivers now.

 

EDIT3: So yeah. I think there is a problem with unRAID and GPU passthrough with Windows 10. None of my previously confirmed working configurations work now. In addition, every time the VM crashes now, it kills unRAID and I have to do a dirty reboot. This is not acceptable. What should I do?

Sorry, I'm bumping this; my whole machine is broken and I can't explain why. See the edits at the bottom of the first post.

 

Try installing Windows 10 and just disable Windows Update completely. Install the missing drivers in Device Manager from the mounted virtio ISO, and then install your Nvidia drivers.
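Mounting the virtio ISO comes down to a CD-ROM entry in the domain XML. Below is a sketch. The ISO path (`/mnt/user/isos/virtio-win-0.1.109.iso`) is hypothetical. The IDE target name (`hdc`) was chosen so that it does not clash with the `hda`/`hdb` disks in the config from the first post.

```xml
<!-- Sketch: virtio driver ISO attached as a read-only CD-ROM (goes inside <devices>). -->
<!-- The source path is hypothetical; point it at your own copy of the ISO. -->
<disk type='file' device='cdrom'>
  <driver name='qemu' type='raw'/>
  <source file='/mnt/user/isos/virtio-win-0.1.109.iso'/>
  <target dev='hdc' bus='ide'/>
  <readonly/>
</disk>
```

Once Windows is running, Device Manager can be pointed at this drive to pick up the storage, network and balloon drivers.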

Already tried that today, same result. It crashes immediately once the Nvidia driver install completes and the VM reboots, with both the old (1/12) and new (23/12) drivers.

 

EDIT: Tried again for luck. Clean W10, no updates, latest nvidia drivers, crash.

You said that you have tried installing Ubuntu and that seems to work. Have you tried Windows 8.1? See if that works. It's not a replacement for your Windows 10 VM but if Windows 8.1 does work without crashing you could temporarily use that. If it doesn't work then it might be a hardware issue with your GPU.

 

You could also try physically moving your graphics card to a different slot. I had issues with my Asus sound card, and they were fixed by moving the card to a different PCIe slot.

 

 

I had all sorts of issues getting a Win10 VM going with a GTX960. Got there in the end, but I'm not sure what step made it come good. These were my steps.

 

* PCIe ACS override

* OVMF instead of SeaBIOS

* Make sure Hyper-V options are off in the VM settings

* Updated GPU firmware (http://www.techpowerup.com/vgabios/)

* Updated build of Win10 ISO (the update to build 1511 through Windows Update has been known to be a painful experience for some)

* Don't use the latest (unstable) version of the virtio drivers. The stable release (102) won't work, so I use 109 instead. Others have had better success with the 109 drivers as well

* Make sure you're passing through both the video and audio portions of the card. Most NVIDIA cards don't like being split

* Make sure you're setting the MSI interrupts for both the video and audio portions of the card, even if you're not using HDMI (http://lime-technology.com/wiki/index.php/UnRAID_Manual_6#Enable_MSI_for_Interrupts_to_Fix_HDMI_Audio_Support)
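The OVMF and Hyper-V points correspond roughly to the parts of the domain XML shown below. Treat this as a sketch: the OVMF image path is an assumption and depends on what the unRAID/QEMU build ships. As far as I know, "Hyper-V off" simply means there is no `<hyperv>` element under `<features>` and no `hypervclock` timer under `<clock>`. The config from the first post already meets both conditions.

```xml
<!-- Sketch: OVMF (UEFI) firmware instead of SeaBIOS. -->
<!-- The loader path is an assumption; use the OVMF image your unRAID/QEMU build provides. -->
<os>
  <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
  <loader type='pflash'>/usr/share/qemu/ovmf-x64/OVMF-pure-efi.fd</loader>
</os>
<!-- Hyper-V enlightenments off: no <hyperv> block inside <features>. -->
<features>
  <acpi/>
  <apic/>
</features>
```

A Windows install made under SeaBIOS (MBR) generally won't boot after the firmware is switched to OVMF. In practice, this step usually means installing onto a fresh vdisk.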

 

I'm not sure I have a spare 8.1 image to install from (and I don't think I have a key right now; I believe I need one to install?).

 

I'm starting to think this might be down to the GPU itself, a hardware issue (maybe the RAM?). Some instruction that runs when the GPU initializes appears to cause the crash. It wouldn't surprise me if that process is different under Ubuntu, since Ubuntu uses a completely different driver package, and that may explain why it does not break when booting there.

 

I'm going to unplug the power to the unRAID drives and try installing Win10 directly on the hardware. If it fails again, I believe we have our man. It actually didn't occur to me that the GPU might be faulty until I woke up this morning. It's less than a year old, so RMAing it shouldn't be an issue if that's the case.

 

If it works fine on a native Windows installation, then I'm not sure what to do.

 

 

Scrapz: I actually haven't set a lot of the things you mention, but considering that Win10 was working fine for a good month before it suddenly crashed, shouldn't they be fairly irrelevant? It appears to have spontaneously died on me, but of course it may have been running half-assed without me noticing (I have had some issues, and I haven't had time to put it under heavy use yet).

Yeah, you could try installing Windows 10 natively to see if that works, but before you do that, maybe try all the steps Scrapz posted. Let us know if you manage to get it working.

It works! The card ran fine with Win10 installed natively, so I started wondering what else could cause this if the GPU was not faulty (it may still be, but that remains to be seen).

 

I will address Scrapz's list point by point:

 

- PCIe ACS override: this was not recommended, according to my Google results, so I did not do this step.

- OVMF: changed to OVMF; this may have contributed to my success.

- Hyper-V: off, and always has been.

- GPU firmware: left as-is to limit the number of variables. Also, the newest firmware listed is older than when I bought the card (though mine may still be outdated).

- virtio drivers: switched from 110 to 109; this may have contributed to my success.

- GPU audio: I did not know about this. I have never passed through the GPU audio, so this may have been the issue all along. It would also kinda make sense given my BSOD. Setting this may have contributed.

- MSI interrupts: not set yet, but this could be an option.
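Here is how passing through the audio half of the card might look, in the same `<qemu:commandline>` style as the config from the first post. This is a minimal sketch. It assumes the GTX 970's HDMI audio function sits at host address 01:00.1. That is the usual layout for a GPU at 01:00.0, but verify it with `lspci` on the host. The audio function goes on the same guest slot as the video function, as function 1, which is why the video device keeps `multifunction=on`.

```xml
<qemu:commandline>
  <!-- Root port, unchanged from the original config. -->
  <qemu:arg value='-device'/>
  <qemu:arg value='ioh3420,bus=pci.0,addr=1c.0,multifunction=on,port=2,chassis=1,id=root.1'/>
  <!-- GPU video function, unchanged. -->
  <qemu:arg value='-device'/>
  <qemu:arg value='vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on'/>
  <!-- GPU audio function (host 01:00.1 is an assumption), same guest slot, function 1. -->
  <qemu:arg value='-device'/>
  <qemu:arg value='vfio-pci,host=01:00.1,bus=root.1,addr=00.1'/>
  <!-- The existing onboard-audio arguments (host=00:1b.0) stay as they were. -->
</qemu:commandline>
```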

 

Thank you, you put me on the right track! More testing and verifications are necessary, but this was a huge help.

My money is on the GPU audio passthrough; I've seen a few problems with this over the last couple of weeks......

 

I must admit I've been following this thread but couldn't work out why it was working then all of a sudden wasn't.....

 

But my money would be on the GPU audio thing...

 

Congrats though, your persistence paid off

 

Just noticed: if I'd taken a closer look at your XML, I should have picked up that you hadn't passed through the audio, but I misread it. My apologies....  :-[

Archived

This topic is now archived and is closed to further replies.
