softfeet Posted October 12, 2020 (edited)

Hello and thanks for having a look. I've been digging into an issue where shutting down a Win10 VM from inside the VM, after it has been running for an hour or more, crashes the unraid host and sprays network packets over the network.

The Win10 VM uses GPU passthrough; the GPU is a 1070. CPUs are pinned and isolated. I shut down by clicking Start > Shutdown. The network hangs, Synergy mouse sharing fails, and the network collapses: hosts can't talk to anything on the LAN or outside (Google etc.). This is a really WEIRD failure case. Help is appreciated.

I've been seeing a number of posts with the same issue. I have an X79 board, so I modified the boot line. It did not help. Example:

append pcie_no_flr=1022:149c,1022:1487 vfio-pci.ids=8086:10e8 isolcpus=1-11,13-23 initrd=/bzroot

I keep seeing people who have the same issue, and no real solution. It seems others have had this same sort of issue for one reason or another since 2015. Advice is appreciated!
Here is my XML for the VM:

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>Win10_Game_001</name>
  <uuid>8e78427f-65a1-5c1a-1d46-43f5a254e863</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='6'/>
    <vcpupin vcpu='1' cpuset='18'/>
    <vcpupin vcpu='2' cpuset='7'/>
    <vcpupin vcpu='3' cpuset='19'/>
    <emulatorpin cpuset='0,12'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='2' threads='2'/>
    <cache mode='passthrough'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/mnt/cache/domains/Win10_Game_001/vdisk1.img'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:53:00:cd:17:15'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <sound model='ich9'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </sound>
    <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>

Edited October 12, 2020 by softfeet
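One quick sanity check on a boot line like the one in this post is to list which vendor:device IDs each parameter actually names, and compare them with what `lspci -nn` reports on the host. The sketch below is only an illustration: the function name `parse_pci_id_params` and the small vendor table are assumptions made for this example, not part of Unraid or the kernel. The vendor IDs themselves are the standard PCI ones (1022 is AMD, 8086 is Intel, 10de is NVIDIA), so on an Intel X79 board it may be worth confirming that the `pcie_no_flr` IDs match a device that is really present.

```python
import re

# Standard PCI vendor IDs (hex). The table is deliberately tiny; extend as needed.
KNOWN_VENDORS = {"1022": "AMD", "8086": "Intel", "10de": "NVIDIA"}

# A vendor:device pair is two 4-digit hex numbers separated by a colon.
ID_PATTERN = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{4}$")


def parse_pci_id_params(append_line, keys=("pcie_no_flr", "vfio-pci.ids")):
    """Return {parameter: [(vendor, device, vendor_name), ...]} for the given keys.

    Hypothetical helper for illustration; it only parses text and does not
    query the hardware.
    """
    result = {}
    for token in append_line.split():
        key, sep, value = token.partition("=")
        if not sep or key not in keys:
            continue  # skip "append", isolcpus=..., initrd=..., etc.
        entries = []
        for pair in value.split(","):
            if not ID_PATTERN.match(pair):
                raise ValueError(f"not a vendor:device pair: {pair!r}")
            vendor, device = pair.lower().split(":")
            entries.append((vendor, device, KNOWN_VENDORS.get(vendor, "unknown")))
        result[key] = entries
    return result


if __name__ == "__main__":
    # The append line quoted in the post above.
    line = ("append pcie_no_flr=1022:149c,1022:1487 vfio-pci.ids=8086:10e8 "
            "isolcpus=1-11,13-23 initrd=/bzroot")
    for param, ids in parse_pci_id_params(line).items():
        for vendor, device, name in ids:
            print(f"{param}: {vendor}:{device} ({name})")
```

The output is only a list to compare by hand against the host's device list; it does not say whether a given parameter is the cause of the crash.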
JustOverride Posted October 12, 2020

Hello, have you used an edited BIOS for the graphics card, with the first part of it removed? SpaceInvader made a video on how to do it. Also, you may want to try DDU to uninstall the graphics drivers and then install the latest ones.
softfeet Posted October 12, 2020

17 minutes ago, XiuzSu said:
    Hello, have you used an edited BIOS for the graphics card, with the first part of it removed? SpaceInvader made a video on how to do it. Also, you may want to try DDU to uninstall the graphics drivers and then install the latest ones.

I had tried SpaceInvader's video on a first go-through. It didn't work for me, though. I dumped the BIOS and found a BIOS file online, but found that just passing the card through worked... to a point, per the thread.

I am not familiar with DDU. Can you explain why the options you are mentioning would help with my specific problem? I don't want to spend a lot of time trying something if there is no rationale for it being a solution. That is, I don't want to spend hours testing other people's theories unless I know up front that it is a guess-and-check session.
softfeet Posted October 12, 2020

DDU: https://www.guru3d.com/files-details/display-driver-uninstaller-download.html

Unsure what the workflow for this would be, though.
softfeet Posted October 12, 2020

Researching and reading about:
JustOverride Posted October 12, 2020 (edited)

1 hour ago, softfeet said:
    I had tried SpaceInvader's video on a first go-through. It didn't work for me, though. I dumped the BIOS and found a BIOS file online, but found that just passing the card through worked... to a point, per the thread. I am not familiar with DDU. Can you explain why the options you are mentioning would help with my specific problem? I don't want to spend a lot of time trying something if there is no rationale for it being a solution. That is, I don't want to spend hours testing other people's theories unless I know up front that it is a guess-and-check session.

This issue is likely due to the graphics card passthrough. You can confirm this by running the VM without the graphics card passed through. If it doesn't crash anymore, well, there's your problem. Feel free to test it by removing the graphics card passthrough.

Drivers mixed in with Windows drivers or older NVIDIA drivers may cause an issue as well, which is why I suggested you try DDU and then install new drivers.

The BIOS from the website didn't quite work for my graphics card. I just put the card in another system and ran GPU-Z to extract the BIOS directly from it, then used a hex editor to remove the header.

Edited October 12, 2020 by XiuzSu (typo)
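The "remove the header with a hex editor" step can also be sketched in code. This is not SpaceInvader's script or any official tool; it is a minimal sketch that relies only on the standard PCI expansion ROM layout: an image starts with the bytes 55 AA, and the 16-bit little-endian value at offset 0x18 points to a data structure that begins with "PCIR". The assumption that a dump carries extra vendor data in front of that image, and the function names `find_rom_start` and `strip_header`, are hypothetical and for illustration only.

```python
import struct

ROM_SIGNATURE = b"\x55\xaa"   # first two bytes of a PCI expansion ROM image
PCIR_SIGNATURE = b"PCIR"      # signature of the PCI data structure
PCIR_POINTER_OFFSET = 0x18    # 16-bit little-endian pointer to that structure


def find_rom_start(data):
    """Return the offset of the first plausible PCI ROM image, or -1 if none."""
    pos = data.find(ROM_SIGNATURE)
    while pos != -1:
        ptr_at = pos + PCIR_POINTER_OFFSET
        if ptr_at + 2 <= len(data):
            (pcir_rel,) = struct.unpack_from("<H", data, ptr_at)
            pcir_abs = pos + pcir_rel
            # A bare 55 AA can occur by chance; require the PCIR structure too.
            if data[pcir_abs:pcir_abs + 4] == PCIR_SIGNATURE:
                return pos
        pos = data.find(ROM_SIGNATURE, pos + 1)
    return -1


def strip_header(data):
    """Return the dump with everything before the ROM image removed."""
    start = find_rom_start(data)
    if start < 0:
        raise ValueError("no PCI ROM image found in dump")
    return data[start:]


if __name__ == "__main__":
    import sys
    # Usage (paths are placeholders): python trim_rom.py in.rom out.rom
    if len(sys.argv) == 3:
        with open(sys.argv[1], "rb") as fh:
            raw = fh.read()
        trimmed = strip_header(raw)
        with open(sys.argv[2], "wb") as fh:
            fh.write(trimmed)
        print(f"removed {len(raw) - len(trimmed)} leading bytes")
```

Writing to a separate output file keeps the original dump intact, which matters because a wrongly trimmed ROM is hard to tell apart from other passthrough problems.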
softfeet Posted October 12, 2020

I just went through the ROM dump procedure in the VM that crashes (the only Windows machine I've got :D). I am now running with the .dump file and the system looks stable. I'll try to get it to crash on shutdown some time in the next few days. Unsure what causes the crash specifically, but this looks like a step in the right direction.

Thanks for the explanation and tips. I'll check back in after a few days (or sooner if it blows up) with updates if I use DDU.

25 minutes ago, XiuzSu said:
    This issue is likely due to the graphics card passthrough. You can confirm this by running the VM without the graphics card passed through. If it doesn't crash anymore, well, there's your problem. Feel free to test it by removing the graphics card passthrough. Drivers mixed in with Windows drivers or older NVIDIA drivers may cause an issue as well, which is why I suggested you try DDU and then install new drivers. The BIOS from the website didn't quite work for my graphics card. I just put the card in another system and ran GPU-Z to extract the BIOS directly from it, then used a hex editor to remove the header.
softfeet Posted October 12, 2020

Just had the unraid system lock up again. I was logged into the Win10 machine via Parsec and hit the shutdown button from the Start menu. The entire network came to a grinding halt.

Network usage at the time:
two CIFS connections open to two OS X computers (share mounted, not used for streaming or transfer)
an SMB mount in a Linux VM, with active data being transferred from WAN to the SMB mount
an NFSv4 connection from the above Linux vm2, with an active file transfer from WAN
the Win10 VM up, with an SMB share connected to unraid
an iSCSI VM running, with a network share to the Linux VM

Everything works fine until the Win10 VM is shut off. Then unraid goes into network hell and the entire network grinds to a halt. Packet blender.

This is with the dumped ROM file for the video card. I don't get it. This makes no sense. I would look at logs... but I don't know exactly what to look for in a USB-based system.
softfeet Posted October 12, 2020

I can't figure out why this is so unstable. My Win10 VM is using SeaBIOS. I was unable to get the install to work any other way; it goes to that black startup screen if you use the non-SeaBIOS option.
softfeet Posted October 12, 2020

This says that it should be logging syslog here... but it is a complete lie or misdirection, as far as I can tell.
JustOverride Posted October 12, 2020

12 minutes ago, softfeet said:
    Just had the unraid system lock up again. I was logged into the Win10 machine via Parsec and hit the shutdown button from the Start menu. The entire network came to a grinding halt. [...] This is with the dumped ROM file for the video card. I don't get it. This makes no sense. I would look at logs... but I don't know exactly what to look for in a USB-based system.

Did you edit the ROM file as well?
Did you clear the old drivers and re-install them?
Have you tried making a new VM (using the same vdisk) but without the GPU passthrough? If so, does it crash then?
Do you still have "pcie_no_flr=1022:149c,1022:1487" in your unraid settings section?
softfeet Posted October 12, 2020

9 minutes ago, XiuzSu said:
    Did you edit the ROM file as well? Did you clear the old drivers and re-install them? Have you tried making a new VM (using the same vdisk) but without the GPU passthrough? If so, does it crash then? Do you still have "pcie_no_flr=1022:149c,1022:1487" in your unraid settings section?

Edit ROM file: yes. This was a lot easier on a Windows machine.
Cleared old drivers: not yet. Something to try.
New VM method: I presume this would be just the regular emulated video hardware. The resolution is terrible. Worth a shot to have it running as a clone to start and restart every few hours.
pcie_no_flr in place: yes, it is still in place.
softfeet Posted October 12, 2020

Clearing the drivers and installing them: I don't know these steps all that well. The drivers were originally auto-loaded by Windows 10. I was amazed. I'm not sure what to do in the reload-the-drivers scenario.
JustOverride Posted October 12, 2020 (edited)

Just run DDU (there is an option to stop Windows from installing drivers temporarily). Uninstall the drivers, install the latest ones, then untick the option when done so Windows can continue to install drivers. You may also want to try running the VM with the emulated video hardware and see if you experience issues.

(edit) I just remembered: make sure to edit your XML so that the graphics card section looks like this.

    <alias name='hostdev0'/>
    <rom file='/mnt/user/isos/GTX - Bios Dump/Updated Asus GTX 1050 Ti.rom'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
    </source>
    <alias name='hostdev1'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>

Note the multifunction='on' on the first address line. Then change the next PCI address line to use the same slot as the first, and set its function to the first function + 1, as shown above.

Note: when you edit the XML and later go back to the regular settings, change something, and save, you have to go back into the XML and make this change again, as it will have been reset to the default.

Edited October 12, 2020 by XiuzSu
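Because the form view can silently undo this edit, it may help to check the layout mechanically after each save. The following is a rough sketch, not a libvirt or Unraid tool: `check_multifunction` is a hypothetical helper that reads a domain XML string, groups passed-through PCI functions by their host slot, and reports when they do not share one guest slot with multifunction='on' on the first function, which is the layout described above.

```python
import xml.etree.ElementTree as ET


def _addr(elem):
    """Convert a libvirt <address> element to (domain, bus, slot, function) ints."""
    return tuple(int(elem.get(k, "0"), 16)
                 for k in ("domain", "bus", "slot", "function"))


def check_multifunction(domain_xml):
    """Return a list of problems; an empty list means the layout looks consistent."""
    root = ET.fromstring(domain_xml)
    groups = {}  # host (domain, bus, slot) -> [(host function, guest <address>)]
    for hostdev in root.iter("hostdev"):
        if hostdev.get("type") != "pci":
            continue
        src = hostdev.find("source/address")   # address on the host
        guest = hostdev.find("address")        # address inside the guest
        if src is None or guest is None:
            continue
        d, b, s, f = _addr(src)
        groups.setdefault((d, b, s), []).append((f, guest))

    problems = []
    for (d, b, s), funcs in groups.items():
        if len(funcs) < 2:
            continue  # single-function device, nothing to line up
        funcs.sort(key=lambda item: item[0])
        _, first_guest = funcs[0]
        base = _addr(first_guest)
        host = f"{d:04x}:{b:02x}:{s:02x}"
        if first_guest.get("multifunction") != "on":
            problems.append(f"{host}: first function lacks multifunction='on'")
        for fn, guest in funcs:
            g = _addr(guest)
            if g[:3] != base[:3]:
                problems.append(
                    f"{host}.{fn}: guest slot 0x{g[2]:02x} differs from 0x{base[2]:02x}")
            elif g[3] != fn:
                problems.append(
                    f"{host}.{fn}: guest function 0x{g[3]:x}, expected 0x{fn:x}")
    return problems
```

One way to use it, assuming shell access to the host, is to feed it the output of `virsh dumpxml` for the VM and print whatever it returns. It only checks address consistency; it cannot tell whether this layout fixes the shutdown crash.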
softfeet Posted October 12, 2020

23 minutes ago, XiuzSu said:
    <alias name='hostdev1'/>

Is this section critical?
JustOverride Posted October 12, 2020

Just now, softfeet said:
    Is this section critical?

Yeaaa. Just post your XML after you have added the edited ROM and GPU passthrough, and I'll edit it for you.
softfeet Posted October 12, 2020

<hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <alias name='hostdev0'/>
  <rom file='/mnt/user/windows_share/desktop/MSI.GTX1070.dump'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
  </source>
  <alias name='hostdev1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
</hostdev>
JustOverride Posted October 12, 2020

OK, yea, that's perfect like that.
softfeet Posted October 12, 2020

Of note, that makes the VM sound a lot better. It seems faster on the network too. I have the same network profile up as before loading the VM, so that things will hopefully break or not break in the same environment.
JustOverride Posted October 12, 2020

Yea, the second part, <alias name='hostdev1'/>, is the sound portion of the graphics card. I believe everything should be good now. Let me know if you have any issues.
softfeet Posted October 12, 2020

I still need to do the DDU step, though I will give this phase an hour or two and see if it survives. If it knocks itself out, I'll try DDU. Going to back up the XML too. I recall having that setting a while back... but there have been so many edits. Like you say, they get lost depending on the mode. Thanks!
softfeet Posted October 13, 2020 (edited)

6 hours ago, softfeet said:
    This says that it should be logging syslog here... but it is a complete lie or misdirection, as far as I can tell.

Figured out how it can work. Set up the listener and the server in the sections listed, so that the server is sending to its own IP (the unraid server). It took about 3 minutes to push a few buttons and test after looking back into it. It creates a file in the listed directory. Having both sides set to the same protocol (TCP and TCP) is important; a UDP/TCP mix failed. lol.

Edited October 13, 2020 by softfeet
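To confirm that a TCP syslog listener like the one described here is really writing to its file, one option is to push a single marker line at it and then look for that line in the log. The sketch below is an assumption-heavy illustration, not an Unraid feature: the helper names are made up, the host and port in the usage comment are placeholders (514 is the conventional syslog port, but the configured one may differ), and it uses plain newline-terminated framing, which common syslog daemons accept over TCP.

```python
import socket


def format_syslog_line(message, facility=1, severity=6):
    """Build a minimal syslog line; facility 1 (user) + severity 6 (info) gives <14>."""
    pri = facility * 8 + severity
    return f"<{pri}>{message}\n".encode("utf-8")


def send_syslog_tcp(host, port, message):
    """Send one newline-terminated syslog line over TCP and return the bytes sent."""
    payload = format_syslog_line(message)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
    return payload


# Example usage (address and port are placeholders, adjust to the local setup):
#   send_syslog_tcp("192.168.1.10", 514, "syslog-test: marker before VM shutdown")
```

Sending a marker just before shutting down the VM also gives a timestamp to search from when reading the log after the host comes back up.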
softfeet Posted October 13, 2020 (edited)

Hmm. Still seeing some weirdness. I have another GPU of the exact same type in the unraid host. When I loaded the second VM (Ubuntu 16.04) with the second GPU, it crashed the system: unraid offline, network packet storm. After a reboot, the Ubuntu VM won't boot at all, even after switching back to the regular VNC video. So strange. It could be due to the board BIOS type that is being emulated, but I can't imagine that as the reason.

I also checked the syslog file. Nothing informative.

Edited October 13, 2020 by softfeet