Windows 10 VM won't start / VM GUI frozen

faramir85 · December 26, 2021

Hello,

I've been using a Windows 10 VM that I recently installed, and it appeared to be working very smoothly (it has a 1050 Ti passed through). Yesterday I powered off the VM, and this morning when I tried to boot it again, it would not start from the Unraid GUI: the red arrows kept spinning in a circle and no "started" message appeared. The VM tab froze and the machine became unresponsive. I tried to shut the server down from the Unraid GUI, but it would not shut down until I pressed the power button on the server and forced it off.

Attached you can find the diagnostic zip.

Does anyone know the reason for this error I found in libvirt.log?

Quote

2021-12-26 09:52:43.946+0000: 6053: error : virNetSocketReadWire:1840 : End of file while reading data: Input/output error

tower-diagnostics-20211226-1102.zip

Edited December 26, 2021 by faramir85

faramir85 · December 26, 2021

After restarting the server, I restored the VM XML that I had backed up (in case I had messed up any config), and the Windows 10 OS booted correctly. I shut the machine down, and a few hours later, when I tried to start it again, the same problem occurred (red arrows spinning and the VM won't start). These are the log messages I found on the system:

Quote

Dec 26 19:00:48 Tower webGUI: Successful login user root from 172.17.0.8
Dec 26 19:02:30 Tower kernel: NVRM: Attempting to remove minor device 0 with non-zero usage count!
Dec 26 19:02:30 Tower kernel: ------------[ cut here ]------------
Dec 26 19:02:30 Tower kernel: WARNING: CPU: 4 PID: 6051 at /tmp/SBo/NVIDIA-Linux-x86_64-440.59/kernel/nvidia/nv-pci.c:577 nv_pci_remove+0xe9/0x2fc [nvidia]
Dec 26 19:02:30 Tower kernel: Modules linked in: macvlan nvidia_uvm(O) xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables xt_nat vhost_net tun vhost veth tap ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod bonding e1000e nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) crc32_pclmul intel_rapl_perf intel_uncore pcbc aesni_intel aes_x86_64 glue_helper crypto_simd ghash_clmulni_intel cryptd kvm_intel drm_kms_helper kvm intel_cstate coretemp drm crct10dif_pclmul intel_powerclamp crc32c_intel x86_pkg_temp_thermal syscopyarea sysfillrect sysimgblt fb_sys_fops agpgart ahci i2c_i801 libahci i2c_core video thermal button fan pcc_cpufreq ie31200_edac backlight [last unloaded: e1000e]
Dec 26 19:02:30 Tower kernel: CPU: 4 PID: 6051 Comm: libvirtd Tainted: P O 4.19.107-Unraid #1
Dec 26 19:02:30 Tower kernel: Hardware name: VIGLEN DQ77MK/DQ77MK, BIOS MKQ7710H.86A.0060.2013.0618.1012 06/18/2013
Dec 26 19:02:30 Tower kernel: RIP: 0010:nv_pci_remove+0xe9/0x2fc [nvidia]
Dec 26 19:02:30 Tower kernel: Code: aa 01 00 00 00 75 2c 8b 95 70 04 00 00 48 c7 c6 7b a5 4a a1 bf 04 00 00 00 e8 bd 7d 00 00 48 c7 c7 c2 a5 4a a1 e8 31 36 d1 e0 <0f> 0b e8 c2 82 00 00 eb f9 4c 8d b5 50 04 00 00 4c 89 f7 e8 f7 d2
Dec 26 19:02:30 Tower kernel: RSP: 0018:ffffc90001dabd50 EFLAGS: 00010246
Dec 26 19:02:30 Tower kernel: RAX: 0000000000000024 RBX: ffff88840b3f70a8 RCX: 0000000000000007
Dec 26 19:02:30 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff88840db164f0
Dec 26 19:02:30 Tower kernel: RBP: ffff8882a9a67800 R08: 0000000000000003 R09: 0000000000016300
Dec 26 19:02:30 Tower kernel: R10: 0000000000000000 R11: 0000000000000044 R12: ffff88828f8e0000
Dec 26 19:02:30 Tower kernel: R13: ffff88840b3f7000 R14: 0000000000000060 R15: ffff88824af0d620
Dec 26 19:02:30 Tower kernel: FS: 000014ab5cbfe700(0000) GS:ffff88840db00000(0000) knlGS:0000000000000000
Dec 26 19:02:30 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 26 19:02:30 Tower kernel: CR2: 000014ab5cbfa378 CR3: 000000039d792003 CR4: 00000000001606e0
Dec 26 19:02:30 Tower kernel: Call Trace:
Dec 26 19:02:30 Tower kernel: pci_device_remove+0x36/0x8e
Dec 26 19:02:30 Tower kernel: device_release_driver_internal+0x144/0x225
Dec 26 19:02:30 Tower kernel: unbind_store+0x6b/0xae
Dec 26 19:02:30 Tower kernel: kernfs_fop_write+0xf3/0x135
Dec 26 19:02:30 Tower kernel: __vfs_write+0x32/0x13a
Dec 26 19:02:30 Tower kernel: vfs_write+0xc7/0x166
Dec 26 19:02:30 Tower kernel: ksys_write+0x60/0xb2
Dec 26 19:02:30 Tower kernel: do_syscall_64+0x57/0xf2
Dec 26 19:02:30 Tower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 26 19:02:30 Tower kernel: RIP: 0033:0x14ab5eea148f
Dec 26 19:02:30 Tower kernel: Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 49 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2d 44 89 c7 48 89 44 24 08 e8 7c fd ff ff 48
Dec 26 19:02:30 Tower kernel: RSP: 002b:000014ab5cbfd530 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
Dec 26 19:02:30 Tower kernel: RAX: ffffffffffffffda RBX: 000000000000000c RCX: 000014ab5eea148f
Dec 26 19:02:30 Tower kernel: RDX: 000000000000000c RSI: 000014ab3403f910 RDI: 000000000000001e
Dec 26 19:02:30 Tower kernel: RBP: 000014ab3403f910 R08: 0000000000000000 R09: 0000000000000000
Dec 26 19:02:30 Tower kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000001e
Dec 26 19:02:30 Tower kernel: R13: 000000000000001e R14: 0000000000000000 R15: 000014ab34043d30
Dec 26 19:02:30 Tower kernel: ---[ end trace 12dd9243401a31e9 ]---
Dec 26 19:04:30 Tower nginx: 2021/12/26 19:04:30 [error] 7078#7078: *51229 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 172.17.0.8, server: , request: "POST /plugins/dynamix.vm.manager/include/VMajax.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "192.168.1.53", referrer: "http://192.168.1.53/VMs"
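For anyone reading a trace like this, the useful parts are usually the process that triggered the warning and the driver function it was in. Here the warning comes from `nv_pci_remove` in the `nvidia` module while `libvirtd` was writing to a sysfs `unbind` file (`unbind_store` in the call trace), right after the "non-zero usage count" message. A minimal Python sketch for pulling those fields out of a syslog file might look like this; the function name `summarize_warnings` and both regexes are illustrative assumptions, and the sample lines are copied from the log above:

```python
import re

# Sample lines copied from the syslog excerpt quoted above.
SAMPLE_LOG = """\
Dec 26 19:02:30 Tower kernel: NVRM: Attempting to remove minor device 0 with non-zero usage count!
Dec 26 19:02:30 Tower kernel: WARNING: CPU: 4 PID: 6051 at /tmp/SBo/NVIDIA-Linux-x86_64-440.59/kernel/nvidia/nv-pci.c:577 nv_pci_remove+0xe9/0x2fc [nvidia]
Dec 26 19:02:30 Tower kernel: CPU: 4 PID: 6051 Comm: libvirtd Tainted: P O 4.19.107-Unraid #1
Dec 26 19:02:30 Tower kernel: unbind_store+0x6b/0xae
"""

# Hypothetical patterns, written against the line shapes seen in this log only.
WARNING_RE = re.compile(
    r"kernel: WARNING: CPU: \d+ PID: (\d+) at \S+ (\w+)\+\S+ \[(\w+)\]"
)
COMM_RE = re.compile(r"kernel: CPU: \d+ PID: (\d+) Comm: (\S+)")


def summarize_warnings(log_text):
    """Return one dict per kernel WARNING, with the process name if it was logged."""
    comm_by_pid = {}
    warnings = []
    for line in log_text.splitlines():
        comm = COMM_RE.search(line)
        if comm:
            comm_by_pid[comm.group(1)] = comm.group(2)
        warn = WARNING_RE.search(line)
        if warn:
            warnings.append(
                {"pid": warn.group(1), "function": warn.group(2), "module": warn.group(3)}
            )
    # The "Comm:" line follows the WARNING line, so attach names after the scan.
    for entry in warnings:
        entry["comm"] = comm_by_pid.get(entry["pid"], "unknown")
    return warnings


if __name__ == "__main__":
    for entry in summarize_warnings(SAMPLE_LOG):
        print(entry)
```

To run it against a real log, read the file into a string and pass it to `summarize_warnings` in place of `SAMPLE_LOG`.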

Squid · December 26, 2021

Run the filesystem check against the cache drive:

Dec 26 04:30:03 Tower kernel: XFS (sdc1): Metadata corruption detected at xfs_dinode_verify+0xa5/0x52e [xfs], inode 0x60000db5 dinode
Dec 26 04:30:03 Tower kernel: XFS (sdc1): Unmount and run xfs_repair
Dec 26 04:30:03 Tower kernel: XFS (sdc1): First 128 bytes of corrupted metadata buffer:
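The kernel names the affected device in each of these lines (`sdc1` here) and asks for it to be unmounted before `xfs_repair` is run. As a rough sketch, a few lines of Python can list every XFS device a syslog flags this way, which helps when more than one drive is involved. The function name `corrupted_xfs_devices` and the regex are illustrative assumptions based only on the three lines above:

```python
import re

# Sample lines copied from the syslog excerpt quoted above.
SAMPLE_LOG = """\
Dec 26 04:30:03 Tower kernel: XFS (sdc1): Metadata corruption detected at xfs_dinode_verify+0xa5/0x52e [xfs], inode 0x60000db5 dinode
Dec 26 04:30:03 Tower kernel: XFS (sdc1): Unmount and run xfs_repair
Dec 26 04:30:03 Tower kernel: XFS (sdc1): First 128 bytes of corrupted metadata buffer:
"""

# Hypothetical pattern: captures the device name and the reported inode number.
CORRUPTION_RE = re.compile(
    r"XFS \((\w+)\): Metadata corruption detected at \S+ \[xfs\], inode (0x[0-9a-f]+)"
)


def corrupted_xfs_devices(log_text):
    """Map each flagged device name to the set of inodes reported as corrupted."""
    found = {}
    for line in log_text.splitlines():
        match = CORRUPTION_RE.search(line)
        if match:
            found.setdefault(match.group(1), set()).add(match.group(2))
    return found


if __name__ == "__main__":
    for device, inodes in corrupted_xfs_devices(SAMPLE_LOG).items():
        print(f"{device}: {len(inodes)} corrupted inode(s): {sorted(inodes)}")
```

This only reports what the log already says; the repair itself still has to be done with the filesystem unmounted, as the kernel message states.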

faramir85 · January 5, 2022

On 12/26/2021 at 7:13 PM, Squid said:

run the check filesystem against the cache drive

Dec 26 04:30:03 Tower kernel: XFS (sdc1): Metadata corruption detected at xfs_dinode_verify+0xa5/0x52e [xfs], inode 0x60000db5 dinode
Dec 26 04:30:03 Tower kernel: XFS (sdc1): Unmount and run xfs_repair
Dec 26 04:30:03 Tower kernel: XFS (sdc1): First 128 bytes of corrupted metadata buffer:

Sorry for the delay, I went on vacation and couldn't reach the server. After restarting it and repairing the cache drive, I keep getting random hard locks on the server when I boot the W10 VM. I've been reading about this kind of problem, and I believe it's caused by the fact that I'm passing the GPU through to the W10 VM while the same video card is also used by the Plex docker container (for transcoding, although no transcoding was active at that moment).

The specs of the server, in case it helps:

Unraid OS 6.8.3

Xeon E3-1265 V2

Nvidia 1050 Ti

I remember installing the Nvidia drivers and running a script (from the command line) to apply a patch that removes the limit on the number of transcoding streams.

I believe I could solve these stability problems if I only used the GPU inside the W10 VM and stopped using it in the Plex docker container (I could use hardware transcoding on the Xeon's iGPU instead). How can I configure the system so that the Plex container stops using the GPU? Would upgrading to the latest Unraid OS version help?

Any help would be appreciated,

regards.

Edited January 5, 2022 by faramir85
