Krzaku, posted January 2, 2020 (edited January 24, 2020): I had 32GB of RAM and 12 cores assigned to my Win 10 VM. I added another 32GB (exact same brand and model), and now that same VM won't boot with more than 8 cores assigned, no matter how much RAM I give it. If I add more cores, the VM doesn't boot and the first HT core is pinned at 100% while the others stay at 0%. I have set up CPU isolation correctly. I included the diagnostics; can anyone help?
jonp, posted January 14, 2020: Apologies for the delayed reply on this topic; I had to find a system with enough RAM and CPUs to test this against. I just tested this on one of my test rigs, which has an i7-8700K CPU (6 cores / 12 threads) and 64GB of total RAM. I have a Windows 10 VM with an NVIDIA GPU for passthrough (along with a dedicated USB controller). Normally I only have 8 threads assigned, so I bumped it up to 10 of the 12 threads pinned to the VM, and it booted just fine. I'm assuming that when this happens, VM Manager reports the VM as started, and it probably stays in that state until you do a Force Shutdown. If that's true, it's even more difficult to diagnose, because I don't see any errors in the logs. The first troubleshooting step I would take is to stop assigning your GPU and USB device to the VM and see whether it boots without any PCI device passthrough. If that works, add the GPU back by itself and see if the VM still boots. If it does, the problem might be the USB device you're trying to assign. If it doesn't, there may be something odd about that GPU that doesn't like how virtual memory mapping works beyond the 32GB of RAM you previously had installed.
Krzaku, posted January 14, 2020 (edited January 14, 2020): After a few test runs, it seems that no single device causes this. The VM only ever booted (to the EFI shell) after I removed ALL devices, including the USB controller, both NVMe drives, the WiFi card and the GPU (I tried a few combinations). Also, the maximum number of cores the VM will boot with seems to be 9, not 8. And yes, when it doesn't boot, there are no errors in the VM logs, and it doesn't stop until I force stop it.
jonp, posted January 15, 2020: This is very odd behavior indeed. Just to make sure I understand correctly: you are stubbing a USB controller, two NVMe drives, a WiFi controller and a GPU, and you were passing them all through to a VM just fine. Now, if you pass ANY kind of device through to the VM, it enters the state you described on 1/2/2020 ("the first HT core is pinned at 100% while the others are 0%"), and this only happens when more than 9 vCPUs are assigned to the VM. So even if you assign only the NVMe SSD used as the boot device and no other PCIe devices, the same behavior occurs?
Krzaku, posted January 15, 2020: That is almost all correct, except that I'm only stubbing the WiFi card and no other device. Might this be the issue? These are my kernel parameters:
pcie_acs_override=downstream,multifunction iommu=pt vfio-pci.ids=8086:a370 isolcpus=0-5,8-13
Krzaku, posted January 15, 2020: As a test, I stubbed all of the devices passed to the VM anyway, which did not fix the issue:
pcie_acs_override=downstream,multifunction iommu=pt vfio-pci.ids=8086:a370,8087:0aaa,10de:1e87,10de:10f8,10de:1ad8,10de:1ad9,1033:0194,144d:a804,1987:5012 isolcpus=0-5,8-13
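For anyone comparing their own setup against the parameters above, here is a minimal Python sketch that splits a kernel command line and expands the isolcpus= cpu-list and the vfio-pci.ids= vendor:device list. It is only string handling (it doesn't read the running system), the helper names are hypothetical, and it assumes plain "a-b,c" cpu-lists: flag words such as "nohz" or "domain" are skipped, and the newer stride syntax is not handled.

```python
from __future__ import annotations


def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into {parameter: value}; bare flags map to ""."""
    params: dict[str, str] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else ""
    return params


def parse_cpu_list(spec: str) -> set[int]:
    """Expand a cpu-list such as "0-5,8-13" into a set of CPU numbers."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part or not part[0].isdigit():
            continue  # skip empty items and flag words like "nohz" or "domain"
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def parse_vfio_ids(spec: str) -> list[tuple[int, int]]:
    """Parse "vendor:device,vendor:device" hex pairs into integer tuples."""
    ids: list[tuple[int, int]] = []
    for item in spec.split(","):
        vendor, device = item.split(":")
        ids.append((int(vendor, 16), int(device, 16)))
    return ids


if __name__ == "__main__":
    cmdline = ("pcie_acs_override=downstream,multifunction iommu=pt "
               "vfio-pci.ids=8086:a370,10de:1e87 isolcpus=0-5,8-13")
    params = parse_cmdline(cmdline)
    print(sorted(parse_cpu_list(params["isolcpus"])))  # [0, 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 13]
    print(len(parse_vfio_ids(params["vfio-pci.ids"])))  # 2
```

Seeing the isolated set written out makes it easier to spot that 0 and 8 are included, which matters for the core 0 discussion later in the thread.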
jonp, posted January 17, 2020: You definitely need to stub any storage, Ethernet or USB controllers you want to use in your VM; otherwise the host OS will load a driver for them and may prevent you from starting the VM. If you still have the problem with all of those devices stubbed, I'm running out of ideas. Start your VM with the devices assigned, then capture diagnostics again and upload them here. We need to see something in the logs that points to the source of the issue. If nothing is there, it's likely a hardware-specific issue, which will be very difficult to remediate.
Krzaku, posted January 18, 2020 (edited January 24, 2020): Here it is: [REMOVED]
Krzaku, posted January 19, 2020: I have resolved the issue by skipping the first 2 threads when assigning CPUs to the VM, so I am now passing the middle 12 threads. I still don't understand why this was causing the issue, though.
bonienl, posted January 19, 2020: Unraid itself uses core 0 and its HT sibling, so it is better not to isolate them for VM use.
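A small pre-flight check along the lines of this advice could flag a VM pin set that includes CPU 0 or its HT sibling, or that uses CPUs missing from isolcpus. This is a hypothetical Python sketch, not an Unraid tool: the function names are made up, and the example assumes an 8-core/16-thread layout where CPU 0's sibling is CPU 8. On a real Linux host the sibling list can be read from /sys/devices/system/cpu/cpu0/topology/thread_siblings_list instead of guessing. It does not explain why adding RAM triggered the hang.

```python
from __future__ import annotations

from pathlib import Path


def parse_cpu_list(spec: str) -> set[int]:
    """Expand a cpu-list such as "0-5,8-13" into a set of CPU numbers."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_siblings(cpu: int = 0) -> set[int]:
    """Read the HT siblings of `cpu` (including itself) from Linux sysfs."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    return parse_cpu_list(path.read_text().strip())


def check_vm_pinning(vm_cpus: set[int], isolated: set[int],
                     cpu0_siblings: set[int]) -> list[str]:
    """Return human-readable problems with a proposed VM CPU pin set."""
    problems: list[str] = []
    overlap = vm_cpus & cpu0_siblings
    if overlap:
        problems.append(f"VM is pinned to CPU 0 or its HT sibling: {sorted(overlap)}")
    not_isolated = vm_cpus - isolated
    if not_isolated:
        problems.append(f"VM CPUs missing from isolcpus: {sorted(not_isolated)}")
    return problems


if __name__ == "__main__":
    isolated = parse_cpu_list("0-5,8-13")
    # Assumption: CPU 0's HT sibling is CPU 8; call read_siblings(0) on a real host.
    siblings = {0, 8}
    for problem in check_vm_pinning(parse_cpu_list("0-5,8-13"), isolated, siblings):
        print(problem)  # VM is pinned to CPU 0 or its HT sibling: [0, 8]
```

Keeping the sysfs read in its own function means the check itself can be run against any hypothetical topology without touching the host.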
Krzaku, posted January 19, 2020: All I can say is that it was working before, and the only thing that changed was the amount of RAM. So you're saying that even if I isolate the first 2 cores, Unraid will still use them for its own work?