Krzaku

Added more ram to the server, now can't start Win 10 with more than 8 cores

11 posts in this topic


I had 32GB of RAM and 12 cores assigned to my Win 10 VM. I added another 32GB (the exact same brand and model), and now that same VM won't boot with more than 8 cores assigned, regardless of how much RAM I give it. With more cores the VM won't boot: the first HT core is pinned at 100% while the others sit at 0%. I have set the CPU isolation correctly. I've included the diagnostics; can anyone help?

tower-diagnostics-20200102-2019.zip


Apologies for the delayed reply on this topic.  Had to find a system that had enough RAM / CPUs to test this against.  Just tested this on one of my test rigs here with an 8700K CPU (6 cores / 12 threads) and 64GB of total RAM.  I have a Windows 10 VM with an NVIDIA GPU for passthrough (along with a dedicated USB controller).  Normally I only had 8 threads assigned, so I bumped it up to 10 of the 12 threads pinned to the VM and it booted just fine.

 

I'm assuming when this happens, VM Manager is reporting the VM as started and probably stays in that state until you do a Force Shutdown.  If that's true, it's even more difficult to diagnose because I don't see any errors in the logs or anything like that.

 

The first troubleshooting step I would take is to stop assigning your GPU and USB device to the VM to see if it boots without any PCI device passthrough occurring.  If that works, try adding the GPU back by itself and see if the VM still boots.  If it does, the problem might be with the USB device you're trying to assign to the VM.  If it doesn't, there may be something odd with how that GPU handles virtual memory mapping beyond the 32GB of RAM you previously had installed.
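When removing devices one at a time like this, it also helps to know how the host groups them, since every device in an IOMMU group has to be passed through (or left out) together. A minimal sketch that lists the groups from standard sysfs paths; the `iommu_groups` helper name is mine, and this only produces output on a Linux host with an IOMMU enabled:

```python
from pathlib import Path

def iommu_groups() -> dict[int, list[str]]:
    """Map each IOMMU group number to the PCI addresses it contains."""
    root = Path("/sys/kernel/iommu_groups")
    if not root.is_dir():
        return {}  # no IOMMU exposed (or not running on Linux)
    return {
        int(g.name): sorted(d.name for d in (g / "devices").iterdir())
        for g in root.iterdir()
    }

for group, devices in sorted(iommu_groups().items()):
    print(f"group {group}: {', '.join(devices)}")
```

If the GPU shares a group with, say, its HDMI audio function, both must be assigned together, which can make "remove one device" tests misleading.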

 

 


After a few test runs it would seem that no single device causes this. The VM only ever booted (to the EFI shell) after I removed ALL devices: the USB controller, both NVMe drives, the WiFi card, and the GPU (I tried a few combinations).

 

Also, the magic number at which the VM boots seems to be 9 cores, not 8. And yes, when it does not boot there are no errors in the VM logs, and it doesn't stop until I force stop it.

Edited by Krzaku


This is very odd behavior indeed.  Just to make sure I understand correctly: you are stubbing a USB controller, two NVMe drives, a WiFi controller, and a GPU, and you were previously passing them all through to the VM just fine.  Now if you pass through ANY kind of device to the VM, the VM enters this state:

 

On 1/2/2020 at 1:29 PM, Krzaku said:

the first HT core is pinned at 100% while the others are 0%.

 

And furthermore this behavior is only exhibited when more than 9 vCPUs are assigned to the VM.  So even if you just assign the NVMe SSD for the boot device and no other PCIe devices, the same behavior occurs?


That is almost all correct, except I'm only stubbing the WiFi card and not any other device. Might this be the issue? These are my kernel params:

pcie_acs_override=downstream,multifunction iommu=pt vfio-pci.ids=8086:a370 isolcpus=0-5,8-13
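As a sanity check on a string like that, the `isolcpus` value can be expanded to see exactly which host threads are reserved. A small sketch (the `parse_cpuset` helper name is mine; the format is the kernel's standard comma-separated list of IDs and ranges):

```python
def parse_cpuset(spec: str) -> set[int]:
    """Expand a kernel cpuset string like '0-5,8-13' into a set of thread IDs."""
    threads: set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            threads.update(range(int(lo), int(hi) + 1))
        else:
            threads.add(int(part))
    return threads

print(sorted(parse_cpuset("0-5,8-13")))
# [0, 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 13]
```

So 12 of the 16 threads are isolated here, including thread 0, and only 6, 7, 14, and 15 remain for the host.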

 


As a test, I stubbed all of the devices passed to the VM anyway, which did not fix the issue:

pcie_acs_override=downstream,multifunction iommu=pt vfio-pci.ids=8086:a370,8087:0aaa,10de:1e87,10de:10f8,10de:1ad8,10de:1ad9,1033:0194,144d:a804,1987:5012 isolcpus=0-5,8-13

 


You definitely need to be stubbing any storage, Ethernet, or USB controllers you wish to use in your VM.  Otherwise the host OS will load a proper driver and potentially block you from starting the VM.
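One way to confirm the stubs actually took effect after a reboot is to check which kernel driver each PCI device is bound to; stubbed devices should report `vfio-pci` rather than a real driver like `nvme` or `xhci_hcd`. A sketch reading standard sysfs paths (the `pci_driver_map` helper name is mine; this only lists devices on a Linux host):

```python
from pathlib import Path

def pci_driver_map() -> dict[str, str]:
    """Map each PCI address to the kernel driver currently bound to it."""
    root = Path("/sys/bus/pci/devices")
    if not root.is_dir():
        return {}  # not a Linux host
    result = {}
    for dev in root.iterdir():
        drv = dev / "driver"
        # 'driver' is a symlink to the bound driver's directory, if any.
        result[dev.name] = drv.resolve().name if drv.is_symlink() else "(none)"
    return result

for addr, driver in sorted(pci_driver_map().items()):
    print(f"{addr}: {driver}")
```

Anything listed in `vfio-pci.ids` that still shows its normal driver here was grabbed by the host before the stub applied.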

 

If you're still having the problem with all of those devices stubbed, I'm running out of ideas.  After starting your VM with the devices assigned, go ahead and recapture diagnostics and upload here.  We need to see something in the logs to point to the source of the issue.  If nothing is there, it's likely a hardware-specific issue which will be very difficult to remediate.


I have resolved the issue by skipping the first 2 threads when assigning CPUs to the VM, so now I am passing the middle 12 threads. I still don't understand how this was causing the issue, though.
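A plausible reading of this fix is that it freed up the host's first physical core, which the kernel still uses for housekeeping even when isolated. Which logical CPUs share a physical core varies by CPU and firmware, but the pairing can be read from sysfs rather than guessed. A sketch (the `siblings` helper name is mine; the topology path is a standard Linux sysfs file):

```python
from pathlib import Path

def siblings(cpu: int) -> str:
    """Return the hyper-thread sibling list for a logical CPU, e.g. '0,8' or '0-1'."""
    p = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    return p.read_text().strip() if p.is_file() else "(topology not exposed)"

# Pinning the VM only to pairs that exclude CPU 0 and its sibling leaves
# the host a full physical core for its own interrupts and kernel threads.
for cpu in range(4):
    print(f"cpu{cpu}: shares a core with {siblings(cpu)}")
```

Note that skipping "the first 2 threads" only frees a whole core if those two threads are actually siblings of each other, which is worth verifying with the file above.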


All I can say is that it was working before, the only variable being the RAM amount.

 

So you're saying that even if I isolate the first 2 cores, Unraid will still use them for its own work?

