NVIDIA GPU Passthrough ESXi 6.7 u3 won't boot >1 cpu



I am trying to get Unraid (6.9 rc2) running as a VM on ESXi 6.7 U3 with a P400 GPU passed through. When I add "hypervisor.cpuid.v0=FALSE" to the config file, it will not boot with more than 1 CPU assigned. It appears to try to load the Intel microcode and then halts. I have tried disabling the microcode with the Disable Security Mitigations plugin, but I still see in the hypervisor logs that it tries to load it. With 1 CPU it tries as well, but it doesn't seem to issue the CPU reset that appears in the logs below.
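
For anyone comparing setups, here is a minimal Python sketch for reading the two settings in play out of a .vmx file. The function names are made up for illustration. It assumes that `numvcpus` is the vCPU count key and that a missing `numvcpus` means a single vCPU, and it accepts both quoted and unquoted values.

```python
def parse_vmx(text):
    """Parse 'key = "value"' lines from .vmx text into a dict (keys lowercased)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def describe(settings):
    """Return (hypervisor_hidden, vcpu_count) for the parsed settings."""
    # Assumption: an absent hypervisor.cpuid.v0 means the hypervisor is visible.
    hidden = settings.get("hypervisor.cpuid.v0", "TRUE").upper() == "FALSE"
    # Assumption: an absent numvcpus means the VM has one vCPU.
    vcpus = int(settings.get("numvcpus", "1"))
    return hidden, vcpus
```

Per the reports in this thread, the combination to watch for is `describe(...)` returning a hidden hypervisor together with more than one vCPU.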

Below is the log from the vm with 2 or more CPUs

2021-01-11T02:51:41.855Z| vcpu-0| I125: APIC THERMLVT write: 0x10000
2021-01-11T02:51:43.182Z| vcpu-0| W115: CPU microcode update available.
2021-01-11T02:51:43.182Z| vcpu-0| W115+ The guest OS tried to update the microcode from patch level 67 (43h) to patch level 68 (44h), but VMware ESX does not allow microcode patches to be applied from within a virtual machine.
2021-01-11T02:51:43.182Z| vcpu-0| W115+ Microcode patches are used to correct CPU errata. You may be able to obtain a BIOS/firmware update which includes this microcode patch from your system vendor, or your host OS may provide a facility for loading microcode patches.
CPU reset: soft (mode 0)

Here is the log from 1 CPU

2021-01-11T03:35:11.496Z| vcpu-0| I125: UHCI: HCReset
2021-01-11T03:35:12.754Z| vcpu-0| W115: CPU microcode update available.
2021-01-11T03:35:12.754Z| vcpu-0| W115+ The guest OS tried to update the microcode from patch level 67 (43h) to patch level 68 (44h), but VMware ESX does not allow microcode patches to be applied from within a virtual machine.
2021-01-11T03:35:12.754Z| vcpu-0| W115+ Microcode patches are used to correct CPU errata. You may be able to obtain a BIOS/firmware update which includes this microcode patch from your system vendor, or your host OS may provide a facility for loading microcode patches.
SVGA: Unregistering IOSpace at 0x1070
2021-01-11T03:35:13.182Z| vcpu-0| I125: SVGA: Unregistering MemSpace at 0xe8000000(0xe8000000) and 0xfe000000(0xfe000000)
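
The difference between the two logs can be checked mechanically. This is a rough Python sketch that scans vmware.log text for the microcode warning and for the soft CPU reset. The function name and the returned dict keys are made up for illustration, and the patterns are based only on the excerpts above.

```python
import re

# Matches the W115 warning text shown in the excerpts above.
MICROCODE_RE = re.compile(
    r"update the microcode from patch level (\d+) \(\w+\) "
    r"to patch level (\d+) \(\w+\)"
)


def summarize_vmware_log(text):
    """Report microcode update attempts and whether a soft CPU reset was logged."""
    attempts = [(int(old), int(new)) for old, new in MICROCODE_RE.findall(text)]
    return {
        "microcode_attempts": attempts,             # list of (from_level, to_level)
        "soft_reset": "CPU reset: soft" in text,    # seen only in the multi-CPU log
    }
```

Run against both logs, the microcode attempt should show up in each, while `soft_reset` should be true only for the VM with 2 or more CPUs.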

 

Below is a screenshot from Unraid where it halts.

[Screenshot: 2021-01-10_20-52-14.thumb.png]

 

I was able to pass the GPU through to an Ubuntu 18.04 machine with the same virtual hardware and have it working. When I first tried it, I ran into the same problem and found a post recommending purging the Intel microcode; I followed those steps to get it working.

Any idea of what I can do on unraid and how I can actually delete those files?  

From the later posts on the security mitigation plugin, it is unclear whether that boot option still works with the newer kernels (which might be my problem).

Edited by jwiener3
Link to comment

I have pretty much the exact same issue!
If I add hypervisor.cpuid.v0 = FALSE, booting freezes right after loading bzroot.

If I set just one CPU, it boots just fine.

I'm not sure it's Nvidia specific as I still get the error even without my Nvidia card passed through to the virtual machine.

And it won't boot even without any hardware at all passed through :(


ESXi 6.7 with the latest patches

UnRAID 6.9.0-rc2

Nvidia Quadro P400

 

A plain Ubuntu 20.10 instance with hypervisor.cpuid.v0 = FALSE set works as expected. No problems at all. 

Mine was also working great with linuxserver.io nvidia builds.

So it’s definitely not a hardware or ESXi issue.

 



Edited by zer0zer0
Link to comment
11 hours ago, jwiener3 said:

Good info, how did you load that version?

Click

 

11 hours ago, jwiener3 said:

It is likely the Intel microcode included in version RC1: "intel-microcode: version 20201118"

You can actually extract the microcode from beta35 and inject it into RC2 or whatever version you want... if it's the microcode.
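
As a starting point for the extraction step, here is a hedged Python sketch. It assumes bzroot begins with an uncompressed "newc" cpio archive holding the microcode, followed by zero padding and then the compressed root filesystem. That layout is an assumption, not something confirmed in this thread, and the function name is made up for illustration. It only splits the file; repacking is not shown.

```python
HEADER_LEN = 110   # fixed size of a "newc" cpio header
MAGIC = b"070701"  # newc magic (the CRC variant 070702 is not handled here)


def _align4(n):
    """Round n up to the next multiple of 4 (newc pads names and data to 4 bytes)."""
    return (n + 3) & ~3


def split_leading_cpio(data):
    """Split bytes into (member_names, leading_cpio_bytes, remainder_bytes)."""
    pos = 0
    names = []
    while True:
        header = data[pos:pos + HEADER_LEN]
        if len(header) < HEADER_LEN or header[:6] != MAGIC:
            raise ValueError("no newc cpio header at offset %d" % pos)
        filesize = int(header[54:62], 16)    # 7th 8-hex-digit field
        namesize = int(header[94:102], 16)   # 12th field, includes the trailing NUL
        name_start = pos + HEADER_LEN
        name = data[name_start:name_start + namesize - 1].decode("ascii", "replace")
        data_start = _align4(name_start + namesize)
        pos = _align4(data_start + filesize)
        if name == "TRAILER!!!":             # end-of-archive marker entry
            break
        names.append(name)
    # Skip the zero padding that separates the first archive from what follows.
    remainder = data[pos:].lstrip(b"\x00")
    return names, data[:pos], remainder
```

Usage would be along the lines of reading the file with `open("bzroot", "rb").read()` (path assumed) and passing the bytes in. If the layout assumption holds, the first element lists the microcode files and the second is the archive you would carry over from beta35.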

 

13 hours ago, zer0zer0 said:

@jwiener3 - This issue is resolved if you drop back to UnRAID version 6.9.0-beta35 :)

But I don't think this will be resolved when 6.9.0 stable is released... :D

Link to comment

Thanks for that info and the links! I am really getting in deep fast here and it is frustrating and fun.

It looks like I will have to see if I can learn how to extract and repack microcode. I did find a thread talking about that, so I will look at that in the next few days.

Link to comment

hmmm, seems it might even be a SuperMicro X10 and/or Xeon E5 v3 specific issue.

No issues on my E3-1265L v3 running on an ASRock Rack E3C224D2I

 

@StevenD can run 6.9rc2 with hypervisor.cpuid.v0 = FALSE on his setup with ESXi 7, SuperMicro X9, and E5-2680v2 CPUs

 

Thanks to @ich777, I can confirm that you can run the latest 6.9rc2 with a modified bzroot that has the 6.9.0-beta35 microcode :)

Edited by zer0zer0
Link to comment
