VMware 7.x will not start any VMs under Unraid 6.11.0


mavrrick


So I had a working VMware ESXi environment up until a few months ago, when it was abandoned for various reasons. Now I can't get VMware ESXi to start any VMs at all. It installs and boots fine, but isn't really usable: whenever I try to start a VM, I get an error indicating that VMware has an issue with the system it is running on running other VM software. I have rebuilt the VM several times, following whatever tutorials I could find online for installing VMware 7.x; I have tried 7.01 and 7.03g. I have tried SeaBIOS, which now seems to cause an error during startup, and OVMF both with and without TPM.

 

The error is "vcpu-0: Invalid VMCB".

 

Has anyone else had issues getting VMware to work on the latest version of Unraid?

  • 4 weeks later...

I have tried a few more things since I last posted. 

 

I tried loading VMware ESXi 8.x, since it is available now. It gives two error messages: 1. Module "MonitorMode" failed to start. 2. AMD-V is supported by the platform, but is implemented in a way that is incompatible.

 

Were there any changes to QEMU that could affect how the CPUs are passed to underlying environments? This seems to me like a feature isn't being passed through, causing the nested environment to not work.

 

I have also tried a few of the options I have seen for enabling nested virtualization.

  • 4 weeks later...

Have you managed to get any further with this? I am in the same boat: I am unable to start any VMs inside ESXi. When checking vmkernel.log, I can see entries containing the same error, "vcpu-0: Invalid VMCB".

 

I have the following setup for the ESXi guest:

 

AMD Ryzen 9 Processor passthrough

Q35-7.1

OVMF

16GB RAM

SATA Disks

 

Running Unraid 6.11.3; the nested flag is set to 1 via a user script (spaceinvader). I have also checked /sys/module/kvm_amd/parameters/nested and the flag is set.
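For anyone else wanting to verify this on their own box, this is roughly what the check looks like (a sketch only; the modprobe step is what the user script does on my setup and may differ on yours):

```shell
# Check whether nested virtualization is currently enabled for kvm_amd;
# the parameter file prints 1 (or Y on newer kernels) when nesting is on.
NESTED_PARAM=/sys/module/kvm_amd/parameters/nested
if [ -r "$NESTED_PARAM" ]; then
    cat "$NESTED_PARAM"
else
    echo "kvm_amd module not loaded"
fi

# To turn it on, the module has to be reloaded with the option set
# (stop all VMs first, since this unloads KVM):
#   modprobe -r kvm_amd && modprobe kvm_amd nested=1
```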

 

I will continue to have a look myself and update here with any findings.

 

EDIT: esxcfg-info | grep "HV Support" reports back "3" (VT-x/AMD-V is enabled in the BIOS and can be used), so AMD-V is getting passed through to the ESXi guest VM.

 

 

Edited by afc_rich
update
  • 3 weeks later...

I have not made any progress. I attempted again today to get this working, manipulating a few things; now I am just going to focus on ESXi 8.0. It generates that strange message: "AMD-V is supported by the platform, but is implemented in a way that is incompatible." What is strange is that I can't get ESXi 7.x working either, which was working a while back.

 

I have confirmed the platform works as expected when running ESXi on bare metal instead of under Unraid. For testing, I switched my box to boot into ESXi first and then ran Unraid underneath ESXi. That sort of worked for testing, but complications with PCIe passthrough requirements made me move away from running ESXi as the main hypervisor.

 

My suspicion is that some of the updates applied to KVM in the last few releases may have broken VMware compatibility as a nested hypervisor. VMware ESXi 8 flatly says it can't access AMD-V features during installation, yet Unraid clearly can, based on what it reports.

 

It is almost acting like the virtual BIOS the VM is using has it disabled. I'm not sure if that is possible, but that is what it looks like.

 

When I run esxcfg-info | grep "HV Support", I get a line that says:

|----HV Support .....................................................1

 

From what I can find, a value of 1 means hardware virtualization is supported by the CPU but disabled in the (virtual) BIOS, while 3 means it is enabled and usable.

 

For reference, my machine hardware is below.

 

Ryzen 5950X

Asus ROG Strix motherboard with the latest BIOS

128GB Ram

 

I have tried both SeaBIOS and OVMF in the VM settings. I have also tried adding SVM as a required feature in the CPU profile.
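For reference, this is roughly what "adding SVM as required" looks like in the VM's libvirt XML (a sketch based on my own template; the other lines in your `<cpu>` block will differ):

```xml
<cpu mode='host-passthrough' check='none' migratable='on'>
  <cache mode='passthrough'/>
  <!-- explicitly require the AMD-V (SVM) flag for the guest -->
  <feature policy='require' name='svm'/>
  <feature policy='require' name='topoext'/>
</cpu>
```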

Edited by mavrrick

So I got out my old Unraid server hardware, got it into a working state, and got Unraid running on it. I tested running VMware ESXi 7.0 on it, since this older hardware doesn't support ESXi 8.0, and it seems to have worked. That doesn't say much, though, as the older hardware is a rather dated Sandy Bridge Intel chip.

 

I think this is likely something between KVM, the AMD Ryzen CPU, and which features are being exposed to the nested hypervisor. I tried forcing the CPU profile to EPYC and EPYC-IBPB, and both of these profiles automatically disable the monitor CPU feature for the nested VM.


You have the same hardware as I do: a Ryzen 5950X with a ROG Strix board.

 

I performed the same tests as you did above:

 

ESXi 6.7 - |----HV Support .....................................................3

ESXi 7.0 - |----HV Support .....................................................3

ESXi 8.0 - |----HV Support .....................................................1

 

Interesting that the result is different for ESXi 8.0 🤔

 

XML output for CPU is:

 

  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='2' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
  </cpu>

 

I have attempted to change the CPU mode and also the features to no avail thus far.

 

Next step is to configure a Linux VM to do further tests on the CPU flags. Fingers crossed I can find one that is missing and narrow the search down.

 

 

 


Further update....

 

I have built a CentOS 7.8 VM and looked at the contents of /proc/cpuinfo. The output is below:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 25
model		: 33
model name	: AMD Ryzen 9 5950X 16-Core Processor
stepping	: 0
microcode	: 0xa201016
cpu MHz		: 3393.622
cache size	: 512 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 16
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm art rep_good nopl extd_apicid eagerfpu pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core retpoline_amd ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt lbrv nrip_save tsc_scale vmcb_clean pausefilter pfthreshold v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq spec_ctrl intel_stibp arch_capabilities
bogomips	: 6787.24
TLB size	: 2560 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management:

 

As you can see, the SVM flag (along with many others) is being passed through to the VM. I'm going to dive deeper to see if there are any other flags that are potentially missing and causing the errors in ESXi.
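To make comparing flags between machines easier, here is a small helper I'd use (my own sketch, not part of any of the tools mentioned above):

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # "flags : fpu vme ... svm ..." -> take the value side, split on spaces
            return set(line.split(":", 1)[1].split())
    return set()

# On the VM itself you'd feed it the real file:
#   flags = cpu_flags(open("/proc/cpuinfo").read())
sample = "flags\t\t: fpu vme svm npt nrip_save vmcb_clean"
flags = cpu_flags(sample)

# Check some of the AMD-V related flags for anything missing.
for wanted in ("svm", "npt", "nrip_save", "vmcb_clean", "vgif"):
    print(wanted, "present" if wanted in flags else "MISSING")
```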

@afc_rich What does your XML for the VM look like? Anything special? I added SVM as a required feature to my Windows 11 VM, but I could not launch a Hyper-V Manager VM, and coreinfo did not detect SVM.

I posted my CPU options as a screenshot; it has been confirmed by others that they have been able to get nested virtualization to work on Windows 11.

More info on CentOS....

 

[root@centos /]# sudo virt-host-validate
  QEMU: Checking for hardware virtualization                                 : PASS
  QEMU: Checking if device /dev/kvm exists                                   : PASS
  QEMU: Checking if device /dev/kvm is accessible                            : PASS
  QEMU: Checking if device /dev/vhost-net exists                             : PASS
  QEMU: Checking if device /dev/net/tun exists                               : PASS
  QEMU: Checking for cgroup 'memory' controller support                      : PASS
  QEMU: Checking for cgroup 'memory' controller mount-point                  : PASS
  QEMU: Checking for cgroup 'cpu' controller support                         : PASS
  QEMU: Checking for cgroup 'cpu' controller mount-point                     : PASS
  QEMU: Checking for cgroup 'cpuacct' controller support                     : PASS
  QEMU: Checking for cgroup 'cpuacct' controller mount-point                 : PASS
  QEMU: Checking for cgroup 'cpuset' controller support                      : PASS
  QEMU: Checking for cgroup 'cpuset' controller mount-point                  : PASS
  QEMU: Checking for cgroup 'devices' controller support                     : PASS
  QEMU: Checking for cgroup 'devices' controller mount-point                 : PASS
  QEMU: Checking for cgroup 'blkio' controller support                       : PASS
  QEMU: Checking for cgroup 'blkio' controller mount-point                   : PASS
  QEMU: Checking for device assignment IOMMU support                         : WARN (No ACPI IVRS table found, IOMMU either disabled in BIOS or not supported by this hardware platform)
   LXC: Checking for Linux >= 2.6.26                                         : PASS
   LXC: Checking for namespace ipc                                           : PASS
   LXC: Checking for namespace mnt                                           : PASS
   LXC: Checking for namespace pid                                           : PASS
   LXC: Checking for namespace uts                                           : PASS
   LXC: Checking for namespace net                                           : PASS
   LXC: Checking for namespace user                                          : PASS
   LXC: Checking for cgroup 'memory' controller support                      : PASS
   LXC: Checking for cgroup 'memory' controller mount-point                  : PASS
   LXC: Checking for cgroup 'cpu' controller support                         : PASS
   LXC: Checking for cgroup 'cpu' controller mount-point                     : PASS
   LXC: Checking for cgroup 'cpuacct' controller support                     : PASS
   LXC: Checking for cgroup 'cpuacct' controller mount-point                 : PASS
   LXC: Checking for cgroup 'cpuset' controller support                      : PASS
   LXC: Checking for cgroup 'cpuset' controller mount-point                  : PASS
   LXC: Checking for cgroup 'devices' controller support                     : PASS
   LXC: Checking for cgroup 'devices' controller mount-point                 : PASS
   LXC: Checking for cgroup 'blkio' controller support                       : PASS
   LXC: Checking for cgroup 'blkio' controller mount-point                   : PASS
   LXC: Checking if device /sys/fs/fuse/connections exists                   : FAIL (Load the 'fuse' module to enable /proc/ overrides)

 

 

Could it possibly be IOMMU detection causing the issue?

 

I haven't attempted to pass through any devices.
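As a follow-up check, the table that the warning refers to can be looked for directly from inside the guest (a sketch; this is the standard Linux sysfs location for ACPI tables):

```shell
# The AMD IOMMU driver needs the ACPI IVRS table; the WARN from
# virt-host-validate means the guest firmware isn't exposing one.
TABLES_DIR=/sys/firmware/acpi/tables
if ls "$TABLES_DIR" 2>/dev/null | grep -qi ivrs; then
    echo "IVRS table present - IOMMU exposed to this VM"
else
    echo "no IVRS table - IOMMU not exposed to this VM"
fi
```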

 

 

Edited by afc_rich
typo

@afc_rich What BIOS is your motherboard on? I wonder if something changed in the AGESA part of the AM4 BIOS. I am on the latest BIOS for my board, which is 4408.

 

Just to make sure it is the same board: it is an "Asus ROG Strix X570-E Gaming" with BIOS 4408, dated 10/27/22, but I also tried a few versions back.

 

Your comment about IOMMU is interesting. I do use a Windows 10 VM that has a few devices passed through to it.

 

@ryanm91

In my research trying to get this working earlier, I saw a lot of references to VMware Workstation having problems when Hyper-V was turned on; that may be related to what you are experiencing. From what I found, you will need Hyper-V turned off if you are nesting VMware.

 

Edited by mavrrick

Yeah, I just upgraded from Asus BIOS 4403 to 4408 a few days ago hoping it might help. I was running an earlier version as well.

 

I just checked by running virt-host-validate on my older test system, and it gets a similar message. It is different since that system is Intel, but this may indicate that the IOMMU/ACPI warning is not the cause.

 

  • 2 weeks later...

The same issue here. I have one VM running VMware that I use maybe once every three months, and after the 6.11.x update it hit the same error.
Nothing has changed in the VM or its config; it is stopped all the time and only powered on when needed. I tried some custom XML config with no success. It's important to note that I did a BIOS update and an Unraid update; before that, everything worked fine.

  • 5 weeks later...

So I also had this problem; for the time being I've reverted all the way back to Unraid 6.9.2, which exhibits none of these issues.

 

I have both ESXi running as a Guest on UnRAID and a Windows 10 VM that runs VMware Workstation (where my vCenter is installed). 

 

I went through the trouble of spinning up a Windows 11 VM and testing compatibility in there as well.

 

The primary behavior I noticed is that the error message when running ESXi 7 and VMware Workstation 16.5 is along the lines of "vcpu0: invalid VMCB".

I tested VMware Workstation 17 as well and got "AMD-V is supported by the platform, but is implemented in a way that is incompatible."

 

After some searching, it turns out that pre-2011 AMD implementations of AMD-V botched the VMCB flags and didn't include the proper virtualization parameters. My best guess at the moment is that the QEMU version in Unraid 6.11.x is, for some reason, presenting an extremely outdated version of AMD-V to the guest, where it was doing it properly before. No amount of XML flags seems to fix the issue.

 

Can anyone chime in on QEMU regression changes?
