QEMU vs. VMware Performance


dqmhug

I am new to Linux and VMs, so I'm trying to figure out why there is such a large difference between the performance of my QEMU guest and my VMware guest.

I have a NUC with an i7-10510U and its 620 iGPU.

I want to use QEMU instead of VMware, since VMware is corporate-owned, closed-source software, but I'm seeing big performance differences in some specific benchmarks.

My Linux host runs kernel 5.10.

I tried different configurations to improve the performance of the QEMU guest, but with only limited success. I still haven't figured out how to set up GVT-g/GVT-d passthrough, and the results so far look bad for QEMU.

 

QEMU results:

https://ibb.co/album/sj08Kn

VMware results:

https://ibb.co/album/0mvX6M

 

VMware wipes the floor with QEMU on almost everything except the LatencyMon measurements.

LatencyMon values were also high for QEMU until I added this under the <features> section of the domain XML:

<ioapic driver='kvm'/>

Some of these GRUB parameters also helped, though not by much:

GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepagesz=1G hugepages=5 intel_iommu=on iommu=pt i915.enable_gvt=1 kvm.ignore_msrs=1 kvm.report_ignored_msrs=0 intel_iommu=igfx_off i915.enable_fbc=0 i915.enable_guc=0 vfio_iommu_type1.allow_unsafe_interrupts=Y"
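
The module-scoped options in this line (kvm.ignore_msrs, the i915.* options, and allow_unsafe_interrupts, which belongs to the vfio_iommu_type1 module) can be checked after a reboot by reading /sys/module/<module>/parameters/<param>. Below is a minimal sketch of such a check. The helper name show_module_params and the SYSFS_MODULE_ROOT override are hypothetical and exist only for this example. A parameter shows as not available if its module is not loaded.

```shell
#!/bin/sh
# Print the live value of module parameters given as "module.param" names.
# show_module_params is a hypothetical helper, not part of any distribution tool.
# SYSFS_MODULE_ROOT is an assumed override so the function can be tried on a fake tree.
show_module_params() {
    root="${SYSFS_MODULE_ROOT:-/sys/module}"
    for name in "$@"; do
        module=${name%%.*}   # text before the first dot, e.g. "kvm"
        param=${name#*.}     # text after the first dot, e.g. "ignore_msrs"
        f="$root/$module/parameters/$param"
        if [ -r "$f" ]; then
            printf '%s=%s\n' "$name" "$(cat "$f")"
        else
            printf '%s=<not available>\n' "$name"
        fi
    done
    return 0
}
```

Example call: show_module_params kvm.ignore_msrs i915.enable_gvt vfio_iommu_type1.allow_unsafe_interrupts. The full line the kernel actually booted with is in /proc/cmdline.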

 

I am using 1 GiB hugepages with these settings:

  <memoryBacking>
    <hugepages/>
    <nosharepages/>
    <discard/>
  </memoryBacking>
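
To see whether the guest really runs on the reserved pages, the hugepage counters in /proc/meminfo can be compared before and after starting the VM. Below is a minimal sketch. The helper name hugepage_summary is hypothetical, and it takes the file path as an argument so it can also be run on saved output. It assumes the default hugepage size is the one the guest uses. /proc/meminfo only reports the default-size pool, so a Hugepagesize of 2048 kB here would mean the 1 GiB pool has to be inspected under /sys/kernel/mm/hugepages/hugepages-1048576kB/ instead.

```shell
#!/bin/sh
# Summarize hugepage counters from a meminfo-style file (default: /proc/meminfo).
# hugepage_summary is a hypothetical helper name used only for this example.
hugepage_summary() {
    awk '
        /^HugePages_Total:/ { total = $2 }     # pages reserved in the default pool
        /^HugePages_Free:/  { free = $2 }      # pages not yet handed out
        /^Hugepagesize:/    { size_kb = $2 }   # default hugepage size in kB
        END {
            printf "size_kB=%d total=%d free=%d in_use=%d\n",
                   size_kb, total, free, total - free
        }
    ' "${1:-/proc/meminfo}"
}
```

Running hugepage_summary with the guest stopped and again with it running should show in_use rising by roughly the guest's memory size divided by the page size if the backing works as intended.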

 

I haven't seen a noteworthy performance gain from hugepages.

I also tried with and without CPU pinning, leaving the first physical core (logical CPUs 0 and 4) out of the VM and using it for emulatorpin and iothreadpin. With pinning, it felt and performed a little better.

<vcpu placement='static'>6</vcpu>
<iothreads>1</iothreads>
<cputune>
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='2'/>
  <vcpupin vcpu='2' cpuset='3'/>
  <vcpupin vcpu='3' cpuset='5'/>
  <vcpupin vcpu='4' cpuset='6'/>
  <vcpupin vcpu='5' cpuset='7'/>
  <emulatorpin cpuset='0,4'/>
  <iothreadpin iothread='1' cpuset='0,4'/>
</cputune>
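
The pinning above assumes that logical CPUs 0 and 4 are the two hyperthreads of the same physical core. That pairing can be confirmed on the host from sysfs. Below is a minimal sketch. The helper name list_thread_siblings is hypothetical, and the optional argument exists only so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Print every logical CPU together with the hyperthreads that share its core,
# one per line, for example "cpu0: 0,4".
# list_thread_siblings is a hypothetical helper name used only for this example.
list_thread_siblings() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*; do
        f="$dir/topology/thread_siblings_list"
        if [ -r "$f" ]; then
            printf '%s: %s\n' "${dir##*/}" "$(cat "$f")"
        fi
    done
    return 0
}
```

If the output pairs the CPUs differently (for example 0 with 1), the cpuset values in the vcpupin and emulatorpin entries would need to follow that pairing instead.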

 

Even though I'm using CPU passthrough, I am surprised by how large a difference AIDA64 shows between QEMU and VMware in the memory and cache (L1, L2, L3) results. Please take a good look at the AIDA64 results.

 

Any ideas why?

 

Note: I don't need a very powerful VM; this is a NUC after all. I just need it to be snappy.

 

Edit: Using "copy cpu" (the default setting) in virt-manager scores better in the AIDA64 test than choosing "host-passthrough". However, this might just depend on how AIDA64 adjusts the test rather than reflect a real performance increase.

Edited by dqmhug