kakashisensei Posted July 30, 2020

I am trying to get more CPU performance out of my Win10 VM. I have noticed that in CPU-intensive games, performance is quite lackluster, albeit the system is quite old. The CPU is a Sandy Bridge mobile i7, 4 cores/8 threads, 3.2 GHz all-core turbo, 3.5 GHz single-core. I have passed 12 GB of RAM (dual-channel DDR3 1600 MHz). The GPU is a 980 Ti 6 GB with the NVIDIA 446 driver.

On bare metal with Spectre mitigations off, I get a CPU-Z single-thread benchmark score of ~330. The PassMark v9 CPU Mark total score is ~7600. This is the GTA 5 benchmark result at 1600x900, low settings:

Frames Per Second (Higher is better) Min, Max, Avg
Pass 0, 62.167404, 119.561455, 102.000427
Pass 1, 94.056564, 165.343918, 139.509125
Pass 2, 77.531998, 155.506470, 125.293236
Pass 3, 89.976601, 162.171799, 136.439087
Pass 4, 48.572926, 200.867737, 125.084503

Time in milliseconds(ms). (Lower is better). Min, Max, Avg
Pass 0, 8.363899, 16.085600, 9.803881
Pass 1, 6.048000, 10.631900, 7.167990
Pass 2, 6.430601, 12.897901, 7.981277
Pass 3, 6.166300, 11.114000, 7.329278
Pass 4, 4.978400, 20.587601, 7.994596

On the VM I am using Q35-v4.2 with OVMF, CPU host/cache passthrough, hyper-v = yes, and Spectre mitigations off on both host and VM. I get ~260-270 CPU-Z single thread. Interestingly, the PassMark CPU Mark score only drops to ~7300. These are the GTA 5 benchmark results at the same settings:

Frames Per Second (Higher is better) Min, Max, Avg
Pass 0, 16.969919, 86.502945, 72.574181
Pass 1, 46.607010, 125.070351, 101.227242
Pass 2, 47.106037, 136.561646, 94.093735
Pass 3, 64.095169, 130.890060, 99.548531
Pass 4, 35.704082, 161.464798, 88.457596

Time in milliseconds(ms). (Lower is better). Min, Max, Avg
Pass 0, 11.560300, 58.927799, 13.779005
Pass 1, 7.995500, 21.455999, 9.878764
Pass 2, 7.322701, 21.228701, 10.627700
Pass 3, 7.639999, 15.601800, 10.045352
Pass 4, 6.193300, 28.008001, 11.304852

On the VM, I can only allocate 3 cores and their HT pairs.
I have noticed that passing all cores to the VM gives quite bad performance. I keep core 0 for the host and its HT thread for the VM emulator. This is the CPU assignment that I found gives the best performance:

  <vcpu placement='static'>6</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='6'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='7'/>
    <emulatorpin cpuset='4'/>
  </cputune>

Since it is a headless server, I use NVIDIA GameStream for remote access. This further kills performance. I see the CPU-Z single-thread score drop to ~230-240 while streaming the desktop. The GTA 5 results above were without any streaming. Since online mode is very unoptimized in this game, it can cost another 20-50% in performance. I see drops to 30 fps in game quite often.

I don't expect the performance loss to be entirely attributable to having one fewer core, especially given the huge drop in CPU-Z single-thread results. I have tried all of the following, but nothing significantly bridges the gap between bare metal and VM CPU performance.

1. Changed to CPU model "Sandybridge" instead of CPU host passthrough. Resulted in significantly lower performance.
2. Passed through a 2nd NIC instead of using the virtual NIC. Resulted in slightly more performance.
3. Checked CPU turbo speeds on the host. It does hit 3.2 GHz all-core in game on the VM.
4. Isolated the CPU cores used by the VM. No noticeable improvement.
5. Changed CPU pinning and emulator pinning, but the config above gives the best performance.
6. Updated KVM and virtio drivers.
7. Changed to i440fx. Resulted in slightly less performance.

I am out of ideas. Does anyone know what else I could try, or have experience with this? Should this be the expected performance drop from bare metal to VM for a Sandy Bridge era CPU?
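For readers following along: the "cpu host/cache passthrough" setting mentioned in this post usually corresponds to a libvirt <cpu> element along the lines of the sketch below. The post does not show this part of the XML, so the topology values are an assumption (3 cores x 2 threads, chosen to match the six pinned vCPUs above), and check='none' is likewise assumed.

```xml
<!-- Sketch only: topology and check values are assumed, not taken from the post. -->
<cpu mode='host-passthrough' check='none'>
  <!-- 3 cores x 2 threads = 6 vCPUs; guest vCPU pairs (0,1) (2,3) (4,5) then line up
       with the pinned host pairs (1,5) (2,6) (3,7) as HT siblings -->
  <topology sockets='1' cores='3' threads='2'/>
  <!-- expose the host cache layout to the guest -->
  <cache mode='passthrough'/>
</cpu>
```

If the guest topology does not match the pinning (for example 6 cores x 1 thread), the guest scheduler may treat HT siblings as independent cores, which could be one thing worth checking.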
kakashisensei Posted August 2, 2020

I've found some configurations that bridge about half of the gap to bare metal. The CPU core assignments that give the best performance are somewhat perplexing.

Passing these hyper-v features improved single-thread performance noticeably. I'm not sure why. I found a blog that mentioned this hyper-v XML config gave its author the best results. Reading the description of each feature, it is not obvious to me why this gets better performance. The CPU-Z single-thread score went up by ~20-30. More importantly, the performance loss with streaming is not as bad with these features on. Before, I'd see 30-40 less CPU-Z single thread while streaming the desktop. Now it is only ~15 less. GTA 5 benchmark results improved a bit as well with these features on.

  <vpindex state='on'/>
  <synic state='on'/>
  <stimer state='on'/>
  <frequencies state='on'/>

I've found that passing only the primary core threads 0,1,2,3 to the VM gives the best overall performance and the best single-thread performance. I also have the emulator pinned to HT 4 and the iothread to HT 5. I haven't noticed any performance improvement from iothread pinning. I get ~320 CPU-Z single thread, pretty close to the 330 on bare metal. I get the best GTA 5 benchmark results with this config. They are about halfway between my bare metal and VM results from earlier. I figure the remaining difference is that bare metal has 2 threads per core, while this VM config has only 1 thread per core.

Frames Per Second (Higher is better) Min, Max, Avg
Pass 0, 19.245239, 106.438461, 89.463310
Pass 1, 80.630203, 154.502197, 128.269775
Pass 2, 61.948658, 142.904099, 109.883057
Pass 3, 5.634600, 159.941147, 119.195976
Pass 4, 35.715050, 166.900330, 103.768700

Time in milliseconds(ms). (Lower is better). Min, Max, Avg
Pass 0, 9.395100, 51.960903, 11.177767
Pass 1, 6.472400, 12.402301, 7.796069
Pass 2, 6.997700, 16.142399, 9.100584
Pass 3, 6.252300, 177.474899, 8.389544
Pass 4, 5.991600, 27.999401, 9.636817

I was under the assumption that passing each primary core thread together with its HT pair is optimal, but I am not seeing that. Originally I passed core threads 1,2,3 and their HT pairs 5,6,7, with the emulator pinned to HT 4. That gave much lower single- and multi-thread performance. It seems that all four cores are critical to getting the best performance. Even though that config has 6 threads compared to 4, it yields worse performance because it is only 3 cores.

I have also passed all CPU threads 0,1,2,3,4,5,6,7 to the VM, and that gives me a CPU-Z single-thread result of ~300 and the best multi-thread result of ~1500. But the performance in GTA is not as good as passing only the primary core threads 0-3. If I define an emulator pin with this config, I get absolutely atrocious performance, so I didn't define an emulator or iothread pin. I don't know why this is. This config should give me the closest performance to bare metal, but it doesn't.

So TLDR, I get the closest to bare metal performance in GTA 5 and the best CPU-Z single thread in the VM with the following:

- add the hyper-v features mentioned above to the XML
- pass only the primary core threads 0,1,2,3 and none of the HT pairs to the VM, with HT 4 as the emulator pin
- turn off Spectre/Meltdown mitigations in both host and VM (if bare metal also had them off)
- pass through a physical NIC instead of using a virtual NIC; it performs better and takes some load off the CPUs

Edited August 2, 2020 by kakashisensei
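Put together, the best-performing setup described in this post might look roughly like the libvirt fragments below. This is a sketch under assumptions, not the poster's actual file: the post gives the four hyper-v features and the pin targets in words, while the single iothread with id 1 and the exact element layout are filled in for illustration.

```xml
<!-- Sketch only: fragments of a libvirt domain definition, not a complete file. -->
<vcpu placement='static'>4</vcpu>
<!-- assumed: one iothread, id 1; the post only says "iothread on HT 5" -->
<iothreads>1</iothreads>
<cputune>
  <!-- one vCPU per physical core, no HT siblings passed to the guest -->
  <vcpupin vcpu='0' cpuset='0'/>
  <vcpupin vcpu='1' cpuset='1'/>
  <vcpupin vcpu='2' cpuset='2'/>
  <vcpupin vcpu='3' cpuset='3'/>
  <!-- emulator threads on the HT sibling of core 0 -->
  <emulatorpin cpuset='4'/>
  <iothreadpin iothread='1' cpuset='5'/>
</cputune>
<features>
  <hyperv>
    <!-- the four features named in the post; existing hyperv entries stay as they are -->
    <vpindex state='on'/>
    <synic state='on'/>
    <stimer state='on'/>
    <frequencies state='on'/>
  </hyperv>
</features>
```

The hyper-v entries belong inside the existing <features><hyperv> block of the domain rather than at the top level. An iothread generally only does work when a virtio disk or controller is assigned to it, which is possibly why pinning it made no measurable difference here.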