VM: Very low performance compared to bare metal



3 minutes ago, SweetPeachez said:

I tried all of your suggestions above. Also, I'm on the 3960X.

My CPU-Z bench in the VM:

Single thread: 508.8
Multi thread: 16174.8

My CPU-Z bench on bare metal:

Single thread: 518.8
Multi thread: 16823.0

 

So this leads me to believe that there may be something wrong with how 3DMark benchmarks run in VMs?

Also, are you saying that the CPU pinning as displayed in the Unraid pinning menu can be incorrect? Attaching a screenshot of my pinning.

 

This is core pinning for the CPU on that particular VM. You need to go to Settings > CPU Pinning and, at the bottom, use "Core isolation" to isolate the cores from the rest of the system so nothing else can use them.

5 minutes ago, PeteUnraid said:

With that latest xml you just posted - is that giving you the 10089 score?

The XML I posted just before this was my config with the changes that Jerky suggested. In this post I'll include the XML that got me the highest score so far, which was my VM minus Jerky's suggestions.

 

Here is the XML for the "best" config; it gives me the highest score in 3DMark Time Spy:

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='2'>
  <name>Windows 10_A</name>
  <uuid>af734937-bee3-267c-5a93-9fa189e66e7d</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>126877696</memory>
  <currentMemory unit='KiB'>126877696</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>46</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='25'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='26'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='27'/>
    <vcpupin vcpu='6' cpuset='4'/>
    <vcpupin vcpu='7' cpuset='28'/>
    <vcpupin vcpu='8' cpuset='5'/>
    <vcpupin vcpu='9' cpuset='29'/>
    <vcpupin vcpu='10' cpuset='6'/>
    <vcpupin vcpu='11' cpuset='30'/>
    <vcpupin vcpu='12' cpuset='7'/>
    <vcpupin vcpu='13' cpuset='31'/>
    <vcpupin vcpu='14' cpuset='8'/>
    <vcpupin vcpu='15' cpuset='32'/>
    <vcpupin vcpu='16' cpuset='9'/>
    <vcpupin vcpu='17' cpuset='33'/>
    <vcpupin vcpu='18' cpuset='10'/>
    <vcpupin vcpu='19' cpuset='34'/>
    <vcpupin vcpu='20' cpuset='11'/>
    <vcpupin vcpu='21' cpuset='35'/>
    <vcpupin vcpu='22' cpuset='12'/>
    <vcpupin vcpu='23' cpuset='36'/>
    <vcpupin vcpu='24' cpuset='13'/>
    <vcpupin vcpu='25' cpuset='37'/>
    <vcpupin vcpu='26' cpuset='14'/>
    <vcpupin vcpu='27' cpuset='38'/>
    <vcpupin vcpu='28' cpuset='15'/>
    <vcpupin vcpu='29' cpuset='39'/>
    <vcpupin vcpu='30' cpuset='16'/>
    <vcpupin vcpu='31' cpuset='40'/>
    <vcpupin vcpu='32' cpuset='17'/>
    <vcpupin vcpu='33' cpuset='41'/>
    <vcpupin vcpu='34' cpuset='18'/>
    <vcpupin vcpu='35' cpuset='42'/>
    <vcpupin vcpu='36' cpuset='19'/>
    <vcpupin vcpu='37' cpuset='43'/>
    <vcpupin vcpu='38' cpuset='20'/>
    <vcpupin vcpu='39' cpuset='44'/>
    <vcpupin vcpu='40' cpuset='21'/>
    <vcpupin vcpu='41' cpuset='45'/>
    <vcpupin vcpu='42' cpuset='22'/>
    <vcpupin vcpu='43' cpuset='46'/>
    <vcpupin vcpu='44' cpuset='23'/>
    <vcpupin vcpu='45' cpuset='47'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='46' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Windows 10/vdisk1.img' index='3'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Windows 10/vdisk2.img' index='2'/>
      <backingStore/>
      <target dev='hdd' bus='virtio'/>
      <alias name='virtio-disk3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.171.iso' index='1'/>
      <backingStore/>
      <target dev='hdb' bus='ide'/>
      <readonly/>
      <alias name='ide0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:26:17:8b'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-2-Windows 10_A/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x21' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <rom file='/mnt/user/vBios/myVBios.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x21' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x04d9'/>
        <product id='0x0245'/>
        <address bus='1' device='4'/>
      </source>
      <alias name='hostdev2'/>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x0db0'/>
        <product id='0x543d'/>
        <address bus='7' device='2'/>
      </source>
      <alias name='hostdev3'/>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x13fe'/>
        <product id='0x5500'/>
        <address bus='2' device='4'/>
      </source>
      <alias name='hostdev4'/>
      <address type='usb' bus='0' port='3'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1462'/>
        <product id='0x7c60'/>
        <address bus='7' device='3'/>
      </source>
      <alias name='hostdev5'/>
      <address type='usb' bus='0' port='4'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1b1c'/>
        <product id='0x1b2a'/>
        <address bus='1' device='6'/>
      </source>
      <alias name='hostdev6'/>
      <address type='usb' bus='0' port='5'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1b1c'/>
        <product id='0x1b2e'/>
        <address bus='1' device='5'/>
      </source>
      <alias name='hostdev7'/>
      <address type='usb' bus='0' port='6'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x264a'/>
        <product id='0x1fa5'/>
        <address bus='9' device='8'/>
      </source>
      <alias name='hostdev8'/>
      <address type='usb' bus='0' port='7'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x264a'/>
        <product id='0x1fa6'/>
        <address bus='9' device='10'/>
      </source>
      <alias name='hostdev9'/>
      <address type='usb' bus='0' port='8'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>

 


Interesting. His suggestions were quite good, so I'm surprised there was no improvement from them.

 

The current best config looks like your original, except you have changed the machine type and are using 46 cores now... Did you try with 44 cores (unpin cores 0, 1, 2, 3) and see how the performance was? I would be interested to know if there is actually a noticeable difference between 46 and 44 cores.

It appears currently you are getting about 82% of the performance from the VM as opposed to bare metal - would you agree?

 

If you can confirm exactly where we are I will go away and try to think of what other improvements could be tried and come back to you :)

 

P

1 hour ago, testdasi said:

One thing I have seen is if you don't load the CCX evenly, you will end up losing performance if your software doesn't scale too well. 3DMark Time Spy doesn't really scale that well beyond about 12 cores or so.

 

From my own testing (albeit not with 3DMark Time Spy, but with a workload that similarly doesn't scale well beyond 12-16 cores or so), 7 cores loaded unevenly perform about the same as 6 loaded evenly (i.e. the extra core's performance is essentially "lost", so to speak).

It's impossible to spread 22 physical cores evenly, plus your VM benchmark performance is approximately 2/3 that of bare metal (your CPU has 8 CCXs with 3 cores each), so it sounds like the uneven load is causing you to "lose" one core's worth of performance per CCX (which is 1/3 of each CCX), which is similar to my testing.

 

You might want to test assigning the odd bank of the 48 logical cores to your VM and see if it helps (e.g. cpu 0 + cpu 1 = 1 physical core, so assign cpu 1 to the VM, and so on; assigning all the odd CPUs to your VM is the odd bank). That way your VM has 24 cores instead of 44.

Wow! I ran 3DMark Time Spy on the odd bank just now and got 11309 in the VM, whereas bare metal ran at 12950.
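For anyone wanting to replicate the odd-bank test, the cputune section would look something like this (a sketch only, assuming the sibling pairing testdasi describes, i.e. host CPUs 0/1, 2/3, ... are thread siblings; verify your own pairing in the Unraid pinning screen first):

```xml
<vcpu placement='static'>24</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='3'/>
  <vcpupin vcpu='2' cpuset='5'/>
  <!-- ...continue through the remaining odd host CPUs... -->
  <vcpupin vcpu='23' cpuset='47'/>
</cputune>
```

With one vCPU per physical core, the topology line should advertise single-threaded cores, e.g. `<topology sockets='1' cores='24' threads='1'/>`.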

15 minutes ago, PeteUnraid said:

Interesting. His suggestions were quite good, so I'm surprised there was no improvement from them.

 

The current best config looks like your original, except you have changed the machine type and are using 46 cores now... Did you try with 44 cores (unpin cores 0, 1, 2, 3) and see how the performance was? I would be interested to know if there is actually a noticeable difference between 46 and 44 cores.

It appears currently you are getting about 82% of the performance from the VM as opposed to bare metal - would you agree?

 

If you can confirm exactly where we are I will go away and try to think of what other improvements could be tried and come back to you :)

 

P

Per my post right before this one: I ran that same XML but with just the odd-bank cores and got a pretty large increase that I think brings me within range of bare metal (even though I am only using half the cores).

 

EDIT: going to fix my threads per core in the XML and rerun

Edited by SweetPeachez
1 minute ago, PeteUnraid said:

That's crazy. How does assigning fewer CPUs and only using the odd cores produce a better result? Someone would have to explain that to me.

Could be that his cores are mismatched. It happened in earlier versions of the 2990WX BIOS, where the cores shown were not the matching physical/hyperthreaded pairs. When you're making an XML file you're basically telling the machine what the host looks like. I remember some people even had to manually assign the cores in the right order. It's just a thought.


Is this relevant to the issue Jerky?

https://bugs.launchpad.net/qemu/+bug/1856335

 

"AMD CPUs have L3 cache per 2, 3 or 4 cores. Currently, TOPOEXT seems to always map Cache ass if it was an 4-Core per CCX CPU, which is incorrect, and costs upwards 30% performance (more realistically 10%) in L3 Cache Layout aware applications."

 

"On a 3-CCX CPU (3960X /w 6 cores and no SMT):"

 

Also found this:

 

38 minutes ago, Jerky_san said:

Could be that his cores are mismatched. It happened in earlier versions of the 2990WX BIOS, where the cores shown were not the matching physical/hyperthreaded pairs. When you're making an XML file you're basically telling the machine what the host looks like. I remember some people even had to manually assign the cores in the right order. It's just a thought.

 

18 minutes ago, PeteUnraid said:

Is this relevant to the issue Jerky?

https://bugs.launchpad.net/qemu/+bug/1856335

 

"AMD CPUs have L3 cache per 2, 3 or 4 cores. Currently, TOPOEXT seems to always map Cache ass if it was an 4-Core per CCX CPU, which is incorrect, and costs upwards 30% performance (more realistically 10%) in L3 Cache Layout aware applications."

 

"On a 3-CCX CPU (3960X /w 6 cores and no SMT):"

 

Also found this:

 

 

It does appear that because you have the newer arch, it isn't doing the caching properly. On the 2990WX I have (and my old 1700) this was basically a requirement. What you could "technically" do is try the old way we fixed it, before they fixed it in QEMU. Adjust the cores below to match whatever you're doing; it will be half of whatever you have assigned. This will present the CPU as an "EPYC" processor. See if CPU-Z sees your cache the same as on bare metal with this. If not, you may have to wait until they resolve the issue.

 

<cpu mode='custom' match='exact' check='full'>
	<model fallback='forbid'>EPYC</model>
	<topology sockets='1' cores='22' threads='2'/>
	<feature policy='require' name='topoext'/>
	<feature policy='disable' name='monitor'/>
	<feature policy='require' name='hypervisor'/>
	<feature policy='disable' name='svm'/>
	<feature policy='disable' name='x2apic'/> 
</cpu>

Cache matching with those settings: if your cache doesn't match, you'll get hitching and stutter due to cache misses. It is ESPECIALLY important for L1 and L2, but also important for L3 given how much cache the 3960X has.
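As an alternative to the EPYC masquerade above: more recent libvirt/QEMU versions (newer than what some Unraid releases ship, so availability is an assumption here) can pass the host cache layout through directly instead of emulating it. A sketch for a 24-vCPU guest:

```xml
<cpu mode='host-passthrough' check='none'>
  <topology sockets='1' cores='12' threads='2'/>
  <!-- expose the host's real L1/L2/L3 layout to the guest -->
  <cache mode='passthrough'/>
  <!-- topoext lets the guest see AMD SMT/CCX topology -->
  <feature policy='require' name='topoext'/>
</cpu>
```

Whether this maps the 3960X's 3-core CCXs correctly still depends on the QEMU fix discussed in the launchpad bug.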

2990wx Baremetal CPUZ


 

2990wx VM CPUZ


 

Edited by Jerky_san

Ok, I am wondering, as a workaround (I know this isn't normally recommended): he can test by pinning every single core (i.e. pin all 48 CPUs) to the VM and not leave any free for Unraid at all (and remove any isolation). It might make Unraid a bit unhappy under heavy load, but at the same time at least give him good scores until the bug is patched, and most of the time I would imagine (even in gaming) he would not be maxing every single core all the time.
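For that test, the pinning section would be the trivial 1:1 mapping (a sketch, abbreviated; again assuming adjacent thread siblings so each vCPU pair lands on one physical core):

```xml
<vcpu placement='static'>48</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='0'/>
  <vcpupin vcpu='1' cpuset='1'/>
  <vcpupin vcpu='2' cpuset='2'/>
  <!-- ...one vcpupin per host CPU, up to... -->
  <vcpupin vcpu='47' cpuset='47'/>
</cputune>
```

paired with `<topology sockets='1' cores='24' threads='2'/>` so the guest sees 24 hyperthreaded cores.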

Edited by PeteUnraid

It's "she/her" by the way :) ....anyways yeah I'll check out Jerkys latest suggestion in a moment...

Most of the time I won't need all 48 CPUs, but I do a lot of very heavy data processing (a lot of it on the GPU), and at times I still find I need as many CPU cores as I can get.

For typical daily use the machine will be serving 2-3 gaming VMs, though, so it's still in my best interest to get everything working correctly. Anyway, I'll get to trying stuff after I grab some lunch.

55 minutes ago, SweetPeachez said:

It's "she/her" by the way :) ....anyways yeah I'll check out Jerkys latest suggestion in a moment...

Most of the time I won't need all 48 CPUs, but I do a lot of very heavy data processing (a lot of it on the GPU), and at times I still find I need as many CPU cores as I can get.

For typical daily use the machine will be serving 2-3 gaming VMs, though, so it's still in my best interest to get everything working correctly. Anyway, I'll get to trying stuff after I grab some lunch.

My bad, apologies. I should have noticed. Try with all 48 cores pinned as well.

Edited by PeteUnraid
20 minutes ago, PeteUnraid said:

My bad, apologies. I should have noticed. Try with all 48 cores pinned as well.

No biggie! Pinning all cores and turning isolation off on all cores got me 10646 / 12950.

 

1 hour ago, Jerky_san said:

It does appear that because you have the newer arch, it isn't doing the caching properly. On the 2990WX I have (and my old 1700) this was basically a requirement. What you could "technically" do is try the old way we fixed it, before they fixed it in QEMU. Adjust the cores below to match whatever you're doing; it will be half of whatever you have assigned. This will present the CPU as an "EPYC" processor. See if CPU-Z sees your cache the same as on bare metal with this. If not, you may have to wait until they resolve the issue.

 


<cpu mode='custom' match='exact' check='full'>
	<model fallback='forbid'>EPYC</model>
	<topology sockets='1' cores='22' threads='2'/>
	<feature policy='require' name='topoext'/>
	<feature policy='disable' name='monitor'/>
	<feature policy='require' name='hypervisor'/>
	<feature policy='disable' name='svm'/>
	<feature policy='disable' name='x2apic'/> 
</cpu>

Cache matching with those settings: if your cache doesn't match, you'll get hitching and stutter due to cache misses. It is ESPECIALLY important for L1 and L2, but also important for L3 given how much cache the 3960X has.

2990wx Baremetal CPUZ


 

2990wx VM CPUZ


 

My cache layout doesn't match bare metal, and the settings you provided put Windows into an unusable state.

 

So it seems like I will get the best performance by just banking: even cores on one VM and odd cores on the other? At least until another update comes out?

14 minutes ago, SweetPeachez said:

No biggie! Pinning all cores and turning isolation off on all cores got me 10646 / 12950.

 

My cache layout doesn't match bare metal, and the settings you provided put Windows into an unusable state.

 

So it seems like I will get the best performance by just banking: even cores on one VM and odd cores on the other? At least until another update comes out?

Hmm, that is very interesting. You're the first person here with a 3960X, I believe, at least that I've seen talk about it. As to your question, I would say yes, but I have a feeling you're going to get slapped with performance hitching. My test game for this is "Dying Light", as it has a very good multicore implementation. The good news is that you won't have to wait long: with AMD basically holding the performance crown in almost all segments, I figure they will really start ramping up fixes/patches in QEMU/VirtIO. I'll say, though, that it sadly took nearly a year for the 2990WX to start "running well" compared to bare metal. They do seem to be rolling fixes to GA faster via QEMU/VirtIO, so hopefully they will be picked up by Limetech just as quickly (generally they are, as Limetech always seems to deliver 😃).

 

Anyway, if you have Dying Light, try playing it and see if you notice any hitching or frame drops. Don't play with G-Sync on, by the way; at least the last time that was discussed, it caused a lot of fun issues itself. You might also want to check what your DPC latency is, along with what kind of memory performance you're getting inside your VM with AIDA64. If the latency on your L1-L3 is very high, you may be in for a ride. I get very close to bare metal on all the performance metrics I test, but it took a long time of tuning. You can also check out Reddit's virtio subreddit (I guess you have been, but sometimes the Threadripper convos get buried, sadly). They don't have a lot of Threadripper convos, but there have been some from time to time that are very helpful.

Baremetal 2990wx AIDA64


VM 2990wx AIDA64


Edited by Jerky_san
9 minutes ago, SweetPeachez said:

No biggie! Pinning all cores and turning isolation off on all cores got me 10646 / 12950.

 

My cache layout doesn't match bare metal, and the settings you provided put Windows into an unusable state.

 

So it seems like I will get the best performance by just banking: even cores on one VM and odd cores on the other? At least until another update comes out?

Oof I guess so, hopefully an update and fix is not far off :(

15 hours ago, PeteUnraid said:

That's crazy. How does assigning fewer CPUs and only using the odd cores produce a better result? Someone would have to explain that to me.

It's not crazy. You are assuming perfect scaling (i.e. more cores = better performance), but in real life it's rarely ever like that.

44 (logical) cores sounds like a lot, but they are hyperthreaded. An extra hyperthreaded core adds only about 30% performance (at best!), so 44 logical cores are roughly equivalent to 33 non-HT cores at best and 22 non-HT cores at worst.

So now you only really have to "explain" the difference between my 24 non-hyperthreaded "odd-bank" config and a 33 non-hyperthreaded (at best) config.

 

Firstly, there are diminishing returns as you add cores, and it can get to the point at which more cores make things slower, e.g. the scheduler can't keep up with the number of tasks being run.

And then there's the idiosyncratic design of the AMD Ryzen family of CPUs (with CCX, CCD and how L3 cache is shared), e.g. how I have found that uneven load on the CCXs causes performance to drop.

And then there's the QEMU bug.

 

That's why Steve and Linus and others still run benchmarks. Otherwise, you would just say "hey, more cores, more performance, done deal, we are all fired".

 

16 hours ago, SweetPeachez said:

Wow! I ran 3DMark Time Spy on the odd bank just now and got 11309 in the VM, whereas bare metal ran at 12950.

I would suggest you find a benchmark that matches your intended workload or even better, bench using your actual workload.

Synthetic stuff sometimes doesn't quite match real life.


@SweetPeachez Interesting and good info from testdasi. Maybe as a last test, if you have time, you could set:

 

 

<vcpu placement='static'>24</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='1-47'/>
  <vcpupin vcpu='1' cpuset='1-47'/>
  ...
  <vcpupin vcpu='23' cpuset='1-47'/>
</cputune>

and

<topology sockets='1' cores='24' threads='1'/>

 

I would be interested to know how that performs with all the other tweaks you have done, if you have the time. You would have to ensure you didn't run any other VMs while doing the benchmark, though. The above does exclude core 0 (for Unraid to use).

 

P

On 1/21/2020 at 9:30 AM, PeteUnraid said:

@SweetPeachez Interesting and good info from testdasi. Maybe as a last test, if you have time, you could set:

 

 

<vcpu placement='static'>24</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='1-47'/>
  <vcpupin vcpu='1' cpuset='1-47'/>
  ...
  <vcpupin vcpu='23' cpuset='1-47'/>
</cputune>

and

<topology sockets='1' cores='24' threads='1'/>

 

I would be interested to know how that performs with all the other tweaks you have done, if you have the time. You would have to ensure you didn't run any other VMs while doing the benchmark, though. The above does exclude core 0 (for Unraid to use).

 

P

I'll be able to test this at some point this weekend... if y'all want a guinea pig to test stuff on the 3960X, please let me know; I'm more than willing to try whatever, or test code and such.

  • 3 months later...
On 1/23/2020 at 6:26 AM, SweetPeachez said:

I'll be able to test this at some point this weekend... if y'all want a guinea pig to test stuff on the 3960X, please let me know; I'm more than willing to try whatever, or test code and such.

 

Any further testing, @SweetPeachez? I also have a 3960X and would like to see what other changes, if any, you've implemented, and how things have been performing with your rig the past few months.

 

Thanks!

 

1 hour ago, Skrumpy said:

 

Any further testing, @SweetPeachez? I also have a 3960X and would like to see what other changes, if any, you've implemented, and how things have been performing with your rig the past few months.

 

Thanks!

 

Hello. Unfortunately I ended up giving up due to not getting close to bare-metal performance; this was mainly due to the cache not being mapped correctly for the 3960X under the Linux kernel that was being used at the time. Funnily enough, I'm checking this week to see if this is still an issue and will post if there's any progress.

37 minutes ago, SweetPeachez said:

Hello. Unfortunately I ended up giving up due to not getting close to bare-metal performance; this was mainly due to the cache not being mapped correctly for the 3960X under the Linux kernel that was being used at the time. Funnily enough, I'm checking this week to see if this is still an issue and will post if there's any progress.

 

I assume the performance you were looking at was just with 3DMark and not real-world? Do you know if the caching issue was reported as a bug? Lastly, I didn't see that you pinned the emulator off the VM (I know you were passing the majority of your cores), but FWIW my network latency dropped by at least 50% with it on.

  <emulatorpin cpuset='0,24'/>
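For context, `<emulatorpin>` goes inside the same `<cputune>` block as the vcpupin entries; a sketch (mirroring the pinning from the XML earlier in the thread, with the 0,24 cpuset being the host CPUs left out of the guest):

```xml
<cputune>
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='25'/>
  <!-- ...remaining vcpupin lines... -->
  <!-- keep QEMU's emulator threads off the guest's cores -->
  <emulatorpin cpuset='0,24'/>
</cputune>
```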

 

Here are my CPU-Z results, BM | VM (6 cores), overclocked (as you can see) to 4125 MHz.

 

[CPU-Z screenshots: BM and VM]

 

VM 3DMark for me is ~75% of BM, so it will still need some work.

 

I'm not sure what Unraid version improvements have come out since you last checked, but it sounds like 6.9 may get the new 5.6 kernel, which apparently has a massive number of changes.

 

I'm going to see if I can coax SI1 into helping me out directly, but let me know if you want to tag-team some of these performance tweaks to see if we can get something a bit more optimal.

 

  • 2 months later...

Just to throw it out there: before my server died to lightning, I had switched to VMware. I was running Unraid as a VM with the HBA passed through, and ran my primary desktop as a VM on the same ESXi host. The performance difference was insane. @SweetPeachez @Skrumpy, you might want to give it a stab. It was an amazing experience, the best of both worlds in that regard: Unraid with all its storage abilities/Dockers, VMware for my gaming machine and all other VM stuff. We are talking about a 2990WX, by the way, which was gimped pretty hard to begin with. No gaming lag, no needing to disable G-Sync or anything; it ran smooth as butter. The only problem I had was that it wouldn't tell me CPU/MB temps, because VMware hates non-IPMI systems.

