Gaming VM on Threadripper


Dom Stone

Hey guys, I've been messing around with VMs a lot this past week for gaming, and I've been running into performance issues.

 

The setup:

unRAID 6.6.6, Threadripper 2950X, ASUS Zenith Extreme, 64 GB Vengeance Pro

1 VM with 8 threads, with an ASUS Strix 1070 passed through and a 1 TB Samsung NVMe passed through

1 VM with 8 threads, with an ASUS Strix 1070 passed through, using the cache (2x 1 TB Samsung SSDs in RAID 0)

The only Docker container running is Krusader

 

Running Far Cry 5 benchmark on bare metal, I'm seeing around 72ish FPS without CPU or GPU OC (only using single card)

 

When I boot into unRAID 6.6.6, however, performance drops to around 50 FPS average. I've tried various CPU pinning layouts, including dedicating the pinned CPUs to the VMs, and I still see performance issues. It's not just in games: the entire VM interface feels a bit laggy, boot times are long, etc.

 

I've done a bunch of googling and searching on this forum. I found a few suggestions and tried them, but nothing works. I am running the latest BIOS for my motherboard. I've added the zenstates line that 'Fix Common Problems' was telling me to add.

 

- As of today, I was running the latest stable release of 6.6.6. 

- I then tried the 'next' release of 6.6.6, still the same issue

- I then tried downgrading to 6.5.3, and boom, performance jumped up to around 66 to 68 FPS. A lot closer to bare metal. The VM was snappier and the boot time a lot quicker.

- I figured OK, maybe it's something with the latest release, so I grabbed 6.6.3 and tested again. Boom, performance went right back down and the VM was just a bit slow.

 

So does anyone know if I'm missing something or has anyone had this issue, or maybe I'm just insane?

 

Thanks

 

 

Link to comment

  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='allow'>EPYC-IBPB</model>
    <topology sockets='1' cores='8' threads='2'/>
    <feature policy='require' name='topoext'/>
  </cpu>

Try this, and make sure you're pinned to the correct NUMA node for the CPU cores you're using. Remember that every time you use the GUI to update the VM, you will have to put this back in. Adjust the cores and threads accordingly: if you have 8 cores and 8 SMT threads, it's right the way I pasted it; if it's 4 cores and 4 SMT threads, change cores to 4. Anyway, tell me how it goes with this. Also, I don't use the zenstates stuff from Fix Common Problems.

 

In case you don't know what the above is: when editing the VM, click "Form View" in the top right corner and the page will switch to the XML. Go down to where it says cpu mode='host-passthrough' and replace the whole CPU part of the XML with the above. (I hope you know how XML works) <- not meant to offend. I simply don't know your knowledge level.
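For reference, the block being replaced usually looks something like the sketch below. This is an assumption about what the Unraid VM template generates on this kind of setup, not something copied from the VM in question; the cores value will follow however many CPUs were ticked in the GUI.

```xml
<!-- Assumed default generated by the VM template (illustrative only). -->
<!-- Note threads='1': every selected CPU is presented to the guest as its own core. -->
<cpu mode='host-passthrough' check='none'>
  <topology sockets='1' cores='8' threads='1'/>
</cpu>
```

Everything from the opening cpu tag through the closing cpu tag gets swapped for the custom EPYC-IBPB block above.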

 

Also

 

 <emulatorpin cpuset='1-7'/>

This goes above </cputune>; replace the numbers with the correct core numbers. You don't want the emulator leaving the cores that have access to the RAM you're using.

Right below </cputune>, add this:

  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>

The "nodeset" is the NUMA node your CPU cores are coming from. 'strict' means you'd best have enough RAM free to allocate to the VM, because if there isn't, QEMU/Unraid will start killing processes to find said RAM. The reason you need this is that you need to tell the machine to use ONLY the RAM that those cores have a memory controller for. If you don't have much RAM, set it to 'interleave', but get ready for a performance hit.
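A minimal sketch of the 'interleave' variant, for comparison. The nodeset value '0-1' assumes a two-node layout with both nodes having memory attached; that is an assumption, so check your own NUMA topology before copying it.

```xml
<!-- Sketch only: spread guest memory across NUMA nodes 0 and 1 -->
<!-- instead of restricting it to a single node as 'strict' with nodeset='0' does. -->
<numatune>
  <memory mode='interleave' nodeset='0-1'/>
</numatune>
```

This trades locality for flexibility: some of the guest's memory will sit on the node the pinned cores are not attached to, which is where the performance hit comes from.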

 

Lastly, in the BIOS, under the setting for how it splits the RAM, I have mine set to "channel", which decreases certain things like RAM read/write speeds but helps in other ways, such as memory latency.

 

Also make sure you're running Q35-3.0 or the i440fx equivalent (3.0).

Link to comment

Okay, so I tried your suggestions, with no luck so far. One thing I'm confused about: it should be 2 threads, right? When I edit the VM config it says 1 thread with 4 cores. But using paired CPUs, shouldn't that be 2 threads? I confirmed SMT is on in the BIOS. I booted to bare metal and set the OC settings via Ryzen Master instead of the BIOS, just to make sure I didn't miss anything. I set it to Creator Mode and tested Far Cry 5 again; the average after a few runs was 79 FPS.

 

When I boot into unRAID 6.6.6 and use the settings suggested, I'm seeing about 44. I'm really confused what I'm missing.


 

  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='18'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='19'/>
    <emulatorpin cpuset='0,16'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>

  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='allow'>EPYC-IBPB</model>
    <topology sockets='1' cores='4' threads='1'/>
    <feature policy='require' name='topoext'/>
  </cpu>

Any other suggestions would be much appreciated.

 

Thanks

Link to comment

2 cores, 4 CPUs. I had it on 8 before, it was still 1 thread.

 

So if I change it to this:

  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='18'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='19'/>
    <vcpupin vcpu='4' cpuset='4'/>
    <vcpupin vcpu='5' cpuset='20'/>
    <vcpupin vcpu='6' cpuset='5'/>
    <vcpupin vcpu='7' cpuset='21'/>
    <emulatorpin cpuset='0,16'/>
  </cputune>

It still shows as only 1 thread. Or am I misunderstanding, and this isn't 1 thread?

Link to comment

So, a couple of things. Since you're only passing two real cores, I would think the 44 FPS is actually pretty close to what you should expect. Even though games tend to be "single" threaded, many spin off multiple threads for things like AI computation, so having only two "real" cores doing the work is kind of crazy. Anyway, I'd assign at least 4, if not 7, physical cores to your machine. Make sure you always leave out core 0 (and whatever its partner thread is), because 0 will always be used no matter what you do, and there isn't a thing you can do to stop Unraid from doing it. <emulatorpin cpuset='0,16'/> <- this I wouldn't do, because 0 and 16 can be used by Unraid at ANY time. I use cores that my VM could potentially use as well, and I don't seem to have much issue.
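A rough sketch of what that could look like for 4 physical cores. It assumes the same sibling numbering as the XML posted earlier in the thread (thread N paired with N+16) and that CPUs 1-5 and 17-21 all sit on the same NUMA node; both are assumptions to verify against your own topology. It is a fragment of the domain XML, with everything else omitted.

```xml
<vcpu placement='static'>8</vcpu>
<cputune>
  <!-- Each physical core and its SMT sibling are given to adjacent vCPUs. -->
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpupin vcpu='1' cpuset='18'/>
  <vcpupin vcpu='2' cpuset='3'/>
  <vcpupin vcpu='3' cpuset='19'/>
  <vcpupin vcpu='4' cpuset='4'/>
  <vcpupin vcpu='5' cpuset='20'/>
  <vcpupin vcpu='6' cpuset='5'/>
  <vcpupin vcpu='7' cpuset='21'/>
  <!-- Emulator threads kept off 0/16; using 1/17 here is a hypothetical choice. -->
  <emulatorpin cpuset='1,17'/>
</cputune>
```

Pointing emulatorpin at the VM's own cores (for example '2-5') is the other option mentioned above; either way the idea is to stay off core 0 and its partner.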

 

So let's get down to business. We need something more tangible than FPS; FPS is good and all, but we need to see what's causing the issue. So download CPU-Z, run the single-thread bench, and compare your single-thread result, as a percentage, against, say, a 2700X or a 1950X. I use a 2990WX with a few browser windows open and other things running on my system, such as Plex (so I can't get near the core boost), and I get 103% of a 1950X's single-thread score, or 85% of a 2700X's.

 

Please post what you get versus, say, a 1950X, or a 2700X in your case, as it's Zen+.

 

Next, we need to see what your memory latency looks like. If you have AIDA64, run its memory benchmark and post it here. If you don't, run the free version, and let's hope we can see some useful parts, as it blanks out random parts of the test.

Link to comment
This would be 4 cores, 2 threads. What you're specifying is 4 physical cores with "two threads" per core, which is what SMT is. Basically, SMT just makes it so you can load a core down more, and there are performance benefits. That's why it would be 4 cores, 2 threads with the XML posted.
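To make the arithmetic concrete, here is a sketch of how the vCPU count and the topology line up for that layout. As far as I know, libvirt expects sockets x cores x threads to equal the vcpu count, so treat the note about the mismatch as a likely cause of a topology error rather than a confirmed one. Other elements of the domain XML are omitted.

```xml
<!-- 8 vCPUs in total: 1 socket x 4 cores x 2 threads. -->
<vcpu placement='static'>8</vcpu>
<cpu mode='custom' match='exact' check='partial'>
  <model fallback='allow'>EPYC-IBPB</model>
  <topology sockets='1' cores='4' threads='2'/>
  <feature policy='require' name='topoext'/>
</cpu>
<!-- Changing only threads to '2' while leaving cores='8' would describe 16 vCPUs, -->
<!-- which no longer matches the vcpu count of 8 above. -->
```

In other words, when switching threads from 1 to 2, the cores value has to be halved at the same time.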

Link to comment
1 minute ago, Jerky_san said:

This would be 4 cores, 2 threads. What you're specifying is 4 physical cores with "two threads" per core, which is what SMT is. Basically, SMT just makes it so you can load a core down more, and there are performance benefits. That's why it would be 4 cores, 2 threads with the XML posted.

Right, this was my understanding, but with my configuration, if I switch it to 2 threads, it throws an error saying that isn't a correct configuration. So that's my main confusion: how is it 8 cores and 1 thread when I'm selecting pairs? It should be 4 cores and 2 threads, which would equal 8 vCPUs. That's how it shows on my Intel build.

 

As for the testing you suggested, I'll run those and get back to you.

 

 

Link to comment
  <memory unit='KiB'>20971520</memory>
  <currentMemory unit='KiB'>20971520</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>14</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='33'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='34'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='35'/>
    <vcpupin vcpu='6' cpuset='4'/>
    <vcpupin vcpu='7' cpuset='36'/>
    <vcpupin vcpu='8' cpuset='5'/>
    <vcpupin vcpu='9' cpuset='37'/>
    <vcpupin vcpu='10' cpuset='6'/>
    <vcpupin vcpu='11' cpuset='38'/>
    <vcpupin vcpu='12' cpuset='7'/>
    <vcpupin vcpu='13' cpuset='39'/>
    <emulatorpin cpuset='1-7'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-3.0'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/3b8790bc-59c0-ff66-e9b9-c3c716abc8b5_VARS-pure-efi.fd</nvram>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>

    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <synic state='on'/>
      <stimer state='on'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <topology sockets='1' cores='7' threads='2'/>
    <feature policy='require' name='topoext'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='disable' name='svm'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='yes'/>
  </clock>

Here is a snippet of what my XML looks like. I am testing hpet (Ryzen doesn't actually use HPET) and synic/stimer to get lower power consumption at idle and also less kernel time used by Unraid/QEMU for my machine. It actually was a pretty big drop, honestly, but I'm telling you this because if you copy anything out of the config, you should be aware that mine is nonstandard.

 

I'd make sure you're set to 3.0, because it brought a lot of improvements for Threadripper (though there are some missing):

<type arch='x86_64' machine='pc-i440fx-3.0'>hvm</type>

 

Also, are these fresh installs, or were they moved over from your Intel machine, or something else?

Link to comment

Archived

This topic is now archived and is closed to further replies.
