Ryzen 3950X CPU core assignment and gaming performance (CSGO 100 fps drop)


Astror

Hi everyone. First, a little about my system. In January I decided to retire my old computer. I already knew what Unraid could do, so I went for it and built a machine to serve as my NAS, home server, and gaming/work computer. Here are the specs:

 

  • Ryzen 3950X
  • Gigabyte X570 Aorus Master
  • Corsair Vengeance LPX 32 GB 3200 MHz (2x16)
  • Sabrent 2TB Rocket NVMe PCIe M.2 (Cache pool)
  • WD Elements Desktop 10 TB - X2 (Unraid pool)
  • Corsair HX1200 1200W
  • MSI Radeon RX 580 Armor 8G OC 8GB (from my old computer; I’m waiting for new cards to be available :( )

 

Since I got it “properly” configured I have been using it as described. I do all my gaming in a Windows 10 VM and have been enjoying it a lot, with no major problems. At the beginning I did some testing to compare gaming performance between the VM and a bare-metal Windows 10 installation. I play a lot of CSGO, so it was my go-to benchmark. Even then I noticed a surprisingly big difference (more details in a moment), but other games ran well and CSGO was perfectly playable, so I didn’t give it much importance and kept using the machine as is.

 

Now that I have time, I have been testing again to see whether that difference in CSGO is still there, and it is. I thought it might have gone away, because the way I assign CPU cores to the VM has changed since I first assembled the system: I found a configuration whose Cinebench multi-core score was very close to bare-metal. That configuration is the following:

 

[Screenshot: CPU core assignment for the VM, labeled "Now"]

 

With this configuration Cinebench R23 gives approximately 17,400 points, while bare-metal Windows 10 gives approximately 24,000. I don’t know why they are so close, given that (I think) only half the CPU is working in the VM. In any case, this is how I run the VM now, versus how I had it during the first months:

 

[Screenshot: CPU core assignment for the VM, labeled "Before"]

 

Because of this change I expected CSGO to run closer to bare-metal, but there is still a 100 fps difference in the CSGO FPS Benchmark (a workshop map for testing):

 

  • Bare-metal: 395 fps
  • VM: 295 fps

 

The difference is approximately the same with either CPU core configuration. In both cases the game is configured so that it is never GPU bound: the graphics settings are set to the lowest possible.
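To put the gap in perspective, the two averages can be converted into a relative drop and into time per frame. A small Python sketch using only the two numbers above (the function and variable names are made up for illustration):

```python
def frame_time_ms(fps: float) -> float:
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


bare_metal_fps = 395.0  # bare-metal result from the benchmark above
vm_fps = 295.0          # VM result from the benchmark above

# Relative fps drop of the VM compared to bare-metal, in percent.
drop_pct = (bare_metal_fps - vm_fps) / bare_metal_fps * 100.0

# Extra time the VM needs per frame, in milliseconds.
extra_ms = frame_time_ms(vm_fps) - frame_time_ms(bare_metal_fps)

print(f"relative drop: {drop_pct:.1f}%")
print(f"bare-metal frame time: {frame_time_ms(bare_metal_fps):.2f} ms")
print(f"VM frame time: {frame_time_ms(vm_fps):.2f} ms")
print(f"extra time per frame in the VM: {extra_ms:.2f} ms")
```

At these frame rates the 100 fps gap works out to a bit under 1 ms per frame, so even a small fixed overhead per frame would be enough to produce it when the CPU is the limit.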

 

Suffice it to say, I don’t have the slightest idea what is going on here, but I will give more details of my testing in case they help. My suspicion is that, whatever the reason, it has to do with the CPU, because when the GPU is the bottleneck the difference in performance is close to zero, even in CSGO. I ran some tests with 3DMark to back this up:

 

  • Bare-metal: Graphics Score 4391 - CPU Score 13445
  • VM: Graphics Score 4387 - CPU Score 12477

 

I think this result is more representative of what I experience in games in general, as it is usually the GPU that limits the fps. For this reason I am not very worried about the issue, but I am really curious about the big gap in the CSGO benchmark (again, the game runs just fine and I find the in-game fps more than sufficient).

 

To sum up, I would like to know why there is such a difference in this particular case (maybe it is too niche for many people here to care, I imagine). It is also a good opportunity for me to learn a bit about CPU core assignment with the 3950X, as I am curious about the correct way to do it (or the one that uses the 8 cores inside the same CCX). I have tried to be as detailed as possible, but I am sure there are important aspects of the problem that I haven’t talked about, in which case I will be happy to answer any questions.

 

Thanks a lot in advance.

Link to comment

It’s to do with cross-die and cross-CCX latency, and also with setting up the VM for the right application.

A lot of this is discussed in this thread:

 

https://forums.unraid.net/topic/73509-ryzenthreadripper-psa-core-numberings-and-assignments/#comment-676202

 

TLDR

 

Workstation - spread the load evenly across cores / dies / CCXs.

Gaming - minimise cross-die and cross-CCX interaction.

If you are just gaming, you will probably get better performance with 4 cores from one CCX.

 

I have a 3900X and use one die (2 CCXs) for a 6C/12T gaming VM. I can’t tell the difference between bare metal (BM) and the VM; benchmark results, even synthetic ones, are within a few percent.

Link to comment

After some reading (the post you mentioned was really useful, thanks), I understand that for the gaming VM it is in my best interest to set the CPU to NUMA mode and then assign the VM the 8 cores of the CCD that the graphics card connects to. That should reduce latency and help in games that try to push fps to the maximum (like the CSGO case I’m working with).

 

That said, I have been exploring the BIOS settings and I can’t find the option mentioned in the post that is supposed to activate NUMA mode: AMD CBS/DF Common Option/Memory Addressing, with 5 different modes (auto, die, channel, socket and none), Channel being the one I am looking for.

 

In my case I see 2 options that (I suppose) are related to this:

[Two BIOS screenshots showing the memory interleaving and NUMA nodes options]

 

I have tried all 8 possible combinations (memory interleaving set to Auto or Disabled, with each of the 4 NUMA nodes options), and none made Unraid see more than one node with numactl --hardware. There is no Channel mode to be seen anywhere (either inside AMD CBS or in any other tab), which leads me to believe that the ability to choose the NUMA mode may have been removed in this BIOS version (F31q, the latest one as of December 2020). If that’s the case, is this a dead end?
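For anyone who wants to repeat the node check without numactl, the kernel also lists its NUMA nodes as node0, node1, ... directories in sysfs. A minimal Python sketch, assuming it runs on the Linux host and that /sys/devices/system/node is present; the function name numa_nodes is made up for illustration:

```python
import re
from pathlib import Path


def numa_nodes(sysfs_node_dir: str = "/sys/devices/system/node") -> list[int]:
    """Return the NUMA node IDs exposed by the kernel, sorted.

    Returns an empty list if the directory does not exist
    (for example on a non-Linux system).
    """
    base = Path(sysfs_node_dir)
    if not base.is_dir():
        return []
    ids = []
    for entry in base.iterdir():
        # Only directories named node<N> are NUMA nodes.
        match = re.fullmatch(r"node(\d+)", entry.name)
        if match and entry.is_dir():
            ids.append(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    nodes = numa_nodes()
    print(f"{len(nodes)} NUMA node(s): {nodes}")
```

A single entry here would match what numactl --hardware reported in every combination tried above.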

Edited by Astror
Link to comment

The NUMA stuff is related to Threadripper, not your 3950X, but the principle is the same.

Your 3950X is made up of two CPU dies. Each die has two CCXs of 4 cores each.

If you are gaming, you want to avoid crossing from die to die, as this adds significant latency: the guest OS isn’t aware of the layout of the CPU (because it’s in a VM).

The lowest latency will be with one CCX, but this might not be enough CPU power for you. My view with my 3900X was to give the VM one whole die.

In your first layout you have passed all the hyperthreads of all the cores. This will give good results for something like rendering, as long as the host isn’t doing anything, but it will be very bad for latency.

In your second layout you have given it two CCXs across two dies. This will perform very poorly.

Try giving it cores 8-15 plus their hyperthreads and your performance should be good.
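On Zen 2 each CCX has its own L3 cache, so one way to check which logical CPUs belong together is to group them by the L3 sharing information the Linux kernel exposes in sysfs. A minimal Python sketch, assuming it runs on the Unraid host and that the usual /sys/devices/system/cpu/cpuN/cache/indexM/ layout is present; the function name l3_groups is made up for illustration:

```python
from collections import defaultdict
from pathlib import Path


def l3_groups(cpu_dir: str = "/sys/devices/system/cpu") -> dict[str, list[int]]:
    """Group logical CPUs by the L3 cache they share.

    Keys are the kernel's 'shared_cpu_list' strings for the level-3 cache,
    values are the sorted logical CPU numbers that report that string.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for cpu in Path(cpu_dir).glob("cpu[0-9]*"):
        for index in (cpu / "cache").glob("index*"):
            level_file = index / "level"
            shared_file = index / "shared_cpu_list"
            if not (level_file.is_file() and shared_file.is_file()):
                continue
            # Only the level-3 cache entry identifies the CCX.
            if level_file.read_text().strip() == "3":
                shared = shared_file.read_text().strip()
                groups[shared].append(int(cpu.name[3:]))
    return {key: sorted(cpus) for key, cpus in sorted(groups.items())}


if __name__ == "__main__":
    for shared, cpus in l3_groups().items():
        print(f"L3 shared by {shared}: logical CPUs {cpus}")
```

Each group printed should correspond to one CCX (cores plus their hyperthreads). The exact numbering depends on how the kernel enumerates the threads, so it is worth checking before choosing which CPUs to pin.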

 

Remember to isolate the cores and hyperthreads from the host.

Please also ensure you follow the guidance on this.

 

 

 

Link to comment

Although I have not tested CSGO, these were my results for BM with 12 cores vs a 6-core VM. This is with a GTX 1080.

 

Cinebench R20
BM multi = 6675
VM multi = 3488 (52%)
BM single = 511
VM single = 471 (92%)

Cinebench R15
BM multi = 3010
VM multi = 1457 (49%)
BM single = 192
VM single = 190 (99%)

Time Spy
BM GPU = 7548
VM GPU = 7446 (98%)

Civ VI
BM turn time = 7.56
VM turn time = 7.63 (99%)
BM FPS = 128
VM FPS = 130 (102%)

F1 2019
BM av FPS = 136
BM min FPS = 101
VM av FPS = 124 (91%; G-Sync was on for this run for some reason, which probably accounts for most of the difference)
VM min FPS = 99

Mankind Divided
BM av FPS = 73.7
VM av FPS = 69.7 (95%)

RDR2
BM av FPS = 62.13
VM av FPS = 63.8 (103%)

 

At that point I gave up benchmarking and just started gaming, as I doubt I will ever notice the difference.

Edit: I game at 1440p, so I am GPU bound in most situations. In games I see 99% GPU utilisation.
 

 

Edited by gray squirrel
Link to comment

First, thanks for putting in the effort to help me here. I was clearly a bit lost, even after a whole year of using Unraid. I see that you have done quite a lot of benchmarking to make sure your VM works fine. I can also see that all your game tests were GPU limited (Cinebench is another matter, I guess, but the results are well within what I would expect). In the case of CSGO at the lowest settings, though, I think I am being limited by the CPU somehow.

 

I have been reading the post you mentioned (a lot of interesting stuff about CPU pinning) and have followed the OP’s recommendations. Some were things I hadn’t considered until now (like the 'Disk Cache' settings of Tips and Tweaks), but I guess the most important one was isolating the CPU cores assigned to the VM with isolcpus. I expected this to have some effect, but I am afraid the tests I ran afterwards didn’t show any improvement (all testing was done with Docker off, to ensure no other significant tasks affected performance inside the VM). I tried different combinations of isolated cores, ranging from 2 to 8 cores, and all scored within ±20 fps of each other in the CSGO benchmark, so I guess they can be considered the same. In all cases the cores I chose were assigned as paired threads (unlike the first setup I posted, which used one thread from every core of the CPU).
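A quick sanity check that the isolation actually took effect is to compare the CPUs pinned to the VM with the list the kernel reports as isolated. A minimal Python sketch, assuming a kernel that exposes /sys/devices/system/cpu/isolated and assuming that cores 8-15 have their sibling threads numbered 24-31 (that pairing is a guess and should be checked on the actual host); the function names are made up for illustration:

```python
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Parse a kernel cpulist string such as '8-15,24-31' into a set of CPU numbers."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty string means no CPUs
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def not_isolated(pinned: set[int],
                 isolated_file: str = "/sys/devices/system/cpu/isolated") -> set[int]:
    """Return the pinned CPUs that the kernel does not list as isolated."""
    path = Path(isolated_file)
    isolated = parse_cpulist(path.read_text()) if path.is_file() else set()
    return pinned - isolated


if __name__ == "__main__":
    # Assumption: these are the logical CPUs assigned to the VM.
    pinned = parse_cpulist("8-15,24-31")
    missing = not_isolated(pinned)
    if missing:
        print(f"pinned but not isolated: {sorted(missing)}")
    else:
        print("all pinned CPUs are isolated")
```

If this reports CPUs as not isolated, the host scheduler can still place its own tasks on them, which would make the comparison between configurations less clean.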

 

At this point I am starting to think that in these extreme cases, where CPU speed (or latency, I guess) is so important, a VM simply can’t perform as well as a bare-metal machine. I don’t think this is the end of the world, as 300 fps seems plenty for a good CSGO experience :P.

 

Right now I am using cores 8-15 plus their hyperthreads, as suggested by @gray squirrel, but the situation is the same as in the OP. If someone has another idea of what may be causing this (maybe something related to the CPU governor?), I am all ears.
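To rule the governor in or out, the first step would be to see which one the host is actually using. A minimal Python sketch, assuming the host has a cpufreq driver loaded and exposes the standard scaling_governor files in sysfs; the function name governor_counts is made up for illustration:

```python
from collections import Counter
from pathlib import Path


def governor_counts(cpu_dir: str = "/sys/devices/system/cpu") -> Counter:
    """Count how many logical CPUs use each cpufreq scaling governor.

    Returns an empty Counter if no cpufreq information is exposed.
    """
    counts: Counter = Counter()
    for gov_file in Path(cpu_dir).glob("cpu[0-9]*/cpufreq/scaling_governor"):
        counts[gov_file.read_text().strip()] += 1
    return counts


if __name__ == "__main__":
    counts = governor_counts()
    if not counts:
        print("no cpufreq governor information found")
    for governor, n in counts.most_common():
        print(f"{governor}: {n} CPU(s)")
```

This only reports the current setting; whether a different governor would change the CSGO result is something that would have to be tested.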

 

Thanks again, @gray squirrel, for the guidance. If nothing else, I think I understand a lot more about CPU pinning now.

Edited by Astror
Link to comment

So I did a bit of testing with CSGO, VM vs BM at 1080p, and I see similar results.

The community benchmark goes from around 400 to 320 average FPS. Playing with bots on one of the maps gives the same result, a drop of around 80 FPS. But this is such an extreme and unrealistic test. I bet a 4-core (one CCX) VM would be much closer. But 300 FPS in a game is a bit ludicrous anyway.

Edited by gray squirrel
Link to comment

Well, thanks for taking the time 🦸. Just knowing that you can replicate the same results is a relief. I guess this is just what you can get in a VM... As you said, 300 fps is obviously a lot. As a counterpoint, though, it is widely believed that, because of how the game engine works, having as many fps as possible gives a competitive advantage in this particular game... That said, for my skill level I am happy with this performance.

 

Thanks again :D

Link to comment
