Posts posted by bastl

  1. It doesn't matter performance-wise whether you set it to cores='4' threads='1' or cores='2' threads='2' (see the snippet at the end of this post). For me it always shows the same performance in my tests. I did a couple of tests on the current 6.6.5 with different benchmarks (Cinebench, Aida, CPUz) and games (GTA, BF1, Rust) and all scores are nearly the same.

     

    As for the L1, L2 and L3 cache that's reported wrong to the VM, I don't know if this is an Unraid-specific thing that @limetech can fix or whether it has to be implemented in the Linux kernel, libvirt or QEMU.
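
    For reference, this is the topology part of the VM XML I mean; a minimal sketch, assuming a single socket and the usual host-passthrough CPU mode:

    <!-- Variant 1: 4 cores, 1 thread each -->
    <cpu mode='host-passthrough'>
      <topology sockets='1' cores='4' threads='1'/>
    </cpu>

    <!-- Variant 2: 2 cores, 2 threads each (same 4 vCPUs) -->
    <cpu mode='host-passthrough'>
      <topology sockets='1' cores='2' threads='2'/>
    </cpu>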

  2. @Chamzamzoo Every first gen Threadripper has 2 dies, with a max of 8 cores per die. The smallest TR4 chip (1900X) has only 4 cores per die enabled: cores 0-3 plus their HT siblings are on one die and 4-7 plus HT on the second die. The increase in memory bandwidth you see in AIDA is due to the fact that you're using both dies, each with its own memory controller and 2 channels each. So with 2 Ryzen dies you get quad channel.

     

    There is no actual "best setting"; it all depends on your needs. If you need the memory bandwidth for your applications, use both dies. If you need the lower latency, use only cores from one die and set your memory in the BIOS to NUMA (the Channel option in most cases). There are still some quirks with KVM and the memory setting: you can "strictly" set your VM to use only memory from a specific node, but a couple of people reported that a bit of memory is still taken from the other die, which increases the latency again. Also, tweaking your XML to present the CPU as an Epyc to the VM can improve the performance a bit (see the sketch at the end of this post). In that case the actual CPU cache is presented to the VM in the correct way; Unraid itself for some reason changes the L1, L2 and L3 cache that the VM sees with its standard settings for CPU model passthrough.

     

    Also noted: you're only passing the HT cores through to the VM. The usual way is to pass through the main core plus its hyperthread. I can't really tell if this makes any big difference in performance; I never tested it the way you have it set up.
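
    For the Epyc tweak, this is roughly what the cpu block can look like; just a sketch, the topology has to match the cores you actually assign and the exact feature flags may need adjusting:

    <cpu mode='custom' match='exact' check='none'>
      <model fallback='forbid'>EPYC</model>
      <!-- match the number of cores/threads you pin to the VM -->
      <topology sockets='1' cores='8' threads='2'/>
      <!-- expose AMD topology extensions so the guest sees proper HT pairs -->
      <feature policy='require' name='topoext'/>
    </cpu>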

  3. If you're running in NUMA mode you gain memory bandwidth because you're accessing both memory controllers at the same time, one on each die, but you get slightly higher latency on memory access. In everything I tested for my normal use of a VM (browsing, office stuff, gaming) I couldn't see any big differences. Switching the GPU to another slot, or using an NVMe in a slot that isn't connected to the die I pass through directly, also shows the same for me: no big noticeable differences. I don't really care about +/-5 fps in games as long as everything runs smooth, and it does. Read and write speeds of the NVMe are also pretty much the same. You might see some hiccups if the device is connected to the other die while that die is under heavy load, but for me it never happens that my Dockers or Unraid itself get to the point of using all the CPU resources. That might be different with a Plex container or some other transcoding Dockers pushing the CPU to the limit. It all depends on how you're using your rig.

     

    I don't really know where you can find the BIOS setting on an MSI board. On ASRock you can find it under AMD CBS / DF Common Options. The default memory interleaving setting for me is Auto; Channel switches it to NUMA. Besides setting my XMP profile for the RAM, I only enabled hardware virtualization support (SVM Mode and SR-IOV) and enabled IOMMU. Not really sure what that last option is called exactly, something with IOMMU.

     

    No matter how you set your memory, you should always use cores/threads from the same die for a single VM and not mix cores from both dies, to keep the latency down. On my 1950X it looks like this:

     

    [Screenshot: cores.JPG - core/thread pairings]

     

    Die 1 is only for my main VM with GPU and NVMe passthrough. Cores 8 and 24 are the emulator pins and the rest is isolated and only used by a Win10 VM. On die 0 I have all my Dockers running and a couple of VMs which I use from time to time. I'm currently playing around with the emulatorpin setting, but I can't see any difference compared to running without it. Isolating the emulator pins, not isolating them, no pinning at all: I can't really tell the difference. Using the emulatorpin on the other die I haven't checked yet. Worth a try.
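
    For reference, a sketch of what that pinning looks like in the cputune section of my XML; the host CPU numbers follow my setup above and only the first few vcpupin lines are shown:

    <vcpu placement='static'>14</vcpu>
    <cputune>
      <!-- pin each vCPU to a core/thread pair on die 1 -->
      <vcpupin vcpu='0' cpuset='9'/>
      <vcpupin vcpu='1' cpuset='25'/>
      <vcpupin vcpu='2' cpuset='10'/>
      <vcpupin vcpu='3' cpuset='26'/>
      <!-- ...and so on up to cores 15/31... -->
      <!-- keep the emulator threads off the isolated VM cores -->
      <emulatorpin cpuset='8,24'/>
    </cputune>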

     

  4. Isolating cores prevents Unraid from using them for background tasks or for Docker. So for best performance in a VM you have to isolate the cores you want to pin to your VM. Keep in mind core 0 is always used by Unraid and you can't isolate it. Also, always isolate the core plus its HT sibling. In my case I have a 1950X: 2 dies, each with 8 cores. I isolated all 8 cores from the second die, which I use for my main Win10 VM. The rest of the cores are used by Unraid, Docker and some VMs I use from time to time.

  5. @Jerky_san I wouldn't trust CPUz inside a VM in the first place, same as other monitoring software. Cinebench never reads the correct core clock, nor does any tool show me the right Vcore. CPUz also always shows me the core clock at its max speed, while in the background on Unraid you can see it's running at idle speeds as it should. I guess that's just the way QEMU/KVM presents/emulates the CPU to the guest OS.

  6. I did a couple of tests. I have an EVGA 1050ti in my first PCIe slot and a 1080ti in my third slot. The 1080ti is mainly used for a gaming VM and the 1050ti for the Linux VMs I use. For the Linux VMs I don't have to use an extra VBIOS; just choosing Q35 works out of the box. The 1080ti I can pass through to a Win10 VM without any modifications: create the VM (i440fx, OVMF) with VNC, install the OS, add the GPU + audio later, install the driver, remove VNC, done. For the card in the first slot, if I want to pass it through to a Win10 VM, I have to do the same as before, except that I also have to pass the card's VBIOS through to get it working without Error 43. I got my VBIOS from TechPowerUp for the EVGA 1050ti and modified it like SpaceInvader described in his video. Pass through the modified VBIOS, remove VNC, done, and it works. But if VNC is still enabled I keep getting the error, so make sure that after installing your OS and enabling RDP or installing TeamViewer to remotely access your VM, you remove VNC. Another thing to mention: it makes no difference for me whether I turn the Hyper-V option on or off or manually edit the XML, it works either way.
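
    A sketch of how the VBIOS ends up in the GPU's hostdev entry; the PCI addresses and the rom path here are placeholders, yours will differ:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <!-- host address of the 1050ti (placeholder) -->
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
      </source>
      <!-- VBIOS downloaded and modified as in SpaceInvader's video -->
      <rom file='/mnt/user/isos/vbios/evga_1050ti_modified.rom'/>
      <!-- address inside the guest (placeholder) -->
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>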

     

     

  7. @testdasi

    Yesterday I reduced my gaming VM to 6 cores + 6 threads on node 1 with all cores isolated and did a couple of benchmarks without running anything else on that die. Then I switched all my Dockers and other VMs from node 0 to node 1, isolated the last 6 out of 8 cores and their threads on node 0 from Unraid and moved the gaming VM over to node 0, where my 1080ti should still be attached (if lstopo is correct). I haven't flipped the cards around yet, because for now I don't need any VBIOS to pass through. The performance is basically the same except for small stutters/hiccups and sound bugs every 30-40 seconds. Every game I tested (BF5, Farcry 5, DayZ, Superposition + Heaven benchmark) gave me nearly the same performance as on node 1, plus that weird stuttering. I don't know exactly why. I never had that issue when I isolate the second die and use only those cores. This gets me back to my initial idea that maybe the BIOS is reporting the core pairings wrong to the OS. Why should I get stutters when the GPU is connected to the cores that are used directly, and no stutters when going across the Infinity Fabric? Weird!

     

    I didn't retest in NUMA mode. I did that before, and as long as I don't mix the dies for one VM it makes no difference in gaming performance. Using UMA mode showed me in my tests that I get higher memory bandwidth with no real performance loss.

  8. Thanks @SpaceInvaderOne, the symlink fixed it for me.

     

    Why the hell is the first PCIe slot connected to the second die and the third slot to the first die? In the first slot I have a 1050ti which is used by a Linux VM that uses some cores from the first die. The 1080ti in the 3rd slot is mainly used for a gaming VM and uses all cores (8-15; 24-31, isolated) on the second die. I wish I could flip a switch in the BIOS to reverse that. I guess there is no chance of such an option, right?

     

    [Screenshot: topology diagram]

  9. Ok, now it gets interesting. I already watched almost all videos from Wendell, but thanks for mentioning it here for people stumbling across this thread. @tjb_altf4 

     

    I might have overlooked something while doing all my tests, and the reported core pairings are alright. I assumed that the better memory performance depends on the cores and which die they are on. While switching between the options Auto, Die, Channel, Socket and None in the BIOS under the AMD CBS settings, I should have already noticed that as soon as I limit a VM to only one die, I only get the memory bandwidth of that die's memory controller. I basically cut the bandwidth in half, from quad channel (both dies) to dual channel. Makes perfect sense. How could I miss that?

     

    If you need the memory bandwidth for your applications, UMA mode is the way to go. For me, I have to set it to Auto, Socket or Die for the memory to be interleaved over all 4 channels, and the CPU then gets reported as only 1 node. By choosing the Channel option (NUMA mode) I basically limit the memory access to the 2 channels of the specific die. The latency in this case should be lower because you remove the hop to the other die. The None option limits it to single-channel memory and cuts the bandwidth even further, as shown in the pictures above. I'm actually not sure what the difference between Auto, Die and Socket is; they all show similar results in the tests. It should also be mentioned that Cinebench looks like it's more memory-bandwidth-dependent than most people report.

     

    Wendell mentioned in that video using lstopo to check which PCIe slots are directly connected to which die. Is there a way to check this without lstopo, which isn't available on Unraid? Right now my 1080ti sits in the third PCIe slot (x16; 1st slot 1050ti x16, second slot empty x8) and I'm not sure if it's directly attached to the die I use for my gaming VM. Maybe there is something already implemented in Unraid for listing the topology the way lstopo does.

     

    Any ideas?

     

    Edit:

    Another thing I should have checked earlier is the behaviour of the clock speeds. Damn, I feel so stupid right now.

     

    watch grep \"cpu MHz\" /proc/cpuinfo

     

    Running this command during the tests would have shown that as soon as I choose cores from both dies for a VM, the clocks on all cores ramp up. If I assign the core pairs Unraid gives me, only one die ramps up to full speed and the other stays at idle clocks.  🙄

     

  10. As reported earlier, for the 1950X on an ASRock Fatal1ty X399 Gaming Pro something is reported differently. Looks like the same happened for Jcloud on his Asus board. Currently I'm on the 6.6 RC2. I couldn't really find a BIOS setting to change how the dies are reported to the OS; it has always been reported as 1 node.

     

    [Screenshot: numactl output]

    [Screenshot: numastat output]

     

    Edit:

    @testdasi

    It looks like the RAM usage for your VMs isn't optimized either. If I understand the output right, your VM with PID 33117, for example, pulls half its RAM from each of 2 different nodes, each of which has its own memory controller. If you have more than 1 die assigned to the VM that's OK, but if you use, let's say, 4 cores from 1 die, it should take the 4GB of RAM from that same node and not from another node.
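
    A sketch of how the memory can be tied to one node in the VM's XML; nodeset='1' is just an example for a VM pinned to the second die:

    <!-- request the guest's RAM from NUMA node 1 only -->
    <numatune>
      <memory mode='strict' nodeset='1'/>
    </numatune>

    As mentioned before, even with the strict setting a couple of people reported that a bit of memory still ends up on the other node.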

  11. 15 minutes ago, testdasi said:

    Do you know which AGESA was the proper fix in? That probably helps the TR peeps to know for sure the min BIOS to use.

    With AGESA 1.0.0.4 you needed some sort of extra patch. AGESA 1.0.0.6 was never released as stable, at least not by ASRock for my board; only a beta version was available, which I never tested. I think it mainly addressed memory incompatibilities for the AM4 Ryzen chips and came with some microcode updates to fix security issues. AGESA 1.1.0.0 should be the first version that includes the fix.

     

  12. The BIOS is up to date now with version 3.30. Same results: core pairings are still showing wrong in 6.6.0-rc1. I played around a bit and tested a couple of things. On the first boot it came up with tons of PCI reset errors, but it looks fine now after a second reboot. I can now disable the ACS override and most devices get split into their own groups; only the network interfaces are still grouped together.

  13. I did a couple more tests with all available memory interleaving settings. In the ASRock BIOS under AMD CBS / DF Common Options I found 5 available options (Auto, Die, Channel, Socket, None). Auto is what I used before in all of my tests. This time I only tested with the Win10 VM using cores 16-31. The Die and Socket options produced pretty much the same results as Auto

     

    [Screenshot: Win10 VM, cores 16-31 - Auto / Die / Socket results]

     

    and, as expected, choosing Channel or None interleaving showed the worst performance.

     

    [Screenshot: Win10 VM, cores 16-31 - None / Channel results]

     

    If I had accidentally chosen cores from both dies, I guess the results with the Die option for memory interleaving would be different. I searched around and tested a bit in the BIOS for an option to maybe force it to report the cores to Unraid in a different way, but without luck. I couldn't find any specific option to select UMA or NUMA either. I know you can set it in Ryzen Master, but that software doesn't work inside a VM. Maybe I'll test it tomorrow with a bare-metal install and check what else changes in the BIOS after choosing the NUMA/UMA setting in Ryzen Master.

     

    Enough for today. Good night to everyone 😑
