DZMM Posted November 25, 2018 8 hours ago, Symon said: And numatune: <numatune> <memory mode='interleave' nodeset='1'/> </numatune> Thanks for your help, guys! Does it matter where in the VM XML this goes? Thanks
Chamzamzoo Posted November 25, 2018 4 hours ago, Jerky_san said: So... it's important to set this so you don't go outside a core that has access to the memory. Otherwise it will introduce stuttering into games with intense graphics. The one I test on is Dying Light. Before I did this little patch I was getting max 80fps. This patch skyrocketed my fps to a nearly consistent 120. Your L3 cache latency gets the largest boost, from 50ish down to 10-11, and your memory latency usually drops from the low 100's to very close to bare metal. It's amazing. Thanks for all the explanations, it's coming together slowly in my head. I'm not sure how I would know which core has access to the memory though - any chance you could explain that bit for me?
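(Editor's note: on Linux, `numactl --hardware` lists which CPUs and how much memory belong to each node, which answers the "which core has access to the memory" question. A minimal sketch of pulling the core-to-node mapping out of that output follows - the sample text is hypothetical example output, not captured from any machine in this thread; on a live system you would capture it with `subprocess.run(["numactl", "--hardware"], ...)`.)

```python
import re

# Hypothetical `numactl --hardware` output for a two-node Threadripper.
sample = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 32167 MB
node 0 free: 24210 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 32253 MB
node 1 free: 30102 MB
"""

def cpus_by_node(text):
    """Return {node_number: [cpu, ...]} parsed from numactl --hardware output."""
    nodes = {}
    for m in re.finditer(r"node (\d+) cpus: ([\d ]+)", text):
        nodes[int(m.group(1))] = [int(c) for c in m.group(2).split()]
    return nodes

nodes = cpus_by_node(sample)
print(nodes[0])  # the cores with direct (local) access to node 0's memory
```

Pinning a VM's vcpus to cores from one of these lists, plus a matching `nodeset`, is what keeps memory accesses local.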
DZMM Posted November 25, 2018 (edited) On 11/21/2018 at 3:36 PM, Nooke said: Go into BIOS -> OC -> Advanced DRAM Configuration. Scroll down to "Misc Item" and look for "memory interleaving". Change this from "auto" to "channel" and you are in NUMA mode. Brilliant - it worked. The results were interesting. Organising my 4 VMs the way I want looks easy, although the SM961 m.2 drive they share is on the wrong NUMA node for two of them. Luckily, they are the two less important ones (kids). Does NUMA 'tuning' also apply to HDDs? My cache pool drives sdl and sdk are on different NUMA nodes - should I swap sdl's connector with one from sdc-sdj? Once someone confirms where numatune goes in the XML file, I'll start experimenting! Edit: found it <numatune> <memory mode='interleave' nodeset='1'/> </numatune> Edited November 25, 2018 by DZMM (found nodeset location)
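(Editor's note: for anyone else looking for the placement - `<numatune>` is a direct child of `<domain>` in the libvirt XML, conventionally written after `<vcpu>`/`<cputune>` and before `<os>`; libvirt normalizes the element order when the domain is defined, so what matters is that it is not nested inside another element. A skeleton with placeholder name and sizes:)

```xml
<domain type='kvm'>
  <name>example-vm</name>                 <!-- placeholder name -->
  <memory unit='KiB'>8388608</memory>     <!-- 8 GB -->
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <!-- vcpupin entries go here -->
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='1'/>   <!-- keep guest RAM on node 1 -->
  </numatune>
  <os>
    <!-- loader, nvram, boot entries -->
  </os>
  <!-- features, cpu, devices, ... -->
</domain>
```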
bastl Posted November 25, 2018 numastat qemu shows the currently running VMs and their use of RAM. As I said earlier, you can tell the VM with the numatune option which node to use the memory from, but it looks like it isn't fully honored. There are always a couple megs used from the other node. The first VM is set to use 8GB and 4 cores from node0, the second VM is set to use 16GB and 14 cores from node1. <memory mode='strict' nodeset='1'/> Strict should limit the VM to a specific node only. But it doesn't. "preferred" or "interleave" doesn't change anything for me.
DZMM Posted November 25, 2018 11 minutes ago, bastl said: numastat qemu shows the currently running VMs and their use of RAM. As I said earlier, you can tell the VM with the numatune option which node to use the memory from, but it looks like it isn't fully honored. There are always a couple megs used from the other node. The first VM is set to use 8GB and 4 cores from node0, the second VM is set to use 16GB and 14 cores from node1. <memory mode='strict' nodeset='1'/> Strict should limit the VM to a specific node only. But it doesn't. "preferred" or "interleave" doesn't change anything for me. I've just done 3 of my 4 VMs and my memory usage looks the same for interleave:

Per-node process memory usage (in MBs)
PID                       Node 0           Node 1           Total
-----------------------   ---------------  ---------------  ---------------
18388 (qemu-system-x86)   4143.74          167.77           4311.51
29504 (qemu-system-x86)   4864.92          3407.02          8271.94
36634 (qemu-system-x86)   4881.64          3376.45          8258.08
115105 (qemu-system-x86)  8236.66          25.16            8261.83
-----------------------   ---------------  ---------------  ---------------
Total                     22126.96         6976.40          29103.36

18388 is the VM I haven't tweaked yet, as it's my pfSense firewall, so I have to do that when the family aren't doing stuff. Unfortunately all its memory is on Node 0, while its cores are on Node 1. I'm wondering if memory mode isn't working because the memory is already in use, e.g. by unRAID, dockers etc., so if there's not enough free when the VM is created unRAID uses memory from the other node? Otherwise, to allocate say 8GB when there's only 6GB available, it would have to move other blocks around?
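(Editor's note: the numastat figures above can be reduced to an off-node percentage per VM, which makes the problem easier to see at a glance. A throwaway sketch using the values from this post - which node counts as "home" for each PID is inferred from the discussion, so treat the `home_node` mapping as an assumption:)

```python
# Off-node memory share per VM, from the `numastat qemu` table above.
# Values are MB on (node 0, node 1); home_node is an assumption based
# on how the VMs were described in the thread.
usage = {
    18388: (4143.74, 167.77),    # pfSense VM - cores pinned to node 1
    29504: (4864.92, 3407.02),
    36634: (4881.64, 3376.45),
    115105: (8236.66, 25.16),
}
home_node = {18388: 1, 29504: 0, 36634: 0, 115105: 0}

def off_node_pct(pid):
    """Percentage of this VM's RAM allocated on the non-home node."""
    n0, n1 = usage[pid]
    off = n0 if home_node[pid] == 1 else n1
    return 100.0 * off / (n0 + n1)

for pid in sorted(usage):
    print(f"PID {pid}: {off_node_pct(pid):.1f}% of its RAM is off-node")
```

The pfSense VM comes out almost entirely off-node, while the interleaved VMs sit around 40% - consistent with interleave spreading allocations across both nodes.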
bastl Posted November 25, 2018 I also tried the 16GB VM with RAM and cores from node0 only, and it shows the same - it always uses a couple megs from the other node. Don't know why; something is off. 64GB in total, and it happens even without running any Docker containers or other stuff. Can you check one of your VMs with "preferred" instead of "interleave" to see how it behaves? @DZMM Also, I don't know why you're using "interleave" anyway. This forces QEMU to spread the RAM across all nodes. The point of pinning the RAM to a specific node/die is to reduce latency and improve the performance of a VM.
DZMM Posted November 25, 2018 26 minutes ago, bastl said: @DZMM Also, I don't know why you're using "interleave" anyway. This forces QEMU to spread the RAM across all nodes. The point of pinning the RAM to a specific node/die is to reduce latency and improve the performance of a VM. I just followed @Symon's earlier post blindly. All good now; I will set 19218 to Node 1 strict later:

Per-node process memory usage (in MBs)
PID                       Node 0           Node 1           Total
-----------------------   ---------------  ---------------  ---------------
10236 (qemu-system-x86)   8269.48          0.00             8269.48
19218 (qemu-system-x86)   4146.88          181.60           4328.48
124935 (qemu-system-x86)  19.52            8257.12          8276.64
127416 (qemu-system-x86)  8264.96          0.00             8264.97
-----------------------   ---------------  ---------------  ---------------
Total                     20700.85         8438.73          29139.58
Nooke Posted November 25, 2018 1 hour ago, bastl said: numastat qemu shows the currently running VMs and their use of RAM. As I said earlier, you can tell the VM with the numatune option which node to use the memory from, but it looks like it isn't fully honored. There are always a couple megs used from the other node. The first VM is set to use 8GB and 4 cores from node0, the second VM is set to use 16GB and 14 cores from node1. <memory mode='strict' nodeset='1'/> Strict should limit the VM to a specific node only. But it doesn't. "preferred" or "interleave" doesn't change anything for me. I guess you allocated too much RAM to your VM. I'm running with strict memory and my gaming VM only uses memory from the node I set it to. What you have to keep in mind: if you have, say, 32GB overall with 4x8GB sticks in normal quad-channel mode (check your motherboard manual), you can't take exactly 16GB for a VM, because even without running any VM there is already a small amount of memory used on each node. So just check how much memory is available before starting the VM with "numactl --hardware" and then use that much and no more. Cheers
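(Editor's note: Nooke's rule of thumb as code - before strict-pinning a VM to one node, read that node's *free* memory from `numactl --hardware` and stay below it. The sample output is hypothetical, and the 512 MB safety margin is an arbitrary assumption, not a recommendation from the thread:)

```python
import re

# Hypothetical per-node figures from `numactl --hardware` on a 32GB box.
sample = """\
node 0 size: 16034 MB
node 0 free: 14890 MB
node 1 size: 16125 MB
node 1 free: 6012 MB
"""

def max_vm_mb(text, node, margin_mb=512):
    """Largest VM allocation (MB) that fits in `node`'s free memory,
    keeping an arbitrary safety margin."""
    free = {int(n): int(mb) for n, mb in
            re.findall(r"node (\d+) free: (\d+) MB", text)}
    return max(free[node] - margin_mb, 0)

print(max_vm_mb(sample, 1))  # strict-pinning a 16GB VM to node 1 here would fail
```

With figures like these, a 16GB strict allocation on node 1 cannot be satisfied locally, which is exactly the situation where the kernel falls back to the other node.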
Symon Posted November 25, 2018 (edited) 1 hour ago, DZMM said: I just followed @Symon's earlier post blindly For me it works this way. If I use interleave, it uses the configured node for most of the RAM (10 MB always seems to get taken from the other node). If I use strict, the VM takes longer to boot but all of the RAM is on one node. Same as you, I have the problem that my cache disks are on a different node than my GPU. For now I decided to go with the node that the GPU is connected to for the gaming VM. My configuration currently looks like this (cores 1-7 and 18-23 are isolated from unRAID):

<vcpu placement='static'>12</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpupin vcpu='1' cpuset='18'/>
  <vcpupin vcpu='2' cpuset='3'/>
  <vcpupin vcpu='3' cpuset='19'/>
  <vcpupin vcpu='4' cpuset='4'/>
  <vcpupin vcpu='5' cpuset='20'/>
  <vcpupin vcpu='6' cpuset='5'/>
  <vcpupin vcpu='7' cpuset='21'/>
  <vcpupin vcpu='8' cpuset='6'/>
  <vcpupin vcpu='9' cpuset='22'/>
  <vcpupin vcpu='10' cpuset='7'/>
  <vcpupin vcpu='11' cpuset='23'/>
  <emulatorpin cpuset='1,17'/>
</cputune>
<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>
<resource>
  <partition>/machine</partition>
</resource>
<os>
  <type arch='x86_64' machine='pc-i440fx-2.10'>hvm</type>
  <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
  <nvram>/etc/libvirt/qemu/nvram/18da3180-7fb3-6e5c-7013-963dfa89ec0a_VARS-pure-efi.fd</nvram>
</os>
<features>
  <acpi/>
  <apic/>
  <hyperv>
    <relaxed state='on'/>
    <vapic state='on'/>
    <spinlocks state='on' retries='8191'/>
    <vendor_id state='on' value='none'/>
  </hyperv>
</features>
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>EPYC-IBPB</model>
  <topology sockets='1' cores='6' threads='2'/>
  <feature policy='require' name='topoext'/>
  <feature policy='disable' name='monitor'/>
  <feature policy='require' name='x2apic'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='disable' name='svm'/>
</cpu>

On the second node I run 6 x Windows 2016 servers and my wife's Win 10 VM (with numatune). Edited November 25, 2018 by Symon
Symon Posted November 25, 2018 (edited) Tried it with "strict". The VM takes quite a bit longer to start, but now the RAM is completely on one node. I will have to do some tests. Update: After doing a Cinebench test, memory had been moved to the other node again. Update 2: Did a Cinebench test on my wife's VM (which is on the same node as the GPU / cache disks) and the memory stayed on the same node. So my guess is that 'strict' only works if the node is connected to the storage (cache/passthrough disks) as well. Edited November 25, 2018 by Symon
bastl Posted November 25, 2018 @Nooke I have 4x16GB, 2x16 on each die, and if I have nothing running except one VM with 16GB, please explain to me why I 1 hour ago, Nooke said: took too much RAM on my VM. The only reason could be unRAID claiming more than 16GB from node1. 1 hour ago, Symon said: So my guess is that 'strict' only works if the node is connected to the storage (cache/passthrough disks) as well. My GPU and the NVMe controller I pass through to the VM are connected to node1, whose cores and RAM I assigned to the VM, and still it uses RAM from node0 ^^
mikeyosm Posted November 26, 2018 (edited) Guys, need some help. Having read all of the above, I'm not convinced my setup is optimal. Hardware: MSI X399 MEG Creation (latest BIOS as of today) - TR 2950X - HyperX 3200 32GB (2x16). BIOS: Channel enabled (from Auto). unRAID 6.6.5. VM: W10, 12GB mem, 6 cores (12 threads). I haven't made any modifications (numatune, EPYC) to the XML for my W10 VM and ran an AIDA64 memory bench. The stats are poor, right? I know I'm only using 12 threads, but still, my memory scores should be better? I see that the topic mainly relates to the 2990WX but I have a 2950X. Any tips on what I need to check or change would be appreciated. (Benchmark screenshots attached: unRAID vs. bare metal.) Edited November 26, 2018 by mikeyosm (added physical mem bench)
TType85 Posted November 26, 2018 I have been having issues getting the cache to show up right using this tweak. I ended up creating a new VM and the cache showed up correctly, so I created a new VM XML for my existing Windows install and now it shows correctly. I will have to test games when I get home.
Jerky_san Posted November 26, 2018 8 hours ago, mikeyosm said: Guys, need some help. Having read all of the above, I'm not convinced my setup is optimal. Hardware: MSI X399 MEG Creation (latest BIOS as of today) - TR 2950X - HyperX 3200 32GB (2x16). BIOS: Channel enabled (from Auto). unRAID 6.6.5. VM: W10, 12GB mem, 6 cores (12 threads). I haven't made any modifications (numatune, EPYC) to the XML for my W10 VM and ran an AIDA64 memory bench. The stats are poor, right? I know I'm only using 12 threads, but still, my memory scores should be better? I see that the topic mainly relates to the 2990WX but I have a 2950X. Any tips on what I need to check or change would be appreciated. So you may be allocating RAM from the other die. If your CPUs are all on a single die, then you need to use the nodeset parameter and set it to the node the VM is running on - on a 2950X I'm guessing it's just 0 or 1. You should also try the EPYC fix, though your latency isn't nearly as high as mine was, so you probably want to check first whether your cache is out of whack. Mine definitely was. <numatune> <memory mode='static' nodeset='0'/> </numatune> Please read the two posts from me earlier in the thread.
mikeyosm Posted November 26, 2018 1 hour ago, Jerky_san said: So you may be allocating RAM from the other die. If your CPUs are all on a single die, then you need to use the nodeset parameter and set it to the node the VM is running on - on a 2950X I'm guessing it's just 0 or 1. You should also try the EPYC fix, though your latency isn't nearly as high as mine was, so you probably want to check first whether your cache is out of whack. Mine definitely was. <numatune> <memory mode='static' nodeset='0'/> </numatune> I'll try that, thank you. BTW, should that be 'strict', not 'static', for numatune?
Jerky_san Posted November 26, 2018 1 hour ago, mikeyosm said: I'll try that, thank you. BTW, should that be 'strict', not 'static', for numatune? Yeah, sorry about that one
Jerky_san Posted November 27, 2018 8 hours ago, TType85 said: I have been having issues getting the cache to show up right using this tweak. I ended up creating a new VM and the cache showed up correctly, so I created a new VM XML for my existing Windows install and now it shows correctly. I will have to test games when I get home. I hope it turns out great for you.
TType85 Posted November 27, 2018 1 hour ago, Jerky_san said: I hope it turns out great for you. Wow, it runs a little bit smoother - not much frame-rate improvement, but it *feels* a bit smoother.
mikeyosm Posted November 27, 2018 (edited) 9 hours ago, Jerky_san said: Yeah, sorry about that one So, I added the EPYC section and numatune strict, but performance was no better. So I changed to interleave, and there is an improvement in the L3 bench. Still not happy with memory read/write/copy being half the bare metal performance. Edited November 27, 2018 by mikeyosm
mikeyosm Posted November 27, 2018 6 hours ago, TType85 said: Wow, it runs a little bit smoother - not much frame-rate improvement, but it *feels* a bit smoother. What did the cache look like before you created the new XML, and how does it look now? A comparison would be good so I can troubleshoot my performance issues. Thanks.
bastl Posted November 27, 2018 @mikeyosm You're only using 2x16GB of RAM. You can't achieve bare metal dual-channel memory speed if you limit a VM to one channel only. Just sayin'.
mikeyosm Posted November 27, 2018 1 minute ago, bastl said: @mikeyosm You're only using 2x16GB of RAM. You can't achieve bare metal dual-channel memory speed if you limit a VM to one channel only. Just sayin'. Understood - so how do I make use of dual channel with a VM?
bastl Posted November 27, 2018 (edited) Don't use the numatune option if you want the bandwidth of both channels, or set it to "interleave" and specify both nodes. <numatune> <memory mode='interleave' nodeset='0-1'/> </numatune> It's up to you what works better for you. It depends on the application and the games: some benefit from the higher bandwidth, some from the lower latency. Edited November 27, 2018 by bastl (typo)
mikeyosm Posted November 27, 2018 14 minutes ago, bastl said: Don't use the numatune option if you want the bandwidth of both channels, or set it to "interleave" and specify both nodes. <numatune> <memory mode='interleave' nodeset='0-1'/> </numatune> It's up to you what works better for you. It depends on the application and the games: some benefit from the higher bandwidth, some from the lower latency. OK, makes sense. However, before I made any mods to the XML or set my BIOS to channel from auto, memory read/write/copy in AIDA64 was exactly the same (half the bare metal memory performance). Adding numatune and EPYC has not impacted read/write/copy - it only improved the L3 cache a bit.
bastl Posted November 27, 2018 The numatune setting only works if you set your BIOS settings for the RAM to channel mode. On all TR4 boards I've seen so far the default setting is auto, and the CPU is presented as one node to the OS. You might have to check your board's manual for how to set up the RAM slots correctly for 2 DIMMs. You're the first one I've seen here in the forum with only 2 DIMMs on the TR4 platform.