**VIDEO GUIDE** How To Use LSTOPO for Better VM Performance on Multi CPU and Threadripper Systems


SpaceInvaderOne


I want to make sure I've understood this correctly:

1) If I interpret this correctly, I only have ONE node on my 1920X. So I have fewer complexity/latency issues to worry about, yes?

2) In the Core diagrams, #0 and #12 are the logical and SMT threads of the same core, correct? I want to make sure I'm pinning the correct cores/threads to my VM.


That said, any suggestions for improving the layout? My PCIe slots are currently populated in this physical order:

GTX 210
PCIe Empty
LSI 9211 8i
Titan Xp
PCIe Empty
Mellanox 10GbE card

At some point I'm looking to add a PCIe video capture card in the empty slot under the 210, so my layout will be pretty full and I don't have many options for shuffling cards between slots.
 

topology_markup.png

Edited by DayspringGaming
4 hours ago, DayspringGaming said:

1) If I interpret this correctly, I only have ONE node on my 1920X. So I have fewer complexity/latency issues to worry about, yes?

NO. Every Threadripper chip has at least 2 nodes (2 dies). Otherwise you wouldn't have quad-channel memory or more PCIe lanes available on X399 compared with the Ryzen chips. Go into your BIOS and change the memory configuration from UMA to NUMA. For me, the setting is somewhere in the AMD CBS settings. The available options are Auto, Die, Channel, Socket and None; the default is "Auto", which reports only 1 node. Set it to "Channel" (NUMA mode) and both dies will be reported separately.
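To check that the BIOS change took effect without reading the diagram, you can also ask the kernel how many NUMA nodes it sees. Below is a minimal Python sketch, assuming a Linux kernel with NUMA support that exposes `/sys/devices/system/node/online` (a list such as `0` or `0-1`); `parse_cpulist` and `numa_node_count` are hypothetical helper names, not part of lstopo or Unraid. Going by the post above, a 1920X left on "Auto" should give 1 and "Channel" should give 2.

```python
from pathlib import Path


def parse_cpulist(spec):
    """Expand a kernel list string such as "0-1" or "0,2-3" into a list of ints."""
    ids = []
    for part in spec.strip().split(","):
        if not part:
            continue  # tolerate empty input or a trailing comma
        if "-" in part:
            first, last = part.split("-")
            ids.extend(range(int(first), int(last) + 1))
        else:
            ids.append(int(part))
    return ids


def numa_node_count(online_path="/sys/devices/system/node/online"):
    """Number of NUMA nodes the running kernel reports as online."""
    return len(parse_cpulist(Path(online_path).read_text()))
```

The "available: N nodes" line of `numactl --hardware` reports the same count.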

 

4 hours ago, DayspringGaming said:

2) In the Core diagrams, #0 and #12 are the logical and SMT threads of the same core, correct? I want to make sure I'm pinning the correct cores/threads to my VM.

How the core pairings are presented to Unraid depends on the board manufacturer and BIOS version. It's also possible that after switching to NUMA mode the core pairings are shown differently than in UMA mode.
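Since the presentation can vary, one way to confirm the pairing is to read it directly from the kernel. Below is a Python sketch, assuming the standard Linux sysfs file `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, which lists the logical CPUs sharing core N. `smt_sibling_groups` is a hypothetical helper name. If #0 and #12 are the two threads of one core, `(0, 12)` appears in the result.

```python
from pathlib import Path


def parse_cpulist(spec):
    """Expand a kernel list string such as "0,12" or "0-1" into a list of ints."""
    ids = []
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            ids.extend(range(int(first), int(last) + 1))
        else:
            ids.append(int(part))
    return ids


def smt_sibling_groups(cpu_root="/sys/devices/system/cpu"):
    """Return each group of logical CPUs that share one physical core, e.g. (0, 12)."""
    groups = set()
    # Each logical CPU lists itself plus its SMT sibling(s); the set removes the
    # duplicate that the sibling contributes.
    for path in Path(cpu_root).glob("cpu[0-9]*/topology/thread_siblings_list"):
        groups.add(tuple(parse_cpulist(path.read_text())))
    return sorted(groups)
```

Re-run it after changing the BIOS memory mode to see whether the pairing changed.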

 

4 hours ago, DayspringGaming said:

That said, any suggestions for improving the layout? My PCIe slots are currently populated in this physical order:

After switching to NUMA mode, lstopo should show which node each device is directly connected to.
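The kernel also exposes each PCI device's node in sysfs, which is handy for double-checking specific cards. Below is a Python sketch, assuming the standard `/sys/bus/pci/devices/<address>/numa_node` attribute. `pci_numa_nodes` is a hypothetical helper name. You can match the addresses against `lspci` output to identify the GPU, HBA or NIC.

```python
from pathlib import Path


def pci_numa_nodes(pci_root="/sys/bus/pci/devices"):
    """Map each PCI address (e.g. "0000:0a:00.0") to the NUMA node the kernel reports.

    A value of -1 means the kernel has no node affinity for the device,
    typically because the firmware did not provide one.
    """
    nodes = {}
    for dev in sorted(Path(pci_root).iterdir()):
        node_file = dev / "numa_node"
        if node_file.exists():
            nodes[dev.name] = int(node_file.read_text())  # int() ignores the trailing newline
    return nodes
```

Devices on the same node as the CPUs you pin to the VM avoid the cross-die hop the replies above describe.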


@DayspringGaming Ever wondered what Game Mode in Ryzen Master does? Exactly this. Otherwise Windows sees only 1 chip with all the cores: it doesn't know there are 2 nodes underneath, and it doesn't know the exact topology of where the devices are directly attached. Memory management is also different, and so is the related memory latency issue, especially on first-gen TR4. Search for NUMA vs. UMA and you'll find a couple of things that explain it better. For virtualisation, especially if you want to fine-tune and optimise a VM's performance, you have to know about this; otherwise you will always have some bottlenecks and won't know where they come from.

  • 8 months later...

Sorry for the thread bump; hopefully this is the right place for this. I'm struggling to decipher the lstopo output for my mobo.

 

I was (finally) able to set up a Windows VM with GPU passthrough on my EVGA SR-2 motherboard, despite its NF200 PCIe multiplexer chips. For those who don't know this board: it's an odd LGA1366 / Intel 5520-era part that is dual-CPU but has only a single chipset, intended for overclockers and people running quad-SLI, which supposedly means all the PCIe goes through one CPU. I got the VM running, but performance is a bit disappointing, so I'm in the tweaking and tuning stage.

However, when I run lstopo, here's the diagram I get back (no ACS override or unsafe interrupts enabled). Each CPU is in its own NUMA node, which makes sense, but all the I/O and PCIe devices are just floating oddly outside them. The terminal output lists all of it under "HostBridge L#0", but that isn't visible on the diagram.

So here are my questions:

  1. If I'm understanding this correctly, does that mean all the IO/PCIe/etc. is connected only to NUMA node P#0?
  2. If so, why isn't the IO/PCIe stuff inside the green box for node P#0, as in other diagrams in this thread?
  3. Would it be somewhat accurate to think of this topology as half of a Threadripper 2970WX? I.e., there are two 'dies', one of which has PCIe access (though both have memory access here), so similar tuning tricks may apply?
  4. Finally, for CPU allocation: my original goal for the system was a 4-gamer VM tower for group parties, if/when those happen again after 'Rona. Would it make sense to split CPUs between the two nodes, i.e. allocate 1-2 cores from the PCIe-connected CPU plus 1-2 from the non-connected one per VM, since I don't have a lot of cores to go around?
  5. OR is it fine to keep each VM on a single CPU with 2-3 cores, since the PCIe/IO stuff isn't directly shown as belonging to a given node?
    1. Some physical evidence for this option: the board will boot to a GPU with one CPU disabled via the enable/disable jumpers on the board. I found that out while troubleshooting other things, so my thinking is that the PCIe/IO may be addressable by either CPU?
  6. Or am I better off with a different motherboard, given two NUMA nodes, only one of which is PCIe-connected?

Thanks all!

 

Not sure how much of this is useful, so here's a bunch.

 

lstopo output:

BadDecisionsSR2_STOCK_IOMMU_4.png

 

lstopo text:

lstopo -s
depth 0:        1 Machine (type #1)
 depth 1:       2 NUMANode (type #2)
  depth 2:      2 Package (type #3)
   depth 3:     2 L3Cache (type #4)
    depth 4:    12 L2Cache (type #4)
     depth 5:   12 L1dCache (type #4)
      depth 6:  12 L1iCache (type #4)
       depth 7: 12 Core (type #5)
        depth 8:        24 PU (type #6)
Special depth -3:       13 Bridge (type #9)
Special depth -4:       11 PCI Device (type #10)
Special depth -5:       6 OS Device (type #11)

-------

lstopo
Machine (47GB total)
  NUMANode L#0 (P#0 24GB) + Package L#0 + L3 L#0 (12MB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#12)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#13)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#2)
      PU L#5 (P#14)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#3)
      PU L#7 (P#15)
    L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
      PU L#8 (P#4)
      PU L#9 (P#16)
    L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
      PU L#10 (P#5)
      PU L#11 (P#17)
  NUMANode L#1 (P#1 24GB) + Package L#1 + L3 L#1 (12MB)
    L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
      PU L#12 (P#6)
      PU L#13 (P#18)
    L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#19)
    L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8
      PU L#16 (P#8)
      PU L#17 (P#20)
    L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9
      PU L#18 (P#9)
      PU L#19 (P#21)
    L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
      PU L#20 (P#10)
      PU L#21 (P#22)
    L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
      PU L#22 (P#11)
      PU L#23 (P#23)
  HostBridge L#0
    PCIBridge
      PCI 1b4b:9123
        Block(Disk) L#0 "sdd"
        Block(Disk) L#1 "sde"
      PCI 1b4b:91a4
    PCIBridge
      PCIBridge
        PCIBridge
          PCI 10de:13c0
        PCIBridge
          PCI 1000:0072
        PCIBridge
          PCI 1002:67b1
    PCIBridge
      PCIBridge
        PCIBridge
          PCI 1002:67b1
    PCIBridge
      2 x { PCI 197b:2363 }
    PCIBridge
      PCI 11ab:4380
        Net L#2 "eth0"
    PCIBridge
      PCI 11ab:4380
        Net L#3 "eth1"
    PCI 8086:3a22
      Block(Disk) L#4 "sdc"
      Block(Disk) L#5 "sdb"
--------
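For the CPU allocation questions it helps to have each node's core/thread pairs as plain lists. Below is a Python sketch that parses lstopo's text output in the same console format as posted above, assuming each NUMANode line comes before the cores it contains. `cores_by_node` is a hypothetical helper name. For this output it gives node 0 the pairs (0, 12) through (5, 17) and node 1 the pairs (6, 18) through (11, 23).

```python
import re

NODE_RE = re.compile(r"NUMANode L#\d+ \(P#(\d+)")
CORE_RE = re.compile(r"\bCore L#\d+")
PU_RE = re.compile(r"\bPU L#\d+ \(P#(\d+)\)")


def cores_by_node(lstopo_text):
    """Group the OS CPU numbers (P#) of each core under its NUMA node's P# index."""
    nodes = {}
    node = None
    core = None
    for line in lstopo_text.splitlines():
        match = NODE_RE.search(line)
        if match:
            node = int(match.group(1))
            nodes.setdefault(node, [])
            core = None
        if node is None:
            continue  # lines before the first NUMANode, e.g. "Machine (...)"
        if CORE_RE.search(line):
            core = []
            nodes[node].append(core)
        if core is not None:
            # A PU may sit on its own line or be merged onto the Core line.
            core.extend(int(p) for p in PU_RE.findall(line))
    return {n: [tuple(c) for c in cores] for n, cores in nodes.items()}
```

Pass it the text that `lstopo` prints in the terminal; the I/O lines under HostBridge contain no PU entries, so they are ignored.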

numactl output

numactl -s
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 
cpubind: 0 1 
nodebind: 0 1 
membind: 0 1 
------

numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 24164 MB
node 0 free: 23133 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 24189 MB
node 1 free: 23396 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

numastat

numastat -c

Per-node numastat info (in MBs):
                Node 0 Node 1 Total
                ------ ------ -----
Numa_Hit          5840  10566 16406
Numa_Miss            0      0     0
Numa_Foreign         0      0     0
Interleave_Hit     576    579  1155
Local_Node        5838   9983 15821
Other_Node           3    583   586

 

Edited by bimmerman
  • 1 year later...

If I wanted to update my topology diagram, how would I go about doing so? I've made some changes, but I get an error left over from setting up topo a while ago:

 

ln -s /lib64/libudev.so.1 /lib64/libudev.so.0
ln: failed to create symbolic link '/lib64/libudev.so.0': File exists

 

So which file do I delete to generate the new topo?

 

 

 

  • 11 months later...

I tried this yesterday and found the YouTube links had been removed. Coming back here, I saw an edit saying they were no longer needed because it's now installed natively. The command still didn't work, so I had to install the following packages with un-get:
 


hwloc
cairo
fontconfig
freetype
harfbuzz
libX11
libXau
libXdamage
libXdmcp
libXext
libXfixes
libXrender
libXxf86vm
libfontenc
libxcb
libxshmfence
mesa
libSM
libICE
graphite2


The following are already included in the latest Unraid version, 6.11.5:
 


libdrm
libpng


Afterwards, run:
 


un-get remove hwloc cairo fontconfig freetype harfbuzz libX11 libXau libXdamage libXdmcp libXext libXfixes libXrender libXxf86vm libfontenc libxcb libxshmfence mesa libSM libICE graphite2

 

and reboot the server. (All of this follows SpaceInvaderOne's video, of course.)

Edited by slimshizn
