ASUS ROG Zenith Extreme Alpha X399


Recommended Posts

41 minutes ago, authorleon said:

Hello John,

 

I should give you an update on what's happened with my install. Unfortunately I had some water damage, so I had to go through the insurance company and buy a new motherboard and memory.

 

I have a few questions for you if I may:

 

  1. What version of the BIOS are you running? I have installed the latest one, version 2001.
  2. When installing a VM, it installs incredibly slowly. What are your BIOS settings and any other special settings? Please note I am using only an SSD at the moment, and I would like to get it working first before I try using the NVMe.
  3. I am not using a second video card just yet, but when installing Windows 10 it is incredibly slow, so I'm not quite sure what I'm doing wrong.

Thank you very much.

Fixed it. I had the OLD bios. LOL... 

 

Thanks

Link to comment

@jbartlett are you planning on upgrading to Threadripper 3 or staying with the 2990WX? Just wondering, because if you are I'd like to read about how well it does. I jumped on the 2990WX early and kind of regret it due to its quirky performance with more than a single NUMA node. Even tuning it is fairly difficult, since Unraid seems to always want to allocate RAM from NUMA node 0 instead of balancing it like I specified in the config.

Link to comment
26 minutes ago, Jerky_san said:

@jbartlett are you planning on upgrading to Threadripper 3 or staying with the 2990WX? Just wondering, because if you are I'd like to read about how well it does. I jumped on the 2990WX early and kind of regret it due to its quirky performance with more than a single NUMA node. Even tuning it is fairly difficult, since Unraid seems to always want to allocate RAM from NUMA node 0 instead of balancing it like I specified in the config.

Well that means also getting a new motherboard. Because of the new memory configuration in the CPU, a new board will be required. :((

Link to comment
13 hours ago, authorleon said:

Well that means also getting a new motherboard. Because of the new memory configuration in the CPU, a new board will be required. :((

While the socket size is identical to TR2, the TR3 socket has remapped several of the pins to optimize trace lengths.

 

13 hours ago, Jerky_san said:

@jbartlett are you planning on upgrading to Threadripper 3 or staying with the 2990WX? Just wondering, because if you are I'd like to read about how well it does. I jumped on the 2990WX early and kind of regret it due to its quirky performance with more than a single NUMA node. Even tuning it is fairly difficult, since Unraid seems to always want to allocate RAM from NUMA node 0 instead of balancing it like I specified in the config.

I haven't had any issues with the 2990WX chip. I've got eight Win10 VMs configured and running at the same time, one of which spans NUMA 0 & 2 (the ones with the direct PCIe/RAM connections). I had all 8 uploading a feed to YouTube at the same time. Cam 1 has 12 cores, Cams 2-4 have 4 cores each, and PiP Cams 1-4 each have 1 core.

 

Do you have the VM's installed on spinners or SSD's? If it's spinners, that's likely your problem.

Link to comment

Hello John, 

 

As you have become somewhat of an expert on this matter, and we have the same hardware, I would like to ask you a question, if I may.

  1. PCI Slot 1 - GTX 1080Ti
  2. PCI Slot 2 - GTX 1080Ti
  3. PCI Slot 3 - GTX 1080Ti
  4. PCI Slot 4 - Elgato 4k 60 Pro Capture Card
  5. PCI Slot 5 - Black Magic Deck Link Duo 2

Do you think it is viable to have the following?

 

  1. VM 1 - Gaming (GTX 1080Ti) 8 Cores
  2. VM 2 - Video Editing (GTX 1080Ti) 24 Cores
  3. VM 3 - Streaming Machine (GTX 1080Ti, Elgato 4k 60 Pro Capture Card, Black Magic Deck Link Duo 2) 24 Cores
  4. VM 4 - Small automation tasks (the rest of the cores)

 

The reason I am asking is that I know a lot of time will be needed to set this kind of system up, so I am curious to see whether this is viable from your perspective.

 

If it is not viable, can you suggest a better setup, or should I just have two separate machines?

 

Thank you very much.

64Core.PNG

Link to comment
19 minutes ago, authorleon said:

If it is not viable, can you suggest a better setup, or should I just have two separate machines?

Not John, but it doesn't take John to notice it's not viable right off the bat. You need single-slot-width cards for the middle 2 GPUs, and I don't remember any single-slot 1080Ti being widely available (if at all, outside of China). The cards in your illustration are Zotac Minis, which are definitely dual-slot.

(Not wanting to sound harsh, but if you don't know why dual-slot cards would not work in your config then you might want to watch a few more YouTube videos on PC building.)

 

Also, you don't need a 1080Ti for streaming or video editing.

Unless there are separate people working for you for video editing and streaming, there's also no reason why you need separate VM's for video editing and streaming. I can see merit in having a separate gaming VM for Threadripper due to NUMA nodes but certainly not streaming / video editing.

Link to comment
15 minutes ago, testdasi said:

Not John, but it doesn't take John to notice it's not viable right off the bat. You need single-slot-width cards for the middle 2 GPUs, and I don't remember any single-slot 1080Ti being widely available (if at all, outside of China). The cards in your illustration are Zotac Minis, which are definitely dual-slot. (NO PROBLEM HERE - I have long PCIe riser ribbons)

(Not wanting to sound harsh (Not at all), but if you don't know why dual-slot cards would not work in your config then you might want to watch a few more YouTube videos on PC building.) (NO PROBLEM HERE - I have long PCIe riser ribbons; check out the Thermaltake Core P3 ATX)

 

Also, you don't need a 1080Ti for streaming or video editing. (The extra CUDA cores help, as do other applications we use. Furthermore, it is a good idea to use all the same cards for the custom water loop I am building.)

Unless there are separate people working for you for video editing and streaming, there's also no reason why you need separate VM's for video editing and streaming. I can see merit in having a separate gaming VM for Threadripper due to NUMA nodes but certainly not streaming / video editing.   (Multiple people working at the same time)

 

Thank you for the input.

Edited by authorleon
Link to comment
8 minutes ago, authorleon said:

 

Thank you for the input.

Ok, in that case, you will need to note the below:

  • You will certainly need ACS Override for it to work, since I'm pretty sure the PCIe x4 slot is connected to the chipset, so it's in the same IOMMU group as your LAN, WiFi etc. The x8 and M.2 slots tend to be in the same group too, so you need ACS Override to break them out.
  • Since you are passing through a 1080Ti as primary (i.e. what Unraid booted with), also expect a potential run-in with error code 43. You can try to mitigate this by booting Unraid in legacy mode, vfio-stubbing the GPU, turning off Hyper-V in the VM template and dumping your own vBIOS, and hopefully all is well.
  • Based on your plan, I'm assuming you are doing a 2990WX, for which your core assignment isn't ideal. The 2990WX has 32 physical cores broken down into 4 chiplets (each chiplet has 2 CCXs, each CCX has 4 cores); only 2 of the chiplets have PCIe and memory connections.
    • So your main gaming VM should use exactly 8 cores (1 full chiplet) that connect directly to your GPU's PCIe slot for lowest latency (see the XML sketch after this list). Anything else using the same chiplet while you game will introduce some variance to your frame rate, which can range from unnoticeable to freaking annoying.
    • The remaining chiplet with a PCIe connection needs 1 core reserved for Unraid (core 0 is almost always on chiplet 0, which is almost always the one with the PCIe connection). So you are left with 7 cores with a PCIe connection to split between your streaming and video-editing VMs. I would suggest you give 3 cores (from a single CCX) to the video editing and the remaining 4 cores (1 full CCX) to the streaming.
      • Alternatively, you can assign 6 physical cores to the streaming VM (evenly spread across 2 CCXs) and the remaining 1 to the video editing VM. The latency is less noticeable with video editing than streaming (here I'm assuming you are doing live streaming on Twitch / YouTube).
    • The remaining chiplet cores (without PCIe and memory connections) can be split across VMs as you see fit, but for best performance (especially for the streaming VM) you don't want to split a chiplet across multiple VMs, and within the same chiplet you want to spread your cores as evenly across the CCXs as possible (to even out the load).
  • Need pictures once you are done with the custom loop. Would be a nice build.
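
To make the GPU and core-assignment bullets above concrete, here is a minimal libvirt XML sketch of roughly what the gaming VM could look like in the Unraid template's XML view. The host CPU numbers (8-15 / 40-47 standing in for one full chiplet), the GPU's PCI address and the vBIOS path are all placeholders - the real values have to come from lstopo, the System Devices page and your own vBIOS dump.

  <!-- Sketch only, placeholder values throughout. One full chiplet:
       8 physical cores plus their SMT siblings, assumed here to be
       host CPUs 8-15 / 40-47. -->
  <vcpu placement='static'>16</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='8'/>
    <vcpupin vcpu='1' cpuset='40'/>
    <vcpupin vcpu='2' cpuset='9'/>
    <vcpupin vcpu='3' cpuset='41'/>
    <vcpupin vcpu='4' cpuset='10'/>
    <vcpupin vcpu='5' cpuset='42'/>
    <vcpupin vcpu='6' cpuset='11'/>
    <vcpupin vcpu='7' cpuset='43'/>
    <vcpupin vcpu='8' cpuset='12'/>
    <vcpupin vcpu='9' cpuset='44'/>
    <vcpupin vcpu='10' cpuset='13'/>
    <vcpupin vcpu='11' cpuset='45'/>
    <vcpupin vcpu='12' cpuset='14'/>
    <vcpupin vcpu='13' cpuset='46'/>
    <vcpupin vcpu='14' cpuset='15'/>
    <vcpupin vcpu='15' cpuset='47'/>
  </cputune>

  <!-- Passed-through 1080Ti with a manually dumped vBIOS attached,
       one of the usual error code 43 mitigations. Bus/slot and rom
       path below are made-up examples. -->
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
    </source>
    <rom file='/mnt/user/isos/vbios/gtx1080ti.rom'/>
  </hostdev>

The physical-core / SMT-sibling pairing (8/40, 9/41, ...) follows the same style as the pinning posted later in this thread.
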
Link to comment
Just now, testdasi said:

Ok, in that case, you will need to note the below: (Thank you. So this looks like a go-ahead and it should be viable.)

  • You will certainly need ACS Override for it to work, since I'm pretty sure the PCIe x4 slot is connected to the chipset, so it's in the same IOMMU group as your LAN, WiFi etc. The x8 and M.2 slots tend to be in the same group too, so you need ACS Override to break them out. (Understood. Should I disable WiFi / USB 3.1 to save on system resources? In reality I'm not going to be hot-plugging any USB devices or anything like that, to be honest.)
  • Since you are passing through a 1080Ti as primary (i.e. what Unraid booted with), also expect a potential run-in with error code 43. You can try to mitigate this by booting Unraid in legacy mode, vfio-stubbing the GPU, turning off Hyper-V in the VM template and dumping your own vBIOS, and hopefully all is well. (Understood. Before the water damage I actually had two cards working perfectly with no error code 43 - two separate machines, with one GTX 1080 and a 1080Ti. Unfortunately after the water damage the motherboard was completely destroyed, as well as the USB stick, and I had no backup :( so I am building everything again from scratch.)
  • Based on your plan, I'm assuming you are doing a 2990WX, for which your core assignment isn't ideal. The 2990WX has 32 physical cores broken down into 4 chiplets (each chiplet has 2 CCXs, each CCX has 4 cores); only 2 of the chiplets have PCIe and memory connections. (Understood, thank you for making that clear.)
    • So your main gaming VM should use exactly 8 cores (1 full chiplet) that connect directly to your GPU's PCIe slot for lowest latency. Anything else using the same chiplet while you game will introduce some variance to your frame rate, which can range from unnoticeable to freaking annoying. (Understood, thank you for making that clear.)
    • The remaining chiplet with a PCIe connection needs 1 core reserved for Unraid (core 0 is almost always on chiplet 0, which is almost always the one with the PCIe connection). So you are left with 7 cores with a PCIe connection to split between your streaming and video-editing VMs. I would suggest you give 3 cores (from a single CCX) to the video editing and the remaining 4 cores (1 full CCX) to the streaming. (I have this information in Excel, just so I have it clear in mind.)
      • Alternatively, you can assign 6 physical cores to the streaming VM (evenly spread across 2 CCXs) and the remaining 1 to the video editing VM. The latency is less noticeable with video editing than streaming (here I'm assuming you are doing live streaming on Twitch / YouTube). (Correct. Understood.)
    • The remaining chiplet cores (without PCIe and memory connections) can be split across VMs as you see fit, but for best performance (especially for the streaming VM) you don't want to split a chiplet across multiple VMs, and within the same chiplet you want to spread your cores as evenly across the CCXs as possible (to even out the load). (Understood)
  • Need pictures once you are done with the custom loop. Would be a nice build. (Of course)

 

At the moment it is just a temporary setup, as you can see in the picture. I have this loop only to cool the CPU while I am building everything. Once everything is working and stable, I will proceed to create the hard-tubing dual-loop system, which has the extra advantage of exhausting the heat into the room where I am working, to conserve electricity and make use of the heat.

 

 

 

3fa025d2-a9e6-42e6-a560-5bbf6deebf18.jpg

Link to comment

No need to disable WiFi and USB 3.1. They barely use any resources, if any at all.

In fact, on the subject of USB:

  • The X399 chipset has 2x USB 3.0 controllers that can be passed through to your VMs. At the back of the mobo you will see 2 groups of 4x USB 3.0 ports that are obviously "together", so to speak - those are the ones.
    • Even though you are not hot-plugging devices, I would suggest passing one controller each to your gaming and streaming VMs (see the <hostdev> sketch after this list) - unless your video editing VM has issues with USB ports; someone on here reported that his external sound card errors out when connected the normal way, i.e. not through a passed-through USB controller.
    • This is especially important for the gaming VM. A passed-through USB controller has the lowest (albeit theoretical) latency regardless of devices.
  • The USB 3.1 controller cannot be passed through even with ACS Override. I have not had any success and have not seen any success story. IIRC, the USB 2.0 and internal ports are also connected to the 3.1 controller.
  • Connect your Unraid USB stick to a USB 2.0 port (you can get an internal header to USB 2.0 adapter).
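
As a rough illustration of the controller passthrough suggested above (a sketch, not a config lifted from this thread): each onboard USB 3.0 controller is just another PCI device, so it can be handed to a VM with a <hostdev> entry in the template's XML. The address below is a placeholder - take the real bus/slot/function for your board from the System Devices page.

  <hostdev mode='subsystem' type='pci' managed='yes'>
    <!-- Placeholder address for one of the two onboard USB 3.0 controllers -->
    <source>
      <address domain='0x0000' bus='0x42' slot='0x00' function='0x3'/>
    </source>
  </hostdev>

Anything plugged into that controller's group of rear ports then shows up natively inside the VM, which is where the low-latency behaviour mentioned above comes from.
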
Link to comment
6 hours ago, jbartlett said:

While the socket size is identical to TR2, the TR3 socket has remapped several of the pins to optimize trace lengths.

 

I haven't had any issues with the 2990WX chip. I've got eight Win10 VMs configured and running at the same time, one of which spans NUMA 0 & 2 (the ones with the direct PCIe/RAM connections). I had all 8 uploading a feed to YouTube at the same time. Cam 1 has 12 cores, Cams 2-4 have 4 cores each, and PiP Cams 1-4 each have 1 core.

 

Do you have the VM's installed on spinners or SSD's? If it's spinners, that's likely your problem.

I pass NVMe drives to it. I was more or less referring to spanning multiple NUMA nodes without performance problems, and some minor hitching in games, though last night I think I made a breakthrough. I was wondering what kind of performance you all get with your 2990WXs when spanning two NUMA nodes?

Last night I was playing around with my secondary VM's config and I decided to work on the back-end NUMA presentation more. My problem has always been that I couldn't get memory to span properly between the two NUMA nodes, and I think I finally made a major breakthrough on that. Ironically, when I started the machine I accidentally killed my current VM because there wasn't enough RAM on NUMA 0 to start the new one with the config I'd set, lol. So here is the documentation I took along the way. If you believe I've made some progress then I'll write a better guide about it. Also, the question was more because if you (or someone else on here) were going to get one, I was going to wait and see how well they perform before I consider jumping, but the 80 platform interests me too if it supports ECC.

 

Opinions welcome - please be gentle ^_^;

 

Configuration

2990WX

Asus Zenith X399

128 GB of G.Skill Samsung B-die RAM

 

20 Dockers running, including Plex with 4 transcodes happening during the VM tests.

 

 

PBO level 3 is enabled for all tests. These VM tests reflect my settings, which physically divide NUMA node memory as I state; one node will always have slightly more due to hypervisor overhead (I assume) - 117809 machine

image.png.018f81078184d1ec4cf0e5a77d0bfd95.png

 

Bare metal before RAM OC

start.PNG.27177d350acaec6a54568aea9fe554a5.PNG

 

RAM is OC'd to 2800 14-14-14-34 CR1 with very tight timings - still bare metal. Note that memory latency varies to some degree, between 69-74 ns; the same goes for reads/writes.

2800.PNG.34438e89d875e3dd2c0bfb76939c0abc.PNG

 

Last night, with the XML changes to back-end NUMA presentation - L3 reads are low for some reason. I haven't figured this out yet, but it could be because 1 of my cores is always being used for Unraid.

1754414317_virtiochanges-MEMORY.PNG.e8329012f7c2d9c6a67db842d1e30dc8.PNG

 

CPUZ baremetal

CPUZ.PNG.b85f75639acce7221c1da82e68a625ce.PNG

 

CPUZ with 2 NUMA nodes - 30 cores - 1 SMT core reserved for Unraid (core 0/31) - test taken while Plex was being utilized on other cores, so PBO was unable to boost as high, I believe. NUMA 0 (0/31 missing) & NUMA 2 (fully allocated)

1249919639_virtiochanges-CPUZ.PNG.8db779070f399c69d46de91504001312.PNG

Edited by Jerky_san
Link to comment
42 minutes ago, Jerky_san said:

I pass NVMe drives to it. I was more or less referring to spanning multiple NUMA nodes without performance problems, and some minor hitching in games, though last night I think I made a breakthrough.

 

...

 

Last night I was playing around with my secondary VM's config and I decided to work on the back-end NUMA presentation more. My problem has always been that I couldn't get memory to span properly between the two NUMA nodes, and I think I finally made a major breakthrough on that.

 

...

 

image.png.018f81078184d1ec4cf0e5a77d0bfd95.png

 

How did you do it please?

My workaround has been to start a dummy VM to reserve X amount of RAM from the node which then would force the actual VM to spread evenly across nodes.

Would love to not have to do that. LOL
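
For illustration, the dummy-VM workaround described above boils down to a throwaway VM whose memory is strictly bound to the crowded node, so the real VM is forced to pull its RAM from elsewhere. A minimal sketch, with a made-up size and node:

  <!-- Dummy "RAM eater" VM: all of its memory comes from host node 0 only -->
  <memory unit='KiB'>16777216</memory>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
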

 

Also with regards to NVMe, try turning on Hyper-V. As I recently found out, Hyper-V off slows down my NVMe SSD so perhaps it would help your case.

Link to comment
2 hours ago, testdasi said:

How did you do it please?

My workaround has been to start a dummy VM to reserve X amount of RAM from the node which then would force the actual VM to spread evenly across nodes.

Would love to not have to do that. LOL

 

Also with regards to NVMe, try turning on Hyper-V. As I recently found out, Hyper-V off slows down my NVMe SSD so perhaps it would help your case.

  <vcpu placement='static'>30</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='33'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='34'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='35'/>
    <vcpupin vcpu='6' cpuset='4'/>
    <vcpupin vcpu='7' cpuset='36'/>
    <vcpupin vcpu='8' cpuset='5'/>
    <vcpupin vcpu='9' cpuset='37'/>
    <vcpupin vcpu='10' cpuset='6'/>
    <vcpupin vcpu='11' cpuset='38'/>
    <vcpupin vcpu='12' cpuset='7'/>
    <vcpupin vcpu='13' cpuset='39'/>
    <vcpupin vcpu='14' cpuset='8'/>
    <vcpupin vcpu='15' cpuset='40'/>
    <vcpupin vcpu='16' cpuset='9'/>
    <vcpupin vcpu='17' cpuset='41'/>
    <vcpupin vcpu='18' cpuset='10'/>
    <vcpupin vcpu='19' cpuset='42'/>
    <vcpupin vcpu='20' cpuset='11'/>
    <vcpupin vcpu='21' cpuset='43'/>
    <vcpupin vcpu='22' cpuset='12'/>
    <vcpupin vcpu='23' cpuset='44'/>
    <vcpupin vcpu='24' cpuset='13'/>
    <vcpupin vcpu='25' cpuset='45'/>
    <vcpupin vcpu='26' cpuset='14'/>
    <vcpupin vcpu='27' cpuset='46'/>
    <vcpupin vcpu='28' cpuset='15'/>
    <vcpupin vcpu='29' cpuset='47'/>
    <emulatorpin cpuset='1-15,33-47'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0,2'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='2'/>
  </numatune>

The bottom part of the code snippet above - the <numatune> block - is what does the back-end NUMA assignment; then in the <cpu> section you need something like the block below. The cell IDs correspond to each other: the <numa> cells are what the VM sees, but they are also how the VM's resources get allocated.

 

  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC</model>
    <topology sockets='1' cores='15' threads='2'/>
    <feature policy='require' name='topoext'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='disable' name='svm'/>
    <feature policy='disable' name='x2apic'/>
    <numa>
      <cell id='0' cpus='0-13' memory='16777216' unit='KiB'/>
      <cell id='1' cpus='14-29' memory='16777216' unit='KiB'/>
    </numa>
  </cpu>
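
(For reference on the numbers above: each guest <cell> is given 16777216 KiB = 16 GiB, so the VM sees two 16 GiB NUMA cells, and the <memnode> entries in the earlier block force one cell to be backed by host node 0 and the other by host node 2.)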

 

Now, to get more performance, you might want to look at htop. You'll notice the first core is always showing a lot of kernel time if you're using Windows 10. To fix that, change your <clock> block to the one below. This will really help your CPUZ score, along with a noticeable boost in performance. You'll also notice the kernel time in htop drops substantially.

  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='yes'/>
  </clock>

Over time I also added these Hyper-V settings:

  <hyperv>
    <vpindex state='on'/>
    <synic state='on'/>
    <stimer state='on'/>
    <reset state='on'/>
    <vendor_id state='on' value='KVM Hv'/>
    <frequencies state='on'/>
  </hyperv>

I was going to write a little guide on how to make Ryzen work as well as possible. If any of you have 2990s or 2950s, please tell me whether these settings replicate into your VMs as well. I can tell you that my non-fresh VM didn't take the CPU boosts nearly as well as my fresh test VM. This is within 5-10% of bare metal though, which is very good in my book.

Edited by Jerky_san
Link to comment
11 hours ago, authorleon said:

If it is not viable, can you suggest a better setup, or should I just have two separate machines.

You're definitely going to have to enable the PCIe ACS override to even have a prayer. I didn't do any benchmarks with it on & off because I was able to cover my use case with it off; your performance may be lower with the override on. Please refer to the first post in this thread for the PCIe slot/memory-to-NUMA-node assignments, as well as which controllers can be passed through to a VM, whether they need an override or not, and the quirks I've found to date.

 

9 hours ago, testdasi said:

IIRC, the USB 2.0 and internal ports are also connected to the 3.1 controller.

This is true. Note that I haven't been able to find which USB controller will let the onboard USB2 ports pass through.

 

Other notes:

The Infinity Fabric used for inter-node communication is fast enough on TR2 that an off-node graphics card will cause a slight CPU usage bump but no meaningful difference in GPU scores. I haven't tested off-node memory speeds, but I have two VMs where one is on node and one is off, so I can do that.

 

I have a GeForce GT 1030 (single slot) for Unraid to grab in slot #1 since the motherboard demands a GPU, and a Quadro P2000 in slot #2 for my main VM. I was able to pass through a USB3 PCIe card to a VM in slot #3, but none of the other slots worked with the override off. I was able to connect several Brios and pass them through, but I had the occasional video glitch on a 1080p@60 feed, which hinted that the data stream wasn't operating at a constant speed. This almost ruined my planned VM buildout until I discovered I had no issues passing through the onboard USB3 controllers that don't require an override. Not ideal, because those USB3 ports only have one controller, but it is what it is.

 

I have CPUs 1-59 isolated, and while I do see a CPU 0 bump that doesn't seem to match the VM's Task Manager, I haven't noticed any impact yet. I still need to try out @Jerky_san's clock/hyperv tweaks.

 

Here's my pinning layout. Cam 1 is running Livestream Studio and exports multiple Brios via NDI to the other VMs, which upload to YouTube using OBS.

 

image.thumb.png.627cd26a1ec4cf657ac3f59b745170d7.png

 

Link to comment
2 hours ago, jbartlett said:

You're definitely going to have to enable the PCIe ACS override to even have a prayer.

...

I have CPUs 1-59 isolated, and while I do see a CPU 0 bump that doesn't seem to match the VM's Task Manager, I haven't noticed any impact yet. I still need to try out @Jerky_san's clock/hyperv tweaks.

...

 

So the bump comes from the first core assigned to your VM. If it's a Windows 10 machine it seems to consume a lot of kernel time in htop. The timer fixes will cause it to drop. You won't see the utilization in the VM itself, as it's overhead happening at the hypervisor level. What you will see, though, is higher CPUZ scores, since CPUZ runs its multi-threaded test and then tests the first core in Windows. I'm curious to know if others see the same CPU throughput in CPUZ that I see.

 

Also, I kind of wonder if it's the Windows 10 version. I was going to make a video about the differences to show you, but today I updated to the 1900 series and it seems the higher kernel usage is gone. So I wonder if the scheduler changes brought other changes as well. I was running the 1800 series of Windows yesterday when I was doing the tweaks on this machine.

 

I captured videos, though I feel the difference is not as defined.

 

 

 

 

 

Edited by Jerky_san
Link to comment
19 minutes ago, jbartlett said:

I use the Hyperthreading workaround in my VMs; CPUZ hangs when it scans the CPUs.

Hmm, what workaround? I pass it through as EPYC so it sees the L3 cache right, and it does threading correctly that way as well. I also posted videos with and without, but I feel it's not as defined. Both times the machine was just idle / logged in.

 

Forgot to mention: you're watching core 2 <- sorry, core 2. I really need to not cook and post on forums at the same time.

Bare metal

1648056177_CPUZcache.PNG.c00a1575bd8923b176338fdda463a982.PNG

 

VM

image.png.329b0b5e605f0f1a0852f249bc9fbf3d.png

Edited by Jerky_san
Link to comment
17 hours ago, Jerky_san said:
17 hours ago, jbartlett said:

I use the Hyperthreading workaround in my VMs; CPUZ hangs when it scans the CPUs.

Hmm, what workaround? I pass it through as EPYC so it sees the L3 cache right, and it does threading correctly that way as well. I also posted videos with and without, but I feel it's not as defined. Both times the machine was just idle / logged in.

I have this block in my XML

  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC</model>
    <topology sockets='1' cores='4' threads='2'/>
    <cache level='3' mode='emulate'/>
    <feature policy='require' name='topoext'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='disable' name='svm'/>
    <feature policy='disable' name='x2apic'/>
  </cpu>

With this setup, likely driven mostly by the CPU mode/match and model EPYC, CoreInfo shows the TR as hyperthreaded.

Logical to Physical Processor Map:
**------  Physical Processor 0 (Hyperthreaded)
--**----  Physical Processor 1 (Hyperthreaded)
----**--  Physical Processor 2 (Hyperthreaded)
------**  Physical Processor 3 (Hyperthreaded)

 

Link to comment
34 minutes ago, jbartlett said:

I have this block in my XML


...

 

That's what mine is but minus the cache level piece. It emulates it properly as long as it passes through as EPYC so I don't use it.

Link to comment

Hello all, 

 

I just wanted to give an update Regarding where I am:

  • VM 01
    • GTX 1080 with dumped vBIOS, Slot 1
    • 8GB RAM
    • 32 Cores 
    • Running Furmark and CPU at 100%
  • VM 02
    • GTX 1060 Slot 2
    • 8GB RAM
    • 32 Cores 
    • Running Furmark and CPU at 100%

 Ran both systems for 12 hours, no issue at all. 

 

The above is just a test scenario. Obviously a lot more consideration needs to be given regarding the CPUs and a variety of other things. Today I should receive another GTX 1060, so I can test all three video cards at the same time.

 

I have not done any tweaking as such, however John's recommendation of resetting the BIOS really helped when the VM's were fairly slow and sluggish. 

 

Without Unraid, I have overclocked the CPU to 3.6 GHz on all cores, stable. However, after resetting the BIOS to optimal defaults and changing the CPU ratio from auto to 36, all the VMs became sluggish again. I am not sure why this is the case. Any insights are more than welcome.

 

Next steps:

 

  • Figure out the USB configuration and which USB controllers should be passed through. Maybe use a hub.
  • Test the video capture card and see if it is viable.
  • Pin the appropriate cores to the appropriate VMs.
  • Benchmark systems accordingly and compare VM to bare-metal performance.

 

Thank you all for your support. It is really appreciated!

 

 

 

3862c758-caa3-4d75-8d22-e0d6d4bcb7ed.jpg

Link to comment
4 hours ago, authorleon said:

I have not done any tweaking as such, however John's recommendation of resetting the BIOS really helped when the VM's were fairly slow and sluggish. 

 

Without Unraid, I have overclocked the CPU to 3.6 GHz on all cores, stable. However, after resetting the BIOS to optimal defaults and changing the CPU ratio from auto to 36, all the VMs became sluggish again. I am not sure why this is the case. Any insights are more than welcome.

 

Try these, hope it helps with the sluggishness.

  • Enable Global C State Control in the BIOS. (Enable, NOT Auto)
  • Create a new template with Hyper-V = Yes, Machine Type = Q35-4.0.1 and the rest of the config the same. Untick Start VM and Save. Then edit in xml mode:
    • Change the <hyperv> ... </hyperv> block of codes to this: 
          <hyperv>
            <relaxed state='on'/>
            <vapic state='on'/>
            <spinlocks state='on' retries='8191'/>
            <vpindex state='on'/>
            <synic state='on'/>
            <stimer state='on'/>
            <reset state='on'/>
            <vendor_id state='on' value='0123456789ab'/>
            <frequencies state='on'/>
          </hyperv>
    • Add this bit above </features>
          <kvm>
            <hidden state='on'/>
          </kvm>
          <ioapic driver='kvm'/>
    • Change the <clock> ... </clock> block of code to this:
        <clock offset='localtime'>
          <timer name='hypervclock' present='yes'/>
          <timer name='hpet' present='yes'/>
        </clock>

You will also need to watch SpaceInvaderOne's video on lstopo so you assign the right cores to the right PCIe slot. Always leave core 0 free.

Edited by testdasi
Link to comment

(Ref "on node": Numa 0 & 2 with direct access to PCI & RAM. "off node": Numa 1 or 3)

 

Some interesting findings from last night. I had three VMs running; Cam1 (hogging NUMA 0 & 2) had four Brios connected, each set to output its cam over NDI.

Cam2 was running OBS, taking in an NDI feed from Cam1. Its CPU % was steady.

When I started OBS on Cam3, also taking in an NDI feed from Cam1, Cam2's CPU utilization jumped. When I closed OBS on Cam3, the CPU % on Cam2 dropped back. The effect could also be seen in reverse.

Cam2 & Cam3 were running "off node" with respect to memory access. I just ran a benchmark which showed that memory latency, as well as read/write/copy speeds, were negatively affected by around 50% when the memory access had to go over the TR2's Infinity Fabric.

Link to comment