Posts posted by bastl

  1. @rix Try the following: get some load on the GPU, for example with the render test in GPU-Z, and run the following command in Unraid.

    lspci -s 43:00.0 -vv | grep LnkSta:

    Adjust it so it matches your GPU; 43:00.0 is my passed-through 1080 Ti. 8GT/s is what you want to see for x16 Gen3 speed.
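
    For reference, the line you are looking for should look roughly like this (the exact fields differ between lspci versions). At idle most cards drop the link to 2.5GT/s to save power, which is why you need load on the GPU first:

      LnkSta: Speed 8GT/s, Width x16        <- under load, full Gen3 x16
      LnkSta: Speed 2.5GT/s, Width x16      <- idle, link power saving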

     

    [Screenshot: lspci LnkSta output]

     

    A mistake everyone makes is trusting the link speed GPU-Z reports. Even if GPU-Z shows x16, the Nvidia system information is the place that shows it correctly.

    [Screenshot: Nvidia system information]

     

    Another tool for testing is concBandwidthTest:

     

    https://forums.evga.com/PCIE-bandwidth-test-cuda-m1972266.aspx

     

    Run it from the command line inside your VM and report back the values you get.

    [Screenshot: concBandwidthTest output]

     

     

  2. @xlucero1 You can't pass through the audio device from group 15 as long as it isn't separated into its own group. The ACS override option should already have added an entry to your syslinux config. Check your syslinux config; you can find it under Main by clicking your flash device. Change it to the following, restart your server, and check your system devices again.

    pcie_acs_override=downstream,multifunction
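
    For reference, the whole append line in the syslinux config usually ends up looking something like this (your existing line may carry extra options; the override just goes in front of initrd=/bzroot):

      label Unraid OS
        menu default
        kernel /bzimage
        append pcie_acs_override=downstream,multifunction initrd=/bzroot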

     

  3. The changes aren't that big in synthetic benchmarks like Time Spy or Heaven. In Far Cry 5 I saw more improvement. I guess what I'm seeing is that a real game constantly streams textures and other assets, while a synthetic benchmark loads everything into memory right at the beginning. Games like Doom and Far Cry at least feel smoother now. Below is an overview of what I've tested.

     

    [Screenshot: benchmark summary]

     

    Test 1 was my original i440fx VM with some manual tweaks like numatune, emulatorpin and iothread set. For test 2 I created a fresh Q35 VM with the same core count, RAM, NVMe, SSD and GPU as in test 1 and applied all the tweaks from the i440fx VM plus the QEMU arguments at the end of the XML:

      <qemu:commandline>
        <qemu:arg value='-global'/>
        <qemu:arg value='pcie-root-port.speed=8'/>
        <qemu:arg value='-global'/>
        <qemu:arg value='pcie-root-port.width=16'/>
      </qemu:commandline>
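
    One thing to keep in mind, in case it's missing in your XML: libvirt only accepts the qemu:commandline section if the qemu namespace is declared on the domain tag, like this:

      <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>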

    Test 3 is a fresh Q35 VM with no manual tweaks, only GPU and SSD/NVMe passthrough, the same cores and RAM as before, and without the QEMU arguments at the end. Test 4 is the same as test 3; I only added the QEMU part. And finally, test 5 is basically test 2 with a couple of tweaks.

     

    In test 5 I changed the memory mode from 'preferred' to 'strict',

        <memory mode='strict' nodeset='1'/>

    added some changes in the hyperv section

        <hyperv>
          ...      
          <vpindex state='on'/>
          <synic state='on'/>
          <stimer state='on'/>
          <reset state='on'/>
        </hyperv>

    and I changed some parts of the EPYC fix

    old:

      <cpu mode='custom' match='exact' check='partial'>
        <model fallback='forbid'>EPYC-IBPB</model>
        <topology sockets='1' cores='7' threads='2'/>
        <feature policy='require' name='topoext'/>
        <feature policy='disable' name='monitor'/>
        <feature policy='require' name='x2apic'/>
        <feature policy='require' name='hypervisor'/>
        <feature policy='disable' name='svm'/>
      </cpu>

    new:

      <cpu mode='custom' match='exact' check='full'>
        <model fallback='forbid'>EPYC</model>
        <topology sockets='1' cores='7' threads='2'/>
        <cache level='3' mode='emulate'/>
        <feature policy='require' name='topoext'/>
        <feature policy='disable' name='monitor'/>
        <feature policy='require' name='hypervisor'/>
        <feature policy='disable' name='svm'/>
        <feature policy='disable' name='x2apic'/>
      </cpu>

     

    CPU-Z scores also look pretty good now.

    [Screenshot: CPU-Z scores]

     

    [Attached XML: WIN10_NVME_UEFII_Q35_RC4.xml]

     

     

  4. In all the tests I did so far on synthetic benchmarks like Cinebench, Heaven and Superposition, you can't see much of a difference. I guess the reason is that these benchmarks load all shaders and textures at the beginning. Testing Far Cry 5, the performance gain I see is bigger. I guess games that are constantly streaming assets will benefit more from that patch. This needs a couple more tests.

     

    Edit: 

    I posted some tests related to this in another forum.

     

  5. Thanks @Jerky_san. You basically added the QEMU lines at the end. For me, in a test VM, the Nvidia driver now reports the 1050 Ti as x16 Gen3; before it was only x1.

     

    [Screenshot: Nvidia system information showing x16 Gen3]

     

    Another thing I noticed: this is the first VM that only uses the memory node I have set up. Usually with 'strict', no matter what, it always used a couple of MB from the other node. Coincidence? I never saw that before.

    <memory mode='strict' nodeset='0'/>
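
    If you want to check the per-node allocation of your own VMs, numastat takes a process name pattern (assuming the numactl tools are installed on your box), for example:

      numastat -c qemu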

    [Screenshot: numastat output]

     

     

    3 hours ago, limetech said:

    Right, our testing didn't show much speed difference but maybe not configuring properly...

    Looks like a slight improvement to me. 😂

     

    [Screenshot: AIDA64 GPU benchmark comparison, RC3 vs RC4]

     

    A couple more tests will follow tomorrow. Thanks for adding that fix 👍

  6. I tried a lot of things to improve the performance of my VMs over the last couple of days and stumbled across that Level1Techs forum thread, like I guess everybody else here. Great in-depth information, and I hope limetech is able to push that fix to us Unraid users as soon as possible 😉

     

    GIVE US THE FIX NOOOOOOW

     

    Just kidding. Don't push features that aren't tested in your product. In all the time I've been using Unraid, every RC build I tested (every public RC since early 2018) was stable for my needs. Sure, there are always performance improvements possible, often on the edge of stability. Always running the bleeding-edge technology is fun and nice for a techie to play with, but for the general user it's often hard to handle. It's hard for @limetech and any other tech company to find a good middle ground. I believe in you guys 👍

  7. "The <emulatorpin> tag specifies which host physical CPUs the emulator (a subset of a domain, not including vCPUs) will be pinned to. The <emulatorpin> tag provides a method of setting a precise affinity to emulator thread processes. As a result, vhost threads run on the same subset of physical CPUs and memory, and therefore benefit from cache locality."

    @Tritech Does that mean the emulatorpin cores should be outside the range of the vCPUs already in use? I already have my main VM set up so that the emulatorpin cores are separate from the cores the VM uses, on the same die. It's difficult when it's not your first language ^^
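
    Just to illustrate what I mean, a stripped-down sketch (the core numbers are made up, adjust them to your own topology; the emulatorpin sits on a spare core pair of the same die as the vCPUs):

      <cputune>
        <vcpupin vcpu='0' cpuset='8'/>
        <vcpupin vcpu='1' cpuset='24'/>
        <vcpupin vcpu='2' cpuset='9'/>
        <vcpupin vcpu='3' cpuset='25'/>
        <emulatorpin cpuset='10,26'/>
      </cputune>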

  8. @Tritech Ok, the "<numa>" tag is only needed if you have more vCPUs than one NUMA node provides and you want a NUMA topology inside the VM. Let's say 2 cores from each node, and then you can tell the VM how much RAM each "virtual" node uses. This shouldn't affect us.

     

    Example from https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/virtualization_tuning_and_optimization_guide/

    Btw, a really useful guide.

    
    4 available nodes (0-3)
    Node 0:	CPUs 0 4, size 4000 MiB
    Node 1: CPUs 1 5, size 3999 MiB
    Node 2: CPUs 2 6, size 4001 MiB
    Node 3: CPUs 0 4, size 4005 MiB
    
    In this scenario, use the following Domain XML setting:
    
    <cputune>
    	<vcpupin vcpu="0" cpuset="1"/>
    	<vcpupin vcpu="1" cpuset="5"/>
    	<vcpupin vcpu="2" cpuset="2"/>
    	<vcpupin vcpu="3" cpuset="6"/>
    </cputune>
    <numatune>
      <memory mode="strict" nodeset="1-2"/> 
    </numatune>
    <cpu>
    	<numa>
    		<cell id="0" cpus="0-1" memory="3" unit="GiB"/>
    		<cell id="1" cpus="2-3" memory="3" unit="GiB"/>
    	</numa>
    </cpu>

     

  9. @Tritech 

    <memory mode='strict' nodeset='1'/>

    Btw, numatune doesn't really work the way it's supposed to. It always grabs RAM from the other node too. The first VM is 'strict' with 8GB from node0 and the second is 'preferred' with 16GB from node1. If I remember right, 'strict' can cause issues when not enough RAM is available on the specified node and should throw an error, but it doesn't for me. No clue how to fix this yet.

    [Screenshot: numastat output]

     

     

    Edit:

    Another thing I noticed in your XML:

        <numa>
          <cell id='0' cpus='0-15' memory='16777216' unit='KiB'/>
        </numa>

    Isn't that line telling the VM to use 16GB from node0 for cores 0-15, while you are using cores 8-15 and 24-32?

  10. @Tritech With one of the earlier AGESA updates, I think end of 2017 or early 2018, AMD changed something, that's right. The first BIOS version on my board (Dec 2017) reported the core pairings differently than an update a couple of months later. Depending on your BIOS version (I guess you already have the changed one), lstopo always showed it correctly, and so does Unraid. In earlier days people got confused because everyone had different pairings.
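
    If you want to double check the pairings yourself, lscpu can list which logical CPUs share a core (lstopo gives the same picture, assuming the hwloc tools are installed):

      lscpu -e=CPU,CORE,SOCKET,NODE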

  11. The PCIe fixes gnif talked about aren't in yet. He suggested they will be enabled by default in 4.x and first show up in 3.2, and we are currently on 3.1 in the RC build.

    1 Dec '18
    
    Hi, is there any chance these patches reach qemu 3.1 release or other systems involved ?
    
    gnif:
    I believe they are trying to get them queued up for 3.1, and will default to full speed in Qemu 4.0. These patches will only apply to platforms that actually have PCIe such as Q35, i440fx is out of the question.

     

    lessaj:
    Went to do a new build and the patch set failed to apply, seems as of Dec 19 this patch set was committed to qemu master branch. Awesome!

     

    gnif:
    4.0 is when it will default to using the higher link speeds, last I read however the 3.2 and later builds have these patches but you must specify the link speed. I have not checked as I have been on break however and could have the versioning wrong 🙂

     

  12. I was already waiting for that question 😂 I've had that same error from the beginning.

    Cannot reset device 0000:0a:00.3, depends on group 18 which is not owned.

    Group 18 for me is the same as your group 17:

    [1022:1455] 0a:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function

    I never tried to pass that through because I never had audio issues with the onboard device inside the VM. The VM has been running for almost 10 hours now with an online radio stream playing in the background, and not a single audio drop or lag.
