Posts posted by bastl

  1. @rix Try the following: get some load on the GPU, for example with the render test in GPU-Z, and run the following command in Unraid.

    lspci -s 43:00.0 -vv | grep LnkSta:

    Adjust it so it matches your GPU; 43:00.0 is my passed-through 1080 Ti. 8GT/s is what you want to see for x16 Gen3 speed.
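
    For reference, the line you are looking for should look roughly like this (the exact fields differ between lspci versions). At idle most cards drop the link to 2.5GT/s to save power, which is why you need load on the GPU first:

      LnkSta: Speed 8GT/s, Width x16        <- under load, full Gen3 x16
      LnkSta: Speed 2.5GT/s, Width x16      <- idle, link power saving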

     

    [Screenshot: lspci LnkSta output]

     

    A mistake everyone makes is trusting the link speed GPU-Z reports. Even if GPU-Z shows x16, the Nvidia system information is the place that shows it correctly.

    [Screenshot: Nvidia system information]

     

    Another tool for testing is concBandwidthTest:

     

    https://forums.evga.com/PCIE-bandwidth-test-cuda-m1972266.aspx

     

    Run it from the command line inside your VM and report back the values you get.

    [Screenshot: concBandwidthTest output]

     

     

  2. @xlucero1 You can't pass through the audio device from group 15 as long as it isn't separated into its own group. The ACS override option should already have added an entry to your syslinux config. Check your syslinux config; you can find it under Main by clicking your flash device. Change it to the following, restart your server, and check your system devices again.

    pcie_acs_override=downstream,multifunction
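
    For reference, the whole append line in the syslinux config usually ends up looking something like this (your existing line may carry extra options; the override just goes in front of initrd=/bzroot):

      label Unraid OS
        menu default
        kernel /bzimage
        append pcie_acs_override=downstream,multifunction initrd=/bzroot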

     

  3. The changes aren't that big in synthetic benchmarks like Time Spy or Heaven. In Far Cry 5 I saw more improvement. I guess what I'm seeing is that a real game constantly streams textures and other assets, while a synthetic benchmark loads everything into memory right at the beginning. Games like Doom and Far Cry at least feel smoother now. Below is an overview of what I've tested.

     

    [Screenshot: benchmark summary]

     

    Test 1 was my original i440fx VM with some manual tweaks like numatune, emulatorpin and iothread set. For test 2 I created a fresh Q35 VM with the same core count, RAM, NVMe, SSD and GPU as in test 1 and applied all the tweaks from the i440fx VM plus the QEMU arguments at the end of the XML:

      <qemu:commandline>
        <qemu:arg value='-global'/>
        <qemu:arg value='pcie-root-port.speed=8'/>
        <qemu:arg value='-global'/>
        <qemu:arg value='pcie-root-port.width=16'/>
      </qemu:commandline>
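
    One thing to keep in mind, in case it's missing in your XML: libvirt only accepts the qemu:commandline section if the qemu namespace is declared on the domain tag, like this:

      <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>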

    Test 3 is a fresh Q35 VM with no manual tweaks, only GPU and SSD/NVMe passthrough, the same cores and RAM as before, and without the QEMU arguments at the end. Test 4 is the same as test 3; I only added the QEMU part. And finally, test 5 is basically test 2 with a couple of tweaks.

     

    In test 5 I changed the memory mode from 'preferred' to 'strict',

        <memory mode='strict' nodeset='1'/>

    added some changes in the hyperv section

        <hyperv>
          ...      
          <vpindex state='on'/>
          <synic state='on'/>
          <stimer state='on'/>
          <reset state='on'/>
        </hyperv>

    and I changed some parts of the EPYC fix

    old:

      <cpu mode='custom' match='exact' check='partial'>
        <model fallback='forbid'>EPYC-IBPB</model>
        <topology sockets='1' cores='7' threads='2'/>
        <feature policy='require' name='topoext'/>
        <feature policy='disable' name='monitor'/>
        <feature policy='require' name='x2apic'/>
        <feature policy='require' name='hypervisor'/>
        <feature policy='disable' name='svm'/>
      </cpu>

    new:

      <cpu mode='custom' match='exact' check='full'>
        <model fallback='forbid'>EPYC</model>
        <topology sockets='1' cores='7' threads='2'/>
        <cache level='3' mode='emulate'/>
        <feature policy='require' name='topoext'/>
        <feature policy='disable' name='monitor'/>
        <feature policy='require' name='hypervisor'/>
        <feature policy='disable' name='svm'/>
        <feature policy='disable' name='x2apic'/>
      </cpu>

     

    CPU-Z scores also look pretty good now.

    [Screenshot: CPU-Z scores]

     

    [Attached XML: WIN10_NVME_UEFII_Q35_RC4.xml]

     

     

  4. In all the tests I did so far on synthetic benchmarks like Cinebench, Heaven and Superposition, you can't see much of a difference. I guess the reason is that these benchmarks load all shaders and textures at the beginning. Testing Far Cry 5, the performance gain I see is bigger. I guess games that are constantly streaming assets will benefit more from that patch. This needs a couple more tests.

     

    Edit: 

    I posted some tests related to this in another forum.

     

  5. Thanks @Jerky_san. You basically added the QEMU lines at the end. For me, in a test VM, the Nvidia driver now reports the 1050 Ti as x16 Gen3; before it was only x1.

     

    [Screenshot: Nvidia system information showing x16 Gen3]

     

    Another thing I noticed: this is the first VM that only uses the memory node I have set up. Usually with 'strict', no matter what, it always used a couple of MB from the other node. Coincidence? I never saw that before.

    <memory mode='strict' nodeset='0'/>
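
    If you want to check the per-node allocation of your own VMs, numastat takes a process name pattern (assuming the numactl tools are installed on your box), for example:

      numastat -c qemu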

    [Screenshot: numastat output]

     

     

    3 hours ago, limetech said:

    Right, our testing didn't show much speed difference but maybe not configuring properly...

    Looks like a slight improvement to me. 😂

     

    [Screenshot: AIDA64 GPU benchmark comparison, RC3 vs RC4]

     

    A couple more tests will follow tomorrow. Thanks for adding that fix 👍

  6. I tried a lot of things to improve the performance of my VMs over the last couple of days and stumbled across that Level1Techs forum thread, like I guess everybody else here. Great in-depth information, and I hope limetech is able to push that fix to us Unraid users as soon as possible 😉

     

    GIVE US THE FIX NOOOOOOW

     

    Just kidding. Don't push features that aren't tested in your product. In all the time I've been using Unraid, every RC build I tested (every public RC since early 2018) was stable for my needs. Sure, there are always performance improvements possible, often on the edge of stability. Always running the bleeding-edge technology is fun and nice for a techie to play with, but for the general user it's often hard to handle. It's hard for @limetech and any other tech company to find a good middle ground. I believe in you guys 👍

  7. "The <emulatorpin> tag specifies which host physical CPUs the emulator (a subset of a domain, not including vCPUs) will be pinned to. The <emulatorpin> tag provides a method of setting a precise affinity to emulator thread processes. As a result, vhost threads run on the same subset of physical CPUs and memory, and therefore benefit from cache locality."

    @Tritech Does that mean the emulatorpin cores should be outside the range of the vCPUs already in use? I already have my main VM set up so that the emulatorpin cores are separate from the cores the VM uses, on the same die. It's difficult when it's not your first language ^^
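
    Just to illustrate what I mean, a stripped-down sketch (the core numbers are made up, adjust them to your own topology; the emulatorpin sits on a spare core pair of the same die as the vCPUs):

      <cputune>
        <vcpupin vcpu='0' cpuset='8'/>
        <vcpupin vcpu='1' cpuset='24'/>
        <vcpupin vcpu='2' cpuset='9'/>
        <vcpupin vcpu='3' cpuset='25'/>
        <emulatorpin cpuset='10,26'/>
      </cputune>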

  8. @Tritech Ok, the "<numa>" tag is only needed if you have more vCPUs than one NUMA node provides and you want a NUMA topology inside the VM. Let's say 2 cores from each node, and then you can tell the VM how much RAM each "virtual" node uses. This shouldn't affect us.

     

    Example from https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/virtualization_tuning_and_optimization_guide/

    Btw, a really useful guide.

    
    4 available nodes (0-3)
    Node 0:	CPUs 0 4, size 4000 MiB
    Node 1: CPUs 1 5, size 3999 MiB
    Node 2: CPUs 2 6, size 4001 MiB
    Node 3: CPUs 0 4, size 4005 MiB
    
    In this scenario, use the following Domain XML setting:
    
    <cputune>
    	<vcpupin vcpu="0" cpuset="1"/>
    	<vcpupin vcpu="1" cpuset="5"/>
    	<vcpupin vcpu="2" cpuset="2"/>
    	<vcpupin vcpu="3" cpuset="6"/>
    </cputune>
    <numatune>
      <memory mode="strict" nodeset="1-2"/> 
    </numatune>
    <cpu>
    	<numa>
    		<cell id="0" cpus="0-1" memory="3" unit="GiB"/>
    		<cell id="1" cpus="2-3" memory="3" unit="GiB"/>
    	</numa>
    </cpu>

     

  9. @Tritech 

    <memory mode='strict' nodeset='1'/>

    Btw, numatune doesn't really work the way it's supposed to. It always grabs RAM from the other node too. The first VM is 'strict' with 8GB from node0 and the second is 'preferred' with 16GB from node1. If I remember right, 'strict' can cause issues when not enough RAM is available on the specified node and should throw an error, but it doesn't for me. No clue how to fix this yet.

    [Screenshot: numastat output]

     

     

    Edit:

    Another thing I noticed in your XML:

        <numa>
          <cell id='0' cpus='0-15' memory='16777216' unit='KiB'/>
        </numa>

    Isn't that line telling the VM to use 16GB from node0 for cores 0-15, while you are using cores 8-15 and 24-32?

  10. @Tritech With one of the earlier AGESA updates, I think end of 2017 or early 2018, AMD changed something, that's right. The first BIOS version on my board (Dec 2017) reported the core pairings differently than an update a couple of months later. Depending on your BIOS version (I guess you already have the changed one), lstopo always showed it correctly, and so does Unraid. In earlier days people got confused because everyone had different pairings.
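
    If you want to double check the pairings yourself, lscpu can list which logical CPUs share a core (lstopo gives the same picture, assuming the hwloc tools are installed):

      lscpu -e=CPU,CORE,SOCKET,NODE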

  11. The PCIe fixes gnif talked about aren't in yet. He suggested they will be enabled by default in 4.x and first show up in 3.2, and we are currently on 3.1 in the RC build.

    1 Dec '18
    
    Hi, is there any chance these patches reach qemu 3.1 release or other systems involved ?
    
    gnif:
    I believe they are trying to get them queued up for 3.1, and will default to full speed in Qemu 4.0. These patches will only apply to platforms that actually have PCIe such as Q35, i440fx is out of the question.

     

    lessaj:
    Went to do a new build and the patch set failed to apply, seems as of Dec 19 this patch set was committed to qemu master branch. Awesome!

     

    gnif:
    4.0 is when it will default to using the higher link speeds, last I read however the 3.2 and later builds have these patches but you must specify the link speed. I have not checked as I have been on break however and could have the versioning wrong 🙂

     

  12. I was already waiting for that question 😂 I've had that same error from the beginning.

    Cannot reset device 0000:0a:00.3, depends on group 18 which is not owned.

    Group 18 for me is the same as your group 17:

    [1022:1455] 0a:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function

    I never tried to pass that through because I never had audio issues with the onboard device inside the VM. The VM has been running for almost 10 hours now with an online radio stream playing in the background, and not a single audio drop or lag.
