6.6.6. Windows 10 VM Performance unusable



Hi there,

So I began my Unraid journey about two months ago, installing it on my system:

 

Threadripper 1920X

32 GB 2666 MHz DDR4 RAM

3× 2 TB storage/parity drives

1× 1 TB old storage drive

1× 500 GB NVMe Samsung Evo 870 cache drive

2× RX 580 8 GB GPUs

 

It ran pretty well up until my cache drive began overflowing.

 

I set it up with two Windows VMs, one dedicated GPU and 10 cores each, with cores 0, 1, 12, 13 left for Unraid, 16 GB RAM for one VM and 10 GB for the other, and the rest for overhead/Unraid.

 

So a few days ago I started getting strange performance issues, like VMs crashing or lagging horribly. I'd had those problems before with a full cache drive, so I tried removing data from it and added a second cache drive.

 

I added a 250 GB SATA SSD as a second cache drive, which completely messed up my cache.

 

I moved all files to the data storage drives and did a "New Config" in Tools, going back to my original configuration.

 

Since then I can't get the performance back to what it used to be, and I get constant issues like:

 

2 VMs:

 

Cores 0-23 (Unraid numbering)

 

VM 1: cores 2-12

VM 2: cores 15-23

VM 1: 10 GB RAM

VM 2: 16 GB RAM

 

VM1 (PCIe Slot 1 - Display 1):

 

Won't boot. It says it boots, but doesn't show any image on screen at all. I accidentally deleted this VM's image while tinkering, so I will set it up again at some point and add how it goes to this topic.

 

VM2 (PCIe Slot 3 - Displays 2 and 3)

This is my "Daily Driver", or should be.

It boots, but more slowly than before, even after deleting and recreating the libvirt.img on the cache and moving the image to the cache.

Core 15 (the first core) shows 80-90% Usage constantly, spiking to 100% when moving the mouse, playing audio, anything. 

Stuttering, lagging, 2-5 second freezes, aggressive audio modulation, blue screens with Kernel Security Check Failure errors.

 

If I add cores 13 and 14 to VM 2 it gets a bit better, with the first core pegged less, but some audio problems persist and games are unplayable.

 

This feels like my RAM is allocated wrong, since Threadripper has separate memory channels for each of the dies.

I feel like I should just use one die, the two sticks of RAM and the one GPU that are all directly connected to it for best performance, but I don't know how.

 

Please, wizards of the Unraid forum, help me get my machine running again.

I just saw that moving and removing things in my docker folder broke my OpenVPN server, so I can't access the machine remotely right now, but I will upload the diagnostics .zip as soon as I get home.

 

On 5/7/2019 at 1:23 PM, n3rf said:

So I began my Unraid journey about two months ago, installing it on my system: …

 

Which BIOS are you using for the VMs?

Are you pinning real cores with their corresponding threads?

Are you pinning any other running Docker containers to specific cores?

Is your RAM fully populated? If not, do you have the sticks in the right slots?

Do you have the domains share set to cache-only?

Have you tried picking alternating cores so it's using equal amounts from each die? If it's a RAM issue, this might be worth a shot.

Are you running stock Unraid or the Nvidia build?

Have you tried 6.6.7?

 

Just now, nicksphone said:

Which BIOS are you using for the VMs? …

 

1. SeaBIOS, because my RX 580s won't boot with UEFI.

2. I didn't before, and I have come a bit further, but now I have a different issue.

3. No.

4. 8 slots, 4 populated, in the correct slots.

5. Yes.

6. I don't think it's a RAM issue anymore.

7. Stock.

8. Will upgrade later.

 

 

So the VM with the card in PCIe slot 1 works fine now.

But the second VM works fine in Windows; as soon as I begin gaming, the weirdness starts:

When I play Apex Legends it works fine for a while, but after some time the game just drops frames left, right and center, down to about 1 frame every 20 seconds with distorted, repeating sound, until I quit the game, after which Windows calms down and everything works again.

 

I ran lstopo and it just shows my CPU as a single package, as opposed to separate NUMA nodes like in

 

 

which kinda makes me think I have to tinker with the BIOS.
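For reference, a rough way to check from the host shell whether both dies show up as NUMA nodes, and which node a given PCIe device hangs off of (the 0000:0a:00.0 address is just an example; numactl may not be present on a stock Unraid install):

lscpu | grep -i numa                                 # quick summary of NUMA nodes and which CPUs belong to each
numactl --hardware                                   # per-node CPU and memory layout, if numactl is installed
cat /sys/bus/pci/devices/0000:0a:00.0/numa_node      # NUMA node the device sits on; -1 means the kernel sees no NUMA info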

Help appreciated.

I added my diagnostics zip.

Red: 1st GPU, orange: 2nd GPU. Weird how they show up twice, right?

lstopo.png

tower-diagnostics-20190508-1143.zip

 

 

During a freeze in the game this is what happens; the VM is running on cores 6/18, 7/19, 8/20.

 

photo_2019-05-08_13-17-14.jpg

14 hours ago, n3rf said:

…
Pin your Docker containers to a core + thread pair so they don't randomly pick a core you're using.
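For example, roughly how that looks from the command line (Unraid also exposes this per container in the template settings; the container name and core numbers below are placeholders):

# pin an existing container to core 5 and its hyperthread sibling 17 (placeholder values)
docker update --cpuset-cpus="5,17" my-container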

Have you disabled C-states in the motherboard BIOS?

While you wait for someone with more knowledge than me on this (I only have my experience with Ryzen to go off), did you watch this video?

If not, try making another VM following this guide: just turn off the problem one and try his way, and see if the issue is still there. It only needs 20 gigs of space to test.

 

 

 

Quote

Pin your dockers to a core + thread so they don't randomly pick a core you're using. …

I found something more mysterious:

 

When I switch the VMs, or set up a new one, the problem persists on the VM with the GPU in slot 3, even if I switch out the GPUs; it's somehow connected to the slot.

 

Will disable C-states when I come home.

 

I saw a weird setting in my BIOS (using the newest version of the Asus Prime X399-A BIOS) that let me set a PCI bus for PCIe slot 3, where I could choose between 1 and 3, so I set it to 1, but didn't have time to test it. Will update tonight.

 

I have now isolated the cores that I use for the VM from Unraid, and I am not using any Docker containers at the moment.

 

 

 

 

47 minutes ago, bastl said:

@n3rf Check your BIOS settings for "memory interleaving" and set it to channel, then run lstopo again to see which PCIe device is connected to which die. Try to use only the cores from the specific die your GPU is directly connected to.

 

While googling how to enable this on my board, I stumbled upon this thread:

 

I will work through it tonight; I think it might solve my problem. Will update.

4 hours ago, n3rf said:

When I switch the VMs, or set up a new one, the problem persists on the VM with the GPU in slot 3 … it's somehow connected to the slot.

 

Is your 3rd slot a 16x slot? Most 3rd slots are 1x; that might be your issue. You can add this to your boot config so Unraid will ignore your cards and you can have them in slots 1 and 2.

Install the Config File Editor from Apps and select /boot/syslinux/syslinux.cfg.

In your default boot section, after "append", add vfio-pci.ids=XXXX:XXXX for each card. You can get the numbers from the IOMMU groups listing; they should be shown as [xxxx:xxxx]. Since you have two cards, put them both in, so your default boot section will look something like this:

menu default
  kernel /bzimage
  append vfio-pci.ids=1234:5678 vfio-pci.ids=8765:4321 pcie_acs_override=downstream,multifunction  initrd=/bzroot,/bzroot

 

5 hours ago, n3rf said:

While googling how to enable this on my board, I stumbled upon this thread … which I think might solve my problem.

I got the notification about linked content and I'm real glad that people are using this.  It's become quite a good resource.  

 

Due to the architecture of Ryzen, there are PLENTY of details that need to line up to reduce inter-die communication, and therefore latency: making sure you're accessing memory modules directly linked to your die, using only cores on your die (or even just on the same CCX), knowing which PCIe lanes go where, etc.

 

It looks like a lot of ground has been covered since your initial post, with cross-die core allocations and nebulous GPU performance.  It seems like you've corrected quite a few problems.  

 

To add to your solutions, I'm going to recommend some basic architecture decisions that worked for me in the past (on my 1950x, so grain of salt):

Try to reduce VMs to all cores on one CCX, and then expand from there.

Try to get VMs on different dies.

Try to leave 10-20% of your total memory to Unraid, then scale up from there.

NUMA-ize your memory. Try to get 2 nodes and assign node 0 to die 0 operations (like VM 1) and node 1 to die 1 for VM 2 (see the sketch after this list).

Reallocate disks for your VMs.  Try passing them through.  

Most of all, experiment.
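If the numastat tool from the numactl package is available on the host (it may not ship with stock Unraid), a quick sketch of how to confirm where a VM's memory actually landed; "qemu" matches the emulator binary Unraid's VMs run under:

numastat -p qemu     # per-NUMA-node memory breakdown for the running qemu process(es)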

Foot my plane ticket, and I'll look alongside you ;)

Just now, thenonsense said:

To add to your solutions, I'm going to recommend some basic architecture decisions that worked for me in the past … Most of all, experiment.

 

So I edited my XML to include the EPYC workaround, changed my memory interleaving to channel and added numatune with strict node 1, with the VM running on only one CCX (cores 6, 7, 8, 18, 19, 20), and I still get the same crashes in that game, but only that game. I did get greatly improved performance in general though, which is nice.
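(For anyone double-checking the pairing: the hyperthread siblings can be read straight from the host, e.g. for core 6, which should report 18 as its sibling on this 1920X.)

cat /sys/devices/system/cpu/cpu6/topology/thread_siblings_list      # e.g. "6,18"
grep . /sys/devices/system/cpu/cpu*/topology/thread_siblings_list   # sibling pairs for every core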


So I'm at my wits' end here.

The last two things I can imagine being left are something with the cache or something with the virtual LAN adapter.

I get this sign during a crash: l9rd33tsyek21.jpg

 

While the other VM is just running fine.

If I manage to quit the game while it's eating up resources, everything returns to normal after a few seconds.

 

So I will pass through an SSD: I will try to connect it to the correct controller and pass it through directly. Ugh, I really thought I wouldn't need to do this at all.
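A small sketch of how I'd pick the disk once I have it; referencing the stable /dev/disk/by-id/ path instead of /dev/sdX is the usual recommendation (the device names in the output are obviously system-specific):

ls -l /dev/disk/by-id/ | grep -v part     # find the SSD's stable ID, then point the VM at that path (or pass the whole controller) instead of using a vdisk image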

 

Nobody else on the Unraid forum experiences this type of crash in this game.

Other games run fine.

On one VM it runs fine.

WTF?

The only difference is the PCIe slot of the GPU.

As soon as I switch the VMs, everything runs fine, even with cores assigned to the VM that aren't on the die directly connected to the first PCIe slot.

Considering adding a bounty or something because this is annoying as hell.

 

rcu_nocbs=0-23 is an option I'm considering adding, but I do not fully understand it.
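If I do add it, a quick sanity check after a reboot that the isolcpus/rcu_nocbs flags were actually picked up (this only verifies the flags, it doesn't fix anything by itself):

cat /proc/cmdline                       # the running kernel command line should show the rcu_nocbs= and isolcpus= ranges
cat /sys/devices/system/cpu/isolated    # lists the CPUs currently isolated via isolcpus=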

 

also this thread:

which seems like it has some interesting ideas.


The only game/software-specific issue I've had so far is with 3DMark benchmarks. At the initial launch it checks for the system specs, and on older Unraid versions including 6.6.x it hangs and never finishes. With the PCIe root port fix implemented in the 6.7-RC versions, and the ability to have the correct PCIe link speeds reported to the OS, 3DMark runs fine now. All I did was update to the 6.7 RC, switch the VM to the Q35 machine type and add some extra qemu config at the end of the XML; Nvidia's system info now reports the correct speeds and 3DMark is working.

 



  <qemu:commandline>
    <qemu:arg value='-global'/>
    <qemu:arg value='pcie-root-port.speed=8'/>
    <qemu:arg value='-global'/>
    <qemu:arg value='pcie-root-port.width=16'/>
  </qemu:commandline>
</domain>

If you're on the current RC version, that's maybe something you can try.

 

Another question: are you passing through a physical NIC to the VM? @n3rf

2 minutes ago, bastl said:

… Another question: are you passing through a physical NIC to the VM? @n3rf

I'm not passing through a NIC.

 

I'm on 6.6.7. I'll try to get to 6.7 RC tonight and add the lines.


So I'm now on 6.7 RC (looking nice btw, good job LT!).

I disabled C-States.

This is my VM XML:

 

 

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='1' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>main</name>
  <uuid>c67a6204-9abf-2b82-239f-d7b5cb5d1a79</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>10</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='7'/>
    <vcpupin vcpu='1' cpuset='19'/>
    <vcpupin vcpu='2' cpuset='8'/>
    <vcpupin vcpu='3' cpuset='20'/>
    <vcpupin vcpu='4' cpuset='9'/>
    <vcpupin vcpu='5' cpuset='21'/>
    <vcpupin vcpu='6' cpuset='10'/>
    <vcpupin vcpu='7' cpuset='22'/>
    <vcpupin vcpu='8' cpuset='11'/>
    <vcpupin vcpu='9' cpuset='23'/>
    <emulatorpin cpuset='6,18'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-3.0'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <topology sockets='1' cores='5' threads='2'/>
    <feature policy='require' name='topoext'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='disable' name='svm'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/domains/sea 2/vdisk1.img'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/Games/fresh sea/vdisk2.img'/>
      <backingStore/>
      <target dev='hdd' bus='virtio'/>
      <alias name='virtio-disk3'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0xe'/>
      <alias name='pci.7'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x6'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:45:7f:d1'/>
      <source bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-1-main/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
  <qemu:commandline>
    <qemu:arg value='-global'/>
    <qemu:arg value='pcie-root-port.speed=8'/>
    <qemu:arg value='-global'/>
    <qemu:arg value='pcie-root-port.width=16'/>
  </qemu:commandline>
</domain>
 

 

This is my append:

 

append vfio-pci.ids=1022:43ba pcie_acs_override=downstream,multifunction initrd=/bzroot isolcpus=1-11,13-23 rcu_nocbs=0-23 video=efifb:off

 

 

 

Issue persists.

 

I started timing it, and it seems like it's pretty much always at 15 minutes. I will continue timing it, because I think it's pretty consistent (I timed it twice: once I started the timer a few seconds after clicking the shortcut, 14:45 to freeze; the second time I started the timer a few seconds before clicking the shortcut, 15:13 to freeze).

Maybe one of the OSes, or the game itself, is doing something at that point that interferes with something.

 

I doubt it's a GPU issue now. I get around 100% GPU usage and my fps are pretty much comparable to bare metal, but it still freezes.

 

The weirdest thing was, I had this problem before and did something stupid: I let the VM run on only the virtual cores, i.e. I had it on 14-23, and it ran fine with no crashes for like two weeks. Why?

 

Any input is really appreciated.


Is one of the VMs idling when this issue shows up after 15 minutes, or are both VMs doing something, let's say playing a video or music? How is your CPU scaling governor set in Unraid's "Tips and Tweaks"? The fact that it always happens after the same amount of time is strange. Usually if people have issues with GPU passthrough or audio, it happens right when they start an application. Maybe a temperature issue? Try to monitor your temps with

watch sensors

and the clock speeds with

watch grep \"cpu MHz\" /proc/cpuinfo

and watch whether your CPU throttles after those 15 minutes.

 

Are you using an Enermax Liqtech AIO? I had one, and like for many people it died after a couple of months. Temps slowly rose to the point where the CPU throttled itself below 2 GHz on pretty much the slightest load. Maybe you're seeing the first signs of that here.

 

 

Edit: 

As I wrote this post I saw Limetech released 6.7 to stable. Just to let you know.


I have a Cooler Master 360 mm AIO in a push/pull configuration, and I have normal temps with no throttling.

There is no difference whether the other VM is doing anything at all or is turned off.

If I play Apex on both VMs, the other VM has zero issues at all, even during the crash.

 

I didn't have the Tips and Tweaks plugin. 

I disabled flow control and set the CPU scaling governor to Performance.
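(For reference, the plugin setting should boil down to roughly this on the host; just a sketch, assuming the standard cpufreq sysfs interface:)

for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > "$g"; done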

 

Something strange happened yesterday night.

 

So I played a bit, and then the game started dropping frames. I just alt-tabbed out and let it run, trying to collect dump files and such, and after about 10-15 minutes it suddenly caught itself again?

 

how??

 

I really think I should get a disk to pass through, but I don't really have one lying around and don't wanna buy one on the off chance it might help.

 

Testing again now with 6.7.0 stable and the Tips and Tweaks plugin.

 

 


Another thing you can try: set up a dedicated iothread for the VM and limit it to specific cores.

Example from my xml:

  <vcpu placement='static'>14</vcpu>
  <iothreads>1</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='9'/>
    <vcpupin vcpu='1' cpuset='25'/>
    <vcpupin vcpu='2' cpuset='10'/>
    <vcpupin vcpu='3' cpuset='26'/>
    <vcpupin vcpu='4' cpuset='11'/>
    <vcpupin vcpu='5' cpuset='27'/>
    <vcpupin vcpu='6' cpuset='12'/>
    <vcpupin vcpu='7' cpuset='28'/>
    <vcpupin vcpu='8' cpuset='13'/>
    <vcpupin vcpu='9' cpuset='29'/>
    <vcpupin vcpu='10' cpuset='14'/>
    <vcpupin vcpu='11' cpuset='30'/>
    <vcpupin vcpu='12' cpuset='15'/>
    <vcpupin vcpu='13' cpuset='31'/>
    <emulatorpin cpuset='8,24'/>
    <iothreadpin iothread='1' cpuset='8,24'/>
  </cputune>

One iothread is specified and limited to cores 8 and 24.

 

Another user with some performance issues (hiccups, freezes) disabled the Folder Caching plugin and that fixed his issues. Check whether you're using that plugin and try disabling it completely for testing, or exclude your vdisk share.

