[Solved] VM - Windows 10 little dropouts


T65

Dear Community,

I've been using my server for quite some time, but for technical reasons (performance) I didn't use a VM. After several Unraid updates I would like to make a new attempt.

 

VM specs:
 - 6 cores and 6 hyperthreads (one NUMA node)
 - 8192 MB RAM
 - Q35
 - OVMF

 

I have created a Windows 10 VM and applied some tweaks (numatune, CPU isolation, PCIe root port patch, ...). I passed the SSD and the graphics card through to the VM. All Windows updates are installed, and I also installed the latest graphics card driver from AMD (19.4.1). The VM is already usable for work, and gaming performance is good enough for me.

 

BUT
When I move a window quickly or make fast mouse movements in games, I notice slight dropouts (as if something had to be reloaded). There are no CPU/GPU peaks or other drastic jumps. In a graphics card benchmark I measured almost the same performance as on bare metal (99%).

 

Has any of you noticed this behavior, or is there a name or an existing thread for it?

 

For now I have deliberately left out further details so we can tackle the problem step by step and collect information. Once there are concrete ideas, I will of course provide the necessary information :-)


Thank you very much, and I look forward to hearing from you.

T65

Link to comment

Quick question: does the Vega card still work after you reboot the VM that uses it? Vega cards usually have a reset bug where a restarted VM can't pick the card up again, and you have to restart the whole server. I would be surprised if it's working for you. Are you sure your card is picked up correctly by the VM and shows no errors in the Device Manager?

Link to comment

I've heard of it.

I can start the VM, power it off and start it again without restarting the host.

 

In the AMD driver overview all data is displayed correctly, but I haven't looked in the Device Manager yet.

I will provide you with some pictures.

Link to comment

Yes, of course.

I thought it would go without but with it it goes much faster 🙂

 

I will collect all the necessary data and make it available to you.
- VM settings (XML, ...)
- VM driver outputs (Device Manager, AMD Adrenalin overview, ...)
- Unraid settings (vfio bind, governor, CPU isolation, system devices, ...)
- BIOS settings

Attachments: flash.JPG, furmark.JPG, governor.JPG, gpu_01.JPG, gpu_02.JPG, gpu_03.JPG, lstopo.png, vfio-pci.JPG, vm_01.JPG, vm_02.JPG

 

Edited by T65
Added settings details.
Link to comment

First of all, is there a reason why you isolate all of your cores except the first one? If you have any Docker containers on your server or any tasks running in the background, let's say for syncing your server or for backups, all these tasks will only run on the leftover cores, in your case 0 and 24. A better solution is to isolate only the cores you want to use for a VM and leave the rest unisolated to be handled by Unraid itself.

 

Second thing, as far as I know it's not advised to have an SSD/NVMe drive as one of your array drives. For testing this might be OK, but for long-term use it isn't the best solution. TRIM isn't supported on array drives, so your SSD will become noticeably slower over time, and as far as I understand the way parity works on Unraid, TRIM support for the array won't arrive in the near future.

 

Next thing, the SSD you pass through to the VM is a 32GB Transcend SSD, right? I don't know how old that thing is, but if it's one of the first-gen SSDs and was used a lot over the past years, that might be the reason why you see some stuttering. Also, 32GB isn't much for a Windows install. Running an SSD close to its max capacity can also decrease performance.

You have defined your disk as virtio. Usually that's OK for a vdisk file, but if you want to squeeze out a bit more performance and reduce the I/O load on the host, SCSI is the better choice. Before you switch to SCSI you first have to install the driver in the VM, otherwise Windows won't be able to boot. First add a small dummy SCSI vdisk via the Unraid UI, let's say only 1G, start your VM and go to Device Manager to install the SCSI driver from the virtio ISO. Shut down the VM and then switch your main disk from virtio to SCSI. The dummy vdisk isn't needed anymore after that. The part in the XML should look something like this. Adjust it so it matches your config.

    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='threads' discard='unmap'/>
      <source dev='/dev/disk/by-id/ata-Samsung_SSD_850_EVO_1TB_S2RFNX0J606029L'/>
      <backingStore/>
      <target dev='hdd' bus='scsi'/>
      <alias name='scsi0-0-0-3'/>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>

If you happen to have a spinning HDD lying around, use that as the array drive and use the Samsung SSD as the cache drive. There you have a lot of space for a couple of vdisks for VMs and can benefit from TRIM.

 

 

Next thing, try to reduce your emulator pins to only 2. I don't think more than 2 is useful. Select "6,30" and you should be ok.

 

 

For the next thing, I'm not exactly sure if an "exact" CPU definition is needed for an EPYC CPU. Usually this part is used for Threadripper CPUs to report the correct amount of CPU cache to the guest OS.

  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC</model>
    <topology sockets='1' cores='12' threads='1'/>
    <feature policy='require' name='topoext'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='disable' name='svm'/>
    <feature policy='disable' name='x2apic'/>
    <numa>
      <cell id='0' cpus='6-11' memory='8388608' unit='KiB'/>
    </numa>
  </cpu>

 

Another thing you can try to reduce stutter that may be caused by disk I/O is to specify one "iothread" and pin it to 2 cores like in my example. With this I got slightly better latency on disk access. The cores "8,24" in my example are on the same die as the rest of the cores and are not among the cores passed to the VM. In your case use the cores "6,30": remove them from the VM and set them only as emulatorpin and iothreadpin.

  <vcpu placement='static'>14</vcpu>
  <iothreads>1</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='9'/>
    <vcpupin vcpu='1' cpuset='25'/>
    <vcpupin vcpu='2' cpuset='10'/>
    <vcpupin vcpu='3' cpuset='26'/>
    <vcpupin vcpu='4' cpuset='11'/>
    <vcpupin vcpu='5' cpuset='27'/>
    <vcpupin vcpu='6' cpuset='12'/>
    <vcpupin vcpu='7' cpuset='28'/>
    <vcpupin vcpu='8' cpuset='13'/>
    <vcpupin vcpu='9' cpuset='29'/>
    <vcpupin vcpu='10' cpuset='14'/>
    <vcpupin vcpu='11' cpuset='30'/>
    <vcpupin vcpu='12' cpuset='15'/>
    <vcpupin vcpu='13' cpuset='31'/>
    <emulatorpin cpuset='8,24'/>
    <iothreadpin iothread='1' cpuset='8,24'/>
  </cputune>
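
A note on the iothread example above, offered as a sketch and not as a guaranteed fix: as far as I know, defining and pinning an iothread only makes it available. A disk uses it only once its controller (for SCSI) or its driver line (for a virtio disk) is assigned to that thread. Assuming the SCSI setup suggested earlier and a controller with index 0 (check the index in your own XML), the assignment could look like this:

    <!-- assumption: virtio-scsi controller with index 0, as in the address line of the disk example -->
    <controller type='scsi' index='0' model='virtio-scsi'>
      <!-- the number refers to the iothread pinned with <iothreadpin iothread='1' .../> -->
      <driver iothread='1'/>
    </controller>

If the <controller> element already exists in your XML with an <address> child, only the <driver> line needs to be added inside it.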

 

Edited by bastl
Link to comment

@bastl

Thank you for your help, tips and efforts. Yesterday it was too late to answer your tips or give feedback.

 

13 hours ago, bastl said:

First of all, is there a reason why you isolate all of your cores except of the first one?

13 hours ago, bastl said:

Second thing, as far as I know it's not adviced to have a SSD/NVME as one of your array drives.

13 hours ago, bastl said:

Next thing, the SSD you passthrough to the VM is a 32GB Transcend SSD, right?

I built a simple setup without Docker containers, using only the most necessary hardware, to rule out as many sources of error as possible. No, it's an old configuration from an older experiment; the two SSDs were lying around and could be integrated quickly. I have now also started from scratch and set up the Transcend SSD as the array disk and the Samsung SSD as the VM disk.

 

1. SCSI (no improvement)

2. Custom CPU (no improvement)

3. Emulator pin (no improvement)

4. IO thread (no improvement)

 

What I'd like to try next:
- Remove the SCSI/VirtIO layer and use an NVMe drive instead.
- Pass through various devices that may be important.

 

Maybe we'll gain some extra points and the system will work the way we want it to. If not, we just have to wait for upcoming updates and improvements.

Link to comment

Could be, but I'm an AMD fanboy :) I have never bought Intel or Nvidia :D

 

Well, in this case I would like to buy one of these two graphics cards (small form factor):

  • MSI GeForce GTX 1070 Aero ITX OC 8GB

  • MSI GeForce RTX 2070 Aero ITX OC 8GB

I assume these graphics cards (though not necessarily the same model) are already used by other users, so there should be no problems :/

Link to comment

Yesterday I installed Windows 10 on the NVMe drive on a bare-metal system. Then I stubbed the NVMe controller and created a new Windows 10 VM on the Unraid server. No improvement, so now I guess it's definitely the GPU, maybe due to a wrong PCIe link configuration (x1 instead of x16 Gen 3). Some members have discussed that already and maybe that's it - I don't know!

 

PCIe Root Port Patch

 

I have now bought an Nvidia GTX 1070 because I have no other GPU lying around.

 

Link to comment

The PCIe Root Port Patch only works on Q35 VMs on newer builds starting with 6.7_RC5. You need to insert the following QEMU arguments at the end of the XML.

  <qemu:commandline>
    <qemu:arg value='-global'/>
    <qemu:arg value='pcie-root-port.speed=8'/>
    <qemu:arg value='-global'/>
    <qemu:arg value='pcie-root-port.width=16'/>
  </qemu:commandline>
</domain>
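
One detail that is easy to miss (a hedged note, based on how libvirt handles these elements in general): the qemu:commandline block is only accepted if the opening <domain> tag at the very top of the XML declares the QEMU namespace. A minimal sketch of that first line, assuming a KVM domain; any other attributes already on your <domain> tag stay as they are:

    <!-- first line of the VM XML; the xmlns:qemu attribute enables the qemu: elements -->
    <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>

Without the namespace, the qemu: lines are typically rejected or dropped when the XML is saved.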

For me, with this change the Nvidia system settings report the correct PCIe link speeds and benchmarks like 3DMark work now. Without it I always had system freezes when 3DMark started up and checked for system information, GPU-Z reported wrong speeds, and I guess there are a couple of other programs out there with similar problems.

Edited by bastl
Link to comment
  • 2 weeks later...

Hello,

the GTX 1070 arrived and I tried it on the "server platform" - the same issue. Then I grabbed my "desktop platform":

 

Ryzen 1700

Asus Prime X370 Pro

32GB RAM

 

and it ends up with the same issue ¬¬. I tried different configurations and noticed curious behaviour with VNC (1st GPU) and the 1070 (2nd GPU). So I stopped searching for performance issues and started searching for I/O input lag. I stubbed the USB controller, passed it through and voilà!!!
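
For anyone who wants to reproduce this, a rough sketch of what a passed-through USB controller looks like in the VM XML. The PCI address below is purely hypothetical; the real one comes from the system devices list (or lspci) on your host, and the controller has to be stubbed/bound to vfio-pci first:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <!-- hypothetical host address 0b:00.3, replace with your USB controller's address -->
        <address domain='0x0000' bus='0x0b' slot='0x00' function='0x3'/>
      </source>
    </hostdev>

Keyboard and mouse then need to be plugged into ports that belong to this controller so the guest handles them directly.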

 

I couldn't test with Q35 because I could not find or extract a valid UEFI ROM file for the 1070. That means I couldn't test the "PCIe Root Port Patch". In GPU-Z the GPU was reported with x8 ... - I can't remember exactly.

 

The next step will be to try it on the "server platform" again.

I'll keep you posted!

Link to comment
  • 2 months later...

Hello,

the last update was three months ago, so it's time to close this thread.

 

I didn't try the GTX 1070 on the server platform, but I fixed the I/O input lag by passing through the USB controller on my two VMs with the two GPUs, AMD R9 and AMD RX56 Vega!

 

Thank you for that great support and tips - I love you guys!

 

@SpaceInvaderOne

Today I saw your video about the AMD RX 5700XT, and based on my experience I suspect the difference lies in how the graphics card is connected on the motherboard. I had and have no problems with my RX56 graphics card and the Supermicro motherboard, right from the first time: install, pass through, install drivers and have fun. I would have liked to test the RX 5700XT with my setup, but buying, testing and returning a graphics card just for that makes no sense.

 

Maybe we should create an overview that lists different setups and which components produced the best results?

Link to comment
  • T65 changed the title to [Solved] VM - Windows 10 little dropouts
  • 4 years later...
On 4/16/2019 at 8:25 PM, bastl said:

Another thing you can try to reduce stutter that may be caused by disk I/O is to specify one "iothread" and pin it to 2 cores like in my example.

Hello. In 2023 I started using Unraid for a small home project that includes a game-streaming "server" running in a Windows VM. Your suggestion for iothread saved my PC. I just wanted to thank you.

Have a wonderful life
