member378807 Posted March 16, 2019

Alright guys, I am experiencing microstuttering in games and benchmarks. It happens about every 1-2 seconds and does not last long, but it drops frames to 20fps in games. To give an idea, I have to play games like Dark Souls 3 and Monster Hunter World at 1080p with every setting on low just to hit 55-60fps, and I still get hard fps drops. *Dark Souls 3 is capped at 60fps, but this hardware should be able to run it at 4K 60fps, or failing that at max settings 1080p @ 60fps with zero drops.*

In 3DMark Time Spy with the EVGA 2080 I get 8489, which shows as top 82% of tested computers.
In Passmark 9 the total score is 4050, at 79% of tested computers.

The low Passmark results are terrible 2D scores:
  Simple Vectors: 18
  Fonts and Text: 199
  Image Filters: 611
  Direct 2D: 34
  Complex Vectors: 58
  Windows Interface: 50
  Image Rendering: 750
and a few terrible memory scores:
  Database Operations: 57
  Memory Latency: 39

I have run out of things to search for and test. Hopefully I can give all the information that would be helpful.
Build
UnRaid OS: 6-7-0-rc5

BASE
CPUs: Intel Xeon E5-2687W v2 3.4GHz 8 Core
Motherboard: AsRock Rack EP2C602-4L/D16
RAM: 128GB Kingston DDR3 ECC 1866 PC3 14900
PSU: EVGA P2 1000 Watt

STORAGE
SSD cache: Mushkin Source 250GB (MKNSSDSR250GB), QTY 4 in RAID 10 *I assume that is how UnRaid manages it*
HDD storage/parity: WD Red 8TB NAS Hard Disk Drive, 5400 RPM class, SATA 6 Gb/s, 256 MB cache, 3.5" (WD80EFAX), QTY 2 parity, QTY 1 storage

PCI
EVGA 2080 XC Hybrid Gaming 08G-P4-2184-KR 8GB
NVIDIA Quadro P4000 8GB
Gigabyte GeForce GTX 1070 Mini ITX OC 8GB
FebSmart 4 Ports USB 3.0 Super Fast 5Gbps PCI Express

Temperatures of components
CPUs: liquid cooled, 30-35C under load
Storage: about 30-31C
EVGA 2080: 35C idle, 68C load
Nvidia Quadro P4000: I don't know, because it is only used for hardware transcoding and doesn't see much load
Gigabyte 1070 Mini: 40C idle, 75-78C load

BIOS settings
All power settings are set to performance
C-states are turned off
PCI slots are set to Gen 3

Unraid settings
Tips and Tweaks: CPU governor Performance, Enable Intel Turbo Boost Yes
SSD trim *I don't know what the settings are on there*
I use cache-only storage (VM) for the disk images. Each image is 235GB.

Now to tell you what I have tried.

Attempts at VMs: I'm just going to explain what I have used and attempted *which is A LOT*. I currently DO NOT have a VM made for the 2080 because I am waiting for direction from someone with more knowledge than I have.
I have made, remade and retested around 10 clean VMs and always end up with microstuttering.

CPU mode: host passthrough
Logical CPUs: I have used 2 cores + 2 HT, 4 cores + 4 HT, and 6 cores + 6 HT in this pattern
  PIN 2/18, 3/19, 4/20, 5/21, 6/22, 7/23
  ISO 2/18, 3/19, 4/20, 5/21, 6/22, 7/23
Memory: ranging from initial 2GB / max 8GB to initial 50GB / max 50GB
Machine: I have used i440fx-3.1 and Q35-3.1
BIOS: OVMF
Hyper-V: tried enabled and disabled
VirtIO driver ISO: Virtio-win-0.1.160-1.iso
vDisk bus: VirtIO
Graphics ROM BIOS: tried with and without

In Windows 10 I have all power settings set to performance, plus most of the basic VM install and performance steps from Space Invader One on YouTube.

Full setup: I install Win10, update everything, install the latest Nvidia drivers *everything in the package*, change all PCI devices to MSI mode, disable the Spectre and Meltdown patch *the Windows patch that causes a lot of usage issues*, apply the standard recommended NVCP settings for max performance, run a game/benchmark: hello microstuttering.

Bare minimum: I install Win10 with zero updates, install the Nvidia driver only, change PCI devices to MSI mode, apply the standard recommended NVCP settings for max performance, run a game/benchmark: hello microstuttering.

I have also run these commands to no avail:
  bcdedit /set disabledynamictick yes
  bcdedit /set useplatformclock true
  bcdedit /set tscsyncpolicy Enhanced
Warrentheo Posted March 16, 2019

Add this to your XML file:

  <cputune>
    ... ... ...
    <vcpupin vcpu='6' cpuset='7'/>
    <emulatorpin cpuset='0,4'/>
  </cputune>

and adjust it to your setup. This moves the QEMU overhead to whatever CPU core you wish. By default the Linux kernel and the Windows kernel try to use the same CPU cores. This forced handoff, and the inability of either kernel to reliably schedule tasks on those cores, is what causes this sort of thing.

The extreme version of this is to also add:

  ... isolcpus=1-3,5-7 ....

to your syslinux config. (Again, adjust to your setup. Any cores set this way will be unavailable to the Linux kernel and only usable if something specifically requests them.) UnRaid will use CPU 0 no matter what you put here, but any cores you isolate will only be used if something like QEMU comes along and specifically requests them. This forces a separation between the Linux kernel and the Windows kernel.

On my setup, an i7-7700k with 64GB RAM, at first I ran with the first core and its hyper-thread reserved for UnRaid, used emulatorpin to pin QEMU to them as well (possibly redundant after CPU isolation), then ran Windows on the remaining 3 cores and their HTs. While this worked, I realized that about 25% of my CPU was sitting idle.

Currently I have removed the isolation, which gives Linux access to all cores, set emulatorpin to core 0 only (without its HT), and run Windows on 3 full cores and one HT. This lets Windows and Linux share the primary core without both running on the same thread. I have been running this way for 6 or 7 months and think it works much better. It does mean that if I start using the Linux side more (or Docker), Linux can max out all 8 threads and kill my VM. I currently use the Linux side sparingly, so for me this works.
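For anyone who would rather generate the pinning block than type it by hand, here is a minimal Python sketch. It is only an illustration: the function name `build_cputune` and the CPU numbers passed to it are made up for the example (they loosely follow a 4-core/8-thread host where 0/4, 1/5, 2/6 and 3/7 are HT pairs), so check your own topology before copying anything into a VM's XML.

```python
def build_cputune(vm_cpus, emulator_cpus):
    """Return a libvirt <cputune> fragment that pins one vCPU per host CPU.

    vm_cpus:       host CPU numbers for the guest, in vCPU order (illustrative).
    emulator_cpus: host CPU numbers reserved for the QEMU emulator threads.
    """
    lines = ["<cputune>"]
    for vcpu, host_cpu in enumerate(vm_cpus):
        lines.append(f"  <vcpupin vcpu='{vcpu}' cpuset='{host_cpu}'/>")
    # A comma-separated list pins to exactly these CPUs, not to a range.
    emulator = ",".join(str(cpu) for cpu in emulator_cpus)
    lines.append(f"  <emulatorpin cpuset='{emulator}'/>")
    lines.append("</cputune>")
    return "\n".join(lines)


# Hypothetical layout: guest on cores 1-3 plus their HT siblings, emulator on 0 and 4.
fragment = build_cputune(vm_cpus=[1, 5, 2, 6, 3, 7], emulator_cpus=[0, 4])
print(fragment)
```

Pairing each core with its HT sibling on neighbouring vCPUs (1 then 5, 2 then 6, and so on) mirrors the pin pattern used elsewhere in this thread. Whether that ordering matters for a given guest is something to test rather than assume.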
member378807 (Author) Posted March 17, 2019

I will need an explanation of how to use the emulator pin correctly, so if the cpuset values in my testing look odd, it is because of a lack of knowledge in this area. If you don't want to read through the scores and tests, I can summarize: each test uses a different emulatorpin setup until the very end, where I settled on "emulatorpin cpuset=0-4" *not that it matters what I chose, I just stuck with it*. I also changed the MSI priority for the RTX 2080 and other interrupts, testing overall performance and whether or not I experienced microstuttering.

Final result: performance increased with the RTX 2080 MSI priority set to High along with emulator pinning *regardless of selected cores*. None of the changes I have made has resolved the microstuttering.

After installing the VM and running benchmarks I noted decreases in performance. Previous Passmark: 4050 or higher. Now: 3370 average. All tests show pretty consistent CPU, 3D graphics and 2D graphics scores *albeit still low 2D graphics*.

No emulatorpin (MSI priority Normal)
  apparent microstutter
  Passmark 3373 *memory 1467.6, disk 11086*

No emulatorpin (MSI priority High)
  no apparent microstutter
  Passmark 3373 *memory 1525, disk 9214*

<emulatorpin cpuset='0,4'/>, MSI priority Normal
  Passmark 3593.1 *memory 1777.4, disk 11730.1*, apparent microstutter

<emulatorpin cpuset='2-3,18-19'/>, MSI priority Normal
  Passmark 3449.2 *memory 1760.7, disk 6517.1*, apparent microstutter

<emulatorpin cpuset='0-4'/>, MSI priority Normal
  Passmark 3515 *memory 1770, disk 7887*, apparent microstutter
  3DMark Time Spy 8863 (82nd percentile), apparent microstutter

<emulatorpin cpuset='0-4'/>, MSI priority High
  Passmark 3633 *memory 1700, disk 10949*, apparent microstutter *even though I wished it were not true and that MSI priority High would be the fix*
  3DMark Time Spy 8887, apparent microstutter

<emulatorpin cpuset='0-4'/>, MSI priority High, a few other interrupts changed from undefined priority to normal
  Passmark 3680 *memory 1678, disk 10822*, apparent microstutter
  3DMark Time Spy could not run, caused instability, reverted

<emulatorpin cpuset='0-4'/>, MSI priority High, most Windows 10 bloatware unloaded
  Passmark 3572 *memory 1788, disk 9566*, apparent microstutter
  3DMark Time Spy 8679, apparent microstutter
bastl Posted March 18, 2019 (edited)

On 3/16/2019 at 7:58 PM, member378807 said: CPUs : Intel Xeon E5-2687W v2 3.4GHz 8 Core

I noticed your board is a dual-socket board. In case you have 2 CPUs, please post the output of the following commands while you have some VMs running:

  numastat qemu
  numactl --hardware

To me it sounds like an issue mostly reported by Threadripper users. The way the CPUs are presented to the OS differs from how they actually are. In other words, by default the 2 separate dies of an AMD 1950X, each with 8 cores, are presented to the OS as one single CPU with 16 cores. Each die has its own memory controller, and in a UMA configuration the OS isn't aware of this. So if, say, a VM runs on cores of node 1 but uses memory attached to the other node, you get latency issues. In a NUMA configuration the dies are presented individually to the OS, and you can force qemu/kvm to use only one specific node and its resources, such as memory or PCIe devices. You also have to make sure that the graphics card you pass through to a VM sits in a slot that is directly attached to the cores the VM is using. You can check the topology with lstopo (available in GUI mode only, I think).

[Images: UMA example, NUMA example]

This is an example of how it looks on my system. As soon as I use more than 32GB of the total 64GB in one VM, I notice it inside the VM as higher memory access latency and also as small stutters in games.

EDIT: Here is part of my tweaked VM, set to use only cores and RAM from a specific node.

  <vcpu placement='static'>14</vcpu>
  <iothreads>1</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='9'/>
    <vcpupin vcpu='1' cpuset='25'/>
    <vcpupin vcpu='2' cpuset='10'/>
    <vcpupin vcpu='3' cpuset='26'/>
    <vcpupin vcpu='4' cpuset='11'/>
    <vcpupin vcpu='5' cpuset='27'/>
    <vcpupin vcpu='6' cpuset='12'/>
    <vcpupin vcpu='7' cpuset='28'/>
    <vcpupin vcpu='8' cpuset='13'/>
    <vcpupin vcpu='9' cpuset='29'/>
    <vcpupin vcpu='10' cpuset='14'/>
    <vcpupin vcpu='11' cpuset='30'/>
    <vcpupin vcpu='12' cpuset='15'/>
    <vcpupin vcpu='13' cpuset='31'/>
    <emulatorpin cpuset='8,24'/>
    <iothreadpin iothread='1' cpuset='8,24'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>

Edited March 18, 2019 by bastl
member378807 (Author) Posted March 18, 2019

numastat qemu

Per-node process memory usage (in MBs)
PID                          Node 0      Node 1       Total
-----------------------  ----------  ----------  ----------
830 (qemu-system-x86)      15740.48      735.10    16475.59
4107 (qemu-system-x86)        10.62    32321.86    32332.48
21028 (qemu-system-x86)    38827.25    11422.36    50249.61
-----------------------  ----------  ----------  ----------
Total                      54578.35    44479.32    99057.67

numactl --hardware

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 64456 MB
node 0 free: 473 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 64508 MB
node 1 free: 231 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
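To make the numastat table easier to read at a glance, the per-node figures can be turned into fractions. The following Python sketch is an assumption-laden helper, not part of any Unraid or numactl tooling: `node_shares` is a made-up name, and the parsing only expects rows shaped like the ones pasted above (PID, process name in parentheses, one column per node, then the total).

```python
def node_shares(numastat_text):
    """Map each PID to the fraction of its memory that sits on each NUMA node."""
    shares = {}
    for line in numastat_text.splitlines():
        parts = line.split()
        # Process rows start with a numeric PID; headers, rulers and "Total" do not.
        if len(parts) < 4 or not parts[0].isdigit():
            continue
        pid = int(parts[0])
        # parts[1] is the process name; the remaining columns are MB per node, then the total.
        *per_node, total = (float(value) for value in parts[2:])
        shares[pid] = [round(mb / total, 3) for mb in per_node]
    return shares


# Rows copied from the numastat output in this post.
sample = """\
PID                          Node 0      Node 1       Total
830 (qemu-system-x86)      15740.48      735.10    16475.59
4107 (qemu-system-x86)        10.62    32321.86    32332.48
21028 (qemu-system-x86)    38827.25    11422.36    50249.61
"""
shares = node_shares(sample)
for pid, fractions in shares.items():
    print(pid, fractions)
```

Read this way, process 21028 has a bit over a fifth of its memory on node 1 while the other two processes sit almost entirely on one node. This is the kind of split that bastl's description of cross-node memory access is about.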
Warrentheo Posted March 18, 2019

5 hours ago, member378807 said: I will need explanation on how to utilize the emulator pin correctly [...]

Just to be clear, the English translation of <emulatorpin cpuset='0-4'/> is "Pin the QEMU emulator to only use cores zero through 4 of the host" (a total of 5 cores BTW; only 1, or maybe 2 to be safe, should be needed). Most likely you want to avoid using a hyphen in this designation.

The English version of <emulatorpin cpuset='0,4'/> is "Pin the QEMU emulator to only use cores zero or 4 of the host" (a total of 2; on a host like mine, where 4 is the HT of core 0, that is really 1 + its HT, or about 1.25 cores).

That comma vs hyphen makes a big difference.

The name of your video card threw me for a bit, because MSI actually does have something to do with this issue, but MSI has 2 different meanings in this context: MSI = Micro-Star International, and MSI = Message Signaled Interrupts (https://en.wikipedia.org/wiki/Message_Signaled_Interrupts).

Message Signaled Interrupts were actually the next thing I was going to have you try, and I have attached a utility that may help with that. It is a small, simple utility that shows the Windows registry settings for the PCI devices on your system and displays their MSI status, as well as letting you enable MSI if needed. With MSI the video card signals an interrupt by writing a small message to memory instead of using a shared interrupt line, which tends to work better for passed-through devices. Most likely your passed-through GPU is not set to use MSI interrupts; this utility lets you enable it, after which you reboot. I have to use it every time the drivers are installed or updated for my GTX 1070.

Attachment: MSI_util.exe
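The comma-versus-hyphen point is easy to check mechanically. Below is a small Python sketch that expands a cpuset string into the individual host CPU numbers it covers. `expand_cpuset` is a name invented for this example, and it only handles the plain list and range forms used in this thread (libvirt also accepts a `^` exclusion form, which is deliberately left out here).

```python
def expand_cpuset(cpuset):
    """Expand a cpuset string such as '0-1,16-17' into a sorted list of CPU numbers."""
    cpus = set()
    for part in cpuset.split(","):
        part = part.strip()
        if "-" in part:
            # A hyphen means an inclusive range: '0-4' covers 0, 1, 2, 3 and 4.
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            # A bare number is exactly one CPU.
            cpus.add(int(part))
    return sorted(cpus)


print(expand_cpuset("0-4"))        # [0, 1, 2, 3, 4]
print(expand_cpuset("0,4"))        # [0, 4]
print(expand_cpuset("0-1,16-17"))  # [0, 1, 16, 17]
```

So '0-4' hands the emulator five host CPUs, while '0,4' hands it two.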
member378807 (Author) Posted March 18, 2019

I did need context on the emulator pin. Then I got a response from bastl and put two and two together. I didn't know whether it was tied to the XML binding or the host binding, but I have it set up now. I'll post what I am seeing. It seems to have made a significant difference. Is it all resolved? I need more testing, but I am on call and my time at home is limited. *I try to be as thorough as I can for all of you who are so generous with your time and knowledge.*

As far as MSI goes: I was very aware of MSI because I was previously getting IRQ disable errors crashing my server. I'm sure you thought "Damn, this guy got one of those Alibaba deals, an MSI EVGA 2080" lol

I'll post my update in a moment.
member378807 (Author) Posted March 18, 2019

So, I made the changes to the XML and loaded the VM. From the start it appeared that the stuttering was lessened and performance was better, but the Passmark results do not exactly reflect what I was seeing. When I ran 3DMark Time Spy the microstuttering was exactly the same.

In Passmark the CPU score went down about 2000 points, the GPU score went up 700 points, the memory score went up 250 points, but the disk score went down 5000 points. In 3DMark my score went down 1000 points.

XML for RTX 2080 VM

  <vcpu placement='static'>12</vcpu>
  <iothreads>1</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='18'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='19'/>
    <vcpupin vcpu='4' cpuset='4'/>
    <vcpupin vcpu='5' cpuset='20'/>
    <vcpupin vcpu='6' cpuset='5'/>
    <vcpupin vcpu='7' cpuset='21'/>
    <vcpupin vcpu='8' cpuset='6'/>
    <vcpupin vcpu='9' cpuset='22'/>
    <vcpupin vcpu='10' cpuset='7'/>
    <vcpupin vcpu='11' cpuset='23'/>
    <emulatorpin cpuset='0-1,16-17'/>
    <iothreadpin iothread='1' cpuset='0-1,16-17'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
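A quick sanity check for a layout like this is to confirm that every pinned host CPU really belongs to the NUMA node named in `<numatune>`. The Python sketch below does that with the standard library XML parser. The helper names (`expand`, `cpus_outside_node`) are invented for the example, the sample is a shortened copy of the XML above wrapped in a `<domain>` element so it parses as one document, and the node 0 CPU list is taken from the `numactl --hardware` output posted earlier in the thread.

```python
import xml.etree.ElementTree as ET


def expand(cpuset):
    """Expand '0-1,16-17' style strings into a set of CPU numbers."""
    cpus = set()
    for part in cpuset.split(","):
        low, _, high = part.strip().partition("-")
        cpus.update(range(int(low), int(high or low) + 1))
    return cpus


def cpus_outside_node(domain_xml, node_cpus):
    """Return pinned host CPUs that are not part of the given NUMA node."""
    root = ET.fromstring(domain_xml.strip())
    pinned = set()
    for element in root.find("cputune"):
        # vcpupin, emulatorpin and iothreadpin all carry a cpuset attribute.
        cpuset = element.get("cpuset")
        if cpuset:
            pinned |= expand(cpuset)
    return sorted(pinned - set(node_cpus))


# Shortened sample; the real VM pins 12 vCPUs.
sample = """
<domain>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='18'/>
    <emulatorpin cpuset='0-1,16-17'/>
    <iothreadpin iothread='1' cpuset='0-1,16-17'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
</domain>
"""

# node 0 cpus per numactl: 0-7 and 16-23
node0 = list(range(0, 8)) + list(range(16, 24))
print(cpus_outside_node(sample, node0))  # [] means every pin stays on node 0
```

An empty list only says the CPU pins and the memory node agree with each other. It does not say anything about which node the GPU's PCIe slot hangs off, which bastl mentions as a separate thing to verify.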
Warrentheo Posted March 18, 2019

It is also possible to manually edit the XML for the VM to create whatever configuration of NUMA nodes you wish, basically building whatever kind of processor you wish, as long as the Windows kernel knows what you want.

https://libvirt.org/formatdomain.html#elementsNUMATuning

I would recommend searching for "NUMA" on that page and looking at all the different settings you can configure.
member378807 (Author) Posted March 18, 2019 (edited)

I'll be on this for months lol.

Edited March 19, 2019 by member378807
Warrentheo Posted March 19, 2019

Hazards of having a server-grade dual-socket board... You need to become an I.T. admin to be able to run all the cool things on it 😛
member378807 (Author) Posted March 19, 2019

Sigh... I figured as much. It is a lot to learn. I am not even familiar with Linux in general. I've only used it for about a month, so "ls", "cd", "mkdir", etc. is about all I've got. I really just got this so my wife and I could play games from one unit and stream HD content at the same time. I knew it came at the price of headaches, knowledge, and well... price. But frankly, I just want to be able to play Sekiro without microstutters when it drops on the 22nd. I'm all for learning on my own, but I have hit a brick wall, whether that is down to not knowing which questions to ask or to the specifics of my situation not being easy to find online.
Warrentheo Posted March 19, 2019

Here is some consolation... The new AMD Threadripper and Epyc chips are basically multi-socket CPUs in a single CPU, and I am sure Intel is not too far off from the same sort of thing. The rest of us will need to learn this stuff soon anyway, you will just be ahead of us 😛
member378807 (Author) Posted March 21, 2019

I might have it solved. I'm not going to say 100% yet because I am still in initial testing. There are possibly two fixes, both coming from you in a separate thread, another one you responded to about gaming performance. There is also a video about overall performance in gaming.

But back to your content. You posted something about the basic specs of your XML. One part had to do with passing through the host BIOS, as well as discard=unmap. I initially tried to pass through the BIOS on its own and could not boot. Removed the code and restarted. No dice. Deleted the VM and remade it, and it said something like not booting video yet *I can't remember exactly*, but I remember that the last time I had that, it took me a while to get it working again, so I decided to wipe the vdisk. *Side note: you recommended SCSI for the vdisk and I could not get it to boot.* Small fish at the moment.

I started fresh with these in the XML:
  iothreadpin
  emulatorpin
  numatune strict nodeset 0
  bios host
  discard unmap

I got everything going, uninstalled all bloatware on Win10 and ran Passmark:
  Overall score: +1300 points
  CPU: +2000
  2D: +100
  3D: fluctuates, about the same
  Memory: +150-200
  Disk: +11000 *this is not a typo*

The disk score before was something to the tune of 2656:
  Disk Sequential Read: 381
  Disk Random Seek + RW: 194
  Disk Sequential Write: 158

These are SATA3 cache drives in RAID 10, so that makes no sense to me. After I remade the VM it is about 1200 MB/s read and 500-750 MB/s write *NovaBench shows 2000 MB/s read and 500 MB/s write*. But for whatever reason, while I was testing before I remade this new VM, the disk performance kept dropping and dropping. I have no answers.
OK, so performance appears to be better in games. Not the best, but better. I still get audio popping, but that takes me to my next finding.

Many sources talk about increasing the bitrate, updating the driver, or disabling "allow full control" within Windows 10. The guy in the video said that the standard Windows audio driver did better than the Nvidia driver. There are two audio devices: Nvidia Virtual Audio Device and Nvidia High Definition Audio Device. I disabled the virtual audio device: no change in audio popping, clicking or microstuttering. I installed the Microsoft audio driver and the audio pops and clicks went down substantially. I played a 1kHz test tone while running a benchmark and the only time it ever popped was when transitioning between tests, whereas before I could just let it play without a benchmark running and it would pop intermittently every couple of seconds.

So, I am getting somewhere. But I have more questions and fewer paths to take.

As far as audio goes, I made the same attempt on the last VM and got the same result: less popping. I even went as far as disabling all audio and removing the card's audio controller within Unraid, and that only lowered the popping/clicking to the same level as the Microsoft audio driver. So I may assume that if there is no audio device, then audio cannot be having an effect on the microstutter/fps drops.

Also, why *was* my read/write performance getting crushed on my VM? I have 4x 250GB cache drives in RAID 10. The vdisk is on the cache. I wiped the vdisk and made a new one with the same parameters except BIOS = host and discard = unmap. Is this a permanent solution?

Also, I have 13 txt documents covering the tests I've done and the changes I've made across BIOS, Unraid and Windows 10. I need to look more into Unraid XML editing, but I honestly don't know what more I can do.
bastl Posted March 22, 2019

@member378807 Did you try applying the MSI fix to resolve your audio issue?

https://wiki.unraid.net/UnRAID_6/VM_Guest_Support#Enable_MSI_for_Interrupts_to_Fix_HDMI_Audio_Support
member378807 (Author) Posted March 22, 2019 (edited)

Yes, sir. Well, not by means of a *manual* regedit. I used an app called MSI_util_v2. When I use it and restart the VM, I can run lspci -v in the terminal and see that MSI is enabled.

Edited March 22, 2019 by member378807 (clarification)