[6.6.6] [BOUNTY] Ryzen 2700x - [VIDEO] Latency, stuttering, STRANGE issues



Hi all. I've added a £30 ($40) bounty.

 

TL;DR

 

UPDATED LATEST INFO:

Getting latency and stuttering in the Windows VM. Improved it with a lot of the stuff in this thread, but it's not over:

 

It seems to boil down to writing to the array from Windows (whether that's writing to a vdisk on the cache, or writing to a share, with or without cache).
Writing to the cache from inside the Windows VM is very slow, grinds to a halt, and I see big CPU usage in Unraid. Reading from the array in Windows does not cause the latency issues.

 

Strangely, I also see the emulator-pinned CPU, which is isolated and not used by anything else, spike to 100% even though Windows is not doing anything, just from copying from the array to the cache via Unraid (SSH).

 

 

Basics:

Running a 2700X, 1080 Ti passed through, 4 cores (2 threads each) to a Win10 VM (WinMain).

I had some performance issues but sorted those (I thought). This is a separate issue, so I've created a new thread.

 

I'm seeing stuttering (crackling noises) in games and it's unplayable. My specs are in the post linked below.

 

I'm using the latency tool and I can see spikes. In the video I show my Unraid CPU usage (maybe I should have shown top), DPC Latency Checker, and two games running.

 

One game, Apex Legends, uses multiple cores; the latency looks much the same, with the spikes... (is yellow generally bad?)

In another game, EverQuest 2, which is old and single-threaded, the average latency drops massively for some reason.

Not sure if that's related... I still get the spikes though.

 

Note: the latency tool might not show a red spike at the exact moment I get a stutter/crackle, but it normally does.

 

I've unplugged all USB devices; same issue.

I'm passing through the GPU + HDMI audio, a USB controller (also tried without it, passing through just the keyboard/mouse), and an NVMe Samsung 960 (or 970).


The OS drive is on the cache (dual SSD), and the second drive is the 1TB NVMe [controller passed through] (with the games on it).

 

Please check the video (you may need to wait for YouTube to finish processing the 4K resolution):

 

 

I made a post here with my setup, and I've attached my diagnostics to this thread.

 

 

 


 
 

urhpg8-diagnostics-20190221-2213.zip

Edited by snailbrain
Link to comment

I also use this tweak on my system. Apply it in the tips and tweaks plugin.

 

vmdirty.png.69c96ad344058b7457e2b5ce1279eb88.png

 

I run a Samsung NVMe drive as my OS drive, a Samsung SSD for newer games, and an older WD Blue HDD for older games. Both are 1TB. I use virtio-blk to pass through the WD Blue and virtio-scsi for the Samsung SSD so TRIM works properly. I think you are getting the spikes from the virtio overhead on the system.
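For anyone who can't load the screenshot: the Tips and Tweaks setting adjusts the kernel's dirty-page writeback thresholds so writes get flushed to disk sooner instead of piling up in RAM. A rough shell equivalent (the percentage values here are illustrative, not necessarily the ones in the screenshot):

```shell
# Lower the dirty-page thresholds so the kernel flushes writes sooner.
# Values are illustrative; the Tips and Tweaks plugin exposes these in its GUI.
sysctl -w vm.dirty_background_ratio=1   # start background writeback at 1% of RAM
sysctl -w vm.dirty_ratio=2              # block writers once 2% of RAM is dirty
```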

Link to comment
29 minutes ago, david279 said:

I also use this tweak on my system. Apply it in the tips and tweaks plugin.

 

vmdirty.png.69c96ad344058b7457e2b5ce1279eb88.png

 

I run a Samsung NVMe drive as my OS drive, a Samsung SSD for newer games, and an older WD Blue HDD for older games. Both are 1TB. I use virtio-blk to pass through the WD Blue and virtio-scsi for the Samsung SSD so TRIM works properly. I think you are getting the spikes from the virtio overhead on the system.

Thanks.

When you say virtio-scsi for the Samsung, do you mean that where it says where the vdisk is, instead of linking to a file, you select the hard disk itself somehow?

 

And virtio-blk, what do you mean by this?

 

sorry :)

 

By passing through the controller for the NVMe, I think TRIM should work in Windows? I also under-provisioned the disk by about 50 GB.

 

 

Link to comment

Virtio-blk is the default virtio selection when creating a vDisk. I use virtio-scsi because my SSD would otherwise look like a regular hard drive in Windows, so TRIM would not work. I'm using the /dev/disk/by-id method to pass through the SSD and HDD. Passing through the NVMe controller should allow TRIM to work on it. You can check the defragment app in Windows to make sure.
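To illustrate the difference (the device path below is hypothetical, substitute your own entry from /dev/disk/by-id): virtio-blk is just bus='virtio' on the target, while the SCSI route points the disk at a virtio-scsi controller so Windows sees it as an SSD and TRIM can pass through:

```xml
<!-- Physical SSD passed by id via virtio-scsi; path is illustrative -->
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' discard='unmap'/>
  <source dev='/dev/disk/by-id/ata-Samsung_SSD_860_EVO_1TB_XXXXXXXX'/>
  <target dev='sdb' bus='scsi'/>
</disk>
<controller type='scsi' index='0' model='virtio-scsi'/>
```

With discard='unmap' on the virtio-scsi disk, TRIM from the guest reaches the drive; `fsutil behavior query DisableDeleteNotify` in Windows should report 0 when TRIM is active.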

Sent from my SM-N960U using Tapatalk

Link to comment

This may or may not be related.

Copying from my NVMe to a share (which uses the cache) really locks up the Windows VM and uses the first 4 cores like crazy,

even though the Windows VM is isolated to the last 4 cores.

As I'm copying from my passed-through NVMe (controller) to the "share", this should mean it's not related to the virtual storage driver in the Win VM? (Which the OS is booting from - the cache.)

image.thumb.png.1b6787b586b6ca232e90dc365a3b500e.png

 

Could it be my NVMe itself?
I'll try disabling it and just using the cache drive, and vice versa...

Edited by snailbrain
Link to comment

Update:

I tried not passing through the NVMe and just running off the cache: same latency issues.

I created a VM and cloned the virtual OS drive (that was on the cache) to the NVMe,
then disabled the virtual drive for the VM so that it booted from the NVMe. So the VM did not use the cache drive at all; it only used the NVMe.
Same issue.

 

So it does not seem to be anything to do with the storage itself?
 

Link to comment

I've now set the emulator pin to a free core that is also isolated. It hasn't helped.

 

I set HPET to "no" - this has now got me green in the latency checker...

image.png.c4aa9118917d0b2ab231e11658ac7f8e.png
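For reference, setting HPET to "no" ends up in the VM's XML in the <clock> section like this (the other timer lines are just the usual Windows-VM defaults, shown for context):

```xml
<clock offset='localtime'>
  <timer name='hypervclock' present='yes'/>
  <timer name='rtc' tickpolicy='catchup'/>
  <timer name='hpet' present='no'/>
</clock>
```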

 

Copying from the NVMe (passed through) to the C drive (which is on the dual cache SSDs):
image.png.06693f192bbe717f29fc1b308d0a9118.png

 

 

But... I was still getting insanely high latency when transferring files and/or downloading from Steam at 40 MB/s onto the passed-through NVMe drive.

With HPET disabled, it seems this is no longer an issue...

I'm getting desperate now as I'm unable to work (concentrate) until I get this fixed :)

 

I'm now offering a £30 ($40 USD) bounty, payable via PayPal or cryptocurrency,

if someone can find the solution..

 

 

HPET "no" has made a massive difference... the problem now seems to be just copying to the cache drive (maybe because I have 2?).

Even if I copy to the cache drive directly (or the C drive, which is on the cache), I get the latency issues.

 

 

Edit: my 2 SSD cache drives are on my Dell RAID card (flashed to IT mode).

Wondering if I should move them onto motherboard SATA ports... Will try that tomorrow. That card normally shares a group with my passed-through DVB-S card but gets split up with the ACS override.

 

Edit 2: it seems it happens when I copy to the array even when not onto the cache. E.g. I have a share that does not use the cache; latency increases even if I copy to that. I also copied to a drive + parity that are not on my RAID card (directly on motherboard SATA ports), same thing.

So -

The issue is: whenever my Windows VM touches the array, I get the latency issues, whether cache or not, and it doesn't matter if the drives are on the motherboard SATA controller or the RAID card.

 

Edit: it's only when WRITING to the array (I get 100% CPU usage on the emulator CPU and the Unraid CPUs, and copying drops to zero speed while it sort of catches up).

image.thumb.png.a1874a82915a7eedfa6821128876aee6.png

 

 

OK:

 

Copying via SSH (so all in Unraid) from array to array does not affect latency in the Windows VM.
Copying from the array to the SSD cache causes a few small spikes, nothing mega, but still some.

 

I did notice something strange though: the isolated CPU I set for the emulator pin in the VM spikes to 100% even though I'm doing all this through SSH on a different machine. It's not being used by anything at all except the emulator. Why would that be?

 

 

Latency seems to be at its worst in Windows when writing to the cache or array, whether that's through a share OR when the vdisk is on the array/cache.

 

Edited by snailbrain
Link to comment

@snailbrain I played around the last few weeks with different tweaks to increase the performance inside a VM. I don't have that particular stuttering issue like you have, but maybe this is a hint at what you can try. If you pass a storage device directly through to the VM, the IO for it is handled differently than for a vdisk. The IOThreads are pinned to all physical CPUs by default. You can tweak this and limit them to certain cores. I'm running a 1950X; I isolated the second die (cores 8-15, 24-31) completely and passed it through to a VM. The OS is installed directly on a Samsung NVMe and the games are on an SSD, both passed through. Currently I have it set to 2 iothreads: the first thread uses only the first CCX of the die (cores 8-11, 24-27) and the second thread the second CCX (cores 12-15, 28-31). Try playing around with it; maybe this is something that can fix your stuttering. Try setting only one iothread to a core pair that the VM already uses, or set it to a pair outside the VM. My settings aren't final yet; the Threadripper NUMA/UMA memory allocation and PCIe bus thing is a completely different story.

  <vcpu placement='static'>16</vcpu>
  <iothreads>2</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='8'/>
    <vcpupin vcpu='1' cpuset='24'/>
    <vcpupin vcpu='2' cpuset='9'/>
    <vcpupin vcpu='3' cpuset='25'/>
    <vcpupin vcpu='4' cpuset='10'/>
    <vcpupin vcpu='5' cpuset='26'/>
    <vcpupin vcpu='6' cpuset='11'/>
    <vcpupin vcpu='7' cpuset='27'/>
    <vcpupin vcpu='8' cpuset='12'/>
    <vcpupin vcpu='9' cpuset='28'/>
    <vcpupin vcpu='10' cpuset='13'/>
    <vcpupin vcpu='11' cpuset='29'/>
    <vcpupin vcpu='12' cpuset='14'/>
    <vcpupin vcpu='13' cpuset='30'/>
    <vcpupin vcpu='14' cpuset='15'/>
    <vcpupin vcpu='15' cpuset='31'/>
    <emulatorpin cpuset='8,24'/>
    <iothreadpin iothread='1' cpuset='8-11,24-27'/>
    <iothreadpin iothread='2' cpuset='12-15,28-31'/>
  </cputune>

 

 

EDIT:

Btw. from the DPC Latency Checker homepage:

 

Quote

The program supports Windows 7, Windows 7 x64, Windows Vista, Windows Vista x64, Windows Server 2003, Windows Server 2003 x64, Windows XP, Windows XP x64, Windows 2000.

...

Windows 8 Compatibility: The DPC latency utility runs on Windows 8 but does not show correct values. The output suggests that the Windows 8 kernel performs badly and introduces a constant latency of one millisecond, which is not the case in practice. DPCs in the Windows 8 kernel behave identical to Windows 7. The utility produces incorrect results because the implementation of kernel timers has changed in Windows 8, which causes a side effect with the measuring algorithm used by the utility. Thesycon is working on a new version of the DPC latency utility and will make it available on this site as soon as it is finished.

 

Edited by bastl
Link to comment
2 hours ago, bastl said:

@snailbrain I played around the last few weeks with different tweaks to increase the performance inside a VM. I don't have that particular stuttering issue like you have, but maybe this is a hint at what you can try. If you pass a storage device directly through to the VM, the IO for it is handled differently than for a vdisk. The IOThreads are pinned to all physical CPUs by default. You can tweak this and limit them to certain cores. I'm running a 1950X; I isolated the second die (cores 8-15, 24-31) completely and passed it through to a VM. The OS is installed directly on a Samsung NVMe and the games are on an SSD, both passed through. Currently I have it set to 2 iothreads: the first thread uses only the first CCX of the die (cores 8-11, 24-27) and the second thread the second CCX (cores 12-15, 28-31). Try playing around with it; maybe this is something that can fix your stuttering. Try setting only one iothread to a core pair that the VM already uses, or set it to a pair outside the VM. My settings aren't final yet; the Threadripper NUMA/UMA memory allocation and PCIe bus thing is a completely different story.


  <vcpu placement='static'>16</vcpu>
  <iothreads>2</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='8'/>
    <vcpupin vcpu='1' cpuset='24'/>
    <vcpupin vcpu='2' cpuset='9'/>
    <vcpupin vcpu='3' cpuset='25'/>
    <vcpupin vcpu='4' cpuset='10'/>
    <vcpupin vcpu='5' cpuset='26'/>
    <vcpupin vcpu='6' cpuset='11'/>
    <vcpupin vcpu='7' cpuset='27'/>
    <vcpupin vcpu='8' cpuset='12'/>
    <vcpupin vcpu='9' cpuset='28'/>
    <vcpupin vcpu='10' cpuset='13'/>
    <vcpupin vcpu='11' cpuset='29'/>
    <vcpupin vcpu='12' cpuset='14'/>
    <vcpupin vcpu='13' cpuset='30'/>
    <vcpupin vcpu='14' cpuset='15'/>
    <vcpupin vcpu='15' cpuset='31'/>
    <emulatorpin cpuset='8,24'/>
    <iothreadpin iothread='1' cpuset='8-11,24-27'/>
    <iothreadpin iothread='2' cpuset='12-15,28-31'/>
  </cputune>

 

 

EDIT:

Btw. from the DPC Latency Checker homepage:

 

 

 

Hey and thank you very much for the response.

 

I tried the iothreadpin but it didn't seem to help (I set them to the isolated cores that are used by the VM, to ones that were reserved for Unraid, and to an isolated core on its own [albeit just 1 then]). When I'm copying (from NVMe to desktop (cache)), I see the emulator pin max out at 100% and an un-isolated core max out too (and it shifts around). I.e., I don't see the iothread cores get involved when I'm transferring?

 

I did not know that about Windows 10 and DPC Latency Checker. It does still seem to show an issue (I always use LatencyMon), and I have tested some games.
If I start transferring to my OS drive (which is on cache) or to anything on the array (not just cache), the game becomes unplayable. Also, transferring the files grinds to a halt.

 

Any other ideas?

Link to comment

For me, the IOThread I set in the XML is used when I copy files from the VM to the array or the cache. Maybe there is some sort of caching involved for you in the background causing your games to lag. Are your disks for the VM set to "writeback"? Maybe try "none" instead.

 

      <driver name='qemu' type='raw' cache='none'/>
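In context, that attribute sits on the vdisk's <driver> line inside the <disk> block of the VM XML (the file path here is illustrative):

```xml
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/mnt/user/domains/WinMain/vdisk1.img'/>
  <target dev='hdc' bus='virtio'/>
</disk>
```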

 

Link to comment
9 minutes ago, bastl said:

For me, the IOThread I set in the XML is used when I copy files from the VM to the array or the cache. Maybe there is some sort of caching involved for you in the background causing your games to lag. Are your disks for the VM set to "writeback"? Maybe try "none" instead.

 


      <driver name='qemu' type='raw' cache='none'/>

 

Thank you. I think it is related to some write cache too, as I'm only getting the issue when "writing" now.

 

A bit embarrassed: setting cache to "none" was tipped to me by the other guy in this thread, but I think because I removed the vdisk and tried running with just the NVMe, the setting got removed and went back to writeback when I added the vdisk back, I guess.

 

Results:

 

Writing to the vdisk (OS drive) is now not causing massive issues. Thank you...

 

But writing to the array is still locking things up. E.g. if I write to a share (that uses cache), it will spike up to 500 MB/s, then go to 0 and wait there for a minute, then slowly go up, then freeze again. I'm unable to stop the transfer for several minutes sometimes, and the cores on Unraid are maxed out.

I thought it could have been something to do with the Dynamix File Integrity plugin, but I removed that.

Are there some more cache settings?

 

 

 

Link to comment

@snailbrain One difference I see so far is that you use the caching of the shares. All my shares sit on the array with caching disabled. Nothing is using my cache drive except the docker appdata and a couple of VMs which I don't use when I play games, and the dockers are all lightweight and not doing much in the background (Unifi, Nextcloud, MariaDB, Duplicati, Netdata). In your example, writing to a share writes to the cache first, where at the same time the VM OS disk sits, right? You're copying from and writing to the same device at the same time. If you have a spare SSD as an unassigned device, try placing your VM vdisk on there. I guess I would see the same issues duplicating a large file on the OS device while gaming. Or another option is to try copying to a shared folder where caching is disabled.

Link to comment
22 minutes ago, bastl said:

@snailbrain One difference I see so far is that you use the caching of the shares. All my shares sit on the array with caching disabled. Nothing is using my cache drive except the docker appdata and a couple of VMs which I don't use when I play games, and the dockers are all lightweight and not doing much in the background (Unifi, Nextcloud, MariaDB, Duplicati, Netdata). In your example, writing to a share writes to the cache first, where at the same time the VM OS disk sits, right? You're copying from and writing to the same device at the same time. If you have a spare SSD as an unassigned device, try placing your VM vdisk on there. I guess I would see the same issues duplicating a large file on the OS device while gaming. Or another option is to try copying to a shared folder where caching is disabled.

Hi thanks.

 

I'm writing from my NVMe (which has its controller passed through) to the array, but I get the same issue whether I write to a share which is using the cache or to a share that does not use the cache. (Note: since I set write cache = none, writing from the NVMe to my C drive seems OK.)

(Separate drives.)

My C drive is on the cache (domains folder on the cache drive).

My E drive is the 1TB NVMe.

 

Copying from E to the array (cache or not) causes the issues.

 

I'm wondering if it's something to do with RAM caching somewhere. Copying to the cache or array transfers really fast for the first few seconds, then grinds to a halt.
 

Edited by snailbrain
Link to comment

What guide did you follow in setting up the Windows VM? At this point I'd make a new Windows 10 VM and see if the issues persist, to better isolate the source of the problem, as up to this point there have been wild guesses and standard optimization solutions presented.

 

Also, take your main image off the cache and put it on another fast drive via Unassigned Devices.

Edited by 1812
Link to comment

Your Windows doing the copy operation is sitting on the cache. Windows does all sorts of stuff in the background, like caching/storing temp files while copying and checking for corruption, and your AV also works on those files at the same time. All of that is IO on the cache, all happening at the same time as using the vdisk on the cache device. If you can, try placing your OS vdisk on another non-cache/cached disk and test again. As I said, if you have a spare SSD, use it as an unassigned device, copy your vdisk over there, and test with the VM running from the UD disk. Another thing to mention: depending on the device you're using for the cache, older SSDs and controllers often show extreme performance decreases above a certain fill level. I have an old spare OCZ 128GB SSD; as soon as it gets above 70-75% full it slows down to 10-20 MB/s reads, sometimes below that. This device is also horrible at handling simultaneous operations.

 

Another thing I noticed: you're using 2 cache devices, right? I guess they are mirrored (RAID 1). Keep in mind that the controller these devices are connected to has to handle every write operation twice. Is this an onboard SATA controller directly connected to the CPU, or one connected to the chipset, sharing the PCIe lanes with other devices like USB controllers or even NVMe devices? Or do you use an extra RAID card these disks are connected to? Which slot on the board? How is that PCIe slot connected, directly or over the chipset? Tons of questions, and you might get an idea of the direction I'm trying to point you in. Let's say the M.2 slot for your E drive is connected to the chipset, which is limited to 4 PCIe lanes to the CPU, and at the same time you're using a RAID card also sharing those 4 lanes; I guess you get the idea that the chipset becomes the bottleneck here and can cause issues. Check your motherboard's manual to see which M.2 slots and which PCIe slots are connected where, and try to eliminate bottlenecks.

Link to comment
