yesdog

Members
  • Posts: 16
  • Gender: Undisclosed

yesdog's Achievements

Noob (1/14) • 1 Reputation

  1. @goober07 - also, I'm in the same boat as you, i5 lol. This approach has definitely let me run 2 games on the 2 VMs and saturate the CPU without noticeable latency issues. Overall performance does go down though, because the CPU is maxed out.

     My next improvement after getting the kernels right is switching to an i7 for HT. I've done extensive research on this, and in theory HT should be amazing for 100% overcommitting exactly 2 VMs: 8 logical host cores, 4 logical cores per VM. The whole goal of my kernel tuning is to let the kernel share CPU resources 'equally'. The two problems are that Linux has to context switch processes in order to time-share a core, and that a VM might not get to start executing immediately. With HT, this actually takes a lot of load off the kernel. With the 2 VMs, Linux will spend most of its time alternating between mostly just 2 VM threads on a single core (some other threads here and there, but mostly the VMs). With HT, we can assign a VM thread from each VM to each 'virtual core' of a single physical core, so the kernel has less to do, it can leave the threads in place longer, and the physical CPU itself will handle executing both threads basically at the same time (see the pinning sketch below).

     Big misconception with HT: both threads are equal. Worst case, each thread performs at half the speed of the physical core. Compare that to non-HT with 4 physical cores and 8 VM threads: still roughly half the speed of a physical core, PLUS the timing/scheduling/context-switch cost of time-dividing your 8 VM threads. So ultimately I think HT is going to give me the boost I want, but I'll still need a high-tick kernel. We need to give Linux every chance to start executing a VM thread *immediately*. CPUs/cores/hyperthreads don't always perform the same - the CPU may clock down for thermal reasons, a core might suffer from intense cache misses - so performance is never stable. What can be stable, though, is that execution starts immediately when requested. AFAIK this guarantee can only be met with either CPU pinning or a lowlatency/rt kernel.
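
     To make the sibling pinning concrete, a minimal sketch of the scheme, assuming hypothetical VM names 'win1'/'win2' and the usual Intel numbering where logical CPUs n and n+4 are the HT siblings of the same physical core (verify on your own box first):

        # confirm which logical CPUs share a physical core
        cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list   # e.g. "0,4"

        # put vCPU n of each VM on opposite HT siblings of physical core n
        for n in 0 1 2 3; do
          virsh vcpupin win1 $n $n          # first sibling of core n
          virsh vcpupin win2 $n $((n+4))    # second sibling of core n
        done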

  2. Whoops, sorry, let this go dark for a second. I didn't do any technical measurements unfortunately. The only numbers I was able to observe were a solid 10ms latency drop in Steam streaming (40ms -> 30ms), plus stability in latency over time (Steam will graph the latency for you), so far fewer spikes in the end.

     @squark yeah, even with a single VM (no isolcpus for Linux or core isolation for guests), disk access would always give me the biggest 'pops' in latency. With lowlatency on a single VM, it's butter smooth. No tasks really seem to delay any others noticeably (at the cost of context switching, I'm sure). The same goes for a second VM: even with CPU resources shared equally between 2 guests and the host, it's really smooth.

     As far as the kernel... I'm actually doing this on a reference setup (Ubuntu Xenial) before I switch over to unRAID full time. Roughly the same package versions as the unRAID 6.2.0 beta: kernel 4.4, QEMU 2.5, libvirt 1.3. Kernel configs I tried:

     'linux-generic' - 4.4, 250 tick, voluntary preemption
     'linux-lowlatency' - 4.4, 1000 tick, involuntary (full) preemption, forced IRQ threads

     Lowlatency definitely gave me the latency I wanted, but I'm not sure which of the features did it =\ According to what I've read, CONFIG_PREEMPT allows any higher-priority process to interrupt and preempt any LOWER-priority process (including kernel 'threads'). So technically, my normal-priority VMs shouldn't actually be preempting anything except each other. Why this makes a difference with a single VM is a good question, but it's most likely just due to the tick rate and probably doesn't have much to do with preemption (maybe at an IO-device level). The tick makes sense, as you are just subdividing the time more between processes- AFAIK the normal tick-based preemption works similarly, but takes time allocation into account: higher-priority processes are given more 'ticks' than lower-priority ones. So in theory it should be switching between running drivers/kernel and the VM process rather often, but with priority still given to the drivers. This alone could improve latency by letting processes do 'small' amounts of work rather quickly.

     The other interesting thing is the forced IRQ threads- this basically gets a lot of soft/hard IRQ work out of the kernel itself. AFAIK it subjects these kernel threads to the same scheduling principles as regular processes, which alone may improve latency just by making the bulk of the kernel work follow the same 'tick' the rest of the processes do and time-share more evenly.

     So all in all I do really need to figure out which of those 3 features gives me the massive boost =\ Maybe it's truly the combo of all three, but I suspect CONFIG_PREEMPT might be unnecessary here (a quick way to compare the configs is sketched below). Plus, the recommended tick rate for gaming kernels is a multiple of 60 (600 I hear is good) to roughly match the timing needed for frame generation. 600 would give the gaming VM thread a theoretical minimum of 10 involuntary switches per frame. Not bad. A 100 tick would be pretty terrible- 10ms ticks mean you may waste up to 10ms at the beginning of generating a frame. You can also end up with a 'beat frequency', or interference between two frequencies (60 and 100 here), that causes regular/periodic latency changes (latency/stability 'pulsing' on a regular interval)- here you would theoretically get interference at |100 - 60| = 40Hz. The interference frequency should be either really close to 0 or really high so it's not noticeable. If it's close to 0 but not 0, you'll get really long alternating periods of bad/good latency.
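
     For anyone wanting to bisect which feature matters, a sketch (the config paths are the stock Ubuntu ones; 'threadirqs' is the mainline boot parameter that forces IRQ threading on kernels built with CONFIG_IRQ_FORCED_THREADING):

        # see how the two kernels actually differ on these knobs (a bit noisy)
        grep -E 'CONFIG_HZ=|CONFIG_PREEMPT|CONFIG_IRQ_FORCED' \
            /boot/config-*-generic /boot/config-*-lowlatency

        # to test forced IRQ threading alone, boot the generic kernel with
        # 'threadirqs' added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub:
        sudo update-grub && sudo reboot
        ps -e | grep 'irq/'    # threaded handlers show up as irq/NN-<name>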

  3. Doing the 1:1 pinning like that just makes sure that Linux doesn't try to move a VM's vCPU thread to another physical CPU (like it would with normal threads, which it schedules based on a bunch of rules). For more backstory- I wanted to allow the Windows VM to take advantage of more resources when the other VM is off. I've got a pretty simple script that solves that easily- it just watches libvirt and repins the vCPUs depending on which box(es) are running (sketch below).

     Trying to run both VMs across all cores is... more of an experiment. With a single VM I was already having pretty bad latency issues: audio pops, stuttering graphics when disk access was high, kind of jerky FPS. Then as an experiment I tried running another game with the overlapped CPU pinning on the other VM, and things got 1000x worse. So this definitely looked like a scheduling issue- my CPU wasn't 'pegged', so it just wasn't getting to the right task fast enough. The general 'solution' to this seems to be CPU pinning; you can even get crazy enough to move all IRQs to a system core so the VM cores stay almost 100% clean except for what the VMs run on them. What I noticed with just the 1 VM already seemed to indicate a scheduling issue, so I kind of went with this as a 'worst case scenario' to diagnose it.

     So, low latency kernel- in theory this does 2 things for me: a) it increases the kernel tick rate, which is used for process preemption and basically subdivides the CPU time more (in this case, from 250 tick to 1000 tick); b) it allows the kernel to preempt kernel processes (drivers, kernel space and time). I think 'a' is really what's making the difference- if you're split-pinning vCPUs, a VM really is free to use basically as much CPU time as it wants, since there's no real competition for it. But unless you make Linux stay away from that core entirely, it's never really 'clean': irqbalance might fill it with interrupts, the Linux scheduler may put tasks there, and kernel tasks may run there which can't normally be preempted/stopped. So unless great care is taken, a VM still might have to 'wait' on a real CPU before it can do anything.

     Basically, the low latency kernel seems to have done a lot to fix things for me. Wondering if anyone else has experimented with preemption and/or high-tick kernels before? Hypothetically the lowlatency kernel should actually be rather inefficient and offer lower throughput than the normal kernel (due to context switching). Anyone have experience with that? This is also a Skylake CPU, which is supposed to offer optimizations for context switching. Are context switches potentially just not that expensive on this architecture?
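
     The repin script is nothing fancy- a hypothetical minimal version of the idea (the VM names 'win1'/'win2' and the 0-3/4-7 core split are made up; adjust to your layout):

        #!/bin/bash
        # widen win1's pinning while win2 is off, narrow it again when win2 starts
        while sleep 5; do
            if virsh domstate win2 2>/dev/null | grep -q '^running'; then
                cpus="0-3"    # win2 up: keep win1 on its own half
            else
                cpus="0-7"    # win2 down: let win1 float across all cores
            fi
            for n in 0 1 2 3; do
                virsh vcpupin win1 $n "$cpus"
            done
        done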

  4. I've recently just gotten over the last challenge in my setup- a harsh drop in performance and stability when running 2 gaming VMs at once. Running KVM with PCIe passthrough, a game in one VM always seemed to affect a game in the other VM. This got worse with disk access, so it definitely seemed to be some kind of scheduling/priority issue. The general verdict is that running the 'generic' 250-tick kernel is more than enough for games, but no matter what, there seemed to be a constant battle between the VMs. I'm pinning vCPUs, but both VMs ultimately share all the cores. I'm sure this is the root of my issue, as there wouldn't be scheduling issues between the VMs if they didn't share cores.

     So I installed a low latency kernel ('lowlatency', 1000-tick with kernel-space preemption)... and it kind of fixed everything. The general verdict is that the lowlatency kernel will most likely be a performance drain in gaming situations, but I haven't noticed FPS loss (when running just one VM), so it must not be that much overhead (might be a different story when gaming in a KVM guest). I'm also not sure I could notice any FPS loss, because beforehand the games were so jumpy and jittery I really couldn't gauge it. Afterwards everything is butter smooth... but with what I'm assuming is some lost CPU efficiency.

     Has anyone else tried a high-tick preempting kernel? I was really expecting something horrible to happen, but so far it seems to have been a great decision. I read that preemption can drop frame rates up to 10%, so I just overclocked the CPU/cache 10%... fixed, right?
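
     (For anyone wanting to try it, on Ubuntu the lowlatency kernel is a stock package- no custom build needed:)

        sudo apt-get install linux-lowlatency
        # after rebooting into it:
        uname -r    # should end in '-lowlatency'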

  5. Hmm, are you sure the ROM BIOS matches the BIOS on your card? I would maybe try leaving out the rom line, as OVMF is really good at getting it from the card itself.
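
     (If you want to compare the two, you can dump the ROM straight off the card from the host- the PCI address here is just an example; run as root:)

        cd /sys/bus/pci/devices/0000:01:00.0
        echo 1 > rom           # allow reading the ROM through sysfs
        cat rom > /tmp/gpu.rom
        echo 0 > rom           # lock it again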

  6. I was reading more about the changes to the 'activation' process in Windows 10... and from what I can tell there's not a great reason to ever activate it. AFAIK you still get the full gamut of updates, Windows Defender updates, all of that... you're just blocked from using some of the more convenient 'home desktop' features- UI customization, some things like that. So for me, running a dual pure Steam-stream rig and barely using Windows as it is... is Windows basically just free now for non-desktop applications?

  7. I would double check the drivers and also the MSI interrupt config for the devices. Choppy sound, poor UI performance, overall instability, device activity adding to the instability- a common cause of all this is mismanaged interrupts, and having MSI disabled can cause Windows to handle all devices poorly because the GPU is choking them out.

  8. np! Big thanks to @captain134, as I am now also running full Hyper-V acceleration.

  9. I would go with Steam, maybe. I've got a dual-Windows Steam streaming server running headless that I use for gaming. It's also super easy to hook in non-Steam games, and most of them work great. It also works best with NVIDIA because of the NVFBC drivers. As far as keeping GPU acceleration running, get a Fit Headless HDMI dongle. I use these for Steam streaming and they've been great- $15 to mock up a 1080p monitor. There's also a 4K version for a bit more.

  10. hmm, have you tried setting the machine type to EFI (uses the OVMF BIOS)?

  11. hmm, does it actually solve it, or just disable the extensions?

  12. Have you tried the MSI fix? http://lime-technology.com/wiki/index.php/UnRAID_6/VM_Guest_Support#Enable_MSI_for_Interrupts_to_Fix_HDMI_Audio_Support

      This was causing lots of stuttering and performance issues for me due to mismanaged interrupts. With the fix I was able to get back to using HPET (no Hyper-V) and everything is 110% perfect.

      EDIT: this issue can occur on almost any device. There's some failure in detecting it- when I updated drivers, sometimes it gets detected, sometimes it doesn't. But always keep an eye on MSI in your host's 'lspci' (example below).
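
      What that check looks like from the host ('01:00.0' is just an example address- use your passed-through device's):

         # 'Enable+' means the function is using MSI; 'Enable-' means legacy line IRQs
         sudo lspci -vs 01:00.0 | grep 'MSI:'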

  13. You're using the i440fx emulated chipset, so you don't actually have to define another PCI bridge/bus- the device can sit on the root PCI bus without issue. Some drivers may complain about that, but the NV ones don't seem to. The bus configuration in Windows is more about managing DMA memory space, and with i440fx it seems to handle that just fine on the root bus. With more bridges/buses added, Windows is more likely to mismanage the DMA space and not assign enough for the GPU. This can be a silent killer, slowing down access to the GPU and preventing it from performing at its max. I would remove the ioh3420 device, as well as the bus fields on the passthrough devices.

      GPU-Z is also better at showing the PCI speeds, because it shows both the max and the current connection speed. Newer GPUs will drop to a lower PCIe spec as a power-saving effort, so you might see a max of 3.0 x16 but a current of 1.0 x8. (You can read the same thing from the host- see below.)
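
      The host-side equivalent of that GPU-Z check- link capability vs the currently negotiated speed/width (address is an example again):

         sudo lspci -vvs 01:00.0 | grep -E 'LnkCap:|LnkSta:'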

  14. 1) From what I can tell the Hyper-V extensions don't make any real difference. I've seen benchmarks that proved it (I'll try to find the link later), plus by personal 'look and feel' I've noticed nothing. Also, my findings:

      *hyperv relaxed - mostly exists to suppress some noisy logs from the Windows 'watchdog timer' service.
      *vapic - virtual APIC controller. This flag just tells Windows to manage its own interrupts instead of using the emulated hardware APIC. Might matter for some virtual PCI devices, but shouldn't actually affect passthrough devices.
      *spinlocks - I think this tells Windows to favor spinlocks instead of kernel locks when it can. This probably matters most when you have several Windows guests and lock contention causes the hypervisor to context switch too often. Shouldn't matter with only a few guests.
      *hypervclock - high performance virtual timer. It's good to have a high performance timer, but HPET works fine. I think this exists as an alternative to HPET that might not be as secure or scale as well, or something.

      A good thing to keep in mind is that the Hyper-V extensions exist to let a hypervisor run more guests, and run them more securely, without a performance penalty. So I think for the most part they don't apply to a handful of gaming guests.

      2) My theory is that it's because of the GRID SDK. NVIDIA does actually offer commercial support for proprietary virtualization solutions. You can see this first hand in the 'floating point' cloud computing units available through services like AWS- they basically virtualize GPU access for virtual machines. From what I can tell there were ways to subdivide cards and do other fun stuff, and raw CUDA devices were exposed to the virtual machine. I'm guessing the Hyper-V extensions tip off the drivers that there's additional hypervisor functionality coming their way to tell them what to do; it never arrives, and the driver shuts down. Plus, the virtual devices exposed to the guest might not even be real GPUs- they may be some generic CUDA device. Some day I might try to wander through the GRID SDK and see what's up.

      3) Only Windows. There are specialized 'virtual' kernels for Linux that are designed to run only as guests and offer some of the same performance advantages listed above. I think Hyper-V is just the Microsoft-preferred flavor.

      This is what I finally settled on for my NVIDIA guests:

         <features>
           <acpi/>
           <apic/>
           <hyperv>
             <relaxed state='off'/>
             <vapic state='off'/>
             <spinlocks state='off'/>
           </hyperv>
           <kvm>
             <hidden state='on'/>
           </kvm>
         </features>
         <clock offset='localtime'>
           <timer name='hypervclock' present='no'/>
           <timer name='rtc' present='no'/>
           <timer name='pit' present='no'/>
           <timer name='hpet' present='yes'/>
         </clock>

      EDIT: must make sure HPET is enabled in the BIOS; x64 for x64.

  15. So, big benefit of UEFI here- boot disk simplicity. An EFI boot partition is just a plain old partition (partition 1, GUID, etc.) with files you can actually manage. If you swap around OSes as much as I do, it's wonderful to be able to preserve your boot partition, as well as the EFI boot entries for your other OSes. And if you use a smart EFI bootloader like Clover, it's amazingly good at finding other EFI booters on any disk, and you can actually chain to EFI booters on non-FAT partitions like ext, NTFS and the like. I understand multi-booting and the rest defeats the purpose of unRAID, but there's still a lot of power in chaining EFI boots, plus just the simplicity. (Quick look at what's actually on an ESP below.)
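
      For anyone who hasn't poked at one, the ESP is just a FAT partition you can mount and browse like any other (the device name is an example):

         sudo mount /dev/sda1 /mnt
         ls /mnt/EFI    # typically one directory per installed OS/bootloader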