coppit

Community Developer
  • Posts: 498
Everything posted by coppit

  1. Despite doing all the usual tweaks for VM performance, I was still having intermittent issues with latency spikes. Most of the time I could play games in Windows 10 without any issues, but occasionally a hiccup would knock my audio out, and I had to switch audio devices back and forth to re-initialize it. Here's a list of all the things I can remember doing that helped:
       • Use the isolcpus kernel option to prevent processes from running on my VM CPUs.
       • Pin the CPUs to the VMs in the XML config.
       • Make sure all Windows devices were using MSI interrupts.
     I tried a bunch of other things that didn't help at all. Then I ran across /proc/interrupts. It showed that even with CPUs isolated and no VMs running, Linux was still putting IRQs on my so-called "isolated" CPUs. I wrote a tool (attached) that reformats the output to show the total count of interrupts on each CPU, along with the name and relative contribution of each interrupt source. Here's the output of the tool after running my VMs. I separated CPUs 15-22 and 23-31, as those are the ones allocated to my two Windows VMs.

     CPU0: 213497575 [132 (nvme0q0 32), 156 (vfio-msix[5](0000:03:00.0) 2), LOC (interrupts 198826728), RES (interrupts 233396), CAL (interrupts 14133666), TLB (shootdowns 303665), MCP (polls 86)]
     CPU1: 230485540 [128 (xhci_hcd 1033), LOC (interrupts 207630475), RES (interrupts 9997854), CAL (interrupts 12680386), TLB (shootdowns 175705), MCP (polls 87)]
     CPU2: 57069471 [133 (nvme0q1 1534), 152 (vfio-msix[1](0000:03:00.0) 4), LOC (interrupts 24404918), RES (interrupts 47193), CAL (interrupts 32449829), TLB (shootdowns 165906), MCP (polls 87)]
     CPU3: 12970342 [130 (ahci[0000:00:17.0] 2189433), LOC (interrupts 7598861), RES (interrupts 49447), CAL (interrupts 3119447), TLB (shootdowns 13067), MCP (polls 87)]
     CPU4: 42690394 [128 (xhci_hcd 85262), 157 (vfio-msix[6](0000:03:00.0) 4), LOC (interrupts 20370578), IWI (interrupts 2), RES (interrupts 42767), CAL (interrupts 22032259), TLB (shootdowns 159435), MCP (polls 87)]
     CPU5: 3718224 [16 (vfio-intx(0000:06:00.0) 64833), 130 (ahci[0000:00:17.0] 43065), 134 (nvme0q2 1295), LOC (interrupts 2378827), RES (interrupts 26167), CAL (interrupts 1195047), TLB (shootdowns 8903), MCP (polls 87)]
     CPU6: 38643823 [LOC (interrupts 21176519), IWI (interrupts 1), RES (interrupts 44285), CAL (interrupts 17282585), TLB (shootdowns 140346), MCP (polls 87)]
     CPU7: 3080525 [16 (vfio-intx(0000:06:00.0) 8258), 130 (ahci[0000:00:17.0] 13116), 135 (nvme0q3 837), LOC (interrupts 2152493), RES (interrupts 24560), CAL (interrupts 873251), TLB (shootdowns 7923), MCP (polls 87)]
     CPU8: 29710513 [129 (mei_me 49), 153 (vfio-msix[2](0000:03:00.0) 4), LOC (interrupts 12704086), IWI (interrupts 11), RES (interrupts 29275), CAL (interrupts 16728509), TLB (shootdowns 248492), MCP (polls 87)]
     CPU9: 3580425 [128 (xhci_hcd 26139), 136 (nvme0q4 678), 159 (vfio-msi[0](0000:01:00.0) 42180), LOC (interrupts 952215), RES (interrupts 23191), CAL (interrupts 2528085), TLB (shootdowns 7850), MCP (polls 87)]
     CPU10: 30773164 [130 (ahci[0000:00:17.0] 501096), LOC (interrupts 11560706), IWI (interrupts 7), RES (interrupts 31580), CAL (interrupts 18427101), TLB (shootdowns 252587), MCP (polls 87)]
     CPU11: 5620428 [131 (eth0 615637), 132 (nvme0q0 38), 137 (nvme0q5 4155), LOC (interrupts 1188584), RES (interrupts 24245), CAL (interrupts 3779950), TLB (shootdowns 7732), MCP (polls 87)]
     CPU12: 70693705 [148 (i915 185), LOC (interrupts 13170345), IWI (interrupts 5), RES (interrupts 35803), CAL (interrupts 57299597), TLB (shootdowns 187683), MCP (polls 87)]
     CPU13: 7612080 [131 (eth0 10553), 132 (nvme0q0 21870), 138 (nvme0q6 820), LOC (interrupts 1542426), RES (interrupts 22332), CAL (interrupts 6004952), TLB (shootdowns 9040), MCP (polls 87)]
     CPU14: 99064221 [128 (xhci_hcd 293), 154 (vfio-msix[3](0000:03:00.0) 2), LOC (interrupts 15491637), RES (interrupts 36128), CAL (interrupts 83350964), TLB (shootdowns 185110), MCP (polls 87)]

     CPU15: 8032905 [132 (nvme0q0 24), 139 (nvme0q7 368), LOC (interrupts 2738775), RES (interrupts 26176), CAL (interrupts 5258001), TLB (shootdowns 9474), MCP (polls 87)]
     CPU16: 95549070 [163 (vfio-msi[0](0000:01:00.1) 67), LOC (interrupts 81796090), RES (interrupts 20017), CAL (interrupts 13732327), TLB (shootdowns 482), MCP (polls 87)]
     CPU17: 18215426 [140 (nvme0q8 329811), 151 (vfio-msix[0](0000:03:00.0) 22), LOC (interrupts 16768877), RES (interrupts 746), CAL (interrupts 1002437), TLB (shootdowns 1106), MCP (polls 87), PIN (event 112340)]
     CPU18: 2578003 [152 (vfio-msix[1](0000:03:00.0) 26), 159 (vfio-msi[0](0000:01:00.0) 1608), LOC (interrupts 2267997), RES (interrupts 93), CAL (interrupts 250481), TLB (shootdowns 252), MCP (polls 87), PIN (event 57459)]
     CPU19: 7969434 [153 (vfio-msix[2](0000:03:00.0) 3), LOC (interrupts 3254584), RES (interrupts 94), CAL (interrupts 4017610), TLB (shootdowns 471), MCP (polls 87), PIN (event 696585)]
     CPU20: 3768833 [8 (rtc0 1), 154 (vfio-msix[3](0000:03:00.0) 12), 162 (vfio-msi[0](0000:00:1f.3) 761), LOC (interrupts 2876639), RES (interrupts 33), CAL (interrupts 598201), TLB (shootdowns 284), MCP (polls 87), PIN (event 292815)]
     CPU21: 8077829 [155 (vfio-msix[4](0000:03:00.0) 2), LOC (interrupts 3893519), RES (interrupts 84), CAL (interrupts 2024219), TLB (shootdowns 485), MCP (polls 87), PIN (event 2159433)]
     CPU22: 3475586 [156 (vfio-msix[5](0000:03:00.0) 7), LOC (interrupts 2781680), RES (interrupts 78), CAL (interrupts 450941), TLB (shootdowns 250), MCP (polls 87), PIN (event 242543)]

     CPU23: 83357830 [143 (nvme0q11 124217), 157 (vfio-msix[6](0000:03:00.0) 18), LOC (interrupts 81715105), RES (interrupts 608), CAL (interrupts 1517439), TLB (shootdowns 356), MCP (polls 87)]
     CPU24: 15859395 [LOC (interrupts 15703633), RES (interrupts 99), CAL (interrupts 140581), TLB (shootdowns 705), MCP (polls 87), PIN (event 14290)]
     CPU25: 5490743 [LOC (interrupts 5456928), RES (interrupts 103), CAL (interrupts 27299), TLB (shootdowns 235), MCP (polls 87), PIN (event 6091)]
     CPU26: 5389008 [LOC (interrupts 5132122), RES (interrupts 57), CAL (interrupts 233201), TLB (shootdowns 99), MCP (polls 87), PIN (event 23442)]
     CPU27: 5716421 [LOC (interrupts 5644566), RES (interrupts 62), CAL (interrupts 59173), TLB (shootdowns 98), MCP (polls 87), PIN (event 12435)]
     CPU28: 6081679 [LOC (interrupts 5993507), RES (interrupts 104), CAL (interrupts 66777), TLB (shootdowns 291), MCP (polls 87), PIN (event 20913)]
     CPU29: 6120828 [LOC (interrupts 6063599), RES (interrupts 109), CAL (interrupts 46776), TLB (shootdowns 310), MCP (polls 87), PIN (event 9947)]
     CPU30: 11159 [LOC (interrupts 1070), CAL (interrupts 10002), MCP (polls 87)]
     CPU31: 11116 [LOC (interrupts 1068), CAL (interrupts 9961), MCP (polls 87)]

     Researching this issue led me to this bug report about how unsuitable Linux is for running realtime workloads on isolated CPUs, because these interrupts introduce unpredictable jitter. (Sound familiar?) That bug report covers a lot of things that don't apply to UNRAID, since they're running a realtime kernel.
     (kthread_cpus sounds great for helping with isolation, as does the managed_irq flag for the isolcpus option.) But one thing we can do is enable nohz_full. This tells the kernel to mostly skip the periodic scheduling-clock interrupt on the listed CPUs, as long as each of them has only a single runnable task, which is the case with our pinned CPUs. For more information, see here.

     Using this feature is simple. Just pass the same CPU list you used for isolcpus to nohz_full, and turn nohz on:

       isolcpus=16-31 nohz=on nohz_full=16-31

     I set my VMs to not auto-start and rebooted UNRAID. This is the result:

     CPU0: 211326 [LOC (interrupts 159770), RES (interrupts 14287), CAL (interrupts 31688), TLB (shootdowns 5580), MCP (polls 1)]
     CPU1: 263785 [LOC (interrupts 227258), RES (interrupts 14634), CAL (interrupts 18335), TLB (shootdowns 3556), MCP (polls 2)]
     CPU2: 133637 [LOC (interrupts 92213), RES (interrupts 1351), CAL (interrupts 33165), TLB (shootdowns 6906), MCP (polls 2)]
     CPU3: 44003 [LOC (interrupts 24973), RES (interrupts 867), CAL (interrupts 15688), TLB (shootdowns 2473), MCP (polls 2)]
     CPU4: 101902 [LOC (interrupts 72210), RES (interrupts 1493), CAL (interrupts 20602), TLB (shootdowns 7595), MCP (polls 2)]
     CPU5: 37533 [134 (nvme0q2 31), LOC (interrupts 22811), RES (interrupts 790), CAL (interrupts 11852), TLB (shootdowns 2047), MCP (polls 2)]
     CPU6: 105059 [LOC (interrupts 77821), RES (interrupts 1387), CAL (interrupts 18969), TLB (shootdowns 6880), MCP (polls 2)]
     CPU7: 30248 [135 (nvme0q3 33), LOC (interrupts 15473), RES (interrupts 766), CAL (interrupts 11855), TLB (shootdowns 2119), MCP (polls 2)]
     CPU8: 111403 [129 (mei_me 49), LOC (interrupts 85072), IWI (interrupts 5), RES (interrupts 1151), CAL (interrupts 17624), TLB (shootdowns 7500), MCP (polls 2)]
     CPU9: 56114 [128 (xhci_hcd 25604), 136 (nvme0q4 1), LOC (interrupts 16371), RES (interrupts 868), CAL (interrupts 11570), TLB (shootdowns 1698), MCP (polls 2)]
     CPU10: 543178 [130 (ahci[0000:00:17.0] 416220), LOC (interrupts 95483), IWI (interrupts 1), RES (interrupts 1234), CAL (interrupts 23381), TLB (shootdowns 6857), MCP (polls 2)]
     CPU11: 35243 [132 (nvme0q0 38), 137 (nvme0q5 130), LOC (interrupts 20704), RES (interrupts 1063), CAL (interrupts 11712), TLB (shootdowns 1594), MCP (polls 2)]
     CPU12: 112279 [148 (i915 185), LOC (interrupts 78004), IWI (interrupts 5), RES (interrupts 1469), CAL (interrupts 25689), TLB (shootdowns 6925), MCP (polls 2)]
     CPU13: 55567 [131 (eth0 7971), 138 (nvme0q6 28), LOC (interrupts 33595), RES (interrupts 759), CAL (interrupts 11584), TLB (shootdowns 1628), MCP (polls 2)]
     CPU14: 110571 [LOC (interrupts 83220), RES (interrupts 1279), CAL (interrupts 19440), TLB (shootdowns 6630), MCP (polls 2)]
     CPU15: 31638 [139 (nvme0q7 39), LOC (interrupts 17238), RES (interrupts 829), CAL (interrupts 11571), TLB (shootdowns 1959), MCP (polls 2)]
     CPU16: 9986 [LOC (interrupts 985), CAL (interrupts 8999), MCP (polls 2)]
     CPU17: 9979 [LOC (interrupts 984), CAL (interrupts 8993), MCP (polls 2)]
     CPU18: 9980 [LOC (interrupts 983), CAL (interrupts 8995), MCP (polls 2)]
     CPU19: 9974 [LOC (interrupts 979), CAL (interrupts 8993), MCP (polls 2)]
     CPU20: 9984 [8 (rtc0 1), LOC (interrupts 977), CAL (interrupts 9004), MCP (polls 2)]
     CPU21: 9975 [LOC (interrupts 974), CAL (interrupts 8999), MCP (polls 2)]
     CPU22: 9972 [LOC (interrupts 975), CAL (interrupts 8995), MCP (polls 2)]
     CPU23: 9953 [LOC (interrupts 959), CAL (interrupts 8992), MCP (polls 2)]
     CPU24: 9965 [LOC (interrupts 959), CAL (interrupts 9004), MCP (polls 2)]
     CPU25: 9944 [LOC (interrupts 948), CAL (interrupts 8994), MCP (polls 2)]
     CPU26: 9945 [LOC (interrupts 948), CAL (interrupts 8995), MCP (polls 2)]
     CPU27: 9942 [LOC (interrupts 942), CAL (interrupts 8998), MCP (polls 2)]
     CPU28: 9940 [LOC (interrupts 938), CAL (interrupts 9000), MCP (polls 2)]
     CPU29: 9929 [LOC (interrupts 926), CAL (interrupts 9001), MCP (polls 2)]
     CPU30: 9923 [LOC (interrupts 923), CAL (interrupts 8998), MCP (polls 2)]
     CPU31: 9923 [LOC (interrupts 926), CAL (interrupts 8995), MCP (polls 2)]

     As you can see, interrupts on my isolated cores are way down. Interestingly, once you start the VMs, interrupts start to climb on those cores, but they remain well below those of cores 0-14. The first report I showed was actually taken with nohz_full enabled, after rebooting and running my two VMs overnight.

     Inside Windows, using the DPC latency checker, my graph went from averaging around 2000 microseconds to around 1000 microseconds. I still get nasty latency spikes every now and then (one spike in the screenshot I captured killed my USB audio), but overall latency is much better. That screenshot was taken while running Overwatch in the background and watching a YouTube video in the foreground. When idle, latency hovers around 600 microseconds and reaches 1000 microseconds about 20% of the time.

     Attachment: monitor_irqs.sh
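The attached monitor_irqs.sh isn't reproduced in this post. As a rough stand-in (written in Python, and not the author's script), the per-CPU summary format shown above can be rebuilt from /proc/interrupts. This sketch assumes the standard layout: a header row naming the online CPUs, then one row per interrupt source with an IRQ label, up to one count per CPU, and a free-text description whose last word is used as the short name (so "Local timer interrupts" becomes "interrupts", matching the LOC entries above).

```python
import os


def parse_interrupts(text):
    """Group /proc/interrupts rows by CPU: {cpu: [(label, name, count), ...]}."""
    lines = text.strip("\n").splitlines()
    cpus = lines[0].split()  # header row, e.g. ["CPU0", "CPU1", ...]
    per_cpu = {cpu: [] for cpu in cpus}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        fields = rest.split()
        counts = []
        # Rows start with up to one count per CPU (ERR/MIS carry a single total,
        # which this sketch attributes to the first CPU); stop at the first
        # non-numeric field, where the description begins.
        for tok in fields[:len(cpus)]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        desc = fields[len(counts):]
        # Short name: last word of the description, e.g. "xhci_hcd" or "interrupts".
        name = desc[-1] if desc else label
        for cpu, n in zip(cpus, counts):
            if n:  # skip sources that never fired on this CPU
                per_cpu[cpu].append((label, name, n))
    return per_cpu


def format_summary(per_cpu):
    """Render one line per CPU: total count, then each non-zero source."""
    out = []
    for cpu, entries in per_cpu.items():
        total = sum(n for _, _, n in entries)
        parts = ", ".join(f"{label} ({name} {n})" for label, name, n in entries)
        out.append(f"{cpu}: {total} [{parts}]")
    return "\n".join(out)


if __name__ == "__main__" and os.path.exists("/proc/interrupts"):
    with open("/proc/interrupts") as f:
        print(format_summary(parse_interrupts(f.read())))
```

Running it once before and once after starting the VMs, and comparing the lines for the isolated CPUs, gives the same kind of before/after comparison as the two reports above.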
  2. A couple of things I noticed...

     I got this working, but both my keyboard and my mouse seem to go to "sleep" after 3 seconds. I have to mash a bunch of keys or frantically move the mouse to wake the device up. After that the mouse and keyboard work fine, until I don't use them for 3 seconds. It looks like "powertop --auto-tune" was to blame, since it enables USB autosuspend. FYI, in case anyone else runs into this.

     I have two identical keyboards for two different VMs. Sadly, /dev/input/by-id only lists one of them. I'm using by-path instead, but I don't think that's any better than using the raw /dev/input/event* nodes.

     I wasn't sure how this would interact with USB Manager, where I had bound root hubs to the VMs. I unmapped those and instead mapped only the audio devices, leaving the keyboard and mouse to the XML config.

     Edit: Forget the question below. I needed to use the link ending with "event-mouse" rather than just "mouse".

     I'm having trouble with one of my mice... I get this error:

       internal error: qemu unexpectedly closed the monitor: 2023-04-18T05:26:01.673092Z qemu-system-x86_64: /dev/input/by-id/usb-Gaming_Mouse_Gaming_Mouse-if01-mouse: is not an evdev device

     I tried with "/dev/input/by-id/usb-Gaming_Mouse_Gaming_Mouse-if01-event-mouse" as well and got the same error. Is there any hope for this mouse?
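The fix in the edit above (use the "-event-mouse" link, not "-mouse") comes down to which device node a by-id symlink resolves to: on typical udev setups the plain "-mouse" link targets a legacy /dev/input/mouseN node, while the "-event-..." links target evdev /dev/input/eventN nodes, which is what QEMU's evdev passthrough expects. The hypothetical helper below lists only the by-id links that resolve to eventN nodes; treat it as a sketch that assumes the usual udev naming, not a tool from the post.

```python
import os


def evdev_links(by_id_dir="/dev/input/by-id"):
    """Map each by-id symlink that resolves to an evdev node (eventN) to its target."""
    links = {}
    if not os.path.isdir(by_id_dir):
        return links
    for name in sorted(os.listdir(by_id_dir)):
        target = os.path.realpath(os.path.join(by_id_dir, name))
        # evdev nodes are named eventN; legacy mousedev nodes (mouseN, mice)
        # are not evdev devices and are skipped.
        if os.path.basename(target).startswith("event"):
            links[name] = target
    return links


if __name__ == "__main__":
    for name, target in evdev_links().items():
        print(f"/dev/input/by-id/{name} -> {target}")
```

Note that two identical keyboards would still collapse to a single by-id name, so this doesn't solve the duplicate-keyboard problem; by-path remains the way to tell them apart.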
  3. @byb I think you want to remove the 2nd "<qemu:arg value="-object"/>" if you're passing through only the keyboard and not a mouse as well.
  4. I'm having trouble enabling amd_pstate with my 1950X Threadripper on an ASRock X399 Taichi motherboard. I have the latest BIOS. I added "amd_pstate=passive" as a kernel boot option, but when I run "cpufreq-info", I see that it's still using "acpi-cpufreq" as the driver. In dmesg I see:

       amd_pstate: the _CPC object is not present in SBIOS or ACPI disabled

     Given that I see tons of ACPI messages, I'm guessing that ACPI is not disabled.

     P.S. I don't know if it's related, but I'm also seeing this:

       [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
       ACPI: \_PR_.C000: Found 2 idle states
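The two checks in this post (is the boot option actually on the kernel command line, and which cpufreq driver is active) can also be read straight from /proc/cmdline and sysfs instead of cpufreq-info. A minimal sketch, assuming the standard Linux paths; the function names are made up for illustration.

```python
from pathlib import Path


def cpufreq_driver(cpu=0, sysfs="/sys/devices/system/cpu"):
    """Return the active cpufreq scaling driver for one CPU, or None if absent."""
    path = Path(sysfs) / f"cpu{cpu}" / "cpufreq" / "scaling_driver"
    return path.read_text().strip() if path.exists() else None


def cmdline_param(name, cmdline="/proc/cmdline"):
    """Return a boot parameter's value ("" for a bare flag), or None if not passed."""
    for token in Path(cmdline).read_text().split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else ""
    return None


if __name__ == "__main__" and Path("/proc/cmdline").exists():
    print("amd_pstate on cmdline:", cmdline_param("amd_pstate"))
    print("cpu0 scaling_driver:", cpufreq_driver())
```

If the parameter shows up as "passive" but the driver is still acpi-cpufreq, the option reached the kernel and the driver declined to load, which is consistent with the _CPC message in dmesg.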
  5. No. I'm using the Oculus Quest wirelessly instead.
  6. Precisely. The version of Java inside the current container is 16. We need an updated container.
  7. In case it helps anyone... I ran "sudo fs_usage -w | grep Volumes.Backups" to see what, besides my backup software, was accessing my backups volume. (I'm using Carbon Copy Cloner instead of Time Machine.) It turns out that Finder was refreshing every time the directory was written, because I had a window open on that directory. After I closed that window, speeds shot up. Hopefully that helps someone...
  8. Does anyone know how things change for USB Type C ports? I see the device (an Oculus Quest) under usb1, but after I pass that through, the VM doesn't recognize the device on that port. In the meantime, I'm ordering a Type C -> USB A cable to try a USB3 port.
  9. This container stopped working a while ago due to an API change. While updating it, I realized that FileBot has gone to a paid model, and that there's another, better-supported container available. I'm removing this container. Please use the other one. Personally, I'm switching to Sonarr.
  10. How about if I added a bash script to the config dir, and you could put any command you want in there, including "yum install comskip"?
  11. I reverted the Alpine Linux changes late last night, so things should be back to how they were.
  12. Are you saying that where it prints:

        Configuration:
        USERNAME=foo
        PASSWORD=<hidden>
        DOMAINS=bar
        INTERVAL=baz
        DEBUG=zap

      it displays your password in the USERNAME line? A couple of thoughts:

      First, make sure that your password is within single quotes, like:

        PASSWORD='your password goes in here'

      However, if your password contains a single quote, then you have to escape it. LMK if that's the case.

      Second, make sure that you aren't defining environment variables for the container that override the config file.
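On escaping: if the config file is read by a POSIX shell (an assumption; the post only shows the single-quoted form), a backslash does nothing inside single quotes, so a literal ' has to be written by closing the quotes, adding an escaped quote, and reopening them: '\''. The hypothetical helper below produces such a line and checks it round-trips through Python's shlex, which follows POSIX quoting rules.

```python
import shlex


def single_quote(value):
    """Wrap value in single quotes for a shell-sourced KEY='value' line."""
    # Nothing is special inside single quotes, not even a backslash, so a
    # literal ' is written as: close quote, escaped quote, reopen quote -> '\''
    return "'" + value.replace("'", "'\\''") + "'"


def config_line(key, value):
    return f"{key}={single_quote(value)}"


if __name__ == "__main__":
    line = config_line("PASSWORD", "it's a secret")
    print(line)  # PASSWORD='it'\''s a secret'
    # Round-trip check with POSIX shell-style parsing:
    assert shlex.split(line.partition("=")[2]) == ["it's a secret"]
```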
  13. Moved to Alpine Linux, reducing the image size from 232MB to 90MB. Also:
        • Made the change monitors behave more like services that restart if they crash.
        • Added a polling option, so that Windows shares can be monitored too.
  14. Does the client connect and complain about the version? Or does it not connect at all to the port? Are you remapping 10090 or 8090, or both?
  15. I moved the image to Alpine Linux, which drops the size from 229MB to 64MB. I also added the ability to configure it using env vars instead of the config file. This is a pretty big change, so let me know if you guys see any issues.
  16. I switched the container from the phusion base image to Alpine Linux. The size dropped from about 250MB to about 16MB. It also does a better job of handling options set through environment variables rather than the config file. This is a pretty big change, so let me know if you guys see any issues.
  17. New update is out. I moved from the phusion (Ubuntu-based) base image to Alpine. The image is 5x smaller. Also, you can now specify all the settings using only environment variables. In this case you don't need the config file. Env vars take precedence over the config file values.
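The precedence rule described above (environment variables win over config-file values, and the file is optional) can be sketched like this. The KEY=value file format, the parsing, and the function names are assumptions for illustration, not the container's actual startup script.

```python
import os
import shlex


def parse_config(text):
    """Parse KEY=value lines (a hypothetical format); shell-style quotes are removed."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        words = shlex.split(raw)  # "'a b'" -> ["a b"]; keep the first shell word
        values[key.strip()] = words[0] if words else ""
    return values


def effective_settings(file_values, keys, env=None):
    """Start from config-file values; any known key also set in the environment wins."""
    env = os.environ if env is None else env
    merged = dict(file_values)
    for key in keys:
        if key in env:
            merged[key] = env[key]
    return merged
```

With no config file at all, passing an empty dict as file_values means every setting comes from the environment; only keys in the known-keys list are picked up, so unrelated variables such as HOME are ignored.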
  18. Hi everyone, I found a bug in the container that causes recordings to be saved inside the container instead of to the attached disk volume. I'll release a fix soon. You must run the following command before updating your container, or your videos will be lost!

        docker exec -it Xeoma bash -c 'cd /usr/local/Xeoma/XeomaArchive; for n in *;do mv "$n"/* "/archive/$n";done'

      I'll wait about a day before releasing the fix, in case anyone has auto-update configured for their Docker containers.
  19. "The author (or moderators of Community Applications) of the plugin template (https://raw.githubusercontent.com/coppit/unraid-mosh/master/mosh.plg) has specified that this plugin is incompatible with your version of unRaid (6.5.3). You should uninstall the plugin here:
        Maximum OS Version: 6.4.1"

      I didn't specify any such thing. I've tested it as working with 6.5.3. How do I get this fixed?
  20. A user reports it works: https://github.com/coppit/unraid-mosh/issues/1
      I installed it and it works for me as well.

      Are you the keeper of plugins? Can someone please point me to the latest plugin documentation? My plugin files aren't packaged in a txz like I see others' are. When the system installs them, the directory is missing "other" read and execute permissions, which means that emhttp can't read the icon. I also see that the latest plugin .plg files don't have "iconfile". I'd like to renovate my plugin, but don't know how.
  21. Can someone please point me to the latest XML template for plugins, so that I can update my mosh plugin? It's currently not even showing up in CA, I guess because it's assumed to be incompatible with the latest version of UNRAID until I indicate otherwise.
  22. Whoa. My SageTV and UNRAID worlds are colliding! To be honest, I'm not very familiar with USB devices and docker. This post seems to suggest that you can pass the devices through. The container is based on Ubuntu, so you could try getting into it with "docker exec -it xeoma bash" and installing USB drivers there. Heck, maybe it will "just work" without any special drivers, being "universal" and all. If that works, I could see about including those drivers in the container. But to be honest, USB cameras are not that great. I would get HikVision cameras, which are affordable, have good features, and support power-over-ethernet. Then maybe get some powerline adapters to provide PoE and also networking over your power outlets. Something like this and this.