m00nman Posted January 22 (edited)

I switched from Proxmox to unRAID not long ago and ran into several problems/inconveniences that I managed to overcome. I'd like to share the fixes with everyone, since none of this seems to be common knowledge.

The issues:

1. High idle CPU usage in Windows VMs (Win10 1703 and later). Pretty self-explanatory: with the default settings VMs are created with, the CPU is constantly busy servicing interrupts even when the guest is idle.

2. KSM is not enabled by default, so much less RAM is free/available to other services when running two or more Windows VMs at the same time. Over time, as docker containers started using more RAM, this caused the OOM (out-of-memory) killer to kick in and kill one of my VMs. This fix will mostly be useful to people with limited RAM on their servers; I only have 32GB myself, and it made a huge difference for me.

3. CPU pinning is the default in unRAID. Pinning is great for dedicating certain cores to a certain VM in some situations, for example when unRAID is also your main PC and you want cores reserved for the VM you use day to day to play games or whatever else you do. It is terrible for server workloads, though, especially if your server doesn't have many cores but runs a lot of containers/services/VMs, because there is no way to know which core will be loaded at any given moment while others sit idle.

The solutions:

1. I stumbled upon a thread on these forums recommending the HPET timer, which seemed to resolve the issue somewhat. The problem is that HPET is an unreliable clock source and often goes out of sync. The real solution is to enable the Hyper-V enlightenments introduced in QEMU 3.0. They are already partially enabled in unRAID by default, and they are what Proxmox uses by default for Windows VMs.

Go to the settings of your Windows VM and enable the XML view in the upper right corner. We will need to edit two blocks.

Add the following to the <hyperv mode='custom'> block:

    <vpindex state='on'/>
    <synic state='on'/>
    <stimer state='on'/>

Add the following to the <clock offset='localtime'> block:

    <timer name='hypervclock' present='yes'/>

The bonus is that this reduces idle CPU usage even further than HPET does, without any of HPET's drawbacks. Please note this ONLY applies to Windows VMs; Linux and *BSD guests already use a different paravirtualized clock source.

In the end, it should look like this:
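A sketch of the two finished blocks. The vpindex, synic, stimer, and hypervclock lines are the additions; the surrounding entries are what a stock unRAID template typically contains, so treat them as assumptions and keep whatever your own template already has:

    <hyperv mode='custom'>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vpindex state='on'/>
      <synic state='on'/>
      <stimer state='on'/>
    </hyperv>

    <clock offset='localtime'>
      <timer name='hypervclock' present='yes'/>
      <timer name='rtc' tickpolicy='catchup'/>
      <timer name='pit' tickpolicy='delay'/>
      <timer name='hpet' present='no'/>
    </clock>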
2. unRAID does come with a kernel that has KSM (kernel samepage merging) enabled (thank you, unRAID dev team). KSM looks for identical memory pages across multiple VMs and replaces them with a single write-protected page, saving (a lot of) RAM. The more similar your VMs are, the more RAM you save, with almost no performance penalty.

To enable KSM at runtime, append the following line to /boot/config/go (a slightly fuller go-file sketch follows at the end of this post):

    echo 1 > /sys/kernel/mm/ksm/run

And remove the following block from the config of every VM that should participate in KSM:

    <memoryBacking>
      <nosharepages/>
    </memoryBacking>

Let it run for an hour or two, then check whether it is actually working (besides seeing more free RAM) with:

    cat /sys/kernel/mm/ksm/pages_shared

The number should be greater than 0 if it's working. If it isn't, then either your VMs aren't similar enough, or your server hasn't reached the used-memory threshold at which KSM kicks in.

The result (screenshot: Windows 11 and Windows Server 2022 VMs, 8GB of RAM each):

3. We want to disable CPU pinning completely and let the kernel deal with scheduling, distributing the load across all cores of the CPU.

Why is CPU pinning not always good? Let's assume you did your best to distribute and pin cores to the different VMs. For simplicity, say we have a 2-core CPU and 4 VMs: we pin core #1 to VM1 and VM3, and core #2 to VM2 and VM4. Now it so happens that VM1 and VM3 start doing something CPU-intensive at the same time, and they have to share core #1 between the two of them, all while core #2 does absolutely nothing. Without pinning, the kernel scheduler would distribute that load across both cores.

Go back into the VM settings and delete the following block:

    <cputune>
    . . .
    </cputune>

Make sure the lines

    <vcpu placement='static'>MAX_CPU_NUMBER</vcpu>
    <topology sockets='1' dies='1' cores='MAX_CPU_NUMBER' threads='1'/>

still carry the maximum number of cores your VM is allowed to use (MAX_CPU_NUMBER is simply the number of cores you want to limit this particular VM to).

NOTE: if you switch from the XML view back to the basic view, change some setting (it can be completely unrelated), and save, unRAID may overwrite some of these edits. In particular, I noticed it likes to reset the cores assigned to the VM down to a single core. You will just need to switch back to the XML view and fix "vcpu placement" and "topology" again.

Bonus:
- Make sure you are only using VirtIO devices for storage and network.
- For "Network model", pick "virtio" for better throughput ("virtio-net" is the default).
- If you have a Realtek 8125A/B network adapter and are having throughput issues, have a look at @hkall's comment below.
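As promised in the KSM section above, a fuller sketch of the /boot/config/go additions. The tuning value below is an illustrative assumption, not a tested recommendation; the sysfs knobs themselves are standard KSM controls:

    # enable KSM on every boot
    echo 1 > /sys/kernel/mm/ksm/run
    # optional: scan more pages per wake-up of the KSM thread (kernel default is 100)
    echo 1000 > /sys/kernel/mm/ksm/pages_to_scan

To see how much is actually being merged, the counters live in the same directory:

    grep -H . /sys/kernel/mm/ksm/pages_shared /sys/kernel/mm/ksm/pages_sharing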
LumpyCustard Posted February 20

Thank you SO MUCH for this. Your post resolved issues with a small production server I built for a family friend, running 11 individual Windows 10 VMs.

Before (all VMs sitting on the login screen, completely idle) and after:
m00nman Posted February 20 (Author)

Thanks for the feedback. That's a beefy CPU; it should never be maxed out when all the VMs are just idling. I wish the unRAID team would simply build these fixes into the image, but VMs are probably not unRAID's main focus.
m00nman Posted February 28 (Author, edited)

I've posted in the "Feature Requests" subforum asking for these to be included by default. I doubt it will get any traction, though.
DebrodeD Posted March 23

This is amazing! I'm in the process of building an unRAID server to replace both my current Plex server and my gaming PC. I game very rarely (maybe once every few weeks), so I was hoping to use a Windows VM that has access to the majority of the server's resources, but only holds them while in use: it would hibernate when idle, freeing the resources, and be woken with wake-on-LAN when needed again. Would this allow me to accomplish that?

Specs of my server:
- i5-8500 (6 cores, no HT)
- 16GB DDR4
- 1TB NVMe cache drive
- 1TB NVMe unassigned drive (was hoping to use this for the Windows VM / Steam installs)
- AMD RX 6400
- A bunch of bigger drives for the array

I obviously don't have enough cores/RAM to permanently assign them away to a VM, so would your method let me have my cake and eat it too?
mitch98 Posted March 25 (edited)

I've been having the same issues and I'm honestly about to kick the server over and just give up on self-hosting. This isn't my first rodeo (I've run XCP-ng, Hyper-V, etc. in the past without a hitch, and I've been in the IT industry for many years, so I've had exposure to all the big players), but this has by far been my most painful experience getting VMs working.

Docker containers? Work great in my testing so far. WebUI? Clean, relatively intuitive, and responsive. Setting up SMB/NFS shares? Easy; minimal issues. Running Linux VMs? No dramas at all from what I've seen. Running a Windows VM? No chance, in my (albeit limited) experience on unRAID.

The CPU in my Windows 11 VM is pegged at 100% while idling, with MS Edge supposedly consuming the most CPU (I don't believe that's accurate, as Edge isn't even open, so it must be a background process). It's a brand-new install of Windows and I haven't even installed any apps. Host CPU utilization spikes to 100% on random cores.

My system is an HP ML380 Gen 9 with an Intel Xeon E3-1240 v5, 64GB of ECC RAM, 4 disks in the array (3 data, 1 parity; two of the disks are spun down, as I haven't filled the array yet and idle spin-down is enabled), plus a 500GB SSD cache drive. This shouldn't be happening. The disk for the VM sits under /mnt/user/domains/<folder_for_vm>, and the cache for the domains share is set to Prefer: Cache. I'm using VirtIO for everything and have installed the latest VirtIO/KVM drivers in the VM.

I implemented your fixes, and they seemed to take some of the burden off the host (unRAID shows slightly less CPU utilization, and no longer across all cores), but performance inside Windows is still the same: unusable. I'm not even trying to run a gaming VM; it's more of a jump box/VDI for testing.

Really disappointing. This might force me to move away from unRAID and run a full XCP-ng/Xen Orchestra stack with a TrueNAS Scale VM and direct passthrough for my storage needs. That over-complicates my setup and locks me into ZFS along with its ECC memory "requirement" (no one seems to have a straight answer on whether it's a hard requirement or just strongly recommended; I have ECC RAM, but my next server might not, which limits my future expansion). ZFS/TrueNAS also has stricter disk requirements, so I can't just slap in whatever I have lying around. That's not a concern for now, since all my disks are the same (they came with the server), but they're old and one is bound to fail soon, and I don't intend to buy an OEM replacement direct from HP when I have a plethora of other 3.5" drives on hand.

Not sure if this is specific to the version of unRAID I'm running (6.11.5), but it's a real shame, as unRAID was gearing up to be the all-in-one solution that would have suited my environment well. Glad I'm only running the trial version, but I've still poured more hours into configuring this than these results justify. Any advice from anyone would be appreciated, as I'm about to pull the plug on the whole thing and go back to the drawing board.
m00nman Posted March 25 (Author)

16 hours ago, mitch98 said: "Any advice from anyone would be appreciated as I'm about to pull the plug on the whole thing and go back to the drawing board."

Sorry, I can't suggest much beyond what's already described in the OP. Enabling the Hyper-V enlightenments (#1) should fix the issue you're having if done correctly, but like I said: if you are switching back and forth between the XML view and the regular view with saves in between, you are most likely losing some of the configuration, because unRAID just overrides it.

There is one more thing, I guess: under Settings -> VM Manager, change "Default VM storage path" to point directly at the cache device instead of /mnt/user/*. The /mnt/user path is a FUSE mount with, I believe, a custom driver that unites all filesystems into a single virtual one, similar to mergerfs on mainline Linux distros. The point is, it can be very expensive in terms of CPU time, so you can probably take some burden off the CPU by going directly to the SSD mount (which does not use FUSE). For example, one of my SSD "pools" (it's really just a single SSD) is named "appstorage", so I changed "Default VM storage path" from /mnt/user/domains/ to /mnt/appstorage/domains/. If you have already created VMs with the old setting, you will need to either recreate them, or copy the primary vDisk image to the new path and update the "Primary vDisk Location" in the VM itself (a sketch of that copy step is at the end of this post).

I agree; coming from much more enterprise-oriented systems, I was disappointed in unRAID too, but unlike you I had already paid when I discovered these issues. It is working for me now, but it is very inflexible compared to even Proxmox.

As for ECC RAM: it is not a hard requirement for ZFS. You can run without it; the fear is that bit flips caused by natural radiation (seriously: solar flares and the like) can corrupt the filesystem, because ZFS keeps critical data in RAM while it's running (unlike other filesystems), and one bit flip can lead to data corruption across the whole array. So ECC RAM is strongly recommended, but not strictly required.
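To illustrate the vdisk copy step mentioned above, a sketch only; the pool name "appstorage" and the VM folder "Win11" are placeholders for your own names:

    # stop the VM first, then copy the vdisk off the FUSE path onto the pool directly
    mkdir -p /mnt/appstorage/domains/Win11
    rsync -ah --progress --sparse /mnt/user/domains/Win11/vdisk1.img /mnt/appstorage/domains/Win11/
    # then edit the VM and point "Primary vDisk Location" at the new path

The --sparse flag keeps a thin-provisioned vdisk from ballooning to its full allocated size during the copy.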
m00nman Posted March 25 (Author)

On 3/23/2023 at 12:15 PM, DebrodeD said: "I obviously don't have enough cores/ram to just be assigning them away to a VM permanently, so would your method allow me to have my cake and eat it too?"

I'll be honest: I have never tried to hibernate VMs, as all of mine are meant to be running 24/7. I do know that when I forgot to turn off the sleep timer on a Windows VM, it would go to sleep and never wake up, so I had to force-stop and restart it. Hibernation might behave differently. Unfortunately I can't test it, because the option isn't available for me (I turned off sleep functionality inside Windows, which might be why).
Jorgen Posted March 26

On 3/25/2023 at 12:57 PM, mitch98 said: "Running a Windows VM? No chance in my (albeit limited) experience on Unraid."

Since you're new to unRAID, have you looked at Spaceinvaderone's video guides? There are tweaks you can make on the Windows side to get it to work better as an unRAID VM. I had similar CPU-spiking issues until I tweaked the MSI interrupt settings inside Windows. The Hyper-V changes in this thread also helped, of course. I'm not actually sure whether the MSI interrupts were covered in that video series; it could also have been in one of his other videos.
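For reference, the MSI tweak is usually done per device in the Windows registry. A sketch, assuming you have looked up the device instance path (e.g. for the GPU) in Device Manager; the angle-bracket part is a placeholder, not a real ID:

    :: run in an elevated prompt inside the Windows guest; reboot afterwards
    reg add "HKLM\SYSTEM\CurrentControlSet\Enum\PCI\<your-device-instance-path>\Device Parameters\Interrupt Management\MessageSignaledInterruptProperties" /v MSISupported /t REG_DWORD /d 1 /f

Community "MSI utility" tools flip the same MSISupported value through a GUI. Flipping it on the wrong device can cause boot problems, so proceed carefully.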
mitch98 Posted March 29

Thanks all for your responses. I'll keep looking at it and see how I go. Much appreciated.
mitch98 Posted May 1

Update on my Windows 11 VM: I've now been able to squeeze moderate performance out of a fresh Windows 11 VM by using all of the tips outlined in this post, plus a few others mentioned below.

I suspect my issues were due to a number of factors, one being the Core Isolation setting in Windows Security. I have had a Windows 10 VM running for about a month now without any major issues; while performance is not great, it's serviceable for my needs. The only difference I found between the VMs is that Core Isolation is turned OFF on the Windows 10 VM (which works) and was turned ON in the Windows 11 VM that was experiencing issues.

I also believe BitLocker drive encryption may be a contributing factor: I tested migrating an existing Windows 11 Hyper-V VM (with BitLocker enabled) to unRAID, which worked but hit the 100% CPU utilization problem again despite any tweaks I made. I could have tried turning off BitLocker on that VM to see if it helped, but I decided to leave that rabbit hole for another day.

TL;DR: there are a lot of gotchas around Windows VM performance in unRAID, and it's not exactly straightforward. Some of them could be mitigated by including the XML changes mentioned in this post in the Windows 11 VM template.
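To check those two suspects from inside the guest, a sketch; the registry path is the standard Memory Integrity (HVCI) key as far as I know, so verify it on your own build:

    :: Core Isolation / Memory Integrity: 1 = on, 0 = off
    reg query "HKLM\SYSTEM\CurrentControlSet\Control\DeviceGuard\Scenarios\HypervisorEnforcedCodeIntegrity" /v Enabled

    :: BitLocker status of the system drive (elevated prompt)
    manage-bde -status C: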
DasMarx Posted May 18 (edited)

I tried this on 6.12.0-rc5, and in some regards it works great: idle consumption on my Windows VM went from 100% to 20%. However, I am no longer able to open the VM page. I can still see the VMs on the Dashboard and stop/start them, but when I click Edit, no XML is loaded, so I can't change the VM anymore. Is there any other way to change the XML of a VM once it is in this broken state? This may also be a bug in 6.12.0-rc5.
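In case it helps anyone hitting the same state: libvirt's own CLI is available from the unRAID console, so the domain XML can be edited even when the web form won't load. A sketch (the VM name is whatever virsh reports; "Windows 11" here is a placeholder):

    virsh list --all
    virsh edit "Windows 11"

virsh edit opens the XML in a text editor and validates it when you save.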
Mrtj18 Posted Tuesday at 03:36 AM

I'd just like to add my experience of fixing my Windows 11 VM once it got into a broken state and would not boot. It broke because I was trying to fix the dreaded mouse lag that was slowing my Windows VM to a crawl with 100% CPU usage: I tried all the other fixes I could find on the web, then attempted to play with the MSI interrupts, and my Windows VM became unbootable. (Be careful attempting fixes with this; yours could break too.)

FYI: my Windows install was on a separate NVMe drive outside of unRAID, and I had to completely reset and wipe it. (I had previously broken it when my PC cut off in the middle of a Windows update.) So I created a new Windows VM with the VirtIO drivers and the Windows install ISO listed in the template, with USB selected, and had to play with the install until Windows gave me the blue screen and let me select the "Reset this PC" option. (I did this from within unRAID; I did not reset the Windows drive outside of unRAID.) All of this had to be done through the VNC option; attempting to boot with my GPU passed through gave a solid black screen. Only after the PC had completely reset the NVMe drive with Windows on it, and I reached the Windows 11 setup screen, was I able to restart the VM, boot with the GPU passed through, and continue the install process.

Now my Windows VM is back to normal: no more lag or crazy 100% CPU spikes while trying to game, sound works, and I didn't even have to use the MSI interrupts. 😁