JustOverride, posted April 20, 2020 (edited January 25, 2021 by XiuzSu)

Right after I start a VM (a gaming VM with GPU passthrough), one of my CPU cores goes to 100% usage and stays there even if I shut the VM down. Sometimes starting the VM is fine, but shutting it down then crashes the whole Unraid server.

I noticed I can temporarily "fix" the stuck core(s) by going to Shares and editing any share at all, even just the share's comment field. As long as I can hit 'Apply' while editing a share, the issue 'resolves' and the core(s) go back to normal. Sometimes more cores get stuck, which eventually leads to Unraid crashing and needing a forced shutdown.

All dockers were turned off. The VM itself works fine, by the way. A second VM without a passed-through GPU doesn't appear to have this problem. It may be connected to something that gets restarted or cleared when a share is edited. I've looked around, and the similar cases I found were not resolved.

These are my current diagnostics, taken after removing the USB PCIe card and a sound card I wasn't using. Currently one of the cores is stuck at 100%. This happened after I started the VM; I shut it down and the core is still at 100%. The server is currently running a parity check following the last crash.
JustOverride (Author), posted April 22, 2020

Update: So far, this is what I've found. I actually have two separate issues: the VM crashing Unraid, and the core(s) going to 100%. I thought they were one.

The core(s) going to 100% appear to be caused by WSD, so I've disabled it (Settings -> SMB), which fixed that issue. Before you ask: yes, the server still appears under Network in Windows, but only after you've browsed to it directly, and it disappears again once you close File Explorer. But who cares, you can still access it, and if you have it mapped it's still there and still works. It just isn't discovered automatically. Yes, it's not great, as we only got this feature recently (it wasn't quite working on previous Unraid versions unless you enabled SMB1 or something).

I'm now looking at the issue of the VM crashing Unraid on shutdown. I found an old post here on the forums with a few things to try, which is what I'm testing, so I will update on that soon.
JustOverride (Author), posted April 23, 2020 (edited April 24, 2020 by XiuzSu)

Update: [FIX for VM crashing Unraid]

Unraid was sometimes crashing when a VM was reset or shut down, due to the short 'disconnect' timer in Unraid.

Go to Settings -> VM Manager -> VM shutdown time-out and set it to 300 (5 minutes).
Go to Settings -> Disk Settings -> Shutdown time-out and change it to 420 (7 minutes).

I'm not sure why the time-out would outright crash Unraid; something must be wrong. If you're interested in why this happens and want the full details, please read the post below:

(Edit) Update 4/24/2020, 12:20am: While the VMs were shutting down without issue, today the whole server crashed. There appears to have been some corruption. It looks bad; I can't even see my shares. I'm contemplating setting the server on fire and putting it outside at this point.
JustOverride (Author), posted April 29, 2020 (edited April 29, 2020 by XiuzSu)

I'm still experiencing this issue, which is making it impossible for me to even have VMs. I have read around the forum and can't find a solution, as similar problems just don't get replied to and slowly disappear. Anyone have any ideas?

At this time, I'm testing whether this is due to the GPU passthrough by using VNC instead, as I've tried almost everything else. If this fails, I'm considering running ESXi with Unraid on top, solely for the data array.

(Edit) I have been turning the VM off and on without issues while using VNC only.
JustOverride (Author), posted May 1, 2020 (edited May 1, 2020 by XiuzSu)

I guess this is for anyone who finds this thread in the future. I finally FIXED IT.

Special thanks to @peter_sm. I followed a bunch of his threads and the information from the Level1Techs forums, where I found that adding "pcie_no_flr=1022:149c,1022:1487" would solve this issue. They (two of them) suspected that this is related to the X79 platform (which is also what I have).

Anyway, I have shut the VM down, restarted it, and updated it more times than I care to count without issues. I've also left the VM on overnight and played heavy games on it, and there was still no issue.

Now that everything is working as it should, I'm back to loving Unraid.
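One way to confirm, after a reboot, that the kernel actually received the option is to look at the running kernel's command line in /proc/cmdline. The following is a minimal sketch rather than anything taken from the posts above; `has_kernel_param` is a hypothetical helper name, and the check only shows that the option was passed, not that the kernel in use acts on it.

```shell
#!/bin/sh
# has_kernel_param CMDLINE NAME  (hypothetical helper, for illustration only)
# Succeeds if NAME appears in CMDLINE as a whole word, either bare or as
# NAME=value. CMDLINE is split on whitespace, like the kernel command line.
has_kernel_param() {
    for word in $1; do
        case "$word" in
            "$2"|"$2"=*) return 0 ;;
        esac
    done
    return 1
}

# Usage on a live system, after rebooting with the new option:
#   has_kernel_param "$(cat /proc/cmdline)" pcie_no_flr && echo "option is present"
```

Matching whole words avoids a false positive from a different parameter whose name merely contains the same text.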
zionlion, posted May 12, 2020

I am experiencing a similar problem, with incorrectly displayed high CPU load and Unraid crashing when I start or stop my Win 10 VM. Can you tell me what your solution ("pcie_no_flr=1022:149c,1022:1487") does? What are these PCIe devices? I couldn't find anything about them in your diagnostics files.
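For anyone wanting to look these IDs up on their own hardware: each one is a vendor:device pair, and 1022 is AMD's PCI vendor ID. A minimal sketch, assuming `lspci` is available on the host, that picks matching devices out of `lspci -nn` output; `match_pci_ids` is a hypothetical helper name, not an Unraid tool.

```shell
#!/bin/sh
# match_pci_ids ID...  (hypothetical helper, for illustration only)
# Reads `lspci -nn` style lines on stdin and prints those whose bracketed
# [vendor:device] tag equals one of the given IDs (case-insensitive).
match_pci_ids() {
    # Join the IDs into an alternation such as 1022:149c|1022:1487
    pattern=$(printf '%s\n' "$@" | paste -sd'|' -)
    grep -iE "\[(${pattern})\]"
}

# Usage on a live system:
#   lspci -nn | match_pci_ids 1022:149c 1022:1487
# A single ID can also be queried directly with lspci's own filter:
#   lspci -nn -d 1022:149c
```

If nothing is printed, no device with those IDs is present, and the option would have nothing to apply to on that machine.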
JustOverride (Author), posted May 13, 2020

I don't have the specific details of what it does, other than that it fixes a kernel issue affecting passthrough of the GPU and other devices. I can only speculate, to be honest. I'm just happy the issue was fixed.
steve1977, posted November 8, 2020

I seem to face the same issue, but with just one of my VMs. I have a Windows VM (with GPU passthrough), which works very well. I also have a macOS VM (with a different GPU passed through), which crashes Unraid when rebooted from within the VM.

I don't fully understand what I need to add, and where, to fix this. Any chance you can elaborate?
JustOverride (Author), posted November 8, 2020

8 hours ago, steve1977 said:
"I seem to face the same issue, but with just one of my VMs. I have a Windows VM (with GPU passthrough), which works very well. I also have a macOS VM (with a different GPU passed through), which crashes Unraid when rebooted from within the VM. I don't fully understand what I need to add, and where, to fix this. Any chance you can elaborate?"

Sure. To help you: what are your server specs? Is it an X79 platform? Have you edited the VM's XML for the passthrough? Are the drivers up to date? Have you edited the graphics card's BIOS ROM to remove the header? I haven't tried macOS yet, only Windows.
steve1977, posted November 9, 2020

Thanks for your help! I am on the X299 platform (Asus X299-A). Yes, the XML has been updated for my macOS VM. My Windows XML is unedited (and works). Drivers are all up to date. The header is removed.

I am thinking of binding my GPU in the vfio-pci settings, but I'm worried it may break Unraid.

It used to work, but it seems to have broken recently. It's not clear what's different now. Maybe a recent Unraid update or macOS update?
clay_statue, posted November 28, 2020 (edited November 29, 2020 by clay_statue)

OMFG... this is the magic bullet that finally solved my wonky VM issues! I will be riding this wave of contentment and joy every time I get a clean shutdown from a VM for years to come.

I've been having a heck of a time trying to get Windows 10 to shut down cleanly; it had been making Unraid freeze. Fortunately I could still execute a graceful shutdown because I deliberately left my keyboard on a non-passthrough USB controller. So although I was frozen out of the webgui and running headless, I could still blindly log into the terminal and type "powerdown". That probably saved me from hundreds of hard reboots and god knows how many hours of parity checks.

For dunderheads like me who are still lost in the weeds and desperately seeking further clarification, I will spell it out in the excruciating detail I wish I had had...

1) Check your IOMMU groups for the ID 1022:149c or 1022:1487 attached to a USB controller called "starship/matisse". If you are trying to pass that through, that's (at least part of) what's causing boot and/or shutdown problems with your VM.

Solutions...

2) Don't stub it and don't pass through the entire controller; instead, pass through individual devices. This didn't work in my case because my Focusrite external sound card was giving me demonic sound unless I passed through the whole USB controller. (I also fell down the rabbit hole of fidgeting with MSI interrupts, to no avail.)

or...

3) Edit the /syslinux/syslinux.cfg file on your Unraid USB (don't use Notepad; use Notepad++ or WordPad if on Windows). You will see the various Unraid boot menu options listed in there. Under the first menu option will be a line like "append blehblehblehstuff initrd=/bzroot". That append line is where you need to put pcie_no_flr=1022:149c,1022:1487 (without quotes), separated by spaces from the other options. If you typically boot from another menu option, put it there instead.

In my case the file looked like this:

    default menu.c32
    menu title Lime Technology, Inc.
    prompt 0
    timeout 50
    label Unraid OS
      menu default
      kernel /bzimage
      append pcie_no_flr=1022:149c,1022:1487 vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot
    label Unraid OS GUI Mode

and so on...

This worked to get Windows shutting down nicely. However, my Ubuntu VM was still not shutting down cleanly, even after editing syslinux.cfg. That's because this bug is a quirk between the Linux kernel and this specific USB controller. Ubuntu was still flubbing the shutdown because it has more or less the same kernel as Unraid. So the final step is to get Ubuntu to behave itself...

4) Inside the Ubuntu VM, edit the kernel boot parameters in /etc/default/grub: move the cursor to the line beginning with "GRUB_CMDLINE_LINUX_DEFAULT" and add your parameter (pcie_no_flr=1022:149c,1022:1487) to the text inside the double quotes, after the words "quiet splash". (Be sure to add a SPACE after "splash" before adding your new parameter.) Click the Save button, then close the editor window.

5) sudo update-grub

6) Restart Ubuntu.

The earlier comments in this thread and the following three links are the source of everything I just described:

https://forum.level1techs.com/t/attention-flr-kernel-patch-fixes-usb-audio-passthrough-issues-on-agesa-1-0-0-4b/151877
https://old.reddit.com/r/VFIO/comments/eba5mh/workaround_patch_for_passing_through_usb_and/
https://wiki.ubuntu.com/Kernel/KernelBootParameters
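Steps 4 and 5 can also be done from a terminal instead of a graphical editor. The following is a minimal sketch, not part of the original instructions: `add_grub_param` is a hypothetical helper that reads a grub defaults file on stdin and prints it with the parameter appended inside the quotes of GRUB_CMDLINE_LINUX_DEFAULT, leaving the line alone if the parameter is already there. It assumes the value is wrapped in double quotes, as in a stock Ubuntu /etc/default/grub.

```shell
#!/bin/sh
# add_grub_param PARAM  (hypothetical helper, for illustration only)
# Filters a grub defaults file from stdin to stdout, appending PARAM before
# the closing quote of GRUB_CMDLINE_LINUX_DEFAULT unless already present.
add_grub_param() {
    awk -v p="$1" '
        /^GRUB_CMDLINE_LINUX_DEFAULT="/ && index($0, p) == 0 {
            # Replace the closing quote with: space, parameter, closing quote
            sub(/"[ \t]*$/, " " p "\"")
        }
        { print }
    '
}

# Usage inside the Ubuntu VM (keeps a backup and reads from it, so the file
# being written is never the one being read):
#   sudo cp /etc/default/grub /etc/default/grub.bak
#   add_grub_param 'pcie_no_flr=1022:149c,1022:1487' < /etc/default/grub.bak \
#       | sudo tee /etc/default/grub > /dev/null
#   sudo update-grub
```

Because the helper skips lines that already contain the parameter, running it twice does not duplicate the option.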
jamesy829, posted January 5, 2021

On 11/9/2020 at 12:17 AM, steve1977 said:
"Thanks for your help! I am on the X299 platform (Asus X299-A). Yes, the XML has been updated for my macOS VM. My Windows XML is unedited (and works). Drivers are all up to date. The header is removed. I am thinking of binding my GPU in the vfio-pci settings, but I'm worried it may break Unraid. It used to work, but it seems to have broken recently. It's not clear what's different now. Maybe a recent Unraid update or macOS update?"

I am also seeing the same issue on the X299 platform. Did you end up fixing the issue for your macOS VM?
steve1977, posted January 6, 2021

18 hours ago, jamesy829 said:
"I am also seeing the same issue on the X299 platform. Did you end up fixing the issue for your macOS VM?"

I'm still experiencing the crashes, but not always. It only crashes in two situations:

1) When passing through the vBIOS. It works well when not passing through the vBIOS (and the GPU is even identified without it).
2) When there is heavy transfer activity with a network disk.

It used to work flawlessly with the prior Macinabox with Catalina, but the issue has existed since moving to the new Macinabox with Big Sur. I have not changed my motherboard, but I have upgraded my CPU (to a 10980XE).