[SOLVED] VM Start-up/Shutdown crashes UNRAID



Right after I start a VM (a gaming VM with GPU passthrough), one of my CPU cores goes to 100% usage and stays there even after I shut the VM down. Sometimes starting the VM is fine, but shutting it down then crashes the whole Unraid server.

I've noticed that I can temporarily "fix" the 100% core usage by going to Shares and editing any share at all, even just the share's comment field. As long as I can hit 'Apply' while editing a share, the CPU core(s) go back to normal. Sometimes more cores get stuck, which eventually crashes Unraid and forces a hard shutdown. All Docker containers were stopped. The VM itself works fine, by the way. The second VM, which has no GPU passthrough, doesn't seem to cause this.

It may be connected to something that gets restarted or cleared when a share is edited. I've looked around, and the similar cases I found were never resolved.
 


 


[Attached screenshots: unraid1.png through unraid6.png]

 




These are my current diagnostics, taken after I removed the USB PCIe card and a sound card I wasn't using. One of the cores is currently stuck at 100%. This happened after I started the VM; I shut it down, and the core is still at 100%.

The server is currently running a parity check from the last server crash.

 

 

Edited by XiuzSu
Link to comment

Update: So far, this is what I've found:

I actually have two separate issues: the VM crashing Unraid, and the core(s) going to 100%. I thought they were one.

The core(s) going to 100% appears to be caused by WSD, so I've disabled it, and that seems to have fixed this issue. Before you ask: yes, the server still appears under Network in Windows, but only after you've browsed to it directly, and it disappears again once you close File Explorer. But who cares? You can still access it, and if you have it mapped, it's still there and still works; it just isn't discovered automatically. Yes, it's not great, as we only got this feature recently (it wasn't quite working on previous Unraid versions unless you enabled SMB1 or something).
 
Settings -> SMB

[Attached screenshot: image.png]

I'm now looking into the VM crashing Unraid on shutdown. I found an old post here on the forums with a few things to try, which I'm testing now, so I'll update on that soon.
 

Link to comment

Update:

 

[FIX to VM crashing unraid]

 

Unraid would sometimes crash when a VM was reset or shut down, due to Unraid's short 'disconnect' timer.
Go to Settings->VM Manager->VM shutdown time-out, and set it to 300 (5 minutes).
Go to Settings->Disk Settings->Shutdown time-out, and change the time-out to 420 (7 minutes).

 

I'm not sure why the time-out would outright crash Unraid; something must be wrong. If you're interested in why this happens and want the full details, please read the post below:
 


(edit) Update 4/24/2020 - 12:20am - While the VMs were shutting down without issue, today the whole server crashed...
There appears to have been some corruption... it looks bad... I can't even see my shares...
I'm contemplating setting the server on fire and putting it outside at this point.

Edited by XiuzSu
Link to comment

I'm still experiencing this issue, which makes it impossible for me to even run VMs. I've read around the forum and can't find a solution; similar problems just don't get replies and slowly disappear. Does anyone have any ideas?

 

At this time, I'm testing whether this is due to the GPU passthrough by using VNC only, as I've tried almost everything else. If this fails, I'm considering running ESXi with Unraid on top, solely for the data array.

 

 

(Edit) I have been turning the VM off and on without issues since switching to VNC only.

Edited by XiuzSu
Link to comment

I guess this is for anyone who finds this thread in the future.

I finally FIXED IT 
Special thanks to @peter_sm

I followed a bunch of his threads and the information from the Level1Techs forums, where I found that adding "pcie_no_flr=1022:149c,1022:1487" would solve this issue. They (two of them) suspected that this is related to the X79 platform (which is also what I have).

Anyway, I have shut it down, restarted it, and updated it more times than I care to count without issues. I've also left the VM on overnight and played heavy games on it, and there was still no issue. Now that everything is working as it should, I'm back to loving Unraid.

Edited by XiuzSu
Link to comment
  • JorgeB changed the title to [SOLVED] VM Start-up/Shutdown crashes UNRAID
  • 2 weeks later...

I am experiencing a similar problem: incorrectly displayed high CPU load, and Unraid crashing when I start or stop my Win 10 VM.

Can you tell me what your solution ("pcie_no_flr=1022:149c,1022:1487") does? What are these PCIe devices? I couldn't find anything about them in your diagnostic files.

Link to comment
  • 5 months later...

I seem to face the same issue, but with just one of my VMs. I have a Windows VM (with GPU passthrough), which works very well. I also have a macOS VM (with a different GPU passed through), which crashes Unraid when I reboot from within the VM.

I don't fully understand what I need to add, and where, to fix this. Any chance you can elaborate?

Link to comment
8 hours ago, steve1977 said:

I seem to face the same issue, but with just one of my VMs. I have a Windows VM (with GPU passthrough), which works very well. I also have a macOS VM (with a different GPU passed through), which crashes Unraid when I reboot from within the VM.

I don't fully understand what I need to add, and where, to fix this. Any chance you can elaborate?

Sure. To help you: what are your server specs? Is it an X79 platform?

Have you edited the VM's XML for the passthrough?

Are the drivers up to date?

Have you edited the graphics card's BIOS ROM to remove the header?

I haven't tried macOS yet, only Windows.

Link to comment

Thanks for your help!

I am on the X299 platform (Asus X299-A).

Yes, the XML has been updated for my macOS VM. My Windows XML is unedited (and works).

Drivers are all up to date.

The header is removed.

I am thinking of binding my GPU in the vfio-pci settings, but I'm worried that it may break Unraid.

It used to work but seems to have broken recently. It's not clear what's different now. Maybe a recent Unraid or macOS update?

Link to comment
  • 3 weeks later...

OMFG... this is the magic bullet that finally solved my wonky VM issues!  I will be riding this wave of contentment and joy every time I get a clean shutdown from a VM for years to come.

 

I've been having a heck of a time trying to get Windows 10 to shut down cleanly; it had been making unRaid freeze. Fortunately, I could still execute a graceful shutdown because I deliberately left my keyboard on a non-passthrough USB controller. So although I was frozen out of the webgui and running headless, I could still blindly log into the terminal and type "powerdown". That probably saved me from hundreds of hard reboots and god knows how many hours of parity checks.

 

For dunderheads like me who are still lost in the weeds and are desperately seeking further clarification I will spell it out in the excruciating detail I wish that I had...

 

1) Check your IOMMU groups for the ID 1022:149c or 1022:1487 attached to a USB controller called "starship/matisse". If you are trying to pass that through, that's (at least part of) what's causing the boot and/or shutdown problems with your VM.
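
For anyone who would rather script the check in step 1, here is a minimal Python sketch. It assumes a Linux host that exposes IOMMU groups under /sys/kernel/iommu_groups; the function name find_devices and the constant TARGET_IDS are made up for illustration, and the IDs are simply the ones quoted in this thread.

```python
from pathlib import Path

# IDs quoted in this thread, written as vendor:device in lower-case hex.
TARGET_IDS = {"1022:149c", "1022:1487"}


def _strip_0x(value):
    """sysfs stores IDs as e.g. "0x1022"; return them without the prefix."""
    value = value.strip().lower()
    return value[2:] if value.startswith("0x") else value


def find_devices(target_ids, root="/sys/kernel/iommu_groups"):
    """Return (iommu_group, pci_address, "vendor:device") for matching devices."""
    matches = []
    root_path = Path(root)
    if not root_path.is_dir():
        return matches  # IOMMU not enabled, or not a Linux host
    for dev in sorted(root_path.glob("*/devices/*")):
        try:
            vendor = _strip_0x((dev / "vendor").read_text())
            device = _strip_0x((dev / "device").read_text())
        except OSError:
            continue  # unreadable entry; skip it
        dev_id = f"{vendor}:{device}"
        if dev_id in target_ids:
            # Path layout is <root>/<group>/devices/<pci_address>
            matches.append((dev.parent.parent.name, dev.name, dev_id))
    return matches


if __name__ == "__main__":
    for group, address, dev_id in find_devices(TARGET_IDS):
        print(f"IOMMU group {group}: {address} [{dev_id}]")
```

If nothing is printed, none of the listed IDs are present (or IOMMU is disabled), and this particular workaround probably isn't aimed at your hardware.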

 

solutions...

 

2) Don't stub it and don't pass through the entire controller; instead, pass through individual devices. This didn't work in my case because my Focusrite external sound card gave me demonic sound unless I passed through the whole USB controller. (I also fell down the rabbit hole of fiddling with MSI interrupts, to no avail.)

 

or....

 

3) Edit the /syslinux/syslinux.cfg file on your unRaid USB (if you're on Windows, don't use Notepad; use Notepad++ or WordPad). You will see the various unRaid boot menu options listed in there. Under the first menu option there will be a line like "append blehblehblehstuff initrd=/bzroot". That's where you need to put "pcie_no_flr=1022:149c,1022:1487", without the quotation marks. If you typically boot from another menu option, put it there instead. In my case the file looked like this:

 

default menu.c32
menu title Lime Technology, Inc.
prompt 0
timeout 50
label Unraid OS
  menu default
  kernel /bzimage
  append pcie_no_flr=1022:149c,1022:1487 vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot
label Unraid OS GUI Mode
and so on...

This worked to get Windows shutting down nicely.
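
If you'd rather not hand-edit the file, the same change can be sketched in Python. This is only an illustration under assumptions: add_boot_param is a made-up helper, SAMPLE is a cut-down stand-in for a real syslinux.cfg, and the function works on text so you can inspect the result before writing anything back to the USB stick.

```python
PARAM = "pcie_no_flr=1022:149c,1022:1487"

# Cut-down, made-up stand-in for /syslinux/syslinux.cfg.
SAMPLE = """\
label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label Unraid OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot
"""


def add_boot_param(cfg_text, param, label="Unraid OS"):
    """Return cfg_text with param added to the append line of one label."""
    out = []
    in_label = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("label "):
            # Track whether we are inside the menu entry we want to change.
            in_label = stripped[len("label "):].strip() == label
        elif in_label and stripped.startswith("append "):
            args = stripped.split()[1:]
            if param not in args:  # do nothing if it is already there
                indent = line[: len(line) - len(line.lstrip())]
                line = indent + " ".join(["append", param] + args)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(add_boot_param(SAMPLE, PARAM))
```

Only the chosen label's append line changes, and running it twice does not add the parameter twice. Keep a backup copy of syslinux.cfg before overwriting it.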

 

However, my Ubuntu VM was still not shutting down cleanly, even after editing the syslinux.cfg file. That's because this bug is a quirk between the Linux kernel and this specific USB controller. Ubuntu was still flubbing the shutdown because it has more or less the same kernel as unRaid. So the final step is to get Ubuntu to behave itself...

 

4) Edit the kernel boot parameters in /etc/default/grub: find the line beginning with "GRUB_CMDLINE_LINUX_DEFAULT" and add your parameter (pcie_no_flr=1022:149c,1022:1487) to the text inside the double quotes, after the words "quiet splash". (Be sure to add a SPACE after "splash" before your new parameter.) Click the Save button, then close the editor window.

 

5) Run sudo update-grub

 

6) Restart Ubuntu
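
The edit in step 4 can also be sketched as a small Python function. Treat it as an illustration only: add_grub_param is a made-up name, it only handles a double-quoted value (which is what the stock Ubuntu line uses), and you would still need to write the result back to /etc/default/grub as root and run steps 5 and 6 yourself.

```python
import re


def add_grub_param(grub_text, param, key="GRUB_CMDLINE_LINUX_DEFAULT"):
    """Return grub_text with param appended inside the quotes of key="..."."""
    # Match only an uncommented line that starts with the key and a
    # double-quoted value; anything else is left untouched.
    pattern = re.compile(
        r'^([ \t]*' + re.escape(key) + r'=")([^"]*)(")', re.MULTILINE
    )

    def repl(match):
        args = match.group(2).split()
        if param not in args:  # avoid adding it a second time
            args.append(param)
        return match.group(1) + " ".join(args) + match.group(3)

    return pattern.sub(repl, grub_text)


if __name__ == "__main__":
    sample = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    print(add_grub_param(sample, "pcie_no_flr=1022:149c,1022:1487"), end="")
```

Joining the arguments with single spaces takes care of the "add a SPACE after splash" detail from step 4.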

 

The earlier comments in this thread and the following three links are the source of everything I just described:

 

https://forum.level1techs.com/t/attention-flr-kernel-patch-fixes-usb-audio-passthrough-issues-on-agesa-1-0-0-4b/151877

 

https://old.reddit.com/r/VFIO/comments/eba5mh/workaround_patch_for_passing_through_usb_and/

 

https://wiki.ubuntu.com/Kernel/KernelBootParameters

Edited by clay_statue
Link to comment
  • 1 month later...
On 11/9/2020 at 12:17 AM, steve1977 said:

Thanks for your help!

I am on the X299 platform (Asus X299-A).

Yes, the XML has been updated for my macOS VM. My Windows XML is unedited (and works).

Drivers are all up to date.

The header is removed.

I am thinking of binding my GPU in the vfio-pci settings, but I'm worried that it may break Unraid.

It used to work but seems to have broken recently. It's not clear what's different now. Maybe a recent Unraid or macOS update?

I am also seeing the same issue on the X299 platform. Did you end up fixing the issue for your macOS VM?

Link to comment
18 hours ago, jamesy829 said:

I am also seeing the same issue on the X299 platform. Did you end up fixing the issue for your macOS VM?

Yes, I'm still experiencing the crashes, but not always. It only crashes in two situations:

1) When passing through the vBIOS. It works well when not passing through the vBIOS (and the GPU is even identified without it).

2) When there is heavy transfer activity with a network disk.

It used to work flawlessly with the prior Macinabox with Catalina, but the issue has existed since moving to the new Macinabox with Big Sur. I have not changed my motherboard, but I have upgraded my CPU (to a 10980XE).

Link to comment
