
Still Broken :-( Restarting/Shutting Down a VM w/ GPU passthrough causes the entire system to crash?



I tried using a 1050 Ti back in 6.8.3 and this happened as well.  I had hoped that moving to 6.9.1 would fix it but I guess not 😞

 

I decided to try it again now that I'm on 6.9.1, and it crashed again.  Every time it happens, I have to run a Parity Check, which takes 25+ hours.

 

 

I've tried both machine types to no avail.  It never does this when I don't have GPU passthrough enabled.  Any thoughts as I'm at the end of my rope with this?

 

 

 

fsnas1u-diagnostics-20210401-1053.zip

Edited by jlficken
Link to comment
  • jlficken changed the title to Restarting/Shutting Down a VM w/ GPU passthrough causes the entire system to crash?!?!

So I take it I'm the only one having this problem?

 

Since disabling GPU passthrough, I've restarted the VM multiple times and shut it down with no issues, so I have no idea what the problem is.

 

I dumped the ROM file from my own GPU using the SpaceInvaderOne script, so I'm hoping that's not the problem.  I've tried the TechPowerUp one for my card as well.

Link to comment
  • jlficken changed the title to Restarting/Shutting Down a VM w/ GPU passthrough causes the entire system to crash?

Hi there,

 

This could be due to the brand of device you're trying to use.  Never heard of Inno3D before.  They may have bad device firmware that doesn't like being assigned to a VM.  I'm assuming that without the device assigned, the VM can be spun up and shut down as many times as you want.  A simple test would be to replace the GPU with an alternate from EVGA or another quality brand (EVGA is recommended) and see if you can reproduce the same result.  If so, there may be more to investigate, but if not, you'll at least know the issue is specific to the hardware you're using.

 

If you want to try to gather more information on what is causing this issue, you'll need to connect a monitor and keyboard to the system's onboard graphics and boot into console mode.  Log in to the console with your root account and then type the following command:

 

tail /var/log/syslog -f

 

This will begin printing log events directly to your monitor.  Now recreate the VM crashing issue and I bet you'll see some critical information posted to the screen that you can then take a picture of and post back here for us to analyze.
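
A full syslog can scroll by quickly, so it may help to narrow the output to lines that tend to matter for passthrough crashes. The following is only a sketch: the function name `filter_passthrough_events` and the keyword list are assumptions chosen for illustration, not an official or exhaustive set, and the real cause may show up under a message that none of these keywords match.

```shell
#!/bin/sh
# Hypothetical helper: print only syslog lines that mention common
# passthrough-related subsystems or kernel failure markers.
# The keyword list below is an assumption, not an exhaustive set.
filter_passthrough_events() {
    # Arguments: zero or more log files. With no file, grep reads stdin,
    # so the function can also sit at the end of a pipe.
    grep -i -E 'vfio|iommu|dmar:|aer:|call trace|kernel panic|kernel bug' "$@"
}
```

For example, `filter_passthrough_events /var/log/syslog` scans the current log once, and `tail -f /var/log/syslog | filter_passthrough_events` filters the live stream. When in doubt, fall back to the unfiltered log so nothing relevant is hidden.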

Link to comment

Yeah, I can't really replace the card.  There's no 8-pin/6-pin supplemental power option in the Supermicro CSE-846, which is why I used the Inno3D card: it doesn't require supplemental power, and no other company makes a single-slot 1050 Ti.  Finding another card really isn't an option right now with the global GPU shortage either.

 

Inno3D is a huge company though and has been around since the '90s.  I remember using their GPUs back when Packard Bell was still around.

https://www.inno3d.com/static.php?refid=1

 

I have IPMI on the server so I can view the console and run that command to see what happens if I can get it to crash.

 

ETA: I also isolated the CPU cores that are dedicated to the VM yesterday and stopped passing through a USB DVD drive.  I have ordered a PCI-E USB 3.0 card and will pass the entire card through instead when it arrives on Tuesday.

 

Edited by jlficken
Link to comment
7 hours ago, jlficken said:

Inno3D is a huge company though and has been around since the '90s.  I remember using their GPUs back when Packard Bell was still around.

https://www.inno3d.com/static.php?refid=1

😂 Inno3D's heyday was well before me, I only know about them from some LTT videos where they've reviewed "weird" cards. The thermals on that card must be insane, how do you manage the heat?

 

The diagnostics.zip you provided doesn't seem to have the answer in it. Because the server crashed and had to be rebooted, the logs from before the crash are gone. All that's in there are the logs that show the system booting, plus data that otherwise suggests your server should be working fine.

 

@jonp's answer should give you the information you need (I wish my "server" had IPMI). Another option is to configure a remote syslog server in the Unraid WebUI under Settings > Syslog Server. But it looks like you've found yourself a possible solution.

Edited by lnxd
Link to comment

That device is a Mellanox ConnectX-2.  I have a ConnectX-3 as well that's not being used so I swapped them out.

 

I also installed a USB3 card to test hot swap.

 

I still have my fingers crossed I'll get this figured out but I'm getting burnt out on it.

Edited by jlficken
Link to comment
  • 2 weeks later...
  • jlficken changed the title to Still Broken :-( Restarting/Shutting Down a VM w/ GPU passthrough causes the entire system to crash?

It's still broken.  It's totally killed the server twice now when rebooting the VM.

 

How can I get the Syslog file when I can't even get IPMI to respond and it's not on the Flash drive anywhere?

 

IPMI console viewer lets me enter root for the user and then locks up.  All drive activity ceases on the server.  It's just dead....

 

I'm getting so utterly frustrated with unRAID right now that I'm not sure I even want to mess with it anymore.

Link to comment
  • 2 weeks later...
  • 1 month later...

It rebooted successfully once after about a week.

 

Now after 30 days I get this:

[two attached screenshots]

 

It can send an email but otherwise goes unresponsive on anything but the Dockers and Settings tabs.

 

When I try "shutdown -r now", it goes away completely and won't even respond to a ping, but it never reboots.

 

Would a recording of everything that happens next time be helpful?

 

 

Link to comment
