Rebooting a GPU passthrough VM causes the entire Unraid server to freeze


takkkkkkk


Really hoping someone can help... I'm even considering using Unraid as just a dumb NAS because of this issue...

 

I have Unraid with a 1080 Ti passed through to one of the VMs, but sometimes the VM becomes very unresponsive, and when I try to reboot the VM, the entire Unraid server becomes unresponsive, forcing me to hard reset it.

Has anyone else experienced this, or know how to troubleshoot it?

Link to comment
17 hours ago, metathias said:

A lot of people are chasing ghosts right now, mostly due to some issues with the latest version of Unraid. You might try going down to 6.6.x or a lower version of Unraid and see if that improves things.

Do I really have to downgrade to make it work? So this is a common issue, but no one really knows what's causing it? Feels super odd to know that there's a major flaw in one of Unraid's major selling points, yet the workaround is to downgrade...

Link to comment

Supposedly it's fixed in 6.8, but that hasn't been released yet. Soon, they say.

There have been significant changes to the underlying kernel in recent iterations. It is very difficult to keep up with the latest patches and feature improvements that go into it, and some of those underlying systems have changed significantly in just the past few releases, which has a tendency to break things in unexpected ways. Keep in mind that the technologies Unraid employs, such as KVM with passthrough support, are bleeding-edge software and have only been practical for a couple of years now.

But I can say from using it for the past 14 months or so that it's been an absolute godsend of an OS, and an incredibly flexible environment for me to work, play, and experiment in, with next to no downsides. Other than my wife's and child's VMs taking up all my cores =P.

And to be totally fair, a downgrade is about the easiest thing in the world. You move all your current files from the USB drive into a backup folder on the drive (a 4GB thumb drive can store the entire OS about 4x over), and then just unzip the downgraded zip file right in the root. Bam, downgraded. Just don't forget to copy your license file back in.
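
For anyone who would rather script the backup-then-unzip steps described above, here is a rough sketch. It is only an illustration under stated assumptions: the function name `backup_and_extract`, the `backup` folder name, and the idea that the license is a `*.key` file somewhere on the flash drive are all assumptions to verify on your own system, and the release zip must be kept outside the flash root so it is not moved along with everything else.

```python
import shutil
import zipfile
from pathlib import Path


def backup_and_extract(flash_root: Path, release_zip: Path, backup_name: str = "backup") -> Path:
    """Move the current flash contents into a backup folder, then unzip a release over the root.

    Hypothetical helper: names and layout are assumptions, not an official procedure.
    """
    backup_dir = flash_root / backup_name
    backup_dir.mkdir(exist_ok=False)  # refuse to overwrite an older backup

    # Move every top-level entry except the backup folder itself.
    for entry in list(flash_root.iterdir()):
        if entry == backup_dir:
            continue
        shutil.move(str(entry), str(backup_dir / entry.name))

    # Unpack the older release into the now-empty root.
    with zipfile.ZipFile(release_zip) as archive:
        archive.extractall(flash_root)

    # Copy license key files back (assumed here to be *.key files in the backup).
    for key_file in backup_dir.rglob("*.key"):
        target = flash_root / key_file.relative_to(backup_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(key_file, target)

    return backup_dir
```

Try it against a copy of the flash drive in a scratch directory first. Whether the drive still boots afterwards, and which other settings need to be copied back, is not covered by this sketch.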

Link to comment
1 hour ago, takkkkkkk said:

Feels super odd to know that there's a major flaw in one of Unraid's major selling points, yet the workaround is to downgrade...

Problem is, it only affects a subset of hardware, so there are thousands of Unraid installs working fine. Every time there is a kernel, hardware, or driver change, it can take some time to work out the issues that come up. The workaround is indeed to use a version of the software that doesn't cause issues. Limetech makes every effort they can to ensure everything works perfectly for everybody, but that ideal will never be reached, because the goalposts are constantly moving.

Link to comment
  • 1 month later...
On 9/30/2019 at 9:43 PM, takkkkkkk said:

Really hoping someone can help... I'm even considering using Unraid as just a dumb NAS because of this issue...

 

I have Unraid with a 1080 Ti passed through to one of the VMs, but sometimes the VM becomes very unresponsive, and when I try to reboot the VM, the entire Unraid server becomes unresponsive, forcing me to hard reset it.

Has anyone else experienced this, or know how to troubleshoot it?

Did you manage to solve this? I have the same problem with 6.8.0-rc5.

The whole system freezes completely when I shut down a Windows 10 VM with Nvidia passthrough.

Link to comment
I have this issue as well, and I'm on the latest 6.8.0-rc5. I have removed a lot of SSDs and HDs and left only a few in the array (I was wondering whether I had a strange drive), but when I restart or stop a VM it totally freezes Unraid, and a hard reset is the only way out. There is absolutely nothing in the log files. I don't know when this started happening, since I have been on all the RCs and betas 🙂
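
One possible reason for empty logs after a hard reset is that the last lines never reached persistent storage. A rough sketch of mirroring a log file to another location, forcing each batch to disk, might look like the following. The helper name `append_new_bytes` is made up for illustration, and the paths mentioned in the note below are assumptions, not confirmed Unraid locations.

```python
import os
from pathlib import Path


def append_new_bytes(source: Path, destination: Path, offset: int) -> int:
    """Append everything written to `source` after `offset` to `destination`.

    Returns the new offset to pass in on the next call.
    """
    size = source.stat().st_size
    if size < offset:
        offset = 0  # source was rotated or truncated, so start over

    with source.open("rb") as src, destination.open("ab") as dst:
        src.seek(offset)
        chunk = src.read()
        dst.write(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # push the data to disk before a possible freeze

    return offset + len(chunk)
```

Calling this every few seconds in a loop, with something like `/var/log/syslog` as the source and a file on storage that survives a reset as the destination (both assumed paths), would keep a copy of whatever was logged just before the hang. Frequent writes to a USB flash drive add wear, so this is better suited to a short debugging session than to permanent use.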

Link to comment

@perhansen So you're saying it happens every time you restart/stop a VM using the GPU?

 

In 2 years of using Unraid and a Windows VM with GPU passthrough, I've had 2, maybe 3 situations where the video driver crashed and the server hung afterwards. It can be triggered by a game crash or by an application using the card for transcoding or rendering. Normally the card is reset and works again, but in these 2-3 rare cases that didn't happen and I had to restart the whole server. AMD cards are more often affected by these types of "reset bugs".

Link to comment


No, actually not every time. When I only use the VM for a short amount of time, let's say 30 minutes, it works fine. But when I game with my son for a couple of hours, it hangs the entire system.
I think it has something to do with the Vega 10 reset bug fix that @limetech implemented in 6.8 but removed again in rc5 because it caused more problems.



Link to comment
1 minute ago, perhansen said:

I think it has something to do with the Vega 10 reset bug

The patch should only affect AMD cards, not Nvidia, so I don't think this is the issue. If the AMD reset fix had broken Nvidia passthrough, far more people would be affected.

 

How old is your card? Is it overclocked? Is the cooling OK? It could also be a hardware issue you're facing. If for some reason the card fails completely because of overheating or a hardware defect, it can't be reset without a reboot, just like on a real PC. If it shuts down completely, it's gone from the system and Unraid will not recover from that.

 

I would try running some benchmark tools in a loop and watching what happens to the card's temps.
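
As a rough sketch of the "watch the temps" part, the script below polls `nvidia-smi` while a benchmark runs in another window. It assumes it runs inside the VM, where the Nvidia driver and `nvidia-smi` are installed and on the PATH. The function names and the 85 C warning threshold are arbitrary choices for illustration, not vendor limits.

```python
import subprocess
import time


def parse_temperature(raw: str) -> int:
    """Parse the first line of nvidia-smi's CSV output into degrees Celsius."""
    return int(raw.strip().splitlines()[0])


def read_gpu_temperature() -> int:
    """Ask nvidia-smi for the current core temperature of the first GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_temperature(result.stdout)


def watch(interval_s: float = 5.0, limit_c: int = 85) -> None:
    """Print a timestamped reading every interval_s seconds until interrupted."""
    while True:
        temp = read_gpu_temperature()
        flag = "  <-- above limit" if temp >= limit_c else ""
        print(f"{time.strftime('%H:%M:%S')}  {temp} C{flag}", flush=True)
        time.sleep(interval_s)
```

Calling `watch()` and redirecting the output to a file gives a record of the temperature curve up to the moment of a hang, which helps tell a thermal shutdown apart from a reset problem.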

Link to comment


I think I have read about some people with Nvidia cards who had the same issue, but I could be wrong.
It's a brand new Asus Phoenix 1660, not overclocked or anything, and with great temps. I will investigate further and use a benchmark to test.
Let's say it is because of too-high temps; why would it then only kill the whole system on shutdown?


Link to comment

@perhansen In a normal scenario where the VM gets shut down/rebooted or a driver error occurs, the card will reset. If it is damaged or overheats, it will shut down to prevent further damage. A device in a reset state can be managed, or let's say it should be managed, by Unraid so it can be used again. In the case of a shutdown, I guess there is no way for Unraid to recover from it. Think about running it in a normal PC: what happens if, for example, you have an OC running and the card gets too hot and shuts down? The PC will hang or crash completely.

 

Do you have the HDMI audio controller from the card also passed through to the VM as a sound card? If not, try that; maybe it helps. If there is a USB device dedicated to the card (mostly the case for VR), you should pass that through as well.

 

Using an extra vBIOS for the card could also help. I guess you already tried that.
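
To check which devices belong together with the GPU (typically its HDMI audio function, sometimes a USB controller), one option is to list the IOMMU groups on the host. The sketch below reads the standard Linux sysfs layout under `/sys/kernel/iommu_groups`. The function names and the example PCI address are made up for illustration, and it assumes Python is available on the host.

```python
from pathlib import Path
from typing import Dict, List


def iommu_groups(root: Path = Path("/sys/kernel/iommu_groups")) -> Dict[int, List[str]]:
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups: Dict[int, List[str]] = {}
    if not root.is_dir():
        return groups  # IOMMU disabled or not a Linux host
    for group_dir in root.iterdir():
        if not group_dir.name.isdigit():
            continue
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def siblings(pci_address: str, groups: Dict[int, List[str]]) -> List[str]:
    """Return the other devices that share an IOMMU group with pci_address."""
    for members in groups.values():
        if pci_address in members:
            return [m for m in members if m != pci_address]
    return []
```

For example, `siblings("0000:01:00.0", iommu_groups())` (with a hypothetical address) would list whatever shares a group with the GPU. Anything that shows up there and is not passed through along with the card is worth a second look.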

 

Link to comment
12 hours ago, perhansen said:

Did you manage to solve this? I have the same problem with 6.8.0-rc5.

The whole system freezes completely when I shut down a Windows 10 VM with Nvidia passthrough.

FWIW, my main Win10 workstation uses a passed-through GTX 780 and I've never had a single problem. Not saying it's model-specific, just that I never see these kinds of 'freezes'. If I did, we'd be able to fix them.

Link to comment
 


I know what happens in a normal PC, and that is not the problem I have here. I can use the VM for 10 hours of gaming without any problems. The issue only occurs when I shut down or restart the VM, not while I'm using it. That can't be a temperature or driver problem, or am I wrong?
Wouldn't it then occur while the card was under heavy load?

I actually haven't tried the vBIOS; that's next on my list. I will report back.


Link to comment
 


Sorry, I didn't answer all your questions.
Yes, the audio controller is also passed through, and no USB.
The VM is now running with a vBIOS from TechPowerUp, so I will report back later.


Link to comment
 


Okay, the whole system just hung after a driver update. Damn, that sucks.


Link to comment
