6.9.2 broke my VMs with Nvidia GPUs


Hello.

 

I decided to take the jump to 6.9.2 today, as bug reports in the threads seem to have slowed down. Everything worked just fine on 6.8.3. After the first reboot into 6.9.2, my main Windows 10 VM was really slow. I checked the SSD, and it seemed to have been affected by the 1 MiB alignment "bug". So I moved everything off it, reformatted it using the Unassigned Devices plugin, and moved my VMs back.

 

Everything seemed fine for around 30 minutes, and then it just crashed on me. The funny thing is that it works just fine using VNC; there are no problems at all. Everything seems to work right up until the GPU driver is loaded in Windows. Then everything freezes or crashes. I have tried rebooting, and removing the GPU from vfio and adding it back. Nothing seems to help at all. I then found this thread and followed the guide by removing all the GPU drivers using VNC. I added the GPU back in, and it booted just fine into Windows with display on my screens.

 

Then when I tried installing the GPU driver, it just crashed, and now I'm back to square one. I have a flash drive backup from before upgrading, so I might just end up downgrading again. But I hope someone can help me out here.

I get this line almost every time I boot up a VM with my GPU passed through to it.

Tower kernel: vfio-pci 0000:81:00.0: vfio_ecap_init: hiding ecap 0x19@0x900

- Not sure if this is intended or not. I have not seen it before, but it could of course be a 6.9 thing.
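
For anyone trying to read that log line: as far as I know, it is an informational message from the vfio-pci driver saying that it is not exposing one PCIe extended capability to the guest, and on its own it is not usually a sign of failure. Below is a minimal Python sketch that picks the line apart. The function name `decode_hidden_ecap` and the small lookup table are my own, for illustration only; the table is a partial list of capability IDs from the PCIe spec and is not exhaustive.

```python
import re

# Partial, illustrative table of PCIe extended capability IDs.
# Not exhaustive; anything missing is reported as "unknown".
EXT_CAP_NAMES = {
    0x01: "Advanced Error Reporting",
    0x03: "Device Serial Number",
    0x0B: "Vendor-Specific Extended Capability",
    0x19: "Secondary PCI Express",
    0x1E: "L1 PM Substates",
}

# Matches lines like:
#   vfio-pci 0000:81:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
LINE_RE = re.compile(
    r"vfio-pci (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"vfio_ecap_init: hiding ecap (?P<cap>0x[0-9a-f]+)@(?P<off>0x[0-9a-f]+)",
    re.IGNORECASE,
)


def decode_hidden_ecap(line):
    """Return the device, capability ID, offset and name, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    cap_id = int(m.group("cap"), 16)
    return {
        "device": m.group("bdf"),            # PCI address (domain:bus:device.function)
        "cap_id": cap_id,                    # extended capability ID
        "offset": int(m.group("off"), 16),   # offset in PCI config space
        "name": EXT_CAP_NAMES.get(cap_id, "unknown"),
    }


if __name__ == "__main__":
    sample = "Tower kernel: vfio-pci 0000:81:00.0: vfio_ecap_init: hiding ecap 0x19@0x900"
    print(decode_hidden_ecap(sample))
```

Run against the line above, this reports capability 0x19 (Secondary PCI Express) at config-space offset 0x900 on device 0000:81:00.0.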

 

Best regards,

Brydezen

tower-diagnostics-20210410-2236.zip

Edited by Brydezen
  • 1 month later...

VM: Ubuntu 20.04 LTS server VM with an Nvidia 1050 Ti set to pass-through.
Configuration: IPC NVR with TensorFlow for object detection and email alerts.
Behavior: With the 6.9.2 release applied, I observed that the TensorFlow alerts were delayed by half an hour and up to an hour. The attached email screenshots did not show the captured object as expected. After I rolled back Unraid to the 6.8.3 release, object detection and alerts worked as expected.
Notes: On 6.9.2, the CPU usage of the programs managed by PM2 showed TensorFlow consistently using 200% of the CPU. In contrast, on 6.8.3 PM2 reported TensorFlow at < 100% CPU usage, averaging 95 - 98%.
Conclusion: The only change to the Unraid server hosting the VM was the upgrade from 6.8.3 to 6.9.2. Downgrading to 6.8.3 was the only way to resolve this issue.

Edited by Thirs
  • 4 weeks later...
  • 1 month later...
  • 4 weeks later...
  • 4 months later...

Hi all, I am late to the party, but I recently tried to go to both 6.9 and 6.10 from 6.8. As soon as any Windows VM with an Nvidia GPU passed through initialises, it crashes (nvlddmkm.sys video TDR failure). I think I have the same issue as OP and am looking for a solution.

 

It sounds like this thread diverged to a different issue, with KVM crashing entirely rather than just the VMs. Or am I reading it wrong, and there is a solution for the VM/Nvidia issue? This same issue is posted here too, where I also posted. Looking at that diagnostics, I noticed I have a very similar CPU and the same motherboard as OP, so I can't help but think that it may be a hardware-related issue. I have searched a lot and only found a few mentions of this problem. If it were a widespread 6.9/6.10 issue, I think it would have more attention, so again it seems more specific to hardware, or maybe some software conflict with an add-on or something.

 

Anyhow, if this is solved, would you guys elaborate further on the 'fix'? And if not, what is the best course of action to get more help? I was going to start a new thread and post all my info, but if this is fixed I want to skip that.

 

Thanks

 

  • 2 months later...

Slammed my head against a brick wall for months trying to solve this problem. X99 motherboard, Haswell-E CPU, Nvidia 1050 Ti GPU. 6.8.3 works perfectly, but both 6.9.2 and 6.10 completely hose my Windows 10 VM with a video TDR failure when initializing the GPU.

 

Is anyone at Lime aware of this issue?  Because it's starting to look as though 6.8.3 is the last version I'll ever be able to use, and I'm already having trouble with apps requiring 6.9.  

17 hours ago, iphillips said:

Quick update -- this link, posted by mikeg_321 turned out to be the key.  Enabling MSI interrupts for the GPU in question after booting in safe mode did the trick.  So far it looks like we have a happy ending!

That's great. I have tried this on my own VM. It does become more usable, but I am still unable to play games on it. So I'm shopping for new hardware soon. It was time for an upgrade anyway.
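
The linked guide is not reproduced in this thread, so the following is only a sketch of the commonly described way to enable MSI for a device in a Windows guest: setting the `MSISupported` DWORD to 1 under the device's `MessageSignaledInterruptProperties` registry key. The helper name `msi_reg_file` and the device instance path in the example are hypothetical placeholders; the real path for your GPU would come from Device Manager (Details tab, "Device instance path"). The function only builds the text of a `.reg` file and does not touch the registry itself.

```python
def msi_reg_file(device_instance_path, enable=True):
    """Build .reg file text that sets MSISupported for one PCI device.

    device_instance_path is the Windows device instance path, for example
    r"PCI\\VEN_10DE&DEV_XXXX&SUBSYS_XXXXXXXX&REV_A1\\<instance-id>".
    That example is a placeholder, not a real device.
    """
    key = (
        "HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Enum\\"
        + device_instance_path
        + "\\Device Parameters\\Interrupt Management"
        + "\\MessageSignaledInterruptProperties"
    )
    value = 1 if enable else 0
    # .reg files use CRLF line endings and an 8-digit hex DWORD.
    return (
        "Windows Registry Editor Version 5.00\r\n\r\n"
        f"[{key}]\r\n"
        f'"MSISupported"=dword:{value:08x}\r\n'
    )


if __name__ == "__main__":
    # Placeholder path for illustration only; substitute your own GPU's path.
    placeholder = "PCI\\VEN_10DE&DEV_XXXX&SUBSYS_XXXXXXXX&REV_A1\\<instance-id>"
    print(msi_reg_file(placeholder))
```

The generated text could be saved as a `.reg` file and imported inside the guest, followed by a reboot of the VM. Generating a file rather than writing the registry directly keeps the change easy to review, and passing `enable=False` produces the matching file to undo it.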


Hi. I have been following this issue since early 2020 and was hoping it would have been resolved by now. Since my update to 6.9, everything has been working well for all the apps I need and for the NAS.

One of the main reasons for my purchase was having the best of both worlds: storage and the ability to game. The Docker apps have been a nice-to-have. There, the cat's out of the bag, but to the point.

I am using what I consider to be a tidy setup, or at the very least a capable one for my minor usage:

 

Gigabyte X570

AMD 3950

1080 Ti

64 GB RAM, 2600 MHz

 

Acknowledging the other users who have been through the same set of events, and the efforts made not to lose hope over this first-world suffering, I think this is now well documented enough to at least get a small ear from the main dudes.

For my issue, I have managed to get it down to a manageable state: if I have to restart, I have to re-dump my GPU BIOS to get the benefits of my GPU in my Win10 KVM.

However, this does not happen when my drives spin down for sleep mode and I re-initialise my system, so it is isolated to reboots only.

My question at this point: shouldn't the Unraid system be able to reboot intact from the USB drive using its last state? Why is it changing on a clean reboot? There is certainly something strange going on. Maybe it's a priority, maybe it's not, but I sure would love to know, as it is bugging the hell out of me. I am considering my options moving forward. I certainly have had some fun using this co-op of applications; it's a shame that the changes from one version to the next can't be captured to isolate this issue.

 

 

 

 

 

 
