6.9.2 broke my VMs with Nvidia GPUs



Hello.

 

I decided to take the jump to 6.9.2 today, as the bug-report threads seem to have slowed down. Everything worked just fine on 6.8.3. After the first reboot into 6.9.2 my main Windows 10 VM was really slow. I checked the SSD and it seemed to have been affected by the 1 MiB alignment "bug". So I moved everything off it, reformatted it using the Unassigned Devices plugin, and moved my VMs back.
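
For anyone who wants to check alignment on their own SSD, the arithmetic is simple: a partition is 1 MiB aligned when its starting byte offset is a multiple of 1,048,576. Here is a minimal sketch in Python, assuming 512-byte logical sectors by default and that you read the start sector yourself (for example from `fdisk -l` output). The function name is only illustrative. As far as I understand the 6.9 release notes, older releases started partition 1 at sector 64 and 6.9 uses sector 2048 for SSDs, so those are the two values tried here:

```python
MIB = 1024 * 1024  # bytes in one MiB


def is_mib_aligned(start_sector: int, sector_size: int = 512) -> bool:
    """Return True if the partition's starting byte offset is a multiple of 1 MiB."""
    return (start_sector * sector_size) % MIB == 0


if __name__ == "__main__":
    # 64 * 512 = 32768 bytes, which is not a multiple of 1 MiB
    print(is_mib_aligned(64))
    # 2048 * 512 = 1048576 bytes, which is exactly 1 MiB
    print(is_mib_aligned(2048))
```

If your drive reports a different logical sector size, pass it as the second argument.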

 

Everything seemed fine for around 30 minutes and then it just crashed on me. The funny thing is that it works just fine using VNC; there are no problems at all. Everything works right up until the GPU driver seems to be loaded in Windows. Then everything freezes or crashes. I have tried rebooting, and removing the GPU from vfio and adding it back. Nothing seems to help at all. I then found this thread and followed the guide by removing all the GPU drivers using VNC. I added the GPU back in, and it booted just fine into Windows with display on my screens.

 

Then when I tried installing the GPU driver it just crashed, and now I'm back to square one. I have a flash drive backup from before upgrading, so I might just end up downgrading again. But I hope someone can help me out here.

I get this line almost every time I boot up a VM with my GPU passed through to it.

Tower kernel: vfio-pci 0000:81:00.0: vfio_ecap_init: hiding ecap 0x19@0x900

I'm not sure if this is intended or not. I have not seen it before, but it could of course be a 6.9 thing.
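
To make that log line easier to read, here is a small Python sketch that pulls out the PCI address, the extended capability ID and its offset. The regular expression, the function name and the lookup table are my own assumptions for illustration, not part of any Unraid or kernel tooling. The one table entry is based on the Linux kernel's PCI register definitions, where extended capability ID 0x19 is the Secondary PCI Express capability. The message is generally reported as informational (vfio-pci hiding a capability from the guest) rather than an error, but I can't say for certain it is unrelated to the crashes.

```python
import re

# Matches kernel lines such as:
#   vfio-pci 0000:81:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
ECAP_RE = re.compile(
    r"vfio-pci (?P<bdf>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"vfio_ecap_init: hiding ecap 0x(?P<cap>[0-9a-fA-F]+)@0x(?P<off>[0-9a-fA-F]+)"
)

# Deliberately incomplete lookup table (illustrative only)
KNOWN_ECAPS = {0x19: "Secondary PCI Express"}


def parse_hidden_ecap(line):
    """Return device, capability ID, offset and name, or None if the line does not match."""
    match = ECAP_RE.search(line)
    if match is None:
        return None
    cap_id = int(match.group("cap"), 16)
    return {
        "device": match.group("bdf"),
        "cap_id": cap_id,
        "offset": int(match.group("off"), 16),
        "name": KNOWN_ECAPS.get(cap_id, "unknown"),
    }


if __name__ == "__main__":
    line = "Tower kernel: vfio-pci 0000:81:00.0: vfio_ecap_init: hiding ecap 0x19@0x900"
    print(parse_hidden_ecap(line))
```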

 

Best regards,

Brydezen

tower-diagnostics-20210410-2236.zip

Edited by Brydezen
  • 1 month later...

VM: Ubuntu 20.04 LTS server VM with an Nvidia 1050 Ti set to pass-through.
Configuration: IPC NVR with TensorFlow for object detection and email alerts.
Behavior: With the 6.9.2 release applied, the TensorFlow alerts were delayed by half an hour to an hour. The attached email screenshots did not show the captured object as expected. After rolling Unraid back to the 6.8.3 release, object detection and alerts worked as expected.
Notes: On 6.9.2, the per-process CPU usage shown by PM2 reported TensorFlow as consistently using 200% CPU. On 6.8.3, PM2 reported TensorFlow at under 100% CPU, averaging 95 - 98%.
Conclusion: The only change to the Unraid server hosting the VM was the upgrade from 6.8.3 to 6.9.2. Downgrading to 6.8.3 was the only way to resolve this issue.

Edited by Thirs
  • 4 weeks later...
  • 1 month later...
  • 4 weeks later...
  • 4 months later...

Hi all, I am late to the party but recently tried to go from 6.8 to both 6.9 and 6.10. As soon as any Windows VM with an Nvidia GPU passed through initialises, it crashes (nvlddmkm.sys video TDR failure). I think I have the same issue as OP and am looking for a solution.

 

It sounds like this thread diverged to a different issue, with KVM crashing entirely rather than just the VMs. Or am I reading it wrong, and there is a solution for the VM/Nvidia issue? The same issue is posted here too, where I also posted. Looking at those diagnostics, I noticed I have a very similar CPU and the same motherboard as OP, so I can't help but think that this may be a hardware-related issue. I have searched a lot and only found a few mentions of this problem. If it were a widespread 6.9/6.10 issue I think it would have more attention, so again it seems more specific to hardware, or maybe a software conflict with an add-on or something.

 

Anyhow, if this is solved, would you guys elaborate further on the 'fix'? And if not, what is the best course of action to get more help? I was going to start a new thread and post all my info, but if this is fixed I'd rather skip that.

 

Thanks

 
