Nvidia-smi can't see GPU once it's passed through to a VM



Hello all,

 

I'm fairly comfortable with servers, but one thing keeps eluding me and I don't know how to fix it. I have an Ubuntu virtual machine set up as a Plex server, with a GPU passed through to it for hardware transcoding. Before doing any of this, I installed the Nvidia drivers and nvidia-smi. This is what I see before I start the VM with the GPU attached:

Before.PNG.050b0a04572459dd0e4189aee3089eca.PNG

 

Everything seemed fine, so I set up the VM, passed the GPU through, and started it. With Plex set up, everything works and I can transcode on the GPU, but I noticed that I can no longer see the GPU's stats in GPU Statistics. This is what I see:

After.PNG.07e90ce72687594f066e3242e5b40b21.PNG

842210838_GPUStats.PNG.49e24c00231778224b344247be608e9a.PNG

 

So I turned the VM off and read around. One common suggestion was to bind the GPU to VFIO, so I did that and rebooted the server. Now, whether or not the VM is running, this is what I see:

 

Bound.PNG.bd106a62ab84cda6a2bb692690d4fa1d.PNG

865560917_GPUStats.PNG.b1040938f6aef8b244bfed615f81ba6b.PNG

 

I unbound the card from VFIO, since it felt like I was closer before I did that. I'll attach diagnostics taken with the GPU not bound to VFIO.

 

So what I want to know is: is there a way to pass this GPU through to the Plex VM and still see the GPU statistics while it's in use? I see other people doing it, but I don't know how. My first thought is that they are running Plex in Docker, but I don't want to do that. I want one VM for Plex and another VM for downloading/media management. Please help me haha :)

 

nautilus-diagnostics-20220414-1249.zip

Link to comment
  • 1 year later...

@itimpi I currently have the same "issue". So, as I understand it, I can't monitor GPU usage while a VM is running, right?

That's weird, since monitoring is kind of the whole point, but OK. However, when I shut down the VM, the issue persists: I still can't see the stats, my GPU is not recognized, and I have ~5% CPU usage because of a log flood:

Jun 13 14:21:45 Yggdrasil kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x23:0x65:1426)
Jun 13 14:21:45 Yggdrasil kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

PS: Yggdrasil is my server name
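
To get a feel for how heavy a flood like this is, the repeated messages can be tallied. This is only a minimal sketch: the regular expression is modelled on the two log lines quoted above, and `count_nvrm_errors` is a made-up helper name, not part of Unraid or the Nvidia driver.

```python
import re
from collections import Counter

# Matches kernel lines shaped like the ones quoted above (an assumption based
# on those two samples only): "... kernel: NVRM: GPU <pci address>: <message>"
NVRM_LINE = re.compile(
    r"kernel: NVRM: GPU "
    r"(?P<addr>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"(?P<msg>.+)$"
)


def count_nvrm_errors(lines):
    """Count NVRM messages per (PCI address, message) pair."""
    counts = Counter()
    for line in lines:
        match = NVRM_LINE.search(line.rstrip("\n"))
        if match:
            counts[(match.group("addr"), match.group("msg").strip())] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jun 13 14:21:45 Yggdrasil kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x23:0x65:1426)",
        "Jun 13 14:21:45 Yggdrasil kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0",
    ]
    for (addr, msg), n in count_nvrm_errors(sample).items():
        print(f"{n:>6}  {addr}  {msg}")
```

Feeding it the lines of the real syslog instead of `sample` would show which message dominates and which PCI address it refers to.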

 

Any idea how I could fix this?

Link to comment
Have you bound the GPU to VFIO? If so, the correct driver will not be bound to the card.
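
One way to check which driver currently holds the card is to read the `driver` symlink that Linux exposes under `/sys/bus/pci/devices/<address>/`. A minimal sketch follows, assuming the PCI address `0000:01:00.0` taken from the log lines earlier in the thread; `bound_driver` is a hypothetical helper name.

```python
import os


def bound_driver(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None.

    The kernel exposes a `driver` symlink in the device's sysfs directory
    when a driver is bound; the link target's last component is the driver
    name (for example "nvidia" or "vfio-pci").
    """
    link = os.path.join(sysfs_root, pci_address, "driver")
    if not os.path.islink(link):
        return None  # no driver bound, or no such device
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # Address assumed from the NVRM log lines quoted in this thread.
    print(bound_driver("0000:01:00.0"))
```

If this reports `vfio-pci`, the host's Nvidia driver cannot talk to the card, which would match the missing stats.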

Link to comment

It's not checked.

Right now the stats are showing, but of course my GPU is idle. When a VM is started with the GPU passed through, it just doesn't monitor anything anymore. It can't even display the GPU name or temperature...

And when I stop the VM, monitoring doesn't come back. Instead, the log is flooded with the message mentioned previously and I have ~5% CPU usage. (My entire Unraid also seems slow; it takes almost 1 minute to change pages.) The only way to fix the log flood, CPU usage, and slowness is to reboot the entire system...

Screenshot 2023-06-13 174220.png

Link to comment
That is expected, as the GPU is exclusively allocated to the VM; it is no longer visible to the host.
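
Since the host cannot see the card while the VM owns it, one possible workaround is to read the stats from inside the guest, where the Nvidia driver is loaded. The sketch below assumes `nvidia-smi` is installed in the VM and uses its `--query-gpu` CSV output; the function names are illustrative only, and this does not bring back the host's GPU Statistics view.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; all three are standard --query-gpu fields.
QUERY_FIELDS = ["name", "temperature.gpu", "utilization.gpu"]


def parse_gpu_stats(csv_text):
    """Turn `nvidia-smi --format=csv,noheader,nounits` output into dicts.

    Values are kept as strings because nvidia-smi may report "[N/A]"
    for fields a given card does not support.
    """
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [
        dict(zip(QUERY_FIELDS, (value.strip() for value in row)))
        for row in reader
        if row  # skip blank lines
    ]


def query_gpu_stats():
    """Run nvidia-smi (must be on PATH, i.e. inside the guest) and parse it."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_stats(result.stdout)
```

Calling `query_gpu_stats()` inside the Plex VM would return one dictionary per GPU the guest can see.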

 

Why do you have IOMMU group 12 bound to VFIO? Post diagnostics.
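
To see what actually sits in a given IOMMU group, the kernel lists the member devices under `/sys/kernel/iommu_groups/<group>/devices/`. A minimal sketch, with group 12 taken from the question above and `iommu_group_devices` as a hypothetical helper name:

```python
import os


def iommu_group_devices(group, root="/sys/kernel/iommu_groups"):
    """Return the sorted PCI addresses of the devices in an IOMMU group.

    Returns an empty list if the group does not exist (for example when
    the IOMMU is disabled or the group number is wrong).
    """
    devices_dir = os.path.join(root, str(group), "devices")
    if not os.path.isdir(devices_dir):
        return []
    return sorted(os.listdir(devices_dir))


if __name__ == "__main__":
    # Group number taken from the question above.
    for address in iommu_group_devices(12):
        print(address)
```

Every device in a group is handed over together, so an unexpected entry here can explain why something other than the GPU ended up bound to VFIO.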

Link to comment
