[Plugin] Nvidia-Driver


ich777


43 minutes ago, cristiano1492 said:

Then the question is: is it right to keep the plugin for both uses?

No, this can lead to serious problems and even hard system lock-ups.

 

If your main goal is to use it in a VM, then this plugin isn't what you want to use.

 

44 minutes ago, cristiano1492 said:

Can you tell me what you mean by Above 4G Decoding? Where could I find this option in the BIOS menu?

Please ask in the VM support subforums.

1 hour ago, cristiano1492 said:

Can you tell me what you mean by Above 4G Decoding? Where could I find this option in the BIOS menu?

 

Maybe start by looking in your BIOS or your mainboard's manual ;)

 

It's mainly needed when you use more than one GPU, or when Resizable BAR (ReBAR) comes into play; then you need it.

On 8/6/2021 at 4:01 PM, mike2246 said:

I had GPU Statistics set up as well. I think that's where I was getting confused: nvidia-smi showed it, but the dashboard GPU stats didn't show anything.

GPU Stats.JPG

Any ideas why GPU Statistics isn't showing anything, even though Plex says it's HW transcoding?
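For reference, here's how I was cross-checking from the driver side (a rough sketch; it assumes nvidia-smi and watch are available inside the container):

# Inside the Plex container console: list the processes currently using the GPU.
# A hardware transcode should show up as a Plex process in the process table.
nvidia-smi

# Or watch utilization update live while a stream is transcoding:
watch -n 1 nvidia-smi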


I have a 1650 Super that I use for transcoding in a Plex Docker container and in an ErsatzTV Docker container (it emulates DVR-style scheduling of Plex media into the Live TV section). When I boot the server up and check the Nvidia plugin, I see the card listed along with its info and driver version. If I go into the console of both Docker containers and run the nvidia-smi command, I can see the card listed and working.
 

After a few days though, the card just disappears from the Nvidia plugin and the containers. If I run nvidia-smi in the unRAID console, I still see the card listed in the devices, and if I run the lspci command, I see it listed in the hardware there as well.

It is plugged into the first PCIe slot. I have a SATA adapter plugged into my M.2 slot in order to add more drive capacity to the motherboard. My PSU is 650 W, so it should be enough to power everything simultaneously.

 

I'm currently away from my tower, so I'll add any required images in a few hours when I have access to it again.

1 hour ago, Agent531C said:

After a few days though, the card just disappears from the Nvidia plugin and the containers. If I run nvidia-smi in the unRAID console, I still see the card listed in the devices, and if I run the lspci command, I see it listed in the hardware there as well.

What do you mean by "and the containers"? Can you open up a console window from a container where it disappears, issue the command nvidia-smi there, and see if you get any output from it?

 

What does the plugin page say instead?

27 minutes ago, ich777 said:

What do you mean by "and the containers"? Can you open up a console window from a container where it disappears, issue the command nvidia-smi there, and see if you get any output from it?

 

What does the plugin page say instead?

 

So, I actually ended up crashing my server by trying to run nvidia-smi in the unRAID console inside a VM.

Now that it's rebooted, it's recognizing the graphics card; I've attached a couple of images showing that the server sees it.

Now that it's reappeared, I'll have to wait for it to disappear again to get pictures, but it basically stops being detected by the plugin. Since it isn't detected, the Docker containers can no longer use it (which I can show when it happens again). This isn't specific to this graphics card; it happened with my old 970 too.

Functioning GPU - Containers.PNG

Functioning GPU - Driver Package.PNG

7 minutes ago, Agent531C said:

inside a VM.

Inside a VM?

 

Can you please post your Diagnostics, and tell me on what hardware you run this card? The next time it disappears from the system, please also download and post the Diagnostics; otherwise troubleshooting is really hard...

 

You can also try to enable persistence mode; simply issue 'nvidia-persistence' from the console.
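If that helper isn't available on your build, the stock driver tooling can toggle persistence mode as well; a minimal sketch using standard nvidia-smi flags (nothing plugin-specific):

# Enable persistence mode on all GPUs (keeps the driver initialized while idle):
nvidia-smi -pm 1

# Verify: the "Persistence-M" column in the header table should now read "On":
nvidia-smi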

4 minutes ago, theun said:

however now all my Nvidia Info is missing. Any chance you can have a look into this for me?

Can you attach your Diagnostics please?

I also run a test server with the Nvidia drivers installed and they work flawlessly.

 

Have you already tried to reinstall the plugin?

 

When upgrading from 6.9.2 to RC1, did you get a message telling you not to reboot yet and that the download had started in the background?

3 minutes ago, theun said:

No, I did not get a message about a download being started in the background.

Interesting. Could it be that you were on an old plugin version?

 

How did you upgrade? Through the built-in updater?

Under normal circumstances you should get a message telling you to wait before rebooting because a new unRAID version was detected; it should have looked something like this:

[screenshot of the unRAID update notice]


Hi,

I'm having a similar issue with the graphics card no longer being detected. This happened to me once before, a few months ago, and the card was visible again after a restart.

No hardware changes were made between these events and there were no major unRAID version changes, only plugin/driver updates.

The nvidia-smi output from the unRAID console or from a container (Plex) now shows "No devices were found".

In the syslog (both now and a few months ago) I can see this event:

Aug 10 10:31:32 Tower kernel: NVRM: GPU 0000:09:00.0: GPU has fallen off the bus.
Aug 10 10:31:32 Tower kernel: NVRM: A GPU crash dump has been created. If possible, please run
Aug 10 10:31:32 Tower kernel: NVRM: nvidia-bug-report.sh as root to collect this data before
Aug 10 10:31:32 Tower kernel: NVRM: the NVIDIA kernel module is unloaded.
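(In case it's useful to anyone else searching their own logs, a sketch of how to find these events, assuming the default /var/log/syslog location on unRAID:)

# Search the live syslog for NVRM driver errors such as "fallen off the bus":
grep -i 'NVRM' /var/log/syslog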

Attaching the diagnostics file.

tower-diagnostics-20210810-1402.zip

25 minutes ago, CBS said:

This happened to me once before, a few months ago, and the card was visible again after a restart.

Yes, this message:

Aug 10 10:31:32 Tower kernel: NVRM: GPU 0000:09:00.0: GPU has fallen off the bus.

indicates that the card has fallen off the bus and the driver has been unloaded.

This is basically the same as if you pulled the GPU from a running system; the same message would appear in the syslog.

 

If you don't mind me asking: the next time this happens, can you please open up a terminal from unRAID, issue the command 'lspci' (without quotes), and post the output here or send it to me via PM?

I'm really curious whether the GPU is still visible after it has fallen off the bus; from my perspective it shouldn't be. If it is visible anyway, then try 'modprobe nvidia' from the same console window, go to the plugin page again, and see if the card is visible again.

If it's still visible to the system, there has to be something wrong with the communication between the card and the system itself: the connection dropped for a few moments, so this message appears in the syslog and the driver is unloaded.
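Put together, the whole check would look roughly like this (a sketch of the steps above):

# 1. Check whether the PCI device is still enumerated at all:
lspci | grep -i nvidia

# 2. If it is listed, try to reload the driver module:
modprobe nvidia

# 3. Then check whether the driver sees the card again:
nvidia-smi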

 

Hope that makes sense to you.

 

I don't want to claim that this is an AMD problem, but it seems to me that it indeed is, since it only drops out on these platforms. Sadly I don't own an AMD system personally, but I have also read about PCIe stability issues on AMD motherboards (PCIe 3.0 and 4.0).

 

Looking forward to your response and your findings. :)

 

EDIT: May I also ask whether you have an unRAID Dashboard window open somewhere that shows the GPU statistics?


Hi, thanks for the quick reply.

lspci does show the card (even though it's not visible in the plugin or to nvidia-smi):

09:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660 SUPER] (rev a1)
09:00.1 Audio device: NVIDIA Corporation TU116 High Definition Audio Controller (rev a1)
09:00.2 USB controller: NVIDIA Corporation TU116 USB 3.1 Host Controller (rev a1)
09:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 USB Type-C UCSI Controller (rev a1)

 

And doing 'modprobe nvidia' had no effect; no change in the info displayed by the plugin.
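(In case it helps, a sketch of what I can check right after a failed modprobe next time; the slot address comes from the lspci output above:)

# Show the most recent kernel messages to see why the module failed to attach:
dmesg | tail -n 20

# Show the device state for the GPU slot, including which driver (if any) is bound:
lspci -nnk -s 09:00.0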

 

[screenshot of the plugin page]

2 minutes ago, CBS said:

And doing 'modprobe nvidia' had no effect; no change in the info displayed by the plugin.

Try to reboot; that's the only thing I can recommend.

If that doesn't help, try shutting the server down entirely, pull the power cord, press the on/off and reset buttons a few times (to drain the capacitors), plug the power cord back into the wall, and power the server on again.

 

This method also helps with my DVB cards; sometimes after many reboots they seem to be in a hung state where only this method works.


Hi everyone, apologies for the question below:

 

I have two Nvidia graphics cards installed, a GTX 1080 and a GTX 750 Ti. I would like to use the GTX 1080 for a Windows 10 VM and the GTX 750 Ti for Plex transcoding.

 

Would it be correct to use this plugin, or is the above scenario not possible?

 

Thanks in advance.

26 minutes ago, hkgnp said:

Would it be correct to use this plugin, or is the above scenario not possible?

Yes and no. 😉

 

Bind the card that you want to use in a VM to VFIO, reboot, and the plugin won't see it anymore.

After you have bound the card to VFIO and rebooted, only the second card should be visible in the plugin. Keep in mind that a 750 Ti is not capable of H.265 (HEVC) and will only transcode H.264 and below.

I would recommend looking for a 1050 or 1050 Ti on the used market; those cards cover most of the newer formats.
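For reference, on recent unRAID versions you can bind a device to VFIO from Tools -> System Devices; on older versions it's done by ID on the kernel command line. A rough sketch of the command-line variant (the vendor:device IDs below are placeholders for illustration, look up your own first):

# Find the vendor:device IDs of the GPU and its audio function:
lspci -nn | grep -i nvidia

# Example append line in syslinux.cfg (IDs are hypothetical, substitute your own):
append vfio-pci.ids=10de:1b80,10de:10f0 initrd=/bzroot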

On 8/9/2021 at 12:13 PM, ich777 said:

Inside a VM?

 

Can you please post your Diagnostics, and tell me on what hardware you run this card? The next time it disappears from the system, please also download and post the Diagnostics; otherwise troubleshooting is really hard...

 

You can also try to enable persistence mode; simply issue 'nvidia-persistence' from the console.

 

After a few days, as usual, the graphics card has disappeared from use. You can see the containers with no devices, the tower itself with no devices, the plugin missing the device, and finally the device at least still being seen by the OS via lspci. I've also attached the diagnostics, now that they should have relevant info.

Nvidia Gone.PNG

NonFunctioning GPU - Containers.PNG

tower.PNG

tower-card.PNG

tower-diagnostics-20210816-1231.zip

1 hour ago, Agent531C said:

After a few days, as usual, the graphics card has disappeared from use.

I have two things that might help:

 

  1. Can you try to add this to your syslinux.conf? (See the sketch just after this list.)
    [screenshot of the syslinux.conf setting]
    (Go to "Main", click on the blue text "Flash", scroll down a little, add "iommu=soft", scroll to the bottom, click "Apply", and reboot.)

     
  2. If the above doesn't help, please try to turn off AMD-V/AMD-Vi in your BIOS.
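For clarity, the resulting section in syslinux.cfg would look roughly like this (a sketch; your append line may carry additional parameters):

label unRAID OS
  kernel /bzimage
  append iommu=soft initrd=/bzroot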

 

 

I now have many reports of GTX 1650/Super and GTX 1660/Super cards that fell off the bus and/or are no longer visible after a few days; I don't know whether it's related to the cards themselves or to a hardware (combination) issue in general.

8 hours ago, ich777 said:

I have two things that might help:

 

  1. Can you try to add this to your syslinux.conf? (See the sketch just after this list.)
    [screenshot of the syslinux.conf setting]
    (Go to "Main", click on the blue text "Flash", scroll down a little, add "iommu=soft", scroll to the bottom, click "Apply", and reboot.)

     
  2. If the above doesn't help, please try to turn off AMD-V/AMD-Vi in your BIOS.

 

 

I now have many reports of GTX 1650/Super and GTX 1660/Super cards that fell off the bus and/or are no longer visible after a few days; I don't know whether it's related to the cards themselves or to a hardware (combination) issue in general.

I've added the iommu bit to the config, so I'll give it a few days and see if it drops off again. My old 970 would drop off as well. I initially thought it was a graphics card problem, which gave me an incentive to replace it, but now that it's happened on both cards, I'm sure it's something else.

The BIOS option sounds like it could be a culprit as well.


Has anyone gotten an RTX 3060 12 GB GPU to work? I had a 3060 Ti installed and working, but swapping it out for a 3060 has been extremely problematic. It shows up in Tools -> System Devices, but only sometimes in the Nvidia plugin. The GPU Statistics plugin doesn't show it at all. When it does show, if I run nvidia-smi it shows usage in the console, but then it disappears from the driver plugin page. I've uninstalled/reinstalled the latest driver, removed the GPU Statistics plugin, and rebooted countless times; it's just really flaky.
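(For what it's worth, a sketch of what I plan to grab from the kernel log the next time it drops out:)

# Look for Nvidia driver errors (NVRM messages and Xid codes) in the kernel ring buffer:
dmesg | grep -iE 'NVRM|Xid'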

17 minutes ago, wills said:

Has anyone gotten an RTX 3060 12 GB GPU to work?

Can you attach the Diagnostics?

Which version of the driver were you using?

What brands are the 3060 and the 3060 Ti?

 

But yes, I think a few pages back someone said that their 3060 works flawlessly; I have only had reports of 3070s and 3080s, which also work flawlessly.

 

17 minutes ago, wills said:

The GPU Statistics plugin doesn't show it at all.

Have you made sure to change the UUID of the card on the GPU Statistics plugin page?
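If you need to look the UUID up, the driver can print it directly (standard nvidia-smi, nothing plugin-specific):

# List the installed GPUs together with their UUIDs; copy the UUID into the
# GPU Statistics settings:
nvidia-smi -L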

If it still doesn't show up, you should make a post in that plugin's support thread.

 

17 minutes ago, wills said:

When it does show, if I run nvidia-smi it shows usage in the console, but then it disappears from the driver plugin page.

When you run nvidia-smi, it disappears from the plugin page?

