[Plugin] Nvidia-Driver


ich777


2 minutes ago, gergtreble said:

Whenever I reload the plugin page or run nvidia-smi. 

That's because the plugin has to call nvidia-smi to get the information you see on the plugin page, and since nvidia-smi tries to initialize the card every time, you get this message each time.

 

Hope that makes sense.

Link to comment
14 hours ago, ich777 said:

Hope that makes sense.

 

Thanks for all the help, that does make perfect sense. 

The weird thing is that I've done extensive googling and found several people who have seen similar issues. Most were due to the way their systems boot. I'm wondering if I just have some kind of fundamental incompatibility with my existing hardware. I suspect the Supermicro server boards were never designed with a GPU in their PCIe 3.0 slots in mind. Maybe this just accelerates my planning for a new hardware upgrade.

Link to comment
4 hours ago, gergtreble said:

I'm wondering if I just have some kind of fundamental incompatibility with my existing hardware.

Maybe, but I think most users were able to fix their issues with the cards.

I suspect the card may be defective, but I can't tell for sure unless you can test it in another system.

You can also try booting in UEFI mode instead of Legacy mode to see if this helps.

Link to comment
20 minutes ago, ich777 said:

Maybe, but I think most users were able to fix their issues with the cards.

I suspect the card may be defective, but I can't tell for sure unless you can test it in another system.

You can also try booting in UEFI mode instead of Legacy mode to see if this helps.

 

I've given up on it and started an eBay return. Spent a good 12 hours yesterday frying my brain trying to figure it out. The guy from Reddit got back to me, and he had no issues using the Linux drivers; his card was recognised first time in the same config as mine with no BIOS tweaks. So I'm almost certain it's a dud card. Just my luck! :D

Link to comment
1 hour ago, ich777 said:

You can also try booting in UEFI mode instead of Legacy mode to see if this helps.

 

Well, the P600 is all packed up and being shipped back. Just won an auction for a P1000. So if I get the same issues, that's my next action! Hope not to be back next weekend! :D

Edited by gergtreble
Link to comment

Does anyone know the process for pointing to an old driver? My old GPU was previously working, but I updated Unraid and it now seems to try to use the newer driver version 530, as seen below. I have tried manually changing a couple of files, which worked for me previously, but I'm not having any luck making it use version 470.

 

Quote

nvidia-nvlink: Unregistered Nvlink Core, major device number 242
Apr 23 11:54:58 Asgard kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 242
Apr 23 11:54:58 Asgard kernel: NVRM: The NVIDIA Quadro K4200 GPU installed in this system is
Apr 23 11:54:58 Asgard kernel: NVRM:  supported through the NVIDIA 470.xx Legacy drivers. Please
Apr 23 11:54:58 Asgard kernel: NVRM:  visit http://www.nvidia.com/object/unix.html for more
Apr 23 11:54:58 Asgard kernel: NVRM:  information.  The 530.41.03 NVIDIA driver will ignore
Apr 23 11:54:58 Asgard kernel: NVRM:  this GPU.  Continuing probe...
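
For anyone who wants to spot this situation automatically, here is a minimal sketch that pulls the GPU name, the required legacy branch and the currently loaded driver version out of the NVRM notice. It only assumes the message wording shown in the log above; the function name `parse_legacy_notice` is made up for illustration and is not part of the plugin.

```python
import re

# Sample lines copied from the kernel log quoted above.
LOG = """\
Apr 23 11:54:58 Asgard kernel: NVRM: The NVIDIA Quadro K4200 GPU installed in this system is
Apr 23 11:54:58 Asgard kernel: NVRM:  supported through the NVIDIA 470.xx Legacy drivers. Please
Apr 23 11:54:58 Asgard kernel: NVRM:  visit http://www.nvidia.com/object/unix.html for more
Apr 23 11:54:58 Asgard kernel: NVRM:  information.  The 530.41.03 NVIDIA driver will ignore
Apr 23 11:54:58 Asgard kernel: NVRM:  this GPU.  Continuing probe...
"""


def parse_legacy_notice(log_text):
    """Return (gpu_name, required_branch, loaded_version), or None if no notice is found."""
    # Join the NVRM payloads into one string so the regex can span the wrapped lines.
    payload = " ".join(
        line.split("NVRM:", 1)[1].strip()
        for line in log_text.splitlines()
        if "NVRM:" in line
    )
    match = re.search(
        r"The NVIDIA (?P<gpu>.+?) GPU installed in this system is "
        r"supported through the NVIDIA (?P<branch>\S+) Legacy drivers\..*?"
        r"The (?P<loaded>[\d.]+) NVIDIA driver will ignore",
        payload,
    )
    if match is None:
        return None
    return match.group("gpu"), match.group("branch"), match.group("loaded")


print(parse_legacy_notice(LOG))
```

If the function returns a branch such as 470.xx while a newer version is loaded, the GPU needs that legacy branch selected instead.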

 

Link to comment
4 minutes ago, dubbleb said:

Does anyone know the process for pointing to an old driver? My old GPU was previously working, but I updated Unraid and it now seems to try to use the newer driver version 530, as seen below. I have tried manually changing a couple of files, which worked for me previously, but I'm not having any luck making it use version 470.

 

 

Go to the Nvidia settings page, select 470, download it, and then reboot.

Link to comment
1 hour ago, dubbleb said:

NVIDIA Quadro K4200 GPU

May I ask what you are using the card for?

This is a very dated card by now, and it doesn't make much sense to use it on Unraid.

 

1 hour ago, dubbleb said:

I have tried manually changing a couple of files, which worked for me previously, but I'm not having any luck making it use version 470.

And yes, as @SimonF already pointed out: select driver version 470.xx, click on Update & Download, wait for the Done button to appear, and reboot your server.

Link to comment

Yes, I actually use it for transcoding.  I was able to get it to work again.  I only use it internally and just one machine at a time.  Thanks everyone for your help.

 

Oddly enough, that option did not work for me in the past.  This time I was able to uninstall, reinstall and it worked.

Edited by dubbleb
Link to comment
43 minutes ago, dubbleb said:

Yes, I actually use it for transcoding.

But the K4200 can only transcode H.264 (AVC) IIRC, and if you want to transcode H.265 (HEVC) you are forced to use the CPU.

Wouldn't something more recent, like an Nvidia T400, be of interest to you?

Link to comment

Having an issue where the plugin (and console) reports "Failed to initialize NVML: Unknown Error" after the plugin is initialized. If I run "nvidia-smi" prior to opening the plugin page, I do get an output. After opening the plugin page, I only receive that error.

I've attached an image of the console/plugin page as well as my diagnostics file. This is a ProLiant ML150 Gen9 with a GTX 1080 inside of it.

Thanks!

image.png

fileserver-diagnostics-20230423-1350.zip

Link to comment
1 hour ago, ocean_breeze said:

I've attached an image of the console/plugin page as well as my diagnostics file. This is a ProLiant ML150 Gen9 with a GTX 1080 inside of it.

Can you try to boot with Legacy Mode (CSM) and see if that makes a difference? I see nothing obvious.

 

Also enable Above 4G Decoding; in your case it may be called something like Support large PCIe Address Space, or something mentioning 64-bit Address Space, since HP and Dell always name these things differently... :/

What I can say for sure is that these settings should be in the PCI section of your BIOS.

Link to comment
4 hours ago, ich777 said:

Can you try to boot with Legacy Mode (CSM) and see if that makes a difference? I see nothing obvious.

 

Also enable Above 4G Decoding; in your case it may be called something like Support large PCIe Address Space, or something mentioning 64-bit Address Space, since HP and Dell always name these things differently... :/

What I can say for sure is that these settings should be in the PCI section of your BIOS.

So I enabled legacy mode, and PCI Express 64-bit BAR support was already enabled. If anyone else reads this and wants to know where to find it, you have to hit CTRL+A while in the RBSU to open a secret "Service Options" menu. Still had the same problem, unfortunately. I know this server/GPU configuration works, though, as I've previously used it with a bare-metal Ubuntu Server install on the same machine.

 

image.png

Edited by ocean_breeze
Link to comment
5 hours ago, ocean_breeze said:

I know this server/GPU configuration works, though, as I've previously used it with a bare-metal Ubuntu Server install on the same machine.

It's really strange to me because the system log shows nothing. Have you tried booting into Unraid GUI mode to see if you get any output from your GPU?

Maybe also try the legacy version 470.xx and reboot your server and see if this makes a difference.

Link to comment
19 hours ago, ich777 said:

But the K4200 can only transcode H.264 (AVC) IIRC, and if you want to transcode H.265 (HEVC) you are forced to use the CPU.

Wouldn't something more recent, like an Nvidia T400, be of interest to you?

Yes, at the moment I'm the only one who uses the Plex server. It was the GPU that came with the workstation, and I wanted to test whether I could use it. I would like to add an additional GPU and pass this old one through to a VM.

Link to comment
18 hours ago, ich777 said:

It's really strange to me because the system log shows nothing. Have you tried booting into Unraid GUI mode to see if you get any output from your GPU?

Maybe also try the legacy version 470.xx and reboot your server and see if this makes a difference.


Okay, so to update you on this:

Booting into GUI mode produced no video output from the GPU; however, I was able to monitor progress via iLO. After loading SMB, it gets stuck on a flashing cursor. The web UI still works in a browser. I logged in, checked the plugin, and saw that it was displaying a GPU ID and that nvidia-smi was still working. However, seconds later, the web UI became completely non-functional and refused to load pages. I tried to perform a graceful reboot via iLO but it errored out (screenshot attached). It hung there for several minutes before I had to forcibly reboot the system.

Booting back into non-GUI mode works fine, but the plugin ceases to work. It did cause a parity check to be started, but no big deal.

Two things to note: This is a fresh unRAID install (less than a week old) and switching to legacy 470.XX drivers resulted in the same issue.

Super weird, right?

reboot.PNG

Link to comment
6 hours ago, ocean_breeze said:

Two things to note: This is a fresh unRAID install (less than a week old) and switching to legacy 470.XX drivers resulted in the same issue.

Wait, I completely missed that you've bound the card to VFIO, or at least it seems to be bound to VFIO, but I didn't see any indication in the logs that the system itself bound it. Did you maybe do that manually?:

09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
    Subsystem: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:119e]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidia_drm, nvidia
09:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
    Subsystem: NVIDIA Corporation GP104 High Definition Audio Controller [10de:119e]
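
As a rough illustration of how to read that listing, here is a minimal sketch that scans `lspci`-style text for the "Kernel driver in use" line of each device and reports which ones are held by vfio-pci. The sample text is the output quoted above; the helper names `drivers_in_use` and `bound_to_vfio` are hypothetical.

```python
# Sample text copied from the lspci listing quoted above.
LSPCI = """\
09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
    Subsystem: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:119e]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidia_drm, nvidia
09:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
    Subsystem: NVIDIA Corporation GP104 High Definition Audio Controller [10de:119e]
"""


def drivers_in_use(lspci_text):
    """Map each PCI address to its 'Kernel driver in use' value (None if the line is absent)."""
    drivers = {}
    current = None
    for line in lspci_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented lines start a new device; the address is the first token.
            current = line.split()[0]
            drivers[current] = None
        elif current is not None and line.strip().startswith("Kernel driver in use:"):
            drivers[current] = line.split(":", 1)[1].strip()
    return drivers


def bound_to_vfio(lspci_text):
    """List the PCI addresses whose driver in use is vfio-pci."""
    return [addr for addr, drv in drivers_in_use(lspci_text).items() if drv == "vfio-pci"]


print(bound_to_vfio(LSPCI))
```

A GPU that the Nvidia driver should handle on the host would normally show `nvidia` rather than `vfio-pci` on that line.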

 

Taking a deeper look, from what I can see the card is bound in your OPNsense VM (why?):

-device '{"driver":"vfio-pci","host":"0000:09:00.0","id":"hostdev0","bus":"pci.4","addr":"0x0"}'
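
To make the connection with the lspci output explicit, here is a small sketch that decodes the JSON part of such a `-device` argument and checks whether it passes through a given PCI address. It assumes the argument has the JSON form shown above; the function names are hypothetical.

```python
import json

# JSON payload copied from the -device argument quoted above.
QEMU_DEVICE = '{"driver":"vfio-pci","host":"0000:09:00.0","id":"hostdev0","bus":"pci.4","addr":"0x0"}'


def vfio_host_address(device_json):
    """Return the host PCI address if the device uses vfio-pci, else None."""
    device = json.loads(device_json)
    if device.get("driver") != "vfio-pci":
        return None
    return device.get("host")


def passes_through(device_json, lspci_address):
    """True if the -device argument passes through the given lspci address (e.g. '09:00.0')."""
    host = vfio_host_address(device_json)
    if host is None:
        return False
    # lspci hides the PCI domain by default, so drop the leading 'dddd:' before comparing.
    return host.split(":", 1)[1] == lspci_address


print(vfio_host_address(QEMU_DEVICE))
print(passes_through(QEMU_DEVICE, "09:00.0"))
```

Here the host address 0000:09:00.0 corresponds to the 09:00.0 VGA controller in the lspci listing, which is why the GTX 1080 ends up on vfio-pci when that VM is defined to use it.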

 

Link to comment
6 hours ago, ich777 said:

Taking a deeper look, from what I can see the card is bound in your OPNsense VM (why?):


It shouldn't be bound there at all. But you're right, it was. I've removed that and nvidia-smi as well as the plugin page are both working now. No clue how it got bound.

Thank you so much!

Edited by ocean_breeze
Link to comment
Quote

Please be sure to never use one card for a VM and also in Docker containers (your server will hard lock if it's used in a VM and then something wants to use it in a container).

 

If I have the driver installed but no containers running that will use it, is it still safe to spin up a VM that does use the card?

Link to comment
1 hour ago, sage2050 said:

If I have the driver installed but no containers running that will use it, is it still safe to spin up a VM that does use the card?

Yes, but I don't recommend doing it like that since this can lead to hard crashes if not used properly.

Link to comment
18 minutes ago, sage2050 said:

in that case is there a way to disable the plugin?

No, you have to reboot.

 

19 minutes ago, sage2050 said:

I'm doing some A/B testing with a container vs a VM and I'm just looking for the safest and easiest way to go back and forth.

Just make sure that nothing is using the card on the host or in a Docker container when starting up a VM, and vice versa.
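
One rough way to see who currently holds the card before switching is to look at which kernel driver is bound to it in sysfs. The sketch below relies on the standard Linux layout where `/sys/bus/pci/devices/<address>/driver` is a symlink to the bound driver; the function name `bound_driver` and the example address `0000:09:00.0` (taken from earlier in this thread) are only illustrative. It shows the bound driver, not whether a container is actively using the GPU, so treat it as a sanity check only.

```python
import os


def bound_driver(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None if unbound or absent."""
    link = os.path.join(sysfs_root, pci_address, "driver")
    if not os.path.islink(link):
        return None
    # The symlink target ends with the driver name, e.g. .../drivers/vfio-pci
    return os.path.basename(os.readlink(link))


# Example address only; replace it with the address of your own GPU.
print(bound_driver("0000:09:00.0"))
```

A result of `vfio-pci` suggests the card is reserved for (or in use by) a VM, while `nvidia` means the host driver has it.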

Link to comment
