[Plugin] Nvidia-Driver


ich777


7 minutes ago, ich777 said:

What do you mean by that? The driver plugin only uses wget to download files...

It seems like something isn't configured properly...

 

It's always a bad idea to block the Google DNS server; why would you do that? :D

The plugin pings 8.8.8.8 because it's an anycast IP address and is basically always online. Just a side note: my plugin isn't the only thing that connects to 8.8.8.8 to check whether an internet connection is actually available...

If that ping is blocked, the boot takes about 30 seconds longer.

 

I had problems with Unraid when IPv6 was enabled: with it enabled, the VMs and Docker containers wouldn't work over IPv6. So I disabled it, and I now use a second NIC with IPv6 enabled only for the Docker containers that need IPv6. This solution works for me: the VMs get an IPv6 address and Docker works over IPv6.

But sometimes Unraid itself tries to use that NIC to download things over its IPv6 address, which of course fails because Unraid isn't set up for IPv6...

 

I changed the firewall rule so it only blocks 8.8.8.8 on port 53.

(I want to prevent every device on my network from using Google DNS.)
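The roughly 30-second boot delay quoted above is what a retry loop produces when every attempt fails. Below is a minimal Python sketch of such a connectivity check, assuming a fixed number of attempts with a pause between them; the plugin's real retry count and timing aren't given in this thread, and `wait_for_internet`, `ping_probe`, `attempts` and `delay_s` are hypothetical names. The probe is injected so the loop can be exercised without a network.

```python
import subprocess
import time
from typing import Callable


def ping_probe(host: str = "8.8.8.8") -> bool:
    """Send one ICMP echo request (Linux iputils ping, 1-second reply timeout)."""
    return subprocess.run(["ping", "-c", "1", "-W", "1", host],
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode == 0


def wait_for_internet(probe: Callable[[], bool], attempts: int = 10,
                      delay_s: float = 3.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True as soon as probe() succeeds, or False once every attempt fails.

    Hypothetical defaults: if the target is blocked, every attempt fails and
    the caller sleeps (attempts - 1) * delay_s seconds, plus any per-probe
    timeouts, before giving up.
    """
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # wait before retrying
    return False


# Simulated blocked target: the probe always fails and the sleeps are recorded.
slept = []
print(wait_for_internet(lambda: False, sleep=slept.append), sum(slept))  # False 27.0
```

On a real system you would pass `ping_probe` as the probe. With these made-up defaults a blocked target costs 27 seconds of waiting before the check gives up, the same order of magnitude as the delay described above.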

1 hour ago, sjaak said:

with IPv6 enabled only for the Docker containers that need IPv6

But then something must also be letting Unraid itself use IPv6; otherwise this can't be the problem.

 

1 hour ago, sjaak said:

I changed the firewall rule so it only blocks 8.8.8.8 on port 53.

Then it should be fine, since the plugin only pings 8.8.8.8.
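Why the narrower rule works: ping uses ICMP, which has no port numbers, so a rule scoped to port 53 never matches it. Here is a toy Python model of that matching, assuming a simple "block if any rule matches" list. Real firewalls have their own rule syntax, and `Packet`, `BlockRule` and `is_blocked` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    protocol: str            # "icmp", "udp" or "tcp"
    dst_port: Optional[int]  # ICMP has no ports, so None


@dataclass(frozen=True)
class BlockRule:
    dst_ip: str
    protocols: frozenset
    dst_port: Optional[int] = None  # None means "any port"

    def matches(self, pkt: Packet) -> bool:
        if pkt.dst_ip != self.dst_ip or pkt.protocol not in self.protocols:
            return False
        return self.dst_port is None or pkt.dst_port == self.dst_port


def is_blocked(pkt: Packet, rules: Iterable[BlockRule]) -> bool:
    return any(rule.matches(pkt) for rule in rules)


# DNS runs over UDP and TCP on port 53; the rule does not mention ICMP at all.
dns_only = [BlockRule("8.8.8.8", frozenset({"udp", "tcp"}), 53)]
ping = Packet("8.8.8.8", "icmp", None)
dns_query = Packet("8.8.8.8", "udp", 53)
print(is_blocked(ping, dns_only))       # False: the plugin's ping gets through
print(is_blocked(dns_query, dns_only))  # True: DNS queries to Google are blocked
```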

5 hours ago, Seven7527 said:

I can't figure out why it's not working.

Have you also added the parameter '--runtime=nvidia' to 'Extra Parameters' in your template (the field appears when you switch on the Advanced View)?

 

Please keep in mind that the application itself also has to support Nvidia; if it isn't compiled with Nvidia support, it can't work.
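For reference, a Python sketch of the `docker run` arguments a GPU-enabled template roughly boils down to. `--runtime=nvidia` comes from this thread; `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` are environment variables read by the NVIDIA container runtime. The function name, the image name `example/plex` and the all-zero GPU UUID are placeholders, not values from this thread, and Unraid assembles the real command from the template itself.

```python
import shlex
from typing import List


def nvidia_docker_args(image: str, gpu_uuid: str,
                       capabilities: str = "all") -> List[str]:
    """Build a docker run argument list for a container that should use the GPU."""
    return [
        "docker", "run", "-d",
        "--runtime=nvidia",                                  # the Extra Parameters entry
        "-e", f"NVIDIA_VISIBLE_DEVICES={gpu_uuid}",          # which GPU to expose
        "-e", f"NVIDIA_DRIVER_CAPABILITIES={capabilities}",  # driver features to mount
        image,
    ]


args = nvidia_docker_args("example/plex", "GPU-00000000-0000-0000-0000-000000000000")
print(shlex.join(args))
```

Even with these settings in place, the application inside the container still needs its own Nvidia support, as noted above.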


@ich777 Sorry in advance if I should've created my own thread...

I am having trouble getting this plugin to detect my Nvidia GeForce GTX 1650.

 

I have tried all options for ACS Override as well as trying the different versions of the driver which show up in the plugin page (I have checked btw, and all these drivers should be compatible with the card).

On an earlier version of Unraid (pre 6.9) the same graphics card was working and I had even managed to get the Plex container to make use of it - but it stopped being detected one day out of nowhere.

 

Attached are various screenshots, showing the plugin page, IOMMU groups, and nvidia-smi output for each of the different ACS Override options, and additionally one image showing the system info.

 

What can I do to troubleshoot and fix this? The card works fine in another machine (Windows 10).

 

Thanks

Dan

 

ACS Disabled

[Screenshots: IOMMU groups, plugin page, nvidia-smi output]

ACS Downstream

[Screenshots: IOMMU groups, plugin page, nvidia-smi output]

ACS Multifunction

[Screenshots: IOMMU groups, plugin page, nvidia-smi output]

ACS Both

[Screenshots: IOMMU groups, plugin page, nvidia-smi output]

System Info

[Screenshot: system info]

43 minutes ago, Ninjadude101 said:

I have tried all options for ACS Override as well as trying the different versions of the driver which show up in the plugin page (I have checked btw, and all these drivers should be compatible with the card).

This has nothing to do with the plugin or with the way the driver detects the card.

 

43 minutes ago, Ninjadude101 said:

What can I do to troubleshoot and fix this? The card works fine in another machine (Windows 10).

Without your Diagnostics (Tools -> Diagnostics -> Download, then drop the downloaded zip file here in the text box) I really can't tell what the problem is.

 

EDIT: I would also recommend upgrading to 6.9.2 first.

1 hour ago, ich777 said:

This has nothing to do with the plugin or with the way the driver detects the card.

 

Without your Diagnostics (Tools -> Diagnostics -> Download, then drop the downloaded zip file here in the text box) I really can't tell what the problem is.

 

EDIT: I would also recommend upgrading to 6.9.2 first.

 

OK, thanks. I have upgraded Unraid to 6.9.2 and set ACS back to disabled, since you said that doesn't matter.

The plugin still shows "No devices were found" and nvidia-smi says the same.

 

Diagnostics attached

 

Thanks

empress-diagnostics-20210527-1705.zip

20 minutes ago, Squid said:

This may or may not be your problem, but you might also want to update the plugin.

Oh yeah, good point - thanks.

Unfortunately that made no difference: the plugin page and the nvidia-smi output still can't find the card. I have changed my driver selection to the "Production branch" (I doubt it makes any difference here, but once it's working I want to stick to stable builds only).

 

New diagnostics attached in case that's needed again. The server has been restarted twice - once after updating the plugin and once after changing the driver selection. The diagnostics are from the most recent boot.

 

Thanks

Dan

empress-diagnostics-20210527-1805.zip

1 hour ago, Ninjadude101 said:

New diagnostics attached in case that's needed again. The server has been restarted twice - once after updating the plugin and once after changing the driver selection. The diagnostics are from the most recent boot.

This seems like a hardware-related issue. Are you sure that your card gets enough power from your power supply?

Please make sure that Above 4G Decoding is enabled in your motherboard's BIOS.

 

Would it be possible to put the card in a system where it works under Windows, create a new Unraid USB key with a Trial key (don't start the array), install the CA App, then install the Nvidia Driver plugin and check whether the card is recognized on that system? (Just to be sure that it's not an issue with the CPU/motherboard/GPU hardware combination.)

 

The last thing you can do is reset your BIOS (only if you know how to set everything back so that you can still boot Unraid).

 

Also make sure that you boot Unraid in Legacy Mode and not UEFI.


Greetings all!

First of all, I love this plugin. It works great and I really appreciate all the work you've put into it!

 

My rig (running 6.9.2) has a Quadro P2200, and I've observed that any driver newer than 460.73.01 causes a bug check/kernel panic. If I install one of these newer drivers and try to load the plugin page, it never loads (because the system has locked up). Likewise, if I connect via the terminal and run 'nvidia-smi', the machine also locks up.

 

I've tested up through version 465.31 and have experienced the same results. For now I'll stay on 460.73.01 as I assume this is an NVIDIA driver issue. I just wanted to get a post out there in case anyone else runs into this. If a future version resolves this, I'll note it here.
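If you want to script "stay on a known-good driver", version strings should be compared numerically field by field rather than as text. A minimal Python sketch that picks the newest available version at or below a pin; `parse_version`, `newest_allowed` and the candidate list are hypothetical, and only 460.73.01 and 465.31 come from this post.

```python
from typing import Iterable, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """'460.73.01' -> (460, 73, 1), so tuples compare field by field."""
    return tuple(int(part) for part in v.split("."))


def newest_allowed(available: Iterable[str], pin: str) -> str:
    """Return the newest available version that is not newer than the pin."""
    limit = parse_version(pin)
    allowed = [v for v in available if parse_version(v) <= limit]
    if not allowed:
        raise ValueError(f"no driver at or below {pin}")
    return max(allowed, key=parse_version)


available = ["460.73.01", "465.31"]
print(newest_allowed(available, pin="460.73.01"))  # 460.73.01
```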

 

If you do happen to get stuck in this scenario, here's how I've gotten out of it:

- Reboot your machine. When it comes back up, avoid going to the dashboard or the nvidia driver settings page

- Uninstall the nvidia driver plugin

- Reboot, then reinstall the nvidia driver plugin. Close the window when it starts downloading the driver package (I know it says not to, but this part is important)

- Now go to the nvidia driver plugin page. Select the driver version you want to download, then click the Download button on the left

- Reboot your machine. When it comes back up, reinstall the nvidia driver plugin. The installer will use the local version instead of downloading the latest, and your machine should hopefully be back in business! Check that everything is OK by running 'nvidia-smi' or opening the driver plugin page.
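The recovery above relies on the installer preferring a driver package that is already saved locally over downloading the latest one. A minimal Python sketch of that decision, assuming a hypothetical package directory and a made-up `nvidia-<version>.txz` file name; the plugin's actual paths and naming aren't given in this thread.

```python
from pathlib import Path
from typing import Tuple


def choose_package(pkg_dir: Path, version: str) -> Tuple[str, Path]:
    """Return ("local", path) if the selected version is already saved,
    otherwise ("download", path) for where a fresh download would go.
    """
    candidate = pkg_dir / f"nvidia-{version}.txz"  # hypothetical naming scheme
    if candidate.is_file():
        return "local", candidate     # reuse the copy saved via the Download button
    return "download", candidate      # nothing cached yet: fetch it first
```

This is why the order of the steps matters: the chosen version has to be saved before the plugin is reinstalled, otherwise the download branch runs again.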

 

Cheers,

altrhombus

3 hours ago, ich777 said:

This seems like a hardware-related issue. Are you sure that your card gets enough power from your power supply?

Please make sure that Above 4G Decoding is enabled in your motherboard's BIOS.

 

Would it be possible to put the card in a system where it works under Windows, create a new Unraid USB key with a Trial key (don't start the array), install the CA App, then install the Nvidia Driver plugin and check whether the card is recognized on that system? (Just to be sure that it's not an issue with the CPU/motherboard/GPU hardware combination.)

 

The last thing you can do is reset your BIOS (only if you know how to set everything back so that you can still boot Unraid).

 

Also make sure that you boot Unraid in Legacy Mode and not UEFI.

 

The card only draws power from the motherboard, and that SHOULD be enough, but it's something I need to check. Maybe all the hard drives I've added since it last worked have caused a power shortage?

I'm not sure what Above 4G Decoding is, so I'll look into that.

 

It's a pain in the butt, but yes, if it comes to it I have another system I can use to temporarily run Unraid and test the card that way.

 

The BIOS has been reset at least twice since I started having the problem, so I'm not convinced doing it again would make a difference, and I'm already booting in Legacy Mode, so that's definitely not the issue.

 

I will try the 4G decoding setting, check the power situation, try the card in a temporary Unraid server, and get back to you.

Thanks

On 5/1/2021 at 1:17 PM, jtmoore81 said:

The only hardware I changed was adding the P2000, and it's been a while since I updated the BIOS. This is a custom build with new hardware, and I'm not sure what the gnif/vendor-reset patch is, so I'm assuming I don't have it.

@jtmoore81

 

I was having the same problem (a blank screen for the Nvidia driver plugin and the system locking up) until the other GPU in my system was bound to vfio. This was with Unraid 6.9.2. With 6.9.1, I was able to get it working without binding the GPU to vfio, as long as the BIOS was configured for legacy boot.

7 hours ago, Ninjadude101 said:

I'm not sure what Above 4G Decoding is, so I'll look into that.

This should be a setting in your BIOS.

 

7 hours ago, Ninjadude101 said:

It's a pain in the butt, but yes, if it comes to it I have another system I can use to temporarily run Unraid and test the card that way.

I can't think of another way to check whether the card is working and whether there's some weird hardware-combination issue on your existing setup.


  

14 hours ago, ich777 said:

This should be a setting in your BIOS.

Unfortunately, it's already enabled.

 

14 hours ago, ich777 said:

I can't think of another way to check whether the card is working and whether there's some weird hardware-combination issue on your existing setup.

I haven't tried that yet, but I ran the server details through a PSU calculator, and even if I exaggerate the number of drives etc., my PSU should have plenty of headroom: the recommendation is about 400-460W, and the fitted PSU is 750W.
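Putting the numbers from this post into a quick Python check: even the upper estimate of 460 W leaves 290 W spare on a 750 W unit. One caveat worth keeping in mind: for a card without a PCIe power connector, total PSU wattage isn't the whole story, because such a card is limited to what the PCIe slot supplies (up to 75 W by specification). `psu_headroom` is a hypothetical helper name.

```python
from typing import Tuple


def psu_headroom(psu_watts: float, estimated_load_watts: float) -> Tuple[float, float]:
    """Return (spare watts, estimated load as a percentage of PSU capacity)."""
    spare = psu_watts - estimated_load_watts
    load_pct = 100.0 * estimated_load_watts / psu_watts
    return spare, load_pct


# Worst case from the PSU calculator: 460 W on the fitted 750 W unit.
spare, pct = psu_headroom(750, 460)
print(f"{spare} W spare, {pct:.1f}% load")  # 290 W spare, 61.3% load
```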

 

I'll endeavour to follow the other suggestion (temporary unraid server using an entirely different machine - one where this card can work in Windows) tomorrow.

Thanks


I've run into an issue I can't figure out.

I noticed my Plex server stopped using hardware transcoding, so it was sucking up the CPU instead.
When I added "--runtime=nvidia" to the extra parameters of the Docker container from the LinuxServer guys, I was using the latest version of the driver (v465.31), and the UUID of the GPU was showing.

So I decided to go for the production version instead, and rebooted the server.
Now it's no longer showing the UUID of the GPU.
 

I can't figure out what has changed, and I'm not sure how to troubleshoot it.

Any ideas?
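One way to check from a script whether the driver sees the GPU (and which UUID it reports) is `nvidia-smi -L`, which prints one line per GPU including its UUID. A minimal Python sketch that parses that output; the sample line and UUID are illustrative, not taken from this thread, and `parse_gpu_list`/`list_gpus` are hypothetical names.

```python
import re
import subprocess
from typing import List, Tuple

# Matches lines like "GPU 0: <name> (UUID: GPU-...)"
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-fA-F-]+)\)")


def parse_gpu_list(text: str) -> List[Tuple[int, str, str]]:
    """Return (index, name, uuid) tuples from `nvidia-smi -L` output."""
    gpus = []
    for line in text.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus


def list_gpus() -> List[Tuple[int, str, str]]:
    """Run `nvidia-smi -L`; an empty list means no GPU is visible to the driver."""
    try:
        out = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                             text=True, check=False).stdout
    except FileNotFoundError:  # nvidia-smi is not installed
        return []
    return parse_gpu_list(out)


sample = "GPU 0: Quadro P2200 (UUID: GPU-01234567-89ab-cdef-0123-456789abcdef)"
print(parse_gpu_list(sample))
```

An empty result (for example when the output is just "No devices were found") points to the driver not seeing the card, which is where the questions below start.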

1 hour ago, Nanobug said:

Now it's no longer showing the UUID of the GPU

Please post your Diagnostics (Tools -> Diagnostics -> Download, then drop the downloaded zip file here in the text box); otherwise I can't say anything...

 

What does it show instead?

2 hours ago, Nanobug said:

It's just blank

What is the output of 'nvidia-smi' in the console?

 

2 hours ago, Nanobug said:

I've added the diagnostics below.

Thank you! :)

 

EDIT: Can you also try running 'modprobe nvidia' (without quotes) from a terminal, after trying the 'nvidia-smi' command above?
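Before and after running `modprobe nvidia`, you can check whether the kernel module is actually loaded by reading `/proc/modules` (the file `lsmod` formats), where each line starts with a module name. A small Python sketch; `module_loaded` is a hypothetical helper, and the text can be passed in so it can be checked without an Nvidia system.

```python
from pathlib import Path
from typing import Optional


def module_loaded(name: str, modules_text: Optional[str] = None) -> bool:
    """Return True if a kernel module called `name` is listed in /proc/modules.

    Only the first field of each line is compared, so "nvidia" does not
    accidentally match "nvidia_drm" or "nvidia_modeset".
    """
    if modules_text is None:
        modules_text = Path("/proc/modules").read_text()
    return any(line.split(" ", 1)[0] == name
               for line in modules_text.splitlines() if line.strip())
```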

12 minutes ago, ich777 said:

What is the output of 'nvidia-smi' in the console?

Nothing:

[Screenshot: 'nvidia-smi' output]

13 minutes ago, ich777 said:

EDIT: Can you also try running 'modprobe nvidia' (without quotes) from a terminal, after trying the 'nvidia-smi' command above?

This:
[Screenshot: 'modprobe nvidia' output]
 

23 minutes ago, ich777 said:

It seems that the driver isn't installed.

Can you try to press "Download" on the Plugin page and tell me what happens?

It popped up with a new driver, so I need to reboot. But right now (before the reboot) it says:
[Screenshot]

Gonna reboot, and get back to you in 5-10 minutes.

 

EDIT:

It's working atm...


It works now.
I did verify the checksum before I posted here, since I was talking to the Linux Server guys on Discord.

It's hard to tell what worked, since it had an update to the production driver at the same time.

Anyway, thank you for your help, as always :)

 

EDIT:
One side question:

How do I make the GPU part work on the unRAID dashboard?
It looks like this:
[Screenshot: GPU panel on the unRAID dashboard]

It's a P2200.

7 minutes ago, Nanobug said:

It's hard to tell what worked, since it had an update to the production driver at the same time.

Possibly the download failed and the plugin couldn't install the driver package, but on the other hand that shouldn't be possible, because I've built in a check that aborts the plugin installation if the download fails.

I really don't know what caused this...
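The kind of safeguard described here (abort instead of installing a failed download) is often combined with a checksum comparison like the one mentioned a few posts up. A minimal Python sketch of the idea, assuming an MD5 checksum is available for the package; the algorithm the plugin actually uses isn't stated in this thread, and `md5_of`, `verify_package` and the delete-on-mismatch behaviour are hypothetical, not the plugin's code.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large driver package needs little memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_package(path: Path, expected_md5: str) -> None:
    """Raise instead of letting installation continue if the file is missing or corrupt."""
    if not path.is_file():
        raise RuntimeError(f"download failed: {path} not found, aborting")
    if md5_of(path) != expected_md5.strip().lower():
        path.unlink()  # hypothetical: drop the bad file so it is fetched again next time
        raise RuntimeError(f"checksum mismatch for {path.name}, aborting")
```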

 

7 minutes ago, Nanobug said:

How do I make the GPU part work on the unRAID dashboard?

That's a good question; some cards display more than others. What is the output of 'nvidia-smi' now?

Maybe make a short post in the GPU Statistics plugin support thread; @b3rs3rk is the real master when it comes to that plugin. :)

Just now, ich777 said:

Possibly the download failed and the plugin couldn't install the driver package, but on the other hand that shouldn't be possible, because I've built in a check that aborts the plugin installation if the download fails.

I really don't know what caused this...

It checked out OK before I started having issues.

I only noticed it because I randomly saw my Plex server transcoding on the CPU.

1 minute ago, ich777 said:

That's a good question; some cards display more than others. What is the output of 'nvidia-smi' now?

Maybe make a short post in the GPU Statistics plugin support thread.

Looks fine to me, and it's transcoding as well:
[Screenshots: 'nvidia-smi' output and an active transcode]
