[Plugin] Nvidia-Driver


ich777


Just now, ich777 said:

Have you now changed the DNS server on the server?

I've had 3 DNS servers on 2 different bits of hardware fail on me in the last 24 hours (power blackouts really aren't fun), so yes I have; it's now set to a known public DNS provider and everything "just works". :)

  • Like 1
Link to comment
2 hours ago, ich777 said:

Yes please delete the two plugins with the 'Delete' button and then install first the Nvidia and then the GPU Statistics plugin (you can already do that with the running Parity Check).

 

And PiHole or something like that is not running on your Unraid box?

 

Do you have the Diagnostics for me, so I can look into why it won't install, or at least the syslog from this boot?

 

I reinstalled both and will try a reboot. Adguard is installed and the whole network is using it. Is that the issue?

 

 

 

Edited by mkayze
Link to comment
16 minutes ago, mkayze said:

Adguard is installed and the whole network is using it. Is that the issue?

No, that should not be an issue; just make sure that you don't block access to GitHub in AdGuard.

 

17 minutes ago, mkayze said:

I reinstalled both and will try a reboot.

Please report back. It should hopefully work; if not, please post the Diagnostics again (they are different on every reboot).

Link to comment
42 minutes ago, michaelmateria said:

mephisto-diagnostics-20210323-0845.zip 140.79 kB · 0 downloads

 

Thank you for your assistance, I appreciate it.

This is from your syslinux.cfg:

BOOT_IMAGE=/bzimage isolcpus=6-11,18-23 vfio-pci.ids=10de:1f02,10de:10f9,10de:1ada,10de:1adb initrd=/bzroot

 

Change it so that it looks like this (I don't know whether you isolated CPUs 6-11,18-23 on purpose or by accident, so I didn't remove that part):

BOOT_IMAGE=/bzimage isolcpus=6-11,18-23 initrd=/bzroot

 

To change that, go to 'Main' -> 'Flash' (click on the blue text for your boot device), scroll down a little and change the line. After you have made the changes, click 'Apply' at the bottom and restart your server.
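For anyone who would rather check the result before pasting it into the Flash page, here is a minimal sketch of the same edit in Python. The function name `strip_vfio_ids` is made up for illustration and it only works on the text of the line; it does not touch syslinux.cfg itself.

```python
def strip_vfio_ids(append_line: str) -> str:
    """Return the kernel command line without any vfio-pci.ids=... argument.

    Hypothetical helper: it only rewrites the string, nothing is written to disk.
    """
    kept = [arg for arg in append_line.split() if not arg.startswith("vfio-pci.ids=")]
    return " ".join(kept)


# The line quoted from the syslinux.cfg in this post.
original = (
    "BOOT_IMAGE=/bzimage isolcpus=6-11,18-23 "
    "vfio-pci.ids=10de:1f02,10de:10f9,10de:1ada,10de:1adb initrd=/bzroot"
)

print(strip_vfio_ids(original))
```

All other arguments, including isolcpus, are left in their original order.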

Link to comment
16 hours ago, ich777 said:

No, that should not be an issue; just make sure that you don't block access to GitHub in AdGuard.

 

Please report back. It should hopefully work; if not, please post the Diagnostics again (they are different on every reboot).

 

It's still happening. I checked the plugin error tab and the Nvidia driver is showing up.

 

 

EDIT: I think I fixed it. On the Nvidia Driver settings tab, I selected v460.67 instead of Latest and that seemed to fix it. I should've read this part:

"Please keep in mind if you update Unraid to a newer version and there is no internet connection on boot available the download of the driver will fail and you have to reinstall the Plugin."

 

Since the AdGuard Docker container isn't running yet after a reboot, it would make sense that it can't connect to the internet yet.

 

lowkey-diagnostics-20210323-1837.zip

Edited by mkayze
  • Like 1
Link to comment
4 hours ago, mkayze said:

Since the AdGuard Docker container isn't running yet after a reboot, it would make sense that it can't connect to the internet yet.

Exactly, so it can't resolve the hostname of the download link and the download will fail.

 

Maybe try setting a custom DNS entry like 8.8.8.8 (the Google public DNS server). I always recommend setting the server's DNS server to the one from your ISP or to a public one (in my opinion the server always needs unrestricted access to the internet without any blocking service in front of it).

 

You can also configure your containers to always use a custom DNS server by adding '--dns=8.8.8.8' to the Extra Parameters in your Docker templates (visible in the Advanced View).

Link to comment
1 hour ago, ich777 said:

Exactly, so it can't resolve the hostname of the download link and the download will fail.

 

Maybe try setting a custom DNS entry like 8.8.8.8 (the Google public DNS server). I always recommend setting the server's DNS server to the one from your ISP or to a public one (in my opinion the server always needs unrestricted access to the internet without any blocking service in front of it).

 

You can also configure your containers to always use a custom DNS server by adding '--dns=8.8.8.8' to the Extra Parameters in your Docker templates (visible in the Advanced View).

 

 

I actually have the Unraid box set to Cloudflare DNS; I think I changed that a while ago. Not sure why it's still failing to connect on reboot.

 

Do I still have to set it up per container?

Edited by mkayze
Link to comment
1 minute ago, mkayze said:

I actually have the Unraid box set to Cloudflare DNS; I think I changed that a while ago. Not sure why it's still failing to connect on reboot.

 

Something is wrong with the DNS for sure (maybe try the Google DNS server 8.8.8.8):

Quote

Mar 23 18:33:34 Lowkey root: -------Main download URL not reachable, using Fallback URL-------
Mar 23 18:33:38 Lowkey nmbd[2649]: [2021/03/23 18:33:38.769435,  0] ../../source3/nmbd/nmbd_become_lmb.c:397(become_local_master_stage2)
Mar 23 18:33:38 Lowkey nmbd[2649]:   *****
Mar 23 18:33:38 Lowkey nmbd[2649]:   
Mar 23 18:33:38 Lowkey nmbd[2649]:   Samba name server LOWKEY is now a local master browser for workgroup WORKGROUP on subnet 192.xxx.x.x
Mar 23 18:33:38 Lowkey nmbd[2649]:   
Mar 23 18:33:38 Lowkey nmbd[2649]:   *****
Mar 23 18:33:46 Lowkey root:
Mar 23 18:33:46 Lowkey root: -----ERROR - ERROR - ERROR - ERROR - ERROR - ERROR - ERROR - ERROR - ERROR------
Mar 23 18:33:46 Lowkey root: ---Can't get latest Nvidia driver version and found no installed local driver---

 

Something seems really wrong. Normally the driver is downloaded when the plugin is installed from the CA App. What's really strange is that it can't find an installed driver, or did you upgrade from 6.9.0 to 6.9.1 in these Diagnostics?

 

Do you run a virtualized firewall on your Unraid box?
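To pick the relevant lines out of a long syslog, something like this Python sketch can be used. The two marker strings are copied from the log excerpt quoted in this post; whether other plugin versions word them the same way is an assumption, and `find_driver_errors` is a made-up helper name.

```python
# Marker strings taken from the syslog excerpt quoted in this post.
ERROR_MARKERS = (
    "Main download URL not reachable",
    "Can't get latest Nvidia driver version",
)


def find_driver_errors(syslog_lines):
    """Return the syslog lines that contain one of the markers above."""
    return [
        line
        for line in syslog_lines
        if any(marker in line for marker in ERROR_MARKERS)
    ]


sample = [
    "Mar 23 18:33:34 Lowkey root: -------Main download URL not reachable, using Fallback URL-------",
    "Mar 23 18:33:38 Lowkey nmbd[2649]:   *****",
    "Mar 23 18:33:46 Lowkey root: ---Can't get latest Nvidia driver version and found no installed local driver---",
]

for hit in find_driver_errors(sample):
    print(hit)
```

The same function works on a real file by passing it the lines of the syslog from the Diagnostics zip.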

Link to comment

Hi. I'd like to ask if this is expected/normal. I have a P400 installed, primarily for Plex transcoding. It works great, but I noticed that on server reboot or when upgrading the Nvidia plugin (and in some other instances), GPU and memory frequency are at max (P1 power state) and not idling (P8). Playing and then stopping a Plex video that is being transcoded makes the GPU idle afterwards.

Link to comment
1 hour ago, heille1221 said:

Playing and then stopping a Plex video that is being transcoded makes the GPU idle afterwards.

Yes, this is standard behaviour, or at least how the manufacturer of the card implemented it in the card's BIOS. Some cards need some kind of load after the server boots before they detect that they can idle; others do this by default.

 

You can also switch on persistence mode, but this would be the last thing I would do since Nvidia has already announced that they will drop this feature some time in the future.
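To see which power state the card is in before and after putting load on it, you can query nvidia-smi. Below is a rough Python sketch; the helper names are made up, and the sample line fed to the parser is invented for illustration, not real output from a P400.

```python
import csv
import io
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu`.
QUERY = "index,name,pstate,clocks.current.graphics,clocks.current.memory"
FIELDS = ("index", "name", "pstate", "graphics_clock", "memory_clock")


def parse_gpu_states(csv_text):
    """Turn `--format=csv,noheader` output into one dict per GPU."""
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [dict(zip(FIELDS, (cell.strip() for cell in row))) for row in rows if row]


def read_gpu_states():
    """Run nvidia-smi and parse its output (needs the driver to be loaded)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_states(result.stdout)


def not_idling(states):
    """Return the GPUs that are in any power state other than P8."""
    return [state for state in states if state["pstate"] != "P8"]


# Invented sample line, only to show the shape of the data.
sample_states = parse_gpu_states("0, Quadro P400, P8, 139 MHz, 405 MHz\n")
print(not_idling(sample_states))
```

On the server itself you would call `not_idling(read_gpu_states())` once right after boot and once after playing and stopping a transcode.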

  • Thanks 1
Link to comment
18 minutes ago, ich777 said:

Yes, this is standard behaviour, or at least how the manufacturer of the card implemented it in the card's BIOS. Some cards need some kind of load after the server boots before they detect that they can idle; others do this by default.

 

You can also switch on persistence mode, but this would be the last thing I would do since Nvidia has already announced that they will drop this feature some time in the future.

I see. Thanks.

  • Like 1
Link to comment

Has anyone here experienced this error happening? I am on 6.9.1 and using the latest nvidia driver (v460.67). I am running a Quadro P2200 for Plex transcoding and a GT710 for Unraid video out.

 

I've had this error appear 4 times since Sunday, and it completely locks up the server, requiring a hard reset and parity check.

 

The line in particular that makes me think it's related to the driver is this one, right after the end trace (maybe I'm wrong... idk):

RIP: 0010:_nv036095rm+0x4/0x70 [nvidia]
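As a rough illustration of how that line can be read, here is a small Python sketch that pulls out the symbol and the module name in square brackets. The regular expression is my own assumption about the layout of such lines, based only on this one example.

```python
import re

# Assumed layout: "RIP: <segment>:<symbol>+<offset>/<size> [<module>]"
RIP_PATTERN = re.compile(
    r"RIP:\s+[0-9a-fA-F]+:(?P<symbol>[^+\s]+)"
    r"\+(?P<offset>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_rip_line(line):
    """Return symbol, offset, size and module from a RIP line, or None."""
    match = RIP_PATTERN.search(line)
    return match.groupdict() if match else None


info = parse_rip_line("RIP: 0010:_nv036095rm+0x4/0x70 [nvidia]")
print(info["symbol"], info["module"])
```

The module in brackets only says where the instruction pointer was when the fault hit; it does not by itself prove the driver is the root cause.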

 

I also want to say that it seems like it started after I updated to this latest driver (v460.67).

 

The first time it happened, I had to uninstall and re-install the driver because I wasn't able to get DNS to work without PiHole, but I fixed that, and the driver is back installed and working. I can transcode in Plex still. The error just randomly happens.

 

I was going to try reverting the driver back to an earlier version to see if it helps, but was just curious if anyone else had seen this or could tell me how to troubleshoot it.


Error:

nvidia-error.thumb.png.35118f7f3d28be2a3e1cc936cf31275f.png


Nvidia Driver Package:

nvidia-plugin.thumb.png.5be923e8608839337f3c7b9fa95c8f6e.png

 

GPU Statistics w/ transcoding working:

gpu_stats_transcoding.png.3ef2451bb2ef965e5d3ba1122cd6bd67.png

Link to comment
6 minutes ago, carnivorebrah said:

GT710 for Unraid video out

Can you try to pull the GT710 from the server if you only use it for video output?

Because you can actually use the P2200 for video out and Plex at the same time, and it will also save power.

 

8 minutes ago, carnivorebrah said:

I was going to try reverting the driver back to an earlier version to see if it helps

I can't imagine that's the problem but you can at least try it.

 

Please report back.

  • Like 1
Link to comment
49 minutes ago, ich777 said:

Can you try to pull the GT710 from the server if you only use it for video output?

Because you can actually use the P2200 for video out and Plex at the same time, and it will also save power.

 

I can't imagine that's the problem but you can at least try it.

 

Please report back.

 

Oh, wow. That just goes to show how much of a newb I am. lol

 

I had no idea I could run the video for Unraid on the same GPU as my transcoding GPU.

 

GT710 has been removed. Unraid video out and Plex transcoding confirmed working on just the P2200.

 

Will report back in a few days if it hasn't happened by then.

  • Like 1
Link to comment
1 hour ago, carnivorebrah said:

Will report back in a few days if it hasn't happened by then.

I got a few reports about issues with P2000 cards; after some troubleshooting it turned out that they were defective. It seems that for some cards the lifespan is over... But that's only a guess on my part and doesn't mean that this is the case for your card.

 

Can you post your Diagnostics (Tools -> Diagnostics -> Download -> drop the downloaded file here in the text box).

 

Please also make sure that you are on the latest BIOS version.

  • Like 1
Link to comment
8 hours ago, ich777 said:

I got a few reports about issues with P2000 cards; after some troubleshooting it turned out that they were defective. It seems that for some cards the lifespan is over... But that's only a guess on my part and doesn't mean that this is the case for your card.

 

Can you post your Diagnostics (Tools -> Diagnostics -> Download -> drop the downloaded file here in the text box).

 

Please also make sure that you are on the latest BIOS version.

diagnostics-20210327-2247.zip

 

Diagnostics attached.

 

I really hope it's not a bad card. I've only had it for 2 months. It worked fine up until last weekend, but I guess it could happen.

 

I'm 99.99% sure I updated both the BIOS and BMC/IPMI firmware when I built the server. I will have to check the BIOS on the next reboot though, if it locks up again.

 

I have a feeling (mainly hope) pulling the GT710, and running the Unraid video out through the P2200 fixed it though.

 

I'm running a Supermicro X9DRL-iF, which only has an open-ended 8x PCI-E 3.0 slot (top) that I have the P2200 in. The other two are closed 8x PCI-E 3.0 slots, and I was using the bottom for the GT710 just to display Unraid video out. Now video out for Unraid is going through the top slot.

 

If this does solve it, hopefully I can put the GT710 back in, stub it at boot and use it to pass through to something else.

  • Like 1
Link to comment

Following the installation I ran into the dreaded "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver."

 

I do not believe there is an issue with the hardware, and I've made sure that neither of the two GPUs is bound to VFIO (they do appear in the PCI Devices and IOMMU Groups under System Devices in Unraid).

 

Another odd thing I found while following along SpaceInvader One's setup video (utilizes this plugin) was that after I added the "GPU Statistics" plugin none of my GPUs appeared on the dropdown list:

 

1844681293_Screenshot2021-03-30132118.png.fce90655280d60e74305f906e93ecdcb.png

 

While debugging this issue I made sure that the settings in my BIOS were OK, and during a reboot I decided to check the system logs and saw this:

1417636920_Screenshot2021-03-30130659.png.0d33739c1884d70b75e69e6dcd2453d7.png

 

Which reads 

Quote

 

Unregistered the Nvlink Core, major device number 246

Nvlink Core is being initialized, major device number 246

The NVIDIA Quadro 4000 GPU installed in this system is supported through the NVIDIA 390.xx Legacy drivers. Please visit http://www.nvidia.com/object/unix.html for more information. The 465.19.01 NVIDIA driver will ignore this GPU. Continuing probe...

 

 

Could this be a part of my issue?

 

Thanks for all your help 😃

Edited by greaterbeing
Link to comment
2 hours ago, ICDeadPpl said:

Does the new 465.89 driver's "GeForce GPU Passthrough for Windows Virtual Machine" functionality make any difference for this plugin?

For Linux there is no such driver, I think you are talking about the Windows driver...

For Linux the latest driver is version 465.19.01

 

But no, this makes no difference: they now officially allow the use of their GeForce cards in Linux, whereas before this was simply a "workaround".

 

 

57 minutes ago, greaterbeing said:

Could this be a part of my issue?

To use the card in Docker containers it has to be at least Kepler based, and your card is Fermi based. In other words, you can't use this card in Docker containers because it's simply too old and not supported by the driver.

 

Visit the link from your post (it is also the link to the supported cards in the first post) and you will see a list of cards when you click on the corresponding driver. I know that driver version 465.19.01 isn't listed there currently, but you can click on the latest available version, 460.67; the list is the same and applies to newer drivers as well.
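The kernel message quoted earlier in the thread already names the card, the legacy driver branch that supports it and the driver version that ignores it. A minimal Python sketch to pull those three values out of such a message could look like this; the pattern is modelled only on that one quoted message, so treat the wording it expects as an assumption.

```python
import re

# Pattern modelled on the single message quoted in this thread.
LEGACY_PATTERN = re.compile(
    r"The NVIDIA (?P<gpu>.+?) GPU installed in this system is supported "
    r"through the NVIDIA (?P<legacy>\S+) Legacy drivers\."
    r".*?The (?P<driver>\S+) NVIDIA driver will ignore this GPU",
    re.DOTALL,
)


def parse_legacy_message(message):
    """Return gpu, legacy branch and ignoring driver version, or None."""
    match = LEGACY_PATTERN.search(message)
    return match.groupdict() if match else None


message = (
    "The NVIDIA Quadro 4000 GPU installed in this system is supported through "
    "the NVIDIA 390.xx Legacy drivers. Please visit "
    "http://www.nvidia.com/object/unix.html for more information. "
    "The 465.19.01 NVIDIA driver will ignore this GPU. Continuing probe..."
)

print(parse_legacy_message(message))
```

If this finds a match in your syslog, the installed driver is not going to pick up that card no matter how the plugin is configured.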

Link to comment
12 minutes ago, ich777 said:

For Linux there is no such driver, I think you are talking about the Windows driver...

For Linux the latest driver is version 465.19.01

 

But no, this makes no difference: they now officially allow the use of their GeForce cards in Linux, whereas before this was simply a "workaround".

 

 

To use the card in Docker containers it has to be at least Kepler based, and your card is Fermi based. In other words, you can't use this card in Docker containers because it's simply too old and not supported by the driver.

 

Visit the link from your post (it is also the link to the supported cards in the first post) and you will see a list of cards when you click on the corresponding driver. I know that driver version 465.19.01 isn't listed there currently, but you can click on the latest available version, 460.67; the list is the same and applies to newer drivers as well.

They said it is for Linux hosts running Windows VMs:
"With virtualization enabled, GeForce customers on a Linux host PC can now enable GeForce GPU passthrough on a virtual Windows guest OS.  There are a few GeForce use cases where this functionality is beneficial such as:

GeForce customers wanting to run a Linux host and be able to launch a Windows virtual machine (VM) to play games 

Game developers wanting to test code in both Windows and Linux on one machine"

Link to comment
