[Plugin] Nvidia-Driver


ich777


2 hours ago, ICDeadPpl said:

They said it is for Linux hosts running Windows VMs:

As @Squid said, they also said this at the bottom of the press release SOURCE:

Quote

Do you need to have more than one GPU installed or can you leverage the same GPU being used by the host OS for virtualization?
One GPU is required for the Linux host OS and one GPU is required for the Windows virtual machine.

 

I have to agree with @Squid: hopefully there will be no more Code 43 in the guest OS, but that has nothing to do with the Nvidia plugin, since that driver is installed in the guest OS (Windows).

Link to comment
3 hours ago, ich777 said:

To use the card in Docker containers it has to be at least Kepler, and your card is Fermi-based, so you can't use this card in Docker containers because it's simply too old and not supported by the driver.

 

Visit the link that you posted (this is also the link in the first post to the supported cards) and you will see a list of cards when you click on the corresponding driver. I know the current driver version 465.19.01 isn't listed there, but you can click on the latest version that is available, 460.67. This is actually the same list and applies to newer drivers as well.

 

Am I out of luck for both of the cards then? Is it possible to use the NVIDIA 340.xx Legacy drivers as mentioned in the logs?

Link to comment
8 hours ago, greaterbeing said:

Am I out of luck for both of the cards then?

I think so. If the cards are not listed, you can't use them in Docker containers.

 

8 hours ago, greaterbeing said:

Is it possible to use the NVIDIA 340.xx Legacy drivers as mentioned in the logs?

For what? You can't use your cards in Docker containers with the old driver. As I said, the container tools that are required only support Kepler and newer.

Link to comment
7 hours ago, greaterbeing said:

Do you know if it's possible to pass them through to a VM if I can't use Docker?

Passthrough should work.

 

3 hours ago, z0ki said:

Or will the nvidia plugin be smart enough to detect a different card?

It should be smart enough if it's a supported card. ;)

But you have to change the UUIDs of your card in the Docker containers so that they work again.
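For anyone who needs to look up the new UUID: `nvidia-smi -L` lists each GPU together with its UUID. Below is a minimal Python sketch for pulling the UUIDs out of that output. The sample text, the card name and the UUID are placeholders made up for illustration, and `parse_gpu_uuids` is a hypothetical helper name, not part of the plugin.

```python
import re

# Sample text in the format printed by `nvidia-smi -L`.
# The card name and UUID are placeholders, not values from a real system.
SAMPLE = """\
GPU 0: GeForce GTX 1050 Ti (UUID: GPU-00000000-1111-2222-3333-444444444444)
"""

# One GPU per line: "GPU <index>: <name> (UUID: GPU-<hex and dashes>)"
LINE_RE = re.compile(r"^GPU (\d+): (.+) \(UUID: (GPU-[0-9a-fA-F-]+)\)\s*$")


def parse_gpu_uuids(text):
    """Return a list of (index, name, uuid) tuples, one per GPU line."""
    gpus = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus


for index, name, uuid in parse_gpu_uuids(SAMPLE):
    print(f"GPU {index}: {name} -> {uuid}")
```

The UUID found this way is the value that would replace the old one in the container's GPU setting (typically the `NVIDIA_VISIBLE_DEVICES` variable, assuming the container template uses it).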

Link to comment

I don't use Nvidia GPUs myself but in the General Support section I'm seeing a lot of other users' diagnostics with this spamming the syslog over and over again, every second or so, eventually filling up the log space, but otherwise seemingly harmless:

 

Apr  1 17:31:32 Tower kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Apr  1 17:31:32 Tower kernel: caller _nv000708rm+0x1af/0x200 [nvidia] mapping multiple BARs
Apr  1 17:31:33 Tower kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Apr  1 17:31:33 Tower kernel: caller _nv000708rm+0x1af/0x200 [nvidia] mapping multiple BARs

 

At first I thought it only affects users with Quadro cards but this particular one has a GT 710. The thing they do have in common though is that the users have AMD motherboards, at least all the ones I've seen so far do.

 

EDIT: It's more universal than that. See the reply, two messages below this.

 

 

A quick search revealed this report from 2017, which suggests a buggy BIOS (though it isn't clear what hardware the problem affected then) but promised the messages would be suppressed in an update. It looks as though the problem has resurfaced. Perhaps a future driver update will fix it. Maybe someone with the appropriate combination of hardware can help monitor the situation.

 

 

Edited by John_M
It affects Intel motherboards too
Link to comment
On 3/27/2021 at 11:32 PM, carnivorebrah said:

Diagnostics attached.

 

I really hope it's not a bad card. I've only had it for 2 months. It worked fine up until last weekend, but I guess it could happen.

 

I'm 99.99% sure I updated both the BIOS and BMC/IPMI firmware when I built the server. I will have to check the BIOS on the next reboot though, if it locks up again.

 

I have a feeling (mainly hope) pulling the GT710, and running the Unraid video out through the P2200 fixed it though.

 

I'm running a Supermicro X9DRL-iF, which only has an open-ended 8x PCI-E 3.0 slot (top) that I have the P2200 in. The other two are closed 8x PCI-E 3.0 slots, and I was using the bottom for the GT710 just to display Unraid video out. Now video out for Unraid is going through the top slot.

 

If this does solve it, hopefully I can put the GT710 back in, stub it at boot and use it to pass through to something else.

 

UPDATE: Still going without issues. Thanks for the help!

Link to comment
8 hours ago, John_M said:

spamming the syslog over and over again

This also happens on my dev machine with a GTX 1050 Ti and a Xeon processor on a C602 chipset motherboard. This is mainly because CONFIG_WATCHDOG is enabled in the kernel to support more NCT hardware monitor chips.

This is a known "bug" in the Nvidia driver and this happens every time nvidia-smi is called.

If a user has installed the GPU Statistics plugin and is on the Dashboard, this happens every second or so, because the plugin calls nvidia-smi every second and each call produces this message in the syslog.
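To get a feel for how often the message shows up in a given log, something like the following Python sketch could be used. It only counts the "resource sanity check" lines per minute in a piece of syslog text. The sample is the four lines quoted earlier in this thread, and `count_per_minute` is a hypothetical helper name.

```python
from collections import Counter

# The four syslog lines quoted earlier in this thread.
SYSLOG = """\
Apr  1 17:31:32 Tower kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Apr  1 17:31:32 Tower kernel: caller _nv000708rm+0x1af/0x200 [nvidia] mapping multiple BARs
Apr  1 17:31:33 Tower kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Apr  1 17:31:33 Tower kernel: caller _nv000708rm+0x1af/0x200 [nvidia] mapping multiple BARs
"""

MARKER = "resource sanity check: requesting"


def count_per_minute(text):
    """Count marker lines, grouped by the 'Mon DD HH:MM' part of the timestamp."""
    counts = Counter()
    for line in text.splitlines():
        if MARKER in line:
            # Classic syslog timestamps are 15 characters ('Apr  1 17:31:32');
            # the first 12 characters drop the seconds.
            counts[line[:12]] += 1
    return counts


for minute, count in sorted(count_per_minute(SYSLOG).items()):
    print(f"{minute}: {count} message(s)")
```

With the Dashboard open and the GPU Statistics plugin installed, the count per minute would be expected to be roughly one per nvidia-smi call.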

 

 

Link to comment
4 minutes ago, ich777 said:

Xeon processor

 

Thanks for confirming that it's a more universal problem than I thought.

 

5 minutes ago, ich777 said:

If a user has installed the GPU Statistics plugin and is on the Dashboard, this happens every second or so, because the plugin calls nvidia-smi every second and each call produces this message in the syslog.

 

So, if the user does not have the GPU Statistics plugin installed the messages don't appear so often in the syslog? That would explain why only some people are affected.

 

I appreciate that the message is harmless in itself, but it makes reading the syslog difficult and it also causes the log to fill up rapidly, which then triggers other problems. OK, thanks again. I can at least ask people to uninstall it temporarily while troubleshooting and they can decide what to do afterwards. I hope Nvidia suppresses the messages in a future release.
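For reading a copy of someone's syslog in the meantime, the two repeated lines can simply be skipped. This is only a rough Python sketch: it works on a copy of the log, does nothing about the log space on the server, and the third sample line is made up so there is something left after filtering. `without_nvidia_noise` is a hypothetical helper name.

```python
# Substrings that identify the two repeated lines quoted above.
NOISE_MARKERS = (
    "resource sanity check: requesting",
    "[nvidia] mapping multiple BARs",
)

SAMPLE_LINES = [
    "Apr  1 17:31:32 Tower kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]",
    "Apr  1 17:31:32 Tower kernel: caller _nv000708rm+0x1af/0x200 [nvidia] mapping multiple BARs",
    # Made-up line, only here to show that unrelated entries are kept.
    "Apr  1 17:31:34 Tower example: unrelated line (made up for this sketch)",
]


def without_nvidia_noise(lines):
    """Yield only the lines that contain none of the noise markers."""
    for line in lines:
        if not any(marker in line for marker in NOISE_MARKERS):
            yield line


for line in without_nvidia_noise(SAMPLE_LINES):
    print(line)
```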

 

Link to comment
24 minutes ago, John_M said:

I appreciate that the message is harmless in itself, but it makes reading the syslog difficult and it also causes the log to fill up rapidly, which then triggers other problems.

I could also hide the message with a few modifications in my plugin that configure the syslog daemon to exclude that warning, but I really don't want to do that because it could make troubleshooting the plugin/driver even harder for me.

 

25 minutes ago, John_M said:

So, if the user does not have the GPU Statistics plugin installed the messages don't appear so often in the syslog?

Yes and no. Every time nvidia-smi is called it produces this message (users can try it by going into the Nvidia-Driver plugin page itself, because it also calls nvidia-smi to get the UUID and other data that it needs to work, but there nvidia-smi is only called once).

 

27 minutes ago, John_M said:

Thanks for confirming that it's a more universal problem than I thought.

Exactly, this should happen on all systems when nvidia-smi is called.

 

28 minutes ago, John_M said:

I can at least ask people to uninstall it temporarily while troubleshooting and they can decide what to do afterwards.

As said, the spamming only happens if the user is on the Dashboard page with the GPU Statistics Plugin installed.

 

29 minutes ago, John_M said:

I hope Nvidia suppresses the messages in a future release.

I don't think Nvidia will hide this message...

Link to comment
Just now, John_M said:

OK, thanks. It is what it is, then. At least I can point people here when they have problems. Ah, the joys of closed source drivers!

Nvidia is also really cool since you can game in a container. For example, look at my Nvidia-DebianBuster container: you can stream your games from Steam (in a Docker container) to any device you want with basically nothing to set up, and it's all preconfigured except for your Steam account. ;)

 

2 minutes ago, John_M said:

Still loving my $50 AMD APU, BTW :) 

I know, these APUs are also really cool. I also switched to a CPU with integrated graphics (i5-10600) and I'm really happy with it. :)

Link to comment

Hello, I'm trying to add an nvidia 1030 card to my server and having a few problems. I have never had a GPU in unraid before and have no idea how to tell if it is even being seen by the motherboard or OS. I see nothing in System Devices, and only have this mention of graphics which would be the i3-8100 onboard.
 

Quote

IOMMU group 2:[8086:3e91] 00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]

 

Sorry, I switched to Firefox yesterday and the code format button gives me a failure error.

Is there some very basic requirement I need to add a GPU to unraid that I have possibly overlooked?
Are there any terms I can use to search for how to troubleshoot this? I want to use the card with tdarr to test some encoding. Up until now it has been used in my desktop PC and works as far as I know. The fan spins, but no BIOS or GUI is visible on the HDMI output, and the boot message gives a `modprobe: ERROR: could not insert 'nvidia' no such device`

Link to comment
15 hours ago, John_M said:

Quite a number of people are installing the driver and then wondering why they can't pass the GPU through to a VM.

In general it should be possible to pass through the graphics card when the Nvidia plugin is installed.

 

3 hours ago, maxkowalski said:

I see nothing in System Devices, and only have this mention of graphics which would be the i3-8100 onboard.

As @Squid said, it seems that there is something else wrong here.

Try putting the card in another PCIe slot and check whether it's actually recognized in the BIOS, if your BIOS can list the installed cards.
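If it helps with checking the System Devices page, here is a small Python sketch that parses lines in the format quoted above and looks for Nvidia devices by PCI vendor ID (`10de` is Nvidia, `8086` is Intel). The regular expression is an assumption based on that single quoted line, and `parse_device` and `nvidia_devices` are hypothetical helper names.

```python
import re

# The System Devices line quoted above.
LINE = (
    "IOMMU group 2:[8086:3e91] 00:02.0 VGA compatible controller: "
    "Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]"
)

# Assumed layout: "IOMMU group N:[vendor:device] address description"
DEVICE_RE = re.compile(
    r"^IOMMU group (\d+):\s*\[([0-9a-f]{4}):([0-9a-f]{4})\]\s+(\S+)\s+(.*)$"
)

NVIDIA_VENDOR_ID = "10de"  # PCI vendor ID assigned to Nvidia


def parse_device(line):
    """Return a dict describing the device, or None if the line does not match."""
    match = DEVICE_RE.match(line.strip())
    if match is None:
        return None
    group, vendor, device, address, description = match.groups()
    return {
        "group": int(group),
        "vendor": vendor,
        "device": device,
        "address": address,
        "description": description,
    }


def nvidia_devices(lines):
    """Return the parsed entries whose vendor ID is Nvidia's."""
    parsed = (parse_device(line) for line in lines)
    return [dev for dev in parsed if dev and dev["vendor"] == NVIDIA_VENDOR_ID]


print(parse_device(LINE))
print("Nvidia devices found:", nvidia_devices([LINE]))
```

An empty result for the full System Devices listing would match what is described above: the card is not visible on the PCIe bus at all, so the problem sits below the driver and the plugin.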

Link to comment
On 4/4/2021 at 1:11 PM, ich777 said:

In general it should be possible to pass through the graphics card when the Nvidia plugin is installed.

 

Yes, sorry, I'm not saying that the presence of the plugin is incompatible with pass-through. The example I gave had two Nvidia cards, with the intention of using one for transcoding and one for a VM. However, both were being bound to the driver, so the solution was to stub the pass-through one, which people seem to be forgetting or aren't aware of.

 

One or two users are upgrading from 6.8 to 6.9, seeing the plugin, deciding they need it, installing it and then finding their VMs don't work. If they only have one Nvidia GPU and intend to use it for VMs then they don't need the driver and shouldn't install the plugin.

 

Link to comment
18 minutes ago, John_M said:

installing it and then finding their VMs don't work

This should also work, because I've had reports from users who are using it that way, but it involves a little more work since you actually have to pass through the card in the XML...

 

I'll have to try that on my server when I have time (I think that will be sometime next year... :D :P ).

 

23 minutes ago, John_M said:

If they only have one Nvidia GPU and intend to use it for VMs then they don't need the driver and shouldn't install the plugin.

You are completely right but where should I put that message? In the plugin itself? That's where I don't want to put it for various reasons...

Link to comment
1 hour ago, John_M said:

 

Maybe in the description (that people don't bother to read!) that appears in Community Apps, where most people go to install it.

I will add that to the description tomorrow, any idea what the text should be?

Something like: "If you plan to pass through your card to a VM don't install this plugin"?

 

I think you know English is not my native language, but I try my best. :D

Link to comment

Evening @ich777, a quick heads-up: I just upgraded to 6.9.2 (stable) and it looks like it's causing some UI issues and problems with the preferred driver selection for your plugin. Namely, the available versions are not shown correctly and the preferred version I had selected was not honoured on reboot, which led to a long reboot and a forced update to the latest version now available (465.19.01).

 

screenshot below:-

image.thumb.png.d923043292c50cdd730caa7359a09940.png

 

note the preferred version top right.

 

cheers!.

Link to comment
