[Plugin] Nvidia-Driver


ich777


Just now, jaddel said:

I just updated my post and added the diagnostic.

I see nothing suspicious in your Diagnostics. Are you sure the card is working properly and the PCIe slot provides enough power to the card?

Could you install the card in a desktop system and see if it works properly? (You have to install the driver and also put some load on it.)

 

Also try to reseat the card or try another PCIe slot.

What you can also try is to reset the BIOS but only if you know what you are doing.

 

Keep in mind not to bind the card to VFIO, otherwise the plugin can't see the card.

Link to comment
2 minutes ago, ich777 said:

I see nothing suspicious in your Diagnostics. Are you sure the card is working properly and the PCIe slot provides enough power to the card?

Could you install the card in a desktop system and see if it works properly? (You have to install the driver and also put some load on it.)

 

Also try to reseat the card or try another PCIe slot.

What you can also try is to reset the BIOS but only if you know what you are doing.

 

Keep in mind not to bind the card to VFIO, otherwise the plugin can't see the card.

- The card is running fine in my main system.
- The board has only one PCIe slot, so I can't try another slot.
- I'm aware that I shouldn't bind the card to VFIO.
- The BIOS has already been reset several times.

I'll bind it via VFIO, try to set up a Win10 VM, and see how it goes.

Link to comment
44 minutes ago, jaddel said:

- The card is running fine in my main system.
- The board has only one PCIe slot, so I can't try another slot.
- I'm aware that I shouldn't bind the card to VFIO.
- The BIOS has already been reset several times.

The last thing I can recommend is to put the card in your main system, if possible, create a new trial USB boot device for Unraid, and boot into Unraid on your main system. Don't start the Array; just install the CA App and the Nvidia Driver plugin and see if the card is detected there.

 

I can only think of a weird hardware combination problem. If you go back one or two pages in this thread, you will see someone with a 3rd or 4th generation Intel CPU who had the exact same problem, but for some reason it started working at some point.

Link to comment

everything works fine, except I keep getting this error in the logs:

 

Jun 20 00:19:23 Server kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
Jun 20 00:19:23 Server kernel: NVRM: This can occur when a driver such as:
Jun 20 00:19:23 Server kernel: NVRM: nouveau, rivafb, nvidiafb or rivatv
Jun 20 00:19:23 Server kernel: NVRM: was loaded and obtained ownership of the NVIDIA device(s).
Jun 20 00:19:23 Server kernel: NVRM: Try unloading the conflicting kernel module (and/or
Jun 20 00:19:23 Server kernel: NVRM: reconfigure your kernel without the conflicting
Jun 20 00:19:23 Server kernel: NVRM: driver(s)), then try loading the NVIDIA kernel module
Jun 20 00:19:23 Server kernel: NVRM: again.
Jun 20 00:19:23 Server kernel: NVRM: No NVIDIA devices probed.
Jun 20 00:19:23 Server kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 244

 

I can't seem to get any stats on the GPU in the unraid gui, but the passthrough works just fine in Windows VM ... it's not really a show stopper, would be nice to get stats in unraid ... and not fill the log with this error ... any help is appreciated.

 


Link to comment
44 minutes ago, Ahmad said:

I can't seem to get any stats on the GPU in the unraid gui, but the passthrough works just fine in Windows VM ... it's not really a show stopper, would be nice to get stats in unraid ... and not fill the log with this error ... any help is appreciated.

Please post your Diagnostics (Tools -> Diagnostics -> Download -> drop the downloaded zip file here in the text box).

 

I really can't say anything without knowing which card you own and so on...

Link to comment
14 hours ago, ich777 said:

Please post your Diagnostics (Tools -> Diagnostics -> Download -> drop the downloaded zip file here in the text box).

 

I really can't say anything without knowing which card you own and so on...

attached the diagnostics,

 

the card is a Quadro RTX 4000, I read elsewhere that I shouldn't expect the driver to be used when passing through the card entirely to the guest OS and that the plugin is mainly useful for using the GPU with Docker containers ... is that an accurate understanding?

 

removing the plugin didn't seem to have any negative effects ... and I don't see the errors in the logs after removal.

 

does using the plugin add any benefit to a vm passthrough gpu? can I get the stats in unraid gui if it's passthrough?

 

appreciate the help as I'm trying to understand all this!

server-diagnostics-20210620-1532.zip

Link to comment
1 minute ago, Ahmad said:

I read elsewhere that I shouldn't expect the driver to be used when passing through the card entirely to the guest OS and that the plugin is mainly useful for using the GPU with Docker containers ... is that an accurate understanding?

The plugin is mainly for that use case: using the card in your Docker containers, for example for transcoding, Folding@home, BOINC, mining, ... (there are many use cases).

 

4 minutes ago, Ahmad said:

does using the plugin add any benefit to a vm passthrough gpu? can I get the stats in unraid gui if it's passthrough?

No, this plugin does nothing if, as in your case, you use the card in a VM and have bound it to VFIO. And no, you can't get its stats while it's passed through to the VM, because the VM has exclusive access to it.

Link to comment
1 minute ago, ich777 said:

No, this plugin does nothing if, as in your case, you use the card in a VM and have bound it to VFIO. And no, you can't get its stats while it's passed through to the VM, because the VM has exclusive access to it.

thank you for clarifying! makes sense!

 

any clues from the attached diagnostics why it doesn't detect the card? (even with VM off / no passthrough)

Link to comment
Just now, Ahmad said:

any clues from the attached diagnostics why it doesn't detect the card? (even with VM off / no passthrough)

Yes: you've bound the card to VFIO. A card bound to VFIO is reserved exclusively for VMs, so the host, or more precisely the driver that runs on the host, can't see it.
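
A quick way to confirm this on the host is to look at which kernel driver currently owns the card. The sketch below reads the `driver` symlink that Linux exposes in sysfs for every PCI device. The function name `pci_driver_of` and the address `0000:42:00.0` are only illustrative assumptions; substitute the address that `lspci -D` reports for your card.

```shell
#!/bin/bash
# Print the kernel driver bound to a PCI device, or "none" if no driver is bound.
# $1: PCI address with domain, e.g. 0000:42:00.0 (example value only)
# $2: sysfs root, defaults to /sys (overridable so the function can be tried on a fake tree)
pci_driver_of() {
    local addr="$1" root="${2:-/sys}"
    local link="$root/bus/pci/devices/$addr/driver"
    if [ -L "$link" ]; then
        # The symlink points at .../bus/pci/drivers/<driver name>
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Example call (commented out; needs a real device address):
# pci_driver_of 0000:42:00.0
```

If this prints `vfio-pci`, the card is reserved for VMs and the Nvidia driver on the host cannot probe it. If it prints `nvidia`, the host driver owns the card.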

Link to comment
3 minutes ago, ich777 said:

Yes: you've bound the card to VFIO. A card bound to VFIO is reserved exclusively for VMs, so the host, or more precisely the driver that runs on the host, can't see it.

ah ... well that also makes sense ... (I have to learn more about VFIO I guess)

 

and thank you for the prompt response!

 

side note: I would encourage you to set up GitHub Sponsors so folks can buy you "a cup of coffee" for your efforts! cheers!

Link to comment

Could anyone have a quick look into this for me, please? My Emby GPU has stopped working and I can't understand why: kernel: NVRM: GPU 0000:42:00.0: RmInitAdapter failed!

I have attached logs. I have removed the plugin and restarted the server, re-added the plugin and restarted, disabled the Docker service and restarted, etc., but I can't understand it. Nothing's been touched or changed and the server's never moved :(

 

Cheers

 

Lee

r720xd-diagnostics-20210622-1052.zip

Link to comment
5 hours ago, Dazog said:

The driver is already auto-compiled, but it isn't listed on their download site, so you can't actually install it: I grab the driver versions from there, otherwise this would be a complete mess. Or you switch to latest; then it should be listed, if I'm not mistaken... :D

Driver

MD5

 

41 minutes ago, leeknight1981 said:

I have removed the plugin and restarted the server, re-added the plugin and restarted, disabled the Docker service and restarted, etc., but I can't understand it. Nothing's been touched or changed and the server's never moved :(

I would first try removing your "script"; that may be the problem.

Link to comment

I know this behaviour (or similar) has been raised recently but I cannot find the definitive solution so I am reaching out for some help.

 

I cannot get to the graphical logon prompt on a screen connected directly to the server in unRAID OS GUI Mode; all I get is a black screen following the text boot sequence (which is displayed on screen as you would expect). Once at the black screen I can ctrl-F1, which will take me to the command prompt where I can log in as if I was in normal unRAID OS mode. Everything else operates (including access to the WEB-GUI from a browser on a separate machine) just fine.

 

I run a Supermicro - X10SL7-F motherboard which has Aspeed AST2400 onboard video. I have the Nvidia Plugin installed so I can leverage the power of my installed Nvidia GeForce GTX 1050 Ti in Docker Containers. 

 

I have done some troubleshooting and run the server in unRAID OS GUI Safe Mode where the graphical login prompt comes up just fine. This helped me identify that it was likely a Plugin causing the behaviour. Through a process of elimination I got to this plugin. When installed I can't access the graphical login prompt, when removed - I can.

 

I am running unRAID version 6.9.2

I am running Nvidia Plugin version 2021.05.19

Nvidia driver version installed is 470.42.01 (latest as of time of writing)

I do not boot UEFI.

My BIOS is set to prioritise the onboard Aspeed video and my screen is plugged into the VGA port on the motherboard.

 

Diagnostics are attached.

 

unraid-diagnostics-20210623-1323.zip

Link to comment
2 hours ago, danioj said:

I know this behaviour (or similar) has been raised recently but I cannot find the definitive solution so I am reaching out for some help.

The easiest solution would be to disable the GPU from the motherboard and use the output from the Nvidia GPU.

 

The second solution, if you want to use the Aspeed GPU from the motherboard, is to create the file '/etc/modprobe.d/ast.conf' and run this from the terminal:

sed -i '/disable_xconfig=/c\disable_xconfig=true' "/boot/config/plugins/nvidia-driver/settings.cfg"

After that, reboot.
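
For anyone unsure what that command does: it replaces the whole line containing `disable_xconfig=` with `disable_xconfig=true`. It does not add the line if it is missing, so it is worth checking the result. Below is a minimal sketch that wraps the command from the post and verifies it; the function name `set_disable_xconfig` is hypothetical, and the one-line form of sed's `c` command assumes GNU sed.

```shell
#!/bin/bash
# Set disable_xconfig=true in a settings file by replacing the existing line.
# Assumes GNU sed (the one-line form of the `c` command is a GNU extension).
# $1: path to the settings file
set_disable_xconfig() {
    local cfg="$1"
    sed -i '/disable_xconfig=/c\disable_xconfig=true' "$cfg"
    # sed only rewrites a line that already exists, so confirm the result;
    # the function returns non-zero if the setting is not present afterwards.
    grep -qx 'disable_xconfig=true' "$cfg"
}

# Example call (commented out; path taken from the post above):
# set_disable_xconfig /boot/config/plugins/nvidia-driver/settings.cfg
```

A non-zero return value means the file had no `disable_xconfig=` line to replace, in which case the sed command alone changes nothing.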

 

A user had this problem once, but I can't remember whether he needed to create the file on the USB boot device; I think so.

What you have to do for sure is to issue the command that I've posted.

I recommend trying the second solution if you want to use the GPU from your motherboard with the file and the command.

 

Please let me know if it works.

Link to comment
4 hours ago, ich777 said:

The easiest solution would be to disable the GPU from the motherboard and use the output from the Nvidia GPU.

 

The second solution, if you want to use the Aspeed GPU from the motherboard, is to create the file '/etc/modprobe.d/ast.conf' and run this from the terminal:

sed -i '/disable_xconfig=/c\disable_xconfig=true' "/boot/config/plugins/nvidia-driver/settings.cfg"

After that, reboot.

 

A user had this problem once, but I can't remember whether he needed to create the file on the USB boot device; I think so.

What you have to do for sure is to issue the command that I've posted.

I recommend trying the second solution if you want to use the GPU from your motherboard with the file and the command.

 

Please let me know if it works.


Thanks for the guidance. For now I’ve just switched to using the graphics card and have switched the BIOS to prioritise the card. All is well. 
 

If my use case changes and I need to use the card in a VM then I might try your solution to allow me to use the on board GPU for maintenance. Until then, the easy solution works given I only have unRAID and no VMs using the card. 

Link to comment
4 minutes ago, danioj said:

If my use case changes and I need to use the card in a VM then I might try your solution to allow me to use the on board GPU for maintenance. Until then, the easy solution works given I only have unRAID and no VMs using the card. 

If you want to use the Aspeed GPU then simply follow the instructions above and it should also just work fine.

 

If you experience any problems please feel free to contact me again. :)

Link to comment
14 hours ago, ich777 said:

The driver is already auto-compiled, but it isn't listed on their download site, so you can't actually install it: I grab the driver versions from there, otherwise this would be a complete mess. Or you switch to latest; then it should be listed, if I'm not mistaken... :D

Driver

MD5

 

I would first try removing your "script"; that may be the problem.

100% it's the Nvidia plugin: when I removed it, my GUI doesn't lock up and there are no errors. Reinstall the plugin and it fails with the same error in the logs.

r720xd-diagnostics-20210623-1209.zip

Link to comment
5 minutes ago, leeknight1981 said:

100% it's the Nvidia plugin: when I removed it, my GUI doesn't lock up and there are no errors

Yes, but such "scripts" have to be uninstalled first to troubleshoot the card properly.

 

Does your GUI also lock up and freeze with the Nvidia Driver installed?

 

Have you upgraded recently (Unraid, BIOS, Driver version,...)?

This error is normally a sign that the card doesn't initialize properly. Can you test the card in a desktop computer, install the driver (the driver installation is necessary because basic display output mostly works properly even without it) and put a 3D load on it?

 

Please also try to swap the PCIe slot if possible and/or reseat the card.

This seems like a power issue or a failure of the card, but that's only a guess.

 

Btw: sadly, your Intel iGPU is a little too old for transcoding HEVC, but H.264 should work fine on it if you need a temporary solution.
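
To see when the initialization error shows up, and whether it recurs after each reboot or plugin reinstall, it can help to pull just the relevant NVRM lines out of the syslog. This is only a sketch: the function name `nvrm_failures` is hypothetical, `/var/log/syslog` is assumed to be where the host writes its syslog, and the two patterns are simply the messages quoted in this thread.

```shell
#!/bin/bash
# Print the NVIDIA kernel module (NVRM) failure lines discussed in this thread.
# $1: log file, defaults to /var/log/syslog (assumed syslog location)
nvrm_failures() {
    local log="${1:-/var/log/syslog}"
    # Returns non-zero when no matching line is found, like grep itself.
    grep -E 'NVRM: .*(RmInitAdapter failed|No NVIDIA devices probed)' "$log"
}

# Example call (commented out; reads the live syslog):
# nvrm_failures
```

The timestamps on the matching lines show whether the failure happens once at boot or every time something tries to use the card.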

Link to comment
4 minutes ago, ich777 said:

Yes, but such "scripts" have to be uninstalled first to troubleshoot the card properly.

 

Does your GUI also lock up and freeze with the Nvidia Driver installed?

 

Have you upgraded recently (Unraid, BIOS, Driver version,...)?

This error is normally a sign that the card doesn't initialize properly. Can you test the card in a desktop computer, install the driver (the driver installation is necessary because basic display output mostly works properly even without it) and put a 3D load on it?

 

Please also try to swap the PCIe slot if possible and/or reseat the card.

This seems like a power issue or a failure of the card, but that's only a guess.

 

Btw: sadly, your Intel iGPU is a little too old for transcoding HEVC, but H.264 should work fine on it if you need a temporary solution.

It's all been working 100% OK, no issues, until the Nvidia plugin updated; transcoding was fine, including multiple streams. It's in a Dell R720XD server with two 1100 W PSUs. The card works fine; I don't believe this to be a hardware fault at all.

Link to comment
55 minutes ago, leeknight1981 said:

It's all been working 100% OK, no issues, until the Nvidia plugin updated; transcoding was fine, including multiple streams.

When did it update, or what exactly do you mean?

 

Can you please be a little more specific? Also, you didn't answer this question:

Quote

Does your GUI also lock up and freeze with the Nvidia Driver installed?

 

Did the driver update or did the plugin update?

 

Have you already tried to pick a driver from the stable branch?

Link to comment
