
[Plugin] Nvidia-Driver


ich777


Hello,

 

I just got a Gigabyte GeForce RTX 4060 OC Low Profile. The "Supported products" list shows the RTX 4060, so I hope the Low Profile variant is not a problem.

 

The error that I am getting is

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.


[Screenshot attached]

 

I followed the instructions and it did not work, so I rebooted the server, but that didn't help either. I then tried uninstalling, rebooting, reinstalling, and finally rebooting again.

 

I found this in the syslog, but I am still new to Unraid, so I am not sure where to go next.

 

Apr  5 08:28:58 AYSON-UNRAID root: plugin: community.applications.plg installed
Apr  5 08:28:58 AYSON-UNRAID root: Executing hook script: post_plugin_checks
Apr  5 08:28:58 AYSON-UNRAID root: plugin: installing: nvidia-driver.plg
Apr  5 08:28:58 AYSON-UNRAID root: Executing hook script: pre_plugin_checks
Apr  5 08:28:58 AYSON-UNRAID root: plugin: running: anonymous
Apr  5 08:28:58 AYSON-UNRAID root: plugin: checking: /boot/config/plugins/nvidia-driver/nvidia-driver-2024.01.19.txz - MD5
Apr  5 08:28:58 AYSON-UNRAID root: plugin: skipping: /boot/config/plugins/nvidia-driver/nvidia-driver-2024.01.19.txz already exists
Apr  5 08:28:58 AYSON-UNRAID root: plugin: running: upgradepkg --install-new /boot/config/plugins/nvidia-driver/nvidia-driver-2024.01.19.txz
Apr  5 08:28:58 AYSON-UNRAID root:
Apr  5 08:28:58 AYSON-UNRAID root: +==============================================================================
Apr  5 08:28:58 AYSON-UNRAID root: | Installing new package /boot/config/plugins/nvidia-driver/nvidia-driver-2024.01.19.txz
Apr  5 08:28:58 AYSON-UNRAID root: +==============================================================================
Apr  5 08:28:58 AYSON-UNRAID root:
Apr  5 08:28:58 AYSON-UNRAID root: Verifying package nvidia-driver-2024.01.19.txz.
Apr  5 08:28:58 AYSON-UNRAID root: Installing package nvidia-driver-2024.01.19.txz:
Apr  5 08:28:58 AYSON-UNRAID root: PACKAGE DESCRIPTION:
Apr  5 08:28:58 AYSON-UNRAID root: Package nvidia-driver-2024.01.19.txz installed.
Apr  5 08:28:58 AYSON-UNRAID root: plugin: creating: /usr/local/emhttp/plugins/nvidia-driver/README.md - from INLINE content
Apr  5 08:28:58 AYSON-UNRAID root: plugin: running: anonymous
Apr  5 08:28:58 AYSON-UNRAID root:
Apr  5 08:28:58 AYSON-UNRAID root: --------------------Nvidia driver v550.67 found locally---------------------
Apr  5 08:28:58 AYSON-UNRAID root:
Apr  5 08:28:58 AYSON-UNRAID root: --------------Installation of Nvidia driver v550.67 successful--------------
Apr  5 08:28:58 AYSON-UNRAID root: plugin: nvidia-driver.plg installed

 

 

 

ayson-unraid-diagnostics-20240405-1529.zip

6 hours ago, aysonohbata said:

There is no card in the listed devices ...

 

You may want to check:

 

1/ the latest BIOS for your mainboard (1645, yours is 1630)

2/ that the card sits properly in the PCIe slot

3/ which slot the GPU is in (the primary one?)

4/ that the power supply is attached (properly)

5/ the BIOS setup:

 - Above 4G Decoding enabled

 - Resizable BAR (rBAR) enabled

 - primary display set to the iGPU (onboard from your 12700K)

 

Those would be my first steps ...

 

In sum, your syslog currently shows only one VGA device, which is your iGPU, as if there were no dGPU (Nvidia) at all, so this looks like a hardware issue rather than a software one ...
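
As a rough illustration of that check, here is a minimal Python sketch that scans `lspci` output for display-class devices and reports whether an Nvidia one is among them. The sample lines and the helper names (`display_devices`, `has_nvidia_gpu`) are made up for this example and are not taken from the diagnostics in this thread.

```python
import re

# Made-up sample of `lspci` output, for illustration only.
SAMPLE_LSPCI = """\
00:02.0 VGA compatible controller: Intel Corporation Example iGPU
01:00.0 VGA compatible controller: NVIDIA Corporation Example dGPU
01:00.1 Audio device: NVIDIA Corporation Example HDMI Audio
"""

# PCI class names under which graphics cards usually show up.
DISPLAY_CLASSES = ("VGA compatible controller", "3D controller", "Display controller")


def display_devices(lspci_text):
    """Return (slot, description) for every display-class device in lspci output."""
    devices = []
    for line in lspci_text.splitlines():
        match = re.match(r"^(\S+)\s+([^:]+):\s+(.*)$", line)
        if match and match.group(2) in DISPLAY_CLASSES:
            devices.append((match.group(1), match.group(3)))
    return devices


def has_nvidia_gpu(lspci_text):
    """True if at least one display-class device is from Nvidia."""
    return any("nvidia" in desc.lower() for _, desc in display_devices(lspci_text))
```

On a real server you would feed it the actual listing, for example the text returned by `subprocess.run(["lspci"], capture_output=True, text=True).stdout`. If only the iGPU comes back, the card is not visible on the PCIe bus and no driver will help.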

10 hours ago, alturismo said:


in sum, your syslog currently shows only 1 VGA device which is your iGPU, like there is no dGPU (Nvidia), so rather a hardware issue and no software ...

 

I thought my graphics card didn't require the extra power cable, but it did. I think it worked once I plugged it in. Thank you for your help!

 

3 hours ago, aysonohbata said:

I thought my graphic card didn't require the extra power cable but it did. I think it worked once I plugged it.

If there is a connector for external power on an Nvidia graphics card, then it is always necessary to plug it in.


I too have recently been puzzled by my GPU not working at all for transcoding. I searched and came across this thread (with others pointing to it) and added " --runtime=nvidia" to the Extra Parameters line (I have yet to test whether that actually made it work). I did notice the warning below in the console output after I made the change (I'm guessing I just didn't notice it before). I have 32GB of RAM, but I noticed earlier that if I let the container use all of the system's RAM, it could sometimes fill it up and crash the system, so I decided to limit it to 16GB, and I never noticed this warning before. I have recently been having my system die randomly and I am trying to track down what is causing it. Has anyone seen this issue before? I thought this used to work (at least in previous Unraid versions).

 

"

  -l net.unraid.docker.managed=dockerman
  -l net.unraid.docker.webui='http://[IP]:[PORT:32400]/web/index.html'
  -l net.unraid.docker.icon='https://raw.githubusercontent.com/linuxserver/docker-templates/master/linuxserver.io/img/plex-logo.png'
  -v '/mnt/user/Media/TV/':'/tv':'rw'
  -v '/mnt/user/Media/Movies/':'/movies':'rw'
  -v '/dev/shm':'/transcode':'rw'
  -v '/mnt/disks/Plex/appdata/plex/':'/config':'rw'
  --runtime=nvidia
  --gpus all,capabilities=video
  --log-opt max-size=1m
  --log-opt max-file=1
  --memory=16g 'lscr.io/linuxserver/plex'

WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
8388ca9751b2a1154aff24ff5355b240c21bdff2ae561ea4f469c3717f7c54ef

The command finished successfully!

"

3 hours ago, Fastcompjason said:

  --memory=16g 'lscr.io/linuxserver/plex'

WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.

This is nothing to worry about and has nothing to do with the Nvidia Driver plugin.

 

You have set a custom memory limit of 16G for the container, and Docker is basically complaining that no swap is found; on Unraid there is no swap partition.
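
For reference, a small Python sketch of how a value like `--memory=16g` translates into bytes. It assumes, as far as I know, that Docker treats the `k`, `m` and `g` suffixes as binary multiples; the helper names are made up for this example.

```python
import re

# Assumption: Docker's b/k/m/g suffixes are binary multiples (1g = 1024**3 bytes).
_UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def docker_memory_to_bytes(value):
    """Convert a Docker size string such as '16g' or '512m' to bytes."""
    match = re.fullmatch(r"(\d+)([bkmg]?)", value.strip().lower())
    if not match:
        raise ValueError(f"unrecognized memory value: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def memory_limit_from_args(args):
    """Return the --memory limit in bytes from a list of docker run arguments, or None."""
    for arg in args:
        if arg.startswith("--memory="):
            return docker_memory_to_bytes(arg.split("=", 1)[1])
    return None
```

With the parameters from the quoted command, `memory_limit_from_args(["--runtime=nvidia", "--memory=16g"])` gives 16 GiB, i.e. half of the 32GB in that system.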

 

3 hours ago, Fastcompjason said:

I too have been recently puzzled by my GPU not working at all for transcoding.

What do you mean by that? The driver is working perfectly fine with transcoding. ;)

 

3 hours ago, Fastcompjason said:

I have searched and came across this thread (with others pointing to it) and made the change to add the " --runtime=nvidia" to the EXTRA-PARAMETERS line (and have yet to test that this actually made it work yet)

You should really read the second post in this thread.
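
To double check what a container was actually started with, a minimal Python sketch like the one below can list the GPU-related options in a `docker run` command. The function name `gpu_options` is made up, and collecting `NVIDIA_*` environment variables (such as `NVIDIA_VISIBLE_DEVICES` from the NVIDIA container runtime) is my assumption about what is worth checking; the authoritative list of required settings is in the second post.

```python
import shlex


def gpu_options(docker_run_command):
    """Collect the GPU-related options found in a `docker run` command line."""
    tokens = shlex.split(docker_run_command)
    found = {"runtime_nvidia": False, "gpus": None, "env": {}}
    for index, token in enumerate(tokens):
        following = tokens[index + 1] if index + 1 < len(tokens) else None
        if token == "--runtime=nvidia" or (token == "--runtime" and following == "nvidia"):
            found["runtime_nvidia"] = True
        elif token.startswith("--gpus="):
            found["gpus"] = token.split("=", 1)[1]
        elif token == "--gpus" and following is not None:
            found["gpus"] = following
        elif token in ("-e", "--env") and following is not None:
            # Assumption: only NVIDIA_* variables matter for this check.
            name, _, value = following.partition("=")
            if name.startswith("NVIDIA_"):
                found["env"][name] = value
    return found
```

For the parameters quoted above (`--runtime=nvidia --gpus all,capabilities=video`), this reports the runtime and the `--gpus` value but an empty `env`, which is a hint to compare the template against the second post.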

3 hours ago, konglingfei_unraid said:

The Tesla P40 works normally when installed under Windows, but after I installed the driver through Unraid, I was told that the device could not be found;

Can you share a bit more information please?

Do you use the card in a VM or are you talking about a physical (maybe different) Windows machine?

If you are talking about the card being in Unraid and working in a VM, is the card bound to VFIO?

 

Diagnostics would help a lot.


Whenever I try to boot Unraid with an Nvidia 3050 plugged in via a riser cable, it won't even get to the BIOS page; it doesn't do anything. When I turn it off and pull power to the GPU, it starts up just fine. It is a Gigabyte B550I motherboard running BIOS FCb with an AMD Ryzen 7 5700G with Radeon Graphics as the CPU. I have the Nvidia Driver and the GPU Statistics plugins installed. Nvidia Driver Version: 550.67. If I boot the system and then plug the GPU in, I get the error "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running"

Thanks

17 minutes ago, gecl said:

Thanks

Well, that's more hardware related ...

 

First you have to make sure you can boot with your GPU plugged in. Take a look at the BIOS setup, as described many times here already ...

 

1/ Above 4G Decoding enabled

2/ Resizable BAR (rBAR) enabled

3/ primary GPU set to onboard (iGPU)

...


Hello,

 

I am having an issue after upgrading to 6.12.10. I then reverted to 6.12.9 after various tests and the issue persists.

 

After upgrading to 6.12.10 the Nvidia plugin disappeared (my Plex Docker not starting is what tipped me off to this), I reinstalled the plugin and it shows the GPU is installed. Additionally, `nvidia-smi` shows the GPU is detected. However, the Dashboard in Unraid has a blank space where it would normally report the GPU information and Plex does not seem to be leveraging the GPU (the Plex container does start).

 

I have previously tried uninstalling, rebooting, reinstalling, rebooting, etc. Currently, I uninstalled, disabled Docker, reinstalled, and started Docker but the issue persists.

 

I have attached my diagnostics and included some screenshots below. Any help would be greatly appreciated.

 

Maybe there is a more complete way to uninstall and purge related data before attempting to install again?

 

[Three screenshots attached]

 

supremacy-diagnostics-20240410-1543.zip

14 minutes ago, wreave said:

However, the Dashboard in Unraid has a blank space where it would normally report the GPU information and Plex does not seem to be leveraging the GPU (the Plex container does start).

Please install the GPU Statistics plugin; the Nvidia Driver plugin alone is not responsible for showing the GPU statistics.

 

As for your second issue, this seems more container related since the driver is recognized fine.

Are you sure you are forcing a transcode within Plex? (Please don't use the web player, since it is known to not work properly when forcing a transcode; instead use a native client for iOS, Android, …)
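
One way to confirm that a forced transcode really lands on the GPU is to look at the processes the driver reports while the stream is playing. The Python sketch below parses the output of `nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader`. It assumes the transcoder shows up in that list under a name containing "Plex Transcoder", which is my understanding but not something confirmed in this thread; the sample line and helper names are made up.

```python
import csv
import io

# Made-up sample of the CSV output, for illustration only.
SAMPLE_OUTPUT = "12345, /usr/lib/plexmediaserver/Plex Transcoder, 210 MiB\n"


def gpu_processes(csv_text):
    """Return (pid, process_name, used_memory) tuples from the CSV output."""
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [
        (int(row[0]), row[1].strip(), row[2].strip())
        for row in rows
        if len(row) == 3  # skip blank or malformed lines
    ]


def transcoder_on_gpu(csv_text, needle="Plex Transcoder"):
    """True if any GPU process name contains the given text (case-insensitive)."""
    return any(needle.lower() in name.lower() for _, name, _ in gpu_processes(csv_text))
```

If the list stays empty while a native client is playing a stream that should be transcoded, the container settings are the next thing to check.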

13 minutes ago, ich777 said:

Please install the GPU Statistics plugin; the Nvidia Driver plugin alone is not responsible for showing the GPU statistics.

 

As for your second issue, this seems more container related since the driver is recognized fine.

Are you sure you are forcing a transcode within Plex? (Please don't use the web player, since it is known to not work properly when forcing a transcode; instead use a native client for iOS, Android, …)


You nailed it in 2. Obviously more than just the Nvidia plugin went missing after my upgrade (and I blanked out that it was a different plugin for stats) and the transcoding is correct on Android. Appreciate such a quick response. Do you have a donate link?


I've purchased a Tesla P4 video card and intend to use it for converting videos to x265 using tdarr or unmanic. The card has no video outputs; it's a server/data center card.

 

  1. What will I need to use it and have Unraid recognize it?
  2. I also read somewhere that only Windows is able to power down Nvidia cards when not in use, and that in any other OS they will continue to draw a lot of power. Is this true?
  3. I believe this card has no locked limits, so using it for a vGPU should be easy?

 

4 hours ago, MrCrispy said:

What will I need to use it and have Unraid recognize it?

Usually, just install the plugin as described on page 1.

 

4 hours ago, MrCrispy said:

I also read somewhere that only Windows is able to power down Nvidia cards when not in use, and in any other OS it will continue to draw a lot of power? is this true?

Search this thread for Nvidia persistence mode.

 

4 hours ago, MrCrispy said:

I believe this card has no locked limits, using it for a vGpu should be easy?

Page 1, red text ... NOT supported here, as it's a violation of the Nvidia EULA ...

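On 2024/4/10 at 2:34 AM, ich777 said:

Can you share a bit more information please?

Do you use the card in a VM or are you talking about a physical (maybe different) Windows machine?

If you are talking about the card being in Unraid and working in a VM, is the card bound to VFIO?

 

Diagnostics would help a lot.

Reinstalling a fresh Unraid operating system solved the problem.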


Hi @ich777,

 

First of all, thank you for your plugin. It has been really helpful.

 

I have a different request. If you could let me know the easiest path to complete this myself, that would be more than enough.

 

There is a version of the open-source kernel modules that enables P2P support on 3090/4090 GPUs for ML workloads.

 

Link here:

 

https://github.com/tinygrad/open-gpu-kernel-modules

 

Would it be possible to get instructions on how to compile for Unraid? I have tried to look at your public repo but can't fully grasp your pipeline for building for Unraid. Namely, the compile_opensource.sh file. Even then, I'm not 100% sure how I can load that and persist it inside of Unraid.

 

Is there a way to do what I'm looking to do? Thank you!

41 minutes ago, ybdave said:

If you could let me know the easiest path to complete this myself, that would be more than enough.

Please look at this repository. It is basically a Docker container which downloads a precompiled Kernel for your running Unraid version, so that you can then compile whatever you want.

 

41 minutes ago, ybdave said:

There is a version of the open-source kernel modules that enables P2P support on 3090/4090 GPUs for ML workloads.

I can't support that officially because that's too risky for me since if something breaks or is not working properly it can be a nightmare in terms of support.

I'm also not sure if this violates the Nvidia EULA but from what I read it should not...

 

41 minutes ago, ybdave said:

Would it be possible to get instructions on how to compile for Unraid?

Seems like it is possible, you can for example try to compile the open source driver first with my script from my Nvidia Driver plugin repository.

 

41 minutes ago, ybdave said:

Even then, I'm not 100% sure how I can load that and persist it inside of Unraid.

The main issue is that you have to create a package (which is also done in the compile_opensource.sh file) and install it on every boot. And whenever you upgrade the Unraid version, you have to recompile the package for the new Kernel.
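
To illustrate the two steps, here is a minimal Python sketch, not a supported procedure. It assumes you note down which Kernel a package was built for (the version strings used with it are placeholders) and that the package is installed with `upgradepkg --install-new`, the same command that appears in the syslog earlier in this thread. The function names are made up.

```python
import platform


def needs_rebuild(built_for_kernel, running_kernel=None):
    """True if the package was compiled for a different Kernel than the one running."""
    running = running_kernel or platform.release()
    return built_for_kernel != running


def install_command(package_path):
    """Build the install command for a self-compiled package (to be run on every boot)."""
    return ["upgradepkg", "--install-new", package_path]
```

A boot-time script could call `needs_rebuild(...)` first and only run the command from `install_command(...)` (for example via `subprocess.run`) when the package still matches the running Kernel.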

 

41 minutes ago, ybdave said:

Is there a way to do what I'm looking to do?

I think with the above linked Docker container it should be pretty simple.

 

Just a question, wouldn't it be simpler to pass the card through to a VM and then install this driver in the VM?


Is it possible to use the GPU for transcoding in Handbrake and MKVToolNix? For example, in Handbrake, when I select the Hardware H.265 NVENC 1080p option, my CPU does the work while the GPU sits idle. IDK if I need to change a preference in the GUI or if it's done through the Docker edit function.

Thanks

34 minutes ago, gecl said:

IDK if I need to change a preference in the GUI or if it's done through the Docker edit function.

Please read the second post of this thread.

All things that need to be changed are described in there.


Hello everyone,

I installed an Nvidia T400 in my Unraid server on Saturday. The card is recognized in the POST screen. I then installed the drivers and the GPU Statistics plugin. After the successful driver installation, I followed the instructions and switched both Docker and VMs off and on again in the settings (of course 'applied' in between), and restarted the server. I am also greeted by the GPU statistics on the start page:

[Screenshot attached]

 

Then I wanted to go into the plugin for the driver, but in doing so the GUI hangs. Closing and reopening the browser window does not work. But e.g. my Adhome Docker is still running. So I perform a hard reset and the same thing happens again, the GUI hangs up. Hard reset again, this time all dockers stopped, but again the GUI hangs up. Then I uninstalled the Nvidia driver plugin, restarted the server and installed it again. But the problem remains :( I sometimes waited up to 10 minutes for the plugin. The constant hard resets are certainly not good for the USB stick either...

What else could be the problem?

 

Unraid 6.12.10

AMD Ryzen 7 1700

Asus Prime B450M-K II

2x 16 GB RAM 2133 MHz from Corsair (XMP is not activated)

Nvidia Quadro T400

 

I have attached my diagnostics. Do you need any more info?

(This is a cross post. I have the same question posted in the german section but then thought, it would be better to ask here instead)

unraidserver-diagnostics-20240422-0932.zip

44 minutes ago, BastiKA84 said:

AMD Ryzen 7 1700

Since this is a Ryzen first gen CPU make sure to disable C-States in the BIOS.

 

I assume the card is still installed in your system and you have simply uninstalled the driver, correct? This seems to be more of a hardware compatibility issue and has nothing to do with the plugin itself.

 

Your BIOS version is also out of date:

    Version: 3810
    Release Date: 11/21/2022

 

The newest version on the Asus site is 4604 (08/04/2024). I expect this update will also help with the crashing, but please also disable the C-States, since leaving them enabled was quite a big issue on Ryzen 1st gen.

Also make sure to enable Resizable BAR Support and Above 4G Decoding in the PCI settings.

 

Also make sure the Power Supply is still up to the task, your T400 shouldn't draw too much power but just to be on the safe side make sure it is still adequate.

 

BTW In your Diagnostics I see that the plugin is still installed. However make sure to follow the things above that I recommended and see if that solves your issue.

 

May I ask if a normal shutdown is working? Just press the Power button 2x short and you should hear *beep beep* and this is the indication for a normal shutdown.

 

It would also be helpful to pull the Diagnostics when this happens: connect to the server through SSH and type in `diagnostics`. This will generate Diagnostics that you can pull from /boot/logs/ after the reboot.
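
If several archives pile up in that folder, a small Python sketch like this can pick the most recent one. It assumes the files follow the `<name>-diagnostics-YYYYMMDD-HHMM.zip` pattern seen in the attachments in this thread; the function name is made up.

```python
from pathlib import Path


def newest_diagnostics(log_dir="/boot/logs"):
    """Return the Path of the most recent diagnostics zip in log_dir, or None."""
    archives = sorted(
        Path(log_dir).glob("*-diagnostics-*.zip"),
        # Sort by the YYYYMMDD-HHMM stamp at the end of the file name.
        key=lambda path: path.stem.rsplit("-diagnostics-", 1)[1],
    )
    return archives[-1] if archives else None
```

Sorting by the timestamp in the file name instead of the modification time keeps the result stable even if the files were copied around afterwards.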
