[Plugin] Nvidia-Driver


ich777

20 minutes ago, unraidxiaobai said:

Please help me diagnose the problem.

You inject a private Docker registry to /etc/docker/daemon.json from your go file.

Please remove that from your go file and reboot your server.

After you've rebooted, you will see that your daemon.json contains the changes for the nvidia runtime.

If you want to keep that private Docker registry, you also have to customize your injection so that the file includes these changes for the nvidia runtime.
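As a rough sketch of what such a combined file could look like, the Python below merges an nvidia runtime entry into an existing daemon.json dictionary without dropping the registry settings. The runtime path and the registry hostname are assumptions for illustration only; copy the real runtime entry from the daemon.json that the plugin generates after a clean reboot.

```python
import json

# Assumed runtime entry. The path is a guess at the usual location, not taken
# from this thread; use the values from the daemon.json the plugin generates.
NVIDIA_RUNTIME = {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}


def with_nvidia_runtime(config: dict) -> dict:
    """Return a copy of a daemon.json dict that also registers the nvidia runtime."""
    merged = dict(config)
    runtimes = dict(merged.get("runtimes", {}))
    # Keep an existing nvidia entry if one is already present.
    runtimes.setdefault("nvidia", NVIDIA_RUNTIME)
    merged["runtimes"] = runtimes
    return merged


# Hypothetical private registry setting, standing in for whatever the go file injects.
custom = {"insecure-registries": ["registry.example.lan:5000"]}
print(json.dumps(with_nvidia_runtime(custom), indent=2))
```

The printed JSON is what the go file would need to write to /etc/docker/daemon.json instead of the registry-only version.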

 

From what I saw it also looks very suspicious that you have:

{undefined

at the start, are you sure that this is correct?
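One way to check this is to run the file contents through a JSON parser. The sketch below uses Python's standard json module; the function name check_daemon_json is made up for this example, and the sample strings are stand-ins for the real file contents.

```python
import json


def check_daemon_json(text: str) -> str:
    """Return 'ok' if text parses as a JSON object, else a short error message."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        return f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(parsed, dict):
        return "valid JSON, but the top level is not an object"
    return "ok"


# A file that starts with "{undefined" is rejected by the parser.
print(check_daemon_json('{undefined\n"runtimes": {}}'))
# A minimal well-formed object passes.
print(check_daemon_json('{"runtimes": {}}'))
```

To check the real file, pass in the text read from /etc/docker/daemon.json.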

Link to comment

I am running Unraid 6.10.3 under ESXi 6.7u3 with passthrough devices. I have found that the current Nvidia drivers do not work in that configuration. The last driver version that worked correctly is 470.82.00.

 

How do I build/install this driver on my system? How do I prevent the nvidia-driver package from "upgrading" it to an incompatible version?

 

Here is the Nvidia thread explaining the issue: https://forums.developer.nvidia.com/t/nvidia-smi-no-devices-were-found-vmware-esxi-ubuntu-server-20-04-03-with-rtx3070/202904/36

Link to comment
49 minutes ago, Darek said:

I am running Unraid 6.10.3

I have to be completely honest, I'm a little frustrated because you are double posting (Private Message and here in the support thread).

I won't answer any quicker...

 

I've already answered via PM, but I will copy and paste it here and leave out the first paragraph, because you didn't mention the Unraid version in your PM:

Quote

 

55 minutes ago, Darek said:

alternatively provide a way to install it and not automatically upgrade/remove

 

What do you mean by this? When you upgrade the Unraid version, or on reboots?

If you don't want it to upgrade on reboots, simply change the driver version to the version that you want to stay on instead of selecting latest.

I can't/won't change the behavior for Unraid upgrades since I don't know how long Nvidia will support the legacy 470.xx drivers; if they want to, they can drop support tomorrow (or even today). That's the reason why there is no legacy button in the version selection of the driver plugin.

 

Quote

 

55 minutes ago, Darek said:

The 470.82 drivers were the last proprietary drivers to work in that configuration

 

I can't compile these drivers for newer Unraid versions because they won't compile against the newer Kernel versions that Unraid is on. If you want those old drivers, you have to stick with the old Unraid versions.

 

Quote

 

55 minutes ago, Darek said:

515.43.04/kernel-opensource

 

Please remember that this is the proprietary driver and not the open source one. I also won't compile the open source Kernel module, because that would basically break the driver for every pre-Ampere card. It would also break compatibility with your system, because you have a Turing card.

 

BTW, are you sure that driver version 470.141.03 also doesn't work? These are the "newer" legacy drivers, which actually support newer Kernel versions: Click

Link to comment
2 hours ago, DavidNguyen said:

Container failed to load after a reboot. I've disabled the extra parameters for now

Diagnostics GUI not working so sending diagnostics via PM.

I see nothing suspicious in your Diagnostics. Can you give me the output from your docker run command with --runtime=nvidia?

 

BTW, please remove these lines from your go file:

# Start AMD drivers
modprobe amdgpu

Unraid does that on its own, and the RadeonTOP plugin also modprobes your GPU if Unraid doesn't.
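If you would rather script that cleanup than edit the file by hand, a small sketch like the following would do it. The sample go file content is an assumption for illustration, and the function name drop_amdgpu_modprobe is made up; only the two lines quoted above are removed.

```python
def drop_amdgpu_modprobe(go_text: str) -> str:
    """Remove the 'modprobe amdgpu' line and its '# Start AMD drivers' comment."""
    unwanted = {"# Start AMD drivers", "modprobe amdgpu"}
    kept = [line for line in go_text.splitlines() if line.strip() not in unwanted]
    return "\n".join(kept) + "\n"


# Assumed example of a go file; your real file may contain other lines.
sample = (
    "#!/bin/bash\n"
    "# Start the Management Utility\n"
    "/usr/local/sbin/emhttp &\n"
    "# Start AMD drivers\n"
    "modprobe amdgpu\n"
)
print(drop_amdgpu_modprobe(sample), end="")
```

All other lines are kept in their original order, so the rest of the go file is not affected.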

Link to comment
2 hours ago, ich777 said:

can you give me the output from your docker run command with --runtime=nvidia

root@localhost:# /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/docker run -d --name='Plex-Media-Server' --net='host' -e TZ="Australia/Sydney" -e HOST_OS="Unraid" -e HOST_HOSTNAME="Tower" -e HOST_CONTAINERNAME="Plex-Media-Server" -e 'PLEX_CLAIM'='claim-C71axTEgtyALFeQPPAFu' -e 'PLEX_UID'='99' -e 'PLEX_GID'='100' -e 'VERSION'='latest' -e 'NVIDIA_VISIBLE_DEVICES'='GPU-06da0ef1-768c-8997-2635-5dddcc599084' -e 'NVIDIA_DRIVER_CAPABILITIES'='all' -l net.unraid.docker.managed=dockerman -l net.unraid.docker.webui='http://[IP]:[PORT:32400]/web' -l net.unraid.docker.icon='https://raw.githubusercontent.com/plexinc/pms-docker/master/img/plex-server.png' -v '/tmp':'/transcode':'rw' -v '/mnt/user/':'/data':'rw' -v '/mnt/user/appdata/Plex-Media-Server':'/config':'rw' --runtime=nvidia 'plexinc/pms-docker'
2474d3059f7df110dde81210862c5d861afedf8092b5b0fd0858ab44624b5591
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: open failed: /proc/sys/kernel/overflowuid: permission denied: unknown.

The command failed.

 

Link to comment
1 minute ago, DavidNguyen said:
root@localhost:# /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/docker run -d --name='Plex-Media-Server' --net='host' -e TZ="Australia/Sydney" -e HOST_OS="Unraid" -e HOST_HOSTNAME="Tower" -e HOST_CONTAINERNAME="Plex-Media-Server" -e 'PLEX_CLAIM'='claim-C71axTEgtyALFeQPPAFu' -e 'PLEX_UID'='99' -e 'PLEX_GID'='100' -e 'VERSION'='latest' -e 'NVIDIA_VISIBLE_DEVICES'='GPU-06da0ef1-768c-8997-2635-5dddcc599084' -e 'NVIDIA_DRIVER_CAPABILITIES'='all' -l net.unraid.docker.managed=dockerman -l net.unraid.docker.webui='http://[IP]:[PORT:32400]/web' -l net.unraid.docker.icon='https://raw.githubusercontent.com/plexinc/pms-docker/master/img/plex-server.png' -v '/tmp':'/transcode':'rw' -v '/mnt/user/':'/data':'rw' -v '/mnt/user/appdata/Plex-Media-Server':'/config':'rw' --runtime=nvidia 'plexinc/pms-docker'
2474d3059f7df110dde81210862c5d861afedf8092b5b0fd0858ab44624b5591
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: open failed: /proc/sys/kernel/overflowuid: permission denied: unknown.

The command failed.

 

Please reboot your server once and try it again.

 

Do you have a power saving script for your GPU installed on your server? If yes, please remove it first and instead add this to your go file before rebooting:

# Start nvidia-persistenced
nvidia-persistenced

 

Also make sure that you are on the latest version of the plugin.

Link to comment
47 minutes ago, ich777 said:

Try to uninstall the Nvidia Driver Plugin, reboot, pull a fresh copy from the CA App and reboot again.

 

42 minutes ago, ich777 said:

From what I see you are booting with UEFI mode, please try to boot with Legacy (CSM) mode and also disable all the C-State options in your BIOS.

 

I tried both of these, still no joy.

Edited by DavidNguyen
Link to comment