[Plugin] Nvidia-Driver

May I ask first what you want to do with the card? If you want to use this card in a Docker container, this isn't possible. You need to be at least on driver v418.81.07 and the card needs to be at least Kepler based to use it in Docker containers (see documentation here). If you don't want to use the card in Docker containers and only want to install the driver so that you save some power, that is not possible with this plugin, and even if it were possible it wouldn't save m

Legacy 3xx Drivers

Recommended by ich777

ich777

November 19, 2024

The plugin is broken; you can see that from the two PHP warnings on the page. Please do the following:

1. Uninstall the Nvidia Driver plugin.
2. Reboot.
3. Pull a fresh copy from the CA App.
4. Go to the Nvidia plugin's page and select the driver that you want to use.
5. Reboot again.

Driver Download Fails

Recommended by ich777

ich777

May 6, 2025

Sorry, but the "Legacy" driver isn't available any more for Kernel version 6.12.x and doesn't compile against these new Kernels. If Nvidia releases a new "Legacy" driver I'm happy to compile it, but for now it seems that Pascal cards are becoming the new "Legacy" cards. Please also see this conversation on GitHub for more information: Click

Legacy 4xx Drivers

Recommended by ich777

February 11, 2025

13 hours ago, YsarKain said:

Here is what I am getting when I try to update the plugin:

plugin: updating: nvidia-driver.plg
Executing hook script: pre_plugin_checks
plugin: run failed: 'upgradepkg --install-new' returned 127
Executing hook script: post_plugin_checks

1 hour ago, ich777 said:

Please post your Diagnostics.

For what do you plan to use the card?

I finally took the time to have a look at un-get as I love tinkering with stuff, and I'm getting the same error code when attempting to install the .plg.

Modifying un-get.plg to replace upgradepkg --install-new with installpkg makes it install, but un-get itself throws the same error when attempting to install a package.

Dug a bit deeper, and I found upgradepkg to have a new name..

╭─root@smart ~  
╰─➤  v /sbin | rg pkg                                                                                                                         
-rwxr-xr-x 1 root root 3.6K Apr 24  2021 explodepkg
-rwxr-xr-x 1 root root  29K Dec 22 20:09 installpkg
-rwxr-xr-x 1 root root  18K Sep 28 23:56 makepkg
-rwxr-xr-x 1 root root  17K May 14  2023 removepkg
-rwxr-xr-x 1 root root  16K Jan 25 17:56 upgradepkg-

I checked both my servers, and they're both the same, but with different timestamps for when upgradepkg- was modified.

Jan 25 on this, Feb 9 on the other (time of my prev reboot). This led me to wonder if Unraid Patch modifies this file.

Removed Unraid Patch, rebooted, and..

root@Yggdrasil ~ 
╰─➤ v /sbin | rg upgradepkg
-rwxr-xr-x 1 root root  16K Jan  9 23:15 upgradepkg

Installing Unraid Patch did not modify this file though, so not entirely sure this is the culprit.

TLDR: Unraid Patch may or may not cause upgradepkg to be renamed to upgradepkg-

[ -e /sbin/upgradepkg- ] && mv /sbin/upgradepkg- /sbin/upgradepkg
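Exit status 127 is what a shell reports when the command it was asked to run cannot be found, which fits `upgradepkg` having been renamed. A minimal sketch for checking the state before moving anything; the function name `check_upgradepkg` and its optional directory argument are made up for illustration, and only the `/sbin/upgradepkg` and `/sbin/upgradepkg-` paths come from the listings above:

```shell
#!/bin/bash
# Hypothetical helper (not part of Unraid or any plugin): report whether
# upgradepkg is present, renamed to "upgradepkg-", or missing.
# The directory argument exists only so the check can be tried on a
# scratch directory; it defaults to /sbin.
check_upgradepkg() {
  local dir="${1:-/sbin}"
  if [ -x "$dir/upgradepkg" ]; then
    echo "present"
  elif [ -e "$dir/upgradepkg-" ]; then
    echo "renamed"
  else
    echo "missing"
  fi
}
```

Calling `check_upgradepkg` without arguments inspects `/sbin`. The rename one-liner above only makes sense when the result is "renamed".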

Edited February 11, 2025 by LAS

Replies 5.9k
Views 1m
Created 5 yr
Last Reply Jun 28

February 11, 2025

Author

35 minutes ago, LAS said:

Installing Unraid Patch did not modify this file though, so not entirely sure this is the culprit.

It would be very interesting to know which Unraid version you are on, or at least to see the Diagnostics.

As said above on my test server with Unraid 7.0.0 everything is working correctly.

February 11, 2025

12 minutes ago, ich777 said:

It would be very interesting to know which Unraid version you are on, or at least to see the Diagnostics.

As said above on my test server with Unraid 7.0.0 everything is working correctly.

Sorry. Both systems on 7.0.0.

yggdrasil had upgradepkg modified Feb 9, while smart Jan 25.

yggdrasil is the one I just rebooted.

Had a quick look in the syslog, and it seems I ran some updates at the above dates.

smart-diagnostics-20250211-0733.zip yggdrasil-diagnostics-20250211-0733.zip

February 11, 2025

Author

18 minutes ago, LAS said:

Had a quick look in the syslog, and it seems I ran some updates at the above dates.

So this is from my system:
grafik.png.4acba0909f96669c83430ac91982f458.png

Please keep in mind that this is after a reboot; everything is just where it belongs.

I would recommend creating a bug report, since this doesn't seem to be related to either the Nvidia Driver plugin or un-get.

February 12, 2025

Could someone help me with a problem?

When I try to access http://[IP]/Settings/nvidia-driver the unraid WebUI freezes for a few minutes, and I can't access the nvidia configuration page, because it keeps loading infinitely.

Accessing other pages also becomes impossible after trying to access the settings, because it keeps loading for a certain amount of time, and only after a few minutes the WebUI unfreezes if I have the Nvidia configuration page closed.

I don't know what it could be, I've already removed the driver completely, restarted the server, then reinstalled the driver via Community Apps and the issue remains.

I can use the GPU normally in my Dockers, but configuring it, choosing a driver, or anything like that requires the driver configuration page.

I have two GPUs on the server, a P2000 for transcoding and a simple GT to just be able to plug in a maintenance screen.

nashome-diagnostics-20250212-1319.zip

February 13, 2025

On 1/30/2025 at 9:01 AM, ich777 said:

Can you please upload your Diagnostics instead of the Nvidia bug report when the error occurs? Are you also sure that the power supply is up to the task?

However what you are seeing is probably caused by a C library that is shipped with the container.

Are you also sure that the card isn't overheating or similar issues?

Hi @ich777 - seems the crash has happened again and nvidia-smi shows no devices found.

I haven't rebooted the machine yet, as this will generally bring the card back.

Attaching diagnostics as requested previously.

trinity-diagnostics-20250213-1107.zip

February 13, 2025

Author

14 hours ago, Rafael said:

Could someone help me with a problem?

Sorry for the late response...

You have multiple of these messages in your syslog:

Feb 12 02:05:00 NASHOME mcelog: Corrected memory errors on page 282c43000 exceed threshold 6 in 24h: 6 in 24h
Feb 12 02:05:00 NASHOME mcelog: Location SOCKET:0 CHANNEL:5 DIMM:? []
Feb 12 02:05:00 NASHOME mcelog: Fallback Socket memory error count 4 exceeded threshold: 225 in 24h
Feb 12 02:05:00 NASHOME mcelog: Location SOCKET:0 CHANNEL:? DIMM:? []
Feb 12 02:05:00 NASHOME mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Feb 12 02:05:00 NASHOME kernel: mce: [Hardware Error]: Machine check events logge

Are you sure that your RAM is okay? @JorgeB have you seen such messages before?

I assume they have to do with your issue with the driver.
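To get a feel for how often these messages occur, the matching syslog lines can simply be counted. A minimal sketch, assuming the syslog lives at `/var/log/syslog` (pass another path otherwise); the function name `count_mce_threshold_events` is made up for illustration, and the search string is taken from the excerpt above:

```shell
#!/bin/bash
# Hypothetical helper: count mcelog "Corrected memory errors" threshold
# messages in a syslog file. The default path is an assumption.
count_mce_threshold_events() {
  local log="${1:-/var/log/syslog}"
  # grep -c prints the number of matching lines (0 if there are none).
  grep -c 'mcelog: Corrected memory errors' "$log"
}
```

A count that keeps growing between reboots would support the suspicion of a hardware problem, but it does not say which component is at fault.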

February 13, 2025

Author

6 hours ago, m0dded said:

I haven't rebooted the machine yet, as this will generally bring the card back.

It seems that you have two Xid errors, which you can learn more about here; however, I've never seen two Xid errors at the same time.

Xid 79 is not very informative, since it could be anything (firmware bug, temperature, GPU error, ...).

However, from what I see in your syslog you also have a Kernel Trace which is probably related to a Docker container. From my perspective it looks like a container causes this issue. Do you run any machine learning or LLM on your system?

These are the relevant errors from your syslog:

Feb 12 20:34:23 Trinity kernel: NVRM: GPU at PCI:0000:01:00: GPU-68a24a84-1227-c298-42fe-359fe10a2390
Feb 12 20:34:23 Trinity kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Feb 12 20:34:23 Trinity kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Feb 12 20:34:23 Trinity kernel: NVRM: Xid (PCI:0000:01:00): 154, pid='<unknown>', name=<unknown>, GPU recovery action changed from 0x0 (None) to 0x2 (Node Reboot Required)
...
Feb 13 02:00:37 Trinity kernel: ------------[ cut here ]------------
Feb 13 02:00:37 Trinity kernel: WARNING: CPU: 10 PID: 3677216 at /tmp/selfgz119961/NVIDIA-Linux-x86_64-565.77/kernel/nvidia/nv.c:5243 nvidia_dev_put_uuid+0x33/0x4a [nvidia]
Feb 13 02:00:37 Trinity kernel: Modules linked in: nvidia_uvm(PO) xt_nat xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo ip6table_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter md_mod zfs(PO) spl(O) i2c_dev ntfs3 tcp_diag inet_diag nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs af_packet 8021q garp mrp bridge stp llc igc atlantic nvidia_drm(PO) nvidia_modeset(PO) intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm btusb btrtl btbcm btintel nvidia(PO) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 bluetooth sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd drm_kms_helper cp210x rapl input_leds intel_cstate ecdh_generic mei_hdcp mei_pxp joydev led_class usbserial ecc wmi_bmof drm nvme mei_me thunderbolt intel_uncore i2c_i801
Feb 13 02:00:37 Trinity kernel: i2c_smbus ahci nvme_core mei i2c_core libahci vmd thermal fan video tpm_crb tpm_tis tpm_tis_core wmi tpm backlight acpi_pad acpi_tad button [last unloaded: igc]
Feb 13 02:00:37 Trinity kernel: CPU: 10 PID: 3677216 Comm: ollama Tainted: P           O       6.6.68-Unraid #1
Feb 13 02:00:37 Trinity kernel: Hardware name: ASUS System Product Name/ProArt Z790-CREATOR WIFI, BIOS 2801 11/29/2024
Feb 13 02:00:37 Trinity kernel: RIP: 0010:nvidia_dev_put_uuid+0x33/0x4a [nvidia]
Feb 13 02:00:37 Trinity kernel: Code: 53 e8 ff ca ff ff 48 85 c0 74 2f 48 89 c3 48 89 c7 48 89 ee e8 ca d9 ff ff 31 d2 48 89 de 48 89 ef e8 a3 d9 c2 00 85 c0 74 02 <0f> 0b 48 8d bb 28 06 00 00 5b 5d e9 9e 21 1b e1 5b 5d c3 cc cc cc
Feb 13 02:00:37 Trinity kernel: RSP: 0018:ffffc9000199fbb0 EFLAGS: 00010202
Feb 13 02:00:37 Trinity kernel: RAX: 0000000000000026 RBX: ffff888161350000 RCX: ffffc9000199fb30
Feb 13 02:00:37 Trinity kernel: RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffc9000199fae0
Feb 13 02:00:37 Trinity kernel: RBP: ffff8882130bb000 R08: 0000000000000000 R09: ffffc9000199fb58
Feb 13 02:00:37 Trinity kernel: R10: ffff888201378008 R11: 67acb795000c28ec R12: ffff88829353c000
Feb 13 02:00:37 Trinity kernel: R13: ffffc90001a290f8 R14: ffffc90001a2a140 R15: 0000000000000000
Feb 13 02:00:37 Trinity kernel: FS:  0000000000000000(0000) GS:ffff88985f480000(0000) knlGS:0000000000000000
Feb 13 02:00:37 Trinity kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 13 02:00:37 Trinity kernel: CR2: 000000c0000c7010 CR3: 0000000005416000 CR4: 0000000000752ee0
Feb 13 02:00:37 Trinity kernel: PKRU: 55555554
Feb 13 02:00:37 Trinity kernel: Call Trace:
Feb 13 02:00:37 Trinity kernel: <TASK>
Feb 13 02:00:37 Trinity kernel: ? __warn+0x99/0x11a
Feb 13 02:00:37 Trinity kernel: ? report_bug+0xd9/0x153
Feb 13 02:00:37 Trinity kernel: ? nvidia_dev_put_uuid+0x33/0x4a [nvidia]
Feb 13 02:00:37 Trinity kernel: ? handle_bug+0x53/0x7c
Feb 13 02:00:37 Trinity kernel: ? exc_invalid_op+0x13/0x60
Feb 13 02:00:37 Trinity kernel: ? asm_exc_invalid_op+0x16/0x20
Feb 13 02:00:37 Trinity kernel: ? nvidia_dev_put_uuid+0x33/0x4a [nvidia]
Feb 13 02:00:37 Trinity kernel: nvUvmInterfaceUnregisterGpu+0x28/0x2f [nvidia]
Feb 13 02:00:37 Trinity kernel: uvm_gpu_release_locked+0x27/0x36 [nvidia_uvm]
Feb 13 02:00:37 Trinity kernel: uvm_va_space_destroy+0x36e/0x3c7 [nvidia_uvm]
Feb 13 02:00:37 Trinity kernel: uvm_release.isra.0+0xe2/0x16e [nvidia_uvm]
Feb 13 02:00:37 Trinity kernel: uvm_release_entry.part.0.isra.0+0x47/0x7c [nvidia_uvm]
Feb 13 02:00:37 Trinity kernel: ? nvidia_close+0x19a/0x1e8 [nvidia]
Feb 13 02:00:37 Trinity kernel: ? kmem_cache_free+0x10a/0x14c
Feb 13 02:00:37 Trinity kernel: uvm_release_entry+0x22/0x29 [nvidia_uvm]
Feb 13 02:00:37 Trinity kernel: __fput+0x11c/0x207
Feb 13 02:00:37 Trinity kernel: task_work_run+0x68/0x80
Feb 13 02:00:37 Trinity kernel: do_exit+0x3af/0x90b
Feb 13 02:00:37 Trinity kernel: ? _raw_spin_unlock+0x14/0x29
Feb 13 02:00:37 Trinity kernel: ? futex_unqueue+0x44/0x54
Feb 13 02:00:37 Trinity kernel: do_group_exit+0x7a/0x7a
Feb 13 02:00:37 Trinity kernel: get_signal+0x6a1/0x6d9
Feb 13 02:00:37 Trinity kernel: arch_do_signal_or_restart+0x2a/0x224
Feb 13 02:00:37 Trinity kernel: exit_to_user_mode_prepare+0x53/0x108
Feb 13 02:00:37 Trinity kernel: syscall_exit_to_user_mode+0x14/0x1f
Feb 13 02:00:37 Trinity kernel: do_syscall_64+0x71/0x7b
Feb 13 02:00:37 Trinity kernel: entry_SYSCALL_64_after_hwframe+0x78/0xe2
Feb 13 02:00:37 Trinity kernel: RIP: 0033:0x55b2c0169e43
Feb 13 02:00:37 Trinity kernel: Code: Unable to access opcode bytes at 0x55b2c0169e19.
Feb 13 02:00:37 Trinity kernel: RSP: 002b:000014f85c8f8ac0 EFLAGS: 00000286 ORIG_RAX: 00000000000000ca
Feb 13 02:00:37 Trinity kernel: RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 000055b2c0169e43
Feb 13 02:00:37 Trinity kernel: RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000000c000100f48
Feb 13 02:00:37 Trinity kernel: RBP: 000014f85c8f8b08 R08: 0000000000000000 R09: 0000000000000000
Feb 13 02:00:37 Trinity kernel: R10: 0000000000000000 R11: 0000000000000286 R12: 000014f85c8f8b18
Feb 13 02:00:37 Trinity kernel: R13: 0000000000000000 R14: 000000c000122540 R15: 0000000003ffffff
Feb 13 02:00:37 Trinity kernel: </TASK>
Feb 13 02:00:37 Trinity kernel: ---[ end trace 0000000000000000 ]---
...
Feb 13 02:00:38 Trinity kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:891)
Feb 13 02:00:38 Trinity kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
...
Feb 13 02:00:38 Trinity kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:891)
Feb 13 02:00:38 Trinity kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Feb 13 02:00:38 Trinity kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:891)
Feb 13 02:00:38 Trinity kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
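For a quick overview of which Xid codes a syslog contains, the `NVRM: Xid` lines can be tallied. A minimal sketch, assuming the log format shown above and a syslog at `/var/log/syslog`; the function name `summarize_xids` is made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: list each Xid code found in a syslog file together
# with how often it appears. The default path is an assumption.
summarize_xids() {
  local log="${1:-/var/log/syslog}"
  # Keep only the "NVRM: Xid (<pci address>): <code>" part of each line,
  # take the trailing code, then count identical codes.
  grep -oE 'NVRM: Xid \([^)]*\): [0-9]+' "$log" \
    | awk '{print $NF}' \
    | sort -n | uniq -c \
    | awk '{print "Xid " $2 ": " $1}'
}
```

Run against the excerpt above, this would list Xid 79 and Xid 154 once each.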

February 13, 2025

2 hours ago, ich777 said:

@JorgeB have you seen such messages before?

It does look like a bad DIMM, system event or IPMI log may have additional information, if the board has one.

February 13, 2025

10 hours ago, ich777 said:

...
I assume they have to do with your issue with the driver.

Thanks for the reply!

So, I've had this memory problem for many years. I believe the Chinese Xeon kit I bought in December 2022 came with faulty memory modules, but since it didn't cause me any issues, I simply ignored it until a better time to buy new ones.

However, in June 2024, I bought the P2000, and when I did the first installation, I was able to access the driver settings without any issues.

At the end of 2024, I couldn't anymore, but I left it aside, since everything was working and I still had transcoding in Docker.

I recently picked it up again to better understand the issue, because I wanted to see if I could reduce the energy consumption...

So, finally, I decided to post here in the thread, because I don't know what else to do.

Could it still be related to my memory or motherboard?

February 13, 2025

8 hours ago, JorgeB said:

It does look like a bad DIMM, system event or IPMI log may have additional information, if the board has one.

What's the best way to get this information? 🤔

February 13, 2025

14 hours ago, ich777 said:

It seems that you have two Xid errors, which you can learn more about here; however, I've never seen two Xid errors at the same time.

Xid 79 is not very informative, since it could be anything (firmware bug, temperature, GPU error, ...).

However, from what I see in your syslog you also have a Kernel Trace which is probably related to a Docker container. From my perspective it looks like a container causes this issue. Do you run any machine learning or LLM on your system?

These are the relevant errors from your syslog:

Thanks for looking into this. Yes, I have ollama and it's the only Docker container using the GPU.
To stress test the GPU I loaded the gpu-burn Docker container and ran tests of 1, 5, 10, and 60 minutes at random, with no crashes observed. I also loaded several different LLMs in ollama at random, again with no crashes. It seems ollama is crashing the GPU, but I'm not sure what is causing it or why it mostly happens when the appdata backup runs at night.
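Since the crash seems to coincide with the nightly backup, periodically recording the GPU state may help narrow down the moment it disappears. A minimal sketch, assuming `nvidia-smi` is on the `PATH`; the function name `log_gpu_state` and the default log path `/tmp/gpu-state.log` are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: append one line of GPU state to a log file, or a
# failure marker if nvidia-smi itself fails (for example once the GPU has
# fallen off the bus). The default log path is an assumption.
log_gpu_state() {
  local out="${1:-/tmp/gpu-state.log}"
  if ! nvidia-smi --query-gpu=timestamp,name,temperature.gpu,power.draw \
        --format=csv,noheader >> "$out" 2>&1; then
    echo "$(date '+%F %T') nvidia-smi failed" >> "$out"
  fi
}
```

Calling this once a minute from a scheduled script during the backup window would show the last temperature and power reading before the first "nvidia-smi failed" entry.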

February 14, 2025

Author

13 hours ago, Rafael said:

Could it still be related to my memory or motherboard?

It could be CPU (bad contact on the socket), Motherboard (bad trace), or RAM (bad memory).

13 hours ago, Rafael said:

What's the best way to get this information? 🤔

If your Motherboard has IPMI (built in KVM) then you would find the log there.

February 14, 2025

Hi @ich777,

I'm experiencing a bug after updating from Unraid 6.12.14 to Unraid 7. It involves an interaction between the Nerd Tools plugin and the Nvidia driver plugin: Nerd Tools somehow affects `nvidia-container-cli`, which then gets a permission denied error when it accesses `/proc/sys/kernel/overflowuid` while starting Docker containers that use Nvidia GPUs. The end result is that Docker containers with passed-through Nvidia GPUs fail to start, with the system (not container) logs showing the error message:
`rc.docker: nvidia-container-cli: initialization error: open failed: /proc/sys/kernel/overflowuid: permission denied: unknown`

# Steps to reproduce:

1. Be on Unraid 6.12.x with the NerdTools plugin and the Nvidia driver plugin installed. Have Docker containers with an Nvidia GPU passed through.
2. Upgrade from Unraid 6.12.x to Unraid 7.
3. Restart to complete the upgrade.
4. On restart, Unraid notifies you that the NerdTools plugin is not supported in Unraid 7 and auto-disables it. Remove the plugin using the webgui. At this point, docker containers with nvidia gpus passed through still work as expected.
5. Restart again.
6. On restart, docker containers with passed-through nvidia gpus fail to start with error message `rc.docker: nvidia-container-cli: initialization error: open failed: /proc/sys/kernel/overflowuid: permission denied: unknown`

# Attempts to fix docker containers that fail to start:

1. Uninstall Nvidia drivers, restart, reinstall Nvidia drivers, disable and enable docker service, and finally restart again. This does not fix the issue.
2. Remove the affected docker container and reinstall it. This does not fix the issue.
3. Run Docker Safe New Perms. This does not fix the issue.
4. Panic. This does not fix the issue.

# Notes

Running `nvidia-smi` in the Unraid terminal still works as expected, showing a detected gpu. My gpu also still shows in the GPU Statistics plugin dashboard.
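One more data point that could be collected is whether the file named in the error can be read from the Unraid terminal at all. A minimal sketch; the function name `check_overflowuid` is made up for illustration, and the path is the one from the error message:

```shell
#!/bin/bash
# Hypothetical helper: try to read the file named in the error message and
# report the result. The optional argument exists only so the check can be
# tried against a scratch file.
check_overflowuid() {
  local f="${1:-/proc/sys/kernel/overflowuid}"
  local value
  if value=$(cat "$f" 2>/dev/null); then
    echo "readable: $value"
  else
    echo "not readable"
    return 1
  fi
}
```

This only tests the read from the current shell. The failing open happens inside `nvidia-container-cli`, so a "readable" result here does not rule out the error; it merely shows the file itself exists and is accessible to root.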

# Workaround:

1. Enable "show incompatible apps" in the appstore and reinstall the Nerd Tools plugin. The affected docker containers now start as expected.

The interesting part is that my Nerd Tools plugin is not doing anything - no packages are enabled, it is simply actively installed (screenshot)

This same issue was briefly mentioned by ich777 on the Unraid forums (here) but the cause was unclear.

# Reoccurring Issue:

On reboot, the issue occurs again and the same dockers fail to start with the same error message. In order to fix, I needed to uninstall and reinstall the Nerd Tools plugin again.

Any insight into a permanent fix for this would be greatly appreciated.

msaladunraid-diagnostics-20250214-1748.zip

Edited February 14, 2025 by msalad

February 15, 2025

Author

9 hours ago, msalad said:

`rc.docker: nvidia-container-cli: initialization error: open failed: /proc/sys/kernel/overflowuid: permission denied: unknown` error message.

I created an issue back then for this error on GitHub but it hasn't been an issue in a long time.

https://github.com/NVIDIA/nvidia-container-toolkit/issues/102

However, I don't think that Nerd Tools is the culprit here; this is probably just a coincidence. Did you try installing another plugin yet to see if that fixes the issue too? Maybe it's related to something completely different?

Do you use your GPU in only one container or in multiple containers? If you are using it in multiple containers, which containers?

Please also remove that script from your go file if you want to further troubleshoot since you are moving files for Docker around and maybe that's causing issues:

# -------------------------------------------------
# RAM-Disk for Docker json/log files
# -------------------------------------------------
# create RAM-Disk on starting the docker service
sed -i '/^  echo "starting \$BASE ..."$/i \
  # move json/logs to ram disk\
***line removed***
  mount -t tmpfs tmpfs /var/lib/docker/containers\
***line removed***
  logger -t docker RAM-Disk created' /etc/rc.d/rc.docker
# remove RAM-Disk on stopping the docker service
sed -i '/^  # tear down the bridge$/i \
  # backup json/logs and remove RAM-Disk\
***line removed***
  umount /var/lib/docker/containers\
***line removed***
  logger -t docker RAM-Disk removed' /etc/rc.d/rc.docker
# automatically backup RAM-Disk if Docker tab has been loaded (https://unix.stackexchange.com/a/198543/101920)
sed -i '/^<?PHP$/a \
if (file_exists("/var/lib/docker/containers")) {\
  exec("mkdir /var/lib/docker_bind");\
  exec("mount --bind /var/lib/docker /var/lib/docker_bind");\
  exec("rsync -aH --delete /var/lib/docker/containers/ /var/lib/docker_bind/containers");\
  exec("umount /var/lib/docker_bind");\
  exec("rmdir /var/lib/docker_bind");\
  exec("logger -t docker Created RAM-Disk backup");\
}' /usr/local/emhttp/plugins/dynamix.docker.manager/DockerContainers.page
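To confirm whether the script above is still in effect after editing the go file, one option is to look for its marker text in `rc.docker`. A minimal sketch; the function name `has_ramdisk_patch` is made up for illustration, while the file path and the `logger -t docker RAM-Disk` string are taken from the script above:

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given rc.docker contains the logger
# line that the RAM-Disk script injects, fail otherwise.
has_ramdisk_patch() {
  local rc="${1:-/etc/rc.d/rc.docker}"
  grep -q 'logger -t docker RAM-Disk' "$rc"
}
```

For example, `has_ramdisk_patch && echo "rc.docker is still patched"` should print nothing once the script has been removed from the go file and the server has been rebooted.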

Did you try another driver version yet? I just tested this exact driver version and, for example, the official Jellyfin container starts fine with the Nvidia Driver.

February 18, 2025

I need to download and install Linux x64 (AMD64/EM64T) Display Driver 390.116 | Linux 64-bit to use with my NVIDIA NVS 4200M. I realize it is older but I would be using it to transcode MPEG-1, MPEG-2, VC-1, and H.264 which it is capable of doing. The NVIDIA Driver plugin's oldest version is v470.256.2. I am getting this error: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Could someone please assist me.

February 18, 2025

Author

9 minutes ago, kpwillis said:

Could someone please assist me.

Please read the first recommended post on top of this thread.

February 19, 2025

On 2/15/2025 at 3:10 AM, ich777 said:
I created an issue back then for this error on GitHub but it hasn't been an issue in a long time.

https://github.com/NVIDIA/nvidia-container-toolkit/issues/102

However, I don't think that Nerd Tools is the culprit here; this is probably just a coincidence. Did you try installing another plugin yet to see if that fixes the issue too? Maybe it's related to something completely different?

Do you use your GPU in only one container or in multiple containers? If you are using it in multiple containers, which containers?

Please also remove that script from your go file if you want to further troubleshoot since you are moving files for Docker around and maybe that's causing issues:
[...]
Did you try another driver version yet? I just tested this exact driver version and, for example, the official Jellyfin container starts fine with the Nvidia Driver.

Thanks for taking a look at my post. I looked at the github issue but it seems like for that user, the problem went away by itself? I wish I was that lucky!

I removed that script in my go file and downgraded to nvidia driver 565.77 but the same issue occurs.

Quote

Do you use your GPU in only one container or in multiple containers? If you are using it in multiple containers, which containers?

I currently use my rtx 3060 ti in the following docker containers: whisper-asr-service, immich, plex.

Surprisingly, installing a different plugin (I randomly chose "AppData Cleanup") also fixes this issue - my containers (plex, immich, and whisper-asr-service) can start again as normal. I booted up Unraid, the containers failed to start with the same nvidia-container-cli error message, and then I installed the AppData Cleanup plugin (note I did not uninstall Nerd Tools this time, it is still currently installed). I then tried to start the containers again and they started up no problem. What the heck?

Edited February 19, 2025 by msalad
clarification

February 21, 2025

I'm on unRaid 6.11.5, updating to 6.12.10, and I got the following toast notifications:
image.png.54ce822626faf221ac711c9aabdef6d3.png
I have not received the toast notification that it's safe to reboot yet.

I'm already on a higher driver version than 470.239.06, and I don't seem to have a corrupt plugin like some others in this thread have had. Here's a screenshot from my plugin settings page:

Any help would be appreciated.

February 21, 2025

Author

2 hours ago, Eyeheartpie said:

Any help would be appreciated.

Uninstall the plugin, deactivate autostart for the array, install the plugin, enable autostart for the array in the disk settings again, and reboot again.

I would strongly recommend that you upgrade directly to 7.0.0

Please post your Diagnostics if you experience further issues.

February 21, 2025

my journey with this same issue.

I've been banging my head on my desk for a month trying to figure this out. I even tried using a laptop with Plex and Unraid on it as a stopgap, and I see the same issues on the laptop (it has a 1080). I've tried blowing away my build and starting over (kept the array) and still no luck. I've been using Unraid for about 8 years and I have never had these issues; it worked properly for years until now.

Edited February 21, 2025 by radly82

February 21, 2025

3 hours ago, ich777 said:

Uninstall the plugin, deactivate autostart for the array, install the plugin, enable autostart for the array in the disk settings again, and reboot again.

Do I reboot in between deactivating autostart and then re-installing the plugin?

3 hours ago, ich777 said:

I would strongly recommend that you upgrade directly to 7.0.0

I already started the upgrade process to 6.12.10. Am I able to override this with an upgrade to 7 if I just run it again? Or am I stuck on the upgrade path to 6.12.10?

3 hours ago, ich777 said:

Please post your Diagnostics if you experience further issues.

EDIT: Did the above process with a reboot after uninstalling the plugin and deactivating the autostart. Seems to be working as expected now. I'll keep an eye on it when I move up to the 7.0.0 to see if I have similar issues.

Edited February 21, 2025 by Eyeheartpie

February 22, 2025

Author

3 hours ago, Eyeheartpie said:

Do I reboot in between deactivating autostart and then re-installing the plugin?

Just do it in the order that I said above.

3 hours ago, Eyeheartpie said:

I already started the upgrade process to 6.12.10. Am I able to override this with an upgrade to 7 if I just run it again? Or am I stuck on the upgrade path to 6.12.10?

I'm not sure, but you can upgrade to 6.12.10 and then to 7.0.0

The upgrade process should just work fine from 6.12.10 to 7.0.0 as long as you wait for the notification that it is safe to reboot from the plugin update helper.

Hope that helps.

February 22, 2025

Author

4 hours ago, radly82 said:

my journey with this same issue.

What are you referring to?

However I already answered here:

February 22, 2025

Hey @ich777, do you have any more insight into my nvidia-container-cli issue above? It now does seem unrelated to the Nerd Tools plugin, since I found that I can install a different plugin to fix the issue too. I'm just not sure where to go from here. Having to install/uninstall a plugin after every reboot and then manually start the affected containers isn't ideal long term.
