[Plugin] Nvidia-Driver

April 21

Author

10 hours ago, ranova said:
Failed starting container: failed to create task for container: failed
to create shim task: OCI runtime create failed: runc create failed:
unable to start container process: error during container init: error
running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected
mode as 'legacy'
nvidia-container-cli: ldcache error: process /sbin/ldconfig failed with
error code: 1: unknown

Do you have Diagnostics? Otherwise I'm not able to help.

What are your container startup parameters, in other words, what is your docker run command?

April 21

I would like to confirm that I am having the same issue as others in this thread. Some of my Docker containers, like Emby, still work with the latest Nvidia open source drivers, but steam-headless and ollama refuse to start and show the exact same logs as previously mentioned.

I will attempt to downgrade my drivers, because this started for me once I updated both steam-headless and ollama to their latest Docker images. Emby has not had a recent update, and I assume that's why it's still working.

And unfortunately I don't have a current backup of my docker images. 😞

April 21

9 hours ago, ich777 said:
Do you have Diagnostics? Otherwise I'm not able to help.
What are your container startup parameters so to speak what is your docker run command?

I have the same issue.

I was using the following command in Unraid terminal to check if the Nvidia driver was working...
```
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
```

Output:
```
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: ldcache error: process /sbin/ldconfig failed with error code: 1

Run 'docker run --help' for more information
```

April 21

Author

34 minutes ago, Mrtj18 said:
And unfortunately I don't have a current backup of my docker images. 😞

30 minutes ago, eyesfit said:
I have the same issue.

Please guys, you've all read the previous posts, where are your Diagnostics?

It is pretty useless to say "I have the same issue" when you post no Diagnostics at all. I don't even know which Unraid version you are on, nor which exact driver you are all using.

Do you all use --gpus all?
Please remove that parameter and add the variables mentioned in the second post instead.

April 21

2 hours ago, ich777 said:
Please guys, you've all read the previous posts, where are your Diagnostics?
It is pretty useless to say I have the same issue when you post no diagnostics at all. I even don't know on which Unraid version you are nor do I know what exact driver you all on.
Do you all use --gpus all?
Please remove that variable and add the variables like mentioned in the second post.

I can confirm, after re-reading your posts and checking the container setup I had for ollama, that I had the --gpus=all flag in the Extra Parameters section, which kept ollama from starting. As soon as I replaced it with the --runtime=nvidia flag you mention in the second post of this thread, the container started.

I also removed the --gpus=all flag from all my other containers and put the actual GPU ID in the Nvidia Visible Devices field, where I previously had "all".
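The flag swap described above can be sketched as a small shell helper. This is only an illustration: `swap_gpus_flag` is a hypothetical name, and the function merely rewrites an Extra Parameters string; it does not touch any Unraid template. The variables from the second post (such as the GPU ID under Nvidia Visible Devices) still have to be set in the template itself.

```shell
#!/bin/bash
# Hypothetical helper (not part of the plugin): replace a --gpus flag in an
# "Extra Parameters" string with --runtime=nvidia.
swap_gpus_flag() {
    # Matches both "--gpus=all" and "--gpus all"; everything else is left untouched.
    printf '%s\n' "$1" | sed -E 's/--gpus[= ]all/--runtime=nvidia/g'
}

# Example: prints "--runtime=nvidia --restart unless-stopped"
swap_gpus_flag "--gpus=all --restart unless-stopped"

# The GPU ID for the Nvidia Visible Devices field can be listed on the host with:
#   nvidia-smi -L
```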

My steam-headless problem is an issue with the container itself. For some reason it does not play well with the latest Nvidia drivers. I see other reports of users downgrading to the 580.142 driver. I will probably just wait for a new update to the container, or try to find the tag of the previous Docker image release, because I didn't have the issue before upgrading.

So sorry, user error once again. lol

April 22

Hi,

I wanted to upgrade my system so I could run models locally, directly on my server. To do this, I installed an RTX 5070 Ti. I followed a lot of the instructions I saw in this thread, such as uninstalling and reinstalling the NVIDIA driver. I even asked Perplexity, which said that my driver isn't loading:

“The driver loads, but it fails when initializing the card because Unraid assigns it an invalid PCIe region (This PCI I/O region assigned to your NVIDIA device is invalid, probe ... failed with error -1).”

tower-diagnostics-20260422-1116.zip

Edited April 22 by Nono@Server

April 22

4 hours ago, Nono@Server said:
Hi,
I wanted to upgrade my system so I could use the models locally directly on my server. To do this, I installed an RTX 5070 TI. I followed a lot of the instructions I saw in this thread, such as uninstalling and reinstalling the NVIDIA driver. I even spoke with Perplexity, who said that my driver isn't loading
“The driver loads, but it fails when initializing the card because Unraid assigns it an invalid PCIe region (This PCI I/O region assigned to your NVIDIA device is invalid, probe ... failed with error -1).”

tower-diagnostics-20260422-1116.zip

You need to boot in UEFI mode; you're currently using legacy. Also check that Resizable BAR Support and Above 4G Decoding are enabled.

April 22

26 minutes ago, Mainfrezzer said:
you need to boot in UEFI mode, youre currently using legacy. Also check that Resizable BAR Support and Above 4G Decoding is enabled

I tried booting into UEFI, but it just takes me straight back to the BIOS, and I don't know why. I don't think I had this problem before I replaced my motherboard at the same time as my graphics card.

Maybe I don't have UEFI on the Unraid USB drive.

April 22

5 minutes ago, Nono@Server said:
I tried booting into UEFI, but it just takes me straight back to the BIOS, and I don't know why. I don't think I had this problem before I replaced my motherboard at the same time as my graphics card.
Maybe I don't have UEFI on the Unraid USB drive.

You can either click this checkbox in the Unraid WebGUI

or rename the EFI- folder to EFI.

I'm going to assume that the motherboard is currently set to boot UEFI and legacy BIOS at the same time.
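The folder rename can be sketched in shell, assuming the flash drive is mounted at /boot (the usual Unraid location) and that /sys/firmware/efi only exists when the kernel was started by UEFI firmware. Both function names are made up for this example.

```shell
#!/bin/bash
# Report how the running system was booted.
boot_mode() {
    # The kernel only creates /sys/firmware/efi when it was started by UEFI firmware.
    if [ -d /sys/firmware/efi ]; then echo "UEFI"; else echo "legacy"; fi
}

# Rename "EFI-" to "EFI" on the flash drive so UEFI firmware can find the boot loader.
# The first argument is the flash mount point (assumed to be /boot on Unraid).
enable_uefi_boot() {
    local flash="${1:-/boot}"
    if [ -d "$flash/EFI" ]; then
        echo "EFI folder already present"
    elif [ -d "$flash/EFI-" ]; then
        mv "$flash/EFI-" "$flash/EFI" && echo "renamed EFI- to EFI"
    else
        echo "no EFI- folder found in $flash" >&2
        return 1
    fi
}

boot_mode
# enable_uefi_boot /boot   # run on the server, then reboot and pick the UEFI boot entry
```

Running `boot_mode` again after the reboot should print "UEFI" if the firmware picked the right entry.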

April 22

54 minutes ago, Mainfrezzer said:
You can either click this checkbox in the unraid webgui

or rename the EFI- folder to EFI

Im gonna assume now that the motherboard is set to boot uefi/bios at the same time

Thank you. After allowing the USB drive to boot in UEFI mode, the card is now detected properly.

April 22

For me this is driver version related, specifically since 595.58.03, so I assume it's an Nvidia problem and not a problem with the driver plugin.

I have an Intel integrated GPU along with an Nvidia RTX 4060.

The 4060 is passed to a gaming VM when it's started and handed back to the host when the VM stops, so that power management persists, by use of the following script in /etc/libvirt/hooks/qem

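#!/bin/bash
# libvirt passes the hook operation as the second argument.
if [ "$2" == "prepare" ]
then
    # VM is about to start: stop the daemon so the GPU can be handed to the VM
    killall nvidia-persistenced
elif [ "$2" == "release" ]
then
    # VM has stopped: restart the daemon and set runtime power management
    # to "auto" for the matched PCI devices
    nvidia-persistenced
    echo auto | tee /sys/bus/pci/devices/????:??:??.?/power/control
fi

exit 0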

This worked fine until I updated the driver to 595.58.03, at which point I started seeing the following when shutting the VM down. As far as my limited understanding goes, the host was trying to take back control of the audio part of the 4060 before the VM had released it.

Apr 15 09:24:07 myunraid kernel: vfio-pci 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none
Apr 15 09:24:07 myunraid kernel: nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
Apr 15 09:24:07 myunraid kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Apr 15 09:24:10 myunraid kernel: [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 2
Apr 15 09:24:10 myunraid kernel: nvidia 0000:01:00.0: vgaarb: deactivate vga console
Apr 15 09:24:10 myunraid kernel: fbcon: nvidia-drmdrmfb (fb0) is primary device
Apr 15 09:24:10 myunraid kernel: BUG: kernel NULL pointer dereference, address: 000000000000001c
Apr 15 09:24:10 myunraid kernel: #PF: supervisor read access in kernel mode
Apr 15 09:24:10 myunraid kernel: #PF: error_code(0x0000) - not-present page
Apr 15 09:24:10 myunraid kernel: PGD 0 P4D 0 
Apr 15 09:24:10 myunraid kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
Apr 15 09:24:10 myunraid kernel: CPU: 7 UID: 0 PID: 2118002 Comm: qemu-event Tainted: P     U     O       6.12.54-Unraid #1
Apr 15 09:24:10 myunraid kernel: Tainted: [P]=PROPRIETARY_MODULE, [U]=USER, [O]=OOT_MODULE
Apr 15 09:24:10 myunraid kernel: Hardware name: Micro-Star International Co., Ltd. MS-7E13/MAG B760M MORTAR WIFI II (MS-7E13), BIOS 1.B0 03/04/2026
Apr 15 09:24:10 myunraid kernel: RIP: 0010:nv_audio_dynamic_power+0x8f/0x110 [nvidia]
Apr 15 09:24:10 myunraid kernel: Code: c0 74 7c f6 80 a4 01 00 00 10 75 73 48 8b 80 40 01 00 00 48 85 c0 74 67 48 8b 90 a8 01 00 00 48 05 a0 01 00 00 48 39 c2 74 55 <83> 7a 1c 03 75 10 48 8b 7a 20 48 83 bf 40 03 00 00 00 75 08 eb 3f
Apr 15 09:24:10 myunraid kernel: RSP: 0018:ffffc9000b126b98 EFLAGS: 00010207
Apr 15 09:24:10 myunraid kernel: RAX: ffff8881404d39a0 RBX: 0000000000000000 RCX: ffff888101570bb8
Apr 15 09:24:10 myunraid kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8881017fc100
Apr 15 09:24:10 myunraid kernel: RBP: ffff888161c85bf0 R08: ffff8881017fb000 R09: 00000000ffffffff
Apr 15 09:24:10 myunraid kernel: R10: 00000000ffffffff R11: ffffffffa65b06a8 R12: ffff888161c85c50
Apr 15 09:24:10 myunraid kernel: R13: ffffffffa17c3dc0 R14: ffff888345c36820 R15: ffff888161c85d50
Apr 15 09:24:10 myunraid kernel: FS:  0000152e70eff6c0(0000) GS:ffff88885f5c0000(0000) knlGS:0000000000000000 .........

To remedy this I changed the hooks script to

#!/bin/bash
if [ "$2" == "prepare" ]; then
    killall nvidia-persistenced 2>/dev/null
elif [ "$2" == "release" ]; then
    nvidia-persistenced
fi
exit 0

and created /boot/config/modprobe.d/nvidia.conf containing "options nvidia NVreg_DynamicPowerManagement=0x00" (without the quotes).

I also appended the boot option "nvidia_drm.fbdev=0" (without the quotes) in the Syslinux Configuration.
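Creating that file can be sketched as follows. This assumes the module parameter is spelled `NVreg_DynamicPowerManagement`, as in NVIDIA's driver README; `write_nvidia_modprobe_conf` is a hypothetical helper, and the example writes to a scratch directory instead of the flash drive.

```shell
#!/bin/bash
# Write the modprobe option that switches off NVIDIA dynamic power management.
# The first argument is the target directory (assumed default: Unraid's flash config).
write_nvidia_modprobe_conf() {
    local conf_dir="${1:-/boot/config/modprobe.d}"
    mkdir -p "$conf_dir" || return 1
    printf 'options nvidia NVreg_DynamicPowerManagement=0x00\n' > "$conf_dir/nvidia.conf"
}

# Example run against a scratch directory so nothing on the flash drive is touched.
scratch="$(mktemp -d)"
write_nvidia_modprobe_conf "$scratch"
cat "$scratch/nvidia.conf"
rm -rf "$scratch"

# After a reboot, the value the driver actually uses can be checked with:
#   grep DynamicPowerManagement /proc/driver/nvidia/params
```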

This worked fine when the VM was shut down. However, when I attempted to stop all containers, binhex-krusader (I think) was somehow holding on to the 4060, as shown below:

Apr 22 17:00:28 myunraid kernel: [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 2
Apr 22 17:00:28 myunraid nvidia-persistenced: Started (37875)
Apr 22 17:01:06 myunraid kernel: docker0: port 4(vethdb03243) entered disabled state
Apr 22 17:01:06 myunraid kernel: veth88185cb: renamed from eth0
Apr 22 17:01:07 myunraid kernel: docker0: port 4(vethdb03243) entered disabled state
Apr 22 17:01:07 myunraid kernel: vethdb03243 (unregistering): left allmulticast mode
Apr 22 17:01:07 myunraid kernel: vethdb03243 (unregistering): left promiscuous mode
Apr 22 17:01:07 myunraid kernel: docker0: port 4(vethdb03243) entered disabled state
Apr 22 17:01:07 myunraid kernel: docker0: port 1(veth200eb14) entered disabled state
Apr 22 17:01:07 myunraid kernel: vethb8b2976: renamed from eth0
Apr 22 17:01:07 myunraid kernel: docker0: port 1(veth200eb14) entered disabled state
Apr 22 17:01:07 myunraid kernel: veth200eb14 (unregistering): left allmulticast mode
Apr 22 17:01:07 myunraid kernel: veth200eb14 (unregistering): left promiscuous mode
Apr 22 17:01:07 myunraid kernel: docker0: port 1(veth200eb14) entered disabled state
Apr 22 17:01:09 myunraid kernel: veth9c6c9dd: renamed from eth0
Apr 22 17:01:09 myunraid kernel: docker0: port 3(vethcdc26e9) entered disabled state
Apr 22 17:01:09 myunraid kernel: veth3146c42: renamed from eth0
Apr 22 17:01:09 myunraid kernel: docker0: port 3(vethcdc26e9) entered disabled state
Apr 22 17:01:09 myunraid kernel: vethcdc26e9 (unregistering): left allmulticast mode
Apr 22 17:01:09 myunraid kernel: vethcdc26e9 (unregistering): left promiscuous mode
Apr 22 17:01:09 myunraid kernel: docker0: port 3(vethcdc26e9) entered disabled state
Apr 22 17:01:10 myunraid kernel: br-ba13ce51d72c: port 4(vethadd7090) entered disabled state
Apr 22 17:01:10 myunraid kernel: veth899b005: renamed from eth0
Apr 22 17:01:10 myunraid kernel: br-ba13ce51d72c: port 4(vethadd7090) entered disabled state
Apr 22 17:01:10 myunraid kernel: vethadd7090 (unregistering): left allmulticast mode
Apr 22 17:01:10 myunraid kernel: vethadd7090 (unregistering): left promiscuous mode
Apr 22 17:01:10 myunraid kernel: br-ba13ce51d72c: port 4(vethadd7090) entered disabled state
Apr 22 17:01:10 myunraid kernel: br-ba13ce51d72c: port 3(veth09f6b7d) entered disabled state
Apr 22 17:01:10 myunraid kernel: veth8511e99: renamed from eth0
Apr 22 17:01:10 myunraid kernel: br-ba13ce51d72c: port 3(veth09f6b7d) entered disabled state
Apr 22 17:01:10 myunraid kernel: veth09f6b7d (unregistering): left allmulticast mode
Apr 22 17:01:10 myunraid kernel: veth09f6b7d (unregistering): left promiscuous mode
Apr 22 17:01:10 myunraid kernel: br-ba13ce51d72c: port 3(veth09f6b7d) entered disabled state
Apr 22 17:01:11 myunraid kernel: br-ba13ce51d72c: port 2(vethc9c0771) entered disabled state
Apr 22 17:01:11 myunraid kernel: vetha24d22e: renamed from eth0
Apr 22 17:01:11 myunraid kernel: br-ba13ce51d72c: port 2(vethc9c0771) entered disabled state
Apr 22 17:01:11 myunraid kernel: vethc9c0771 (unregistering): left allmulticast mode
Apr 22 17:01:11 myunraid kernel: vethc9c0771 (unregistering): left promiscuous mode
Apr 22 17:01:11 myunraid kernel: br-ba13ce51d72c: port 2(vethc9c0771) entered disabled state
Apr 22 17:01:11 myunraid kernel: br-ba13ce51d72c: port 1(veth7559afd) entered disabled state
Apr 22 17:01:11 myunraid kernel: vethe345a06: renamed from eth0
Apr 22 17:01:11 myunraid kernel: br-ba13ce51d72c: port 1(veth7559afd) entered disabled state
Apr 22 17:01:11 myunraid kernel: veth7559afd (unregistering): left allmulticast mode
Apr 22 17:01:11 myunraid kernel: veth7559afd (unregistering): left promiscuous mode
Apr 22 17:01:11 myunraid kernel: br-ba13ce51d72c: port 1(veth7559afd) entered disabled state
Apr 22 17:01:11 myunraid kernel: Oops: general protection fault, probably for non-canonical address 0x7369645f766564b6: 0000 [#1] PREEMPT SMP NOPTI
Apr 22 17:01:11 myunraid kernel: CPU: 6 UID: 99 PID: 24821 Comm: Xvnc Tainted: P     U     O       6.12.54-Unraid #1
Apr 22 17:01:11 myunraid kernel: Tainted: [P]=PROPRIETARY_MODULE, [U]=USER, [O]=OOT_MODULE
Apr 22 17:01:11 myunraid kernel: Hardware name: Micro-Star International Co., Ltd. MS-7E13/MAG B760M MORTAR WIFI II (MS-7E13), BIOS 1.B0 03/04/2026
Apr 22 17:01:11 myunraid kernel: RIP: 0010:_nv000022kms+0xaa/0xbb0 [nvidia_modeset]
Apr 22 17:01:11 myunraid kernel: Code: 00 83 c1 01 48 05 78 0e 00 00 83 f9 04 75 bc be 01 00 00 00 bf 40 52 00 00 e8 32 02 f9 ff 49 89 c5 48 85 c0 0f 84 9e 05 00 00 <41> 8b 47 40 49 8d b5 20 0c 00 00 31 d2 45 31 f6 48 89 75 a8 41 89
Apr 22 17:01:11 myunraid kernel: RSP: 0000:ffffc90021f97960 EFLAGS: 00010282
Apr 22 17:01:11 myunraid kernel: RAX: ffffc90000c56008 RBX: ffff8881c6bdc000 RCX: 0000000000000000
Apr 22 17:01:11 myunraid kernel: RDX: 0000000000005248 RSI: 0000000000000000 RDI: ffffc90000c5b248
Apr 22 17:01:11 myunraid kernel: RBP: ffffc90021f979d0 R08: 000000000000000c R09: ffffc90000c56000
Apr 22 17:01:11 myunraid kernel: R10: 0000000000000006 R11: ffff888197974070 R12: ffffffffa06d7a80
Apr 22 17:01:11 myunraid kernel: R13: ffffc90000c56008 R14: 0000000000000000 R15: 7369645f76656476
Apr 22 17:01:11 myunraid kernel: FS:  0000000000000000(0000) GS:ffff88885f580000(0000) knlGS:0000000000000000
Apr 22 17:01:11 myunraid kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 22 17:01:11 myunraid kernel: CR2: 00001474ae97e6b0 CR3: 0000000005618002 CR4: 0000000000772ef0
Apr 22 17:01:11 myunraid kernel: PKRU: 55555554
Apr 22 17:01:11 myunraid kernel: Call Trace:
Apr 22 17:01:11 myunraid kernel: <TASK>
Apr 22 17:01:11 myunraid kernel: ? prep_new_page+0x32/0x50
Apr 22 17:01:11 myunraid kernel: ? get_page_from_freelist+0x803/0x950
Apr 22 17:01:11 myunraid kernel: ? nv_drm_atomic_apply_modeset_config+0x41a/0x570 [nvidia_drm]
Apr 22 17:01:11 myunraid kernel: ? drm_atomic_check_only+0x6f3/0x820 [drm]
Apr 22 17:01:11 myunraid kernel: ? nv_drm_atomic_state_alloc+0x1b/0x60 [nvidia_drm]
Apr 22 17:01:11 myunraid kernel: ? lruvec_stat_mod_folio.constprop.0+0x10/0x20
Apr 22 17:01:11 myunraid kernel: ? drm_atomic_commit+0x6b/0xd0 [drm]
Apr 22 17:01:11 myunraid kernel: ? __pfx___drm_printfn_info+0x10/0x10 [drm]
Apr 22 17:01:11 myunraid kernel: ? __pfx___drm_printfn_info+0x10/0x10 [drm]
Apr 22 17:01:11 myunraid kernel: ? nv_drm_revoke_modeset_permission+0x1a0/0x220 [nvidia_drm]
Apr 22 17:01:11 myunraid kernel: ? kmem_cache_free_bulk+0x197/0x1f0
Apr 22 17:01:11 myunraid kernel: ? nv_drm_master_drop+0x2f/0x1c0 [nvidia_drm]
Apr 22 17:01:11 myunraid kernel: ? __pfx_drm_gem_object_release_handle+0x10/0x10 [drm]
Apr 22 17:01:11 myunraid kernel: ? drm_drop_master+0x18/0x30 [drm]
Apr 22 17:01:11 myunraid kernel: ? drm_master_release+0x57/0xa0 [drm]
Apr 22 17:01:11 myunraid kernel: ? drm_file_free+0x182/0x1e0 [drm]
Apr 22 17:01:11 myunraid kernel: ? drm_release+0x5c/0xa0 [drm]
Apr 22 17:01:11 myunraid kernel: ? __fput+0x10a/0x1d0
Apr 22 17:01:11 myunraid kernel: ? task_work_run+0x68/0x80
Apr 22 17:01:11 myunraid kernel: ? do_exit+0x36f/0x8d0
Apr 22 17:01:11 myunraid kernel: ? mtree_range_walk+0xf3/0x170
Apr 22 17:01:11 myunraid kernel: ? do_group_exit+0x79/0x80
Apr 22 17:01:11 myunraid kernel: ? get_signal+0x617/0x660
Apr 22 17:01:11 myunraid kernel: ? arch_do_signal_or_restart+0x28/0x1d0
Apr 22 17:01:11 myunraid kernel: ? fatal_signal_pending+0x9/0x30
Apr 22 17:01:11 myunraid kernel: ? fault_signal_pending+0x1b/0x60
Apr 22 17:01:11 myunraid kernel: ? do_user_addr_fault+0x2c0/0x490
Apr 22 17:01:11 myunraid kernel: ? irqentry_exit_to_user_mode+0x49/0x80
Apr 22 17:01:11 myunraid kernel: ? asm_exc_page_fault+0x22/0x30
Apr 22 17:01:11 myunraid kernel: </TASK>
Apr 22 17:01:11 myunraid kernel: Modules linked in: xt_CHECKSUM ipt_REJECT nf_reject_ipv4 vhost_net tun vhost vhost_iotlb tap ipvlan br_netfilter nf_conntrack_netlink xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE xfrm_user xfrm_algo ip6table_nat iptable_nat nf_nat nvidia_uvm(PO) af_packet nfnetlink dm_crypt dm_mod md_mod zfs(PO) spl(O) ntfs3 tcp_diag inet_diag i915(O) intel_gtt iptable_mangle xt_addrtype iptable_raw xt_comment xt_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_mark ip6table_mangle ip6table_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc atlantic r8125(O) intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd rapl intel_cstate mei_pxp mei_hdcp iwlmvm nvidia_drm(PO)
Apr 22 17:01:11 myunraid kernel: nvidia_modeset(PO) xe mac80211 btusb btrtl drm_gpuvm btbcm libarc4 btintel drm_exec nvidia(PO) iwlwifi gpu_sched drm_buddy bluetooth i2c_algo_bit drm_suballoc_helper drm_display_helper drm_ttm_helper ttm sr_mod input_leds drm_kms_helper ecdh_generic cdrom cfg80211 led_class ecc joydev drm intel_uncore wmi_bmof mxm_wmi i2c_i801 mei_me i2c_smbus mei rfkill agpgart i2c_core thermal fan video wmi backlight acpi_tad acpi_pad button [last unloaded: atlantic]
Apr 22 17:01:11 myunraid kernel: ---[ end trace 0000000000000000 ]---
Apr 22 17:01:11 myunraid kernel: pstore: backend (efi_pstore) writing error (-28)
Apr 22 17:01:11 myunraid kernel: RIP: 0010:_nv000022kms+0xaa/0xbb0 [nvidia_modeset]
Apr 22 17:01:11 myunraid kernel: Code: 00 83 c1 01 48 05 78 0e 00 00 83 f9 04 75 bc be 01 00 00 00 bf 40 52 00 00 e8 32 02 f9 ff 49 89 c5 48 85 c0 0f 84 9e 05 00 00 <41> 8b 47 40 49 8d b5 20 0c 00 00 31 d2 45 31 f6 48 89 75 a8 41 89
Apr 22 17:01:11 myunraid kernel: RSP: 0000:ffffc90021f97960 EFLAGS: 00010282
Apr 22 17:01:11 myunraid kernel: RAX: ffffc90000c56008 RBX: ffff8881c6bdc000 RCX: 0000000000000000
Apr 22 17:01:11 myunraid kernel: RDX: 0000000000005248 RSI: 0000000000000000 RDI: ffffc90000c5b248
Apr 22 17:01:11 myunraid kernel: RBP: ffffc90021f979d0 R08: 000000000000000c R09: ffffc90000c56000
Apr 22 17:01:11 myunraid kernel: R10: 0000000000000006 R11: ffff888197974070 R12: ffffffffa06d7a80
Apr 22 17:01:11 myunraid kernel: R13: ffffc90000c56008 R14: 0000000000000000 R15: 7369645f76656476
Apr 22 17:01:11 myunraid kernel: FS:  0000000000000000(0000) GS:ffff88885f580000(0000) knlGS:0000000000000000
Apr 22 17:01:11 myunraid kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 22 17:01:11 myunraid kernel: CR2: 00001474ae97e6b0 CR3: 00000002aaf52006 CR4: 0000000000772ef0
Apr 22 17:01:11 myunraid kernel: PKRU: 55555554
Apr 22 17:01:11 myunraid kernel: Fixing recursive fault but reboot is needed!
Apr 22 17:01:11 myunraid kernel: BUG: using smp_processor_id() in preemptible [00000000] code: Xvnc/24821
Apr 22 17:01:11 myunraid kernel: caller is __schedule+0x2d/0x760
Apr 22 17:01:11 myunraid kernel: CPU: 6 UID: 99 PID: 24821 Comm: Xvnc Tainted: P     UD    O       6.12.54-Unraid #1
Apr 22 17:01:11 myunraid kernel: Tainted: [P]=PROPRIETARY_MODULE, [U]=USER, [D]=DIE, [O]=OOT_MODULE
Apr 22 17:01:11 myunraid kernel: Hardware name: Micro-Star International Co., Ltd. MS-7E13/MAG B760M MORTAR WIFI II (MS-7E13), BIOS 1.B0 03/04/2026
Apr 22 17:01:11 myunraid kernel: Call Trace:
Apr 22 17:01:11 myunraid kernel: <TASK>
Apr 22 17:01:11 myunraid kernel: dump_stack_lvl+0x47/0x70
Apr 22 17:01:11 myunraid kernel: check_preemption_disabled+0xb7/0xd0
Apr 22 17:01:11 myunraid kernel: __schedule+0x2d/0x760
Apr 22 17:01:11 myunraid kernel: ? _printk+0x58/0x80
Apr 22 17:01:11 myunraid kernel: do_task_dead+0x3e/0x40
Apr 22 17:01:11 myunraid kernel: make_task_dead+0xfc/0x110
Apr 22 17:01:11 myunraid kernel: rewind_stack_and_make_dead+0x16/0x20
Apr 22 17:01:11 myunraid kernel: RIP: 0033:0x1474aefff6c0
Apr 22 17:01:11 myunraid kernel: Code: Unable to access opcode bytes at 0x1474aefff696.
Apr 22 17:01:11 myunraid kernel: RSP: 002b:00007ffcd73ac978 EFLAGS: 00010246
Apr 22 17:01:11 myunraid kernel: RAX: 00001474aefff6c0 RBX: 00001474b9f08fa8 RCX: 0000000000000001
Apr 22 17:01:11 myunraid kernel: RDX: 0000000000000003 RSI: 0000000000000000 RDI: 00001474b635dc00
Apr 22 17:01:11 myunraid kernel: RBP: 00007ffcd73ac9d0 R08: 00001474b635dc00 R09: 0000000000000007
Apr 22 17:01:11 myunraid kernel: R10: 000055dac13cbf80 R11: 1556005f37d461e2 R12: 000000000000096c
Apr 22 17:01:11 myunraid kernel: R13: 0000000000000000 R14: 00001474b9f07680 R15: 000055dac13c83f0
Apr 22 17:01:11 myunraid kernel: </TASK>
Apr 22 17:01:11 myunraid kernel: BUG: scheduling while atomic: Xvnc/24821/0x00000000
Apr 22 17:01:11 myunraid kernel: Modules linked in: xt_CHECKSUM ipt_REJECT nf_reject_ipv4 vhost_net tun vhost vhost_iotlb tap ipvlan br_netfilter nf_conntrack_netlink xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE xfrm_user xfrm_algo ip6table_nat iptable_nat nf_nat nvidia_uvm(PO) af_packet nfnetlink dm_crypt dm_mod md_mod zfs(PO) spl(O) ntfs3 tcp_diag inet_diag i915(O) intel_gtt iptable_mangle xt_addrtype iptable_raw xt_comment xt_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_mark ip6table_mangle ip6table_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc atlantic r8125(O) intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd rapl intel_cstate mei_pxp mei_hdcp iwlmvm nvidia_drm(PO)
Apr 22 17:01:11 myunraid kernel: nvidia_modeset(PO) xe mac80211 btusb btrtl drm_gpuvm btbcm libarc4 btintel drm_exec nvidia(PO) iwlwifi gpu_sched drm_buddy bluetooth i2c_algo_bit drm_suballoc_helper drm_display_helper drm_ttm_helper ttm sr_mod input_leds drm_kms_helper ecdh_generic cdrom cfg80211 led_class ecc joydev drm intel_uncore wmi_bmof mxm_wmi i2c_i801 mei_me i2c_smbus mei rfkill agpgart i2c_core thermal fan video wmi backlight acpi_tad acpi_pad button [last unloaded: atlantic]
Apr 22 17:01:11 myunraid kernel: Preemption disabled at:
Apr 22 17:01:11 myunraid kernel: [<0000000000000000>] 0x0
Apr 22 17:01:11 myunraid kernel: CPU: 6 UID: 99 PID: 24821 Comm: Xvnc Tainted: P     UD    O       6.12.54-Unraid #1
Apr 22 17:01:11 myunraid kernel: Tainted: [P]=PROPRIETARY_MODULE, [U]=USER, [D]=DIE, [O]=OOT_MODULE
Apr 22 17:01:11 myunraid kernel: Hardware name: Micro-Star International Co., Ltd. MS-7E13/MAG B760M MORTAR WIFI II (MS-7E13), BIOS 1.B0 03/04/2026
Apr 22 17:01:11 myunraid kernel: Call Trace:
Apr 22 17:01:11 myunraid kernel: <TASK>
Apr 22 17:01:11 myunraid kernel: dump_stack_lvl+0x47/0x70
Apr 22 17:01:11 myunraid kernel: __schedule_bug+0x85/0xa0
Apr 22 17:01:11 myunraid kernel: __schedule+0x63/0x760
Apr 22 17:01:11 myunraid kernel: ? _printk+0x58/0x80
Apr 22 17:01:11 myunraid kernel: do_task_dead+0x3e/0x40
Apr 22 17:01:11 myunraid kernel: make_task_dead+0xfc/0x110
Apr 22 17:01:11 myunraid kernel: rewind_stack_and_make_dead+0x16/0x20
Apr 22 17:01:11 myunraid kernel: RIP: 0033:0x1474aefff6c0
Apr 22 17:01:11 myunraid kernel: Code: Unable to access opcode bytes at 0x1474aefff696.
Apr 22 17:01:11 myunraid kernel: RSP: 002b:00007ffcd73ac978 EFLAGS: 00010246
Apr 22 17:01:11 myunraid kernel: RAX: 00001474aefff6c0 RBX: 00001474b9f08fa8 RCX: 0000000000000001
Apr 22 17:01:11 myunraid kernel: RDX: 0000000000000003 RSI: 0000000000000000 RDI: 00001474b635dc00
Apr 22 17:01:11 myunraid kernel: RBP: 00007ffcd73ac9d0 R08: 00001474b635dc00 R09: 0000000000000007
Apr 22 17:01:11 myunraid kernel: R10: 000055dac13cbf80 R11: 1556005f37d461e2 R12: 000000000000096c
Apr 22 17:01:11 myunraid kernel: R13: 0000000000000000 R14: 00001474b9f07680 R15: 000055dac13c83f0
Apr 22 17:01:11 myunraid kernel: </TASK>

Rolled the driver back to 590.48.01 and all is well in my world again

I've attached diagnostics taken with each driver installed

myunraid-diagnostics-20260422-1702-Nvidia595.58.03.zip myunraid-diagnostics-20260422-1647-Nvidia590.48.01 .zip

Edited April 22 by fka

April 22

On 4/21/2026 at 8:39 AM, ich777 said:
Please guys, you've all read the previous posts, where are your Diagnostics?
It is pretty useless to say I have the same issue when you post no diagnostics at all. I even don't know on which Unraid version you are nor do I know what exact driver you all on.
Do you all use --gpus all?
Please remove that variable and add the variables like mentioned in the second post.

Thank you for your work and that information, I guess I was using some legacy nvidia docker commands:

services:
  llama-swap:
    image: ghcr.io/mostlygeek/llama-swap:cuda13
    container_name: llama-swap-cuda
    restart: unless-stopped
    ports:
      - "8887:8080"
    volumes:
      - /mnt/user/AI/models:/models
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
      - LD_LIBRARY_PATH=/custom-bin/bin:/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu
      - GGML_CUDA_FORCE_FA3=1
      - GGML_CUDA_GRAPH_OPT=1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    entrypoint: ["/app/llama-swap"]
    command: ["--config", "/app/config.yaml", "--listen", "0.0.0.0:8080"]

I updated it to the following and it's now working again:

version: "3.8"

services:
  llama-swap:
    image: ghcr.io/mostlygeek/llama-swap:cuda13
    container_name: llama-swap-cuda
    restart: unless-stopped
    runtime: nvidia
    ports:
      - "8887:8080"
    volumes:
      - /mnt/samsungevo/AI/models:/models
    environment:
      - NVIDIA_VISIBLE_DEVICES=GPU-xxx-xxx-xxx-xxxx-xxxx,GPU-xxx-xxx-xxx-xxxx-xxxx
      - NVIDIA_DRIVER_CAPABILITIES=all
      - LD_LIBRARY_PATH=/custom-bin/bin:/custom-bin-turbo/bin:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu
      - GGML_CUDA_FORCE_FA3=1
      - GGML_CUDA_GRAPH_OPT=1
    entrypoint: ["/app/llama-swap"]
    command: ["--config", "/app/config.yaml", "--listen", "0.0.0.0:8080"]
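The GPU-xxx placeholders stand for the UUIDs of the cards. A minimal sketch of how such a comma-separated value could be assembled, assuming `nvidia-smi` is available on the host; `join_gpu_uuids` is a hypothetical helper name and the UUIDs in the example are dummies.

```shell
#!/bin/bash
# Join GPU UUIDs (one per line on stdin) into a comma-separated list that can be
# used as the NVIDIA_VISIBLE_DEVICES value.
join_gpu_uuids() {
    paste -sd, -
}

# Example with dummy UUIDs: prints "GPU-aaaa,GPU-bbbb"
printf 'GPU-aaaa\nGPU-bbbb\n' | join_gpu_uuids

# On a host with the driver loaded, the real UUIDs could come from:
#   nvidia-smi --query-gpu=uuid --format=csv,noheader | join_gpu_uuids
```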

April 23

I just updated to the 7.3 RC and ran into what looks like the same issue that @ich777 found here: https://github.com/NVIDIA/nvidia-container-toolkit/issues/1385#issuecomment-3497279007

All my GPU containers with --gpus=all were failing before startup with errors like:

error running prestart hook #0: exit status 1
Auto-detected mode as 'legacy'
nvidia-container-cli: ldcache error: process /sbin/ldconfig failed with error code: 1

With toolkit debug enabled, the specific error was:

could not start /sbin/ldconfig: process confinement failed: invalid argument

In my case, a fix was to add:

[nvidia-container-cli]
no-pivot = true

to /etc/nvidia-container-runtime/config.toml, then restart Docker.

Specifically, no-pivot = true needed to be under [nvidia-container-cli].

After that change and a Docker restart, my GPU containers appear to be working again (until the next reboot, since the change is not persisted).
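The edit to config.toml can be sketched as a small shell function. `set_no_pivot` is a hypothetical helper, not something shipped with the plugin or the toolkit; it only assumes the file uses plain `[section]` headers as shown above.

```shell
#!/bin/bash
# Hypothetical helper: make sure "no-pivot = true" sits under [nvidia-container-cli]
# in the given config.toml. Safe to run more than once.
set_no_pivot() {
    local cfg="$1" tmp rc
    tmp="$(mktemp)" || return 1
    awk '
        /^\[/ {
            # Track whether the current section is [nvidia-container-cli].
            in_cli = ($0 ~ /^\[nvidia-container-cli\]/)
            print
            if (in_cli) { print "no-pivot = true"; seen = 1 }
            next
        }
        # Drop an older no-pivot line in that section to avoid a duplicate key.
        in_cli && /^[ \t]*no-pivot[ \t]*=/ { next }
        { print }
        END {
            # Add the section if the file did not have one.
            if (!seen) { print "[nvidia-container-cli]"; print "no-pivot = true" }
        }
    ' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Usage on the server (restart Docker afterwards):
#   set_no_pivot /etc/nvidia-container-runtime/config.toml
```

Writing the result back with `cat` rather than `mv` keeps the original file's permissions.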

For context, I originally hit this after updating to the 7.3 RC. Even after rolling Unraid back and changing driver versions, the issue was still there until I applied the no-pivot fix. Originally posted here: https://discord.com/channels/216281096667529216/1440856243156680757/1496671614107127879

Versions:

Driver: 595.58.03 (Also tried 590)

nvidia-container-cli: 1.19.0~1.19.0

nvidia-container-runtime: 1.19.0

EDIT: It seems this is the related commit
https://github.com/unraid/nvidia-container-toolkit/commit/5085b229cfc356d01eca4823feb6dae7e9afbf49

I believe no-pivot = true should be under [nvidia-container-cli] not [nvidia-container-runtime]?

Looking at @zoggy's docs post, it seems:

--gpus all alone on Docker → Docker directly triggers the hook path → that is the legacy-compatible path. There was a bug in the config for this path, but it works after fixing no-pivot = true

--gpus all --runtime=nvidia → Docker explicitly uses the NVIDIA runtime, which is NVIDIA’s recommended Docker invocation. Avoids the bug above

gilfoyle-diagnostics-20260422_1709.zip

Edited April 23 by pureelectricity
Found the GH Commit with the issue & updated with zoggy reference

April 23

--gpus came with Docker 19.03, and a lot of improvements to --gpus came with Docker 29 (https://docs.docker.com/engine/release-notes/29/).

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

So why would someone not want to use --gpus today, and use the legacy --runtime instead?

While browsing I found:

https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/tree/main/cmd/nvidia-container-runtime#notes-on-using-the-docker-cli

Edited April 23 by zoggy

April 23

Author

This will be fixed in about an hour.

April 24

EDIT: Adding runtime: nvidia into the compose file appears to have fixed the issue. Previously, it wasn't required when using the deploy: notation. I'm not sure what changed, but adding the runtime back fixed the issue. I'm leaving this here in case it helps someone else out.

After upgrading beyond 7.2.5-rc1, I am unable to start containers using my Nvidia GTX 970. The newer kernel versions do not appear to have a working 580 series driver, at least for the GTX 970. 580.125 is the last driver version that worked for me. Diagnostics are attached and additional information is below. Any help would be appreciated.

Nvidia Info:

Nvidia Driver Version: 580.142
Open Source Kernel Module: No
Installed GPU(s):
0:
NVIDIA GeForce GTX 970
0C:00.0
GPU-f2b9c44f-accf-5c81-4303-1f1c97c38b39

GPU Driver Support:

0: NVIDIA GeForce GTX 970
Detected via: nvidia-smi
Chip codename: GM204
Architecture: Maxwell
Kernel module support: proprietary-only
Device ID input: 0x13C210DE
Normalized ID: 13c2
Supported candidates: 580.142
Recommended Driver: 580.142 (best-available)

nvidia-smi Output:

Thu Apr 23 22:21:09 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.142                Driver Version: 580.142        CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 970         On  |   00000000:0C:00.0 Off |                  N/A |
| 47%   39C    P8             13W /  250W |       2MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Error when starting via docker compose:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: ldcache error: process /sbin/ldconfig failed with error code: 1

Example Docker Compose Service:

services:
  plex:
    container_name: plex
    image: ghcr.io/hotio/plex
    restart: always
    network_mode: host
    tmpfs:
      - /transcode
    volumes:
      - $DATADIR/plex:/config
      - $MEDIADIR/media:/data
    environment:
      - TZ=$TZ
      - UMASK=002
      - PLEX_BETA_INSTALL=true
      - PLEX_CLAIM_TOKEN=************
      - PLEX_ADVERTISE_URL=**************
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities:
                - gpu
              count: all
    healthcheck:
      test: curl --connect-timeout 15 --max-time 100 --silent --show-error --fail
        "http://$IP_ADDRESS:32400/identity" >/dev/null
      interval: 15s
      timeout: 5s
      retries: 5
      start_period: 60s
    labels:
      net.unraid.docker.icon: https://raw.githubusercontent.com/walkxcode/dashboard-icons/master/png/plex.png
      net.unraid.docker.webui: http://$IP_ADDRESS:32400/web
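
In case it helps, the runtime: nvidia change mentioned in the EDIT at the top of this post could look roughly like the sketch below. Instead of editing the service above, it writes a Compose override file, which Compose merges automatically when it sits next to the main file. The file names are an assumption (main file called docker-compose.yml in the current directory); only the service name plex comes from the example above. If the stack is started with an explicit -f option, the override file has to be listed there as well.

```shell
#!/bin/sh
# Assumption: the main file is docker-compose.yml in the current directory,
# so Compose picks up docker-compose.override.yml without extra options.
cat > docker-compose.override.yml <<'EOF'
services:
  plex:
    # Explicitly select the NVIDIA runtime in addition to the deploy: block.
    runtime: nvidia
EOF

# Afterwards, `docker compose config` prints the merged result and
# `docker compose up -d plex` recreates the container with it.
```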

dragon-diagnostics-20260423-2201.zip

Edited April 24 by mrvnsk9

April 24

Author

39 minutes ago, mrvnsk9 said:
After upgrading beyond 7.2.5-rc1

When exactly did you upgrade?

April 24

15 minutes ago, ich777 said:
When exactly did you upgrade?

I upgraded to 7.3.0-rc1 a couple of hours ago. It was a few hours after your latest release on GitHub.

April 24

Author

2 hours ago, mrvnsk9 said:
EDIT: Adding runtime: nvidia into the compose file appears to have fixed the issue. Previously, it wasn't required when using the deploy: notation. I'm not sure what changed, but adding the runtime back fixed the issue. I'm leaving this here in case it helps someone else out.

So your issue is fixed anyways.

I upgraded the Nvidia toolkit to the latest version and made a mistake with one line. However, it seems that the new container runtime causes issues for some users.

The issue should now be fixed for the latest versions, that is, for Unraid 7.2.4 and Unraid 7.2.0-rc.1. I'll trigger the new build for Unraid 7.2.5-rc.2 now.

April 24

ich777 trying to keep up with Unraid releases be like:

April 24

Author

1 hour ago, ConnerVT said:
ich777 trying to keep up with Unraid releases be like:

Maybe, but making a few mistakes in a row while pushing a big update to the nvidia-container-toolkit and libnvidia-container in the background can be troublesome at times :D

April 24

19 hours ago, ich777 said:
This will be fixed in about an hour.

Do I have to be on 7.3.0-rc1 to see this update?

Edited April 24 by pdawg1717

April 24

Author

2 hours ago, pdawg1717 said:
Do I have to be on 7.3.0-rc1 to see this update?

No, just the driver packages were updated.

Just redownload a driver and everything should hopefully be good.

Please note that you should now use --gpus all in Extra Parameters and delete --runtime nvidia, since everything should now be handled by Docker.

April 25

5 hours ago, ich777 said:
No, just the driver packages were updated.
Just redownload a driver and everything should hopefully be good.
Please note that you should now use --gpus all in Extra Parameters and delete --runtime nvidia, since everything should now be handled by Docker.

So I downloaded the same version of driver (moved selection from "latest" to "open-source" of same driver), rebooted, but still cannot start the container. If I change it back to --runtime=nvidia it works again.

April 25

Author

5 hours ago, pdawg1717 said:
So I downloaded the same version of driver (moved selection from "latest" to "open-source" of same driver), rebooted, but still cannot start the container. If I change it back to --runtime=nvidia it works again.

Then please leave it as is. As noted above, this is not working for all users; it may depend on the hardware, which I did not test.
