Altheran

Members

Joined
April 25, 2017
Last visited
December 29, 2025

Rookie

Current rank (2/14)

Posts

69
Solutions

2
Reputation
Neutral

2
Birthday
July 6

Gender
Male
Location
Quebec, Canada

Losing my GTX1660 GPU randomly X days/weeks ? after a reboot.

Altheran replied to Altheran's topic in General Support

You lose your GPU some time after starting your server, and it comes back after a reboot? If you have the same symptoms as I did, note that I edited my solution, since I also had to tweak my go file. Also, upload your diagnostics zip file to ChatGPT and see whether it thinks you have the same issue as me.
- December 29, 2025
- 5 replies
[Plugin] Nvidia-Driver

Altheran replied to ich777's topic in Plugin Support

For anyone losing access to their GPU randomly, here is what I did: Losing my GTX1660 GPU randomly X days/weeks ? after a reboot. - General Support - Unraid
- November 10, 2025
- 5918 replies
Losing my GTX1660 GPU randomly X days/weeks ? after a reboot.

Altheran replied to Altheran's topic in General Support

So, I updated my BIOS and disabled any and all power management settings... Rock solid for the last week. Note that I did nothing past my "Here is where I might practice caution with what it is suggesting:" comment in the middle of the instructions.

Edit: The BIOS changes alone didn't fix it for good (they made it last longer, though). I had to edit my /boot/config/go file and add these lines at the end (or wherever makes the most sense for you):

    # Ensure NVIDIA driver is initialized & kept warm between clients
    modprobe nvidia
    /usr/bin/nvidia-smi -pm 1
    # Optional warm-up query (fails fast if device absent)
    /usr/bin/nvidia-smi -L || true
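To confirm that the `nvidia-smi -pm 1` line from the go file actually took effect after a reboot, the persistence state can be queried back. This is a minimal sketch, not part of the original fix: `persistence_ok` is a hypothetical helper name, and it assumes `nvidia-smi --query-gpu=persistence_mode --format=csv,noheader` prints one `Enabled` or `Disabled` line per GPU.

```shell
#!/bin/bash
# Hypothetical helper (not from the post): succeeds only if every line of
# the given text is exactly "Enabled". Empty input counts as failure.
persistence_ok() {
    local state="$1"
    [ -n "$state" ] || return 1
    # grep -v selects lines that are NOT "Enabled"; any such line means failure
    ! grep -qv '^Enabled$' <<<"$state"
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    state="$(nvidia-smi --query-gpu=persistence_mode --format=csv,noheader 2>/dev/null)"
    if persistence_ok "$state"; then
        echo "persistence mode: enabled"
    else
        echo "persistence mode: NOT enabled (or no GPU found)"
    fi
fi
```

If this reports "NOT enabled" right after boot, the go file lines probably did not run or the device was already missing at that point.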
- November 10, 2025
- 5 replies
nvidia runtime enabled container cannot start on 7.2 + 580.105.05 nvidia drivers

Altheran posted a report in Stable Releases

Hi, I want to report an issue with Unraid 7.2 and/or the Nvidia Driver plugin when updating to 580.105.05. Starting a --runtime=nvidia enabled container (linuxserver/emby:beta and ghcr.io/haveagitgat/tdarr) results in this error:

    docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running createContainer hook #4: exit status 1, stdout: , stderr: 2025/11/04 15:46:40 Error updating ldcache: error running pivot_root: pivot_root .: invalid argument

Reverting back to 580.95.05 fixes the issue.

Diagnostics analysis by ChatGPT returned this explanation:

580.105.08 introduced a userspace/toolkit change that doesn’t play nicely with your current Docker/runc/overlayfs combo on Unraid 7.2. The failure happens in the NVIDIA Container Toolkit OCI prestart hook (the one that tries to run ldconfig inside the prepared rootfs); its internal pivot_root call returns EINVAL → container won’t start. Removing --runtime=nvidia sidesteps the hook entirely (so Emby starts, but loses NVENC). Rolling back to 580.95.05 downgrades that hook/tooling to a version that does work with your stack, hence everything comes back.

Original diagnostics attached. I'd guess either an issue in the update process, an incompatibility between the new driver and the packaged toolkit, or something in Unraid between the toolkit and the containers.

untheran-diagnostics-20251104-1557.zip
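When several GPU containers are affected, it can help to tell this specific failure apart from unrelated start errors. The following is a minimal sketch under stated assumptions: `is_pivot_root_failure` is a hypothetical helper that only looks for the `error running pivot_root` text quoted in the error above, and the container name is whatever you pass as the first argument.

```shell
#!/bin/bash
# Hypothetical helper (not from the report): succeeds when the captured
# container-start output contains the pivot_root failure signature.
is_pivot_root_failure() {
    grep -q 'error running pivot_root' <<<"$1"
}

# Usage: pass a container name as the first argument; nothing runs otherwise.
if [ -n "${1:-}" ] && command -v docker >/dev/null 2>&1; then
    # Capture both stdout and stderr; a failed start must not abort the script
    output="$(docker start "$1" 2>&1)" || true
    if is_pivot_root_failure "$output"; then
        echo "$1: hit the pivot_root hook failure"
    else
        echo "$1: no pivot_root failure in output"
    fi
fi
```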
- November 4, 2025
- 8 comments
Altheran started following nvidia runtime enabled container cannot start on 7.2 + 580.105.05 nvidia drivers
- November 4, 2025
Losing my GTX1660 GPU randomly X days/weeks ? after a reboot.

Altheran replied to Altheran's topic in General Support

Ok, so fun fact: ChatGPT (v5, thinking mode) is REALLY good at extracting and analyzing the diagnostics zip file.

Prompt (+ upload of the diagnostics.zip in the same prompt):

In Unraid OS, I have an Nvidia GTX 1660 GPU. All works fine for a good while, then randomly, my gpu is not found. When I reboot, everything comes back all right and working for a while. I rebooted this morning, and right now the GPU is not found. Analyze all the files and logs in this compressed zip archive. Extract the file's content, analyze all the logs and configurations, report back with an explanation of this behavior and provide a step by step solution.

Here is what it reported:

What’s happening (from your logs)

Driver loads at boot
At boot the NVIDIA kernel modules load cleanly and bind to the 1660 at PCI 0000:01:00.0:
    nvidia: loading ...
    NVRM: loading NVIDIA UNIX x86_64 Kernel Module 580.95.05
DRM attaches:
    [drm] Initialized nvidia-drm ... for 0000:01:00.0

A few minutes later the GPU stops initializing
Starting at 10:40:20 (same boot), the kernel begins spamming:
    NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0xffff:1623)
repeated many times as
    RmInitAdapter failed! (0x62:0x40:2661)
These lines continue for minutes. When this starts, nvidia-smi shows “No devices were found” (your system/nvidia-smi.txt). The gpustat JSON also reports vendor command errors and N/A for clocks/temps.

You have three display adapters in this system
- ASPEED BMC (IPMI): AST driver (BMC console)
- Intel iGPU: i915 driver, device 00:02.0
- NVIDIA GTX 1660 (TU116): device 01:00.0
Motherboard: Supermicro X11SSH-LN4F, BIOS 2.0b (2017-07-27).

PCIe power-saving is enabled
Your bridges show ASPM L1/L0s enabled (system/aspm-status.txt). ACPI reports AER unsupported (“platform does not support [AER]”), so the OS can’t log link-layer errors even if the link is flaking.

Timing correlation
Docker/containers (including Tdarr, which can use NVENC) start around 10:36–10:37. A few minutes later (~10:40) the NVIDIA driver begins failing to (re)initialize the adapter. This pattern is classic for a card that’s slipped into a bad PCIe power state or is being poked while the driver isn’t “persistent.”

Diagnosis in one line:
Your GTX 1660 is on an older Supermicro platform (2017 BIOS) with ASPM power-saving enabled and two other GPUs present (ASPEED + Intel iGPU). Under load/idle transitions the GPU drops into a PCIe power state the driver can’t recover from; when a container touches it, the driver tries to (re)initialize and you get RmInitAdapter failed storms. A reboot resets the link, so it “comes back”, until the next time it falls into that state.

Why this causes the exact symptoms you see
- ASPM + older platform firmware: Aggressive PCIe link power management (L1/L0s) on some older C236/X11 boards is notorious for GPUs “falling into a ditch.” Without AER, the kernel can’t even record proper link errors, so all you see is NVIDIA’s RM failing.
- Multiple GPUs: With ASPEED + Intel iGPU + NVIDIA in the same machine, BAR allocation and power policies get more complex on old firmware. Even if initial BAR assignment is OK (it is in your log), subsequent power-state transitions can leave the device non-responsive to RM (NVIDIA’s resource manager), which matches your RmInitAdapter failed! loops.
- No persistence mode: If nothing keeps the device initialized between client uses, transitions in and out of low-power + module refcounts can amplify the chance of landing in that bad state.
- Containers hitting the GPU while it’s not “ready”: Tdarr/Plex/etc. may probe NVENC very early after boot; if the card hasn’t fully settled (or slipped into ASPM L1.2), that’s when the first RM failures typically appear.

Fix plan (do in order)
You don’t have to do every step, but the more of these you apply, the more bullet-proof it gets. I’ve split them into BIOS, Unraid boot parameters, and service ordering. None of these change your data or array.

1) BIOS / firmware settings (Supermicro X11SSH-LN4F)
- Update BIOS to the latest available for X11SSH-LN4F. You are on 2.0b (2017-07-27); several later revs improve PCIe compatibility and resource handling.
- Disable ASPM in BIOS (PCIe power management). Set PCIe ASPM/Power Saving to Disabled (global and per-slot if offered).
- Force slot speed for the x16 slot hosting the 1660 to Gen3 (not Auto). (Your logs show 8.0 GT/s caps on bridges; forcing speed avoids training oddities.)
- Enable “Above 4G Decoding” (a.k.a. 64-bit BAR decoding). Even though initial BARs look fine, this reduces pressure with 3 GPUs + NVMe/SATA HBAs.
- Primary display: Set Onboard/BMC as primary; if you don’t need Intel QuickSync, disable the Intel iGPU entirely. If you do use QuickSync in containers, keep iGPU on, but still set BMC as primary.
(If you can only do a couple things: update BIOS, disable ASPM, force Gen3.)

Here is where I might practice caution with what it is suggesting:

2) Unraid boot (Syslinux) kernel parameters
Edit: Main → Flash → Syslinux Configuration (apply to the boot stanza you use). Add to the append line:
    pcie_aspm=off
Optional but sometimes helpful on X11 when not using Intel iGPU:
- To keep the iGPU from binding at all (if you don’t need it):
    modprobe.blacklist=i915
- If you see framebuffer clashes (rare here), you can also add:
    video=efifb:off
Apply and reboot.

3) Make the NVIDIA device “persistent” and start GPU containers after it’s ready
Enable persistence mode at boot. Add these lines near the bottom of /boot/config/go (before the last line is fine):
    # Ensure NVIDIA driver is initialized & kept warm between clients
    modprobe nvidia
    /usr/bin/nvidia-smi -pm 1
    # Optional warm-up query (fails fast if device absent)
    /usr/bin/nvidia-smi -L || true
Stagger GPU containers (Tdarr, Plex/Emby/Jellyfin, etc.) until the device node exists. If you use the User Scripts plugin, create one named “Start GPU containers” with:
    #!/bin/bash
    # Wait up to ~2 minutes for NVIDIA device to appear
    for i in {1..60}; do
      if [ -e /dev/nvidia0 ]; then
        break
      fi
      sleep 2
    done
    # Start your GPU-using containers (edit list)
    docker start tdarr 2>/dev/null || true
    # docker start plex 2>/dev/null || true
    # docker start emby 2>/dev/null || true
Set it to At Startup of Array. Also set those containers’ own setting “Start when array starts” = No, so your script controls the timing.
Double-check container GPU flags. In each GPU container template:
- NVIDIA_VISIBLE_DEVICES = your GPU UUID or all
- NVIDIA_DRIVER_CAPABILITIES = compute,video,utility (for NVENC)
- Use the NVIDIA runtime (Unraid’s NVIDIA Driver plugin does this automatically when set).

4) (If not using QuickSync) keep Intel iGPU out of the way
If you don’t need QuickSync:
- Either disable Intel iGPU in BIOS, or
- Create /boot/config/modprobe.d/blacklist-i915.conf with:
    blacklist i915
(This keeps i915 from binding, reducing multi-GPU interaction. Reboot to apply.)

5) Physical basics (worth a quick check)
- Ensure the GTX 1660 has all PCIe power plugs fully seated.
- If using a riser, test directly in the board slot.
- Try another slot if available (on X11 it may share lanes with other controllers).

How we know this is the root cause
- Kernel shows successful bind at boot and repeated RmInitAdapter failed a few minutes later (10:40:20+). That pattern is not a driver install issue; it’s a device/link power or initialization issue.
- nvidia-smi has “No devices were found” at the time your diagnostics were captured, exactly what RM-init failure yields.
- ASPM is enabled and AER is not available on this platform, which hides link-error breadcrumbs and is a well-known recipe for flaky GPU links on older server boards.
- Three GPUs present (BMC + Intel + NVIDIA) on an old BIOS (2017) increases the chance of a device getting power-gated or left in D3 during runtime.

After you make the changes
- Reboot.
- On the Unraid console or SSH:
    nvidia-smi
    watch -n1 nvidia-smi
  You should see the 1660 listed and it should remain visible indefinitely.
- Start a GPU workload (Tdarr/Emby NVENC) and watch for any new NVRM lines in Tools → System Log. There shouldn’t be any RmInitAdapter failed after these fixes.
If anything still misbehaves, the next most impactful single change is disabling the Intel iGPU (if you don’t use it) and ensuring ASPM is truly off (both BIOS and kernel pcie_aspm=off).

TL;DR
Your GTX 1660 is fine; it’s the platform power/PCIe policy. Update the X11SSH BIOS, disable ASPM, set the GPU slot to Gen3, enable Above-4G, keep the card “awake” with nvidia-smi -pm 1, and start GPU containers only after /dev/nvidia0 exists. This stops the RmInitAdapter failed loops and the random “GPU not found” episodes without needing to reboot.

So, I'll try the BIOS fixes first; they should fix the PCIe power management issues. Any thoughts on the rest of the suggestions?
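To check whether the RmInitAdapter failures described in the report have reappeared since the last boot, the system log can be scanned for that string. This is a minimal sketch: `count_rm_failures` is a hypothetical helper name, and the `/var/log/syslog` path is an assumption about where the running system keeps its log.

```shell
#!/bin/bash
# Hypothetical helper (not from the post): print how many lines in a log
# file contain "RmInitAdapter failed". Prints 0 for an unreadable file.
count_rm_failures() {
    local log="$1"
    [ -r "$log" ] || { echo 0; return 1; }
    # grep -c exits non-zero on zero matches but still prints 0
    grep -c 'RmInitAdapter failed' "$log" || true
}

# Assumed log location; adjust if your syslog lives elsewhere.
SYSLOG="/var/log/syslog"
if [ -r "$SYSLOG" ]; then
    echo "RmInitAdapter failures in $SYSLOG: $(count_rm_failures "$SYSLOG")"
fi
```

A count that stays at 0 while GPU containers are transcoding is consistent with the fixes holding; a non-zero count means the driver is failing to initialize the adapter again.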
- November 3, 2025
- 5 replies
Losing my GTX1660 GPU randomly X days/weeks ? after a reboot.

Altheran posted a topic in General Support

Hi all, I looked around for why this might happen and for possible fixes. I can't figure out where to start with this... I need the collective knowledge of you fine people.

I've got a GTX1660, using Nvidia Drivers and a certain patch applied at first array start only. (The setup worked flawlessly for years; the issue appeared ~6-8 months ago.)

All works fine for a good while, then randomly, my GPU is not found:
- It disappears from my Transcode GPU choices in Emby.
- GPU Statistics throws a "Vendor command returned unparseable data."
- nvidia-smi returns "No devices were found"

In Tools > System Devices I still have:

IOMMU group 2:
    [8086:1901] 00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 07)
    [10de:2184] 01:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660] (rev a1)
    [10de:1aeb] 01:00.1 Audio device: NVIDIA Corporation TU116 High Definition Audio Controller (rev a1)
    [10de:1aec] 01:00.2 USB controller: NVIDIA Corporation TU116 USB 3.1 Host Controller (rev a1)
        Bus 003 Device 001 Port 3-0 ID 1d6b:0002 Linux Foundation 2.0 root hub
        Bus 004 Device 001 Port 4-0 ID 1d6b:0003 Linux Foundation 3.0 root hub
    [10de:1aed] 01:00.3 Serial bus controller: NVIDIA Corporation TU116 USB Type-C UCSI Controller (rev a1)

After a simple software reboot, everything comes back and works fine. I attached anonymized diagnostics.

untheran-diagnostics-20251101-2135.zip
- November 2, 2025
- 5 replies
Altheran started following WebGUI unresponsive - Getting 500 Internal Server Error , Losing my GTX1660 GPU randomly X days/weeks ? after a reboot. , What is the difference between virtio and virtio-net ? and 1 other
- November 2, 2025
Docker requests

Altheran replied to sacretagent's topic in Docker Containers

+1 on this. There don't even seem to be any premade images... quite an issue with Unraid, I'd say.
- October 31, 2025
- 567 replies
Option to Store content of /boot/config/plugins/dockerMan elsewhere.

Altheran replied to Altheran's topic in Feature Requests

It's what I did! But still, there are some parameters that can't go in an env file, like Post Arguments, that MIGHT contain sensitive data.
- May 22, 2025
- 3 replies
What is the difference between virtio and virtio-net ?

Altheran replied to Beermedlar's topic in VM Engine (KVM)

@bonienl Is this an issue? Do we still have to use virtio-net in VMs if we use Docker containers?
- March 19, 2025
- 16 replies
Option to Store content of /boot/config/plugins/dockerMan elsewhere.

Altheran posted a topic in Feature Requests

Given that the flash drive can't be encrypted, and that Docker user-template definitions (which often contain credentials and keys) are stored there in the clear, it would be nice to have an option to move /boot/config/plugins/dockerMan to a pool drive (cache). It could be as simple as storing the data on an unpublished share and making a symlink to the original folder. I tried, but it won't let me do "ln -s /mnt/cache/dockerMan /boot/config/plugins/dockerMan". This might be a cool approach for any other "config" that doesn't need to be available before the array is started.
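The failing `ln -s` is consistent with the flash drive being FAT-formatted, since FAT filesystems cannot store symbolic links. A minimal sketch for checking this on a given directory follows; `supports_symlinks` is a hypothetical helper name, and it probes by creating and removing a throwaway link, so it needs write access to the directory.

```shell
#!/bin/bash
# Hypothetical helper (not from the post): succeeds if a symlink can be
# created inside the given directory; the probe link is removed afterwards.
supports_symlinks() {
    local dir="$1" probe
    probe="$dir/.symlink-probe.$$"
    if ln -s /dev/null "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Only probe the flash config folder when it exists (as on an Unraid box).
if [ -d /boot/config ]; then
    if supports_symlinks /boot/config; then
        echo "/boot/config accepts symlinks"
    else
        echo "/boot/config does not accept symlinks"
    fi
fi
```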
- March 16, 2025
- 3 replies
[PLUGIN] GPU Statistics

Altheran replied to b3rs3rk's topic in Plugin Support

@SimonF Do you need this? ipmifan.log
- December 6, 2024
- 2101 replies
IPv6 Disabling auto configuration (SLAAC) on my containers

Altheran replied to Altheran's topic in General Support

Ok, now we are getting somewhere. I tried this in "Extra parameters" in my containers, with no success:

    --sysctl net.ipv6.conf.all.autoconf=0 --sysctl net.ipv6.conf.all.accept_ra=0

Soooo, what else is there for me to try?
- December 6, 2024
- 9 replies
WebGUI unresponsive - Getting 500 Internal Server Error

Altheran replied to jbquintal's topic in General Support

@SimonF Are you aware of such an issue?
- December 4, 2024
- 21 replies
[PLUGIN] GPU Statistics

Altheran replied to b3rs3rk's topic in Plugin Support

I just tried to log in to my WebGUI this morning: Error 500. Dockers still working. I logged in to the CLI via IPMI and ran htop: no performance issues, RAM and CPU cruising. I tried restarting the WebGUI process via "/etc/rc.d/rc.php-fpm restart", which didn't work. I found a thread saying GPU Statistics might be the culprit here, so I tried "plugin remove gpustat.plg" and restarted php-fpm. Back online! So there seems to be a serious issue with the plugin that makes php-fpm crash or something. @SimonF What do you need to check on that? I didn't reboot, so I might still have some relevant logs.
- December 4, 2024
- 2101 replies
IPv6 Disabling auto configuration (SLAAC) on my containers

Altheran replied to Altheran's topic in General Support

The outgoing rule is planned: I want to block all other clients on my LAN (except my Unraid server itself and my router) from making DNS queries anywhere else but my Pi-Hole. And I DID enable RA in my router; I want the devices on my LAN to use SLAAC. I don't have issues with SLAAC by itself. It's just that when I configure a "STATIC" IPv6 on a device, I don't want it to go out of its way and not use said static IPv6 ;).
- December 3, 2024
- 9 replies
