[Plugin] Nvidia-Driver


ich777


OS: 6.11.4

Driver Version: Latest: v535.54.03

 

I have been using the same setup for a long time without an issue. In the past 3 days I have twice had an issue where my GPU (1070) fans start running at full blast. When I look at the server, the GPU is no longer visible. I looked at the logs and they showed that it crashed. I thought I had saved a copy of the system log but I did not; I do have the GPU logs, which I will attach.

 

The first time, I had used the GPU to transcode a TV show earlier in the night. Today the GPU was doing nothing and had not done anything for a couple of days (since the last time this happened).

 

I have to reboot to fix it.

 

I have Spaceinvader One's nvidia power save and nvidia power save hourly scripts installed.

The last time it happened was around x:45, so if the hourly script runs on the hour, it would not have run then. They have been installed since I set up the server.

 

 

nvidia-bug-report.log.gz

Edited by chris smashe
25 minutes ago, chris smashe said:

I thought I had saved a copy of the system log but I did not; I do have the GPU logs, which I will attach.

Please post your Diagnostics the next time this happens, since all the relevant information is in there.

 

26 minutes ago, chris smashe said:

I have Spaceinvader One's nvidia power save and nvidia power save hourly scripts installed.

Please remove those scripts. IIRC they are outdated because they use a binary that Nvidia will soon drop, so the scripts won't work anymore.

 

The only thing that you need is to run this once:

nvidia-persistenced

Put this in the go file or run it through User Scripts at startup. Please don't run this command multiple times; once is enough.


After the 6.12 releases, we have seen reports of Docker instability when the Nvidia plugin is installed. Some containers with built-in health checks appear to be filling up the /run partition with logs and causing crashes.

 

To fix this, please ensure that your Nvidia plugin is updated to at least version 2023.07.06 and upgrade to Unraid OS version 6.12.2 (and wait for the Plugin-Update-Helper to say it’s okay to reboot).

 

If you are already on this OS version, you just need to update the plugin and reboot.
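To see whether /run is actually filling up on an affected system, its usage can be checked directly. The following is only a minimal Python sketch, assuming Python is available where you run it; the function names and the 80% threshold are hypothetical and chosen for illustration, not taken from Unraid or the plugin.

```python
import shutil

def usage_percent(path):
    """Return how full the filesystem containing `path` is, in percent."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total

def is_nearly_full(path, threshold=80.0):
    # The threshold is an arbitrary illustrative value, not an Unraid default.
    return usage_percent(path) >= threshold
```

Calling `usage_percent("/run")` on the server before and after starting the affected containers should show whether the partition is growing.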

 

Thanks to @ich777, @Pducharme, @ljm42 for tracking this down.

On 7/6/2023 at 10:30 AM, ich777 said:

Please post your Diagnostics the next time this happens, since all the relevant information is in there.

 

Please remove those scripts. IIRC they are outdated because they use a binary that Nvidia will soon drop, so the scripts won't work anymore.

 

The only thing that you need is to run this once:

nvidia-persistenced

Put this in the go file or run it through User Scripts at startup. Please don't run this command multiple times; once is enough.

 

Just happened again. Attached are the diagnostics

 

smashenas-diagnostics-20230707-2045.zip

5 hours ago, chris smashe said:

Just happened again. Attached are the diagnostics

From what I see in the Diagnostics, your GPU has fallen off the bus:

Jul  7 19:51:42 SmasheNas kernel: NVRM: GPU at PCI:0000:01:00: GPU-cc1373f9-9d64-fd0f-2406-6b90e4430287
Jul  7 19:51:42 SmasheNas kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Jul  7 19:51:42 SmasheNas kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Jul  7 19:51:42 SmasheNas kernel: NVRM: A GPU crash dump has been created. If possible, please run
Jul  7 19:51:42 SmasheNas kernel: NVRM: nvidia-bug-report.sh as root to collect this data before
Jul  7 19:51:42 SmasheNas kernel: NVRM: the NVIDIA kernel module is unloaded.

 

This usually happens because of too little power over the external PCIe power connector or some aggressive power-saving measures.
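To find such events in a saved syslog without reading it line by line, the NVRM Xid lines can be filtered out. This is a rough Python sketch, not part of the plugin or the Diagnostics tooling; the function name `find_xid_events` is made up for illustration, and the pattern is based only on the log format shown above.

```python
import re

# Matches lines such as:
# "... kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', ..."
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+),")

def find_xid_events(lines):
    """Return a list of (pci_address, xid_code) tuples found in syslog lines."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            events.append((match.group(1), int(match.group(2))))
    return events

sample = [
    "Jul  7 19:51:42 SmasheNas kernel: NVRM: GPU at PCI:0000:01:00: GPU-cc1373f9-9d64-fd0f-2406-6b90e4430287",
    "Jul  7 19:51:42 SmasheNas kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.",
]
print(find_xid_events(sample))  # [('0000:01:00', 79)]
```

In the log above, Xid 79 is the code printed together with the "GPU has fallen off the bus" message.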

 

Did you change anything on the hardware (add/remove hardware)? You can also try reseating the card in the PCIe slot; that has helped some users. What power supply are you using?

Do you have a machine where you can put the card in to test the card and put a 3D load on it like FurMark for at least 30 minutes to an hour?


Hi,

 

So I have tried this several times, and have also waited a few hours and tried again. No issue. I have never had issues installing this plugin until the upgrade. Running 6.12.2. I've gotten and followed every prompt. This was a fresh install: the driver was removed, then I rebooted and started from scratch. As a sanity check, I looked in /boot and the driver was gone. I'm basically given the finger every time. Again, prior to this July version, I never had an issue with the plugin, and I have been on 6.12.2 the whole time from a new rebuild.

 

unraid-diagnostics-20230708-0232.zip

 

[Six screenshots attached]

 

 

31 minutes ago, Admin9705 said:

Again, prior to this July version, I never had an issue with the plugin, and I have been on 6.12.2 the whole time from a new rebuild.

Have you tried installing another driver version, like 530.41.03, to see if the issue persists (please reboot after installing another version)?

 

The strange thing is that the driver reports that your card isn't supported:

Jul  8 02:29:59 UNRAID kernel: NVRM: The NVIDIA GPU 0000:06:00.0 (PCI ID: 10de:2684)
Jul  8 02:29:59 UNRAID kernel: NVRM: installed in this system is not supported by the
Jul  8 02:29:59 UNRAID kernel: NVRM: NVIDIA 535.54.03 driver release.
Jul  8 02:29:59 UNRAID kernel: NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
Jul  8 02:29:59 UNRAID kernel: NVRM: in this release's README, available on the operating system
Jul  8 02:29:59 UNRAID kernel: NVRM: specific graphics driver download page at www.nvidia.com.
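When comparing such a message against the driver's supported-products list (the "Appendix A" the message itself points to), it helps to pull out the PCI vendor and device IDs. This is a small Python sketch based only on the log format shown above; `parse_unsupported_gpu` is a hypothetical helper name, not something the driver or plugin provides.

```python
import re

# Matches: "NVRM: The NVIDIA GPU 0000:06:00.0 (PCI ID: 10de:2684)"
PCI_ID_RE = re.compile(
    r"NVRM: The NVIDIA GPU ([0-9a-fA-F:.]+) "
    r"\(PCI ID: ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\)"
)

def parse_unsupported_gpu(line):
    """Return (bus_address, vendor_id, device_id), or None if the line does not match."""
    match = PCI_ID_RE.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2).lower(), match.group(3).lower()

line = "Jul  8 02:29:59 UNRAID kernel: NVRM: The NVIDIA GPU 0000:06:00.0 (PCI ID: 10de:2684)"
print(parse_unsupported_gpu(line))  # ('0000:06:00.0', '10de', '2684')
```

The device ID (`2684` here) is the value to look up in the README of the driver release in question.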

 

Did you change anything in terms of hardware or do a BIOS upgrade of some sort?

 

This July version changed only a minor thing that doesn't affect how the driver works. It changes one line for the container runtime, which again doesn't affect how the plugin works.

 

 

BTW, if the driver lists something related to nvidia-smi failed, then restarting Docker won't fix anything.


No, never an issue. I tried the older drivers also and had the same issues. The card is a 4090. Nothing has changed hardware-wise or in the BIOS. It was all working fine several hours ago, then I upgraded to this new version and this issue came up.

I have a 3080ti also. I’ll switch the card and see what happens.

 

 

IMG_4565.jpeg

Edited by Admin9705
2 minutes ago, Admin9705 said:

The card is a 4090.

Yes, I've already seen that in the Diagnostics.

 

3 minutes ago, Admin9705 said:

Was all working fine several hours ago and then upgraded to this new version and then this issue came up. 

This release only changed one line in the plugin, which is responsible for the log output in the container runtime and should not affect how the driver works.

 

Are you sure that it was working before and didn't already have the same issue as now? Plex will fall back to SW transcoding if the card isn't working anymore.

 

Anyway, if it was working before, this seems rather strange to me because the driver package is still the same as before and now it reports that the card isn't supported.

Quote

Did you change anything on the hardware (add/remove hardware)?

I have not

 

Quote

You can also try reseating the card in the PCIe slot; that has helped some users.

I will try that

 

Quote

What power supply are you using?

 

Corsair RM850x

 

Quote

Do you have a machine where you can put the card in to test the card and put a 3D load on it like FurMark for at least 30 minutes to an hour?

 

Yes, I can move it to my Windows machine if reseating it does not work.

 

Will let you know the results. Thanks

 

1 hour ago, ich777 said:

This is really strange, do you have another PC to test the 4090?

 

I do, but I have to build it and I'm heading out of town for the weekend. I'll build it and then test. One thing: a week ago, I had both the 4090 and the 3080ti in the system. I took out the 3080ti a week ago, but the 4090 has been working solo for a while.

1 minute ago, Admin9705 said:

I took out the 3080ti a week ago, but the 4090 has been working solo for a while.

As said above, the update didn't change anything about how the plugin works or about the driver itself.

Just one line was modified, which strictly speaking doesn't have anything to do with the driver itself.


Since I installed 6.12.2, Nvidia stopped working in my containers like Plex or Frigate. When I try to use it I get errors and crashes.

 

[Two screenshots attached]

 

plugin ver 2023.07.06 

 

In Frigate I get ffmpeg errors from the H264 decoder when it uses the Nvidia GPU.

Error in Plex when transcoding:

[Screenshot attached]

 

 

All this was working fine before. I haven't changed anything, neither in the docker run nor in the apps' config. Any idea what the problem could be?

 

@SpencerJ could this be related with the issue you mention here?

Although I'm using the latest version of everything.

 

unraid-diagnostics-20230710-1727.zip

Edited by L0rdRaiden
2 hours ago, L0rdRaiden said:

@SpencerJ could this be related with the issue you mention here?

This is already solved with Plugin version 2023.07.06

 

2 hours ago, L0rdRaiden said:

when I try to use it I get errors and crashes

Can you please be a bit more specific? Where do you get errors and where do you get crashes? What does crash exactly?

 

Do you have a way to test the card other than in your server? I see nothing obvious in your syslogs why it shouldn't work.

Have you tried rebooting your system yet?

39 minutes ago, ich777 said:

This is already solved with Plugin version 2023.07.06

 

Can you please be a bit more specific? Where do you get errors and where do you get crashes? What does crash exactly?

 

Do you have a way to test the card other than in your server? I see nothing obvious in your syslogs why it shouldn't work.

Have you tried rebooting your system yet?

 

For Plex there is nothing in the logs, just the error in the screenshot above when I try to use transcoding.

 

[migrations] started
[migrations] no migrations found
usermod: no changes
───────────────────────────────────────

      ██╗     ███████╗██╗ ██████╗ 
      ██║     ██╔════╝██║██╔═══██╗
      ██║     ███████╗██║██║   ██║
      ██║     ╚════██║██║██║   ██║
      ███████╗███████║██║╚██████╔╝
      ╚══════╝╚══════╝╚═╝ ╚═════╝ 

   Brought to you by linuxserver.io
───────────────────────────────────────

To support LSIO projects visit:
https://www.linuxserver.io/donate/

───────────────────────────────────────
GID/UID
───────────────────────────────────────

User UID:    99
User GID:    100
───────────────────────────────────────

**** Server already claimed ****
**** permissions for /dev/dri/renderD128 are good ****
**** permissions for /dev/dri/card0 are good ****
Docker is used for versioning skip update check
[custom-init] No custom files found, skipping...
Starting Plex Media Server. . . (you can ignore the libusb_init error)
[ls.io-init] done.
Critical: libusb_init failed

 

Plex docker run

 

docker run
  -d
  --name='Plex'
  --net='br2'
  --ip='10.10.50.20'
  --cpuset-cpus='4,5,6,7,8,9,16,17,18,19,20,21'
  -e TZ="Europe/Paris"
  -e HOST_OS="Unraid"
  -e HOST_HOSTNAME="Unraid"
  -e HOST_CONTAINERNAME="Plex"
  -e 'VERSION'='docker'
  -e 'NVIDIA_VISIBLE_DEVICES'='GPU-f1c0f52c-e491-64c7-428c-e10038734368'
  -e 'NVIDIA_DRIVER_CAPABILITIES'='all'
  -e 'PUID'='99'
  -e 'PGID'='100'
  -e 'TCP_PORT_32400'='32400'
  -e 'TCP_PORT_3005'='3005'
  -e 'TCP_PORT_8324'='8324'
  -e 'TCP_PORT_32469'='32469'
  -e 'UDP_PORT_1900'='1900'
  -e 'UDP_PORT_32410'='32410'
  -e 'UDP_PORT_32412'='32412'
  -e 'UDP_PORT_32413'='32413'
  -e 'UDP_PORT_32414'='32414'
  -e '022'='022'
  -l net.unraid.docker.managed=dockerman
  -l net.unraid.docker.webui='http://[IP]:[PORT:32400]/web'
  -l net.unraid.docker.icon='https://raw.githubusercontent.com/linuxserver/docker-templates/master/linuxserver.io/img/plex-icon.png'
  -v '/mnt/user/Video/Películas/':'/media/Películas':'rw'
  -v '/mnt/user/Video/Movies/':'/media/Movies':'rw'
  -v '/mnt/user/Video/Series/':'/media/Series':'rw'
  -v '':'/movies':'rw'
  -v '':'/tv':'rw'
  -v '':'/music':'rw'
  -v '/mnt/user/Docker/Plex/':'/config':'rw'
  --dns=10.10.50.5
  --no-healthcheck
  --runtime=nvidia
  --mount type=tmpfs,destination=/tmp,tmpfs-size=4000000000 'lscr.io/linuxserver/plex'

d52fd6937b48a59636659eacbee1624de26fe7ba3f718fff524eafdd4e205cba

The command finished successfully!

 

For Frigate I opened this bug when I thought that the problem was with Frigate, but it is actually with the GPU. You can see the logs in the last 3 or 4 posts:

https://github.com/blakeblackshear/frigate/issues/7051

 

I don't know what else to do to troubleshoot this.

Edited by L0rdRaiden
3 hours ago, L0rdRaiden said:

error in plex when transcoding

From what I found from searching for the error s1003 on Google, it seems that this is either an issue with the database or the network?

 

 

4 minutes ago, L0rdRaiden said:

For Frigate I opened this bug when I thought that the problem was with Frigate, but it is actually with the GPU. You can see the logs in the last 3 or 4 posts.

Please remove this Extra Parameter from Frigate since it is known to cause issues:

--gpus=all

I would also recommend that you change this:

'NVIDIA_DRIVER_CAPABILITIES'='compute,utility,video'

to that:

'NVIDIA_DRIVER_CAPABILITIES'='all'
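The two suggestions above can also be checked mechanically against a container's docker run command. The following is only an illustrative Python sketch: `check_nvidia_args` is a hypothetical helper, `example/image` is a placeholder image name, and only the `-e NAME=value` form of environment variables is recognized.

```python
import shlex

def check_nvidia_args(command):
    """Return warnings for a `docker run` command line given as one string."""
    tokens = shlex.split(command)  # shell-style splitting, strips the quotes
    warnings = []
    for i, token in enumerate(tokens):
        # Matches both "--gpus=all" and "--gpus all".
        if token == "--gpus=all" or (token == "--gpus" and tokens[i + 1:i + 2] == ["all"]):
            warnings.append("remove --gpus=all")
        # Only the "-e NAME=value" form is recognized in this sketch.
        if token.startswith("NVIDIA_DRIVER_CAPABILITIES="):
            value = token.split("=", 1)[1]
            if value != "all":
                warnings.append("set NVIDIA_DRIVER_CAPABILITIES to all (found: " + value + ")")
    return warnings

command = (
    "docker run --gpus=all --runtime=nvidia "
    "-e 'NVIDIA_DRIVER_CAPABILITIES'='compute,utility,video' example/image"
)
for warning in check_nvidia_args(command):
    print(warning)
```

An empty list means neither of the two settings mentioned above was found in a problematic form; it says nothing about other causes.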

 

Sorry I'm not too familiar with Frigate but it seems that ffmpeg has issues receiving frames so that it can not transcode them.

Are you sure that there is nothing wrong with the network?

 

 

2 minutes ago, L0rdRaiden said:

I don't know what else to do to troubleshoot this.

Have you tried rebooting your server yet?

 

 

BTW, I don't think that this is an error specific to 6.12.2, since otherwise other people would have such issues too and my thread would explode with support requests.

From what I see in the download numbers, the drivers are installed on about 8000 Unraid systems (combined); driver version 535.54.03 alone was downloaded about 7700 times.


I have rebooted the server several times but nothing changes.

 

I have been reproducing the error with Plex.

This is what I get in the Unraid log:

 

Jul 10 20:44:49 Unraid kernel: WARNING: CPU: 14 PID: 0 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Jul 10 20:44:49 Unraid kernel: Modules linked in: veth wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha af_packet nvidia_uvm(PO) xt_nat macvlan xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge 8021q garp mrp stp llc ixgbe xfrm_algo mdio igb i2c_algo_bit nvidia_drm(PO) nvidia_modeset(PO) zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) nvidia(PO) edac_mce_amd zcommon(PO) edac_core znvpair(PO) spl(O) kvm_amd video drm_kms_helper kvm drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 backlight aesni_intel crypto_simd syscopyarea tpm_crb cryptd wmi_bmof
Jul 10 20:44:49 Unraid kernel: mxm_wmi asus_wmi_sensors tpm_tis sysfillrect i2c_piix4 k10temp nvme rapl tpm_tis_core input_leds ccp ahci sysimgblt led_class cdc_acm nvme_core i2c_core libahci fb_sys_fops tpm wmi button acpi_cpufreq unix [last unloaded: xfrm_algo]
Jul 10 20:44:49 Unraid kernel: CPU: 14 PID: 0 Comm: swapper/14 Tainted: P           O       6.1.36-Unraid #1
Jul 10 20:44:49 Unraid kernel: Hardware name: ASUS System Product Name/ROG CROSSHAIR VII HERO, BIOS 4603 09/13/2021
Jul 10 20:44:49 Unraid kernel: RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Jul 10 20:44:49 Unraid kernel: Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01
Jul 10 20:44:49 Unraid kernel: RSP: 0018:ffffc900004c8838 EFLAGS: 00010202
Jul 10 20:44:49 Unraid kernel: RAX: 0000000000000001 RBX: ffff8885c2e81f00 RCX: 7aecd0b99ace0591
Jul 10 20:44:49 Unraid kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8885c2e81f00
Jul 10 20:44:49 Unraid kernel: RBP: 0000000000000001 R08: fed2146f5781fd9e R09: d403ee2a01cdc41c
Jul 10 20:44:49 Unraid kernel: R10: 13c56616bc33d4cc R11: ffffc900004c8800 R12: ffffffff82a11440
Jul 10 20:44:49 Unraid kernel: R13: 00000000000254b3 R14: ffff88892d6dbe00 R15: 0000000000000000
Jul 10 20:44:49 Unraid kernel: FS:  0000000000000000(0000) GS:ffff888ffeb80000(0000) knlGS:0000000000000000
Jul 10 20:44:49 Unraid kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 10 20:44:49 Unraid kernel: CR2: 000000c000107010 CR3: 00000001c7cee000 CR4: 0000000000350ee0
Jul 10 20:44:49 Unraid kernel: Call Trace:
Jul 10 20:44:49 Unraid kernel: <IRQ>
Jul 10 20:44:49 Unraid kernel: ? __warn+0xab/0x122
Jul 10 20:44:49 Unraid kernel: ? report_bug+0x109/0x17e
Jul 10 20:44:49 Unraid kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Jul 10 20:44:49 Unraid kernel: ? handle_bug+0x41/0x6f
Jul 10 20:44:49 Unraid kernel: ? exc_invalid_op+0x13/0x60
Jul 10 20:44:49 Unraid kernel: ? asm_exc_invalid_op+0x16/0x20
Jul 10 20:44:49 Unraid kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Jul 10 20:44:49 Unraid kernel: ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]
Jul 10 20:44:49 Unraid kernel: ? nf_nat_inet_fn+0xc0/0x1a8 [nf_nat]
Jul 10 20:44:49 Unraid kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
Jul 10 20:44:49 Unraid kernel: nf_hook_slow+0x3d/0x96
Jul 10 20:44:49 Unraid kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Jul 10 20:44:49 Unraid kernel: NF_HOOK.constprop.0+0x79/0xd9
Jul 10 20:44:49 Unraid kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Jul 10 20:44:49 Unraid kernel: ip_sabotage_in+0x52/0x60 [br_netfilter]
Jul 10 20:44:49 Unraid kernel: nf_hook_slow+0x3d/0x96
Jul 10 20:44:49 Unraid kernel: ? ip_rcv_finish_core.constprop.0+0x3e8/0x3e8
Jul 10 20:44:49 Unraid kernel: NF_HOOK.constprop.0+0x79/0xd9
Jul 10 20:44:49 Unraid kernel: ? ip_rcv_finish_core.constprop.0+0x3e8/0x3e8
Jul 10 20:44:49 Unraid kernel: __netif_receive_skb_one_core+0x77/0x9c
Jul 10 20:44:49 Unraid kernel: netif_receive_skb+0xbf/0x127
Jul 10 20:44:49 Unraid kernel: br_handle_frame_finish+0x438/0x472 [bridge]
Jul 10 20:44:49 Unraid kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
Jul 10 20:44:49 Unraid kernel: br_nf_hook_thresh+0xe5/0x109 [br_netfilter]
Jul 10 20:44:49 Unraid kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
Jul 10 20:44:49 Unraid kernel: br_nf_pre_routing_finish+0x2c1/0x2ec [br_netfilter]
Jul 10 20:44:49 Unraid kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
Jul 10 20:44:49 Unraid kernel: ? NF_HOOK.isra.0+0xe4/0x140 [br_netfilter]
Jul 10 20:44:49 Unraid kernel: ? br_nf_hook_thresh+0x109/0x109 [br_netfilter]
Jul 10 20:44:49 Unraid kernel: br_nf_pre_routing+0x236/0x24a [br_netfilter]
Jul 10 20:44:49 Unraid kernel: ? br_nf_hook_thresh+0x109/0x109 [br_netfilter]
Jul 10 20:44:49 Unraid kernel: br_handle_frame+0x27a/0x2e0 [bridge]
Jul 10 20:44:49 Unraid kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
Jul 10 20:44:49 Unraid kernel: __netif_receive_skb_core.constprop.0+0x4fd/0x6e9
Jul 10 20:44:49 Unraid kernel: __netif_receive_skb_list_core+0x8a/0x11e
Jul 10 20:44:49 Unraid kernel: netif_receive_skb_list_internal+0x1d2/0x20b
Jul 10 20:44:49 Unraid kernel: gro_normal_list+0x1d/0x3f
Jul 10 20:44:49 Unraid kernel: napi_complete_done+0x7b/0x11a
Jul 10 20:44:49 Unraid kernel: igb_poll+0xd88/0xf8e [igb]
Jul 10 20:44:49 Unraid kernel: ? run_cmd+0x13/0x51
Jul 10 20:44:49 Unraid kernel: ? update_overutilized_status+0x33/0x6e
Jul 10 20:44:49 Unraid kernel: ? hrtick_update+0x17/0x4f
Jul 10 20:44:49 Unraid kernel: __napi_poll.constprop.0+0x2b/0x124
Jul 10 20:44:49 Unraid kernel: net_rx_action+0x159/0x24f
Jul 10 20:44:49 Unraid kernel: __do_softirq+0x129/0x288
Jul 10 20:44:49 Unraid kernel: __irq_exit_rcu+0x5e/0xb8
Jul 10 20:44:49 Unraid kernel: common_interrupt+0x9b/0xc1
Jul 10 20:44:49 Unraid kernel: </IRQ>
Jul 10 20:44:49 Unraid kernel: <TASK>
Jul 10 20:44:49 Unraid kernel: asm_common_interrupt+0x22/0x40
Jul 10 20:44:49 Unraid kernel: RIP: 0010:cpuidle_enter_state+0x11d/0x202
Jul 10 20:44:49 Unraid kernel: Code: 16 37 a0 ff 45 84 ff 74 1b 9c 58 0f 1f 40 00 0f ba e0 09 73 08 0f 0b fa 0f 1f 44 00 00 31 ff e8 24 f6 a4 ff fb 0f 1f 44 00 00 <45> 85 e4 0f 88 ba 00 00 00 48 8b 04 24 49 63 cc 48 6b d1 68 49 29
Jul 10 20:44:49 Unraid kernel: RSP: 0018:ffffc900001c7e98 EFLAGS: 00000246
Jul 10 20:44:49 Unraid kernel: RAX: ffff888ffeb80000 RBX: ffff888108c8cc00 RCX: 0000000000000000
Jul 10 20:44:49 Unraid kernel: RDX: 0000096113d8cad6 RSI: ffffffff820909fc RDI: ffffffff82090f05
Jul 10 20:44:49 Unraid kernel: RBP: 0000000000000002 R08: 0000000000000002 R09: 0000000000000002
Jul 10 20:44:49 Unraid kernel: R10: 0000000000000020 R11: 0000000000004bc6 R12: 0000000000000002
Jul 10 20:44:49 Unraid kernel: R13: ffffffff823235a0 R14: 0000096113d8cad6 R15: 0000000000000000
Jul 10 20:44:49 Unraid kernel: ? cpuidle_enter_state+0xf7/0x202
Jul 10 20:44:49 Unraid kernel: cpuidle_enter+0x2a/0x38
Jul 10 20:44:49 Unraid kernel: do_idle+0x18d/0x1fb
Jul 10 20:44:49 Unraid kernel: cpu_startup_entry+0x1d/0x1f
Jul 10 20:44:49 Unraid kernel: start_secondary+0xeb/0xeb
Jul 10 20:44:49 Unraid kernel: secondary_startup_64_no_verify+0xce/0xdb
Jul 10 20:44:49 Unraid kernel: </TASK>
Jul 10 20:44:49 Unraid kernel: ---[ end trace 0000000000000000 ]---

 

These are the Plex app logs while reproducing the error. The error below is the popup I get every time I try to transcode.

 

[Two screenshots attached]

 

 

Regarding your comments on Frigate: the network is fine, because as soon as I disable GPU decoding everything works. While using GPU decoding I can access the cameras with other tools and that works. Anyway, I'm going to make the changes you proposed to see if something changes, considering that it affects Plex as well, and I have the same problem with Plex even if Frigate is stopped...

 

Thanks for your help. Maybe it is something with my config... but I don't even know where to start troubleshooting it, and the logs don't tell a lot.

All I know is that it only happens when a container tries to use the GPU for something.

Edited by L0rdRaiden
39 minutes ago, L0rdRaiden said:

this is what I get in unraid log

This call trace is not caused by the Nvidia Driver plugin. Please switch from MACVLAN to IPVLAN in your Docker settings and reboot your server.

 

39 minutes ago, L0rdRaiden said:

These are the Plex app logs while reproducing the error. The error below is the popup I get every time I try to transcode.

Can you post the transcoding logs too?

 

39 minutes ago, L0rdRaiden said:

Thanks for your help. Maybe it is something with my config... but I don't even know where to start troubleshooting it, and the logs don't tell a lot.

Do you have another PC where you can test the card to just make sure that it is working as expected?

22 hours ago, ich777 said:

This call trace is not caused by the Nvidia Driver plugin. Please switch from MACVLAN to IPVLAN in your Docker settings and reboot your server.

 

Can you post the transcoding logs too?

 

Do you have another PC where you can test the card to just make sure that it is working as expected?

I don't have another PC to try, but I have removed the card and plugged it in again (didn't work).

Then I cleaned the logs folder of Plex, started Plex, and reproduced the issue. Here are the logs and also screenshots of nvidia-smi while trying to transcode.

 

[Screenshot of nvidia-smi while trying to transcode]

 

Maybe the card died but apparently it's working...

Logs.zip

1 hour ago, L0rdRaiden said:

Then I cleaned the logs folder of Plex, started Plex, and reproduced the issue. Here are the logs and also screenshots of nvidia-smi while trying to transcode.

Wait, are you using the WebClient from Plex to transcode a movie? Can you try a native app, like the one for Android or iOS, and see if it works there when you force a transcode?

The Plex WebClient is notorious.

 

What happens when you stop the transcode? Can you please post a screenshot from nvidia-smi when nothing is using the GPU?

On 7/10/2023 at 9:09 PM, L0rdRaiden said:

Thanks for your help. Maybe it is something with my config... but I don't even know where to start troubleshooting it, and the logs don't tell a lot.

 

Your logs look a little weird, as Plex is trying to use VAAPI (Intel/AMD) before NVENC...

 

You may try deleting the codecs dir from Plex and restarting the Docker container. You may also try another browser to test playback (or a native client with forced transcoding, as mentioned by @ich777).

 

[Screenshot attached]

 

You may also test without your ramdisk as the transcoding path. I know some DTS streams need an insanely large amount of free ramdisk space (for whatever reason).
