[Plugin] Nvidia-Driver


ich777


4 hours ago, VladoPortos said:

I'm experiencing hard crashes when one or both Nvidia cards have been encoding for some time (Tdarr)

How hot did the cards get?

Are you sure this is caused by the transcoding itself? Is it possible for you to try HandBrake and see if the same happens there too?

4 hours ago, ich777 said:

Is this a new or a used card?

 

Something seems very strange here to me... Never had a problem using my P400 and playing such files.

 

Brand new. Sealed in box from my local computer parts store.

 

Ah well, I'm not too concerned. It works well in HandBrake so not a total loss.

17 minutes ago, Akshunhiro said:

Ah well, I'm not too concerned. It works well in HandBrake so not a total loss.

I really can't tell why it's not working here... :/

On Emby/Jellyfin/Plex everything works as expected with my T400 and even back then with my P400.


I have (2) Nvidia GTX 1660 Super Gaming X

The card in the first PCIe slot is for Unraid/Plex, while the second card is assigned to a VM.

 

My problem is that after installing the Nvidia driver plugin, the video card shuts off on startup (no output): I can't log in directly at the server (with the keyboard/monitor attached to the system), and I occasionally get hardware errors through 'Fix Common Problems'. I uploaded two syslogs; the first is from before installing the driver and the second from after.

workhorse-syslog-20211007-0903.zip workhorse-syslog-20211007-0920.zip

13 minutes ago, N385 said:

The card in the first PCIe slot is for Unraid/Plex, while the second card is assigned to a VM.

Have you bound one of the cards to VFIO? If not, I would strongly recommend binding the one that you want to use in the VM to VFIO.

 

Do the two cards appear in the Nvidia Driver plugin?

 

13 minutes ago, N385 said:

My problem is that after installing the Nvidia driver plugin, the video card shuts off on startup (no output)

Is this also the case when the Nvidia Driver is not installed?

You boot unRAID into GUI mode and get no output, correct?
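A quick way to verify the binding from a terminal is `lspci -k`: a card bound to VFIO reports `vfio-pci` as its kernel driver, while the card the plugin grabbed reports `nvidia`. A sketch against canned sample output (illustrative, not from this system):

```shell
# On a live system:  lspci -k | grep -A 3 VGA
# Canned sample output for two cards (illustrative):
lspci_out='01:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660 SUPER]
	Kernel driver in use: nvidia
02:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660 SUPER]
	Kernel driver in use: vfio-pci'
# Count the cards that are bound to VFIO
printf '%s\n' "$lspci_out" | grep -c 'vfio-pci'
```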


With the first video card I can log into the GUI (without the driver), but once the driver is installed I see the login screen for about a quarter second before the output disappears and the monitor shuts off.

 

The second video card is bound to VFIO and passed through to the VM; it works flawlessly with HandBrake for video work and other tasks (I use Remote Desktop to access it).

 

I have a third video card (AMD) on order that I intend to bind to a second VM; I want to use that daily and get rid of my standalone Win10 system, which makes logging into the Unraid GUI necessary if I have to shut the VMs off for any reason.

43 minutes ago, N385 said:

With the first video card I can log into GUI (without the driver), but once the driver is installed I can see the login for about 1/4 second before the output disappears and the monitor shuts off.

Please try this solution and let me know if it is working:
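For reference, the workaround toggles the xconfig flag in the plugin's settings file on the flash drive (the exact path below is an assumption):

```
# /boot/config/plugins/nvidia-driver/settings.cfg  (path is an assumption)
disable_xconfig=true
```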

 


I set disable_xconfig=true.

 

After reboot & startup sequence, I now get a horizontal blinking cursor instead of the monitor shutting down.

 

One other change I noticed after installing the driver is that my startup's last line is "Warning: commands will be executed using /bin/sh", and it sits on that for about 60 seconds before boot finishes.

 

If I delete the driver and restart, it doesn't do that anymore.

 

IMG_5434.jpg

1 hour ago, N385 said:

After reboot & startup sequence, I now get a horizontal blinking cursor instead of the monitor shutting down.

Can you please upload your full Diagnostics?

 

1 hour ago, N385 said:

Warning: commands will be executed using /bin/sh and it sits on that for about 60 seconds

This can be ignored and will be fixed in an upcoming version of the plugin; the ~60 seconds are actually the time it takes to install the plugin.

8 hours ago, N385 said:

Here she is...

Just a thought, but can you try switching the cards, binding the second one to VFIO and unbinding the first one, so that unRAID can grab the first card and not the second? Don't forget to change the card in the VM's template as well.

Please also don't change the config file for now and leave the xconfig setting at false.

3 hours ago, ich777 said:

Just a thought, but can you try switching the cards, binding the second one to VFIO and unbinding the first one, so that unRAID can grab the first card and not the second? Don't forget to change the card in the VM's template as well.

I wondered myself if it had to do with the video cards, so I already swapped them earlier this week, and it made no difference. I tried everything I could think of to get it to work: I would change one setting, reboot, and if it didn't work I'd change it back.

 

I've had a bad week trying to get this to work. One of the things I tried was changing the ACS override, and my system didn't like that; I thought I was going to have to start over from scratch because it wouldn't boot at all. Another thing I boned was using a capital letter in --runtime=nvidia in Plex, and the Docker container disappeared. Thank God everything is well backed up and I knew what to do.
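For anyone hitting the same pitfall: the extra Docker parameter is case-sensitive. A hypothetical sketch of the flags involved (the container name and image are placeholders, not from this thread):

```shell
# "--runtime=nvidia" must be exactly lowercase, or the container fails to start.
# The GPU is exposed via the NVIDIA_VISIBLE_DEVICES variable (a GPU UUID or "all").
cmd='docker run -d --name=plex --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  lscr.io/linuxserver/plex'
printf '%s\n' "$cmd"
```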

 

Quote

leave the xconfig setting at false.

I'm sorry I don't know what you mean. Did you want me to change something? If so you're going to have to tell me what to type.

 

I've only been using Unraid since April and just started messing around with a Linux distro as well. I'm a 53-year-old lifetime Windows user, but I'm learning (slowly).

I'm sorry I don't know what you mean. Did you want me to change something? If so you're going to have to tell me what to type.
 
I've only been using Unraid since April and just started messing around with a linux distro also. I'm a 53y/o lifetime Windows user but I'm learning (slowly).
On which card is the BIOS screen displayed? I would recommend binding the card that outputs nothing on boot to VFIO.

Do you boot with CSM enabled (i.e. Legacy Boot), or do you boot with UEFI?

Can you try uninstalling the Nvidia plugin once, rebooting, and seeing if you get screen output again?

Sent from my C64

8 hours ago, ich777 said:

On which card is the BIOS screen displayed? I would recommend binding the card that outputs nothing on boot to VFIO.

To make things simpler, I deleted my VM, unbound everything from it, and removed the second video card from the PCI-3 slot, so now I just have the one video card in PCI-1.

 

I have tried each card in PCI-1. Both will boot natively into Unraid without the driver, but not with the driver installed: once boot completes, the monitor shuts off.

 

I boot in Legacy mode.


Well, the new boot disk did the exact same thing: right after start, the monitor shut off.

 

So I changed video cards and nothing changed.

 

One of the things I ordered Monday from Amazon arrived with my AMD video card: a DisplayPort cable.

 

I rebooted with DisplayPort and boom, it works, straight into the sign-in at 2160p. Almost 8 days of anguish and it works. 🙂

 

I'll edit this post and let you know if I have luck with my regular system. Sorry for all the trouble.

17 minutes ago, N385 said:

I rebooted with DisplayPort and boom, it works, straight into the sign-in at 2160p. Almost 8 days of anguish and it works. 🙂

How was the card connected before?

So now it is all working?


I was using HDMI and replaced it with DisplayPort. That fixed it with the temporary boot setup, but now, back with everything normal, I've got the horizontal blinking cursor again. I think I jinxed myself.

 

Where's a gun when you need it? 1 shot in the GPU, 2 in the mb.

16 minutes ago, N385 said:

I was using HDMI and replaced it with DisplayPort. That fixed it with the temporary boot setup, but now, back with everything normal, I've got the horizontal blinking cursor again. I think I jinxed myself.

May I ask why you need to boot into the unRAID GUI anyways?

 

This is a really strange issue. Can you check the output of the other graphics card too? Have you already tried plugging the cable into another port on the card, or re-plugging it when this happens?

 

What you can also do is:

  1. Press Ctrl - Alt - F1 (at the same time)
  2. Log in with root and your password
  3. Type in: /etc/rc.d/rc.4

But I don't think it will change the situation...

 

Please also check again in your BIOS which PCIe slot is set as the primary graphics card.

On 10/6/2021 at 12:51 PM, ich777 said:

How hot did the cards get?

Are you sure this is caused by the transcoding itself? Is it possible for you try Handbrake if the same happens there too?

Unfortunately I can't 100% confirm it's the encoder; it's my best guess. But there was also an issue with the network (the switch was kind of broken), and after removing it there hasn't been a crash yet. But I run only one card.

For the sake of testing I just set both cards to encoding and will report whether it crashes or not.

 

So far the temps are like this: GTX 960 at around 60 °C, GTX 980 at around 70 °C.
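These readings can come straight from `nvidia-smi`; the parsing below runs against canned sample values (no GPU assumed here, and the warning threshold is arbitrary):

```shell
# On a live system:  nvidia-smi --query-gpu=name,temperature.gpu --format=csv,noheader
# Canned sample matching the numbers above (illustrative):
smi_out='GeForce GTX 960, 60
GeForce GTX 980, 70'
# Print any card above an arbitrary 80 °C warning threshold
printf '%s\n' "$smi_out" | awk -F', ' '$2 > 80 {print $1 " is running hot"}'
```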

 

 

 

temp.JPG

2 hours ago, ich777 said:

May I ask why you need to boot into the unRAID GUI anyways?

The Windows 10 PC I'm using now runs on an M.2 NVMe drive that I'm going to put in the server and pair with an AMD graphics card, so I won't have to have two PCs on all the time.

 

I have a second M.2 NVMe that will again be paired with the second Nvidia card for NVENC and can do other tasks relatively undisturbed.

 

If I have to do any work that requires the VMs to be shut off, I'll have to be able to do it with the first video card, which will be paired with Plex for transcoding etc.

 

BTW, I deleted the driver and reinstalled it, and now it's finally working.


I think I'm having an issue with the Nvidia plugin:

Sep 30 07:10:08 Tower kernel: BUG: kernel NULL pointer dereference, address: 00000000000000b1
Sep 30 07:10:08 Tower kernel: #PF: supervisor read access in kernel mode
Sep 30 07:10:08 Tower kernel: #PF: error_code(0x0000) - not-present page
Sep 30 07:10:08 Tower kernel: PGD 2d5955067 P4D 2d5955067 PUD 2aded0067 PMD 0 
Sep 30 07:10:08 Tower kernel: Oops: 0000 [#1] SMP NOPTI
Sep 30 07:10:08 Tower kernel: CPU: 72 PID: 106336 Comm: nvidia-smi Tainted: P           O      5.10.28-Unraid #1
Sep 30 07:10:08 Tower kernel: Hardware name: ASUS System Product Name/ROG ZENITH II EXTREME ALPHA, BIOS 1402 01/15/2021
Sep 30 07:10:08 Tower kernel: RIP: 0010:_nv031699rm+0x79/0x940 [nvidia]
Sep 30 07:10:08 Tower kernel: Code: 07 00 00 41 bf 01 00 00 00 4c 8d 65 48 31 db 44 89 7d 10 66 0f 1f 44 00 00 41 f6 c5 01 0f 84 90 00 00 00 49 8b 86 30 1a 00 00 <80> b8 b1 00 00 00 00 74 12 b8 01 00 00 00 89 d9 d3 e0 41 85 86 94
Sep 30 07:10:08 Tower kernel: RSP: 0018:ffffc9000303b978 EFLAGS: 00010202
Sep 30 07:10:08 Tower kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
Sep 30 07:10:08 Tower kernel: RDX: ffff88824b6f0008 RSI: ffff88817b692008 RDI: ffff888198f88008
Sep 30 07:10:08 Tower kernel: RBP: ffff8884806ddd80 R08: 0000000000000002 R09: 0000000000000020
Sep 30 07:10:08 Tower kernel: R10: 0000000000000002 R11: 0000000000000002 R12: ffff8884806dddc8
Sep 30 07:10:08 Tower kernel: R13: 0000000000000003 R14: ffff88817b692008 R15: 0000000000000001
Sep 30 07:10:08 Tower kernel: FS:  0000152f94ef2b80(0000) GS:ffff88bf3e000000(0000) knlGS:0000000000000000
Sep 30 07:10:08 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 30 07:10:08 Tower kernel: CR2: 00000000000000b1 CR3: 000000048e7d4000 CR4: 0000000000350ee0
Sep 30 07:10:08 Tower kernel: Call Trace:
Sep 30 07:10:08 Tower kernel: ? _nv031813rm+0x82/0x270 [nvidia]
Sep 30 07:10:08 Tower kernel: ? _nv031846rm+0x17/0x30 [nvidia]
Sep 30 07:10:08 Tower kernel: ? _nv022821rm+0xc0/0x1b0 [nvidia]
Sep 30 07:10:08 Tower kernel: ? _nv022826rm+0x11b/0x230 [nvidia]
Sep 30 07:10:08 Tower kernel: ? _nv022826rm+0x211/0x230 [nvidia]
Sep 30 07:10:08 Tower kernel: ? _nv022828rm+0x310/0x310 [nvidia]
Sep 30 07:10:08 Tower kernel: ? _nv023498rm+0x32d/0x470 [nvidia]
Sep 30 07:10:08 Tower kernel: ? _nv023498rm+0x304/0x470 [nvidia]
Sep 30 07:10:08 Tower kernel: ? _nv000722rm+0x32a/0x680 [nvidia]
Sep 30 07:10:08 Tower kernel: ? _nv000715rm+0x1802/0x23d0 [nvidia]
Sep 30 07:10:08 Tower kernel: ? rm_init_adapter+0xc5/0xe0 [nvidia]
Sep 30 07:10:08 Tower kernel: ? ttwu_queue_wakelist+0x93/0x9a
Sep 30 07:10:08 Tower kernel: ? nv_open_device+0x44b/0x676 [nvidia]
Sep 30 07:10:08 Tower kernel: ? nvidia_open+0x266/0x3d1 [nvidia]
Sep 30 07:10:08 Tower kernel: ? nvidia_frontend_open+0x62/0x8d [nvidia]
Sep 30 07:10:08 Tower kernel: ? chrdev_open+0x150/0x187
Sep 30 07:10:08 Tower kernel: ? cdev_put+0x19/0x19
Sep 30 07:10:08 Tower kernel: ? do_dentry_open+0x184/0x289
Sep 30 07:10:08 Tower kernel: ? path_openat+0x85e/0x937
Sep 30 07:10:08 Tower kernel: ? filename_lookup+0xb8/0xdf
Sep 30 07:10:08 Tower kernel: ? do_filp_open+0x4c/0xa9
Sep 30 07:10:08 Tower kernel: ? _cond_resched+0x1b/0x1e
Sep 30 07:10:08 Tower kernel: ? getname_flags+0x24/0x146
Sep 30 07:10:08 Tower kernel: ? kmem_cache_alloc+0x108/0x130
Sep 30 07:10:08 Tower kernel: ? do_sys_openat2+0x6f/0xec
Sep 30 07:10:08 Tower kernel: ? do_sys_open+0x35/0x4f
Sep 30 07:10:08 Tower kernel: ? do_syscall_64+0x5d/0x6a
Sep 30 07:10:08 Tower kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Sep 30 07:10:08 Tower kernel: Modules linked in: xfs md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops nvidia(PO) drm backlight agpgart ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding edac_mce_amd amd_energy wmi_bmof mxm_wmi kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper mpt3sas atlantic nvme ahci i2c_piix4 raid_class rapl i2c_core scsi_transport_sas input_leds ccp nvme_core libahci led_class k10temp wmi button acpi_cpufreq
Sep 30 07:10:08 Tower kernel: CR2: 00000000000000b1
Sep 30 07:10:08 Tower kernel: ---[ end trace d232d3a5b0583cf9 ]---
Sep 30 07:10:08 Tower kernel: RIP: 0010:_nv031699rm+0x79/0x940 [nvidia]
Sep 30 07:10:08 Tower kernel: Code: 07 00 00 41 bf 01 00 00 00 4c 8d 65 48 31 db 44 89 7d 10 66 0f 1f 44 00 00 41 f6 c5 01 0f 84 90 00 00 00 49 8b 86 30 1a 00 00 <80> b8 b1 00 00 00 00 74 12 b8 01 00 00 00 89 d9 d3 e0 41 85 86 94
Sep 30 07:10:08 Tower kernel: RSP: 0018:ffffc9000303b978 EFLAGS: 00010202
Sep 30 07:10:08 Tower kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
Sep 30 07:10:08 Tower kernel: RDX: ffff88824b6f0008 RSI: ffff88817b692008 RDI: ffff888198f88008
Sep 30 07:10:08 Tower kernel: RBP: ffff8884806ddd80 R08: 0000000000000002 R09: 0000000000000020
Sep 30 07:10:08 Tower kernel: R10: 0000000000000002 R11: 0000000000000002 R12: ffff8884806dddc8
Sep 30 07:10:08 Tower kernel: R13: 0000000000000003 R14: ffff88817b692008 R15: 0000000000000001
Sep 30 07:10:08 Tower kernel: FS:  0000152f94ef2b80(0000) GS:ffff88bf3e000000(0000) knlGS:0000000000000000
Sep 30 07:10:08 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 30 07:10:08 Tower kernel: CR2: 00000000000000b1 CR3: 000000048e7d4000 CR4: 0000000000350ee0

 

This happens at random times. I have 3 GPUs, 2x RTX Titan and 1x 2080 Ti:

[attached screenshot of the three GPUs]

 

 

I have 3 Docker containers that utilize the GPUs: PhoenixMiner (uses all 3), Plex (2080 Ti), and DeepStack (the dual Titans). I thought maybe they couldn't all run together, so I stopped the others and just ran Plex; the system still crashed. Right now they are all off and it's still running, but time will tell. I'll leave an update if it crashes again.

syslog.zip tower-diagnostics-20211009-0736.zip

1 hour ago, TheSkaz said:

I thought maybe they couldn't all run together, so I stopped the others and just ran Plex.

How warm do the cards get? Are they cooled enough for those kinds of workloads?

 

1 hour ago, TheSkaz said:

just ran plex

What do you mean by that? A transcoding session? If yes, on which card?

Maybe a card is broken, but that's only a vague guess...

 

Usually a null pointer dereference happens when the driver tries to access an unmapped memory region, but this is the first time I've seen this issue here...

10 hours ago, VladoPortos said:

Unfortunately I can't 100% confirm it's the encoder; it's my best guess. But there was also an issue with the network (the switch was kind of broken), and after removing it there hasn't been a crash yet. But I run only one card.

For the sake of testing I just set both cards to encoding and will report whether it crashes or not.

 

So far the temps are like this: GTX 960 at around 60 °C, GTX 980 at around 70 °C.

 

 

 

temp.JPG

So I did not get a crash like before, maybe because I caught it in time, but it does seem network-related; at least this was in the log:

 

Oct  9 17:25:13 PlexServer kernel: ------------[ cut here ]------------
Oct  9 17:25:13 PlexServer kernel: WARNING: CPU: 4 PID: 78054 at net/netfilter/nf_conntrack_core.c:1120 __nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]
Oct  9 17:25:13 PlexServer kernel: Modules linked in: xt_mark nvidia_uvm(PO) xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_nat xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle nf_tables vhost_net tun vhost vhost_iotlb tap macvlan xt_conntrack nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter xfs nfsd lockd grace sunrpc md_mod i915 video iosf_mbi i2c_algo_bit intel_gtt nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops nvidia(PO) drm backlight agpgart it87 hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables wmi_bmof mxm_wmi edac_mce_amd amd_energy kvm_amd btusb btrtl btbcm kvm btintel bluetooth crct10dif_pclmul crc32_pclmul crc32c_intel mpt3sas ghash_clmulni_intel aesni_intel crypto_simd
Oct  9 17:25:13 PlexServer kernel: ecdh_generic ecc cryptd nvme i2c_piix4 glue_helper atlantic ahci nvme_core libahci raid_class scsi_transport_sas i2c_core rapl ccp wmi k10temp thermal button acpi_cpufreq
Oct  9 17:25:13 PlexServer kernel: CPU: 4 PID: 78054 Comm: kworker/4:0 Tainted: P           O      5.10.28-Unraid #1
Oct  9 17:25:13 PlexServer kernel: Hardware name: Gigabyte Technology Co., Ltd. TRX40 AORUS MASTER/TRX40 AORUS MASTER, BIOS F5q 04/12/2021
Oct  9 17:25:13 PlexServer kernel: Workqueue: events macvlan_process_broadcast [macvlan]
Oct  9 17:25:13 PlexServer kernel: RIP: 0010:__nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]
Oct  9 17:25:13 PlexServer kernel: Code: e8 dc f8 ff ff 44 89 fa 89 c6 41 89 c4 48 c1 eb 20 89 df 41 89 de e8 36 f6 ff ff 84 c0 75 bb 48 8b 85 80 00 00 00 a8 08 74 18 <0f> 0b 89 df 44 89 e6 31 db e8 6d f3 ff ff e8 35 f5 ff ff e9 22 01
Oct  9 17:25:13 PlexServer kernel: RSP: 0018:ffffc9000037cdd8 EFLAGS: 00010202
Oct  9 17:25:13 PlexServer kernel: RAX: 0000000000000188 RBX: 000000000000d3ca RCX: 00000000fe806454
Oct  9 17:25:13 PlexServer kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa00dc128
Oct  9 17:25:13 PlexServer kernel: RBP: ffff888de1169900 R08: 00000000c01ab324 R09: ffff8883ef80aba0
Oct  9 17:25:13 PlexServer kernel: R10: 0000000000000098 R11: ffff8882e0cea100 R12: 0000000000001cca
Oct  9 17:25:13 PlexServer kernel: R13: ffffffff8210b440 R14: 000000000000d3ca R15: 0000000000000000
Oct  9 17:25:13 PlexServer kernel: FS:  0000000000000000(0000) GS:ffff88903d100000(0000) knlGS:0000000000000000
Oct  9 17:25:13 PlexServer kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct  9 17:25:13 PlexServer kernel: CR2: 0000000000527fc8 CR3: 00000001645d2000 CR4: 0000000000350ee0
Oct  9 17:25:13 PlexServer kernel: Call Trace:
Oct  9 17:25:13 PlexServer kernel: <IRQ>
Oct  9 17:25:13 PlexServer kernel: nf_conntrack_confirm+0x2f/0x36 [nf_conntrack]
Oct  9 17:25:13 PlexServer kernel: nf_hook_slow+0x39/0x8e
Oct  9 17:25:13 PlexServer kernel: nf_hook.constprop.0+0xb1/0xd8
Oct  9 17:25:13 PlexServer kernel: ? ip_protocol_deliver_rcu+0xfe/0xfe
Oct  9 17:25:13 PlexServer kernel: ip_local_deliver+0x49/0x75
Oct  9 17:25:13 PlexServer kernel: __netif_receive_skb_one_core+0x74/0x95
Oct  9 17:25:13 PlexServer kernel: process_backlog+0xa3/0x13b
Oct  9 17:25:13 PlexServer kernel: net_rx_action+0xf4/0x29d
Oct  9 17:25:13 PlexServer kernel: __do_softirq+0xc4/0x1c2
Oct  9 17:25:13 PlexServer kernel: asm_call_irq_on_stack+0x12/0x20
Oct  9 17:25:13 PlexServer kernel: </IRQ>
Oct  9 17:25:13 PlexServer kernel: do_softirq_own_stack+0x2c/0x39
Oct  9 17:25:13 PlexServer kernel: do_softirq+0x3a/0x44
Oct  9 17:25:13 PlexServer kernel: netif_rx_ni+0x1c/0x22
Oct  9 17:25:13 PlexServer kernel: macvlan_broadcast+0x10e/0x13c [macvlan]
Oct  9 17:25:13 PlexServer kernel: macvlan_process_broadcast+0xf8/0x143 [macvlan]
Oct  9 17:25:13 PlexServer kernel: process_one_work+0x13c/0x1d5
Oct  9 17:25:13 PlexServer kernel: worker_thread+0x18b/0x22f
Oct  9 17:25:13 PlexServer kernel: ? process_scheduled_works+0x27/0x27
Oct  9 17:25:13 PlexServer kernel: kthread+0xe5/0xea
Oct  9 17:25:13 PlexServer kernel: ? __kthread_bind_mask+0x57/0x57
Oct  9 17:25:13 PlexServer kernel: ret_from_fork+0x22/0x30
Oct  9 17:25:13 PlexServer kernel: ---[ end trace 33448d9f3f916301 ]---

 

So I disabled both network interfaces in the BIOS and added an Intel Pro/1000 dual-port PCI card that had been working for 2 years without a hiccup in another (non-Unraid) server... and another round of testing is a go...

 

3 hours ago, ich777 said:

How warm do the cards get? Are they cooled enough for those kinds of workloads?

 

What do you mean by that? A transcoding session? If yes, on which card?

Maybe a card is broken, but that's only a vague guess...

 

Usually a null pointer dereference happens when the driver tries to access an unmapped memory region, but this is the first time I've seen this issue here...

They are watercooled and don't go north of 52-54 °C. I ran Plex: no transcode, nothing; I navigated to a show I wanted to watch, and then it crashed. Plex is tied to the 2080 Ti. The server is still going strong without any of the Nvidia-based Dockers running; normally it would have crashed by now. If it's a memory-map issue, could it be due to a memory-size difference between the first and second card? The fact that I have 3 cards? The fact that Plex is pointing to the smaller one?

Assuming the thermals aren't an issue, should this setup be able to run multiple Docker containers pointing to the same GPU?
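Multiple containers can generally share one GPU (though consumer GeForce cards do cap concurrent NVENC sessions at the driver level). One way to see which processes sit on which card is `nvidia-smi`'s compute-apps query; the sketch below parses canned sample output, since no GPU is assumed here:

```shell
# On a live system:
#   nvidia-smi --query-compute-apps=gpu_uuid,pid,process_name --format=csv,noheader
# Canned sample (UUIDs and PIDs are illustrative):
apps='GPU-aaaa, 1201, ffmpeg
GPU-aaaa, 1305, python3
GPU-bbbb, 1410, Plex Transcoder'
# Count GPU-using processes per card
printf '%s\n' "$apps" | cut -d',' -f1 | sort | uniq -c
```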

