nVidia GTX970 - kernel: NVRM: GPU 0000:0b:00.0: request_irq() failed (-22) error


I just migrated my unRAID server from an Intel-based 8700K to an AMD-based 5900X + nVidia GTX 970.

The migration itself went without issues: I disabled Docker and the VM Manager (and their auto-start) beforehand, then re-enabled everything afterwards.

The only odd thing is that I keep getting this error: "...kernel: NVRM: GPU 0000:0b:00.0: request_irq() failed (-22) error"

I'm assuming it's some kind of IRQ (interrupt) error, but I have no idea.
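
For reference, the -22 in that message is a negated Linux errno value, and errno 22 is EINVAL ("Invalid argument"). Below is a minimal Python sketch for decoding such codes. The helper name `describe_kernel_errno` is made up for illustration, and the sketch only names the code; it says nothing about why the driver's interrupt request was rejected.

```python
import errno
import os


def describe_kernel_errno(code: int) -> str:
    """Translate a kernel-style return code (e.g. -22) into its errno name and message."""
    number = abs(code)  # kernel functions return errno values negated
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name} ({os.strerror(number)})"


if __name__ == "__main__":
    # -22 is the value reported by request_irq() in the log line above
    print(describe_kernel_errno(-22))
```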

The GTX 970 and its corresponding HDMI audio device are in their own dedicated IOMMU group, so I don't believe that is the issue (but that is pure speculation).
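
One way to double-check the grouping from the command line, assuming the standard Linux sysfs layout under `/sys/kernel/iommu_groups`, is to list the devices in each group. This is only a sketch: the function name `list_iommu_groups` is hypothetical, and the `0000:0b:00.` prefix is taken from the PCI address in the error message, on the assumption that the audio function sits on the same slot.

```python
from pathlib import Path


def list_iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict[int, list[str]]:
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups: dict[int, list[str]] = {}
    base = Path(root)
    if not base.is_dir():
        # IOMMU disabled or not a Linux system: nothing to report
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


if __name__ == "__main__":
    gpu_prefix = "0000:0b:00."  # slot of the GPU named in the NVRM error
    for group, devices in sorted(list_iommu_groups().items()):
        if any(dev.startswith(gpu_prefix) for dev in devices):
            print(f"IOMMU group {group}: {' '.join(devices)}")
```

If the printed group contains only the functions of the card itself, the GPU is isolated in its own group.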

 

In the nVidia driver package (under Settings) I have "Production Branch: v535.129.03" selected, but on the left-hand side of that same window, under "Installed GPU(s)", it says "No devices were found".

Another thing I noticed is that the server seems to install the nVidia driver every time it reboots. Is this how it's supposed to work? In my other Linux experience, the driver is installed once and then simply loaded on each restart or power-on. That does not appear to be the case here.

Any idea what is happening? I've linked a diagnostics file if it helps at all?

Thanks in advance.

 

galactica-diagnostics-20231119-1512.zip

2 hours ago, thaoggamer said:

Any idea what is happening? I've linked a diagnostics file if it helps at all?

Your kernel crashed, almost certainly because of a macvlan issue:

Nov 20 01:18:23 GALACTICA kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
Nov 20 01:18:23 GALACTICA kernel: #PF: supervisor read access in kernel mode
Nov 20 01:18:23 GALACTICA kernel: #PF: error_code(0x0000) - not-present page
Nov 20 01:18:23 GALACTICA kernel: PGD 15c5ba067 P4D 15c5ba067 PUD 1ccb89067 PMD 0 
Nov 20 01:18:23 GALACTICA kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Nov 20 01:18:23 GALACTICA kernel: CPU: 5 PID: 18431 Comm: sh Tainted: P           O       6.1.49-Unraid #1
Nov 20 01:18:23 GALACTICA kernel: Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO WIFI/X570 AORUS PRO WIFI, BIOS F37f 08/09/2023
Nov 20 01:18:23 GALACTICA kernel: RIP: 0010:_compound_head+0x0/0x36
Nov 20 01:18:23 GALACTICA kernel: Code: 8b 95 c0 01 00 00 48 8b 85 58 02 00 00 48 39 c2 73 0c 48 29 d0 48 39 c3 48 0f 47 d8 eb 02 31 db 48 89 d8 5b 5d e9 b6 1d 9d 00 <48> 8b 57 08 48 89 f8 f6 c2 01 75 21 66 90 e9 a3 1d 9d 00 f7 c7 ff
Nov 20 01:18:23 GALACTICA kernel: RSP: 0018:ffffc90001a5fd50 EFLAGS: 00010046
Nov 20 01:18:23 GALACTICA kernel: RAX: 0000000000000246 RBX: 0000000000000004 RCX: 0000000000000000
Nov 20 01:18:23 GALACTICA kernel: RDX: 0000000000000004 RSI: 0000000000000025 RDI: 0000000000000000
Nov 20 01:18:23 GALACTICA kernel: RBP: 0000000000000025 R08: 00000000ffffffff R09: 0000000000000000
Nov 20 01:18:23 GALACTICA kernel: R10: 000014d588cfda10 R11: 0000000000000000 R12: 0000000000000000
Nov 20 01:18:23 GALACTICA kernel: R13: 0000000000000246 R14: 0000000000000000 R15: ffff8881917e1000
Nov 20 01:18:23 GALACTICA kernel: FS:  000014d588cfd740(0000) GS:ffff8893ee940000(0000) knlGS:0000000000000000
Nov 20 01:18:23 GALACTICA kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 20 01:18:23 GALACTICA kernel: CR2: 0000000000000008 CR3: 00000001cafb8000 CR4: 0000000000750ee0
Nov 20 01:18:23 GALACTICA kernel: PKRU: 55555554
Nov 20 01:18:23 GALACTICA kernel: Call Trace:
Nov 20 01:18:23 GALACTICA kernel: <TASK>
Nov 20 01:18:23 GALACTICA kernel: ? __die_body+0x1a/0x5c
Nov 20 01:18:23 GALACTICA kernel: ? page_fault_oops+0x329/0x376
Nov 20 01:18:23 GALACTICA kernel: ? do_user_addr_fault+0x12e/0x48d
Nov 20 01:18:23 GALACTICA kernel: ? exc_page_fault+0xfb/0x11d
Nov 20 01:18:23 GALACTICA kernel: ? asm_exc_page_fault+0x22/0x30
Nov 20 01:18:23 GALACTICA kernel: ? mem_cgroup_margin+0x59/0x59
Nov 20 01:18:23 GALACTICA kernel: __mod_lruvec_page_state+0x17/0x76
Nov 20 01:18:23 GALACTICA kernel: account_kernel_stack.isra.0+0x3b/0x5d
Nov 20 01:18:23 GALACTICA kernel: copy_process+0x2ed/0x16aa
Nov 20 01:18:23 GALACTICA kernel: ? kmem_cache_alloc+0x122/0x14d
Nov 20 01:18:23 GALACTICA kernel: kernel_clone+0xa5/0x2aa
Nov 20 01:18:23 GALACTICA kernel: ? alloc_file+0x8f/0x13f
Nov 20 01:18:23 GALACTICA kernel: __do_sys_clone+0x65/0x8b
Nov 20 01:18:23 GALACTICA kernel: do_syscall_64+0x6b/0x81
Nov 20 01:18:23 GALACTICA kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
Nov 20 01:18:23 GALACTICA kernel: RIP: 0033:0x14d588ddb383
Nov 20 01:18:23 GALACTICA kernel: Code: 00 e8 91 75 f5 ff 90 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 89 c2 85 c0 75 2c 64 48 8b 04 25 10 00 00
Nov 20 01:18:23 GALACTICA kernel: RSP: 002b:00007ffe80fafb48 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
Nov 20 01:18:23 GALACTICA kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000014d588ddb383
Nov 20 01:18:23 GALACTICA kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
Nov 20 01:18:23 GALACTICA kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Nov 20 01:18:23 GALACTICA kernel: R10: 000014d588cfda10 R11: 0000000000000246 R12: 0000000000000001
Nov 20 01:18:23 GALACTICA kernel: R13: 00007ffe80fafd70 R14: 00000000004eebb7 R15: 00000000ffffffff
Nov 20 01:18:23 GALACTICA kernel: </TASK>
Nov 20 01:18:23 GALACTICA kernel: Modules linked in: nvidia_uvm(PO) udp_diag xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle xt_nat xt_tcpudp vhost_net tun vhost vhost_iotlb tap macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag it87 hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc bonding tls r8169 realtek nvidia_drm(PO) nvidia_modeset(PO) edac_mce_amd edac_core intel_rapl_msr intel_rapl_common iosf_mbi nvidia(PO) kvm_amd video kvm drm_kms_helper drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 joydev btusb input_leds btrtl aesni_intel backlight btbcm crypto_simd btintel cryptd mxm_wmi gigabyte_wmi wmi_bmof bluetooth
Nov 20 01:18:23 GALACTICA kernel: i2c_piix4 syscopyarea mpt3sas rapl sysfillrect k10temp i2c_core nvme hid_apple raid_class sysimgblt ecdh_generic ccp scsi_transport_sas ahci led_class fb_sys_fops ecc nvme_core libahci thermal wmi button acpi_cpufreq unix [last unloaded: realtek]
Nov 20 01:18:23 GALACTICA kernel: CR2: 0000000000000008
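
To find entries like these without reading the whole log, something along the following lines can help. It is a rough Python sketch, not an official Unraid tool: the function name `scan_syslog` and the two patterns are illustrative, and the path `/var/log/syslog` is an assumption about where the live log is kept.

```python
import re

# Illustrative patterns: kernel BUG/Oops headers and the NVRM request_irq() failure
PATTERNS = {
    "kernel_oops": re.compile(r"kernel: (BUG:|Oops:)"),
    "nvrm_irq": re.compile(r"kernel: NVRM: .*request_irq\(\) failed"),
}


def scan_syslog(lines):
    """Return the log lines matching each pattern, keyed by pattern name."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Assumed log location; adjust if your syslog lives elsewhere
    with open("/var/log/syslog", errors="replace") as log:
        for name, matched in scan_syslog(log).items():
            print(f"{name}: {len(matched)} hit(s)")
            for entry in matched:
                print("   ", entry)
```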

 

You should really change your Docker network type to IPVLAN, or keep MACVLAN and disable bridging in the network settings.

 

 

 

Please let's continue the conversation about the Nvidia driver in the appropriate support thread (you can find it by going to your plugin page and clicking Support Thread on the plugin):

 

Also, please update your BIOS first, then enable Above 4G Decoding and Resizable BAR Support in your BIOS.

Since this is an AMD system, please also try disabling C-States in your BIOS.

 

I'll lock this thread.

  • ich777 locked this topic