
Random crash of my Intel UHD630 iGPU


sorg

From time to time, my UHD630 crashes: it "disappears" from lspci and becomes unavailable to my Docker containers.

I found this in the logs:

 

Sep 19 14:47:57 unraid kernel: ------------[ cut here ]------------
Sep 19 14:47:57 unraid kernel: pci 0000:00:02.0: pm_runtime_get_sync() failed: -13
Sep 19 14:47:57 unraid kernel: WARNING: CPU: 10 PID: 20545 at drivers/gpu/drm/i915/intel_runtime_pm.c:358 __intel_runtime_pm_get+0x62/0x7e [i915]
Sep 19 14:47:57 unraid kernel: Modules linked in: ipvlan af_packet bluetooth ecdh_generic ecc xt_connmark xt_mark xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat xt_addrtype br_netfilter xfs xt_MASQUERADE ip6table_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc intel_rapl_msr i915 mei_hdcp mei_pxp intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iosf_mbi drm_buddy i2c_algo_bit ttm
Sep 19 14:47:57 unraid kernel: drm_display_helper wmi_bmof kvm drm_kms_helper intel_wmi_thunderbolt mxm_wmi drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd intel_gtt i2c_i801 agpgart i2c_smbus rapl syscopyarea intel_cstate intel_uncore nvme i2c_core ch341 sysfillrect apex(O) mei_me r8168(O) ahci gasket(O) nvme_core video sysimgblt mei libahci usbserial fb_sys_fops thermal fan wmi backlight intel_pmc_core acpi_tad acpi_pad button unix
Sep 19 14:47:57 unraid kernel: CPU: 10 PID: 20545 Comm: ffmpeg Tainted: P           O       6.1.49-Unraid #1
Sep 19 14:47:57 unraid kernel: Hardware name: ASUS System Product Name/PRIME Z490-P, BIOS 1602 01/14/2021
Sep 19 14:47:57 unraid kernel: RIP: 0010:__intel_runtime_pm_get+0x62/0x7e [i915]
Sep 19 14:47:57 unraid kernel: Code: f4 1e 00 01 4c 8b 6f 50 4d 85 ed 75 03 4c 8b 2f e8 34 53 dc e0 44 89 e1 4c 89 ea 48 c7 c7 d6 4e 92 a0 48 89 c6 e8 85 cf 89 e0 <0f> 0b 40 0f b6 f5 48 89 df e8 6c ff ff ff 83 c8 ff 5b 5d 41 5c 41
Sep 19 14:47:57 unraid kernel: RSP: 0018:ffffc9000713fcc0 EFLAGS: 00010282
Sep 19 14:47:57 unraid kernel: RAX: 0000000000000000 RBX: ffff888136c4a328 RCX: 0000000000000027
Sep 19 14:47:57 unraid kernel: RDX: 0000000000000002 RSI: ffffffff820ed4af RDI: 00000000ffffffff
Sep 19 14:47:57 unraid kernel: RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffff82245ed0
Sep 19 14:47:57 unraid kernel: R10: 0000000000000013 R11: ffffffff82958563 R12: 00000000fffffff3
Sep 19 14:47:57 unraid kernel: R13: ffff8881018cc100 R14: ffff8882d21b5600 R15: 0000000000000000
Sep 19 14:47:57 unraid kernel: FS:  0000000000000000(0000) GS:ffff88885f480000(0000) knlGS:0000000000000000
Sep 19 14:47:57 unraid kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 19 14:47:57 unraid kernel: CR2: 000000c003d6303f CR3: 0000000579024002 CR4: 00000000003706e0
Sep 19 14:47:57 unraid kernel: Call Trace:
Sep 19 14:47:57 unraid kernel: <TASK>
Sep 19 14:47:57 unraid kernel: ? __warn+0xab/0x122
Sep 19 14:47:57 unraid kernel: ? report_bug+0x109/0x17e
Sep 19 14:47:57 unraid kernel: ? __intel_runtime_pm_get+0x62/0x7e [i915]
Sep 19 14:47:57 unraid kernel: ? handle_bug+0x41/0x6f
Sep 19 14:47:57 unraid kernel: ? exc_invalid_op+0x13/0x60
Sep 19 14:47:57 unraid kernel: ? asm_exc_invalid_op+0x16/0x20
Sep 19 14:47:57 unraid kernel: ? __intel_runtime_pm_get+0x62/0x7e [i915]
Sep 19 14:47:57 unraid kernel: i915_driver_release+0x22/0x71 [i915]
Sep 19 14:47:57 unraid kernel: drm_dev_put+0x31/0x62 [drm]
Sep 19 14:47:57 unraid kernel: singleton_release+0x1f/0x26 [i915]
Sep 19 14:47:57 unraid kernel: __fput+0xff/0x1d2
Sep 19 14:47:57 unraid kernel: task_work_run+0x68/0x80
Sep 19 14:47:57 unraid kernel: do_exit+0x3b4/0x923
Sep 19 14:47:57 unraid kernel: do_group_exit+0x7a/0x7a
Sep 19 14:47:57 unraid kernel: get_signal+0x622/0x65a
Sep 19 14:47:57 unraid kernel: arch_do_signal_or_restart+0x36/0x607
Sep 19 14:47:57 unraid kernel: ? do_futex+0xcd/0x143
Sep 19 14:47:57 unraid kernel: exit_to_user_mode_prepare+0x58/0x10d
Sep 19 14:47:57 unraid kernel: syscall_exit_to_user_mode+0x18/0x2c
Sep 19 14:47:57 unraid kernel: do_syscall_64+0x77/0x81
Sep 19 14:47:57 unraid kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
Sep 19 14:47:57 unraid kernel: RIP: 0033:0x154752f73d36
Sep 19 14:47:57 unraid kernel: Code: Unable to access opcode bytes at 0x154752f73d0c.
Sep 19 14:47:57 unraid kernel: RSP: 002b:000015474f9fec10 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
Sep 19 14:47:57 unraid kernel: RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 0000154752f73d36
Sep 19 14:47:57 unraid kernel: RDX: 0000000000000000 RSI: 0000000000000189 RDI: 000055aeb71f14c0
Sep 19 14:47:57 unraid kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
Sep 19 14:47:57 unraid kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000055aeb71f1528
Sep 19 14:47:57 unraid kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 000055aeb71f14c0
Sep 19 14:47:57 unraid kernel: </TASK>
Sep 19 14:47:57 unraid kernel: ---[ end trace 0000000000000000 ]---

 

The only solution after that is to reboot.

Do you have any pointers, or something I should investigate?
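
For what it's worth, the -13 in the pm_runtime_get_sync() line is a negative errno value. Below is a minimal Python sketch for decoding such lines; the regular expression and the decode_failure helper are illustrative assumptions, not part of any kernel tooling. On Linux, 13 maps to EACCES, which, as far as I understand the kernel's runtime PM documentation, is returned when runtime power management is disabled for the device. That narrows down what failed, but not why.

```python
import errno
import os
import re

# Matches kernel messages of the form "<func>() failed: -<N>".
# The pattern is an assumption based on the single log line in this thread.
FAILED_RE = re.compile(r"(?P<func>\w+)\(\) failed: -(?P<code>\d+)")


def decode_failure(line):
    """Return (function, errno name, description) for a 'failed: -N' line, or None."""
    match = FAILED_RE.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    # errno.errorcode maps numeric codes to symbolic names such as 'EACCES'.
    name = errno.errorcode.get(code, "UNKNOWN")
    return match.group("func"), name, os.strerror(code)


if __name__ == "__main__":
    log_line = (
        "Sep 19 14:47:57 unraid kernel: pci 0000:00:02.0: "
        "pm_runtime_get_sync() failed: -13"
    )
    print(decode_failure(log_line))
```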

  • 4 months later...

Well, I am still struggling with random crashes of my iGPU.
It does not happen when I remove the discrete GPU (Nvidia 3060 Ti) from its PCIe slot.


The Nvidia GPU is used only by my gaming VM. It's bound to VFIO, and no Nvidia driver is loaded on Unraid.

  • 4 weeks later...

Hello again,

 

I am still trying to find the factors involved in this crash.

I have two containers using this iGPU (each container is mapped with access to /dev/dri):

- Plex

- Frigate


Based on my tests over the last few days, I came to the conclusion that this crash only occurs if Frigate is running (sometimes it happens as soon as I start my Frigate container).

If Plex is running alone, I have no crash.

If Frigate is running alone or in parallel with Plex, the crash will happen sooner or later.
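
To pin down when the iGPU drops out during tests like these, something like the following could be run periodically on the host. It is only a sketch: the igpu_status helper is hypothetical, and it assumes the iGPU sits at PCI address 0000:00:02.0 (as in the log above) and that the kernel exposes the standard power/runtime_status and power/control sysfs attributes for it.

```python
from pathlib import Path


def igpu_status(pci_addr="0000:00:02.0", sysfs_root="/sys"):
    """Report whether a PCI device is still listed in sysfs and its runtime PM state."""
    dev = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr
    if not dev.exists():
        # The device is gone from the PCI bus, matching the lspci symptom.
        return {"present": False, "runtime_status": None, "control": None}

    def read_attr(name):
        # Attributes may be missing or unreadable; report None in that case.
        try:
            return (dev / "power" / name).read_text().strip()
        except OSError:
            return None

    return {
        "present": True,
        "runtime_status": read_attr("runtime_status"),
        "control": read_attr("control"),
    }


if __name__ == "__main__":
    print(igpu_status())
```

Logging this output with a timestamp before and after starting each container would show whether the device vanishes at Frigate's startup or only later.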

 

 

  • 1 month later...
  • Solution

Hello again,

 

To whom it may concern: I finally solved my issue.

 

I reset my BIOS to the defaults and re-enabled only the minimum options I needed for virtualization, and since then Unraid and my UHD630 have been running smoothly.
My best guess is that I had an option enabled in the BIOS that made the power-saving mode of the UHD630 unstable, but I have not been able to identify the specific option that caused the issue.

 

I will now mark the issue as solved.
