Crashes after attempting GPU passthrough


louij2
Solved by ghost82

Your syslog shows a kernel panic:

Dec 29 20:54:07 Tower emhttpd: shcmd (110): /etc/rc.d/rc.flash_backup start
Dec 29 20:54:08 Tower kernel: ------------[ cut here ]------------
Dec 29 20:54:08 Tower kernel: WARNING: CPU: 3 PID: 17032 at kernel/printk/printk_ringbuffer.c:1232 get_data+0x9f/0xd0
Dec 29 20:54:08 Tower kernel: Modules linked in: xt_mark xt_comment xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle nf_tables vhost_net tun vhost vhost_iotlb tap veth macvlan xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs nfsd lockd grace sunrpc md_mod amdgpu gpu_sched i2c_algo_bit drm_kms_helper ttm drm backlight agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding edac_mce_amd kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mpt3sas aesni_intel crypto_simd nvme wmi_bmof cryptd raid_class r8169 nvme_core scsi_transport_sas i2c_piix4 glue_helper i2c_core rapl k10temp ahci wmi realtek ccp libahci thermal button acpi_cpufreq
Dec 29 20:54:08 Tower kernel: CPU: 3 PID: 17032 Comm: dmesg Not tainted 5.10.28-Unraid #1
Dec 29 20:54:08 Tower kernel: Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F50 11/27/2019
Dec 29 20:54:08 Tower kernel: RIP: 0010:get_data+0x9f/0xd0
Dec 29 20:54:08 Tower kernel: Code: 39 c2 75 26 48 8b 47 08 48 83 cf ff 48 d3 e7 f7 d7 44 21 cf 89 3a 48 8b 3e 48 8d 4f 07 48 83 e1 f8 48 39 cf 74 08 0f 0b eb 2e <0f> 0b eb 2a 48 8b 76 08 48 8d 4e 07 48 83 e1 f8 48 39 ce 74 04 0f
Dec 29 20:54:08 Tower kernel: RSP: 0018:ffffc90009467db8 EFLAGS: 00010002
Dec 29 20:54:08 Tower kernel: RAX: 0000000000000000 RBX: ffffffff8249c600 RCX: 0000000000000012
Dec 29 20:54:08 Tower kernel: RDX: ffffc90009467df0 RSI: ffffc90009467e00 RDI: ffffffff8249c628
Dec 29 20:54:08 Tower kernel: RBP: ffffc90009467e50 R08: fffffffffffc0cc8 R09: ffffffffffbc0d08
Dec 29 20:54:08 Tower kernel: R10: 00003fffffffffef R11: ffffc90009467d98 R12: ffff8881f4e6c0a8
Dec 29 20:54:08 Tower kernel: R13: 00000000df24e700 R14: 0000000000000036 R15: 0000000000002000
Dec 29 20:54:08 Tower kernel: FS:  0000152360cfbb80(0000) GS:ffff8887fe8c0000(0000) knlGS:0000000000000000
Dec 29 20:54:08 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 29 20:54:08 Tower kernel: CR2: 00000000004152f8 CR3: 00000001f241c000 CR4: 00000000003506e0
Dec 29 20:54:08 Tower kernel: Call Trace:
Dec 29 20:54:08 Tower kernel: _prb_read_valid+0x143/0x22e
Dec 29 20:54:08 Tower kernel: prb_read_valid+0xf/0x11
Dec 29 20:54:08 Tower kernel: devkmsg_read+0x86/0x25d
Dec 29 20:54:08 Tower kernel: vfs_read+0xa0/0xff
Dec 29 20:54:08 Tower kernel: ksys_read+0x71/0xba
Dec 29 20:54:08 Tower kernel: do_syscall_64+0x5d/0x6a
Dec 29 20:54:08 Tower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 29 20:54:08 Tower kernel: RIP: 0033:0x152360e2578e
Dec 29 20:54:08 Tower kernel: Code: c0 e9 f6 fe ff ff 50 48 8d 3d 16 5e 0a 00 e8 f9 fd 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
Dec 29 20:54:08 Tower kernel: RSP: 002b:00007fffc86baf08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Dec 29 20:54:08 Tower kernel: RAX: ffffffffffffffda RBX: 0000000000411448 RCX: 0000152360e2578e
Dec 29 20:54:08 Tower kernel: RDX: 0000000000001fff RSI: 0000000000411448 RDI: 0000000000000003
Dec 29 20:54:08 Tower kernel: RBP: 000000000040d8b8 R08: 000000000000000a R09: 0000152360cfbb00
Dec 29 20:54:08 Tower kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000030
Dec 29 20:54:08 Tower kernel: R13: 000000000040c397 R14: 0000000000411448 R15: 0000000000411454
Dec 29 20:54:08 Tower kernel: ---[ end trace a2571ac05a71a072 ]---

 

I think this has nothing to do with GPU passthrough.

As you can see, the kernel panic happened right after an rclone backup started.

Can you tell us when these kernel panics happen? Are they completely random, or can you find something they have in common in your syslogs?
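A minimal sketch (Python, not part of the original reply) of one way to look for a common trigger: it scans a saved syslog for each `[ cut here ]` marker and collects the last few non-kernel lines logged just before it, which is how the `rc.flash_backup start` line stands out in the trace above. The function names and the default of three context lines are hypothetical, and the syslog path used in the usage note is an assumption to verify on your own system.

```python
from typing import List, Tuple

# The kernel prints this marker at the start of WARNING/BUG reports.
CUT_HERE = "------------[ cut here ]------------"


def events_before_traces(lines: List[str], context: int = 3) -> List[Tuple[str, List[str]]]:
    """For each kernel trace, return the line after the marker (usually the
    WARNING/BUG header) plus up to `context` earlier lines that were NOT
    logged by the kernel, in chronological order. Hypothetical helper."""
    results = []
    for i, line in enumerate(lines):
        if CUT_HERE not in line:
            continue
        # Prefer the header line that follows the marker; fall back to the marker.
        header = lines[i + 1] if i + 1 < len(lines) else line
        preceding = []
        j = i - 1
        # Walk backwards, skipping kernel lines, until enough context is found.
        while j >= 0 and len(preceding) < context:
            if " kernel: " not in lines[j]:
                preceding.append(lines[j].rstrip("\n"))
            j -= 1
        preceding.reverse()
        results.append((header.rstrip("\n"), preceding))
    return results


def report(path: str, context: int = 3) -> None:
    """Print every trace header followed by the events that preceded it."""
    with open(path, errors="replace") as f:
        for header, preceding in events_before_traces(f.readlines(), context):
            print(header)
            for p in preceding:
                print("    preceded by:", p)
```

For example, `report("/var/log/syslog")` (assuming that is where your syslog lives) prints each WARNING line with the events logged just before it; if the same service or share access shows up before every trace, that is a good lead.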

Edited by ghost82
3 hours ago, ghost82 said:

Can you tell us when these kernel panics happen? Are they completely random, or can you find something they have in common in your syslogs?

I just tried to restore some files from the NAS with File History, and it had another kernel panic.
