[6.8.3] Unraid server crashes with kernel oops and becomes unresponsive


micahmo

Hi all,

 

My server has been crashing every few days or so. I've been trying to capture the logs for the issue, but it crashes before they get synced to the syslog server (and of course, once I reboot, the logs reset). I finally caught the crash by writing the logs to a file locally on the box. When the crash occurs, the system is completely unresponsive, even to local input (mouse/keyboard).
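For anyone who wants to do the same kind of local capture, here is a minimal sketch of the idea in Python rather than the exact command I used. It appends new log data to a second file and forces each write to disk, so the last lines before a hard lockup have a better chance of surviving. The function names and both paths in the example are assumptions for illustration; adjust them to your system.

```python
import os
import time


def copy_new_bytes(src_path, dst_path, offset=0):
    """Append everything in src_path past `offset` to dst_path.

    Returns the new offset. The destination is flushed and fsynced so the
    data reaches the disk before the next poll.
    """
    with open(src_path, "rb") as src:
        src.seek(0, os.SEEK_END)
        if src.tell() < offset:
            offset = 0  # source shrank (rotated or truncated): start over
        src.seek(offset)
        chunk = src.read()
    if chunk:
        with open(dst_path, "ab") as dst:
            dst.write(chunk)
            dst.flush()
            os.fsync(dst.fileno())
    return offset + len(chunk)


def follow(src_path, dst_path, interval=1.0):
    """Poll src_path forever, mirroring new data into dst_path."""
    offset = 0
    while True:
        offset = copy_new_bytes(src_path, dst_path, offset)
        time.sleep(interval)


# Example with hypothetical paths (the destination should be persistent storage):
# follow("/var/log/syslog", "/boot/logs/syslog-copy.log")
```

Polling once per second is an arbitrary choice; a shorter interval narrows the window in which the final lines can be lost, at the cost of more writes to the destination.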

 

Below is the kernel oops that I am getting. (I've also attached the full diagnostics, of course. :))

 

Oct 19 17:40:23 ywserv kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000002c
Oct 19 17:40:23 ywserv kernel: PGD 0 P4D 0 
Oct 19 17:40:23 ywserv kernel: Oops: 0000 [#1] SMP PTI
Oct 19 17:40:23 ywserv kernel: CPU: 4 PID: 5189 Comm: qemu-system-x86 Not tainted 4.19.107-Unraid #1
Oct 19 17:40:23 ywserv kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./Intel X79, BIOS X79DT00A 01/13/2020
Oct 19 17:40:23 ywserv kernel: RIP: 0010:do_sys_poll+0x1f4/0x437
Oct 19 17:40:23 ywserv kernel: Code: 89 c0 b8 20 00 00 00 4c 89 c7 48 83 e7 fc 74 75 41 8b 45 04 25 ff 27 00 00 83 c8 18 89 44 24 34 44 09 f0 89 84 24 68 01 00 00 <48> 8b 47 28 4c 8b 48 40 b8 45 01 00 00 4d 85 c9 74 21 4c 89 44 24
Oct 19 17:40:23 ywserv kernel: RSP: 0018:ffffc90007463ab0 EFLAGS: 00010202
Oct 19 17:40:23 ywserv kernel: RAX: 0000000000000019 RBX: ffffc90007463ee0 RCX: ffff88904865aec0
Oct 19 17:40:23 ywserv kernel: RDX: 0000000000000004 RSI: 0000000000004000 RDI: 0000000000000004
Oct 19 17:40:23 ywserv kernel: RBP: ffffc90007463ec0 R08: 0000000000000005 R09: 0000000000000038
Oct 19 17:40:23 ywserv kernel: R10: 00000000002bae14 R11: ffff8890457bda00 R12: 0000000000000000
Oct 19 17:40:23 ywserv kernel: R13: ffff888e86259414 R14: 0000000000000000 R15: ffff888e86259400
Oct 19 17:40:23 ywserv kernel: FS:  000014bb5b119e00(0000) GS:ffff88904f900000(0000) knlGS:0000000000000000
Oct 19 17:40:23 ywserv kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 19 17:40:23 ywserv kernel: CR2: 000000000000002c CR3: 0000000ff2244003 CR4: 00000000000626e0
Oct 19 17:40:23 ywserv kernel: Call Trace:
Oct 19 17:40:23 ywserv kernel: ? compat_poll_select_copy_remaining+0x11b/0x11b
Oct 19 17:40:23 ywserv kernel: ? compat_poll_select_copy_remaining+0x11b/0x11b
Oct 19 17:40:23 ywserv kernel: ? compat_poll_select_copy_remaining+0x11b/0x11b
Oct 19 17:40:23 ywserv kernel: ? compat_poll_select_copy_remaining+0x11b/0x11b
Oct 19 17:40:23 ywserv kernel: ? compat_poll_select_copy_remaining+0x11b/0x11b
Oct 19 17:40:23 ywserv kernel: ? compat_poll_select_copy_remaining+0x11b/0x11b
Oct 19 17:40:23 ywserv kernel: ? compat_poll_select_copy_remaining+0x11b/0x11b
Oct 19 17:40:23 ywserv kernel: ? compat_poll_select_copy_remaining+0x11b/0x11b
Oct 19 17:40:23 ywserv kernel: ? compat_poll_select_copy_remaining+0x11b/0x11b
Oct 19 17:40:23 ywserv kernel: __se_sys_ppoll+0xc5/0x159
Oct 19 17:40:23 ywserv kernel: do_syscall_64+0x57/0xf2
Oct 19 17:40:23 ywserv kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 19 17:40:23 ywserv kernel: RIP: 0033:0x14bb5c905f76
Oct 19 17:40:23 ywserv kernel: Code: 7c 24 08 e8 bc 4f f8 ff 4c 8b 54 24 18 48 8b 74 24 10 41 b8 08 00 00 00 41 89 c1 48 8b 7c 24 08 4c 89 e2 b8 0f 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2d 44 89 cf 89 44 24 08 e8 e6 4f f8 ff 8b 44
Oct 19 17:40:23 ywserv kernel: RSP: 002b:00007ffdcc35fd90 EFLAGS: 00000293 ORIG_RAX: 000000000000010f
Oct 19 17:40:23 ywserv kernel: RAX: ffffffffffffffda RBX: 000014bb5ad635c0 RCX: 000014bb5c905f76
Oct 19 17:40:23 ywserv kernel: RDX: 00007ffdcc35fdb0 RSI: 0000000000000049 RDI: 000014bb5ad1f800
Oct 19 17:40:23 ywserv kernel: RBP: 00007ffdcc35fe10 R08: 0000000000000008 R09: 0000000000000000
Oct 19 17:40:23 ywserv kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffdcc35fdb0
Oct 19 17:40:23 ywserv kernel: R13: 000014bb5ad635c0 R14: 00007ffdcc35fe0c R15: 000055d93cb3de90
Oct 19 17:40:23 ywserv kernel: Modules linked in: wireguard ip6_udp_tunnel udp_tunnel ccp xt_nat xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap veth ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod it87 hwmon_vid bonding sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper r8169 i2c_i801 i2c_core intel_cstate intel_uncore ahci libahci pcc_cpufreq intel_rapl_perf realtek button
Oct 19 17:40:23 ywserv kernel: CR2: 000000000000002c
Oct 19 17:40:23 ywserv kernel: ---[ end trace 8fe03567665ae792 ]---
Oct 19 17:40:23 ywserv kernel: RIP: 0010:do_sys_poll+0x1f4/0x437
Oct 19 17:40:23 ywserv kernel: Code: 89 c0 b8 20 00 00 00 4c 89 c7 48 83 e7 fc 74 75 41 8b 45 04 25 ff 27 00 00 83 c8 18 89 44 24 34 44 09 f0 89 84 24 68 01 00 00 <48> 8b 47 28 4c 8b 48 40 b8 45 01 00 00 4d 85 c9 74 21 4c 89 44 24
Oct 19 17:40:23 ywserv kernel: RSP: 0018:ffffc90007463ab0 EFLAGS: 00010202
Oct 19 17:40:23 ywserv kernel: RAX: 0000000000000019 RBX: ffffc90007463ee0 RCX: ffff88904865aec0
Oct 19 17:40:23 ywserv kernel: RDX: 0000000000000004 RSI: 0000000000004000 RDI: 0000000000000004
Oct 19 17:40:23 ywserv kernel: RBP: ffffc90007463ec0 R08: 0000000000000005 R09: 0000000000000038
Oct 19 17:40:23 ywserv kernel: R10: 00000000002bae14 R11: ffff8890457bda00 R12: 0000000000000000
Oct 19 17:40:23 ywserv kernel: R13: ffff888e86259414 R14: 0000000000000000 R15: ffff888e86259400
Oct 19 17:40:23 ywserv kernel: FS:  000014bb5b119e00(0000) GS:ffff88904f900000(0000) knlGS:0000000000000000
Oct 19 17:40:23 ywserv kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 19 17:40:23 ywserv kernel: CR2: 000000000000002c CR3: 0000000ff2244003 CR4: 00000000000626e0
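In case it helps with comparing this trace against later crashes, here is a rough sketch that pulls the main fields out of an oops like the one above: the fault type and address, the task that was running, and the kernel function in the RIP line. The regular expressions are written against the log format shown in this post only, and `summarize_oops` is a made-up helper name, so treat it as a starting point rather than a general parser.

```python
import re

# Patterns matched against the oops format shown in this post (x86-64).
FAULT_RE = re.compile(r"BUG: unable to handle kernel (.+?) at ([0-9a-f]+)")
TASK_RE = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (\S+)")
# Kernel-mode RIP lines look like "RIP: 0010:symbol+0xoff/0xsize". The
# user-mode line ("RIP: 0033:0x...") carries no symbol and is not matched.
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def summarize_oops(lines):
    """Return the first fault, task, and faulting-function fields found."""
    summary = {}
    for line in lines:
        m = FAULT_RE.search(line)
        if m and "fault" not in summary:
            summary["fault"] = m.group(1)
            summary["address"] = int(m.group(2), 16)
        m = TASK_RE.search(line)
        if m and "pid" not in summary:
            summary["cpu"] = int(m.group(1))
            summary["pid"] = int(m.group(2))
            summary["comm"] = m.group(3)
        m = RIP_RE.search(line)
        if m and "symbol" not in summary:
            summary["symbol"] = m.group(1)
            summary["offset"] = int(m.group(2), 16)
            summary["size"] = int(m.group(3), 16)
    return summary


# Three lines copied from the trace above.
sample = [
    "Oct 19 17:40:23 ywserv kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000002c",
    "Oct 19 17:40:23 ywserv kernel: CPU: 4 PID: 5189 Comm: qemu-system-x86 Not tainted 4.19.107-Unraid #1",
    "Oct 19 17:40:23 ywserv kernel: RIP: 0010:do_sys_poll+0x1f4/0x437",
]
result = summarize_oops(sample)
print(result["comm"], result["symbol"], hex(result["address"]))
```

Running the same helper over each captured crash makes it easy to see whether the failure always lands in the same function and the same process.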

 

I did some searching, but this kind of kernel oops seems to be fairly generic, and I couldn't find anything specific to Unraid.

 

Any help is much appreciated!

 

EDIT: I should note that the problem only occurs when there is essentially no activity on the system: I'm not connected remotely in any way, there are no downloads, and no tasks like mover or parity check are running.

ywserv-diagnostics-20201019-2135.zip

Edited by micahmo