Craig Dennis Posted September 22, 2023

I am having issues after my server has been running for 30 mins to an hour. The UI crashes completely and I get output to the screen showing `tainted` and other worrying things. I have been having issues with overheating so I'm wondering if I have damaged the CPU somehow. Services still appear to be running (e.g. I can still access Plex) but not the Unraid web UI. Can someone help me decipher the errors please?

Attached are the diagnostics. Below is also the output from the syslog mirrored to the flash drive (full log from today attached as well).

Sep 22 10:24:16 Sakaar kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Sep 22 10:24:16 Sakaar kernel: #PF: supervisor read access in kernel mode
Sep 22 10:24:16 Sakaar kernel: #PF: error_code(0x0000) - not-present page
Sep 22 10:24:16 Sakaar kernel: PGD 0 P4D 0
Sep 22 10:24:16 Sakaar kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Sep 22 10:24:16 Sakaar kernel: CPU: 5 PID: 23791 Comm: python3 Tainted: P U O 6.1.49-Unraid #1
Sep 22 10:24:16 Sakaar kernel: Hardware name: To Be Filled By O.E.M. Z590M Pro4/Z590M Pro4, BIOS P2.20 06/06/2022
Sep 22 10:24:16 Sakaar kernel: RIP: 0010:get_mmap_base+0xe/0x47
Sep 22 10:24:16 Sakaar kernel: Code: ff ff 48 8d 73 38 49 89 e8 4c 89 e1 48 8d 7b 30 5b 48 89 c2 5d 41 5c e9 5e fe ff ff 0f 1f 44 00 00 65 48 8b 14 25 c0 cb 01 00 <f6> 42 10 02 48 8b 82 f8 03 00 00 75 0d 85 ff 74 1f 48 8b 40 28 c3
Sep 22 10:24:16 Sakaar kernel: RSP: 0018:ffffc9002c64bd60 EFLAGS: 00010246
Sep 22 10:24:16 Sakaar kernel: RAX: 0000000000000000 RBX: 0000000000009000 RCX: 0000000000000000
Sep 22 10:24:16 Sakaar kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Sep 22 10:24:16 Sakaar kernel: RBP: 0000000000000000 R08: 0000000000000022 R09: 0000000000000000
Sep 22 10:24:16 Sakaar kernel: R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000022
Sep 22 10:24:16 Sakaar kernel: R13: 0000000000000000 R14: ffff8883fad30cc0 R15: 0000000000000000
Sep 22 10:24:16 Sakaar kernel: FS: 00001523c6161b48(0000) GS:ffff88904f740000(0000) knlGS:0000000000000000
Sep 22 10:24:16 Sakaar kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 22 10:24:16 Sakaar kernel: CR2: 0000000000000000 CR3: 000000065009a005 CR4: 0000000000770ee0
Sep 22 10:24:16 Sakaar kernel: PKRU: 55555554
Sep 22 10:24:16 Sakaar kernel: Call Trace:
Sep 22 10:24:16 Sakaar kernel: <TASK>
Sep 22 10:24:16 Sakaar kernel: ? __die_body+0x1a/0x5c
Sep 22 10:24:16 Sakaar kernel: ? page_fault_oops+0x329/0x376
Sep 22 10:24:16 Sakaar kernel: ? do_user_addr_fault+0x12e/0x48d
Sep 22 10:24:16 Sakaar kernel: ? exc_page_fault+0xfb/0x11d
Sep 22 10:24:16 Sakaar kernel: ? asm_exc_page_fault+0x22/0x30
Sep 22 10:24:16 Sakaar kernel: ? get_mmap_base+0xe/0x47
Sep 22 10:24:16 Sakaar kernel: arch_get_unmapped_area_topdown+0xdd/0x1b2
Sep 22 10:24:16 Sakaar kernel: ? preempt_latency_start+0x1e/0x46
Sep 22 10:24:16 Sakaar kernel: get_unmapped_area+0xc4/0x14f
Sep 22 10:24:16 Sakaar kernel: do_mmap+0x110/0x428
Sep 22 10:24:16 Sakaar kernel: vm_mmap_pgoff+0xbb/0x112
Sep 22 10:24:16 Sakaar kernel: ksys_mmap_pgoff+0x138/0x166
Sep 22 10:24:16 Sakaar kernel: do_syscall_64+0x68/0x81
Sep 22 10:24:16 Sakaar kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
Sep 22 10:24:16 Sakaar kernel: RIP: 0033:0x1523c61001c2
Sep 22 10:24:16 Sakaar kernel: Code: f6 c1 10 74 0f 4c 89 4c 24 08 e8 d5 fc 01 00 4c 8b 4c 24 08 48 63 d5 4c 63 d3 4d 63 c6 b8 09 00 00 00 4c 89 e7 4c 89 ee 0f 05 <48> 89 c7 48 83 f8 ff 75 20 4d 85 e4 75 1b 83 e3 30 48 c7 c0 f4 ff
Sep 22 10:24:16 Sakaar kernel: RSP: 002b:00007ffedcd4b860 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
Sep 22 10:24:16 Sakaar kernel: RAX: ffffffffffffffda RBX: 0000000000000022 RCX: 00001523c61001c2
Sep 22 10:24:16 Sakaar kernel: RDX: 0000000000000003 RSI: 0000000000009000 RDI: 0000000000000000
Sep 22 10:24:16 Sakaar kernel: RBP: 0000000000000003 R08: ffffffffffffffff R09: 0000000000000000
Sep 22 10:24:16 Sakaar kernel: R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000000
Sep 22 10:24:16 Sakaar kernel: R13: 0000000000009000 R14: 00000000ffffffff R15: 00001523c615fb00
Sep 22 10:24:16 Sakaar kernel: </TASK>
Sep 22 10:24:16 Sakaar kernel: Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap ipvlan veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod tcp_diag inet_diag nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs af_packet bridge 8021q garp mrp stp llc ixgbe xfrm_algo mdio e1000e zfs(PO) i915 zunicode(PO) intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal zzstd(O) intel_powerclamp coretemp kvm_intel zlua(O) zavl(PO) kvm icp(PO) iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper drm_kms_helper mei_pxp mei_hdcp crct10dif_pclmul crc32_pclmul crc32c_intel zcommon(PO) drm ghash_clmulni_intel sha512_ssse3
Sep 22 10:24:16 Sakaar kernel: aesni_intel znvpair(PO) mei_me intel_gtt crypto_simd spl(O) cryptd wmi_bmof intel_cstate mpt3sas agpgart intel_uncore i2c_i801 nvme i2c_smbus raid_class sr_mod i2c_core mei ahci nvme_core scsi_transport_sas cdrom input_leds joydev led_class libahci syscopyarea sysfillrect sysimgblt fb_sys_fops video tpm_crb tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: xfrm_algo]
Sep 22 10:24:16 Sakaar kernel: CR2: 0000000000000000
Sep 22 10:24:16 Sakaar kernel: ---[ end trace 0000000000000000 ]---
Sep 22 10:24:16 Sakaar kernel: RIP: 0010:get_mmap_base+0xe/0x47
Sep 22 10:24:16 Sakaar kernel: Code: ff ff 48 8d 73 38 49 89 e8 4c 89 e1 48 8d 7b 30 5b 48 89 c2 5d 41 5c e9 5e fe ff ff 0f 1f 44 00 00 65 48 8b 14 25 c0 cb 01 00 <f6> 42 10 02 48 8b 82 f8 03 00 00 75 0d 85 ff 74 1f 48 8b 40 28 c3
Sep 22 10:24:16 Sakaar kernel: RSP: 0018:ffffc9002c64bd60 EFLAGS: 00010246
Sep 22 10:24:16 Sakaar kernel: RAX: 0000000000000000 RBX: 0000000000009000 RCX: 0000000000000000
Sep 22 10:24:16 Sakaar kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Sep 22 10:24:16 Sakaar kernel: RBP: 0000000000000000 R08: 0000000000000022 R09: 0000000000000000
Sep 22 10:24:16 Sakaar kernel: R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000022
Sep 22 10:24:16 Sakaar kernel: R13: 0000000000000000 R14: ffff8883fad30cc0 R15: 0000000000000000
Sep 22 10:24:16 Sakaar kernel: FS: 00001523c6161b48(0000) GS:ffff88904f740000(0000) knlGS:0000000000000000
Sep 22 10:24:16 Sakaar kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 22 10:24:16 Sakaar kernel: CR2: 0000000000000000 CR3: 000000065009a005 CR4: 0000000000770ee0
Sep 22 10:24:16 Sakaar kernel: PKRU: 55555554
Sep 22 10:24:16 Sakaar kernel: note: python3[23791] exited with irqs disabled

sakaar-diagnostics-20230922-1257.zip syslog
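On the `tainted` marker in the trace: the letters after `Tainted:` are status flags the kernel prints with every oops, and on their own they do not point to damaged hardware. Here is a minimal Python sketch that decodes the three letters seen in this log. The flag meanings follow the kernel's "Tainted kernels" documentation; the function name `decode_taint` and the regular expression are my own assumptions, and the pattern expects the space-separated form shown in this thread.

```python
import re

# Meanings per the Linux kernel "Tainted kernels" documentation.
# Only the letters that appear in this thread's trace are listed.
TAINT_FLAGS = {
    "P": "proprietary module was loaded",
    "U": "taint requested by a userspace application",
    "O": "externally-built (out-of-tree) module was loaded",
}


def decode_taint(line):
    """Return (letter, meaning) pairs for the flags on an oops 'CPU:' line.

    Hypothetical helper: assumes single upper-case flag letters separated
    by whitespace, followed by the kernel version string.
    """
    match = re.search(r"Tainted:\s+((?:[A-Z]\s+)+)", line)
    if match is None:
        return []
    letters = match.group(1).split()
    return [
        (letter, TAINT_FLAGS.get(letter, "not listed in this sketch"))
        for letter in letters
    ]


if __name__ == "__main__":
    cpu_line = (
        "Sep 22 10:24:16 Sakaar kernel: CPU: 5 PID: 23791 Comm: python3 "
        "Tainted: P U O 6.1.49-Unraid #1"
    )
    for letter, meaning in decode_taint(cpu_line):
        print(f"{letter}: {meaning}")
```

The `(PO)` and `(O)` annotations next to modules such as `zfs` and `spl` in the "Modules linked in" list use the same letters, so the `P` and `O` flags here are most likely explained by those modules rather than by the crash itself.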
JorgeB Posted September 22, 2023

Those call traces look more hardware related to me; start by running memtest.
Rkpaxam Posted September 22, 2023

I've had this issue with a faulty HDD, and torrents overloaded the disk. The only way I could shut down was by pulling the plug and then finding out which disk was faulty.
Craig Dennis (Author) Posted September 22, 2023

4 minutes ago, Rkpaxam said:
I've had this issue with a faulty HDD, and torrents overloaded the disk. The only way I could shut down was by pulling the plug and then finding out which disk was faulty.

How did you identify the faulty disk? Mine are all showing as good.
Craig Dennis (Author) Posted September 22, 2023

31 minutes ago, JorgeB said:
Those call traces look more hardware related to me; start by running memtest.

I ran that a few weeks ago. I'll run it again. Could this also indicate a CPU hardware fault?
JorgeB Posted September 22, 2023

1 hour ago, Craig Dennis said:
Could this also indicate a CPU hardware fault?

Could be, but that's rare. Instead of memtest you can also remove two RAM sticks and run with just the other two; if it's the same, try the other ones. That would basically rule out a RAM issue and/or some board stability problem with all RAM sockets in use. Another thing you can try is to boot the server in safe mode with all docker/VMs disabled and let it run as a basic NAS for a few days. If it still crashes it's likely a hardware problem; if it doesn't, start turning on the other services one by one.
Craig Dennis (Author) Posted September 22, 2023

1 hour ago, JorgeB said:
Could be, but that's rare. Instead of memtest you can also remove two RAM sticks and run with just the other two; if it's the same, try the other ones. That would basically rule out a RAM issue and/or some board stability problem with all RAM sockets in use. Another thing you can try is to boot the server in safe mode with all docker/VMs disabled and let it run as a basic NAS for a few days. If it still crashes it's likely a hardware problem; if it doesn't, start turning on the other services one by one.

This is all too familiar. The first memtest pass finished with no issues. I'll remove 2 RAM sticks. Thanks for the help (again).
Craig Dennis (Author) Posted September 23, 2023

After removing two sticks of RAM I was still experiencing crashes. I replaced them with the other two sticks and have been up for 24 hours. Memtest reported no errors.

I also thought the Netdata docker was causing issues due to some errors in the log:

Sep 22 21:22:32 Sakaar kernel: netdata[23864]: segfault at 30 ip 0000149933ef42c0 sp 000014992f5af500 error 4 in ld-musl-x86_64.so.1[149933eb1000+4c000] likely on CPU 9 (core 1, socket 0)
Sep 22 21:22:32 Sakaar kernel: Code: 29 45 31 c0 31 c9 31 d2 4c 89 e6 bf 0f 00 00 00 31 c0 e8 d1 80 fc ff 89 c3 85 c0 0f 84 83 00 00 00 e8 7a 4c fc ff 8b 18 eb 7a <8b> 4d 30 48 8d 5c 24 0e be 22 00 00 00 31 c0 48 8d 15 aa db 03 00

I re-enabled it with the second pair of RAM sticks and have had no issues.
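When swapping RAM sticks or re-enabling services one at a time, it can help to count crash markers in the mirrored syslog after each change rather than reading it by eye. Below is a rough Python sketch that looks for the kinds of kernel lines quoted in this thread (NULL pointer dereference, Oops, userspace segfault). The function name `summarize_crashes`, the pattern names, and the command-line usage are assumptions for illustration, not part of any Unraid tool; pass in the path to wherever your syslog copy is stored.

```python
import re
import sys
from collections import Counter

# Patterns modelled on the log lines posted in this thread (an assumption:
# other kernels or syslog formats may word these messages differently).
CRASH_PATTERNS = {
    "null_deref": re.compile(r"kernel: BUG: kernel NULL pointer dereference"),
    "oops": re.compile(r"kernel: Oops:"),
    "segfault": re.compile(r"kernel: (?P<proc>[^\[\s]+)\[\d+\]: segfault at"),
}


def summarize_crashes(lines):
    """Count crash markers in an iterable of syslog lines.

    Segfaults are keyed by process name, e.g. 'segfault:netdata'.
    """
    counts = Counter()
    for line in lines:
        for name, pattern in CRASH_PATTERNS.items():
            match = pattern.search(line)
            if match is None:
                continue
            if name == "segfault":
                counts[f"segfault:{match.group('proc')}"] += 1
            else:
                counts[name] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 crash_scan.py /path/to/syslog
    with open(sys.argv[1], errors="replace") as handle:
        for marker, count in sorted(summarize_crashes(handle).items()):
            print(f"{marker}: {count}")
```

Running it after each hardware or service change gives a quick before/after comparison; a count that stays at zero over a day or more of uptime is weak evidence, not proof, that the last change removed the cause.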
coppit Posted December 28, 2023

@Craig Dennis did you resolve this? How?