May 17, 2024 Greetings, I've been using Unraid for about 2 months and I've been experiencing crashes every couple of days. I've been reading all over about what could be causing my issues but haven't found anything, and honestly I just restart and continue my life in pain. Well, that lasted until 1 hour ago: my server crashed and won't boot. I couldn't get logs because the server just died, I can't access the share to check for logs, and all I see when booting is this (check attached image). I know it's not ideal, and all I ask is whether this error rings a bell in your experience. I'm tempted to just try a different OS; I don't care about the data and I can't make it a week in Unraid. Maybe it's just something wrong with the hardware, but I don't really have spares to test, shit. This is my hardware if it helps: 2 NVMe cache drives, 3 10 TB WD Red disks (1 parity). Edited May 17, 2024 by Ceps
May 17, 2024 Community Expert Try with a different flash drive using a stock Unraid install, no key needed, just to see if it boots.
May 18, 2024 Author I couldn't try that this time. A couple of weeks back I bought a new flash drive and moved my install and license to it, and the problems continued after that. When I tried today to install Unraid on my old flash drive, I had forgotten it was blacklisted 😅. Since making this post I've spent the whole time running memtest on each memory module, and all passed. After many restarts I finally got in. Nothing really changed: I just kept resetting the BIOS and restarting, trying some XMP profiles, removing modules, and trying different memory that I borrowed. Eventually, after another BIOS reset to defaults and a restart with my own memory, I got in. I ran to get the logs and they were gone; I only have today's (attached). I've also attached my Docker containers, and I also have a single VM. unraid-syslog-20240518-0537.zip
May 18, 2024 Author Oh, I forgot to attach the diagnostics, just in case. unraid-diagnostics-20240518-0151.zip
May 18, 2024 Community Expert
3 hours ago, Ceps said: I couldnt try that this time, because at some point a couple of weeks back I bought a new flash drive and moved my install / license to it, problems continued after that, when I tried today to install unraid on my old flash drive I forgot it was black listed
You could still use it to see if it boots.
3 hours ago, Ceps said: After many restarts I got finally in,
So it's working now? Diags look OK to me for now.
May 18, 2024 Author Thanks @JorgeB for taking the time. I'll update this with logs when it crashes again. Since nothing changed, I assume it eventually will; hopefully next time I can keep the logs.
May 19, 2024 Author OK, it crashed again and I had to restart (with the physical button). When checking the logs I noticed a dropdown I had ignored before, with a different file that contains everything. Earlier I said I thought I had lost the logs 😞. Attached the log file. For this last crash, I guess the server died somewhere around here; 10:22 was me restarting the server.
May 18 22:48:45 unRaid webGUI: Successful login user root from 192.168.1.243
May 18 23:06:07 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 18 23:13:22 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 18 23:19:24 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 18 23:19:24 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 04:35:51 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 04:40:01 unRaid root: Fix Common Problems Version 2024.05.04
May 19 06:03:46 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 06:06:20 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 10:22:27 unRaid root: Delaying execution of fix common problems scan for 10 minutes
May 19 10:22:27 unRaid emhttpd: Starting services...
May 19 10:22:27 unRaid emhttpd: shcmd (54): chmod 0777 '/mnt/user/documents'
syslog-192.168.1.6.log.txt.zip
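Guessing where the server died from a long log is easier with a quick scan for silent stretches. The following is only a minimal sketch, not an Unraid tool: `parse_timestamp`, `find_gaps`, and the one-hour threshold are hypothetical names and values chosen for illustration, and the year is assumed to be 2024 because classic syslog lines omit it. The sample lines are copied from the excerpt above.

```python
from datetime import datetime, timedelta

# Sample lines copied from the syslog excerpt in this thread.
SAMPLE = """\
May 19 06:03:46 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 06:06:20 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 10:22:27 unRaid root: Delaying execution of fix common problems scan for 10 minutes
May 19 10:22:27 unRaid emhttpd: Starting services...
"""


def parse_timestamp(line, year=2024):
    """Parse the 'Mon DD HH:MM:SS' prefix of a classic syslog line.

    Syslog omits the year, so it is supplied by the caller
    (assumption: every line belongs to the same year).
    Returns None when the line does not start with a timestamp.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, min_gap=timedelta(hours=1)):
    """Return (line_before, line_after, gap) for every silence >= min_gap."""
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is None:
            continue  # skip continuation lines or blank lines
        if prev_ts is not None and ts - prev_ts >= min_gap:
            gaps.append((prev_line, line, ts - prev_ts))
        prev_ts, prev_line = ts, line
    return gaps


if __name__ == "__main__":
    for before, after, gap in find_gaps(SAMPLE.splitlines()):
        print(f"Silent for {gap}:")
        print(f"  last line before: {before}")
        print(f"  first line after: {after}")
```

A gap is only a candidate: an idle server also logs nothing for hours, so a gap is most convincing when the first line after it belongs to a boot sequence, as the `emhttpd: Starting services...` line does here.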
May 19, 2024 Author New crash, can't get in yet. This is the output when booting Unraid. Update: After some time I was able to boot again. Logs around the crash:
May 19 10:23:02 unRaid network: reload service: nginx
May 19 10:23:02 unRaid nginx: 2024/05/19 10:23:02 [alert] 6765#6765: *111 open socket #19 left in connection 10
May 19 10:23:02 unRaid nginx: 2024/05/19 10:23:02 [alert] 6765#6765: aborting
May 19 10:23:30 unRaid kernel: x86/split lock detection: #AC: CPU 0/KVM/8776 took a split_lock trap at address: 0x733c4014
May 19 10:23:30 unRaid kernel: x86/split lock detection: #AC: CPU 1/KVM/8777 took a split_lock trap at address: 0x733c4014
May 19 10:23:30 unRaid kernel: x86/split lock detection: #AC: CPU 3/KVM/8779 took a split_lock trap at address: 0x733c4014
May 19 10:23:30 unRaid kernel: x86/split lock detection: #AC: CPU 2/KVM/8778 took a split_lock trap at address: 0x733c4014
May 19 10:23:31 unRaid kernel: x86/split lock detection: #AC: CPU 7/KVM/8783 took a split_lock trap at address: 0x733c4014
May 19 10:23:31 unRaid kernel: x86/split lock detection: #AC: CPU 5/KVM/8781 took a split_lock trap at address: 0x733c4014
May 19 10:23:31 unRaid kernel: x86/split lock detection: #AC: CPU 6/KVM/8782 took a split_lock trap at address: 0x733c4014
May 19 10:23:31 unRaid kernel: x86/split lock detection: #AC: CPU 4/KVM/8780 took a split_lock trap at address: 0x733c4014
May 19 10:32:00 unRaid root: Fix Common Problems Version 2024.05.04
May 19 12:12:49 unRaid kernel: mce_notify_irq: 8 callbacks suppressed
May 19 12:12:49 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 12:18:01 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 14:19:45 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 14:40:42 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 14:53:49 unRaid kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
May 19 14:53:49 unRaid kernel: CPU: 8 PID: 14923 Comm: smartctl_type Tainted: P O 6.1.79-Unraid #1
May 19 14:53:49 unRaid kernel: Hardware name: ASUS System Product Name/ProArt Z790-CREATOR WIFI, BIOS 2202 04/17/2024
May 19 14:53:49 unRaid kernel: RIP: 0010:cpumask_any_but+0x2c/0x34
May 19 14:53:49 unRaid kernel: Code: c0 48 89 fd 53 89 f3 8b 35 ba dd 34 01 89 c2 48 89 ef e8 96 96 3d 00 39 05 aa dd 34 01 89 c2 76 08 39 c3 75 04 ff c0 eb de 5b <89> d0 5d c3 cc cc cc cc 0f 1f 44 00 00 55 bd 1f 00 00 00 53 48 89
May 19 14:53:49 unRaid kernel: RSP: 0000:ffffc9000f30fcd8 EFLAGS: 00010246
May 19 14:53:49 unRaid kernel: RAX: 000000000000001c RBX: ffff88816c433300 RCX: 0000000000000009
May 19 14:53:49 unRaid kernel: RDX: 000000000000001c RSI: 000000000000001c RDI: ffff88816c433738
May 19 14:53:49 unRaid kernel: RBP: ffff88816c433738 R08: 0000000000000001 R09: 0000000000000059
May 19 14:53:49 unRaid kernel: R10: ffff8881068c5008 R11: ffff8881068c500c R12: ffff88816c433738
May 19 14:53:49 unRaid kernel: R13: 0000000000000008 R14: 0000000000000000 R15: 0000000000000000
May 19 14:53:49 unRaid kernel: FS: 0000148a3496b640(0000) GS:ffff88a03f200000(0000) knlGS:0000000000000000
May 19 14:53:49 unRaid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 19 14:53:49 unRaid kernel: CR2: 0000148a34ba4280 CR3: 0000001661ca4000 CR4: 0000000000752ee0
May 19 14:53:49 unRaid kernel: PKRU: 55555554
May 19 14:53:49 unRaid kernel: Call Trace:
May 19 14:53:49 unRaid kernel: <TASK>
May 19 14:53:49 unRaid kernel: ? __die_body+0x1a/0x5c
May 19 14:53:49 unRaid kernel: ? die+0x30/0x49
May 19 14:53:49 unRaid kernel: ? do_trap+0x7b/0xfe
May 19 14:53:49 unRaid kernel: ? cpumask_any_but+0x2c/0x34
May 19 14:53:49 unRaid kernel: ? cpumask_any_but+0x2c/0x34
May 19 14:53:49 unRaid kernel: ? do_error_trap+0x6e/0x98
May 19 14:53:49 unRaid kernel: ? cpumask_any_but+0x2c/0x34
May 19 14:53:49 unRaid kernel: ? exc_invalid_op+0x4c/0x60
May 19 14:53:49 unRaid kernel: ? cpumask_any_but+0x2c/0x34
May 19 14:53:49 unRaid kernel: ? asm_exc_invalid_op+0x16/0x20
May 19 14:53:49 unRaid kernel: ? cpumask_any_but+0x2c/0x34
May 19 14:53:49 unRaid kernel: flush_tlb_mm_range+0xb0/0x111
May 19 14:53:49 unRaid kernel: ptep_clear_flush+0x3c/0x45
May 19 14:53:49 unRaid kernel: wp_page_copy+0x36d/0x4a3
May 19 14:53:49 unRaid kernel: __handle_mm_fault+0x71c/0xcf9
May 19 14:53:49 unRaid kernel: handle_mm_fault+0x13d/0x20f
May 19 14:53:49 unRaid kernel: do_user_addr_fault+0x2c3/0x48d
May 19 14:53:49 unRaid kernel: exc_page_fault+0xfb/0x11d
May 19 14:53:49 unRaid kernel: asm_exc_page_fault+0x22/0x30
May 19 14:53:49 unRaid kernel: RIP: 0033:0x148a3863f21c
May 19 14:53:49 unRaid kernel: Code: 1f 80 00 00 00 00 48 8b 08 8b 50 08 4c 01 f9 48 83 fa 26 74 0a 48 83 fa 08 0f 85 cb 19 00 00 48 8b 50 10 48 83 c0 18 4c 01 fa <48> 89 11 48 39 d8 72 d4 4d 8b 93 e8 01 00 00 4d 85 d2 0f 84 fc 0a
May 19 14:53:49 unRaid kernel: RSP: 002b:00007ffc7121ad40 EFLAGS: 00010202
May 19 14:53:49 unRaid kernel: RAX: 0000148a34a8a158 RBX: 0000148a34aafeb0 RCX: 0000148a34ba4280
May 19 14:53:49 unRaid kernel: RDX: 0000148a34ab3970 RSI: 0000148a38664ab0 RDI: 0000148a34a87ca8
May 19 14:53:49 unRaid kernel: RBP: 00007ffc7121ae40 R08: 0000148a34ab15c0 R09: 0000148a34ab2238
May 19 14:53:49 unRaid kernel: R10: 0000000000000001 R11: 0000148a34bdc090 R12: 0000000000000000
May 19 14:53:49 unRaid kernel: R13: 00007ffc7121add0 R14: 0000148a34a87000 R15: 0000148a34a87000
May 19 14:53:49 unRaid kernel: </TASK>
May 19 14:53:49 unRaid kernel: Modules linked in: xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc bonding tls igc atlantic i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iosf_mbi kvm drm_buddy i2c_algo_bit ttm drm_display_helper drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 drm sha1_ssse3 aesni_intel btusb btrtl crypto_simd btbcm btintel input_leds cryptd rapl bluetooth intel_cstate mei_hdcp mei_pxp wmi_bmof intel_gtt joydev led_class i2c_i801 ecdh_generic ecc i2c_smbus
May 19 14:53:49 unRaid kernel: agpgart nvme ahci mei_me intel_uncore thunderbolt i2c_core syscopyarea nvme_core mei libahci sysfillrect vmd sysimgblt video thermal fan fb_sys_fops tpm_crb tpm_tis tpm_tis_core wmi tpm intel_pmc_core backlight acpi_pad acpi_tad button unix [last unloaded: igc]
May 19 14:53:49 unRaid kernel: ---[ end trace 0000000000000000 ]---
May 19 14:53:49 unRaid kernel: RIP: 0010:cpumask_any_but+0x2c/0x34
May 19 14:53:49 unRaid kernel: Code: c0 48 89 fd 53 89 f3 8b 35 ba dd 34 01 89 c2 48 89 ef e8 96 96 3d 00 39 05 aa dd 34 01 89 c2 76 08 39 c3 75 04 ff c0 eb de 5b <89> d0 5d c3 cc cc cc cc 0f 1f 44 00 00 55 bd 1f 00 00 00 53 48 89
May 19 14:53:49 unRaid kernel: RSP: 0000:ffffc9000f30fcd8 EFLAGS: 00010246
May 19 14:53:49 unRaid kernel: RAX: 000000000000001c RBX: ffff88816c433300 RCX: 0000000000000009
May 19 14:53:49 unRaid kernel: RDX: 000000000000001c RSI: 000000000000001c RDI: ffff88816c433738
May 19 14:53:49 unRaid kernel: RBP: ffff88816c433738 R08: 0000000000000001 R09: 0000000000000059
May 19 14:53:49 unRaid kernel: R10: ffff8881068c5008 R11: ffff8881068c500c R12: ffff88816c433738
May 19 14:53:49 unRaid kernel: R13: 0000000000000008 R14: 0000000000000000 R15: 0000000000000000
May 19 14:53:49 unRaid kernel: FS: 0000148a3496b640(0000) GS:ffff88a03f200000(0000) knlGS:0000000000000000
May 19 14:53:49 unRaid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 19 14:53:49 unRaid kernel: CR2: 0000148a34ba4280 CR3: 0000001661ca4000 CR4: 0000000000752ee0
May 19 14:53:49 unRaid kernel: PKRU: 55555554
May 19 14:53:49 unRaid kernel: note: smartctl_type[14923] exited with preempt_count 2
May 19 14:53:58 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 14:54:03 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 19:05:35 unRaid root: Delaying execution of fix common problems scan for 10 minutes
May 19 19:05:35 unRaid emhttpd: Starting services...
Edited May 19, 2024 by Ceps
May 20, 2024 Community Expert Solution
18 hours ago, Ceps said:
May 18 23:06:07 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 18 23:13:22 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 18 23:19:24 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 18 23:19:24 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 04:35:51 unRaid kernel: mce: [Hardware Error]: Machine check events logged
These suggest a hardware problem.
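To see how often the machine check lines quoted above recur, a saved syslog can be tallied by day. This is a rough sketch under stated assumptions: `count_mce_by_day` and `MCE_MARKER` are hypothetical names, the marker string is taken verbatim from the quoted lines, and the script only counts the notices. It does not decode which component reported the error.

```python
from collections import Counter

# Text of the kernel notice, copied verbatim from the quoted syslog lines.
MCE_MARKER = "mce: [Hardware Error]: Machine check events logged"

# The five lines quoted in the reply above.
SAMPLE = """\
May 18 23:06:07 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 18 23:13:22 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 18 23:19:24 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 18 23:19:24 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 04:35:51 unRaid kernel: mce: [Hardware Error]: Machine check events logged
"""


def count_mce_by_day(lines):
    """Count machine check notices per 'Mon DD' day prefix."""
    counts = Counter()
    for line in lines:
        if MCE_MARKER in line:
            # The first two whitespace-separated fields are month and day.
            day = " ".join(line.split()[:2])
            counts[day] += 1
    return counts


if __name__ == "__main__":
    for day, n in count_mce_by_day(SAMPLE.splitlines()).items():
        print(f"{day}: {n} machine check notice(s)")
```

To run it against a real file instead of the sample, the lines could be read with `open(path).read().splitlines()`, where `path` points at the extracted syslog.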
May 20, 2024 Author Thanks again @JorgeB. Yeah, makes sense. Something tells me it's the motherboard; I don't know why, but ever since I built this system I've felt the mobo was not right.
June 25, 2024 Author @JorgeB Just thought I'd update on this: it ended up being the CPU. Got a new one, and it's been running fine for a week now.