Ceps Posted May 17 (edited)
Greetings, I've been using Unraid for about 2 months and I've been experiencing crashes every couple of days. I've been reading all over about what could be causing my issues but haven't found anything, and honestly I just restart and continue my life in pain. Well, that lasted until 1 hour ago: my server crashed and won't boot. I couldn't get logs because the server just died, I can't access the share to check for logs, and all I see when booting is this (check attached image). I know it's not ideal, and all I ask is whether this error brings something up in your experience. I'm tempted to just try a different OS; I don't care about the data and I can't make it a week in Unraid. Maybe it's just something wrong with the hardware, but I don't really have spares to test. This is my hardware, if it helps: 2 NVMe cache drives and 3 10 TB WD Red disks (1 parity).
Edited May 17 by Ceps
JorgeB Posted May 17
Try with a different flash drive using a stock Unraid install, no key needed, just to see if it boots.
Ceps (Author) Posted May 18
I couldn't try that this time. A couple of weeks back I bought a new flash drive and moved my install/license to it (the problems continued after that), and when I tried today to install Unraid on my old flash drive I realized I had forgotten it was blacklisted 😅. Since making this post I've spent all my time running memtest on each memory module, and all passed. After many restarts I finally got in, with no real change: just resetting the BIOS and restarting, trying some XMP profiles, removing modules, and trying different memory that I borrowed. Eventually, after another BIOS reset to defaults and a restart with my actual memory, I got in. I ran to get the logs and they were gone; I only have today's (attached). I've also attached my Docker containers, and I have a single VM.
unraid-syslog-20240518-0537.zip
Ceps (Author) Posted May 18
Oh, I forgot to attach the diagnostics, just in case.
unraid-diagnostics-20240518-0151.zip
JorgeB Posted May 18
3 hours ago, Ceps said: "I couldn't try that this time, because at some point a couple of weeks back I bought a new flash drive and moved my install / license to it, problems continued after that, when I tried today to install Unraid on my old flash drive I forgot it was blacklisted"
You could still use it to see if it boots.
3 hours ago, Ceps said: "After many restarts I finally got in,"
So it's working now? Diags look OK to me for now.
Ceps (Author) Posted May 18
Thanks @JorgeB for taking the time. I'll update this with logs when it crashes again. Since nothing changed, I assume it eventually will; hopefully next time I can keep the logs.
Ceps (Author) Posted May 19
OK, it crashed again and I had to restart (with the physical button). When checking the logs I noticed a dropdown I had ignored before, with a different file that contains everything, so the logs I said I thought I had lost were there after all 😞. Attached the log file. For this last crash, I guess the server died somewhere around here; 10:22 was me restarting the server.
May 18 22:48:45 unRaid webGUI: Successful login user root from 192.168.1.243
May 18 23:06:07 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 18 23:13:22 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 18 23:19:24 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 18 23:19:24 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 04:35:51 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 04:40:01 unRaid root: Fix Common Problems Version 2024.05.04
May 19 06:03:46 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 06:06:20 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 10:22:27 unRaid root: Delaying execution of fix common problems scan for 10 minutes
May 19 10:22:27 unRaid emhttpd: Starting services...
May 19 10:22:27 unRaid emhttpd: shcmd (54): chmod 0777 '/mnt/user/documents'
syslog-192.168.1.6.log.txt.zip
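Narrowing down when the server died comes down to finding the silent stretch in the syslog before the restart entries. A minimal sketch of that idea in Python, assuming the classic "Mon DD HH:MM:SS" syslog prefix shown above; `parse_ts`, `find_gaps`, the 2-hour threshold and the year 2024 (syslog lines do not carry a year) are all illustrative assumptions, not part of any Unraid tool.

```python
from datetime import datetime, timedelta


def parse_ts(line, year=2024):
    """Parse the 'Mon DD HH:MM:SS' prefix of a syslog line; None if absent."""
    try:
        # The year is not recorded in the log line, so it is assumed here.
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, min_gap=timedelta(hours=2)):
    """Return (last_seen, next_seen) pairs where the log was silent >= min_gap."""
    gaps = []
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip lines without a timestamp prefix
        if prev is not None and ts - prev >= min_gap:
            gaps.append((prev, ts))
        prev = ts
    return gaps


# A few lines copied from the excerpt above.
SAMPLE = [
    "May 18 23:19:24 unRaid kernel: mce: [Hardware Error]: Machine check events logged",
    "May 19 04:35:51 unRaid kernel: mce: [Hardware Error]: Machine check events logged",
    "May 19 04:40:01 unRaid root: Fix Common Problems Version 2024.05.04",
    "May 19 06:03:46 unRaid kernel: mce: [Hardware Error]: Machine check events logged",
    "May 19 06:06:20 unRaid kernel: mce: [Hardware Error]: Machine check events logged",
    "May 19 10:22:27 unRaid root: Delaying execution of fix common problems scan for 10 minutes",
    "May 19 10:22:27 unRaid emhttpd: Starting services...",
]

if __name__ == "__main__":
    for start, end in find_gaps(SAMPLE):
        print(f"silent from {start} to {end} ({end - start})")
```

An idle server also produces quiet stretches, so a gap is only a candidate window: the one that ends at a known restart time (10:22 here) is the one that brackets the crash.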
Ceps (Author) Posted May 19 (edited)
New crash, can't get in yet. This is the output when booting Unraid.
Update: After some time I was able to boot again. Logs around the crash:
May 19 10:23:02 unRaid network: reload service: nginx
May 19 10:23:02 unRaid nginx: 2024/05/19 10:23:02 [alert] 6765#6765: *111 open socket #19 left in connection 10
May 19 10:23:02 unRaid nginx: 2024/05/19 10:23:02 [alert] 6765#6765: aborting
May 19 10:23:30 unRaid kernel: x86/split lock detection: #AC: CPU 0/KVM/8776 took a split_lock trap at address: 0x733c4014
May 19 10:23:30 unRaid kernel: x86/split lock detection: #AC: CPU 1/KVM/8777 took a split_lock trap at address: 0x733c4014
May 19 10:23:30 unRaid kernel: x86/split lock detection: #AC: CPU 3/KVM/8779 took a split_lock trap at address: 0x733c4014
May 19 10:23:30 unRaid kernel: x86/split lock detection: #AC: CPU 2/KVM/8778 took a split_lock trap at address: 0x733c4014
May 19 10:23:31 unRaid kernel: x86/split lock detection: #AC: CPU 7/KVM/8783 took a split_lock trap at address: 0x733c4014
May 19 10:23:31 unRaid kernel: x86/split lock detection: #AC: CPU 5/KVM/8781 took a split_lock trap at address: 0x733c4014
May 19 10:23:31 unRaid kernel: x86/split lock detection: #AC: CPU 6/KVM/8782 took a split_lock trap at address: 0x733c4014
May 19 10:23:31 unRaid kernel: x86/split lock detection: #AC: CPU 4/KVM/8780 took a split_lock trap at address: 0x733c4014
May 19 10:32:00 unRaid root: Fix Common Problems Version 2024.05.04
May 19 12:12:49 unRaid kernel: mce_notify_irq: 8 callbacks suppressed
May 19 12:12:49 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 12:18:01 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 14:19:45 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 14:40:42 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 14:53:49 unRaid kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
May 19 14:53:49 unRaid kernel: CPU: 8 PID: 14923 Comm: smartctl_type Tainted: P O 6.1.79-Unraid #1
May 19 14:53:49 unRaid kernel: Hardware name: ASUS System Product Name/ProArt Z790-CREATOR WIFI, BIOS 2202 04/17/2024
May 19 14:53:49 unRaid kernel: RIP: 0010:cpumask_any_but+0x2c/0x34
May 19 14:53:49 unRaid kernel: Code: c0 48 89 fd 53 89 f3 8b 35 ba dd 34 01 89 c2 48 89 ef e8 96 96 3d 00 39 05 aa dd 34 01 89 c2 76 08 39 c3 75 04 ff c0 eb de 5b <89> d0 5d c3 cc cc cc cc 0f 1f 44 00 00 55 bd 1f 00 00 00 53 48 89
May 19 14:53:49 unRaid kernel: RSP: 0000:ffffc9000f30fcd8 EFLAGS: 00010246
May 19 14:53:49 unRaid kernel: RAX: 000000000000001c RBX: ffff88816c433300 RCX: 0000000000000009
May 19 14:53:49 unRaid kernel: RDX: 000000000000001c RSI: 000000000000001c RDI: ffff88816c433738
May 19 14:53:49 unRaid kernel: RBP: ffff88816c433738 R08: 0000000000000001 R09: 0000000000000059
May 19 14:53:49 unRaid kernel: R10: ffff8881068c5008 R11: ffff8881068c500c R12: ffff88816c433738
May 19 14:53:49 unRaid kernel: R13: 0000000000000008 R14: 0000000000000000 R15: 0000000000000000
May 19 14:53:49 unRaid kernel: FS: 0000148a3496b640(0000) GS:ffff88a03f200000(0000) knlGS:0000000000000000
May 19 14:53:49 unRaid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 19 14:53:49 unRaid kernel: CR2: 0000148a34ba4280 CR3: 0000001661ca4000 CR4: 0000000000752ee0
May 19 14:53:49 unRaid kernel: PKRU: 55555554
May 19 14:53:49 unRaid kernel: Call Trace:
May 19 14:53:49 unRaid kernel: <TASK>
May 19 14:53:49 unRaid kernel: ? __die_body+0x1a/0x5c
May 19 14:53:49 unRaid kernel: ? die+0x30/0x49
May 19 14:53:49 unRaid kernel: ? do_trap+0x7b/0xfe
May 19 14:53:49 unRaid kernel: ? cpumask_any_but+0x2c/0x34
May 19 14:53:49 unRaid kernel: ? cpumask_any_but+0x2c/0x34
May 19 14:53:49 unRaid kernel: ? do_error_trap+0x6e/0x98
May 19 14:53:49 unRaid kernel: ? cpumask_any_but+0x2c/0x34
May 19 14:53:49 unRaid kernel: ? exc_invalid_op+0x4c/0x60
May 19 14:53:49 unRaid kernel: ? cpumask_any_but+0x2c/0x34
May 19 14:53:49 unRaid kernel: ? asm_exc_invalid_op+0x16/0x20
May 19 14:53:49 unRaid kernel: ? cpumask_any_but+0x2c/0x34
May 19 14:53:49 unRaid kernel: flush_tlb_mm_range+0xb0/0x111
May 19 14:53:49 unRaid kernel: ptep_clear_flush+0x3c/0x45
May 19 14:53:49 unRaid kernel: wp_page_copy+0x36d/0x4a3
May 19 14:53:49 unRaid kernel: __handle_mm_fault+0x71c/0xcf9
May 19 14:53:49 unRaid kernel: handle_mm_fault+0x13d/0x20f
May 19 14:53:49 unRaid kernel: do_user_addr_fault+0x2c3/0x48d
May 19 14:53:49 unRaid kernel: exc_page_fault+0xfb/0x11d
May 19 14:53:49 unRaid kernel: asm_exc_page_fault+0x22/0x30
May 19 14:53:49 unRaid kernel: RIP: 0033:0x148a3863f21c
May 19 14:53:49 unRaid kernel: Code: 1f 80 00 00 00 00 48 8b 08 8b 50 08 4c 01 f9 48 83 fa 26 74 0a 48 83 fa 08 0f 85 cb 19 00 00 48 8b 50 10 48 83 c0 18 4c 01 fa <48> 89 11 48 39 d8 72 d4 4d 8b 93 e8 01 00 00 4d 85 d2 0f 84 fc 0a
May 19 14:53:49 unRaid kernel: RSP: 002b:00007ffc7121ad40 EFLAGS: 00010202
May 19 14:53:49 unRaid kernel: RAX: 0000148a34a8a158 RBX: 0000148a34aafeb0 RCX: 0000148a34ba4280
May 19 14:53:49 unRaid kernel: RDX: 0000148a34ab3970 RSI: 0000148a38664ab0 RDI: 0000148a34a87ca8
May 19 14:53:49 unRaid kernel: RBP: 00007ffc7121ae40 R08: 0000148a34ab15c0 R09: 0000148a34ab2238
May 19 14:53:49 unRaid kernel: R10: 0000000000000001 R11: 0000148a34bdc090 R12: 0000000000000000
May 19 14:53:49 unRaid kernel: R13: 00007ffc7121add0 R14: 0000148a34a87000 R15: 0000148a34a87000
May 19 14:53:49 unRaid kernel: </TASK>
May 19 14:53:49 unRaid kernel: Modules linked in: xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc bonding tls igc atlantic i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iosf_mbi kvm drm_buddy i2c_algo_bit ttm drm_display_helper drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 drm sha1_ssse3 aesni_intel btusb btrtl crypto_simd btbcm btintel input_leds cryptd rapl bluetooth intel_cstate mei_hdcp mei_pxp wmi_bmof intel_gtt joydev led_class i2c_i801 ecdh_generic ecc i2c_smbus
May 19 14:53:49 unRaid kernel: agpgart nvme ahci mei_me intel_uncore thunderbolt i2c_core syscopyarea nvme_core mei libahci sysfillrect vmd sysimgblt video thermal fan fb_sys_fops tpm_crb tpm_tis tpm_tis_core wmi tpm intel_pmc_core backlight acpi_pad acpi_tad button unix [last unloaded: igc]
May 19 14:53:49 unRaid kernel: ---[ end trace 0000000000000000 ]---
May 19 14:53:49 unRaid kernel: RIP: 0010:cpumask_any_but+0x2c/0x34
May 19 14:53:49 unRaid kernel: Code: c0 48 89 fd 53 89 f3 8b 35 ba dd 34 01 89 c2 48 89 ef e8 96 96 3d 00 39 05 aa dd 34 01 89 c2 76 08 39 c3 75 04 ff c0 eb de 5b <89> d0 5d c3 cc cc cc cc 0f 1f 44 00 00 55 bd 1f 00 00 00 53 48 89
May 19 14:53:49 unRaid kernel: RSP: 0000:ffffc9000f30fcd8 EFLAGS: 00010246
May 19 14:53:49 unRaid kernel: RAX: 000000000000001c RBX: ffff88816c433300 RCX: 0000000000000009
May 19 14:53:49 unRaid kernel: RDX: 000000000000001c RSI: 000000000000001c RDI: ffff88816c433738
May 19 14:53:49 unRaid kernel: RBP: ffff88816c433738 R08: 0000000000000001 R09: 0000000000000059
May 19 14:53:49 unRaid kernel: R10: ffff8881068c5008 R11: ffff8881068c500c R12: ffff88816c433738
May 19 14:53:49 unRaid kernel: R13: 0000000000000008 R14: 0000000000000000 R15: 0000000000000000
May 19 14:53:49 unRaid kernel: FS: 0000148a3496b640(0000) GS:ffff88a03f200000(0000) knlGS:0000000000000000
May 19 14:53:49 unRaid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 19 14:53:49 unRaid kernel: CR2: 0000148a34ba4280 CR3: 0000001661ca4000 CR4: 0000000000752ee0
May 19 14:53:49 unRaid kernel: PKRU: 55555554
May 19 14:53:49 unRaid kernel: note: smartctl_type[14923] exited with preempt_count 2
May 19 14:53:58 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 14:54:03 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 19:05:35 unRaid root: Delaying execution of fix common problems scan for 10 minutes
May 19 19:05:35 unRaid emhttpd: Starting services...
Edited May 19 by Ceps
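A trace like the one above is easier to compare across crashes once it is reduced to a few fields: the trap type, the process that was running, the kernel instruction pointer, and the call chain. The following is a rough Python sketch of that reduction, not an established tool; `summarize_oops` and its regular expressions are hypothetical and only assume the line layout seen in this log. Frames prefixed with `?` are skipped because the kernel marks them as unverified stack entries.

```python
import re

KERNEL_MSG = re.compile(r"kernel: (.*)$")
TRAP = re.compile(r"^(.+?): [0-9a-f]{4} \[#\d+\]")
COMM = re.compile(r"Comm: (\S+)")
KERNEL_RIP = re.compile(r"^RIP: 0010:(\S+)")
FRAME = re.compile(r"^(\? )?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+$")


def summarize_oops(lines):
    """Reduce kernel oops lines to trap type, process, RIP and call chain."""
    summary = {"trap": None, "comm": None, "rip": None, "frames": []}
    in_trace = False
    for line in lines:
        found = KERNEL_MSG.search(line)
        if not found:
            continue  # not a kernel message
        msg = found.group(1).strip()
        if msg == "<TASK>":
            in_trace = True
        elif msg == "</TASK>":
            in_trace = False
        elif in_trace:
            frame = FRAME.match(msg)
            # Keep only frames without the '?' prefix (verified by the unwinder).
            if frame and frame.group(1) is None:
                summary["frames"].append(frame.group(2))
        else:
            if summary["trap"] is None and TRAP.match(msg):
                summary["trap"] = TRAP.match(msg).group(1)
            if summary["comm"] is None and COMM.search(msg):
                summary["comm"] = COMM.search(msg).group(1)
            if summary["rip"] is None and KERNEL_RIP.match(msg):
                summary["rip"] = KERNEL_RIP.match(msg).group(1)
    return summary


# A shortened copy of the trace above.
SAMPLE = [
    "May 19 14:53:49 unRaid kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI",
    "May 19 14:53:49 unRaid kernel: CPU: 8 PID: 14923 Comm: smartctl_type Tainted: P O 6.1.79-Unraid #1",
    "May 19 14:53:49 unRaid kernel: RIP: 0010:cpumask_any_but+0x2c/0x34",
    "May 19 14:53:49 unRaid kernel: Call Trace:",
    "May 19 14:53:49 unRaid kernel: <TASK>",
    "May 19 14:53:49 unRaid kernel: ? __die_body+0x1a/0x5c",
    "May 19 14:53:49 unRaid kernel: ? cpumask_any_but+0x2c/0x34",
    "May 19 14:53:49 unRaid kernel: flush_tlb_mm_range+0xb0/0x111",
    "May 19 14:53:49 unRaid kernel: ptep_clear_flush+0x3c/0x45",
    "May 19 14:53:49 unRaid kernel: </TASK>",
]

if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

If the summaries from several crashes differ from one another, with unrelated processes and unrelated kernel functions each time, that tends to point away from a single software bug, though it does not prove a hardware fault on its own.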
JorgeB Posted May 20 (Solution)
18 hours ago, Ceps said:
May 18 23:06:07 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 18 23:13:22 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 18 23:19:24 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 18 23:19:24 unRaid kernel: mce: [Hardware Error]: Machine check events logged
May 19 04:35:51 unRaid kernel: mce: [Hardware Error]: Machine check events logged
These suggest a hardware problem.
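To see how often these machine check lines recur, they can be tallied per day from a saved syslog. A small sketch in Python, assuming the exact message text quoted above; `count_mce_by_day` and `MCE_MARKER` are made-up names for illustration.

```python
from collections import Counter

# Message text exactly as it appears in the quoted log lines.
MCE_MARKER = "mce: [Hardware Error]: Machine check events logged"


def count_mce_by_day(lines):
    """Count machine check log lines per 'Mon DD' date prefix."""
    counts = Counter()
    for line in lines:
        if MCE_MARKER in line:
            # The first two whitespace-separated fields are month and day.
            month, day = line.split()[:2]
            counts[f"{month} {day}"] += 1
    return counts


# The five lines quoted in the post above.
SAMPLE = [
    "May 18 23:06:07 unRaid kernel: mce: [Hardware Error]: Machine check events logged",
    "May 18 23:13:22 unRaid kernel: mce: [Hardware Error]: Machine check events logged",
    "May 18 23:19:24 unRaid kernel: mce: [Hardware Error]: Machine check events logged",
    "May 18 23:19:24 unRaid kernel: mce: [Hardware Error]: Machine check events logged",
    "May 19 04:35:51 unRaid kernel: mce: [Hardware Error]: Machine check events logged",
]

if __name__ == "__main__":
    for day, n in count_mce_by_day(SAMPLE).items():
        print(day, n)
```

The count only shows frequency. The log line itself does not say which component raised the error, so it cannot separate CPU, memory, or motherboard faults.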
Ceps (Author) Posted May 20
Thanks again @JorgeB. Yeah, that makes sense. Something tells me it's the motherboard; I don't know why, but ever since I built this system I've felt the mobo was not right.
Ceps (Author) Posted June 25
@JorgeB Just thought I'd update on this: it ended up being the CPU. Got a new one, and it has been running fine for a week now.