Patb Posted June 29, 2019 Share Posted June 29, 2019 I'm not exactly sure what's happening but my server is locking after a couple of days (this has happened a couple of times so far). The only thing I can do is a reboot. Attached is a copy of my syslog from last night after I got it back up and running as well as the diagnostics. I also have the following error in my log this morning. Jun 29 09:17:30 Tower kernel: WARNING: CPU: 3 PID: 5988 at net/netfilter/nf_conntrack_core.c:945 __nf_conntrack_confirm+0x97/0x686 Jun 29 09:17:30 Tower kernel: Modules linked in: xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net vhost tap veth macvlan xt_nat iptable_filter iptable_nat ipt_MASQUERADE nf_nat_ipv4 nf_nat ip_tables xfs nfsd lockd grace sunrpc md_mod tun bonding bnx2x mdio igb i2c_algo_bit sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_cstate ipmi_ssif intel_uncore mpt3sas intel_rapl_perf wmi i2c_i801 ipmi_si i2c_core pcc_cpufreq raid_class button scsi_transport_sas [last unloaded: mdio] Jun 29 09:17:30 Tower kernel: CPU: 3 PID: 5988 Comm: kworker/3:2 Not tainted 4.19.55-Unraid #1 Jun 29 09:17:30 Tower kernel: Hardware name: IBM System x3630 M4 -[7158AC1]-/00KF924, BIOS -[BEE166CUS-3.10]- 08/29/2018 Jun 29 09:17:30 Tower kernel: Workqueue: events macvlan_process_broadcast [macvlan] Jun 29 09:17:30 Tower kernel: RIP: 0010:__nf_conntrack_confirm+0x97/0x686 Jun 29 09:17:30 Tower kernel: Code: c1 ed 20 89 2c 24 e8 67 fb ff ff 8b 54 24 04 89 ef 89 c6 41 89 c4 e8 29 f9 ff ff 84 c0 75 b9 49 8b 86 80 00 00 00 a8 08 74 25 <0f> 0b 44 89 e6 89 ef 45 31 ff e8 b2 f1 ff ff be 00 02 00 00 48 c7 Jun 29 09:17:30 Tower kernel: RSP: 0018:ffff8886678c3d98 EFLAGS: 00010202 Jun 29 09:17:30 Tower kernel: RAX: 0000000000000188 RBX: ffff8886422ece00 RCX: 000000008e171493 Jun 29 
09:17:30 Tower kernel: RDX: 0000000000000001 RSI: 000000000000021c RDI: ffffffff81e090e8 Jun 29 09:17:30 Tower kernel: RBP: 00000000000062ba R08: ffff88843db45d30 R09: 00000000817a4f1c Jun 29 09:17:30 Tower kernel: R10: 0000000000000000 R11: ffff88863e371401 R12: 0000000000005e1c Jun 29 09:17:30 Tower kernel: R13: ffffffff81e8e080 R14: ffff88843db45cc0 R15: ffff88843db45d18 Jun 29 09:17:30 Tower kernel: FS: 0000000000000000(0000) GS:ffff8886678c0000(0000) knlGS:0000000000000000 Jun 29 09:17:30 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jun 29 09:17:30 Tower kernel: CR2: 00001498c65fd320 CR3: 0000000001e0a002 CR4: 00000000000606e0 Jun 29 09:17:30 Tower kernel: Call Trace: Jun 29 09:17:30 Tower kernel: <IRQ> Jun 29 09:17:30 Tower kernel: ipv4_confirm+0xaf/0xb7 Jun 29 09:17:30 Tower kernel: nf_hook_slow+0x37/0x96 Jun 29 09:17:30 Tower kernel: ip_local_deliver+0xa9/0xd7 Jun 29 09:17:30 Tower kernel: ? ip_sublist_rcv_finish+0x53/0x53 Jun 29 09:17:30 Tower kernel: ip_rcv+0xa0/0xbe Jun 29 09:17:30 Tower kernel: ? ip_rcv_finish_core.isra.0+0x2e2/0x2e2 Jun 29 09:17:30 Tower kernel: __netif_receive_skb_one_core+0x4d/0x69 Jun 29 09:17:30 Tower kernel: process_backlog+0x7c/0x116 Jun 29 09:17:30 Tower kernel: net_rx_action+0x10b/0x274 Jun 29 09:17:30 Tower kernel: __do_softirq+0xce/0x1e2 Jun 29 09:17:30 Tower kernel: do_softirq_own_stack+0x2a/0x40 Jun 29 09:17:30 Tower kernel: </IRQ> Jun 29 09:17:30 Tower kernel: do_softirq+0x4d/0x59 Jun 29 09:17:30 Tower kernel: netif_rx_ni+0x1c/0x22 Jun 29 09:17:30 Tower kernel: macvlan_broadcast+0x10f/0x153 [macvlan] Jun 29 09:17:30 Tower kernel: ? __switch_to_asm+0x41/0x70 Jun 29 09:17:30 Tower kernel: macvlan_process_broadcast+0xeb/0x128 [macvlan] Jun 29 09:17:30 Tower kernel: process_one_work+0x16e/0x24f Jun 29 09:17:30 Tower kernel: ? pwq_unbound_release_workfn+0xb7/0xb7 Jun 29 09:17:30 Tower kernel: worker_thread+0x1dc/0x2ac Jun 29 09:17:30 Tower kernel: kthread+0x10b/0x113 Jun 29 09:17:30 Tower kernel: ? 
kthread_park+0x71/0x71 Jun 29 09:17:30 Tower kernel: ret_from_fork+0x35/0x40 Jun 29 09:17:30 Tower kernel: ---[ end trace 4f9949cbfd7b7ba0 ]--- syslog tower-diagnostics-20190629-1421.zip
JorgeB Posted July 1, 2019
Macvlan call traces are usually related to dockers with custom IP addresses.
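To check which traces in a saved syslog actually pass through macvlan, something like the following could be used. It is only a sketch: the function name `macvlan_traces` is made up for illustration, and it assumes the log has one entry per line in the same format as the excerpts in this thread. It ignores the "Modules linked in:" line, since macvlan is listed there whenever the module is loaded, whether or not the trace involves it.

```python
import re

# A kernel report starts with "kernel: WARNING:" or "kernel: BUG:" in these logs.
TRACE_START = re.compile(r"kernel: (?:WARNING|BUG):")
TRACE_END = "---[ end trace"


def macvlan_traces(lines):
    """Return a list of (first_line, mentions_macvlan) for each kernel trace."""
    traces = []
    current = None  # [first_line, mentions_macvlan] while inside a trace
    for raw in lines:
        line = raw.rstrip("\n")
        if TRACE_START.search(line):
            if current is not None:
                traces.append(tuple(current))
            current = [line, False]
            continue
        if current is None:
            continue
        if "kernel:" not in line:
            # A non-kernel entry (e.g. rpcbind) means the report is over.
            traces.append(tuple(current))
            current = None
            continue
        # The module list always names macvlan when it is loaded, so skip it.
        if "macvlan" in line and "Modules linked in:" not in line:
            current[1] = True
        if TRACE_END in line:
            traces.append(tuple(current))
            current = None
    if current is not None:
        traces.append(tuple(current))
    return traces
```

Calling `macvlan_traces(open("syslog"))` (the file name is a placeholder) returns one tuple per report; a `True` flag means macvlan shows up in the workqueue or call trace itself.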
Patb Posted July 1, 2019 Author Share Posted July 1, 2019 The only docker I have with a fixed IP is pi-hole. The log recorded the following this morning: Jul 1 08:21:06 Tower kernel: mdcmd (102): set md_write_method 0 Jul 1 08:21:06 Tower kernel: Jul 1 08:28:05 Tower kernel: BUG: Bad page map in process C1 CompilerThre pte:ffff888c446f7d08 pmd:5ab41b067 Jul 1 08:28:05 Tower kernel: addr:000000004c68d19d vm_flags:00100073 anon_vma:000000000e448825 mapping: (null) index:1546759d4 Jul 1 08:28:05 Tower kernel: file: (null) fault: (null) mmap: (null) readpage: (null) Jul 1 08:28:05 Tower kernel: CPU: 15 PID: 29094 Comm: C1 CompilerThre Tainted: G B W 4.19.55-Unraid #1 Jul 1 08:28:05 Tower kernel: Hardware name: IBM System x3630 M4 -[7158AC1]-/00KF924, BIOS -[BEE166CUS-3.10]- 08/29/2018 Jul 1 08:28:05 Tower kernel: Call Trace: Jul 1 08:28:05 Tower kernel: dump_stack+0x5d/0x79 Jul 1 08:28:05 Tower kernel: print_bad_pte+0x216/0x233 Jul 1 08:28:05 Tower kernel: _vm_normal_page+0x50/0xb7 Jul 1 08:28:05 Tower kernel: change_protection+0x54f/0x85c Jul 1 08:28:05 Tower kernel: change_prot_numa+0x13/0x22 Jul 1 08:28:05 Tower kernel: task_numa_work+0x208/0x2ac Jul 1 08:28:05 Tower kernel: task_work_run+0x77/0x8b Jul 1 08:28:05 Tower kernel: exit_to_usermode_loop+0x46/0x9b Jul 1 08:28:05 Tower kernel: prepare_exit_to_usermode+0x66/0x79 Jul 1 08:28:05 Tower kernel: retint_user+0x8/0x8 Jul 1 08:28:05 Tower kernel: RIP: 0033:0x154727f7a99d Jul 1 08:28:05 Tower kernel: Code: 4c 8b 21 48 89 55 90 48 89 bd 60 ff ff ff 0f 83 e0 00 00 00 66 0f 1f 84 00 00 00 00 00 0f b6 02 48 8d 0d 66 e6 e6 00 89 45 c4 <8b> 04 81 3d ee 00 00 00 89 45 c0 0f 86 12 03 00 00 48 83 ea 01 48 Jul 1 08:28:05 Tower kernel: RSP: 002b:00001546f84cd980 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13 Jul 1 08:28:05 Tower kernel: RAX: 00000000000000c6 RBX: 00001546e04acfc0 RCX: 0000154728de9000 Jul 1 08:28:05 Tower kernel: RDX: 00001546e04acf24 RSI: 0000000000000003 RDI: 00001546f84cd9a0 Jul 1 08:28:05 Tower kernel: RBP: 
00001546f84cda30 R08: 0000000000000000 R09: 00000000ffffffff Jul 1 08:28:05 Tower kernel: R10: 00001546e04acf60 R11: 0000000000000007 R12: 00001546e04ad090 Jul 1 08:28:05 Tower kernel: R13: 00001546e04ac780 R14: 000015472889dba8 R15: 0000000000000070 Jul 1 08:52:29 Tower rpcbind[25832]: connect from 192.168.1.233 to null() Jul 1 08:52:29 Tower rpcbind[25833]: connect from 192.168.1.233 to getport/addr(mountd) Jul 1 08:52:30 Tower rpcbind[25834]: connect from 192.168.1.233 to null() Jul 1 08:52:30 Tower rpcbind[25835]: connect from 192.168.1.233 to getport/addr(mountd) Jul 1 08:52:30 Tower rpc.mountd[8977]: authenticated mount request from 192.168.1.233:60026 for /mnt/user/video (/mnt/user/video) Jul 1 08:52:30 Tower rpcbind[25836]: connect from 192.168.1.233 to null() Jul 1 08:52:30 Tower rpcbind[25837]: connect from 192.168.1.233 to getport/addr(nfs) Jul 1 09:04:24 Tower kernel: BUG: Bad page map in process VM Periodic Tas pte:ffff888c446f7d08 pmd:5ab41b067 Jul 1 09:04:24 Tower kernel: addr:000000004c68d19d vm_flags:00100073 anon_vma:000000000e448825 mapping: (null) index:1546759d4 Jul 1 09:04:24 Tower kernel: file: (null) fault: (null) mmap: (null) readpage: (null) Jul 1 09:04:24 Tower kernel: CPU: 2 PID: 29097 Comm: VM Periodic Tas Tainted: G B W 4.19.55-Unraid #1 Jul 1 09:04:24 Tower kernel: Hardware name: IBM System x3630 M4 -[7158AC1]-/00KF924, BIOS -[BEE166CUS-3.10]- 08/29/2018 Jul 1 09:04:24 Tower kernel: Call Trace: Jul 1 09:04:24 Tower kernel: dump_stack+0x5d/0x79 Jul 1 09:04:24 Tower kernel: print_bad_pte+0x216/0x233 Jul 1 09:04:24 Tower kernel: _vm_normal_page+0x50/0xb7 Jul 1 09:04:24 Tower kernel: change_protection+0x54f/0x85c Jul 1 09:04:24 Tower kernel: change_prot_numa+0x13/0x22 Jul 1 09:04:24 Tower kernel: task_numa_work+0x208/0x2ac Jul 1 09:04:24 Tower kernel: task_work_run+0x77/0x8b Jul 1 09:04:24 Tower kernel: exit_to_usermode_loop+0x46/0x9b Jul 1 09:04:24 Tower kernel: do_syscall_64+0xdf/0xf2 Jul 1 09:04:24 Tower kernel: 
entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 1 09:04:24 Tower kernel: RIP: 0033:0x154728e481aa Jul 1 09:04:24 Tower kernel: Code: 00 00 b8 ca 00 00 00 0f 05 5a 5e c3 0f 1f 40 00 56 52 c7 07 00 00 00 00 81 f6 81 00 00 00 ba 01 00 00 00 b8 ca 00 00 00 0f 05 <5a> 5e c3 0f 1f 00 41 54 41 55 49 89 fc 49 89 f5 48 83 ec 18 48 89 Jul 1 09:04:24 Tower kernel: RSP: 002b:00001546f81cbc00 EFLAGS: 00000206 ORIG_RAX: 00000000000000ca Jul 1 09:04:24 Tower kernel: RAX: 0000000000000000 RBX: 00000000fffffffd RCX: 0000154728e481aa Jul 1 09:04:24 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000081 RDI: 000015472022c728 Jul 1 09:04:24 Tower kernel: RBP: 00001546f81cbce0 R08: 0000000000000000 R09: 000015472022c750 Jul 1 09:04:24 Tower kernel: R10: 00001546f81cbc00 R11: 0000000000000206 R12: 000015472022c700 Jul 1 09:04:24 Tower kernel: R13: 000015472022c728 R14: 00001546f81cbca0 R15: 000015472022c750 Jul 1 09:38:35 Tower kernel: BUG: Bad page map in process G1 Young RemSet pte:ffff888c446f7d08 pmd:5ab41b067 Jul 1 09:38:35 Tower kernel: addr:000000004c68d19d vm_flags:00100073 anon_vma:000000000e448825 mapping: (null) index:1546759d4 Jul 1 09:38:35 Tower kernel: file: (null) fault: (null) mmap: (null) readpage: (null) Jul 1 09:38:35 Tower kernel: CPU: 23 PID: 29089 Comm: G1 Young RemSet Tainted: G B W 4.19.55-Unraid #1 Jul 1 09:38:35 Tower kernel: Hardware name: IBM System x3630 M4 -[7158AC1]-/00KF924, BIOS -[BEE166CUS-3.10]- 08/29/2018 Jul 1 09:38:35 Tower kernel: Call Trace: Jul 1 09:38:35 Tower kernel: dump_stack+0x5d/0x79 Jul 1 09:38:35 Tower kernel: print_bad_pte+0x216/0x233 Jul 1 09:38:35 Tower kernel: _vm_normal_page+0x50/0xb7 Jul 1 09:38:35 Tower kernel: change_protection+0x54f/0x85c Jul 1 09:38:35 Tower kernel: ? 
_copy_to_user+0x22/0x28 Jul 1 09:38:35 Tower kernel: change_prot_numa+0x13/0x22 Jul 1 09:38:35 Tower kernel: task_numa_work+0x208/0x2ac Jul 1 09:38:35 Tower kernel: task_work_run+0x77/0x8b Jul 1 09:38:35 Tower kernel: exit_to_usermode_loop+0x46/0x9b Jul 1 09:38:35 Tower kernel: prepare_exit_to_usermode+0x66/0x79 Jul 1 09:38:35 Tower kernel: retint_user+0x8/0x8 Jul 1 09:38:35 Tower kernel: RIP: 0033:0x15472859565d Jul 1 09:38:35 Tower kernel: Code: e5 41 56 41 55 41 54 53 48 89 fb 66 48 8d 3d 7a 29 81 00 66 66 48 e8 62 b5 6f ff ba 01 00 00 00 4c 8b 28 31 c0 f0 48 0f b1 13 <48> 85 c0 75 23 e9 e9 00 00 00 66 0f 1f 84 00 00 00 00 00 48 89 c1 Jul 1 09:38:35 Tower kernel: RSP: 002b:00001546f93f1ce0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 Jul 1 09:38:35 Tower kernel: RAX: 0000000000000000 RBX: 00001547201146a8 RCX: 0000000000000000 Jul 1 09:38:35 Tower kernel: RDX: 0000000000000001 RSI: 0000154720114510 RDI: 0000154728da7fc0 Jul 1 09:38:35 Tower kernel: RBP: 00001546f93f1d00 R08: 0000000000000000 R09: 000015472014cb50 Jul 1 09:38:35 Tower kernel: R10: 00001546f93f1c00 R11: 0000000000000206 R12: 00001546f93f1db0 Jul 1 09:38:35 Tower kernel: R13: 000015472014b800 R14: 0000154720114720 R15: 0000154728cf7328 Quote Link to comment
Squid Posted July 1, 2019
On 6/29/2019 at 10:22 AM, Patb said: Attached is a copy of my syslog from last night after I got it back up and running as well as the diagnostics.
Your best bet is to set up the syslog server (Settings - Syslog Server) to mirror to the flash drive, then wait for another crash. After recovering, upload that syslog.
Patb (Author) Posted July 1, 2019
I did; that's what one of the attached files is.
JorgeB Posted July 1, 2019
The server has a memory problem. Most of the errors are being corrected:
Jun 5 05:59:39 Tower kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
But if there is an uncorrectable error, the server will halt to prevent data corruption. If that happens, check the board's system event log; there might be more info there.
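To see how often those corrected errors occur, the matching lines in a saved syslog can be tallied per day and per memory controller. This is a rough sketch, not an official tool: `count_edac_errors` is a made-up name, and the pattern assumes entries look exactly like the EDAC line quoted above.

```python
import re
from collections import Counter

# Matches e.g. "Jun 5 05:59:39 Tower kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR"
EDAC_LINE = re.compile(
    r"^(\w{3}\s+\d+) \S+ \S+ kernel: EDAC \S+ (MC\d+): HANDLING MCE MEMORY ERROR"
)


def count_edac_errors(lines):
    """Count handled MCE memory errors, keyed by (date, memory controller)."""
    counts = Counter()
    for line in lines:
        match = EDAC_LINE.match(line)
        if match:
            # Collapse the padded day ("Jun  5") into a single-space form.
            day = " ".join(match.group(1).split())
            counts[(day, match.group(2))] += 1
    return counts
```

A count that keeps growing on one controller (MC1 in the line above) points to the DIMMs attached to that controller as the first ones to swap.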
Patb Posted July 7, 2019 Author Share Posted July 7, 2019 (edited) I've tested the memory a couple of times and it always passes but will try again... not sure what's happening. Woke up this morning and server crashed again and cache drive (BTRFS) is corrupt. I swapped all the memory in that server and have reformatted my cache to XFS and am now in the process of restoring my appdata from backup. This is not a good day This is the last log entry before I noticed the crash in the morning (full log attached) Jul 6 21:15:30 Tower kernel: WARNING: CPU: 4 PID: 0 at net/netfilter/nf_nat_core.c:420 nf_nat_setup_info+0x6b/0x5fb [nf_nat] Jul 6 21:15:30 Tower kernel: Modules linked in: macvlan xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle xt_nat ip6table_filter ip6_tables vhost_net tun vhost tap veth ipt_MASQUERADE iptable_nat nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs nfsd lockd grace sunrpc md_mod bonding bnx2x mdio igb i2c_algo_bit sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd ipmi_ssif glue_helper mpt3sas intel_cstate wmi i2c_i801 intel_uncore i2c_core intel_rapl_perf raid_class pcc_cpufreq ipmi_si scsi_transport_sas button [last unloaded: mdio] Jul 6 21:15:30 Tower kernel: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G W 4.19.56-Unraid #1 Jul 6 21:15:30 Tower kernel: Hardware name: IBM System x3630 M4 -[7158AC1]-/00KF924, BIOS -[BEE166CUS-3.10]- 08/29/2018 Jul 6 21:15:30 Tower kernel: RIP: 0010:nf_nat_setup_info+0x6b/0x5fb [nf_nat] Jul 6 21:15:30 Tower kernel: Code: 48 89 fb 48 8b 87 80 00 00 00 49 89 f7 41 89 d6 76 04 0f 0b eb 0b 85 d2 75 07 25 80 00 00 00 eb 05 25 00 01 00 00 85 c0 74 07 <0f> 0b e9 ac 04 00 00 48 8b 83 90 00 00 00 4c 8d 64 24 30 48 8d 73 Jul 6 21:15:30 Tower kernel: RSP: 0018:ffff8886679037b0 EFLAGS: 00010202 Jul 6 21:15:30 Tower kernel: RAX: 0000000000000080 RBX: ffff88810e868dc0 RCX: 
0000000000000000 Jul 6 21:15:30 Tower kernel: RDX: 0000000000000000 RSI: ffff88866790389c RDI: ffff88810e868dc0 Jul 6 21:15:30 Tower kernel: RBP: ffff888667903888 R08: ffff88810e868dc0 R09: 0000000000000000 Jul 6 21:15:30 Tower kernel: R10: 0000000000000158 R11: ffffffff81e8e001 R12: ffff8883f98ec800 Jul 6 21:15:30 Tower kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff88866790389c Jul 6 21:15:30 Tower kernel: FS: 0000000000000000(0000) GS:ffff888667900000(0000) knlGS:0000000000000000 Jul 6 21:15:30 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 6 21:15:30 Tower kernel: CR2: 00001506159a4480 CR3: 0000000001e0a003 CR4: 00000000000606e0 Jul 6 21:15:30 Tower kernel: Call Trace: Jul 6 21:15:30 Tower kernel: <IRQ> Jul 6 21:15:30 Tower kernel: ? ipt_do_table+0x58e/0x5db [ip_tables] Jul 6 21:15:30 Tower kernel: nf_nat_alloc_null_binding+0x6f/0x86 [nf_nat] Jul 6 21:15:30 Tower kernel: nf_nat_inet_fn+0xa0/0x192 [nf_nat] Jul 6 21:15:30 Tower kernel: nf_hook_slow+0x37/0x96 Jul 6 21:15:30 Tower kernel: ip_local_deliver+0xa9/0xd7 Jul 6 21:15:30 Tower kernel: ? ip_sublist_rcv_finish+0x53/0x53 Jul 6 21:15:30 Tower kernel: ip_sabotage_in+0x38/0x3e Jul 6 21:15:30 Tower kernel: nf_hook_slow+0x37/0x96 Jul 6 21:15:30 Tower kernel: ip_rcv+0x8e/0xbe Jul 6 21:15:30 Tower kernel: ? ip_rcv_finish_core.isra.0+0x2e2/0x2e2 Jul 6 21:15:30 Tower kernel: __netif_receive_skb_one_core+0x4d/0x69 Jul 6 21:15:30 Tower kernel: netif_receive_skb_internal+0x9f/0xba Jul 6 21:15:30 Tower kernel: br_pass_frame_up+0x123/0x145 Jul 6 21:15:30 Tower kernel: ? br_port_flags_change+0x29/0x29 Jul 6 21:15:30 Tower kernel: br_handle_frame_finish+0x330/0x375 Jul 6 21:15:30 Tower kernel: ? ipt_do_table+0x58e/0x5db [ip_tables] Jul 6 21:15:30 Tower kernel: ? br_pass_frame_up+0x145/0x145 Jul 6 21:15:30 Tower kernel: br_nf_hook_thresh+0xa3/0xc3 Jul 6 21:15:30 Tower kernel: ? 
br_pass_frame_up+0x145/0x145 Jul 6 21:15:30 Tower kernel: br_nf_pre_routing_finish+0x239/0x260 Jul 6 21:15:30 Tower kernel: ? br_pass_frame_up+0x145/0x145 Jul 6 21:15:30 Tower kernel: ? nf_nat_ipv4_in+0x1d/0x64 [nf_nat_ipv4] Jul 6 21:15:30 Tower kernel: br_nf_pre_routing+0x2fc/0x321 Jul 6 21:15:30 Tower kernel: ? br_nf_forward_ip+0x352/0x352 Jul 6 21:15:30 Tower kernel: nf_hook_slow+0x37/0x96 Jul 6 21:15:30 Tower kernel: br_handle_frame+0x290/0x2d3 Jul 6 21:15:30 Tower kernel: ? br_pass_frame_up+0x145/0x145 Jul 6 21:15:30 Tower kernel: ? br_handle_local_finish+0xe/0xe Jul 6 21:15:30 Tower kernel: __netif_receive_skb_core+0x466/0x798 Jul 6 21:15:30 Tower kernel: ? udp_gro_receive+0x4c/0x134 Jul 6 21:15:30 Tower kernel: __netif_receive_skb_one_core+0x31/0x69 Jul 6 21:15:30 Tower kernel: netif_receive_skb_internal+0x9f/0xba Jul 6 21:15:30 Tower kernel: napi_gro_receive+0x42/0x76 Jul 6 21:15:30 Tower kernel: bnx2x_poll+0x101f/0x1527 [bnx2x] Jul 6 21:15:30 Tower kernel: ? recalibrate_cpu_khz+0x1/0x1 Jul 6 21:15:30 Tower kernel: ? ktime_get+0x3a/0x8d Jul 6 21:15:30 Tower kernel: ? tsc_refine_calibration_work+0x2d/0x225 Jul 6 21:15:30 Tower kernel: ? 
irq_entries_start+0x29a/0x660 Jul 6 21:15:30 Tower kernel: net_rx_action+0x10b/0x274 Jul 6 21:15:30 Tower kernel: __do_softirq+0xce/0x1e2 Jul 6 21:15:30 Tower kernel: irq_exit+0x5e/0x9d Jul 6 21:15:30 Tower kernel: do_IRQ+0xa9/0xc7 Jul 6 21:15:30 Tower kernel: common_interrupt+0xf/0xf Jul 6 21:15:30 Tower kernel: </IRQ> Jul 6 21:15:30 Tower kernel: RIP: 0010:cpuidle_enter_state+0xe5/0x141 Jul 6 21:15:30 Tower kernel: Code: 97 7e ba ff 45 84 ff 74 1d 9c 58 66 66 90 66 90 0f ba e0 09 73 09 0f 0b fa 66 66 90 66 66 90 31 ff e8 ae 0c be ff fb 66 66 90 <66> 66 90 48 2b 1c 24 b8 ff ff ff 7f 48 b9 ff ff ff ff f3 01 00 00 Jul 6 21:15:30 Tower kernel: RSP: 0018:ffffc900062f7ea0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdc Jul 6 21:15:30 Tower kernel: RAX: ffff888667920b00 RBX: 00017ffcf6cd2949 RCX: 000000000000001f Jul 6 21:15:30 Tower kernel: RDX: 00017ffcf6cd2949 RSI: 00000000435e532a RDI: 0000000000000000 Jul 6 21:15:30 Tower kernel: RBP: ffff88866792b600 R08: 0000000000000002 R09: 00000000000203c0 Jul 6 21:15:30 Tower kernel: R10: 0000000185276e44 R11: 00048c26e6f75a54 R12: 0000000000000005 Jul 6 21:15:30 Tower kernel: R13: 0000000000000005 R14: ffffffff81e5a078 R15: 0000000000000000 Jul 6 21:15:30 Tower kernel: do_idle+0x192/0x20e Jul 6 21:15:30 Tower kernel: cpu_startup_entry+0x6a/0x6c Jul 6 21:15:30 Tower kernel: start_secondary+0x197/0x1b2 Jul 6 21:15:30 Tower kernel: secondary_startup_64+0xa4/0xb0 Jul 6 21:15:30 Tower kernel: ---[ end trace c82e319e0a257db0 ]--- syslog Edited July 7, 2019 by Patb didn't finish Quote Link to comment
JorgeB Posted July 8, 2019
15 hours ago, Patb said: I've tested the memory a couple of times and it always passes but will try again... not sure what's happening.
A regular memtest won't show ECC errors since they are being corrected. PassMark MemTest might; otherwise, remove or swap one DIMM at a time until the MCE errors are gone.
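When swapping DIMMs one at a time, the kernel's EDAC counters can help narrow down which slot is reporting corrected errors. The sketch below assumes the sysfs layout described in the Linux kernel EDAC documentation (`mc*/dimm*/dimm_ce_count` and `dimm_label` under `/sys/devices/system/edac/mc`); whether those files are present depends on the kernel version and EDAC driver, so treat the paths as an assumption. `read_dimm_error_counts` is a hypothetical helper name.

```python
from pathlib import Path


def read_dimm_error_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {"mcX/dimmY": (label, corrected_error_count)} for each DIMM found."""
    counts = {}
    for dimm_dir in sorted(Path(edac_root).glob("mc*/dimm*")):
        ce_file = dimm_dir / "dimm_ce_count"
        if not ce_file.is_file():
            # This kernel/driver does not expose a per-DIMM counter here.
            continue
        label_file = dimm_dir / "dimm_label"
        label = label_file.read_text().strip() if label_file.is_file() else ""
        name = f"{dimm_dir.parent.name}/{dimm_dir.name}"
        counts[name] = (label, int(ce_file.read_text().strip()))
    return counts


if __name__ == "__main__":
    for name, (label, ce_count) in read_dimm_error_counts().items():
        print(f"{name} {label}: {ce_count} corrected errors")
```

Running it before and after each swap shows whether the counter for a given slot is still increasing.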
Mytherium Posted October 2, 2019
Hey there, I've been running into a similar situation where Unraid crashes unexpectedly and the only way to recover is by power-cycling my machine. I've tried both ECC and non-ECC DIMMs, but it just keeps crashing. The System Event Log of my X9DR3-LN4F+ keeps reporting the same OS Stop Shutdown event on... FAN6, and with very strange timestamps. Only FAN2-4 are populated in the system, and temperatures are under control because the system idles while I try to troubleshoot these crashes (which tend to happen every 3-8 days or so). The SEL records lead nowhere. I disabled C1E Support in the BIOS as a troubleshooting step, but that did nothing. Changing memory speeds did not help either. I also suspect the MDS vulnerability may be a factor here, because /sys/devices/system/cpu/vulnerabilities/mds reads: Mitigation: Clear CPU buffers; SMT vulnerable.
I currently have the syslog being saved to my USB drive, and I'll try to get an external logging server up and running when I can. Any help is greatly appreciated. Attached are screenshots of the terminal before rebooting the machine on two recent crashes, logs of those crashes, and a screenshot of the IPMI SEL log.
My system is using:
Unraid v6.7.2
2x E5-2670 v1 at 2.60GHz
128GB of Samsung ECC memory at 1600MHz (M393B2G70BH0-YK0)
Supermicro X9DR3-LN4F+ rev1.10, BIOS v3.3, IPMI firmware 3.48
syslog_01_01_2019.txt syslog_28_09_2019.txt
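For reference, the string quoted from /sys/devices/system/cpu/vulnerabilities/mds can be split into its parts as below. This is a small sketch with a made-up helper name, `parse_vulnerability_status`, and it assumes the "state: detail; SMT ..." layout of the line quoted in the post. A "Mitigation" state means the kernel's buffer-clearing mitigation is active; "SMT vulnerable" reports that Hyper-Threading is still enabled.

```python
def parse_vulnerability_status(text):
    """Split a vulnerabilities/* status line into state, detail and SMT note."""
    parts = [part.strip() for part in text.strip().split(";")]
    head = parts[0]
    state, _, detail = head.partition(":")
    # Any trailing field that starts with "SMT" describes the Hyper-Threading state.
    smt = next((part for part in parts[1:] if part.startswith("SMT")), "")
    return {"state": state.strip(), "detail": detail.strip(), "smt": smt}


if __name__ == "__main__":
    with open("/sys/devices/system/cpu/vulnerabilities/mds") as fh:
        print(parse_vulnerability_status(fh.read()))
```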