Unraid freezing



I'm not exactly sure what's happening, but my server locks up after a couple of days (this has happened a couple of times so far). The only thing I can do is reboot. Attached are my syslog from last night, taken after I got the server back up and running, and the diagnostics.

 

I also have the following error in my log this morning.

 

Jun 29 09:17:30 Tower kernel: WARNING: CPU: 3 PID: 5988 at net/netfilter/nf_conntrack_core.c:945 __nf_conntrack_confirm+0x97/0x686
Jun 29 09:17:30 Tower kernel: Modules linked in: xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net vhost tap veth macvlan xt_nat iptable_filter iptable_nat ipt_MASQUERADE nf_nat_ipv4 nf_nat ip_tables xfs nfsd lockd grace sunrpc md_mod tun bonding bnx2x mdio igb i2c_algo_bit sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_cstate ipmi_ssif intel_uncore mpt3sas intel_rapl_perf wmi i2c_i801 ipmi_si i2c_core pcc_cpufreq raid_class button scsi_transport_sas [last unloaded: mdio]
Jun 29 09:17:30 Tower kernel: CPU: 3 PID: 5988 Comm: kworker/3:2 Not tainted 4.19.55-Unraid #1
Jun 29 09:17:30 Tower kernel: Hardware name: IBM System x3630 M4 -[7158AC1]-/00KF924, BIOS -[BEE166CUS-3.10]- 08/29/2018
Jun 29 09:17:30 Tower kernel: Workqueue: events macvlan_process_broadcast [macvlan]
Jun 29 09:17:30 Tower kernel: RIP: 0010:__nf_conntrack_confirm+0x97/0x686
Jun 29 09:17:30 Tower kernel: Code: c1 ed 20 89 2c 24 e8 67 fb ff ff 8b 54 24 04 89 ef 89 c6 41 89 c4 e8 29 f9 ff ff 84 c0 75 b9 49 8b 86 80 00 00 00 a8 08 74 25 <0f> 0b 44 89 e6 89 ef 45 31 ff e8 b2 f1 ff ff be 00 02 00 00 48 c7
Jun 29 09:17:30 Tower kernel: RSP: 0018:ffff8886678c3d98 EFLAGS: 00010202
Jun 29 09:17:30 Tower kernel: RAX: 0000000000000188 RBX: ffff8886422ece00 RCX: 000000008e171493
Jun 29 09:17:30 Tower kernel: RDX: 0000000000000001 RSI: 000000000000021c RDI: ffffffff81e090e8
Jun 29 09:17:30 Tower kernel: RBP: 00000000000062ba R08: ffff88843db45d30 R09: 00000000817a4f1c
Jun 29 09:17:30 Tower kernel: R10: 0000000000000000 R11: ffff88863e371401 R12: 0000000000005e1c
Jun 29 09:17:30 Tower kernel: R13: ffffffff81e8e080 R14: ffff88843db45cc0 R15: ffff88843db45d18
Jun 29 09:17:30 Tower kernel: FS:  0000000000000000(0000) GS:ffff8886678c0000(0000) knlGS:0000000000000000
Jun 29 09:17:30 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 29 09:17:30 Tower kernel: CR2: 00001498c65fd320 CR3: 0000000001e0a002 CR4: 00000000000606e0
Jun 29 09:17:30 Tower kernel: Call Trace:
Jun 29 09:17:30 Tower kernel: <IRQ>
Jun 29 09:17:30 Tower kernel: ipv4_confirm+0xaf/0xb7
Jun 29 09:17:30 Tower kernel: nf_hook_slow+0x37/0x96
Jun 29 09:17:30 Tower kernel: ip_local_deliver+0xa9/0xd7
Jun 29 09:17:30 Tower kernel: ? ip_sublist_rcv_finish+0x53/0x53
Jun 29 09:17:30 Tower kernel: ip_rcv+0xa0/0xbe
Jun 29 09:17:30 Tower kernel: ? ip_rcv_finish_core.isra.0+0x2e2/0x2e2
Jun 29 09:17:30 Tower kernel: __netif_receive_skb_one_core+0x4d/0x69
Jun 29 09:17:30 Tower kernel: process_backlog+0x7c/0x116
Jun 29 09:17:30 Tower kernel: net_rx_action+0x10b/0x274
Jun 29 09:17:30 Tower kernel: __do_softirq+0xce/0x1e2
Jun 29 09:17:30 Tower kernel: do_softirq_own_stack+0x2a/0x40
Jun 29 09:17:30 Tower kernel: </IRQ>
Jun 29 09:17:30 Tower kernel: do_softirq+0x4d/0x59
Jun 29 09:17:30 Tower kernel: netif_rx_ni+0x1c/0x22
Jun 29 09:17:30 Tower kernel: macvlan_broadcast+0x10f/0x153 [macvlan]
Jun 29 09:17:30 Tower kernel: ? __switch_to_asm+0x41/0x70
Jun 29 09:17:30 Tower kernel: macvlan_process_broadcast+0xeb/0x128 [macvlan]
Jun 29 09:17:30 Tower kernel: process_one_work+0x16e/0x24f
Jun 29 09:17:30 Tower kernel: ? pwq_unbound_release_workfn+0xb7/0xb7
Jun 29 09:17:30 Tower kernel: worker_thread+0x1dc/0x2ac
Jun 29 09:17:30 Tower kernel: kthread+0x10b/0x113
Jun 29 09:17:30 Tower kernel: ? kthread_park+0x71/0x71
Jun 29 09:17:30 Tower kernel: ret_from_fork+0x35/0x40
Jun 29 09:17:30 Tower kernel: ---[ end trace 4f9949cbfd7b7ba0 ]---

syslog tower-diagnostics-20190629-1421.zip

Link to comment

The only Docker container I have with a fixed IP is Pi-hole.

 

The log recorded the following this morning:

 

Jul  1 08:21:06 Tower kernel: mdcmd (102): set md_write_method 0
Jul  1 08:21:06 Tower kernel: 
Jul  1 08:28:05 Tower kernel: BUG: Bad page map in process C1 CompilerThre  pte:ffff888c446f7d08 pmd:5ab41b067
Jul  1 08:28:05 Tower kernel: addr:000000004c68d19d vm_flags:00100073 anon_vma:000000000e448825 mapping:          (null) index:1546759d4
Jul  1 08:28:05 Tower kernel: file:          (null) fault:          (null) mmap:          (null) readpage:          (null)
Jul  1 08:28:05 Tower kernel: CPU: 15 PID: 29094 Comm: C1 CompilerThre Tainted: G    B   W         4.19.55-Unraid #1
Jul  1 08:28:05 Tower kernel: Hardware name: IBM System x3630 M4 -[7158AC1]-/00KF924, BIOS -[BEE166CUS-3.10]- 08/29/2018
Jul  1 08:28:05 Tower kernel: Call Trace:
Jul  1 08:28:05 Tower kernel: dump_stack+0x5d/0x79
Jul  1 08:28:05 Tower kernel: print_bad_pte+0x216/0x233
Jul  1 08:28:05 Tower kernel: _vm_normal_page+0x50/0xb7
Jul  1 08:28:05 Tower kernel: change_protection+0x54f/0x85c
Jul  1 08:28:05 Tower kernel: change_prot_numa+0x13/0x22
Jul  1 08:28:05 Tower kernel: task_numa_work+0x208/0x2ac
Jul  1 08:28:05 Tower kernel: task_work_run+0x77/0x8b
Jul  1 08:28:05 Tower kernel: exit_to_usermode_loop+0x46/0x9b
Jul  1 08:28:05 Tower kernel: prepare_exit_to_usermode+0x66/0x79
Jul  1 08:28:05 Tower kernel: retint_user+0x8/0x8
Jul  1 08:28:05 Tower kernel: RIP: 0033:0x154727f7a99d
Jul  1 08:28:05 Tower kernel: Code: 4c 8b 21 48 89 55 90 48 89 bd 60 ff ff ff 0f 83 e0 00 00 00 66 0f 1f 84 00 00 00 00 00 0f b6 02 48 8d 0d 66 e6 e6 00 89 45 c4 <8b> 04 81 3d ee 00 00 00 89 45 c0 0f 86 12 03 00 00 48 83 ea 01 48
Jul  1 08:28:05 Tower kernel: RSP: 002b:00001546f84cd980 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
Jul  1 08:28:05 Tower kernel: RAX: 00000000000000c6 RBX: 00001546e04acfc0 RCX: 0000154728de9000
Jul  1 08:28:05 Tower kernel: RDX: 00001546e04acf24 RSI: 0000000000000003 RDI: 00001546f84cd9a0
Jul  1 08:28:05 Tower kernel: RBP: 00001546f84cda30 R08: 0000000000000000 R09: 00000000ffffffff
Jul  1 08:28:05 Tower kernel: R10: 00001546e04acf60 R11: 0000000000000007 R12: 00001546e04ad090
Jul  1 08:28:05 Tower kernel: R13: 00001546e04ac780 R14: 000015472889dba8 R15: 0000000000000070
Jul  1 08:52:29 Tower rpcbind[25832]: connect from 192.168.1.233 to null()
Jul  1 08:52:29 Tower rpcbind[25833]: connect from 192.168.1.233 to getport/addr(mountd)
Jul  1 08:52:30 Tower rpcbind[25834]: connect from 192.168.1.233 to null()
Jul  1 08:52:30 Tower rpcbind[25835]: connect from 192.168.1.233 to getport/addr(mountd)
Jul  1 08:52:30 Tower rpc.mountd[8977]: authenticated mount request from 192.168.1.233:60026 for /mnt/user/video (/mnt/user/video)
Jul  1 08:52:30 Tower rpcbind[25836]: connect from 192.168.1.233 to null()
Jul  1 08:52:30 Tower rpcbind[25837]: connect from 192.168.1.233 to getport/addr(nfs)
Jul  1 09:04:24 Tower kernel: BUG: Bad page map in process VM Periodic Tas  pte:ffff888c446f7d08 pmd:5ab41b067
Jul  1 09:04:24 Tower kernel: addr:000000004c68d19d vm_flags:00100073 anon_vma:000000000e448825 mapping:          (null) index:1546759d4
Jul  1 09:04:24 Tower kernel: file:          (null) fault:          (null) mmap:          (null) readpage:          (null)
Jul  1 09:04:24 Tower kernel: CPU: 2 PID: 29097 Comm: VM Periodic Tas Tainted: G    B   W         4.19.55-Unraid #1
Jul  1 09:04:24 Tower kernel: Hardware name: IBM System x3630 M4 -[7158AC1]-/00KF924, BIOS -[BEE166CUS-3.10]- 08/29/2018
Jul  1 09:04:24 Tower kernel: Call Trace:
Jul  1 09:04:24 Tower kernel: dump_stack+0x5d/0x79
Jul  1 09:04:24 Tower kernel: print_bad_pte+0x216/0x233
Jul  1 09:04:24 Tower kernel: _vm_normal_page+0x50/0xb7
Jul  1 09:04:24 Tower kernel: change_protection+0x54f/0x85c
Jul  1 09:04:24 Tower kernel: change_prot_numa+0x13/0x22
Jul  1 09:04:24 Tower kernel: task_numa_work+0x208/0x2ac
Jul  1 09:04:24 Tower kernel: task_work_run+0x77/0x8b
Jul  1 09:04:24 Tower kernel: exit_to_usermode_loop+0x46/0x9b
Jul  1 09:04:24 Tower kernel: do_syscall_64+0xdf/0xf2
Jul  1 09:04:24 Tower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul  1 09:04:24 Tower kernel: RIP: 0033:0x154728e481aa
Jul  1 09:04:24 Tower kernel: Code: 00 00 b8 ca 00 00 00 0f 05 5a 5e c3 0f 1f 40 00 56 52 c7 07 00 00 00 00 81 f6 81 00 00 00 ba 01 00 00 00 b8 ca 00 00 00 0f 05 <5a> 5e c3 0f 1f 00 41 54 41 55 49 89 fc 49 89 f5 48 83 ec 18 48 89
Jul  1 09:04:24 Tower kernel: RSP: 002b:00001546f81cbc00 EFLAGS: 00000206 ORIG_RAX: 00000000000000ca
Jul  1 09:04:24 Tower kernel: RAX: 0000000000000000 RBX: 00000000fffffffd RCX: 0000154728e481aa
Jul  1 09:04:24 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000081 RDI: 000015472022c728
Jul  1 09:04:24 Tower kernel: RBP: 00001546f81cbce0 R08: 0000000000000000 R09: 000015472022c750
Jul  1 09:04:24 Tower kernel: R10: 00001546f81cbc00 R11: 0000000000000206 R12: 000015472022c700
Jul  1 09:04:24 Tower kernel: R13: 000015472022c728 R14: 00001546f81cbca0 R15: 000015472022c750
Jul  1 09:38:35 Tower kernel: BUG: Bad page map in process G1 Young RemSet  pte:ffff888c446f7d08 pmd:5ab41b067
Jul  1 09:38:35 Tower kernel: addr:000000004c68d19d vm_flags:00100073 anon_vma:000000000e448825 mapping:          (null) index:1546759d4
Jul  1 09:38:35 Tower kernel: file:          (null) fault:          (null) mmap:          (null) readpage:          (null)
Jul  1 09:38:35 Tower kernel: CPU: 23 PID: 29089 Comm: G1 Young RemSet Tainted: G    B   W         4.19.55-Unraid #1
Jul  1 09:38:35 Tower kernel: Hardware name: IBM System x3630 M4 -[7158AC1]-/00KF924, BIOS -[BEE166CUS-3.10]- 08/29/2018
Jul  1 09:38:35 Tower kernel: Call Trace:
Jul  1 09:38:35 Tower kernel: dump_stack+0x5d/0x79
Jul  1 09:38:35 Tower kernel: print_bad_pte+0x216/0x233
Jul  1 09:38:35 Tower kernel: _vm_normal_page+0x50/0xb7
Jul  1 09:38:35 Tower kernel: change_protection+0x54f/0x85c
Jul  1 09:38:35 Tower kernel: ? _copy_to_user+0x22/0x28
Jul  1 09:38:35 Tower kernel: change_prot_numa+0x13/0x22
Jul  1 09:38:35 Tower kernel: task_numa_work+0x208/0x2ac
Jul  1 09:38:35 Tower kernel: task_work_run+0x77/0x8b
Jul  1 09:38:35 Tower kernel: exit_to_usermode_loop+0x46/0x9b
Jul  1 09:38:35 Tower kernel: prepare_exit_to_usermode+0x66/0x79
Jul  1 09:38:35 Tower kernel: retint_user+0x8/0x8
Jul  1 09:38:35 Tower kernel: RIP: 0033:0x15472859565d
Jul  1 09:38:35 Tower kernel: Code: e5 41 56 41 55 41 54 53 48 89 fb 66 48 8d 3d 7a 29 81 00 66 66 48 e8 62 b5 6f ff ba 01 00 00 00 4c 8b 28 31 c0 f0 48 0f b1 13 <48> 85 c0 75 23 e9 e9 00 00 00 66 0f 1f 84 00 00 00 00 00 48 89 c1
Jul  1 09:38:35 Tower kernel: RSP: 002b:00001546f93f1ce0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Jul  1 09:38:35 Tower kernel: RAX: 0000000000000000 RBX: 00001547201146a8 RCX: 0000000000000000
Jul  1 09:38:35 Tower kernel: RDX: 0000000000000001 RSI: 0000154720114510 RDI: 0000154728da7fc0
Jul  1 09:38:35 Tower kernel: RBP: 00001546f93f1d00 R08: 0000000000000000 R09: 000015472014cb50
Jul  1 09:38:35 Tower kernel: R10: 00001546f93f1c00 R11: 0000000000000206 R12: 00001546f93f1db0
Jul  1 09:38:35 Tower kernel: R13: 000015472014b800 R14: 0000154720114720 R15: 0000154728cf7328
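For anyone wanting an overview of how often events like the ones above appear in a full syslog, here is a minimal Python sketch. It assumes the log lines look like the excerpts in this thread; the function name and the regular expressions are made up for illustration and are not part of any Unraid tool.

```python
import re
import sys
from collections import Counter

# "kernel: WARNING: CPU: 3 PID: 5988 at <file:line> <function>+0x..."
WARNING_RE = re.compile(
    r"kernel: WARNING: CPU: \d+ PID: \d+ at (?P<where>\S+) (?P<func>[^+\s]+)\+"
)
# "kernel: BUG: Bad page map in process <name> ..."
BUG_RE = re.compile(r"kernel: BUG: (?P<what>.+?) in process")


def summarize_kernel_events(lines):
    """Count kernel WARNING and BUG headers, grouped by where they occurred."""
    events = Counter()
    for line in lines:
        m = WARNING_RE.search(line)
        if m:
            events[f"WARNING {m['where']} ({m['func']})"] += 1
            continue
        m = BUG_RE.search(line)
        if m:
            events[f"BUG {m['what']}"] += 1
    return events


# Only runs when a syslog path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        for event, count in summarize_kernel_events(fh).most_common():
            print(f"{count:5d}  {event}")
```

Run against a saved syslog, this prints one line per distinct warning or bug site, which makes it easier to see whether the same trace keeps recurring before each lockup.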

Link to comment
On 6/29/2019 at 10:22 AM, Patb said:

Attached is a copy of my syslog from last night after I got it back up and running as well as the diagnostics.

Your best bet is to set up the syslog server (Settings - Syslog Server) to mirror to the flash drive, and then wait for another crash.  After recovering, then upload that syslog.

Link to comment

The server has a memory problem; most of the errors are being corrected:

 

Jun  5 05:59:39 Tower kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR

 

But if there is an uncorrectable error, the server will halt to prevent data corruption. If that happens, check the board's system event log; there might be more info there.
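To see how often those corrected errors show up and on which memory controller, something like the following Python sketch could be run against a saved syslog. It assumes the EDAC lines have the same shape as the one quoted above; the function name and regular expression are illustrative only.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
# "Jun  5 05:59:39 Tower kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR"
EDAC_LINE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+\S+\s+\S+\s+kernel:\s+"
    r"EDAC\s+\S+\s+(?P<mc>MC\d+):\s+HANDLING MCE MEMORY ERROR"
)


def count_edac_events(lines):
    """Return a Counter keyed by (date, memory controller)."""
    counts = Counter()
    for line in lines:
        m = EDAC_LINE.match(line)
        if m:
            counts[(f"{m['month']} {m['day']}", m["mc"])] += 1
    return counts


# Only runs when a syslog path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        for (day, mc), n in count_edac_events(fh).items():
            print(f"{day}  {mc}  {n} event(s)")
```

If the events cluster on a single controller (for example MC1), that narrows down which bank of DIMMs to look at first.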

Link to comment

I've tested the memory a couple of times and it always passes but will try again... not sure what's happening.

 

I woke up this morning to find the server had crashed again, and the cache drive (BTRFS) is corrupt.

 

I swapped all the memory in that server and have reformatted my cache to XFS and am now in the process of restoring my appdata from backup.

 

This is not a good day :(

 

This is the last log entry before I noticed the crash in the morning (full log attached):

 

Jul  6 21:15:30 Tower kernel: WARNING: CPU: 4 PID: 0 at net/netfilter/nf_nat_core.c:420 nf_nat_setup_info+0x6b/0x5fb [nf_nat]
Jul  6 21:15:30 Tower kernel: Modules linked in: macvlan xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle xt_nat ip6table_filter ip6_tables vhost_net tun vhost tap veth ipt_MASQUERADE iptable_nat nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs nfsd lockd grace sunrpc md_mod bonding bnx2x mdio igb i2c_algo_bit sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd ipmi_ssif glue_helper mpt3sas intel_cstate wmi i2c_i801 intel_uncore i2c_core intel_rapl_perf raid_class pcc_cpufreq ipmi_si scsi_transport_sas button [last unloaded: mdio]
Jul  6 21:15:30 Tower kernel: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G        W         4.19.56-Unraid #1
Jul  6 21:15:30 Tower kernel: Hardware name: IBM System x3630 M4 -[7158AC1]-/00KF924, BIOS -[BEE166CUS-3.10]- 08/29/2018
Jul  6 21:15:30 Tower kernel: RIP: 0010:nf_nat_setup_info+0x6b/0x5fb [nf_nat]
Jul  6 21:15:30 Tower kernel: Code: 48 89 fb 48 8b 87 80 00 00 00 49 89 f7 41 89 d6 76 04 0f 0b eb 0b 85 d2 75 07 25 80 00 00 00 eb 05 25 00 01 00 00 85 c0 74 07 <0f> 0b e9 ac 04 00 00 48 8b 83 90 00 00 00 4c 8d 64 24 30 48 8d 73
Jul  6 21:15:30 Tower kernel: RSP: 0018:ffff8886679037b0 EFLAGS: 00010202
Jul  6 21:15:30 Tower kernel: RAX: 0000000000000080 RBX: ffff88810e868dc0 RCX: 0000000000000000
Jul  6 21:15:30 Tower kernel: RDX: 0000000000000000 RSI: ffff88866790389c RDI: ffff88810e868dc0
Jul  6 21:15:30 Tower kernel: RBP: ffff888667903888 R08: ffff88810e868dc0 R09: 0000000000000000
Jul  6 21:15:30 Tower kernel: R10: 0000000000000158 R11: ffffffff81e8e001 R12: ffff8883f98ec800
Jul  6 21:15:30 Tower kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff88866790389c
Jul  6 21:15:30 Tower kernel: FS:  0000000000000000(0000) GS:ffff888667900000(0000) knlGS:0000000000000000
Jul  6 21:15:30 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul  6 21:15:30 Tower kernel: CR2: 00001506159a4480 CR3: 0000000001e0a003 CR4: 00000000000606e0
Jul  6 21:15:30 Tower kernel: Call Trace:
Jul  6 21:15:30 Tower kernel: <IRQ>
Jul  6 21:15:30 Tower kernel: ? ipt_do_table+0x58e/0x5db [ip_tables]
Jul  6 21:15:30 Tower kernel: nf_nat_alloc_null_binding+0x6f/0x86 [nf_nat]
Jul  6 21:15:30 Tower kernel: nf_nat_inet_fn+0xa0/0x192 [nf_nat]
Jul  6 21:15:30 Tower kernel: nf_hook_slow+0x37/0x96
Jul  6 21:15:30 Tower kernel: ip_local_deliver+0xa9/0xd7
Jul  6 21:15:30 Tower kernel: ? ip_sublist_rcv_finish+0x53/0x53
Jul  6 21:15:30 Tower kernel: ip_sabotage_in+0x38/0x3e
Jul  6 21:15:30 Tower kernel: nf_hook_slow+0x37/0x96
Jul  6 21:15:30 Tower kernel: ip_rcv+0x8e/0xbe
Jul  6 21:15:30 Tower kernel: ? ip_rcv_finish_core.isra.0+0x2e2/0x2e2
Jul  6 21:15:30 Tower kernel: __netif_receive_skb_one_core+0x4d/0x69
Jul  6 21:15:30 Tower kernel: netif_receive_skb_internal+0x9f/0xba
Jul  6 21:15:30 Tower kernel: br_pass_frame_up+0x123/0x145
Jul  6 21:15:30 Tower kernel: ? br_port_flags_change+0x29/0x29
Jul  6 21:15:30 Tower kernel: br_handle_frame_finish+0x330/0x375
Jul  6 21:15:30 Tower kernel: ? ipt_do_table+0x58e/0x5db [ip_tables]
Jul  6 21:15:30 Tower kernel: ? br_pass_frame_up+0x145/0x145
Jul  6 21:15:30 Tower kernel: br_nf_hook_thresh+0xa3/0xc3
Jul  6 21:15:30 Tower kernel: ? br_pass_frame_up+0x145/0x145
Jul  6 21:15:30 Tower kernel: br_nf_pre_routing_finish+0x239/0x260
Jul  6 21:15:30 Tower kernel: ? br_pass_frame_up+0x145/0x145
Jul  6 21:15:30 Tower kernel: ? nf_nat_ipv4_in+0x1d/0x64 [nf_nat_ipv4]
Jul  6 21:15:30 Tower kernel: br_nf_pre_routing+0x2fc/0x321
Jul  6 21:15:30 Tower kernel: ? br_nf_forward_ip+0x352/0x352
Jul  6 21:15:30 Tower kernel: nf_hook_slow+0x37/0x96
Jul  6 21:15:30 Tower kernel: br_handle_frame+0x290/0x2d3
Jul  6 21:15:30 Tower kernel: ? br_pass_frame_up+0x145/0x145
Jul  6 21:15:30 Tower kernel: ? br_handle_local_finish+0xe/0xe
Jul  6 21:15:30 Tower kernel: __netif_receive_skb_core+0x466/0x798
Jul  6 21:15:30 Tower kernel: ? udp_gro_receive+0x4c/0x134
Jul  6 21:15:30 Tower kernel: __netif_receive_skb_one_core+0x31/0x69
Jul  6 21:15:30 Tower kernel: netif_receive_skb_internal+0x9f/0xba
Jul  6 21:15:30 Tower kernel: napi_gro_receive+0x42/0x76
Jul  6 21:15:30 Tower kernel: bnx2x_poll+0x101f/0x1527 [bnx2x]
Jul  6 21:15:30 Tower kernel: ? recalibrate_cpu_khz+0x1/0x1
Jul  6 21:15:30 Tower kernel: ? ktime_get+0x3a/0x8d
Jul  6 21:15:30 Tower kernel: ? tsc_refine_calibration_work+0x2d/0x225
Jul  6 21:15:30 Tower kernel: ? irq_entries_start+0x29a/0x660
Jul  6 21:15:30 Tower kernel: net_rx_action+0x10b/0x274
Jul  6 21:15:30 Tower kernel: __do_softirq+0xce/0x1e2
Jul  6 21:15:30 Tower kernel: irq_exit+0x5e/0x9d
Jul  6 21:15:30 Tower kernel: do_IRQ+0xa9/0xc7
Jul  6 21:15:30 Tower kernel: common_interrupt+0xf/0xf
Jul  6 21:15:30 Tower kernel: </IRQ>
Jul  6 21:15:30 Tower kernel: RIP: 0010:cpuidle_enter_state+0xe5/0x141
Jul  6 21:15:30 Tower kernel: Code: 97 7e ba ff 45 84 ff 74 1d 9c 58 66 66 90 66 90 0f ba e0 09 73 09 0f 0b fa 66 66 90 66 66 90 31 ff e8 ae 0c be ff fb 66 66 90 <66> 66 90 48 2b 1c 24 b8 ff ff ff 7f 48 b9 ff ff ff ff f3 01 00 00
Jul  6 21:15:30 Tower kernel: RSP: 0018:ffffc900062f7ea0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdc
Jul  6 21:15:30 Tower kernel: RAX: ffff888667920b00 RBX: 00017ffcf6cd2949 RCX: 000000000000001f
Jul  6 21:15:30 Tower kernel: RDX: 00017ffcf6cd2949 RSI: 00000000435e532a RDI: 0000000000000000
Jul  6 21:15:30 Tower kernel: RBP: ffff88866792b600 R08: 0000000000000002 R09: 00000000000203c0
Jul  6 21:15:30 Tower kernel: R10: 0000000185276e44 R11: 00048c26e6f75a54 R12: 0000000000000005
Jul  6 21:15:30 Tower kernel: R13: 0000000000000005 R14: ffffffff81e5a078 R15: 0000000000000000
Jul  6 21:15:30 Tower kernel: do_idle+0x192/0x20e
Jul  6 21:15:30 Tower kernel: cpu_startup_entry+0x6a/0x6c
Jul  6 21:15:30 Tower kernel: start_secondary+0x197/0x1b2
Jul  6 21:15:30 Tower kernel: secondary_startup_64+0xa4/0xb0
Jul  6 21:15:30 Tower kernel: ---[ end trace c82e319e0a257db0 ]---

 

syslog

Link to comment
15 hours ago, Patb said:

I've tested the memory a couple of times and it always passes but will try again... not sure what's happening.

A regular memtest won't show ECC errors since they are being corrected; PassMark's MemTest86 might. Otherwise, just remove or swap one DIMM at a time until the MCE errors are gone.
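While swapping DIMMs, the kernel's EDAC counters can show whether corrected errors are still accumulating. Below is a rough Python sketch that reads the per-controller `ce_count` (corrected) and `ue_count` (uncorrected) files from sysfs. It assumes an EDAC driver is loaded and exposes `/sys/devices/system/edac/mc/mc*/`; the function name is hypothetical, and the counters reset on reboot.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": int|None, "ue_count": int|None}}."""
    results = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        counts = {}
        for name in ("ce_count", "ue_count"):
            try:
                counts[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                # File missing or unreadable on this kernel/driver.
                counts[name] = None
        results[mc.name] = counts
    return results


if __name__ == "__main__":
    for controller, counts in read_edac_counts().items():
        print(controller, counts)
```

A `ce_count` that keeps climbing after a DIMM swap suggests the faulty module is still installed; one that stays at zero over several days suggests the swap removed it.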

Link to comment
  • 2 months later...

Hey there,

I've been running into a similar situation where Unraid crashes unexpectedly and the only way to recover is by power-cycling my machine.

I've tried using ECC and non-ECC DIMMs, but it just keeps crashing. The System Event Log of my X9DR3-LN4F+ keeps reporting the same OS Stop Shutdown event on... FAN6 and with very strange timestamps. Only FAN2-4 are populated in the system and temperatures are under control because the system idles while I try to troubleshoot these crashes (which tend to be every 3-8 days or so).

 

The SEL records lead nowhere. I've disabled C1E Support in the BIOS as a troubleshooting step, but that did nothing. Changing memory speeds hasn't helped either. I also suspect the MDS vulnerability may be a factor here, because /sys/devices/system/cpu/vulnerabilities/mds reads "Mitigation: Clear CPU buffers; SMT vulnerable".
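For reference, the same directory holds one status file per known CPU vulnerability, so the full picture can be dumped in one go. This is a small Python sketch with made-up function names; it only reports what the kernel says about mitigations and does not by itself show whether any of them is related to the crashes.

```python
from pathlib import Path


def read_cpu_vulnerabilities(root="/sys/devices/system/cpu/vulnerabilities"):
    """Return {vulnerability name: status string} for every file under root."""
    report = {}
    for entry in sorted(Path(root).iterdir()):
        if entry.is_file():
            report[entry.name] = entry.read_text().strip()
    return report


def smt_flagged(report):
    """Names of entries whose status mentions 'SMT vulnerable'."""
    return sorted(name for name, status in report.items()
                  if "SMT vulnerable" in status)


if __name__ == "__main__":
    root = Path("/sys/devices/system/cpu/vulnerabilities")
    if root.is_dir():
        for name, status in read_cpu_vulnerabilities(root).items():
            print(f"{name:20s} {status}")
```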

 

I currently have the syslog being saved to my USB drive, and I'll try to get an external logging server up and running when I can. Any help is greatly appreciated.

 

Attached are screenshots of the terminal before rebooting the machine on two recent crashes, logs of those most recent crashes and a screenshot from the IPMI SEL log.

 

My system is using Unraid v6.7.2

2x E5-2670 v1 at 2.60GHz

128GB of Samsung ECC memory at 1600MHz M393B2G70BH0-YK0

Supermicro X9DR3-LN4F+ rev1.10 using Bios v3.3, IPMI firmware 3.48

thumbnail (2).jpg

thumbnail (1).jpg

thumbnail.jpg

syslog_01_01_2019.txt syslog_28_09_2019.txt

Link to comment
