coldtech Posted May 21, 2022

Hi,

For some months now I have had an issue where my Unraid server crashes overnight every few weeks. Sometimes only the webUI is inaccessible, but most of the time the server also stops responding to ping. That is bad, because almost all of my home automation and data gathering runs on that machine in several Docker containers. I'm on 6.10 now, but the issue started 3-4 months ago on 6.9. In another thread with a similar problem, someone suggested switching Docker to ipvlan interfaces, but I'm not sure the issue really is the same.

These are the last log messages that arrived at my log server (newest first). Maybe someone with a bit of experience can tell me where the actual issue might be or which component may be causing it:

2022-05-21 04:05:23 Information ColdStation kern kernel igb 0000:02:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
2022-05-21 04:05:23 Error ColdStation kern kernel igb 0000:02:00.0 eth0: Reset adapter
2022-05-21 04:05:23 Warning ColdStation kern kernel ---[ end trace 712a220cf666edc4 ]---
2022-05-21 04:05:23 Warning ColdStation kern kernel </TASK>
2022-05-21 04:05:23 Warning ColdStation kern kernel secondary_startup_64_no_verify+0xb0/0xbb
2022-05-21 04:05:23 Warning ColdStation kern kernel start_kernel+0x656/0x67b
2022-05-21 04:05:23 Warning ColdStation kern kernel cpu_startup_entry+0x1d/0x1f
2022-05-21 04:05:23 Warning ColdStation kern kernel do_idle+0x1b7/0x225
2022-05-21 04:05:23 Warning ColdStation kern kernel cpuidle_enter+0x2a/0x36
2022-05-21 04:05:23 Warning ColdStation kern kernel cpuidle_enter_state+0x117/0x1db
2022-05-21 04:05:23 Warning ColdStation kern kernel R13: 0000000000000004 R14: 00017d8a36ebe486 R15: 0000000000000000
2022-05-21 04:05:23 Warning ColdStation kern kernel R10: 0000000000000020 R11: 000000000000024d R12: ffffffff82311ca0
2022-05-21 04:05:23 Warning ColdStation kern kernel RBP: ffffe8ffffc20300 R08: 00000000ffffffff R09: 071c71c71c71c71c
2022-05-21 04:05:23 Warning ColdStation kern kernel RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
2022-05-21 04:05:23 Warning ColdStation kern kernel RAX: ffff88845e42bac0 RBX: 0000000000000004 RCX: 000000000000001f
2022-05-21 04:05:23 Warning ColdStation kern kernel RSP: 0018:ffffffff82203e58 EFLAGS: 00000246
2022-05-21 04:05:23 Warning ColdStation kern kernel Code: b6 b6 1c 00 85 db 48 89 e8 79 03 48 63 c3 5b 5d 41 5c c3 9c 58 0f 1f 44 00 00 c3 fa 66 0f 1f 44 00 00 c3 fb 66 0f 1f 44 00 00 <c3> 0f 1f 44 00 00 55 49 89 d3 48 81 c7 b0 00 00 00 48 83 c6 70 53
2022-05-21 04:05:23 Warning ColdStation kern kernel RIP: 0010:arch_local_irq_enable+0x7/0x8
2022-05-21 04:05:23 Warning ColdStation kern kernel asm_sysvec_apic_timer_interrupt+0x12/0x20
2022-05-21 04:05:23 Warning ColdStation kern kernel <TASK>
2022-05-21 04:05:23 Warning ColdStation kern kernel </IRQ>
2022-05-21 04:05:23 Warning ColdStation kern kernel sysvec_apic_timer_interrupt+0x66/0x7d
2022-05-21 04:05:23 Warning ColdStation kern kernel __irq_exit_rcu+0x4d/0x88
2022-05-21 04:05:23 Warning ColdStation kern kernel __do_softirq+0xef/0x218
2022-05-21 04:05:23 Warning ColdStation kern kernel run_timer_softirq+0x19/0x2d
2022-05-21 04:05:23 Warning ColdStation kern kernel ? recalibrate_cpu_khz+0x1/0x1
2022-05-21 04:05:23 Warning ColdStation kern kernel ? enqueue_hrtimer+0x62/0x69
2022-05-21 04:05:23 Warning ColdStation kern kernel __run_timers+0x146/0x184
2022-05-21 04:05:23 Warning ColdStation kern kernel call_timer_fn+0x59/0xde
2022-05-21 04:05:23 Warning ColdStation kern kernel ? psched_ppscfg_precompute+0x40/0x40
2022-05-21 04:05:23 Warning ColdStation kern kernel <IRQ>
2022-05-21 04:05:23 Warning ColdStation kern kernel Call Trace:
2022-05-21 04:05:23 Warning ColdStation kern kernel CR2: 000055d0b641e8f0 CR3: 000000000420a006 CR4: 00000000003726f0
2022-05-21 04:05:23 Warning ColdStation kern kernel CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-05-21 04:05:23 Warning ColdStation kern kernel FS: 0000000000000000(0000) GS:ffff88845e400000(0000) knlGS:0000000000000000
2022-05-21 04:05:23 Warning ColdStation kern kernel R13: 0000000118fca200 R14: ffffc90000003f18 R15: ffffffff816e01e3
2022-05-21 04:05:23 Warning ColdStation kern kernel R10: 00007fffffffffff R11: ffffffff82866357 R12: 0000000000000000
2022-05-21 04:05:23 Warning ColdStation kern kernel RBP: ffff8881022b4000 R08: ffffffff822b4e28 R09: 0000000000000000
2022-05-21 04:05:23 Warning ColdStation kern kernel RDX: 0000000000000003 RSI: ffffc90000003d50 RDI: ffff88845e41c510
2022-05-21 04:05:23 Warning ColdStation kern kernel RAX: 0000000000000000 RBX: ffff8881022b4480 RCX: 0000000000000027
2022-05-21 04:05:23 Warning ColdStation kern kernel RSP: 0018:ffffc90000003ec8 EFLAGS: 00010282
2022-05-21 04:05:23 Warning ColdStation kern kernel Code: 2b cb 00 00 75 36 48 89 ef c6 05 0c 2b cb 00 01 e8 cb 8e fb ff 44 89 e1 48 89 ee 48 c7 c7 00 04 15 82 48 89 c2 e8 d6 2f 12 00 <0f> 0b eb 0e 41 ff c4 48 05 40 01 00 00 e9 65 ff ff ff 48 8b 83 48
2022-05-21 04:05:23 Warning ColdStation kern kernel RIP: 0010:dev_watchdog+0x115/0x180
2022-05-21 04:05:23 Warning ColdStation kern kernel Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B360M Pro4, BIOS P3.20 09/13/2018
2022-05-21 04:05:23 Warning ColdStation kern kernel CPU: 0 PID: 0 Comm: swapper/0 Tainted: G D W 5.15.38-Unraid #1
2022-05-21 04:05:23 Warning ColdStation kern kernel Modules linked in: xt_nat xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod nct6775 hwmon_vid efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables e1000e igb i915 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iosf_mbi ttm drm_kms_helper crct10dif_pclmul crc32_pclmul i2c_i801 crc32c_intel ghash_clmulni_intel aesni_intel wmi_bmof crypto_simd cryptd rapl intel_cstate intel_uncore i2c_smbus drm nvme intel_gtt i2c_algo_bit nvme_core agpgart ahci i2c_core libahci syscopyarea sysfillrect sysimgblt intel_pch_thermal fb_sys_fops wmi video backlight acpi_pad button [last unloaded: e1000e]
2022-05-21 04:05:23 Warning ColdStation kern kernel WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x115/0x180
2022-05-21 04:05:23 Information ColdStation kern kernel NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out
2022-05-21 04:05:23 Warning ColdStation kern kernel ------------[ cut here ]------------
2022-05-21 04:05:12 Warning ColdStation kern kernel CR2: fffff8efd3e7c108 CR3: 000000010aab4006 CR4: 00000000003726f0
2022-05-21 04:05:12 Warning ColdStation kern kernel CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-05-21 04:05:12 Warning ColdStation kern kernel FS: 000014e80877b740(0000) GS:ffff88845e400000(0000) knlGS:0000000000000000
2022-05-21 04:05:12 Warning ColdStation kern kernel R13: 0000000000000000 R14: ffff888197967300 R15: 7c00003bbf4f9f04
2022-05-21 04:05:12 Warning ColdStation kern kernel R10: 0000000000000000 R11: 0000000000000000 R12: fffff8efd3e7c100
2022-05-21 04:05:12 Warning ColdStation kern kernel RBP: ffffea0004768328 R08: ffff888151bb14c0 R09: 0000000000000000
2022-05-21 04:05:12 Warning ColdStation kern kernel RDX: 0000003bbf4f9f04 RSI: ffff88811da0ce60 RDI: fffff8efd3e7c100
2022-05-21 04:05:12 Warning ColdStation kern kernel RAX: ffffea0000000000 RBX: fff00000000006c8 RCX: 0000000000000000
2022-05-21 04:05:12 Warning ColdStation kern kernel RSP: 0000:ffffc900058a7d68 EFLAGS: 00010246
2022-05-21 04:05:12 Warning ColdStation kern kernel Code: 55 f2 ff 5b 5d 41 5c 41 5d c3 8b 44 24 10 48 89 44 24 10 8b 44 24 08 48 89 44 24 08 e9 fe d4 f3 ff 89 d2 89 f6 e9 df d1 f3 ff <48> 8b 57 08 48 89 f8 f6 c2 01 74 04 48 8d 42 ff c3 e8 ea ff ff ff
2022-05-21 04:05:12 Warning ColdStation kern kernel RIP: 0010:_compound_head+0x0/0x11
2022-05-21 04:05:12 Warning ColdStation kern kernel ---[ end trace 712a220cf666edc3 ]---
2022-05-21 04:05:12 Warning ColdStation kern kernel CR2: fffff8efd3e7c108
2022-05-21 04:05:12 Warning ColdStation kern kernel Modules linked in: xt_nat xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod nct6775 hwmon_vid efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables e1000e igb i915 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iosf_mbi ttm drm_kms_helper crct10dif_pclmul crc32_pclmul i2c_i801 crc32c_intel ghash_clmulni_intel aesni_intel wmi_bmof crypto_simd cryptd rapl intel_cstate intel_uncore i2c_smbus drm nvme intel_gtt i2c_algo_bit nvme_core agpgart ahci i2c_core libahci syscopyarea sysfillrect sysimgblt intel_pch_thermal fb_sys_fops wmi video backlight acpi_pad button [last unloaded: e1000e]
2022-05-21 04:05:12 Warning ColdStation kern kernel </TASK>
2022-05-21 04:05:12 Warning ColdStation kern kernel R13: 000055b4931d8e60 R14: 0000000000000001 R15: 0000000000010040
2022-05-21 04:05:12 Warning ColdStation kern kernel R10: 000055b49326f430 R11: 000014e808954ca0 R12: 000014e808954c40
2022-05-21 04:05:12 Warning ColdStation kern kernel RBP: 000000000000c030 R08: 0000000000003fff R09: 000014e808954ca0
2022-05-21 04:05:12 Warning ColdStation kern kernel RDX: 000000000006e1f0 RSI: 0000000000000000 RDI: 00000000000003ff
2022-05-21 04:05:12 Warning ColdStation kern kernel RAX: 0000000000008020 RBX: 000055b4931cce30 RCX: 000055b493107010
2022-05-21 04:05:12 Warning ColdStation kern kernel RSP: 002b:00007ffeca1959f0 EFLAGS: 00010206
2022-05-21 04:05:12 Warning ColdStation kern kernel Code: 08 00 00 0f 86 8b 03 00 00 8b 35 71 b6 13 00 85 f6 0f 85 bd 04 00 00 f6 43 08 01 0f 85 93 00 00 00 48 8b 03 48 29 c3 48 01 c5 <48> 8b 4b 08 48 89 ca 48 83 e2 f8 48 39 c2 0f 85 e0 05 00 00 48 3b
2022-05-21 04:05:12 Warning ColdStation kern kernel RIP: 0033:0x14e80881b2d6
2022-05-21 04:05:12 Warning ColdStation kern kernel asm_exc_page_fault+0x1e/0x30
2022-05-21 04:05:12 Warning ColdStation kern kernel ? asm_exc_page_fault+0x8/0x30
2022-05-21 04:05:12 Warning ColdStation kern kernel exc_page_fault+0xe2/0x101
2022-05-21 04:05:12 Warning ColdStation kern kernel do_user_addr_fault+0x342/0x50b
2022-05-21 04:05:12 Warning ColdStation kern kernel handle_mm_fault+0x11c/0x1e2
2022-05-21 04:05:12 Warning ColdStation kern kernel __handle_mm_fault+0x470/0xc5c
2022-05-21 04:05:12 Warning ColdStation kern kernel ? vm_normal_page+0x1c/0xa4
2022-05-21 04:05:12 Warning ColdStation kern kernel do_swap_page+0x57/0x534
2022-05-21 04:05:12 Warning ColdStation kern kernel __migration_entry_wait+0x48/0x82
2022-05-21 04:05:12 Warning ColdStation kern kernel pfn_swap_entry_to_page+0x27/0x3c
2022-05-21 04:05:12 Warning ColdStation kern kernel <TASK>
2022-05-21 04:05:12 Warning ColdStation kern kernel Call Trace:
2022-05-21 04:05:12 Warning ColdStation kern kernel CR2: fffff8efd3e7c108 CR3: 000000010aab4006 CR4: 00000000003726f0
2022-05-21 04:05:12 Warning ColdStation kern kernel CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-05-21 04:05:12 Warning ColdStation kern kernel FS: 000014e80877b740(0000) GS:ffff88845e400000(0000) knlGS:0000000000000000
2022-05-21 04:05:12 Warning ColdStation kern kernel R13: 0000000000000000 R14: ffff888197967300 R15: 7c00003bbf4f9f04
2022-05-21 04:05:12 Warning ColdStation kern kernel R10: 0000000000000000 R11: 0000000000000000 R12: fffff8efd3e7c100
2022-05-21 04:05:12 Warning ColdStation kern kernel RBP: ffffea0004768328 R08: ffff888151bb14c0 R09: 0000000000000000
2022-05-21 04:05:12 Warning ColdStation kern kernel RDX: 0000003bbf4f9f04 RSI: ffff88811da0ce60 RDI: fffff8efd3e7c100
2022-05-21 04:05:12 Warning ColdStation kern kernel RAX: ffffea0000000000 RBX: fff00000000006c8 RCX: 0000000000000000
2022-05-21 04:05:12 Warning ColdStation kern kernel RSP: 0000:ffffc900058a7d68 EFLAGS: 00010246
2022-05-21 04:05:12 Warning ColdStation kern kernel Code: 55 f2 ff 5b 5d 41 5c 41 5d c3 8b 44 24 10 48 89 44 24 10 8b 44 24 08 48 89 44 24 08 e9 fe d4 f3 ff 89 d2 89 f6 e9 df d1 f3 ff <48> 8b 57 08 48 89 f8 f6 c2 01 74 04 48 8d 42 ff c3 e8 ea ff ff ff
2022-05-21 04:05:12 Warning ColdStation kern kernel RIP: 0010:_compound_head+0x0/0x11
2022-05-21 04:05:12 Warning ColdStation kern kernel Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B360M Pro4, BIOS P3.20 09/13/2018
2022-05-21 04:05:12 Warning ColdStation kern kernel CPU: 0 PID: 31954 Comm: nginx Tainted: G W 5.15.38-Unraid #1
2022-05-21 04:05:12 Warning ColdStation kern kernel Oops: 0000 [#1] SMP PTI
2022-05-21 04:05:12 Information ColdStation kern kernel PGD 0 P4D 0
2022-05-21 04:05:12 Alert ColdStation kern kernel #PF: error_code(0x0000) - not-present page
2022-05-21 04:05:12 Alert ColdStation kern kernel #PF: supervisor read access in kernel mode
2022-05-21 04:05:12 Alert ColdStation kern kernel BUG: unable to handle page fault for address: fffff8efd3e7c108

Thanks in advance!
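Because the log server lists entries newest first, the two kernel reports above are easier to follow once the order is reversed. Below is a minimal Python sketch. It assumes the log is saved as plain text, one entry per line, in the `<date> <time> <severity> <host> <facility> <tag> <message>` layout shown above. It pairs each `CPU: ... Comm:` line with the first kernel-mode `RIP:` that follows it, which gives the running process and the faulting kernel function for each report. The function name `summarize` and the regular expressions are hypothetical, written only against the lines in this post.

```python
import re

# Assumed layout of each exported line (as pasted in the post above):
#   <date> <time> <severity> <host> <facility> <tag> <message>
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<severity>\w+) (?P<host>\S+) (?P<facility>\S+) (?P<tag>\S+) (?P<msg>.*)$"
)
# "CPU: 0 PID: 31954 Comm: nginx Tainted: ..." names the task that was running.
COMM_RE = re.compile(r"CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) Comm: (?P<comm>\S+)")
# A kernel-mode "RIP: 0010:dev_watchdog+0x115/0x180" names the faulting function.
# A user-space "RIP: 0033:0x14e80881b2d6" has no symbol, so it is not matched.
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:(?P<func>[A-Za-z_][\w.]*)\+0x")


def summarize(lines):
    """Return one dict per kernel report: time, pid, comm and faulting function."""
    entries = [m.groupdict() for m in map(LINE_RE.match, lines) if m]
    entries.reverse()  # the log viewer lists the newest entry first
    reports = []
    current = None
    for e in entries:
        m = COMM_RE.search(e["msg"])
        if m:
            current = {"time": e["time"], "pid": int(m["pid"]),
                       "comm": m["comm"], "func": None}
            reports.append(current)
            continue
        m = RIP_RE.search(e["msg"])
        # Only the first RIP after a CPU/Comm line is kept for that report.
        # Later RIPs come from interrupted-context dumps or from the register
        # dump that is repeated after the "end trace" marker.
        if m and current is not None and current["func"] is None:
            current["func"] = m["func"]
    return reports
```

When run over the full log in this post, this should yield two reports. The first is the `nginx` page fault at 04:05:12 in `_compound_head`. The second is the `swapper/0` warning at 04:05:23 in `dev_watchdog`, which is the one that accompanies the NETDEV WATCHDOG message.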
JorgeB Posted May 21, 2022

Enable the syslog server and post that after a crash together with the diagnostics.
coldtech Posted May 21, 2022

JorgeB said: Enable the syslog server and post that after a crash together with the diagnostics.

The syslogs are what I attached in my initial post. Those are the last messages before the system halted. I'm also attaching the diagnostics now. Thanks!

coldstation-diagnostics-20220521-1306.zip
JorgeB Posted May 21, 2022

coldtech said: The syslogs are what I attached in my initial post

If that's from a syslog server, then please attach the complete file.
coldtech Posted May 21, 2022

Attaching everything from yesterday and today until after the system was back up.

All_2022-5-21-7 22 49_20052022.csv
JorgeB Posted May 21, 2022

It doesn't look like the typical macvlan issue; it looks more like a NIC problem. Do you have an add-on NIC you could test with?
coldtech Posted May 21, 2022

I'm currently using a 4-port Intel NIC, which I could swap for a 2-port NIC I have lying around, although I think it uses the same chipset. Should I still try? Do you think it may be a hardware or a driver issue?
JorgeB Posted May 21, 2022

coldtech said: Should I still try?

Still worth a try.

coldtech said: Do you think it may be a hardware or a driver issue?

Difficult to say.
coldtech Posted May 21, 2022

OK, will do. Thanks for the advice. May I ask what exactly looks suspicious to you? That way, if or when it happens again, I'll know what to look for. What looked iffy to me was:

2022-05-21,04:05:12,Alert,ColdStation,kern,kernel,#PF: error_code(0x0000) - not-present page
2022-05-21,04:05:12,Alert,ColdStation,kern,kernel,#PF: supervisor read access in kernel mode
2022-05-21,04:05:12,Alert,ColdStation,kern,kernel,BUG: unable to handle page fault for address: fffff8efd3e7c108
. . .
2022-05-21,04:05:12,Warning,ColdStation,kern,kernel,Oops: 0000 [#1] SMP PTI
. . .
2022-05-21,04:05:23,Error,ColdStation,kern,kernel,igb 0000:02:00.0 eth0: Reset adapter
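Only some of the lines worth checking carry a high severity in the CSV export. In this log, `Oops:` arrived as Warning and `NETDEV WATCHDOG` as Information. Below is a minimal Python sketch that filters such an export. It assumes the column order is date, time, severity, host, facility, tag, message; that order is inferred from the lines quoted above and not from any documentation of the log server. The function name `notable_events`, the keyword list, and the extra severity names `Emergency` and `Critical` are hypothetical choices for illustration.

```python
import csv
import io

# Assumed column order, inferred from the quoted lines:
# date, time, severity, host, facility, tag, message
# "Alert" and "Error" appear in this log; "Emergency" and "Critical" are
# assumed names for the remaining high syslog severities.
SEVERE = {"Emergency", "Alert", "Critical", "Error"}

# Kernel messages worth flagging even when they arrive at a lower severity.
KEYWORDS = ("Oops:", "BUG:", "NETDEV WATCHDOG")


def notable_events(csv_text):
    """Return (timestamp, severity, message) for entries worth a closer look."""
    events = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 7:
            continue  # blank line or a row that doesn't fit the assumed layout
        date, clock, severity = row[:3]
        # Messages such as "... Full Duplex, Flow Control: RX" contain commas.
        # If the exporter did not quote them, csv splits them apart, so re-join.
        message = ",".join(row[6:])
        if severity in SEVERE or any(k in message for k in KEYWORDS):
            events.append((f"{date} {clock}", severity, message))
    return events
```

A header row, if the export has one, is simply not matched, because its severity column holds no known level and its message column holds no keyword.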
JorgeB Posted May 21, 2022

coldtech said: 2022-05-21,04:05:23,Error,ColdStation,kern,kernel,igb 0000:02:00.0 eth0: Reset adapter

That and this:

2022-05-21,04:05:23,Information,ColdStation,kern,kernel,NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out

It suggests the NIC stopped responding and was reset, but if the server doesn't come back online, the reset likely failed.
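The pattern described above is a NETDEV WATCHDOG transmit-queue timeout followed by a "Reset adapter" message on the same interface. That pairing can be picked out of a chronological list of messages. Below is a minimal Python sketch. The function name `nic_resets` and both regular expressions are assumptions, matched only against the igb messages quoted in this thread; other drivers may word their reset messages differently. The sketch only shows that a reset was logged. It cannot tell whether the reset actually worked, which, as noted above, is judged by whether the server comes back online.

```python
import re

# Patterns written against the igb messages quoted in this thread only.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)
RESET_RE = re.compile(r"(?P<iface>\S+): Reset adapter\b")


def nic_resets(events):
    """Pair each TX watchdog timeout with the next adapter reset on that interface.

    events: (timestamp, message) tuples in chronological order.
    Returns one dict per timeout; "reset" stays None if no reset was logged.
    """
    pending = {}    # interface -> incident still waiting for a reset
    incidents = []
    for ts, msg in events:
        m = WATCHDOG_RE.search(msg)
        if m:
            incident = {
                "iface": m["iface"],
                "driver": m["driver"],
                "queue": int(m["queue"]),
                "timed_out": ts,
                "reset": None,
            }
            incidents.append(incident)
            pending[m["iface"]] = incident
            continue
        m = RESET_RE.search(msg)
        if m and m["iface"] in pending:
            pending.pop(m["iface"])["reset"] = ts
    return incidents
```

A timeout that never gets a matching reset is a stronger hint that the driver gave up. Unlike a reset, though, a failed recovery may not appear in the log at all, because the NIC that carries the remote syslog traffic is the one that stopped working.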