UnRaid kernel(?) crashes now and then (logs attached)



Hi,

 

For some months now, my UnRaid server has been crashing overnight every few weeks. Sometimes only the webUI becomes inaccessible, but most of the time the server also stops responding to ping. That is bad, since almost all of my home automation and data gathering runs on that machine in several Docker containers.

 

I'm on 6.10 now, but the issue started on 6.9, about 3-4 months ago.

In another thread about a similar problem, someone suggested switching to ipvlan interfaces for Docker, but I'm not sure whether my issue is really the same. These are the last log messages that arrived at my log server (newest first). Maybe someone with a bit of experience can tell me where the actual issue might be, or which component may be causing it:

 

2022-05-21	04:05:23	Information	ColdStation	kern	kernel	igb 0000:02:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
2022-05-21	04:05:23	Error	ColdStation	kern	kernel	igb 0000:02:00.0 eth0: Reset adapter
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	---[ end trace 712a220cf666edc4 ]---
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	</TASK>
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	secondary_startup_64_no_verify+0xb0/0xbb
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	start_kernel+0x656/0x67b
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	cpu_startup_entry+0x1d/0x1f
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	do_idle+0x1b7/0x225
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	cpuidle_enter+0x2a/0x36
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	cpuidle_enter_state+0x117/0x1db
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	R13: 0000000000000004 R14: 00017d8a36ebe486 R15: 0000000000000000
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	R10: 0000000000000020 R11: 000000000000024d R12: ffffffff82311ca0
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	RBP: ffffe8ffffc20300 R08: 00000000ffffffff R09: 071c71c71c71c71c
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	RAX: ffff88845e42bac0 RBX: 0000000000000004 RCX: 000000000000001f
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	RSP: 0018:ffffffff82203e58 EFLAGS: 00000246
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	Code: b6 b6 1c 00 85 db 48 89 e8 79 03 48 63 c3 5b 5d 41 5c c3 9c 58 0f 1f 44 00 00 c3 fa 66 0f 1f 44 00 00 c3 fb 66 0f 1f 44 00 00 <c3> 0f 1f 44 00 00 55 49 89 d3 48 81 c7 b0 00 00 00 48 83 c6 70 53
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	RIP: 0010:arch_local_irq_enable+0x7/0x8
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	asm_sysvec_apic_timer_interrupt+0x12/0x20
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	<TASK>
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	</IRQ>
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	sysvec_apic_timer_interrupt+0x66/0x7d
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	__irq_exit_rcu+0x4d/0x88
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	__do_softirq+0xef/0x218
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	run_timer_softirq+0x19/0x2d
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	? recalibrate_cpu_khz+0x1/0x1
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	? enqueue_hrtimer+0x62/0x69
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	__run_timers+0x146/0x184
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	call_timer_fn+0x59/0xde
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	? psched_ppscfg_precompute+0x40/0x40
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	<IRQ>
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	Call Trace:
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	CR2: 000055d0b641e8f0 CR3: 000000000420a006 CR4: 00000000003726f0
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	FS: 0000000000000000(0000) GS:ffff88845e400000(0000) knlGS:0000000000000000
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	R13: 0000000118fca200 R14: ffffc90000003f18 R15: ffffffff816e01e3
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	R10: 00007fffffffffff R11: ffffffff82866357 R12: 0000000000000000
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	RBP: ffff8881022b4000 R08: ffffffff822b4e28 R09: 0000000000000000
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	RDX: 0000000000000003 RSI: ffffc90000003d50 RDI: ffff88845e41c510
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	RAX: 0000000000000000 RBX: ffff8881022b4480 RCX: 0000000000000027
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	RSP: 0018:ffffc90000003ec8 EFLAGS: 00010282
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	Code: 2b cb 00 00 75 36 48 89 ef c6 05 0c 2b cb 00 01 e8 cb 8e fb ff 44 89 e1 48 89 ee 48 c7 c7 00 04 15 82 48 89 c2 e8 d6 2f 12 00 <0f> 0b eb 0e 41 ff c4 48 05 40 01 00 00 e9 65 ff ff ff 48 8b 83 48
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	RIP: 0010:dev_watchdog+0x115/0x180
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B360M Pro4, BIOS P3.20 09/13/2018
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	CPU: 0 PID: 0 Comm: swapper/0 Tainted: G D W 5.15.38-Unraid #1
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	Modules linked in: xt_nat xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod nct6775 hwmon_vid efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables e1000e igb i915 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iosf_mbi ttm drm_kms_helper crct10dif_pclmul crc32_pclmul i2c_i801 crc32c_intel ghash_clmulni_intel aesni_intel wmi_bmof crypto_simd cryptd rapl intel_cstate intel_uncore i2c_smbus drm nvme intel_gtt i2c_algo_bit nvme_core agpgart ahci i2c_core libahci syscopyarea sysfillrect sysimgblt intel_pch_thermal fb_sys_fops wmi video backlight acpi_pad button [last unloaded: e1000e]
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x115/0x180
2022-05-21	04:05:23	Information	ColdStation	kern	kernel	NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out
2022-05-21	04:05:23	Warning	ColdStation	kern	kernel	------------[ cut here ]------------
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	CR2: fffff8efd3e7c108 CR3: 000000010aab4006 CR4: 00000000003726f0
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	FS: 000014e80877b740(0000) GS:ffff88845e400000(0000) knlGS:0000000000000000
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	R13: 0000000000000000 R14: ffff888197967300 R15: 7c00003bbf4f9f04
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	R10: 0000000000000000 R11: 0000000000000000 R12: fffff8efd3e7c100
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	RBP: ffffea0004768328 R08: ffff888151bb14c0 R09: 0000000000000000
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	RDX: 0000003bbf4f9f04 RSI: ffff88811da0ce60 RDI: fffff8efd3e7c100
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	RAX: ffffea0000000000 RBX: fff00000000006c8 RCX: 0000000000000000
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	RSP: 0000:ffffc900058a7d68 EFLAGS: 00010246
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	Code: 55 f2 ff 5b 5d 41 5c 41 5d c3 8b 44 24 10 48 89 44 24 10 8b 44 24 08 48 89 44 24 08 e9 fe d4 f3 ff 89 d2 89 f6 e9 df d1 f3 ff <48> 8b 57 08 48 89 f8 f6 c2 01 74 04 48 8d 42 ff c3 e8 ea ff ff ff
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	RIP: 0010:_compound_head+0x0/0x11
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	---[ end trace 712a220cf666edc3 ]---
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	CR2: fffff8efd3e7c108
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	Modules linked in: xt_nat xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod nct6775 hwmon_vid efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables e1000e igb i915 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iosf_mbi ttm drm_kms_helper crct10dif_pclmul crc32_pclmul i2c_i801 crc32c_intel ghash_clmulni_intel aesni_intel wmi_bmof crypto_simd cryptd rapl intel_cstate intel_uncore i2c_smbus drm nvme intel_gtt i2c_algo_bit nvme_core agpgart ahci i2c_core libahci syscopyarea sysfillrect sysimgblt intel_pch_thermal fb_sys_fops wmi video backlight acpi_pad button [last unloaded: e1000e]
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	</TASK>
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	R13: 000055b4931d8e60 R14: 0000000000000001 R15: 0000000000010040
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	R10: 000055b49326f430 R11: 000014e808954ca0 R12: 000014e808954c40
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	RBP: 000000000000c030 R08: 0000000000003fff R09: 000014e808954ca0
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	RDX: 000000000006e1f0 RSI: 0000000000000000 RDI: 00000000000003ff
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	RAX: 0000000000008020 RBX: 000055b4931cce30 RCX: 000055b493107010
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	RSP: 002b:00007ffeca1959f0 EFLAGS: 00010206
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	Code: 08 00 00 0f 86 8b 03 00 00 8b 35 71 b6 13 00 85 f6 0f 85 bd 04 00 00 f6 43 08 01 0f 85 93 00 00 00 48 8b 03 48 29 c3 48 01 c5 <48> 8b 4b 08 48 89 ca 48 83 e2 f8 48 39 c2 0f 85 e0 05 00 00 48 3b
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	RIP: 0033:0x14e80881b2d6
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	asm_exc_page_fault+0x1e/0x30
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	? asm_exc_page_fault+0x8/0x30
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	exc_page_fault+0xe2/0x101
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	do_user_addr_fault+0x342/0x50b
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	handle_mm_fault+0x11c/0x1e2
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	__handle_mm_fault+0x470/0xc5c
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	? vm_normal_page+0x1c/0xa4
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	do_swap_page+0x57/0x534
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	__migration_entry_wait+0x48/0x82
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	pfn_swap_entry_to_page+0x27/0x3c
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	<TASK>
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	Call Trace:
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	CR2: fffff8efd3e7c108 CR3: 000000010aab4006 CR4: 00000000003726f0
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	FS: 000014e80877b740(0000) GS:ffff88845e400000(0000) knlGS:0000000000000000
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	R13: 0000000000000000 R14: ffff888197967300 R15: 7c00003bbf4f9f04
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	R10: 0000000000000000 R11: 0000000000000000 R12: fffff8efd3e7c100
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	RBP: ffffea0004768328 R08: ffff888151bb14c0 R09: 0000000000000000
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	RDX: 0000003bbf4f9f04 RSI: ffff88811da0ce60 RDI: fffff8efd3e7c100
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	RAX: ffffea0000000000 RBX: fff00000000006c8 RCX: 0000000000000000
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	RSP: 0000:ffffc900058a7d68 EFLAGS: 00010246
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	Code: 55 f2 ff 5b 5d 41 5c 41 5d c3 8b 44 24 10 48 89 44 24 10 8b 44 24 08 48 89 44 24 08 e9 fe d4 f3 ff 89 d2 89 f6 e9 df d1 f3 ff <48> 8b 57 08 48 89 f8 f6 c2 01 74 04 48 8d 42 ff c3 e8 ea ff ff ff
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	RIP: 0010:_compound_head+0x0/0x11
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B360M Pro4, BIOS P3.20 09/13/2018
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	CPU: 0 PID: 31954 Comm: nginx Tainted: G W 5.15.38-Unraid #1
2022-05-21	04:05:12	Warning	ColdStation	kern	kernel	Oops: 0000 [#1] SMP PTI
2022-05-21	04:05:12	Information	ColdStation	kern	kernel	PGD 0 P4D 0
2022-05-21	04:05:12	Alert	ColdStation	kern	kernel	#PF: error_code(0x0000) - not-present page
2022-05-21	04:05:12	Alert	ColdStation	kern	kernel	#PF: supervisor read access in kernel mode
2022-05-21	04:05:12	Alert	ColdStation	kern	kernel	BUG: unable to handle page fault for address: fffff8efd3e7c108

 

Thanks in advance!

Edited by coldtech
spelling

OK, will do. Thanks for the advice. May I ask what exactly looks suspicious to you? Just so that if/when it happens again, I know what to look for...

What looked iffy to me was

2022-05-21,04:05:12,Alert,ColdStation,kern,kernel,#PF: error_code(0x0000) - not-present page
2022-05-21,04:05:12,Alert,ColdStation,kern,kernel,#PF: supervisor read access in kernel mode
2022-05-21,04:05:12,Alert,ColdStation,kern,kernel,BUG: unable to handle page fault for address: fffff8efd3e7c108

.
.
.

2022-05-21,04:05:12,Warning,ColdStation,kern,kernel,Oops: 0000 [#1] SMP PTI

.
.
.
2022-05-21,04:05:23,Error,ColdStation,kern,kernel,igb 0000:02:00.0 eth0: Reset adapter

 

8 minutes ago, coldtech said:
2022-05-21,04:05:23,Error,ColdStation,kern,kernel,igb 0000:02:00.0 eth0: Reset adapter

That and this:

2022-05-21,04:05:23,Information,ColdStation,kern,kernel,NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out

 

It suggests the NIC stopped responding and was reset, but if the server doesn't come back online, the reset likely failed.
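
If you want to check for these lines quickly next time, here is a rough sketch in Python. It is not an Unraid tool. It assumes your log server exports one message per line with seven tab- or comma-separated fields (date, time, severity, host, facility, tag, message), like the excerpts in this thread. The function name `find_suspicious` and the labels are made up for illustration, and the marker strings are simply the ones quoted above, so this is not a complete list of things that can go wrong.

```python
import re
from datetime import datetime

# Substrings copied from the log lines quoted in this thread.
# The labels on the right are illustrative, not official kernel terms.
MARKERS = {
    "BUG: unable to handle page fault": "kernel page fault",
    "Oops:": "kernel oops",
    "NETDEV WATCHDOG": "NIC transmit timeout",
    "Reset adapter": "NIC reset",
}


def find_suspicious(lines):
    """Return (timestamp, label, message) tuples, oldest timestamp first.

    Assumes seven fields per line, separated by tabs or commas.
    Lines that do not fit that shape are skipped.
    """
    hits = []
    for line in lines:
        # maxsplit=6 keeps commas inside the message itself intact.
        fields = re.split(r"[\t,]", line.rstrip("\n"), maxsplit=6)
        if len(fields) < 7:
            continue
        date, time, _severity, _host, _facility, _tag, message = fields
        try:
            stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
        for needle, label in MARKERS.items():
            if needle in message:
                hits.append((stamp, label, message))
                break
    # Stable sort: entries sharing a timestamp keep their input order.
    hits.sort(key=lambda hit: hit[0])
    return hits
```

Feed it the lines of the exported log (for example, a file object opened on the export) and print the result. Sorting by timestamp matters here because the export is newest-first: in this thread the page fault at 04:05:12 came before the NIC timeout and reset at 04:05:23.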
