corgan Posted June 19, 2020 (edited)

Hello, my Unraid server still [1] [2] crashes randomly with a RIP: 0010:nf_nat_setup_info+0x365/0x666 [nf_nat] message that I don't really understand. The error happens roughly every 3-5 days or, like today, twice in one day. I can't reproduce it on demand. Please help me understand where the error is coming from and what I can do about it. Thanks!

Running Unraid 6.8.3

Network: The server is connected with 2x 10gb to a Mikrotik CRS312-4C+8XG-RM. Bond Mode 4 (802.3ad)
Router: pfSense @ Dell RS210 II
WiFi: Asus RT-AX88U + Lyra

I have some VMs configured, but they are not running.

Running Dockers: Gitlab-CE, hddtemp2influx, JD, mariaDB, NginxProxy, phpmyadmin, telegraf, zabbix-agent, zerotier

Hardware
MB: ASRock X470D4U2-2T
CPU: AMD Ryzen 7 3700X
GPU1: Nvidia GeForce RTX 2060 @ NVMe Slot2 on AST1150 PCI-to-PCI Bridge
GPU2: Radeon RX 570 @ PCIe x8
Onboard 2x 10gb NIC
Array HBA: LSI SAS2308 PCI-Express Fusion-MPT SAS-2 @ PCIe x8, 8x16TB HDD
Cache: Onboard 400 Series Chipset SATA Controller ASMedia Technology, 5x 500gb Sandisk SSD
UD: 1tb nvme0n1 - SSD 970 EVO
PCIe ACS override active

Things I already tried:
- I added rcu_nocbs=0-15 to the boot option.

Quote kernel: general protection fault: 0000 [#1] SMP NOPTI kernel: CPU: 4 PID: 19723 Comm: curl Tainted: G W O 4.19.107-Unraid #1 kernel: Hardware name: To Be Filled By O.E.M.
To Be Filled By O.E.M./X470D4U2-2T, BIOS P3.30 10/03/2019 kernel: RIP: 0010:nf_nat_setup_info+0x365/0x666 [nf_nat] kernel: Code: ed 75 23 45 8b 17 48 8d 7c 24 58 b9 0a 00 00 00 48 8d 74 24 30 f3 a5 41 f6 c2 01 0f 85 c4 00 00 00 e9 25 02 00 00 8a 44 24 56 <41> 38 45 46 74 15 4d 8b ad 98 00 00 00 4d 85 ed 74 c7 49 81 ed 98 kernel: RSP: 0018:ffff88881e7036d8 EFLAGS: 00010202 kernel: RAX: ffff88811593ab06 RBX: ffffffff81e91080 RCX: 000000003a6b22e9 kernel: RDX: ffff888798580000 RSI: 000000003ec6935f RDI: 000000006a07be3d kernel: RBP: ffff88881e7037b0 R08: ffff88881e703708 R09: ffffffff81c8a6e0 kernel: R10: ffff8887d5bb4388 R11: 0000000000000000 R12: 0000000000000000 kernel: R13: 0c800bfffffffee0 R14: ffff88813b494500 R15: ffff88881e7037c4 kernel: FS: 000014a0958cde00(0000) GS:ffff88881e700000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 000014a096abd9e0 CR3: 00000002250f4000 CR4: 0000000000340ee0 kernel: DR0: 00007ff73530c8e0 DR1: 00007ff73530c8e0 DR2: 00007ff73530c8e0 kernel: DR3: 00007ff73530c8e0 DR6: 00000000ffff0ff0 DR7: 0000000000000400 kernel: Call Trace: kernel: <IRQ> kernel: ? __krealloc+0x25/0x5d kernel: ? nf_ct_ext_add+0x97/0xf6 kernel: nf_nat_masquerade_ipv4+0x123/0x14b [nf_nat_ipv4] kernel: masquerade_tg+0x44/0x5e [ipt_MASQUERADE] kernel: ? __dev_queue_xmit+0x5ff/0x627 kernel: ipt_do_table+0x582/0x62a [ip_tables] kernel: ? ipt_do_table+0x5da/0x62a [ip_tables] kernel: nf_nat_inet_fn+0xeb/0x1b9 [nf_nat] kernel: nf_nat_ipv4_out+0xf/0x89 [nf_nat_ipv4] kernel: nf_hook_slow+0x3a/0x90 kernel: ip_output+0xab/0xdd kernel: ? ip_fragment.constprop.0+0x7d/0x7d kernel: ip_forward+0x3c0/0x3ef kernel: ? ipv4_frags_exit_net+0x2b/0x2b kernel: ip_sabotage_in+0x38/0x3e kernel: nf_hook_slow+0x3a/0x90 kernel: ip_rcv+0x8e/0xbe kernel: ? ip_rcv_finish_core.isra.0+0x2e1/0x2e1 kernel: __netif_receive_skb_one_core+0x53/0x6f kernel: netif_receive_skb_internal+0x79/0x94 kernel: br_pass_frame_up+0x128/0x14a kernel: ? 
br_port_flags_change+0x29/0x29 kernel: br_handle_frame_finish+0x342/0x383 kernel: ? br_pass_frame_up+0x14a/0x14a kernel: br_nf_hook_thresh+0xa3/0xc3 kernel: ? br_pass_frame_up+0x14a/0x14a kernel: br_nf_pre_routing_finish+0x24a/0x271 kernel: ? br_pass_frame_up+0x14a/0x14a kernel: ? br_handle_local_finish+0xe/0xe kernel: ? nf_nat_ipv4_in+0x1e/0x62 [nf_nat_ipv4] kernel: ? br_handle_local_finish+0xe/0xe kernel: br_nf_pre_routing+0x31c/0x343 kernel: ? br_nf_forward_ip+0x362/0x362 kernel: nf_hook_slow+0x3a/0x90 kernel: br_handle_frame+0x27e/0x2bd kernel: ? br_pass_frame_up+0x14a/0x14a kernel: __netif_receive_skb_core+0x4a7/0x7b1 kernel: __netif_receive_skb_one_core+0x35/0x6f kernel: process_backlog+0x77/0x10e kernel: net_rx_action+0x107/0x26c kernel: __do_softirq+0xc9/0x1d7 kernel: do_softirq_own_stack+0x2a/0x40 kernel: </IRQ> kernel: do_softirq+0x4d/0x5a kernel: __local_bh_enable_ip+0x42/0x4a kernel: ip_finish_output2+0x30d/0x353 kernel: ? __switch_to_asm+0x41/0x70 kernel: ip_output+0xbe/0xdd kernel: __ip_queue_xmit+0x309/0x333 kernel: ? __kmalloc_reserve.isra.0+0x27/0x68 kernel: __tcp_transmit_skb+0x8a5/0x93f kernel: tcp_connect+0x7c6/0x87a kernel: tcp_v4_connect+0x412/0x46b kernel: __inet_stream_connect+0xd3/0x2b7 kernel: ? __handle_mm_fault+0xea3/0x11b7 kernel: inet_stream_connect+0x31/0x45 kernel: __sys_connect+0x73/0xad kernel: ? do_fcntl+0x28f/0x58f kernel: ? 
__se_sys_fcntl+0x4e/0x6b kernel: __x64_sys_connect+0x11/0x14 kernel: do_syscall_64+0x57/0xf2 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 kernel: RIP: 0033:0x14a096c4a53b kernel: Code: 83 ec 18 89 54 24 0c 48 89 34 24 89 7c 24 08 e8 bb fa ff ff 8b 54 24 0c 48 8b 34 24 41 89 c0 8b 7c 24 08 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 89 44 24 08 e8 f1 fa ff ff 8b 44 kernel: RSP: 002b:00007ffda11bd7d0 EFLAGS: 00000293 ORIG_RAX: 000000000000002a kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 000014a096c4a53b kernel: RDX: 0000000000000010 RSI: 00007ffda11bd970 RDI: 0000000000000005 kernel: RBP: 00005556b8757ab0 R08: 0000000000000000 R09: 003931312e353931 kernel: R10: 0000000000000002 R11: 0000000000000293 R12: 0000000000000000 kernel: R13: 00005556b8758db0 R14: 0000000000000005 R15: 0000000000000000 kernel: Modules linked in: macvlan xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap veth xt_nat ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod ipmi_devintf nct6775 hwmon_vid k10temp bonding ixgbe(O) edac_mce_amd kvm_amd ipmi_ssif kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc m kernel: ---[ end trace 34dd9e13a6df294b ]--- kernel: RIP: 0010:nf_nat_setup_info+0x365/0x666 [nf_nat] kernel: Code: ed 75 23 45 8b 17 48 8d 7c 24 58 b9 0a 00 00 00 48 8d 74 24 30 f3 a5 41 f6 c2 01 0f 85 c4 00 00 00 e9 25 02 00 00 8a 44 24 56 <41> 38 45 46 74 15 4d 8b ad 98 00 00 00 4d 85 ed 74 c7 49 81 ed 98 kernel: RSP: 0018:ffff88881e7036d8 EFLAGS: 00010202 kernel: RAX: ffff88811593ab06 RBX: ffffffff81e91080 RCX: 000000003a6b22e9 kernel: RDX: ffff888798580000 RSI: 000000003ec6935f RDI: 000000006a07be3d kernel: RBP: ffff88881e7037b0 R08: ffff88881e703708 R09: ffffffff81c8a6e0 kernel: R10: ffff8887d5bb4388 R11: 0000000000000000 R12: 0000000000000000 kernel: R13: 0c800bfffffffee0 R14: ffff88813b494500 R15: ffff88881e7037c4 kernel: FS: 
000014a0958cde00(0000) GS:ffff88881e700000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 000014a096abd9e0 CR3: 00000002250f4000 CR4: 0000000000340ee0 kernel: DR0: 00007ff73530c8e0 DR1: 00007ff73530c8e0 DR2: 00007ff73530c8e0 and Quote kernel: general protection fault: 0000 [#1] SMP NOPTI kernel: CPU: 6 PID: 14491 Comm: curl Tainted: G W O 4.19.107-Unraid #1 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470D4U2-2T, BIOS P3.30 10/03/2019 kernel: RIP: 0010:nf_nat_setup_info+0x365/0x666 [nf_nat] kernel: Code: ed 75 23 45 8b 17 48 8d 7c 24 58 b9 0a 00 00 00 48 8d 74 24 30 f3 a5 41 f6 c2 01 0f 85 c4 00 00 00 e9 25 02 00 00 8a 44 24 56 <41> 38 45 46 74 15 4d 8b ad 98 00 00 00 4d 85 ed 74 c7 49 81 ed 98 kernel: RSP: 0018:ffff88881e7836d8 EFLAGS: 00010202 kernel: RAX: ffff88841c647f11 RBX: ffffffff81e91080 RCX: 00000000a1ff25a9 kernel: RDX: ffff88879b480000 RSI: 0000000002fc0ed5 RDI: 000000007c87bd3c kernel: RBP: ffff88881e7837b0 R08: ffff88881e783708 R09: ffffffff81c8aa80 kernel: R10: 0000000000000348 R11: 0000000000000000 R12: 0000000000000000 kernel: R13: 025a0a22736368da R14: ffff88813eb657c0 R15: ffff88881e7837c4 kernel: FS: 000014b2fa342700(0000) GS:ffff88881e780000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 000014b2fa33fe40 CR3: 00000004a059c000 CR4: 0000000000340ee0 kernel: Call Trace: kernel: <IRQ> kernel: ? __krealloc+0x25/0x5d kernel: ? nf_ct_ext_add+0x97/0xf6 kernel: nf_nat_masquerade_ipv4+0x123/0x14b [nf_nat_ipv4] kernel: masquerade_tg+0x44/0x5e [ipt_MASQUERADE] kernel: ? __dev_queue_xmit+0x5ff/0x627 kernel: ipt_do_table+0x582/0x62a [ip_tables] kernel: ? ipt_do_table+0x5da/0x62a [ip_tables] kernel: nf_nat_inet_fn+0xeb/0x1b9 [nf_nat] kernel: nf_nat_ipv4_out+0xf/0x89 [nf_nat_ipv4] kernel: nf_hook_slow+0x3a/0x90 kernel: ip_output+0xab/0xdd kernel: ? ip_fragment.constprop.0+0x7d/0x7d kernel: ip_forward+0x3c0/0x3ef kernel: ? 
ipv4_frags_exit_net+0x2b/0x2b kernel: ip_sabotage_in+0x38/0x3e kernel: nf_hook_slow+0x3a/0x90 kernel: ip_rcv+0x8e/0xbe kernel: ? ip_rcv_finish_core.isra.0+0x2e1/0x2e1 kernel: __netif_receive_skb_one_core+0x53/0x6f kernel: netif_receive_skb_internal+0x79/0x94 kernel: br_pass_frame_up+0x128/0x14a kernel: ? br_port_flags_change+0x29/0x29 kernel: br_handle_frame_finish+0x342/0x383 kernel: ? br_pass_frame_up+0x14a/0x14a kernel: br_nf_hook_thresh+0xa3/0xc3 kernel: ? br_pass_frame_up+0x14a/0x14a kernel: br_nf_pre_routing_finish+0x24a/0x271 kernel: ? br_pass_frame_up+0x14a/0x14a kernel: ? br_handle_local_finish+0xe/0xe kernel: ? nf_nat_ipv4_in+0x1e/0x62 [nf_nat_ipv4] kernel: ? br_handle_local_finish+0xe/0xe kernel: br_nf_pre_routing+0x31c/0x343 kernel: ? br_nf_forward_ip+0x362/0x362 kernel: nf_hook_slow+0x3a/0x90 kernel: br_handle_frame+0x27e/0x2bd kernel: ? br_pass_frame_up+0x14a/0x14a kernel: __netif_receive_skb_core+0x4a7/0x7b1 kernel: ? enqueue_task_fair+0xba/0x676 kernel: __netif_receive_skb_one_core+0x35/0x6f kernel: process_backlog+0x77/0x10e kernel: net_rx_action+0x107/0x26c kernel: __do_softirq+0xc9/0x1d7 kernel: do_softirq_own_stack+0x2a/0x40 kernel: do_softirq+0x4d/0x5a kernel: __local_bh_enable_ip+0x42/0x4a kernel: ip_finish_output2+0x30d/0x353 kernel: ip_output+0xbe/0xdd kernel: ? ip_reply_glue_bits+0x36/0x36 kernel: ip_send_skb+0x10/0x32 kernel: udp_send_skb+0x26a/0x2cb kernel: udp_sendmsg+0x5df/0x809 kernel: ? ip_reply_glue_bits+0x36/0x36 kernel: ? rw_copy_check_uvector+0x6d/0xf2 kernel: ? import_iovec+0x6f/0xa3 kernel: ? copy_msghdr_from_user+0xf7/0x115 kernel: ? sock_sendmsg+0x14/0x1e kernel: sock_sendmsg+0x14/0x1e kernel: ___sys_sendmsg+0x1b1/0x236 kernel: ? __alloc_pages_nodemask+0x150/0xae1 kernel: ? __ip_dev_find+0x1e/0xc6 kernel: ? ip_route_output_key_hash_rcu+0x51a/0x65a kernel: ? ip4_datagram_release_cb+0x4e/0x1a5 kernel: __sys_sendmmsg+0xfc/0x17b kernel: ? 
__sys_connect+0x86/0xad kernel: __x64_sys_sendmmsg+0x1b/0x1e kernel: do_syscall_64+0x57/0xf2 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 kernel: RIP: 0033:0x14b2fb8c3c5e kernel: Code: 10 89 7c 24 0c 89 4c 24 1c e8 1e 3b f7 ff 44 8b 54 24 1c 8b 54 24 18 41 89 c0 48 8b 74 24 10 8b 7c 24 0c b8 33 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2c 44 89 c7 89 44 24 0c e8 4e 3b f7 ff 8b 44 kernel: RSP: 002b:000014b2fa33fc30 EFLAGS: 00000293 ORIG_RAX: 0000000000000133 kernel: RAX: ffffffffffffffda RBX: 000000000e630009 RCX: 000014b2fb8c3c5e kernel: RDX: 0000000000000002 RSI: 000014b2fa33fdd0 RDI: 0000000000000007 kernel: RBP: 000014b2fa33fd70 R08: 0000000000000000 R09: 0000000000000007 kernel: R10: 0000000000004000 R11: 0000000000000293 R12: 0000000000000000 kernel: R13: 000014b2fa33fda8 R14: 0000000000000000 R15: 0000000000000000 kernel: Modules linked in: vhost_net vhost tap kvm_amd ccp kvm macvlan xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables tun veth xt_nat ipt_MASQUERADE iptabl kernel: ---[ end trace c0ec99ed8429dee3 ]--- kernel: RIP: 0010:nf_nat_setup_info+0x365/0x666 [nf_nat] kernel: Code: ed 75 23 45 8b 17 48 8d 7c 24 58 b9 0a 00 00 00 48 8d 74 24 30 f3 a5 41 f6 c2 01 0f 85 c4 00 00 00 e9 25 02 00 00 8a 44 24 56 <41> 38 45 46 74 15 4d 8b ad 98 00 00 00 4d 85 ed 74 c7 49 81 ed 98 kernel: RSP: 0018:ffff88881e7836d8 EFLAGS: 00010202 kernel: RAX: ffff88841c647f11 RBX: ffffffff81e91080 RCX: 00000000a1ff25a9 kernel: RDX: ffff88879b480000 RSI: 0000000002fc0ed5 RDI: 000000007c87bd3c kernel: RBP: ffff88881e7837b0 R08: ffff88881e783708 R09: ffffffff81c8aa80 kernel: R10: 0000000000000348 R11: 0000000000000000 R12: 0000000000000000 kernel: R13: 025a0a22736368da R14: ffff88813eb657c0 R15: ffff88881e7837c4 kernel: FS: 000014b2fa342700(0000) GS:ffff88881e780000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 000014b2fa33fe40 CR3: 00000004a059c000 CR4: 0000000000340ee0 I 
already did everything mentioned in this thread:
- Added rcu_nocbs=0-15 to the boot option.
- BIOS: set Power Supply Idle Control to Typical.
- BIOS: C6 enabled (there were some misleading comments at first, but I think turning it off was not correct?).
- Installed the latest BIOS.

So, are there any other tips on what I can do?

Edited June 19, 2020 by corgan
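A quick sanity check for the rcu_nocbs change is to confirm that the running kernel actually received the parameter. The following is only a minimal shell sketch and not something from the posts above: the helper name `cmdline_param` is made up for illustration, and it relies solely on /proc/cmdline, which lists the parameters the current kernel was booted with.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the value of a kernel boot parameter.
# Usage: cmdline_param NAME [CMDLINE]
# CMDLINE defaults to the contents of /proc/cmdline.
cmdline_param() {
    local name cmdline word
    name=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    # Intentional word splitting: boot parameters are separated by spaces.
    for word in $cmdline; do
        case $word in
            "$name="*)
                printf '%s\n' "${word#*=}"
                return 0
                ;;
        esac
    done
    return 1   # parameter not present
}

if value=$(cmdline_param rcu_nocbs); then
    echo "rcu_nocbs is active: $value"
else
    echo "rcu_nocbs is not on the kernel command line"
fi
```

If the second message appears after a reboot, the boot option was probably added to a boot entry other than the one that was actually started.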
tlrmcknz Posted November 25, 2020

Did you ever make any headway on this? I saw something extremely similar today:

Short hardware list:
MB: ASRock B450 Pro4
CPU: AMD Ryzen 7 1800X
GPU1: Nvidia GeForce GTX 1650

My BIOS cstate = DISABLED

Slightly truncated kernel log from remote syslog:

general protection fault: 0000 [#1] SMP NOPTI
CPU: 3 PID: 21050 Comm: nzbget Tainted: P W O 4.19.107-Unraid #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450 Pro4, BIOS P4.20 06/18/2020
RIP: 0010:nf_nat_setup_info+0x365/0x666 [nf_nat]
Code: ed 75 23 45 8b 17 48 8d 7c 24 58 b9 0a 00 00 00 48 8d 74 24 30 f3 a5 41 f6 c2 01 0f 85 c4 00 00 00 e9 25 02 00 00 8a 44 24 56 <41> 38 45 46 74 15 4d 8b ad 98 00 00 00 4d 85 ed 74 c7 49 81 ed 98
hrtimer: interrupt took 2467865 ns
RSP: 0018:ffff88881e6c36d8 EFLAGS: 00010206
RAX: ffff88813f034906 RBX: ffffffff81e91080 RCX: 0000000086ee2de1
RDX: ffff8887d1180000 RSI: 0000000045680e5d RDI: 0000000087f6373f
RBP: ffff88881e6c37b0 R08: ffff88881e6c3708 R09: ffffffff81c8a6e0
R10: ffff8884c1518388 R11: 0000000000000000 R12: 0000000000000000
R13: 04fa37f2f9f462ee R14: ffff888050b64b40 R15: ffff88881e6c37c4
FS: 0000153f3737bb20(0000) GS:ffff88881e6c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000153f36cc6fd0 CR3: 00000005711c0000 CR4: 00000000003406e0
Call Trace:
...
corgan (Author) Posted November 26, 2020

Actually no, I couldn't fix this. Sometimes it happens once a week and sometimes once a month.

I wrote a little bash script that reads the IPMI error log from the ASRock board and, if the kernel error appears, sends a reboot signal and clears the logs. Additionally, it sends a Slack notice. That's the same way I would handle the error by hand.

This runs as a cron job on my Home Assistant Raspberry Pi. You need to install the IPMI tools, and you have to replace USER and PASSWORD in the script with the actual username and password.

ipmi_check.sh

#!/bin/bash
# Number of entries currently in the IPMI system event log (SEL)
count=$(/usr/bin/ipmitool -I lanplus -H 192.168.2.241 -U USER -P PASSWORD sel info | grep Entries | cut -d: -f2)
datetime=$(date)
seconds=$(date +%s)

# $1 = number of "OS Critical Stop" entries, $2 = number of "kernel panic" entries
function sl_send(){
  log=$(/usr/bin/ipmitool -I lanplus -H 192.168.2.241 -U USER -P PASSWORD sel list)
  if [[ $1 -gt 0 ]]; then
    img="https://knaak.org/assets/img/icons/backup200x200.png"
    ti="OS Critical Stop"
    # The OS has stopped, so power cycle the server
    ipmitool -I lanplus -H 192.168.2.241 -U USER -P PASSWORD power cycle
  fi
  if [[ $2 -gt 0 ]]; then
    img="https://knaak.org/assets/img/icons/backup200x200.png"
    va="Kernel Panic"
  fi
  slack chat send \
    --actions '{"type": "button", "style": "primary", "text": "Check on HA", "url": "https://ha.knaak.work"}' \
    --author 'Homeassistant' \
    --author-icon 'https://knaak.org/assets/img/icons/warning.png' \
    --author-link 'https://192.168.2.241' \
    --channel '#officeknaak' \
    --color '#8B0000' \
    --fields '{"title": "", "value": "", "short": true}' \
    --footer 'footer' \
    --footer-icon 'https://knaak.org/assets/img/icons/info_red.png' \
    --image "$img" \
    --pretext "$datetime" \
    --text "$log" \
    --time "$seconds" \
    --title 'New IPMI Log' \
    --title-link 'https://github.com/rockymadden/slack-cli'
}

if [[ $count -eq 0 ]]; then
  # The SEL is empty, nothing to report
  echo "String is empty"
elif [[ $count -gt 0 ]]; then
  /usr/bin/ipmitool -I lanplus -H 192.168.2.241 -U USER -P PASSWORD sel list >sel.log
  os_crit_count=$(grep -c "OS Critical Stop" sel.log)
  kernel_panic_count=$(grep -c "kernel panic" sel.log)
  sl_send "$os_crit_count" "$kernel_panic_count"
  slack file upload sel.log '#officeknaak'
  rm sel.log
  # Keep a running archive of all entries, then clear the SEL
  /usr/bin/ipmitool -I lanplus -H 192.168.2.241 -U USER -P PASSWORD sel list >>sel_all.txt
  /usr/bin/ipmitool -I lanplus -H 192.168.2.241 -U USER -P PASSWORD sel clear
  # echo $log
fi
5252525111 Posted February 3, 2021 (edited)

Every time I search for this issue, I come right back to this post. I MAY have found the solution and hope it helps someone else.

I'll preface this first. I've been having this issue for months. On average I would crash every 3-4 days, and once I made it to 5 only to be let down again. At the time of writing this marks my 10th stable day, which is no major milestone but looks promising. I made a few changes at once and I'm not sure which one fixed it, so I'll write them all down.

I noticed we have the same switch, the Mikrotik CRS312-4C+8XG-RM. My first change was here: under the Link tab I disabled (unchecked) all "Flow Control Tx/Rx" on the Unraid ports. I don't know why this was enabled by default, but unless you need it, disable it.

In Unraid I made changes to my network settings. From my research, the `nf_nat_setup_info` issue is network related. After digging through non-Unraid-specific reports, it seems to have a wide range of causes.

Settings > Network Settings

My main network in Unraid was left mostly unchanged. I assigned a static IP in Unraid, matching the static mapping on my router. If you have VLANs on your main network, it seems that Unraid broadcasts the VLANs with the same MAC, which can confuse your router. I set all VLANs on this network to not get an automatically assigned IP. Note that if you are using a bridge network on a VLAN, it will need an IP. I followed a guide to set this up: https://staging.forums.unraid.net/topic/62107-network-isolation-in-unraid-64/?ct=1612387651

Settings > Docker

This is the one I truly believe may have fixed the issue. One of the problems I've read about is that the Docker service assigns containers IP addresses that are already in use by something else on your network, or something along those lines. So you should set aside a range that your router doesn't hand out and have your Docker service run in that range. To do this, set Enable Docker to No and apply the change.
Once your containers are turned off, make sure you're in advanced view. You'll see check boxes for custom networks. For each of those that is enabled, you'll want to set a DHCP range. So let's say that on your router you set aside 192.168.1.128 to 192.168.1.159 for br0; you can write that range in CIDR notation as 192.168.1.128/27 (you can do CIDR conversions at this site: https://www.ipaddressguide.com/cidr).

Hopefully that makes sense. I figured I'd share in the hope that this is the solution and that it helps someone else out there. If I crash again I'll update this to let others know. If something above is not clear, let me know and I'll try to clarify. This is mostly new to me.

Edited February 3, 2021 by 5252525111
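The range-to-CIDR step in the post above can also be checked locally instead of on a website. This is a minimal shell sketch under stated assumptions: the function names `ip_to_int` and `range_to_cidr` are invented for this example, only IPv4 is handled, and a range is accepted only when it is exactly one aligned block whose size is a power of two.

```bash
#!/usr/bin/env bash

# Convert a dotted-quad IPv4 address to a plain integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print the CIDR block for an inclusive range START..END.
# Fails (exit status 1) if the range is not exactly one aligned block.
range_to_cidr() {
    local start end size prefix
    start=$(ip_to_int "$1")
    end=$(ip_to_int "$2")
    size=$(( end - start + 1 ))
    prefix=32
    # The size must be a power of two and the start must be a multiple of it.
    if [ "$size" -le 0 ] || [ $(( size & (size - 1) )) -ne 0 ] || [ $(( start % size )) -ne 0 ]; then
        return 1
    fi
    # Each halving of the block size adds one bit to the prefix length.
    while [ "$size" -gt 1 ]; do
        size=$(( size >> 1 ))
        prefix=$(( prefix - 1 ))
    done
    echo "$1/$prefix"
}

range_to_cidr 192.168.1.128 192.168.1.159   # prints 192.168.1.128/27
```

The range 192.168.1.128 to 192.168.1.159 holds 32 addresses and starts on a multiple of 32, which is why it maps cleanly to a /27. A range such as .128 to .160 would be rejected because 33 addresses cannot be expressed as a single block.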
corgan (Author) Posted February 3, 2021

Nice hint about the flow control, which was also activated on my CRS312. I will try this.

I think you are absolutely right with your last point. I moved the IP range of the br network away from my normal DHCP IP range some time ago. My crashes went down to 1-2 per month, but I changed a lot at once and never figured out which change was the "one". Now that I'm reading your comment, this makes absolute sense!
Shonky Posted September 26, 2021 (edited)

How's your fix holding up? I get this sporadically, certainly not every 3-4 days; every 1-3 months perhaps.

I do not have the mentioned switch. I have a static IP but don't use VLANs within unRAID or on my network.

I don't really follow the fix though. I do have a br0 set up, and some of my dockers connect that way to get a specific LAN IP of their own. My docker br0 is:

IPv4 custom network on interface br0:
Subnet: 192.168.99.0/24
Gateway: 192.168.99.1
DHCP pool: not set

That's my LAN too. Is the suggested fix to have docker restricted to a sub range of the full /24 and then for each docker that needs it, only use IPs within that range?

My DHCP range for my LAN is 192.168.99.100-192.168.99.255. Below .100 I reserve for statically assigned IPs, and that's where my dockers that have their own IP run from.

@corgan if you're still getting the problem, doesn't that mean you didn't really fix it though?

Edited September 26, 2021 by Shonky
5252525111 Posted October 7, 2021

On 9/26/2021 at 1:11 AM, Shonky said: How's your fix holding up? I get this sporadically. certainly not every 3-4 days. 1-3 months perhaps.

So far it's holding up well. I'm having new issues with NFS but don't think that's related to this at all. Long story short, no more panics.

On 9/26/2021 at 1:11 AM, Shonky said: Is the suggested fix to have docker restricted to a sub range of the full /24 and then for each docker that needs it, only use IPs within that range?

That was one of the main fixes for me. I used a range outside my DHCP and reserve that for containers on unraid. If I recall correctly, since I currently don't have access to my system, 100-223 is my LAN range, 50-99 I use for static and 224-255 (192.168.99.224/27) I reserve for containers on unraid.
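Whether a container block really sits outside the router's pool is easy to get wrong by one address, so it can be worth checking mechanically. The sketch below is only an illustration: `ip_to_int` and `cidr_overlaps_range` are hypothetical helper names, and it assumes that the 100-223 range recalled above is the router's DHCP pool. It checks address arithmetic only and says nothing about whether the separation cures the crash.

```bash
#!/usr/bin/env bash

# Convert a dotted-quad IPv4 address to a plain integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Usage: cidr_overlaps_range CIDR RANGE_START RANGE_END
# Succeeds (exit status 0) when the CIDR block shares at least one
# address with the inclusive range RANGE_START..RANGE_END.
cidr_overlaps_range() {
    local base prefix size first last range_start range_end
    base=$(ip_to_int "${1%/*}")
    prefix=${1#*/}
    size=$(( 1 << (32 - prefix) ))
    first=$(( base & ~(size - 1) ))   # network address of the block
    last=$(( first + size - 1 ))      # last address of the block
    range_start=$(ip_to_int "$2")
    range_end=$(ip_to_int "$3")
    [ "$first" -le "$range_end" ] && [ "$range_start" -le "$last" ]
}

# Container block and assumed DHCP pool from the post above.
if cidr_overlaps_range 192.168.99.224/27 192.168.99.100 192.168.99.223; then
    echo "container block overlaps the DHCP pool"
else
    echo "container block is outside the DHCP pool"
fi
```

With these numbers the block covers .224 to .255 and the pool ends at .223, so the two do not overlap. By contrast, a custom network defined as the full 192.168.99.0/24 with no pool restriction would overlap any DHCP pool inside that subnet.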
Shonky Posted October 10, 2021

On 10/7/2021 at 11:24 PM, 5252525111 said: So far it's holding up well. I'm having new issues with NFS but don't think that's related to this at all. Long story short, no more panics. That was one of the main fixes for me. I used a range outside my DHCP and reserve that for containers on unraid. If I recall correctly since I currently don't have access to my system, 100-223 is my LAN range, 50-99 I used for static and 224-255 (192.168.99.224/27) I reserve for containers on unraid.

OK, well, at the risk of bursting your bubble, that's how mine is set up anyway and I still had the problem.

I have a /24 LAN. 1-99 are static, which I just assign manually. Some are dockers, some are things like routers/printers. DHCP is 100+. That's the way it's always been and kind of has to be, really. If you have static IPs in the middle of a DHCP server's range you're going to have a bad time (tm) at some point.

Putting dockers above or below the DHCP range makes no difference.
DrDirtyDevil Posted February 18, 2022

I have the same issue over and over again and I can't find the culprit. I have Ubiquiti switches and I can't even find the flow control settings on them in the first place. None of the other suggestions apply to my situation: my LAN subnet is completely different from the default Docker network subnets, so that is also unlikely to be the cause. Any suggestions?
Shonky Posted February 21, 2022

I came across this solution separately, and then found this other thread just now. It seems like it could be a possible solution.

My router (pfSense) was complaining about an IP having two different MAC addresses: the real hardware and a virtual interface called something like br0-shim, which also responds to ARP requests. I presume this results in packets for one IP coming in on two different network interfaces.
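The symptom described above, one IP answering with two MAC addresses, can be searched for once IP/MAC pairs have been collected from somewhere such as a router log or a packet capture. How the pairs are gathered is left open here; the sketch below only assumes one "IP MAC" pair per line on standard input, and the function name `find_ip_mac_conflicts` and the sample addresses are made up for illustration.

```bash
#!/usr/bin/env bash

# Read "IP MAC" pairs on stdin and print every IP that was seen
# with more than one distinct MAC address.
find_ip_mac_conflicts() {
    awk '
        {
            pair = $1 SUBSEP $2
            if (!(pair in seen)) {        # count each IP/MAC pair once
                seen[pair] = 1
                count[$1]++
                macs[$1] = macs[$1] " " $2
            }
        }
        END {
            for (ip in count)
                if (count[ip] > 1)
                    print ip ":" macs[ip]
        }
    '
}

# Made-up sample data: 192.168.99.2 shows up with two different MACs.
printf '%s\n' \
    '192.168.99.10 02:42:c0:a8:63:0a' \
    '192.168.99.2 aa:bb:cc:00:00:01' \
    '192.168.99.2 aa:bb:cc:00:00:02' \
    '192.168.99.10 02:42:c0:a8:63:0a' |
    find_ip_mac_conflicts
```

Repeated sightings of the same pair are ignored, so only genuinely different MACs for one IP are reported.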