Kernel Panic RIP: 0010:nf_nat_setup_info [nf_nat]



Hello

 

My Unraid server still [1] [2] crashes randomly with a RIP: 0010:nf_nat_setup_info+0x365/0x666 [nf_nat] message that I don't really understand.

The error happens roughly every 3-5 days or, like today, twice in one day. I can't reproduce it on demand.

Please help me understand where the error is coming from and what I can do about it.

 

Thanks!

 

Running Unraid 6.8.3

 

Network:

The server is connected with 2x 10GbE to a Mikrotik CRS312-4C+8XG-RM.

Bond Mode 4 (802.3ad)

Router: pfSense @ Dell RS210 II

WiFi: Asus RT-AX88U + Lyra

 

I have some VMs configured, but not running.

 

Running Dockers:

  • Gitlab-CE
  • hddtemp2influx
  • JD
  • mariaDB
  • NginxProxy
  • phpmyadmin
  • telegraf
  • zabbix-agent
  • zerotier

 

Hardware

MB: ASRock X470D4U2-2T

CPU: AMD Ryzen 7 3700X

GPU1: Nvidia GeForce RTX 2060 @ NVMe Slot2 on AST1150 PCI-to-PCI Bridge

GPU2: Radeon RX 570 @ PCIe x8

Onboard 2x 10gb NIC

 

Array

HBA: LSI SAS2308 PCI-Express Fusion-MPT SAS-2 @ PCIe x8

8x16TB HDD

 

Cache:

Onboard  400 Series Chipset SATA Controller ASMedia Technology

5x 500gb Sandisk SSD

 

UD:

1tb nvme0n1 - SSD 970 EVO

 

PCIe ACS override active

 

Things I already tried:

 

- I added rcu_nocbs=0-15 to the boot option.
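
A quick way to confirm that the option actually reached the running kernel is to look at /proc/cmdline. This is only a minimal sketch: the helper name kernel_param_value is made up for illustration, and the optional second argument exists just so the function can be tried against a saved copy of the command line.

```shell
# Sketch: print the value of a kernel boot parameter, if it is set.
# kernel_param_value is an illustrative helper, not an Unraid tool.
# Usage: kernel_param_value NAME [FILE]   (FILE defaults to /proc/cmdline)
kernel_param_value() {
  awk -v name="$1" '
    {
      # Walk the whitespace-separated words and look for NAME=value.
      for (i = 1; i <= NF; i++)
        if (index($i, name "=") == 1) {
          print substr($i, length(name) + 2)
          found = 1
          exit
        }
    }
    END { exit !found }   # exit status 0 only when the parameter was found
  ' "${2:-/proc/cmdline}"
}
```

Running `kernel_param_value rcu_nocbs` on the server should print the CPU list if the option is active, and return a non-zero status if it is missing.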

Quote

kernel: general protection fault: 0000 [#1] SMP NOPTI                                                   
kernel: CPU: 4 PID: 19723 Comm: curl Tainted: G        W  O      4.19.107-Unraid #1                     
kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470D4U2-2T, BIOS P3.30 10/03/2019
kernel: RIP: 0010:nf_nat_setup_info+0x365/0x666 [nf_nat]                                                
kernel: Code: ed 75 23 45 8b 17 48 8d 7c 24 58 b9 0a 00 00 00 48 8d 74 24 30 f3 a5 41 f6 c2 01 0f 85 c4 00 00 00 e9 25 02 00 00 8a 44 24 56 <41> 38 45 46 74 15 4d 8b ad 98 00 00 00 4d 85 ed 74 c7 49 81 ed 98  
kernel: RSP: 0018:ffff88881e7036d8 EFLAGS: 00010202                       
kernel: RAX: ffff88811593ab06 RBX: ffffffff81e91080 RCX: 000000003a6b22e9
kernel: RDX: ffff888798580000 RSI: 000000003ec6935f RDI: 000000006a07be3d
kernel: RBP: ffff88881e7037b0 R08: ffff88881e703708 R09: ffffffff81c8a6e0
kernel: R10: ffff8887d5bb4388 R11: 0000000000000000 R12: 0000000000000000
kernel: R13: 0c800bfffffffee0 R14: ffff88813b494500 R15: ffff88881e7037c4
kernel: FS:  000014a0958cde00(0000) GS:ffff88881e700000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000014a096abd9e0 CR3: 00000002250f4000 CR4: 0000000000340ee0
kernel: DR0: 00007ff73530c8e0 DR1: 00007ff73530c8e0 DR2: 00007ff73530c8e0
kernel: DR3: 00007ff73530c8e0 DR6: 00000000ffff0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel: <IRQ>
kernel: ? __krealloc+0x25/0x5d                          
kernel: ? nf_ct_ext_add+0x97/0xf6                       
kernel: nf_nat_masquerade_ipv4+0x123/0x14b [nf_nat_ipv4]
kernel: masquerade_tg+0x44/0x5e [ipt_MASQUERADE]        
kernel: ? __dev_queue_xmit+0x5ff/0x627                  
kernel: ipt_do_table+0x582/0x62a [ip_tables]            
kernel: ? ipt_do_table+0x5da/0x62a [ip_tables]          
kernel: nf_nat_inet_fn+0xeb/0x1b9 [nf_nat]              
kernel: nf_nat_ipv4_out+0xf/0x89 [nf_nat_ipv4]          
kernel: nf_hook_slow+0x3a/0x90                          
kernel: ip_output+0xab/0xdd                             
kernel: ? ip_fragment.constprop.0+0x7d/0x7d             
kernel: ip_forward+0x3c0/0x3ef                          
kernel: ? ipv4_frags_exit_net+0x2b/0x2b                 
kernel: ip_sabotage_in+0x38/0x3e                        
kernel: nf_hook_slow+0x3a/0x90                          
kernel: ip_rcv+0x8e/0xbe                                
kernel: ? ip_rcv_finish_core.isra.0+0x2e1/0x2e1         
kernel: __netif_receive_skb_one_core+0x53/0x6f          
kernel: netif_receive_skb_internal+0x79/0x94            
kernel: br_pass_frame_up+0x128/0x14a                    
kernel: ? br_port_flags_change+0x29/0x29                
kernel: br_handle_frame_finish+0x342/0x383              
kernel: ? br_pass_frame_up+0x14a/0x14a                  
kernel: br_nf_hook_thresh+0xa3/0xc3                     
kernel: ? br_pass_frame_up+0x14a/0x14a                  
kernel: br_nf_pre_routing_finish+0x24a/0x271            
kernel: ? br_pass_frame_up+0x14a/0x14a                  
kernel: ? br_handle_local_finish+0xe/0xe                
kernel: ? nf_nat_ipv4_in+0x1e/0x62 [nf_nat_ipv4]        
kernel: ? br_handle_local_finish+0xe/0xe                
kernel: br_nf_pre_routing+0x31c/0x343                   
kernel: ? br_nf_forward_ip+0x362/0x362                  
kernel: nf_hook_slow+0x3a/0x90                          
kernel: br_handle_frame+0x27e/0x2bd                     
kernel: ? br_pass_frame_up+0x14a/0x14a                  
kernel: __netif_receive_skb_core+0x4a7/0x7b1            
kernel: __netif_receive_skb_one_core+0x35/0x6f          
kernel: process_backlog+0x77/0x10e                      
kernel: net_rx_action+0x107/0x26c                       
kernel: __do_softirq+0xc9/0x1d7                         
kernel: do_softirq_own_stack+0x2a/0x40                  
kernel: </IRQ>                                
kernel: do_softirq+0x4d/0x5a                            
kernel: __local_bh_enable_ip+0x42/0x4a                  
kernel: ip_finish_output2+0x30d/0x353                   
kernel: ? __switch_to_asm+0x41/0x70                     
kernel: ip_output+0xbe/0xdd                             
kernel: __ip_queue_xmit+0x309/0x333                     
kernel: ? __kmalloc_reserve.isra.0+0x27/0x68            
kernel: __tcp_transmit_skb+0x8a5/0x93f                  
kernel: tcp_connect+0x7c6/0x87a                         
kernel: tcp_v4_connect+0x412/0x46b                      
kernel: __inet_stream_connect+0xd3/0x2b7                
kernel: ? __handle_mm_fault+0xea3/0x11b7                
kernel: inet_stream_connect+0x31/0x45                   
kernel: __sys_connect+0x73/0xad                         
kernel: ? do_fcntl+0x28f/0x58f                          
kernel: ? __se_sys_fcntl+0x4e/0x6b                      
kernel: __x64_sys_connect+0x11/0x14                     
kernel: do_syscall_64+0x57/0xf2                         
kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9        
kernel: RIP: 0033:0x14a096c4a53b                        
kernel: Code: 83 ec 18 89 54 24 0c 48 89 34 24 89 7c 24 08 e8 bb fa ff ff 8b 54 24 0c 48 8b 34 24 41 89 c0 8b 7c 24 08 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 89 44 24 08 e8 f1 fa ff ff 8b 44  
kernel: RSP: 002b:00007ffda11bd7d0 EFLAGS: 00000293 ORIG_RAX: 000000000000002a
kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 000014a096c4a53b      
kernel: RDX: 0000000000000010 RSI: 00007ffda11bd970 RDI: 0000000000000005      
kernel: RBP: 00005556b8757ab0 R08: 0000000000000000 R09: 003931312e353931      
kernel: R10: 0000000000000002 R11: 0000000000000293 R12: 0000000000000000      
kernel: R13: 00005556b8758db0 R14: 0000000000000005 R15: 0000000000000000      
kernel: Modules linked in: macvlan xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap veth xt_nat ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod ipmi_devintf nct6775 hwmon_vid k10temp bonding ixgbe(O) edac_mce_amd kvm_amd ipmi_ssif kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc m
kernel: ---[ end trace 34dd9e13a6df294b ]---
kernel: RIP: 0010:nf_nat_setup_info+0x365/0x666 [nf_nat]
kernel: Code: ed 75 23 45 8b 17 48 8d 7c 24 58 b9 0a 00 00 00 48 8d 74 24 30 f3 a5 41 f6 c2 01 0f 85 c4 00 00 00 e9 25 02 00 00 8a 44 24 56 <41> 38 45 46 74 15 4d 8b ad 98 00 00 00 4d 85 ed 74 c7 49 81 ed 98
kernel: RSP: 0018:ffff88881e7036d8 EFLAGS: 00010202
kernel: RAX: ffff88811593ab06 RBX: ffffffff81e91080 RCX: 000000003a6b22e9
kernel: RDX: ffff888798580000 RSI: 000000003ec6935f RDI: 000000006a07be3d
kernel: RBP: ffff88881e7037b0 R08: ffff88881e703708 R09: ffffffff81c8a6e0
kernel: R10: ffff8887d5bb4388 R11: 0000000000000000 R12: 0000000000000000
kernel: R13: 0c800bfffffffee0 R14: ffff88813b494500 R15: ffff88881e7037c4
kernel: FS:  000014a0958cde00(0000) GS:ffff88881e700000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000014a096abd9e0 CR3: 00000002250f4000 CR4: 0000000000340ee0
kernel: DR0: 00007ff73530c8e0 DR1: 00007ff73530c8e0 DR2: 00007ff73530c8e0

 

and

 

Quote

kernel: general protection fault: 0000 [#1] SMP NOPTI
kernel: CPU: 6 PID: 14491 Comm: curl Tainted: G        W  O      4.19.107-Unraid #1
kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470D4U2-2T, BIOS P3.30 10/03/2019
kernel: RIP: 0010:nf_nat_setup_info+0x365/0x666 [nf_nat]
kernel: Code: ed 75 23 45 8b 17 48 8d 7c 24 58 b9 0a 00 00 00 48 8d 74 24 30 f3 a5 41 f6 c2 01 0f 85 c4 00 00 00 e9 25 02 00 00 8a 44 24 56 <41> 38 45 46 74 15 4d 8b ad 98 00 00 00 4d 85 ed 74 c7 49 81 ed 98       
kernel: RSP: 0018:ffff88881e7836d8 EFLAGS: 00010202
kernel: RAX: ffff88841c647f11 RBX: ffffffff81e91080 RCX: 00000000a1ff25a9
kernel: RDX: ffff88879b480000 RSI: 0000000002fc0ed5 RDI: 000000007c87bd3c
kernel: RBP: ffff88881e7837b0 R08: ffff88881e783708 R09: ffffffff81c8aa80
kernel: R10: 0000000000000348 R11: 0000000000000000 R12: 0000000000000000
kernel: R13: 025a0a22736368da R14: ffff88813eb657c0 R15: ffff88881e7837c4
kernel: FS:  000014b2fa342700(0000) GS:ffff88881e780000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000014b2fa33fe40 CR3: 00000004a059c000 CR4: 0000000000340ee0
kernel: Call Trace:
kernel: <IRQ>
kernel: ? __krealloc+0x25/0x5d                           
kernel: ? nf_ct_ext_add+0x97/0xf6                        
kernel: nf_nat_masquerade_ipv4+0x123/0x14b [nf_nat_ipv4]
kernel: masquerade_tg+0x44/0x5e [ipt_MASQUERADE]         
kernel: ? __dev_queue_xmit+0x5ff/0x627                   
kernel: ipt_do_table+0x582/0x62a [ip_tables]             
kernel: ? ipt_do_table+0x5da/0x62a [ip_tables]           
kernel: nf_nat_inet_fn+0xeb/0x1b9 [nf_nat]               
kernel: nf_nat_ipv4_out+0xf/0x89 [nf_nat_ipv4]           
kernel: nf_hook_slow+0x3a/0x90                           
kernel: ip_output+0xab/0xdd                              
kernel: ? ip_fragment.constprop.0+0x7d/0x7d              
kernel: ip_forward+0x3c0/0x3ef                           
kernel: ? ipv4_frags_exit_net+0x2b/0x2b                  
kernel: ip_sabotage_in+0x38/0x3e                         
kernel: nf_hook_slow+0x3a/0x90                           
kernel: ip_rcv+0x8e/0xbe                                 
kernel: ? ip_rcv_finish_core.isra.0+0x2e1/0x2e1          
kernel: __netif_receive_skb_one_core+0x53/0x6f           
kernel: netif_receive_skb_internal+0x79/0x94             
kernel: br_pass_frame_up+0x128/0x14a                     
kernel: ? br_port_flags_change+0x29/0x29                 
kernel: br_handle_frame_finish+0x342/0x383               
kernel: ? br_pass_frame_up+0x14a/0x14a                   
kernel: br_nf_hook_thresh+0xa3/0xc3                      
kernel: ? br_pass_frame_up+0x14a/0x14a                   
kernel: br_nf_pre_routing_finish+0x24a/0x271             
kernel: ? br_pass_frame_up+0x14a/0x14a                   
kernel: ? br_handle_local_finish+0xe/0xe                 
kernel: ? nf_nat_ipv4_in+0x1e/0x62 [nf_nat_ipv4]         
kernel: ? br_handle_local_finish+0xe/0xe                 
kernel: br_nf_pre_routing+0x31c/0x343                    
kernel: ? br_nf_forward_ip+0x362/0x362                   
kernel: nf_hook_slow+0x3a/0x90                           
kernel: br_handle_frame+0x27e/0x2bd                      
kernel: ? br_pass_frame_up+0x14a/0x14a                   
kernel: __netif_receive_skb_core+0x4a7/0x7b1             
kernel: ? enqueue_task_fair+0xba/0x676                   
kernel: __netif_receive_skb_one_core+0x35/0x6f           
kernel: process_backlog+0x77/0x10e                       
kernel: net_rx_action+0x107/0x26c                        
kernel: __do_softirq+0xc9/0x1d7                          
kernel: do_softirq_own_stack+0x2a/0x40
kernel: do_softirq+0x4d/0x5a                             
kernel: __local_bh_enable_ip+0x42/0x4a                   
kernel: ip_finish_output2+0x30d/0x353                    
kernel: ip_output+0xbe/0xdd                              
kernel: ? ip_reply_glue_bits+0x36/0x36                   
kernel: ip_send_skb+0x10/0x32                            
kernel: udp_send_skb+0x26a/0x2cb                         
kernel: udp_sendmsg+0x5df/0x809                          
kernel: ? ip_reply_glue_bits+0x36/0x36                   
kernel: ? rw_copy_check_uvector+0x6d/0xf2                
kernel: ? import_iovec+0x6f/0xa3                         
kernel: ? copy_msghdr_from_user+0xf7/0x115               
kernel: ? sock_sendmsg+0x14/0x1e                         
kernel: sock_sendmsg+0x14/0x1e                           
kernel: ___sys_sendmsg+0x1b1/0x236                       
kernel: ? __alloc_pages_nodemask+0x150/0xae1             
kernel: ? __ip_dev_find+0x1e/0xc6                        
kernel: ? ip_route_output_key_hash_rcu+0x51a/0x65a       
kernel: ? ip4_datagram_release_cb+0x4e/0x1a5             
kernel: __sys_sendmmsg+0xfc/0x17b                        
kernel: ? __sys_connect+0x86/0xad                        
kernel: __x64_sys_sendmmsg+0x1b/0x1e                     
kernel: do_syscall_64+0x57/0xf2                          
kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9         
kernel: RIP: 0033:0x14b2fb8c3c5e                                                                                                                                                                                      
kernel: Code: 10 89 7c 24 0c 89 4c 24 1c e8 1e 3b f7 ff 44 8b 54 24 1c 8b 54 24 18 41 89 c0 48 8b 74 24 10 8b 7c 24 0c b8 33 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2c 44 89 c7 89 44 24 0c e8 4e 3b f7 ff 8b 44       
kernel: RSP: 002b:000014b2fa33fc30 EFLAGS: 00000293 ORIG_RAX: 0000000000000133
kernel: RAX: ffffffffffffffda RBX: 000000000e630009 RCX: 000014b2fb8c3c5e
kernel: RDX: 0000000000000002 RSI: 000014b2fa33fdd0 RDI: 0000000000000007
kernel: RBP: 000014b2fa33fd70 R08: 0000000000000000 R09: 0000000000000007
kernel: R10: 0000000000004000 R11: 0000000000000293 R12: 0000000000000000
kernel: R13: 000014b2fa33fda8 R14: 0000000000000000 R15: 0000000000000000
kernel: Modules linked in: vhost_net vhost tap kvm_amd ccp kvm macvlan xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables tun veth xt_nat ipt_MASQUERADE iptabl
kernel: ---[ end trace c0ec99ed8429dee3 ]---
kernel: RIP: 0010:nf_nat_setup_info+0x365/0x666 [nf_nat]
kernel: Code: ed 75 23 45 8b 17 48 8d 7c 24 58 b9 0a 00 00 00 48 8d 74 24 30 f3 a5 41 f6 c2 01 0f 85 c4 00 00 00 e9 25 02 00 00 8a 44 24 56 <41> 38 45 46 74 15 4d 8b ad 98 00 00 00 4d 85 ed 74 c7 49 81 ed 98       
kernel: RSP: 0018:ffff88881e7836d8 EFLAGS: 00010202  
kernel: RAX: ffff88841c647f11 RBX: ffffffff81e91080 RCX: 00000000a1ff25a9
kernel: RDX: ffff88879b480000 RSI: 0000000002fc0ed5 RDI: 000000007c87bd3c
kernel: RBP: ffff88881e7837b0 R08: ffff88881e783708 R09: ffffffff81c8aa80
kernel: R10: 0000000000000348 R11: 0000000000000000 R12: 0000000000000000
kernel: R13: 025a0a22736368da R14: ffff88813eb657c0 R15: ffff88881e7837c4
kernel: FS:  000014b2fa342700(0000) GS:ffff88881e780000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033   
kernel: CR2: 000014b2fa33fe40 CR3: 00000004a059c000 CR4: 0000000000340ee0   
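
To track how often this particular fault recurs, the oops headers in a saved syslog can be counted. A minimal sketch, assuming the log lines look like the ones quoted above; the function name is made up, and the log path you pass in (for example a copy of the syslog) is up to you. Each oops prints the faulting RIP twice, so the sketch only counts the first RIP line after each "general protection fault" header.

```shell
# Sketch: count general protection faults whose faulting function is
# nf_nat_setup_info. count_nf_nat_faults is an illustrative helper.
# Usage: count_nf_nat_faults LOGFILE
count_nf_nat_faults() {
  awk '
    /general protection fault/ { pending = 1; next }
    # Only the first kernel-mode RIP line after a fault header is examined.
    pending && /RIP: 0010:/ {
      if ($0 ~ /nf_nat_setup_info/) n++
      pending = 0
    }
    END { print n + 0 }
  ' "$1"
}
```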

 

 

I already did everything mentioned in this thread.

 

Added rcu_nocbs=0-15 to the boot options.

BIOS: set Power Supply Idle Control to Typical.

BIOS: C6 enabled (there were some misleading comments at first, but I think turning it off was not correct?).

Latest BIOS installed.

 

So, any other tips on what I can do?

 

Edited by corgan
Link to comment
  • 5 months later...

Did you ever make any headway on this? Saw something extremely similar today for me:

 

Short hardware list:

MB: ASRock B450 Pro4

CPU: AMD Ryzen 7 1800X

GPU1: Nvidia GeForce GTX 1650

 

My BIOS cstate = DISABLED

Slightly truncated kernel log from remote syslog:

general protection fault: 0000 [#1] SMP NOPTI
CPU: 3 PID: 21050 Comm: nzbget Tainted: P        W  O      4.19.107-Unraid #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450 Pro4, BIOS P4.20 06/18/2020
RIP: 0010:nf_nat_setup_info+0x365/0x666 [nf_nat]
Code: ed 75 23 45 8b 17 48 8d 7c 24 58 b9 0a 00 00 00 48 8d 74 24 30 f3 a5 41 f6 c2 01 0f 85 c4 00 00 00 e9 25 02 00 00 8a 44 24 56 <41> 38 45 46 74 15 4d 8b ad 98 00 00 00 4d 85 ed 74 c7 49 81 ed 98
hrtimer: interrupt took 2467865 ns
RSP: 0018:ffff88881e6c36d8 EFLAGS: 00010206
RAX: ffff88813f034906 RBX: ffffffff81e91080 RCX: 0000000086ee2de1
RDX: ffff8887d1180000 RSI: 0000000045680e5d RDI: 0000000087f6373f
RBP: ffff88881e6c37b0 R08: ffff88881e6c3708 R09: ffffffff81c8a6e0
R10: ffff8884c1518388 R11: 0000000000000000 R12: 0000000000000000
R13: 04fa37f2f9f462ee R14: ffff888050b64b40 R15: ffff88881e6c37c4
FS:  0000153f3737bb20(0000) GS:ffff88881e6c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000153f36cc6fd0 CR3: 00000005711c0000 CR4: 00000000003406e0
Call Trace: 
...

 

Link to comment

Actually no, I couldn't fix this. It happens sometimes once a week and sometimes once a month.

 

I wrote a little bash script that reads the IPMI event log from the ASRock board. If the kernel error appears, it power cycles the server through the BMC and clears the log.

Additionally, it sends a Slack notice.

That's the same way I would handle the error by hand.

This runs as a cron job on my Home Assistant Raspberry Pi. You need to install ipmitool.

You have to replace USER and PASSWORD in the script with the actual username and password.

 

ipmi_check.sh

#!/bin/bash
# Poll the BMC's IPMI System Event Log (SEL). If it has entries, notify Slack,
# power cycle the server on an "OS Critical Stop", then archive and clear the log.

# Replace USER and PASSWORD with the real BMC credentials.
ipmi=(/usr/bin/ipmitool -I lanplus -H 192.168.2.241 -U USER -P PASSWORD)

# "sel info" prints a line like "Entries          : 3"; keep only the number.
count=$("${ipmi[@]}" sel info | grep Entries | cut -d: -f2 | tr -d ' ')
datetime=$(date)
seconds=$(date +%s)

# Usage: sl_send OS_CRIT_COUNT KERNEL_PANIC_COUNT
function sl_send(){

log=$("${ipmi[@]}" sel list)

if [[ $1 -gt 0 ]]; then
  img="https://knaak.org/assets/img/icons/backup200x200.png"
  ti="OS Critical Stop"
  # The OS has stopped, so power cycle the machine through the BMC.
  "${ipmi[@]}" power cycle
fi

if [[ $2 -gt 0 ]]; then
  img="https://knaak.org/assets/img/icons/backup200x200.png"
  va="Kernel Panic"
fi

# Note: ti and va are set above but not used in the message below.
slack chat send \
  --actions '{"type": "button", "style": "primary", "text": "Check on HA", "url": "https://ha.knaak.work"}' \
  --author 'Homeassistant' \
  --author-icon 'https://knaak.org/assets/img/icons/warning.png' \
  --author-link 'https://192.168.2.241' \
  --channel '#officeknaak' \
  --color '#8B0000' \
  --fields '{"title": "", "value": "", "short": true}' \
  --footer 'footer' \
  --footer-icon 'https://knaak.org/assets/img/icons/info_red.png' \
  --image "$img" \
  --pretext "$datetime" \
  --text "$log" \
  --time "$seconds" \
  --title 'New IPMI Log' \
  --title-link 'https://github.com/rockymadden/slack-cli'
}


if [[ ${count:-0} -eq 0 ]]; then
  echo "SEL is empty"
elif [[ $count -gt 0 ]]; then
  "${ipmi[@]}" sel list >sel.log

  # Count matching SEL lines (grep -c prints 0 when nothing matches).
  os_crit_count=$(grep -c "OS Critical Stop" sel.log)
  kernel_panic_count=$(grep -c "kernel panic" sel.log)

  sl_send "$os_crit_count" "$kernel_panic_count"
  slack file upload sel.log '#officeknaak'
  rm sel.log
  # Keep a running archive of all entries, then clear the SEL on the BMC.
  "${ipmi[@]}" sel list >>sel_all.txt
  "${ipmi[@]}" sel clear
  # echo "$log"
fi
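
One possible variation, not part of the script above: ipmitool can read the password from the IPMI_PASSWORD environment variable when it is called with -E, which keeps the password out of the command line and the process list. A minimal sketch, where the wrapper name ipmi and the variables IPMITOOL, IPMI_HOST and IPMI_USER are made up for illustration:

```shell
# Sketch: one wrapper for all ipmitool calls. With -E, ipmitool takes the
# password from the IPMI_PASSWORD environment variable instead of -P.
# IPMITOOL, IPMI_HOST and IPMI_USER are illustrative names with defaults
# taken from the script in this post.
IPMITOOL=${IPMITOOL:-/usr/bin/ipmitool}
IPMI_HOST=${IPMI_HOST:-192.168.2.241}
IPMI_USER=${IPMI_USER:-USER}

ipmi() {
  "$IPMITOOL" -I lanplus -H "$IPMI_HOST" -U "$IPMI_USER" -E "$@"
}
```

With IPMI_PASSWORD exported in the cron environment, the calls become `ipmi sel info`, `ipmi sel list`, `ipmi power cycle` and `ipmi sel clear`.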

 

 

 

Link to comment
  • 2 months later...

Every time I search this issue, I come right back to this post. I MAY have found the solution and hope it may help someone else. 

 

I'll preface this first. I've been having this issue for months. On average I would crash every 3-4 days, and once I made it to 5 only to be let down again. At the time of writing I'm on my 10th stable day, which is no major milestone but looks promising. I made a few changes at once and I'm not sure which one fixed it, so I'll write them all down.

 

  1. I noticed we have the same switch, the Mikrotik CRS312-4C+8XG-RM. My first change was there: under the Link tab I disabled (unchecked) "Flow Control Tx/Rx" on all ports connected to Unraid. I don't know why this was enabled by default, but unless you need it, disable it.
  2. In Unraid I made changes to my network settings. From my research, the `nf_nat_setup_info` issue is network related; digging through reports that are not Unraid specific, it seems to have a wide range of causes.
    1. Settings > Network Settings
      1. My main network in Unraid was left mostly unchanged. I assigned a static IP in Unraid, matching the static mapping on my router. If you have VLANs on your main network, it seems that Unraid broadcasts the VLANs with the same MAC address, which can confuse your router.
      2. I set all VLANs on this network to not get an automatically assigned IP. Note: if you are using a bridge network on a VLAN, it has to have an IP.
        I followed a guide to set this up: https://staging.forums.unraid.net/topic/62107-network-isolation-in-unraid-64/?ct=1612387651
    2. Settings > Docker
      1. This is the one I truly believe may have fixed the issue. One of the things I've read is that the Docker service can assign containers IP addresses that are already in use by something else on your network, or something along those lines. So you should pick a range that your router doesn't hand out and have the Docker service run in that range. To do this, set "Enable Docker" to "No" and apply the change. Once your containers are stopped, make sure you're in advanced view. You'll see checkboxes for custom networks; for each one that is enabled, set a DHCP pool. For example, if on your router you set aside 192.168.1.128 to 192.168.1.159 on the network behind br0, you can write that range in CIDR notation as 192.168.1.128/27 (you can do CIDR conversions at https://www.ipaddressguide.com/cidr).
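
The CIDR arithmetic from the last step can also be checked locally instead of on a website. A minimal sketch in plain shell arithmetic; cidr_range is a made-up helper name, not an Unraid or Docker command, and it does no input validation.

```shell
# Sketch: print the first and last IPv4 address covered by a CIDR block.
# cidr_range is an illustrative helper; it does not validate its input.
cidr_range() {
  local ip="${1%/*}" prefix="${1#*/}"
  local a b c d addr mask first last
  # Split the dotted quad into its four octets.
  IFS=. read -r a b c d <<EOF
$ip
EOF
  addr=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  # Network mask with the top "prefix" bits set, limited to 32 bits.
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  first=$(( addr & mask ))
  last=$(( first | (~mask & 0xFFFFFFFF) ))
  printf '%d.%d.%d.%d %d.%d.%d.%d\n' \
    $(( (first >> 24) & 255 )) $(( (first >> 16) & 255 )) \
    $(( (first >> 8) & 255 ))  $(( first & 255 )) \
    $(( (last >> 24) & 255 ))  $(( (last >> 16) & 255 )) \
    $(( (last >> 8) & 255 ))   $(( last & 255 ))
}

cidr_range 192.168.1.128/27
```

The last line prints `192.168.1.128 192.168.1.159`, which matches the range used in the example above.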

Hopefully that makes sense. I figured I'd share in the hope that this is the solution and that it helps someone else out there. If I crash again I'll update this to let others know. If something above is not clear, let me know and I'll try to clarify. This is mostly new to me.

Edited by 5252525111
Link to comment

Nice hint about flow control, which was also enabled on my CRS312. I will try this.

 

I think you are absolutely right with your last point. I changed the IP range of the BR net away from my normal DHCP IP range some time ago.

My crashes went down to 1-2 per month, but I changed a lot at once and didn't figure out which change was the "one". But now that I'm reading your comment, this makes absolute sense!

Link to comment
  • 7 months later...

 

How's your fix holding up? I get this sporadically. certainly not every 3-4 days. 1-3 months perhaps.

 

I do not have the mentioned switch.

I have a static IP but don't use VLANs within unRAID nor on my network.

 

I don't really follow the fix though. I do have a br0 setup and some of my dockers connect that way to get a specific LAN IP of their own.

 

My docker br0 is 

IPv4 custom network on interface br0: Subnet: 192.168.99.0/24 Gateway: 192.168.99.1 DHCP pool: not set

 

 

That's my LAN too. Is the suggested fix to have docker restricted to a sub range of the full /24 and then for each docker that needs it, only use IPs within that range?

 

My DHCP range for my LAN is 192.168.99.100-192.168.99.255. Below .100 I reserve for static assigned IPs and that's where my dockers that have their own IP run from.

 

@corgan if you're still getting the problem does that not mean you didn't really fix it though?

Edited by Shonky
Link to comment
  • 2 weeks later...
On 9/26/2021 at 1:11 AM, Shonky said:

How's your fix holding up? I get this sporadically. certainly not every 3-4 days. 1-3 months perhaps.

 

So far it's holding up well. I'm having new issues with NFS but don't think that's related to this at all. Long story short, no more panics. 

 

On 9/26/2021 at 1:11 AM, Shonky said:

Is the suggested fix to have docker restricted to a sub range of the full /24 and then for each docker that needs it, only use IPs within that range?

 

That was one of the main fixes for me. I used a range outside my DHCP and reserve that for containers on unraid. If I recall correctly since I currently don't have access to my system, 100-223 is my LAN range, 50-99 I used for static and 224-255 (192.168.99.224/27) I reserve for containers on unraid. 
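
The address plan above only helps if the pools really are disjoint, which is easy to sanity-check. A minimal sketch, assuming everything lives in one /24 so that comparing last octets is enough; ranges_overlap and the variable names are made up for illustration.

```shell
# Sketch: check that the container pool does not overlap the DHCP pool.
# Ranges are inclusive last octets within one /24; ranges_overlap is an
# illustrative helper name.
ranges_overlap() {
  local start1="$1" end1="$2" start2="$3" end2="$4"
  # Two inclusive ranges overlap when each one starts before the other ends.
  [ "$start1" -le "$end2" ] && [ "$start2" -le "$end1" ]
}

dhcp_start=100; dhcp_end=223    # router DHCP pool
pool_start=224; pool_end=255    # 192.168.99.224/27, reserved for containers

if ranges_overlap "$dhcp_start" "$dhcp_end" "$pool_start" "$pool_end"; then
  echo "container pool overlaps the DHCP pool"
else
  echo "container pool is outside the DHCP pool"
fi
```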

Link to comment
On 10/7/2021 at 11:24 PM, 5252525111 said:

So far it's holding up well. I'm having new issues with NFS but don't think that's related to this at all. Long story short, no more panics. 

 

That was one of the main fixes for me. I used a range outside my DHCP and reserve that for containers on unraid. If I recall correctly since I currently don't have access to my system, 100-223 is my LAN range, 50-99 I used for static and 224-255 (192.168.99.224/27) I reserve for containers on unraid. 

 

Ok well, at the risk of bursting your bubble, that's how mine is set up anyway and I still had the problem. I have a /24 LAN. 1-99 are static, which I just assign manually. Some are dockers, some are things like routers/printers. DHCP is 100+. That's the way it's always been and kind of has to be really. If you have static IPs in the middle of a DHCP server's range you're going to have a bad time (tm) at some point. Putting dockers above or below the DHCP range makes no difference.

Link to comment
  • 4 months later...

I have the same issue over and over again and I can't find the culprit.

I have Ubiquiti switches and I can't find the flow control settings on them in the first place.

None of the other suggestions apply to my situation.

My LAN subnet is completely different from the default Docker network subnets, so that is unlikely to be the cause either.

Any suggestions?

 


Link to comment

I came across this solution separately, and then found this other thread just now. It seems like it could be a possible solution. My router (pfSense) was complaining about one IP having two different MAC addresses: the real hardware and a virtual interface called something like br0-shim. I presume the virtual interface also responds to ARP requests, so packets for one IP end up coming in on two different network interfaces.
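
The same symptom can be looked for from a Linux machine on the LAN by checking its neighbour table for an IP that shows up with more than one MAC address. A minimal sketch, assuming input in the format printed by `ip neigh show` (IP first, MAC after the word lladdr); the function name is made up for illustration.

```shell
# Sketch: list IPs that appear with more than one MAC address.
# Reads "ip neigh show" style lines on stdin; ips_with_multiple_macs is an
# illustrative helper name.
ips_with_multiple_macs() {
  awk '
    {
      mac = ""
      for (i = 1; i < NF; i++) if ($i == "lladdr") mac = $(i + 1)
      if (mac == "") next                 # entries without a resolved MAC
      key = $1 SUBSEP mac
      if (!(key in seen)) { seen[key] = 1; count[$1]++ }
    }
    END { for (ip in count) if (count[ip] > 1) print ip }
  ' | sort
}
```

Typical use would be `ip neigh show | ips_with_multiple_macs`; any address it prints is worth comparing against the router's own ARP log.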

 

 

Link to comment
