gbozarth
Posted April 23, 2021

Hi All,

Help needed! My server has been crashing randomly ever since my upgrade to 6.9.2. Some log info is below, and I have attached the full diagnostics. Any help or ideas would be very much appreciated. I can't keep running 15 disks for 18+ hours continuously.

Apr 21 15:25:56 Pegasus kernel: ------------[ cut here ]------------
Apr 21 15:25:56 Pegasus kernel: WARNING: CPU: 6 PID: 0 at net/netfilter/nf_conntrack_core.c:1120 __nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]
Apr 21 15:25:56 Pegasus kernel: Modules linked in: veth macvlan xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter md4 sha512_ssse3 sha512_generic cmac cifs libarc4 xfs nfsd lockd grace sunrpc md_mod ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding mlx4_en mlx4_core igb i2c_algo_bit sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mxm_wmi aesni_intel crypto_simd cryptd ipmi_ssif glue_helper rapl isci mpt3sas intel_cstate libsas acpi_ipmi raid_class intel_uncore ahci scsi_transport_sas i2c_i801 input_leds i2c_smbus libahci i2c_core led_class wmi ipmi_si button [last unloaded: mlx4_core]
Apr 21 15:25:56 Pegasus kernel: CPU: 6 PID: 0 Comm: swapper/6 Not tainted 5.10.28-Unraid #1
Apr 21 15:25:56 Pegasus kernel: Hardware name: Penguin Computing Icebreaker 4824/X9DR3-F, BIOS 3.2a 07/09/2015
Apr 21 15:25:56 Pegasus kernel: RIP: 0010:__nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]
Apr 21 15:25:56 Pegasus kernel: Code: e8 dc f8 ff ff 44 89 fa 89 c6 41 89 c4 48 c1 eb 20 89 df 41 89 de e8 36 f6 ff ff 84 c0 75 bb 48 8b 85 80 00 00 00 a8 08 74 18 <0f> 0b 89 df 44 89 e6 31 db e8 6d f3 ff ff e8 35 f5 ff ff e9 22 01
Apr 21 15:25:56 Pegasus kernel: RSP: 0018:ffffc900002d08a0 EFLAGS: 00010202
Apr 21 15:25:56 Pegasus kernel: RAX: 0000000000000188 RBX: 00000000000041c4 RCX: 00000000eadc3d22
Apr 21 15:25:56 Pegasus kernel: RDX: 0000000000000000 RSI: 0000000000000012 RDI: ffffffffa050c910
Apr 21 15:25:56 Pegasus kernel: RBP: ffff88822c452a00 R08: 00000000bdba9a8b R09: 0000000000000000
Apr 21 15:25:56 Pegasus kernel: R10: 0000000000000158 R11: ffff888258215400 R12: 0000000000003c12
Apr 21 15:25:56 Pegasus kernel: R13: ffffffff8210b440 R14: 00000000000041c4 R15: 0000000000000000
Apr 21 15:25:56 Pegasus kernel: FS: 0000000000000000(0000) GS:ffff88885fb80000(0000) knlGS:0000000000000000
Apr 21 15:25:56 Pegasus kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 21 15:25:56 Pegasus kernel: CR2: 000055f06d2ce7b0 CR3: 000000000600a002 CR4: 00000000000606e0
Apr 21 15:25:56 Pegasus kernel: Call Trace:

Server Info:
Supermicro 4U CSE-846 24 Bay SAS2 BP
Unraid server Pro, version 6.9.2
Supermicro - X9DR3-F
Intel® Xeon® CPU E5-2670
32 GB DDR3 Multi-bit ECC
Eth2 10GB Ethernet Card

Thanks in advance for your help!

Gary

pegasus-diagnostics-20210421-1756.zip
pegasus-syslog-20210421-1754.zip
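For readers who want to check their own syslog for the same symptom, here is a minimal Python sketch that picks out the `__nf_conntrack_confirm` warning and notes whether `macvlan` appears in the loaded-module list. The pattern is built only from the log lines quoted in this post; the function and class names are hypothetical, and other kernel versions may format these lines differently.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class ConntrackWarning(NamedTuple):
    timestamp: str
    host: str
    cpu: int
    pid: int


# Built from the WARNING line quoted in the post above (an assumption:
# other kernels may word or order this line differently).
WARNING_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"net/netfilter/nf_conntrack_core\.c:\d+ __nf_conntrack_confirm"
)


def find_conntrack_warnings(lines: Iterable[str]) -> Iterator[ConntrackWarning]:
    """Yield one record per __nf_conntrack_confirm WARNING line."""
    for line in lines:
        m = WARNING_RE.match(line)
        if m:
            yield ConntrackWarning(m["ts"], m["host"], int(m["cpu"]), int(m["pid"]))


def macvlan_loaded(lines: Iterable[str]) -> bool:
    """Return True if any 'Modules linked in:' line lists macvlan."""
    for line in lines:
        _, sep, modules = line.partition("Modules linked in:")
        if sep and "macvlan" in modules.split():
            return True
    return False


# Two lines shortened from the log in the post, used as sample input.
SAMPLE = [
    "Apr 21 15:25:56 Pegasus kernel: WARNING: CPU: 6 PID: 0 at "
    "net/netfilter/nf_conntrack_core.c:1120 "
    "__nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]",
    "Apr 21 15:25:56 Pegasus kernel: Modules linked in: veth macvlan xt_nat",
]

if __name__ == "__main__":
    for w in find_conntrack_warnings(SAMPLE):
        print(w)
    print("macvlan loaded:", macvlan_loaded(SAMPLE))
```

To scan a real file, pass an open file object instead of `SAMPLE`, for example `find_conntrack_warnings(open("syslog.txt"))`.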
JorgeB
Posted April 23, 2021

Likely related to this, see if it applies to you:
https://forums.unraid.net/topic/70529-650-call-traces-when-assigning-ip-address-to-docker-containers/

See also here:
https://forums.unraid.net/bug-reports/stable-releases/690691-kernel-panic-due-to-netfilter-nf_nat_setup_info-docker-static-ip-macvlan-r1356/
gbozarth (Author)
Posted April 23, 2021

Thank you, sir. I'll take a look and let you know what I find and do.

I have stopped all Docker containers that use either br0 or host networking, with the exception of those below, as I've been running them for well over 1.5 years with no issues.

UniFi - br0
Pi-hole - br0
DuckDNS - host
Glances - host

Hope that solves the problem.

Edited April 23, 2021 by gbozarth
gbozarth (Author)
Posted April 24, 2021

Hi All,

Latest update:

12:20:00 AM
Event: Unraid Status
Subject: Notice [PEGASUS] - array health report [PASS]
Description: Array has 15 disks (including parity & cache)
Importance: normal

Parity - ST12000VN0007-2GS116_ZJV4YFYA (sdg) - active 86 F [OK]
Parity 2 - WDC_WD120EMFZ-11A6JA0_9JH3VDWT (sdl) - active 91 F [OK]
Disk 1 - WDC_WD30EFRX-68EUZN0_WD-WCC4N0SH4NLZ (sdb) - active 82 F [OK]
Disk 2 - WDC_WD30EFRX-68EUZN0_WD-WCC4N0XUA4DV (sde) - active 81 F [OK]
Disk 3 - WDC_WD80EMAZ-00WJTA0_2SG3JNMJ (sdf) - active 90 F [OK]
Disk 4 - ST2000VN004-2E4164_Z528MMXJ (sdd) - active 75 F [OK]
Disk 5 - WDC_WD100EMAZ-00WJTA0_2YKBGPED (sdk) - active 93 F [OK]
Disk 6 - WDC_WD120EMFZ-11A6JA0_9JH3TV9T (sdj) - active 95 F [OK]
Disk 7 - WDC_WD100EMAZ-00WJTA0_2YK2503D (sdi) - active 91 F [OK]
Disk 8 - WDC_WD120EMFZ-11A6JA0_9JKNYR7T (sdo) - active 88 F [OK]
Disk 9 - WDC_WD120EMFZ-11A6JA0_9JH0MGVT (sdn) - active 90 F [OK]
Appdatassd - Samsung_SSD_870_EVO_500GB_S62ANJ0R150794Z (sdp) - active 68 F [OK]
Cache - Samsung_SSD_860_EVO_500GB_S59UNJ0MC03069W (sdc) - active 75 F [OK]
Cache 2 - Samsung_SSD_860_EVO_500GB_S598NE0M870240P (sdh) - active 77 F [OK]
Newsnet - ST1000VN002-2EY102_Z9CCNMA7 (sdm) - active 75 F [OK]

Parity check in progress.
Total size: 12 TB
Elapsed time: 14 hours, 54 minutes
Current position: 5.01 TB (41.7 %)
Estimated speed: 100.5 MB/sec
Estimated finish: 19 hours, 20 minutes
Sync errors corrected: 0

The server crashed around 3:00:00 AM.

Error message from the console:

Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Server event logs:

Event,Type,Timestamp,Sensor Type,Sensor,Event Type
1,System Event,2021/04/24 00:29:22 Sat,OS Stop Shutdown,FAN6,Assertion: OS Stop Shutdown| Event = Run-time Stop (a.k.a. core dump, blue screen)
2,System Event,2021/09/18 15:05:04 Sat,OS Stop Shutdown,#0x65,OEM
3,System Event,2026/01/23 23:15:44 Fri,undefined,#0x6E,reserved
4,System Event,2030/10/26 05:18:24 Sat,undefined,#0x00,unspecified

I have attached a screenshot of the terminal below.

What I did next:

I rebooted the server and stopped the parity check.
Deleted the Docker image and switched to a directory.
Recreated the needed Docker containers, but only the ones using bridge networking.
Disabled Docker and rebooted the server. This time the server came up saying parity is valid.
Rebooted the server again to go through a full cycle and ensure I had a clean prior shutdown. The server came up normally.
I then manually ran the mover. Once it was done, I started a parity check.

Time: 6:06:00 AM
Event: Unraid Parity check
Subject: Notice [PEGASUS] - Parity check started
Description: Size: 12.0 TB
Importance: warning

I'm not sure what the next step is, but the server is up again with another parity check in progress. Maybe this time it will stay up? I've also been thinking of downgrading, since this issue started occurring after my initial upgrade to 6.9.1.

Thanks, and looking forward to next steps if needed.

syslog.txt
vfio-pci.txt
network.cfg
network-rules.cfg
docker.cfg

Edited April 24, 2021 by gbozarth: Added additional log info to post
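As a sanity check on the parity-check figures in the notification above, the arithmetic can be sketched in a few lines of Python. This assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), which may not be exactly how Unraid computes its estimate; the function names are hypothetical.

```python
def remaining_time(total_tb: float, position_tb: float, speed_mb_s: float):
    """Return (hours, minutes) left at the given speed, assuming decimal units."""
    remaining_bytes = (total_tb - position_tb) * 1e12
    seconds = int(remaining_bytes / (speed_mb_s * 1e6))
    hours, leftover = divmod(seconds, 3600)
    return hours, leftover // 60


def average_speed_mb_s(position_tb: float, elapsed_hours: int, elapsed_minutes: int) -> float:
    """Average speed so far in MB/s, assuming decimal units."""
    elapsed_seconds = elapsed_hours * 3600 + elapsed_minutes * 60
    return position_tb * 1e12 / elapsed_seconds / 1e6


if __name__ == "__main__":
    # Figures from the notification: 12 TB total, 5.01 TB done, 100.5 MB/sec.
    print(remaining_time(12.0, 5.01, 100.5))           # (19, 19)
    print(round(average_speed_mb_s(5.01, 14, 54), 1))  # 93.4
```

The remaining time works out to about 19 hours 19 minutes, within a minute of the reported 19 hours 20 minutes, and the average speed over the first 14 hours 54 minutes is about 93.4 MB/sec. A full pass at these speeds takes well over 30 hours, so every crash that forces a new parity check is expensive.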
JorgeB
Posted April 25, 2021

On 4/23/2021 at 5:08 PM, gbozarth said:
    with the exception of below as I've been running them for well over 1.5 years with no issues.

Many users had no issues running them with v6.8.x but started having issues with v6.9.x, so you'd need to stop all of them to test.
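To find which running containers would need to be stopped for this test, one option is to list every container with its network and keep those that are not on the default bridge. Below is a minimal Python sketch, assuming the Docker CLI is on the PATH and that `docker ps --format` prints the tab-separated names and networks requested. The helper names are hypothetical, and the sample uses the four containers listed earlier in the thread plus one made-up bridge container.

```python
import subprocess
from typing import Dict, List


def parse_networks(ps_output: str) -> Dict[str, List[str]]:
    """Parse 'name<TAB>net1,net2' lines into {name: [networks]}."""
    result: Dict[str, List[str]] = {}
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, networks = line.partition("\t")
        result[name.strip()] = [n.strip() for n in networks.split(",") if n.strip()]
    return result


def non_bridge_containers(ps_output: str) -> List[str]:
    """Names of containers attached to anything other than 'bridge'."""
    parsed = parse_networks(ps_output)
    return sorted(
        name for name, nets in parsed.items() if any(n != "bridge" for n in nets)
    )


def running_containers() -> str:
    """Ask the Docker CLI for running containers and their networks."""
    completed = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Networks}}"],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout


# Sample input: the four containers named in the thread, plus a
# hypothetical container called "example-app" on the default bridge.
SAMPLE = "UniFi\tbr0\nPi-hole\tbr0\nDuckDNS\thost\nGlances\thost\nexample-app\tbridge\n"

if __name__ == "__main__":
    print(non_bridge_containers(SAMPLE))
```

On a live server, `non_bridge_containers(running_containers())` would give the list to stop before retesting.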
gbozarth (Author)
Posted April 26, 2021

Hi All,

The parity check finished successfully last night. I then did a system reboot and started a few Docker containers, only those that use bridge networking. This morning the server was still up and running, but I did notice the following error:

Apr 26 00:50:05 Pegasus kernel: WARNING: CPU: 2 PID: 28059 at net/netfilter/nf_conntrack_core.c:1120 __nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]
Apr 26 00:50:05 Pegasus kernel: Modules linked in: xt_nat xt_tcpudp veth macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink.

I'll attach the log files below. I'm assuming that since this was only a kernel warning message, the system didn't crash.

I then turned off autostart for 2 Docker containers that were previously running and rebooted the server. The server came up normally. At this point I'll just leave what's running and see what happens. I'll report progress in a day or so, assuming the server stays up.

Thanks, everyone, for your help and suggestions on this issue so far.

syslog.txt
docker.txt
gbozarth (Author)
Posted May 2, 2021

Hi All,

I'd like to report that my server has been up for nearly 6 days now and the last parity check finished. As of right now I'm running only 5 Docker containers, all of which use only the bridge network.

As for the other Docker containers that I was running prior to the OS upgrade to 6.9.2, I'm in the process of migrating them to a Docker for Windows environment. That machine has an Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz, 3600 MHz, 8 Core(s), 16 Logical Processor(s) with 64.0 GB of RAM, which is much better suited hardware than my main storage array.

I just wanted to thank everyone for all their help with this issue.

Thanks again.

Gary