kernel: WARNING: CPU: 6 PID - Server Crashes shortly after. (SOLVED)

gbozarth · April 23, 2021

Hi All,

Help Needed!

My server has been randomly crashing ever since my upgrade to 6.9.2. Below is some log info, and I have attached the full diagnostics below. Any help or ideas would be appreciated and are very much needed. I can't keep running 15 disks 18+ hours continuously.

Apr 21 15:25:56 Pegasus kernel: ------------[ cut here ]------------
Apr 21 15:25:56 Pegasus kernel: WARNING: CPU: 6 PID: 0 at net/netfilter/nf_conntrack_core.c:1120 __nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]
Apr 21 15:25:56 Pegasus kernel: Modules linked in: veth macvlan xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter md4 sha512_ssse3 sha512_generic cmac cifs libarc4 xfs nfsd lockd grace sunrpc md_mod ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding mlx4_en mlx4_core igb i2c_algo_bit sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mxm_wmi aesni_intel crypto_simd cryptd ipmi_ssif glue_helper rapl isci mpt3sas intel_cstate libsas acpi_ipmi raid_class intel_uncore ahci scsi_transport_sas i2c_i801 input_leds i2c_smbus libahci i2c_core led_class wmi ipmi_si button [last unloaded: mlx4_core]
Apr 21 15:25:56 Pegasus kernel: CPU: 6 PID: 0 Comm: swapper/6 Not tainted 5.10.28-Unraid #1
Apr 21 15:25:56 Pegasus kernel: Hardware name: Penguin Computing Icebreaker 4824/X9DR3-F, BIOS 3.2a 07/09/2015
Apr 21 15:25:56 Pegasus kernel: RIP: 0010:__nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]
Apr 21 15:25:56 Pegasus kernel: Code: e8 dc f8 ff ff 44 89 fa 89 c6 41 89 c4 48 c1 eb 20 89 df 41 89 de e8 36 f6 ff ff 84 c0 75 bb 48 8b 85 80 00 00 00 a8 08 74 18 <0f> 0b 89 df 44 89 e6 31 db e8 6d f3 ff ff e8 35 f5 ff ff e9 22 01
Apr 21 15:25:56 Pegasus kernel: RSP: 0018:ffffc900002d08a0 EFLAGS: 00010202
Apr 21 15:25:56 Pegasus kernel: RAX: 0000000000000188 RBX: 00000000000041c4 RCX: 00000000eadc3d22
Apr 21 15:25:56 Pegasus kernel: RDX: 0000000000000000 RSI: 0000000000000012 RDI: ffffffffa050c910
Apr 21 15:25:56 Pegasus kernel: RBP: ffff88822c452a00 R08: 00000000bdba9a8b R09: 0000000000000000
Apr 21 15:25:56 Pegasus kernel: R10: 0000000000000158 R11: ffff888258215400 R12: 0000000000003c12
Apr 21 15:25:56 Pegasus kernel: R13: ffffffff8210b440 R14: 00000000000041c4 R15: 0000000000000000
Apr 21 15:25:56 Pegasus kernel: FS: 0000000000000000(0000) GS:ffff88885fb80000(0000) knlGS:0000000000000000
Apr 21 15:25:56 Pegasus kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 21 15:25:56 Pegasus kernel: CR2: 000055f06d2ce7b0 CR3: 000000000600a002 CR4: 00000000000606e0
Apr 21 15:25:56 Pegasus kernel: Call Trace:

Server Info:

Supermicro 4U CSE-846 24 Bay SAS2 BP

Unraid server Pro, version 6.9.2

Supermicro - X9DR3-F

Intel® Xeon® CPU E5-2670

32 GB DDR3 Multi-bit ECC

Eth2 10GB Ethernet Card

Thanks in advance for your help!

Gary

pegasus-diagnostics-20210421-1756.zip pegasus-syslog-20210421-1754.zip

JorgeB · April 23, 2021

Likely related to this, see if it applies to you:

https://forums.unraid.net/topic/70529-650-call-traces-when-assigning-ip-address-to-docker-containers/

See also here:

https://forums.unraid.net/bug-reports/stable-releases/690691-kernel-panic-due-to-netfilter-nf_nat_setup_info-docker-static-ip-macvlan-r1356/

gbozarth · April 23, 2021

Thank you sir. I'll take a look and let you know what I found and did.

I have stopped all Docker containers that use either br0 or host networking, with the exception of the ones below, as I've been running them for well over 1.5 years with no issues.

UniFI br0

Pi-Hole br0

Duckdns Host

Glances Host

Hope that solves the problem.

Edited April 23, 2021 by gbozarth

gbozarth · April 24, 2021

Hi, All,

Latest Update:

12:20:00 AM

Event: Unraid Status

Subject: Notice [PEGASUS] - array health report [PASS]

Description: Array has 15 disks (including parity & cache)

Importance: normal

Parity - ST12000VN0007-2GS116_ZJV4YFYA (sdg) - active 86 F [OK]

Parity 2 - WDC_WD120EMFZ-11A6JA0_9JH3VDWT (sdl) - active 91 F [OK]

Disk 1 - WDC_WD30EFRX-68EUZN0_WD-WCC4N0SH4NLZ (sdb) - active 82 F [OK]

Disk 2 - WDC_WD30EFRX-68EUZN0_WD-WCC4N0XUA4DV (sde) - active 81 F [OK]

Disk 3 - WDC_WD80EMAZ-00WJTA0_2SG3JNMJ (sdf) - active 90 F [OK]

Disk 4 - ST2000VN004-2E4164_Z528MMXJ (sdd) - active 75 F [OK]

Disk 5 - WDC_WD100EMAZ-00WJTA0_2YKBGPED (sdk) - active 93 F [OK]

Disk 6 - WDC_WD120EMFZ-11A6JA0_9JH3TV9T (sdj) - active 95 F [OK]

Disk 7 - WDC_WD100EMAZ-00WJTA0_2YK2503D (sdi) - active 91 F [OK]

Disk 8 - WDC_WD120EMFZ-11A6JA0_9JKNYR7T (sdo) - active 88 F [OK]

Disk 9 - WDC_WD120EMFZ-11A6JA0_9JH0MGVT (sdn) - active 90 F [OK]

Appdatassd - Samsung_SSD_870_EVO_500GB_S62ANJ0R150794Z (sdp) - active 68 F [OK]

Cache - Samsung_SSD_860_EVO_500GB_S59UNJ0MC03069W (sdc) - active 75 F [OK]

Cache 2 - Samsung_SSD_860_EVO_500GB_S598NE0M870240P (sdh) - active 77 F [OK]

Newsnet - ST1000VN002-2EY102_Z9CCNMA7 (sdm) - active 75 F [OK]

Parity check in progress.

Total size: 12 TB

Elapsed time: 14 hours, 54 minutes

Current position: 5.01 TB (41.7 %)

Estimated speed: 100.5 MB/sec

Estimated finish: 19 hours, 20 minutes

Sync errors corrected: 0

Server Crashed around 3:00:00 AM

Error Message from console

Kernel panic – not syncing: Fatal exception in interrupt

Kernel Offset: disabled

---[ end Kernel panic – not syncing: Fatal exception in interrupt ]---

Server Event Logs.

Event,Type,Timestamp,Sensor Type,Sensor,Event Type
1,System Event,2021/04/24 00:29:22 Sat,OS Stop Shutdown,FAN6,Assertion: OS Stop Shutdown| Event = Run-time Stop (a.k.a. core dump, blue screen)
2,System Event,2021/09/18 15:05:04 Sat,OS Stop Shutdown,#0x65,OEM
3,System Event,2026/01/23 23:15:44 Fri,undefined,#0x6E,reserved
4,System Event,2030/10/26 05:18:24 Sat,undefined,#0x00,unspecified

I have attached a screenshot of the terminal below.

I rebooted the server and stopped the parity check. I then deleted the Docker image, switched to a directory, and recreated the needed containers, but only the ones that use bridge for networking. I disabled Docker and rebooted the server; this time it came up saying parity is valid. I rebooted again in an attempt to go full cycle and ensure I had a clean prior shutdown, and the server came up normally. I then manually ran the mover and, once it was done, started a parity check.

Time: 6:06:00 AM

Event: Unraid Parity check

Subject: Notice [PEGASUS] - Parity check started

Description: Size: 12.0 TB

Importance: warning

Not sure what the next step is but the server is again up with another parity check in progress. Maybe this time it will stay up?

I've also been thinking of downgrading, since this issue started occurring after my initial upgrade to 6.9.1.

Thanks, and I'm looking forward to next steps if needed.

syslog.txt vfio-pci.txt network.cfg network-rules.cfg docker.cfg

Edited April 24, 2021 by gbozarth
Added additional log info to post

JorgeB · April 25, 2021

On 4/23/2021 at 5:08 PM, gbozarth said:

with the exception of below as I've been running them for well over 1.5 years with no issues.

Many users didn't have issues running them with v6.8.x but started having issues with 6.9.x, so you'd need to stop all of them to test.
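To see which running containers would need to be stopped for such a test, something like the following rough sketch may help. It assumes the custom br0 network uses Docker's `macvlan` driver (as the linked bug report title suggests) and that you feed it the JSON printed by `docker network inspect`. The function name `containers_on_macvlan` and the sample data are made up for illustration.

```python
import json

# Made-up sample in the shape printed by `docker network inspect`;
# the network and container names are illustrative only.
SAMPLE = """
[
  {"Name": "br0", "Driver": "macvlan",
   "Containers": {"a1": {"Name": "pihole"}, "b2": {"Name": "unifi"}}},
  {"Name": "bridge", "Driver": "bridge",
   "Containers": {"c3": {"Name": "glances"}}},
  {"Name": "host", "Driver": "host", "Containers": {}}
]
"""


def containers_on_macvlan(inspect_json):
    """Return sorted names of containers attached to macvlan networks."""
    names = []
    for network in json.loads(inspect_json):
        # Skip bridge, host and any other non-macvlan networks.
        if network.get("Driver") != "macvlan":
            continue
        # "Containers" maps container IDs to details of attached containers.
        for info in (network.get("Containers") or {}).values():
            names.append(info.get("Name", "<unnamed>"))
    return sorted(names)


if __name__ == "__main__":
    print(containers_on_macvlan(SAMPLE))
```

On a real server the input could come from `docker network inspect $(docker network ls -q)`. Note that the `Containers` section only lists containers currently attached to the network, so stopped containers will not show up.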

gbozarth · April 26, 2021

Hi All,

The parity check finished successfully last night. I then did a system reboot and started a few Docker containers, only those that use bridge for networking. This morning the server was still up and running, but I did notice the following error.

Apr 26 00:50:05 Pegasus kernel: WARNING: CPU: 2 PID: 28059 at net/netfilter/nf_conntrack_core.c:1120 __nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]

Apr 26 00:50:05 Pegasus kernel: Modules linked in: xt_nat xt_tcpudp veth macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink.

I’ll attach the log files below. I’m assuming that since this was only a kernel warning message, the system didn’t crash. I then turned off auto start for 2 Docker containers that were previously running and rebooted the server. The server came up normally. At this point I’ll just leave what’s running and see what happens.
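For anyone wanting to check their own syslog for the same warning, here is a minimal sketch. It assumes the traditional syslog line format seen in the excerpts above and only looks for `kernel: WARNING:` lines that mention `__nf_conntrack_confirm`. The function name `find_conntrack_warnings` is hypothetical, not part of any Unraid tool.

```python
import re

# Matches lines such as:
# "Apr 26 00:50:05 Pegasus kernel: WARNING: ... __nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]"
WARNING_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\skernel:\sWARNING:.*__nf_conntrack_confirm"
)


def find_conntrack_warnings(lines):
    """Return (timestamp, hostname) pairs for each conntrack WARNING line."""
    hits = []
    for line in lines:
        match = WARNING_RE.match(line)
        if match:
            hits.append((match.group("stamp"), match.group("host")))
    return hits


if __name__ == "__main__":
    # Path is an assumption; point it at wherever your syslog copy lives.
    with open("syslog.txt", errors="replace") as handle:
        for stamp, host in find_conntrack_warnings(handle):
            print(stamp, host)
```

Only the `WARNING:` header line of each trace is counted, so the `RIP:` line that repeats the function name does not produce a duplicate hit.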

I’ll continue to report progress in a day or so assuming the server stays up.

Thanks everyone so far for your help and suggestions on this issue.

syslog.txt docker.txt

gbozarth · May 2, 2021

Hi All,

I'd like to report that my server has been up for nearly 6 days now and the last parity check finished. As of right now I'm only running 5 Docker containers, all of which use only the bridge network. As for the other Docker containers that I was running prior to the OS upgrade to 6.9.2, I'm in the process of migrating them to a Docker for Windows environment. That machine has an Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz, 3600 MHz, 8 Core(s), 16 Logical Processor(s) with 64.0 GB of RAM, which is much better suited hardware than my main storage array.

I just wanted to thank everyone for all their help with this issue.

Thanks again.

Gary
