Unraid 6.8.2 crashing/kernel panic


tehtide


It appears that I'm having sporadic crashing/kernel panic issues on my 6.8.2 installation. This is a Dell R530.

There was no response to ping, no ability to SSH in, and the console was locked up. I took a picture of the screen before powering the server off and back on.

 

I ran the Dell diagnostics and nothing came back as bad in the hardware.

 

Any logs or anything I can enable to capture the full panic for help in troubleshooting? Anything I can post after a reboot to capture the issue?
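One way to keep log lines that would otherwise be lost on a hard lockup is to send them to a second machine as they are written. The following is a minimal sketch of a UDP syslog receiver, assuming the server has been configured to forward its syslog over UDP to the machine running this script. The `remote-logs` directory, the `append_message` helper, and the per-sender file name are hypothetical choices for illustration, not part of Unraid.

```python
import socketserver
from pathlib import Path

# Hypothetical destination directory on the receiving machine.
LOG_DIR = Path("remote-logs")


def append_message(log_dir: Path, sender_ip: str, payload: bytes) -> Path:
    """Append one syslog datagram to a per-sender file and return its path."""
    log_dir.mkdir(parents=True, exist_ok=True)
    target = log_dir / f"syslog-{sender_ip}.log"
    line = payload.decode("utf-8", errors="replace").rstrip("\r\n")
    # Open, write, and close for every datagram so each line reaches disk
    # promptly, even if the sender locks up right afterwards.
    with target.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return target


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self) -> None:
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        append_message(LOG_DIR, self.client_address[0], data)


if __name__ == "__main__":
    # 514 is the conventional syslog UDP port; binding to it usually needs root.
    with socketserver.UDPServer(("0.0.0.0", 514), SyslogHandler) as server:
        server.serve_forever()
```

Even with this running, the last few lines before a hard lockup may never leave the server, so a photo of the console is still worth taking.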

IMG_20200214_175931.thumb.jpg.971298cf07f68981f51868387e7364ae.jpg 

  • 2 weeks later...
5 minutes ago, tehtide said:

Just crashed again overnight. Kernel panic.

 

Attached is the syslog. I rebooted the server around 9:00 AM, but there doesn't seem to be anything in there. I've also included the diagnostics file.

coruscant-diagnostics-20200225-0932.zip
syslog-10.10.20.10.log

 

Anything else I can try? 

I don't see anything in the logs, but I'm not that experienced at reading Unraid logs.

The first thing I'd do is run a memtest on the RAM just to rule out bad RAM. It's unlikely, as I believe the R530 uses ECC RAM, but it's possible. Also, re-seat the RAM and blow out any dust around the RAM slots.

List the system specs here as well.

Finally, run Unraid for a day or so with dockers and VMs off to see if a bad docker is causing this. You can boot the server and select safe mode at the boot menu. If the system runs stable in safe mode, boot back up in normal mode, disable the dockers/VMs, then power them back on one at a time and test after each.

Is this a new build, or one that has been running for a while? Any new hardware or changes before the panics started?

On 2/25/2020 at 9:47 AM, Chess said:

I don't see anything in the logs, but I'm not that experienced at reading Unraid logs.

The first thing I'd do is run a memtest on the RAM just to rule out bad RAM. It's unlikely, as I believe the R530 uses ECC RAM, but it's possible. Also, re-seat the RAM and blow out any dust around the RAM slots.

List the system specs here as well.

Finally, run Unraid for a day or so with dockers and VMs off to see if a bad docker is causing this. You can boot the server and select safe mode at the boot menu. If the system runs stable in safe mode, boot back up in normal mode, disable the dockers/VMs, then power them back on one at a time and test after each.

Is this a new build, or one that has been running for a while? Any new hardware or changes before the panics started?

OK, so this takes about 5 days to crash every time. I've stopped every docker and just had another crash.

 

This is a Dell R530.

Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz

64GB DDR-4 ECC Memory

Embedded NIC is Broadcom Gigabit Ethernet BCM5720

Disks are a mix of 4 and 8 TB.

 

I've run the Dell Diagnostics and all the memory checked out fine. There were no issues detected in the HW scan.

 

I've been running Unraid on this for almost a year now. The trouble seems to have started just a few weeks ago.

 

I'm going to roll back to 6.8.1 and see if that helps.

 

Anything else I can provide to help out?

 

This seems to be one of the kernel panics.

It seems to revolve around macvlan, maybe?

Feb 25 12:14:09 Coruscant kernel: WARNING: CPU: 16 PID: 6741 at net/netfilter/nf_conntrack_core.c:945 __nf_conntrack_confirm+0xa0/0x69e
Feb 25 12:14:09 Coruscant kernel: Modules linked in: macvlan xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap veth xt_nat iptable_filter xfs nfsd lockd grace sunrpc md_mod iptable_nat ipt_MASQUERADE nf_nat_ipv4 nf_nat ip_tables wireguard ip6_udp_tunnel udp_tunnel bonding tg3 sr_mod cdrom mxm_wmi sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_cstate intel_uncore intel_rapl_perf ipmi_ssif i2c_core ahci megaraid_sas libahci ipmi_si wmi pcc_cpufreq button acpi_power_meter [last unloaded: tg3]
Feb 25 12:14:09 Coruscant kernel: CPU: 16 PID: 6741 Comm: kworker/16:0 Not tainted 4.19.98-Unraid #1
Feb 25 12:14:09 Coruscant kernel: Hardware name: Dell Inc. PowerEdge R530/03XKDV, BIOS 2.7.1 01/26/2018
Feb 25 12:14:09 Coruscant kernel: Workqueue: events macvlan_process_broadcast [macvlan]
Feb 25 12:14:09 Coruscant kernel: RIP: 0010:__nf_conntrack_confirm+0xa0/0x69e
Feb 25 12:14:09 Coruscant kernel: Code: 04 e8 56 fb ff ff 44 89 f2 44 89 ff 89 c6 41 89 c4 e8 7f f9 ff ff 48 8b 4c 24 08 84 c0 75 af 48 8b 85 80 00 00 00 a8 08 74 26 <0f> 0b 44 89 e6 44 89 ff 45 31 f6 e8 94 f1 ff ff be 00 02 00 00 48
Feb 25 12:14:09 Coruscant kernel: RSP: 0018:ffff88885fa03d58 EFLAGS: 00010202
Feb 25 12:14:09 Coruscant kernel: RAX: 0000000000000188 RBX: ffff888570f1d300 RCX: ffff8883e36396d8
Feb 25 12:14:09 Coruscant kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff81e08fcc
Feb 25 12:14:09 Coruscant kernel: RBP: ffff8883e3639680 R08: 000000006547e895 R09: ffffffff81c8a9c0
Feb 25 12:14:09 Coruscant kernel: R10: 0000000000000158 R11: ffffffff81e91140 R12: 000000000000fa73
Feb 25 12:14:09 Coruscant kernel: R13: ffffffff81e91140 R14: 0000000000000000 R15: 0000000000004da5
Feb 25 12:14:09 Coruscant kernel: FS:  0000000000000000(0000) GS:ffff88885fa00000(0000) knlGS:0000000000000000
Feb 25 12:14:09 Coruscant kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 25 12:14:09 Coruscant kernel: CR2: 00001490681a8000 CR3: 0000000001e0a001 CR4: 00000000001606e0
Feb 25 12:14:09 Coruscant kernel: Call Trace:
Feb 25 12:14:09 Coruscant kernel: <IRQ>
Feb 25 12:14:09 Coruscant kernel: ipv4_confirm+0xaf/0xb9
Feb 25 12:14:09 Coruscant kernel: nf_hook_slow+0x3a/0x90
Feb 25 12:14:09 Coruscant kernel: ip_local_deliver+0xad/0xdc
Feb 25 12:14:09 Coruscant kernel: ? ip_sublist_rcv_finish+0x54/0x54
Feb 25 12:14:09 Coruscant kernel: ip_sabotage_in+0x38/0x3e
Feb 25 12:14:09 Coruscant kernel: nf_hook_slow+0x3a/0x90
Feb 25 12:14:09 Coruscant kernel: ip_rcv+0x8e/0xbe
Feb 25 12:14:09 Coruscant kernel: ? ip_rcv_finish_core.isra.0+0x2e1/0x2e1
Feb 25 12:14:09 Coruscant kernel: __netif_receive_skb_one_core+0x53/0x6f
Feb 25 12:14:09 Coruscant kernel: process_backlog+0x77/0x10e
Feb 25 12:14:09 Coruscant kernel: net_rx_action+0x107/0x26c
Feb 25 12:14:09 Coruscant kernel: __do_softirq+0xc9/0x1d7
Feb 25 12:14:09 Coruscant kernel: do_softirq_own_stack+0x2a/0x40
Feb 25 12:14:09 Coruscant kernel: </IRQ>
Feb 25 12:14:09 Coruscant kernel: do_softirq+0x4d/0x5a
Feb 25 12:14:09 Coruscant kernel: netif_rx_ni+0x1c/0x22
Feb 25 12:14:09 Coruscant kernel: macvlan_broadcast+0x111/0x156 [macvlan]
Feb 25 12:14:09 Coruscant kernel: ? __switch_to_asm+0x41/0x70
Feb 25 12:14:09 Coruscant kernel: macvlan_process_broadcast+0xea/0x128 [macvlan]
Feb 25 12:14:09 Coruscant kernel: process_one_work+0x16e/0x24f
Feb 25 12:14:09 Coruscant kernel: worker_thread+0x1e2/0x2b8
Feb 25 12:14:09 Coruscant kernel: ? rescuer_thread+0x2a7/0x2a7
Feb 25 12:14:09 Coruscant kernel: kthread+0x10c/0x114
Feb 25 12:14:09 Coruscant kernel: ? kthread_park+0x89/0x89
Feb 25 12:14:09 Coruscant kernel: ret_from_fork+0x35/0x40
Feb 25 12:14:09 Coruscant kernel: ---[ end trace 949816e8afd00bfc ]---
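
Note that the entry above is logged as a kernel WARNING with a call trace. To check whether other traces in the same syslog also pass through macvlan, a small script can do the reading. This is a rough sketch that assumes every trace follows the layout of the excerpt above (a `WARNING:` line, stack frames, then an `end trace` line). The `summarize_traces` function and its regular expressions are illustrative and will not match every kernel log format.

```python
import re
import sys
from typing import Iterable

# Patterns modelled only on the excerpt above.
WARNING_RE = re.compile(r"kernel: WARNING: .* at (?P<site>\S+) (?P<func>\S+)")
END_RE = re.compile(r"kernel: ---\[ end trace")
FRAME_RE = re.compile(
    r"kernel: (?:\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?"
)


def summarize_traces(lines: Iterable[str]) -> list:
    """Return one dict per WARNING trace: source site, frame count, modules seen."""
    traces = []
    current = None
    for line in lines:
        warning = WARNING_RE.search(line)
        if warning:
            current = {"site": warning["site"], "frames": 0, "modules": set()}
            continue
        if current is None:
            continue  # not inside a trace
        if END_RE.search(line):
            traces.append(current)
            current = None
            continue
        frame = FRAME_RE.search(line)
        if frame:
            current["frames"] += 1
            if frame["mod"]:
                # Frames from loadable modules carry a "[module]" suffix.
                current["modules"].add(frame["mod"])
    return traces


if __name__ == "__main__":
    # Usage: python summarize_traces.py /path/to/syslog
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for trace in summarize_traces(fh):
            mods = ", ".join(sorted(trace["modules"])) or "none"
            print(f"{trace['site']}: {trace['frames']} frames, modules: {mods}")
```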

This week I also plan on upgrading the server BIOS, etc., to get everything up to snuff as well.

6 minutes ago, johnnie.black said:

Macvlan call traces are usually caused by dockers with a custom IP address.

The only dockers I have running right now are on br0 with custom IPs. Is that a supported configuration? Should I drop everything back to bridge mode and see if that is the issue? Or is there something else going on here?

 

Thanks!
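
To see exactly which containers sit on br0 and what addresses they hold, the output of `docker network inspect` can be summarized. This is a minimal sketch that assumes the `docker` CLI is on the PATH and that the network is named `br0` as in the post. The `containers_on_network` helper is a hypothetical name, and the container in the usage test is made up.

```python
import json
import subprocess


def containers_on_network(inspect_json: str) -> list:
    """Parse `docker network inspect` JSON into (network, driver, container, ipv4) rows."""
    rows = []
    for net in json.loads(inspect_json):
        # "Containers" maps container IDs to details; treat a missing or null value as empty.
        for info in (net.get("Containers") or {}).values():
            rows.append(
                (
                    net.get("Name", ""),
                    net.get("Driver", ""),
                    info.get("Name", ""),
                    info.get("IPv4Address", ""),
                )
            )
    return sorted(rows)


if __name__ == "__main__":
    # "br0" is the network name from the post; change it to match your setup.
    result = subprocess.run(
        ["docker", "network", "inspect", "br0"],
        check=True,
        capture_output=True,
        text=True,
    )
    for network, driver, name, ipv4 in containers_on_network(result.stdout):
        print(f"{network} ({driver}): {name} {ipv4}")
```

Any container listed here is a candidate to move back to bridge mode first when testing whether the macvlan traces stop.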

On 2/25/2020 at 6:30 AM, tehtide said:

Were you able to correct it?

I haven't even gotten a chance to try yet. I ended up rolling back to 6.8.0 and found that I had memory problems. I just finally got those figured out enough to move on to other things. I hope that the RAM was also my problem, though. I did try to upgrade last night, and the Check for Updates button seems to be missing from my Upgrade OS menu, so that will be the next thing I have to figure out.

1 hour ago, Sentie said:

I haven't even gotten a chance to try yet. I ended up rolling back to 6.8.0 and found that I had memory problems. I just finally got those figured out enough to move on to other things. I hope that the RAM was also my problem, though. I did try to upgrade last night, and the Check for Updates button seems to be missing from my Upgrade OS menu, so that will be the next thing I have to figure out.

Sorry, been away with a minor illness. 

 

Memory issues will cause a lot of random crashes in Linux that you might not see in Windows.

 

There is a way to do a manual upgrade by downloading the zip and extracting some of the files to the USB stick. Search the forums and see if you can find it. If not, let me know and I'll write out what I did when I had to do a manual downgrade. It should work the same way.
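
As a rough illustration of the "extract some of the files" step, here is a sketch that copies only the top-level files matching a pattern out of a release zip. The default pattern `bz*` is an assumption about which files are meant, and `extract_matching` is a hypothetical helper, so confirm the file list against the forum instructions and back up the USB stick before trying anything like this.

```python
import fnmatch
import sys
import zipfile
from pathlib import Path


def extract_matching(zip_path, dest_dir, pattern="bz*"):
    """Copy top-level zip members whose names match `pattern` into dest_dir.

    The default pattern is an assumption, not a confirmed list of files.
    Returns the sorted names of the files written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # Skip directories and anything inside a subfolder of the zip.
            if info.is_dir() or "/" in info.filename:
                continue
            if fnmatch.fnmatchcase(info.filename, pattern):
                (dest / info.filename).write_bytes(zf.read(info))
                written.append(info.filename)
    return sorted(written)


if __name__ == "__main__":
    # Usage: python extract_matching.py <release.zip> <path-to-usb-stick>
    for name in extract_matching(sys.argv[1], sys.argv[2]):
        print("wrote", name)
```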
