tehtide, posted February 15, 2020:
I appear to be having sporadic crashes/kernel panics on my Unraid 6.8.2 installation. This is a Dell R530. When it happens there is no response to ping, no way to SSH in, and the console is locked up. I took a picture of the screen before powering the server off and back on. I ran the Dell diagnostics and nothing came back as bad in the hardware. Are there any logs or settings I can enable to capture the full panic to help with troubleshooting? Is there anything I can post after a reboot that would capture the issue?
JorgeB, posted February 15, 2020:
Try this, it might catch something.
tehtide, posted February 16, 2020:
Quoting johnnie.black (2/15/2020 at 2:36 AM): "Try this, it might catch something."
OK, thanks. I enabled it; hopefully it'll catch something.
Sentie, posted February 17, 2020:
I had this exact same problem after upgrading to 6.8.2.
tehtide, posted February 25, 2020:
Quoting Sentie (2/17/2020 at 7:10 AM): "I had this exact same problem upgrading to 6.8.2"
Were you able to correct it?
tehtide, posted February 25, 2020:
It just crashed again overnight. Kernel panic. Attached is the syslog. I rebooted it around 09:00 AM, and there doesn't seem to be anything in there. I've also included the diagnostics file.
Attachments: coruscant-diagnostics-20200225-0932.zip, syslog-10.10.20.10.log
Anything else I can try?
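The attached file name (syslog-10.10.20.10.log) suggests the log was collected by a remote syslog receiver. That approach helps with hard lockups, because lines sent just before the freeze may reach another machine even when nothing survives on the crashed server. As a minimal sketch only, assuming the Unraid server is set to send its syslog over UDP to a second, always-on machine, a receiver that appends each message to a per-sender file could look like this. The port (5514 rather than the privileged 514), the output directory, and the function names are illustrative assumptions, not Unraid's own implementation.

```python
import os
import socket
from pathlib import Path

# Hypothetical defaults for this sketch; match them to the sender's settings.
LISTEN_PORT = 5514
OUT_DIR = Path("remote-logs")


def store_message(data: bytes, sender_ip: str, out_dir: Path) -> Path:
    """Append one syslog datagram to syslog-<sender_ip>.log and return the path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"syslog-{sender_ip}.log"
    line = data.decode("utf-8", errors="replace").rstrip("\r\n")
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
        fh.flush()
        # fsync so the line is on disk even if this machine also loses power.
        os.fsync(fh.fileno())
    return path


def serve(port: int = LISTEN_PORT, out_dir: Path = OUT_DIR) -> None:
    """Receive UDP syslog datagrams forever, writing one file per sending host."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        while True:
            data, (sender_ip, _sender_port) = sock.recvfrom(65535)
            store_message(data, sender_ip, out_dir)
```

Call serve() on the receiving machine after pointing the server's remote syslog setting at its address and port. UDP delivery is not acknowledged, so a message sent in the last instant before a lockup can still be lost; an empty file does not prove nothing was logged.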
JorgeB, posted February 25, 2020:
I don't see anything in the log. You can try safe mode with all Docker containers/VMs disabled and/or downgrading to a previously known-good release; if there are still issues, it's likely a hardware problem.
Chess, posted February 25, 2020:
Quoting tehtide (5 minutes earlier): "Just crashed again overnight. Kernel panic. [...] Anything else I can try?"
I don't see anything in the logs, but I'm not that experienced at reading Unraid logs. The first thing I'd do is run a memory test to rule out bad RAM. It's unlikely, since I believe the R530 uses ECC RAM, but it's possible. Also, reseat the RAM and blow out any dust around the RAM slots. Post the system specs here as well.
Finally, run Unraid for a day or so with Docker containers and VMs off to see whether a bad container is causing this. You can boot the server and select safe mode at the boot menu. If the system runs stable in safe mode, boot back up in normal mode, disable the containers/VMs, then bring them back one at a time and test after each.
Is this a new build, or one that has been running for a while? Any new hardware or changes before the panics started?
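The elimination procedure above is a linear search: confirm a baseline with nothing running, then add containers back one at a time until the problem returns. A minimal sketch in Python, where is_stable is a hypothetical stand-in for a manual test run (start exactly the listed containers and watch the server long enough to trust the result):

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence


def find_first_culprit(
    containers: Sequence[str],
    is_stable: Callable[[list[str]], bool],
) -> Optional[str]:
    """Re-enable containers cumulatively, one at a time, and return the first
    one whose addition makes the system unstable, or None if all are fine.

    is_stable(enabled) is hypothetical: it represents running the server with
    exactly `enabled` started and observing whether it stays up.
    """
    # Baseline (safe-mode style run): if this already fails, no container is to blame.
    if not is_stable([]):
        raise RuntimeError("unstable with nothing enabled: suspect hardware or the OS release")
    enabled: list[str] = []
    for name in containers:
        enabled.append(name)
        if not is_stable(list(enabled)):
            return name
    return None
```

Adding containers cumulatively, rather than testing each one alone, also surfaces problems that only appear in combination, though it identifies only the container that tipped things over. When each test run takes days, the order in which containers are re-enabled matters: the most suspicious ones (for example, anything with unusual networking) are worth trying first.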
tehtide, posted February 29, 2020:
Quoting Chess (2/25/2020 at 9:47 AM): "Don't see anything in the logs, but I'm not that experienced running through unraid logs. [...] Any new hardware or changes before the panics started?"
OK, so it takes about 5 days to crash every time. I've stopped every Docker container and just had another crash. This is a Dell R530:
- Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
- 64GB DDR4 ECC memory
- Embedded NIC: Broadcom Gigabit Ethernet BCM5720
- Disks: a mix of 4 TB and 8 TB
I've run the Dell Diagnostics and all the memory checked out fine; no issues were detected in the hardware scan. I've been running Unraid on this server for almost a year now, and the trouble seems to have started just a few weeks ago. I'm going to roll back to 6.8.1 and see if that helps. Anything else I can provide to help out?
This seems to be one of the kernel panics. It seems to revolve around the macvlan stuff, maybe?
Feb 25 12:14:09 Coruscant kernel: WARNING: CPU: 16 PID: 6741 at net/netfilter/nf_conntrack_core.c:945 __nf_conntrack_confirm+0xa0/0x69e
Feb 25 12:14:09 Coruscant kernel: Modules linked in: macvlan xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap veth xt_nat iptable_filter xfs nfsd lockd grace sunrpc md_mod iptable_nat ipt_MASQUERADE nf_nat_ipv4 nf_nat ip_tables wireguard ip6_udp_tunnel udp_tunnel bonding tg3 sr_mod cdrom mxm_wmi sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_cstate intel_uncore intel_rapl_perf ipmi_ssif i2c_core ahci megaraid_sas libahci ipmi_si wmi pcc_cpufreq button acpi_power_meter [last unloaded: tg3]
Feb 25 12:14:09 Coruscant kernel: CPU: 16 PID: 6741 Comm: kworker/16:0 Not tainted 4.19.98-Unraid #1
Feb 25 12:14:09 Coruscant kernel: Hardware name: Dell Inc. PowerEdge R530/03XKDV, BIOS 2.7.1 01/26/2018
Feb 25 12:14:09 Coruscant kernel: Workqueue: events macvlan_process_broadcast [macvlan]
Feb 25 12:14:09 Coruscant kernel: RIP: 0010:__nf_conntrack_confirm+0xa0/0x69e
Feb 25 12:14:09 Coruscant kernel: Code: 04 e8 56 fb ff ff 44 89 f2 44 89 ff 89 c6 41 89 c4 e8 7f f9 ff ff 48 8b 4c 24 08 84 c0 75 af 48 8b 85 80 00 00 00 a8 08 74 26 <0f> 0b 44 89 e6 44 89 ff 45 31 f6 e8 94 f1 ff ff be 00 02 00 00 48
Feb 25 12:14:09 Coruscant kernel: RSP: 0018:ffff88885fa03d58 EFLAGS: 00010202
Feb 25 12:14:09 Coruscant kernel: RAX: 0000000000000188 RBX: ffff888570f1d300 RCX: ffff8883e36396d8
Feb 25 12:14:09 Coruscant kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff81e08fcc
Feb 25 12:14:09 Coruscant kernel: RBP: ffff8883e3639680 R08: 000000006547e895 R09: ffffffff81c8a9c0
Feb 25 12:14:09 Coruscant kernel: R10: 0000000000000158 R11: ffffffff81e91140 R12: 000000000000fa73
Feb 25 12:14:09 Coruscant kernel: R13: ffffffff81e91140 R14: 0000000000000000 R15: 0000000000004da5
Feb 25 12:14:09 Coruscant kernel: FS: 0000000000000000(0000) GS:ffff88885fa00000(0000) knlGS:0000000000000000
Feb 25 12:14:09 Coruscant kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 25 12:14:09 Coruscant kernel: CR2: 00001490681a8000 CR3: 0000000001e0a001 CR4: 00000000001606e0
Feb 25 12:14:09 Coruscant kernel: Call Trace:
Feb 25 12:14:09 Coruscant kernel: <IRQ>
Feb 25 12:14:09 Coruscant kernel: ipv4_confirm+0xaf/0xb9
Feb 25 12:14:09 Coruscant kernel: nf_hook_slow+0x3a/0x90
Feb 25 12:14:09 Coruscant kernel: ip_local_deliver+0xad/0xdc
Feb 25 12:14:09 Coruscant kernel: ? ip_sublist_rcv_finish+0x54/0x54
Feb 25 12:14:09 Coruscant kernel: ip_sabotage_in+0x38/0x3e
Feb 25 12:14:09 Coruscant kernel: nf_hook_slow+0x3a/0x90
Feb 25 12:14:09 Coruscant kernel: ip_rcv+0x8e/0xbe
Feb 25 12:14:09 Coruscant kernel: ? ip_rcv_finish_core.isra.0+0x2e1/0x2e1
Feb 25 12:14:09 Coruscant kernel: __netif_receive_skb_one_core+0x53/0x6f
Feb 25 12:14:09 Coruscant kernel: process_backlog+0x77/0x10e
Feb 25 12:14:09 Coruscant kernel: net_rx_action+0x107/0x26c
Feb 25 12:14:09 Coruscant kernel: __do_softirq+0xc9/0x1d7
Feb 25 12:14:09 Coruscant kernel: do_softirq_own_stack+0x2a/0x40
Feb 25 12:14:09 Coruscant kernel: </IRQ>
Feb 25 12:14:09 Coruscant kernel: do_softirq+0x4d/0x5a
Feb 25 12:14:09 Coruscant kernel: netif_rx_ni+0x1c/0x22
Feb 25 12:14:09 Coruscant kernel: macvlan_broadcast+0x111/0x156 [macvlan]
Feb 25 12:14:09 Coruscant kernel: ? __switch_to_asm+0x41/0x70
Feb 25 12:14:09 Coruscant kernel: macvlan_process_broadcast+0xea/0x128 [macvlan]
Feb 25 12:14:09 Coruscant kernel: process_one_work+0x16e/0x24f
Feb 25 12:14:09 Coruscant kernel: worker_thread+0x1e2/0x2b8
Feb 25 12:14:09 Coruscant kernel: ? rescuer_thread+0x2a7/0x2a7
Feb 25 12:14:09 Coruscant kernel: kthread+0x10c/0x114
Feb 25 12:14:09 Coruscant kernel: ? kthread_park+0x89/0x89
Feb 25 12:14:09 Coruscant kernel: ret_from_fork+0x35/0x40
Feb 25 12:14:09 Coruscant kernel: ---[ end trace 949816e8afd00bfc ]---
This week I also plan on upgrading the server BIOS, etc., to get everything up to snuff as well.
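When reading a trace like this, the useful signal is which functions and kernel modules appear in the reliable frames. Frames printed with a leading "? " are addresses the unwinder found on the stack but could not confirm belong to the call chain, so they are usually set aside. A small sketch that pulls this out of syslog lines in the format shown above; the regular expression is written for this one log layout and is an assumption, not a general kernel-log parser:

```python
import re
from collections import Counter

# A frame line looks like "... kernel: macvlan_broadcast+0x111/0x156 [macvlan]".
# Frames prefixed with "? " are unreliable guesses by the stack unwinder.
FRAME_RE = re.compile(
    r"kernel: (?P<unreliable>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?$"
)


def summarize_trace(lines):
    """Return (reliable_functions, module_counts) for call-trace frames in `lines`."""
    functions = []
    modules = Counter()
    for line in lines:
        m = FRAME_RE.search(line.rstrip())
        if not m or m.group("unreliable"):
            continue  # not a frame line, or a "? " frame
        functions.append(m.group("func"))
        if m.group("module"):
            modules[m.group("module")] += 1
    return functions, modules
```

Run over the trace above, the only loadable module named in a reliable frame is macvlan (in macvlan_broadcast and macvlan_process_broadcast), which is consistent with the "macvlan stuff" suspicion; it does not by itself say which container triggered it.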
JorgeB, posted February 29, 2020:
Macvlan call traces are usually caused by Docker containers with a custom IP address.
tehtide, posted February 29, 2020:
Quoting johnnie.black (6 minutes earlier): "Macvlan call traces are usually caused by dockers with a custom IP address."
The only containers I have running right now are on br0 with custom IPs. Is that a supported configuration? Should I move everything back to bridge mode and see if that is the issue? Or is there something else going on here? Thanks!
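To list which running containers fall into the category JorgeB describes, the addresses can be read from `docker inspect` output. A sketch, assuming the Docker CLI is available on the server and that a custom IP appears as `NetworkSettings.Networks.<network>.IPAMConfig.IPv4Address`, the field Docker fills in when an address is assigned explicitly (for example with `--ip`); the helper names are hypothetical:

```python
import json
import subprocess


def containers_with_static_ips(inspect_data):
    """Return (container, network, ip) for every network attachment in parsed
    `docker inspect` output that has a user-assigned IPv4 address."""
    found = []
    for container in inspect_data:
        name = container.get("Name", "").lstrip("/")
        networks = (container.get("NetworkSettings") or {}).get("Networks") or {}
        for net_name, net in networks.items():
            # IPAMConfig is null when Docker chose the address itself.
            ipam = net.get("IPAMConfig") or {}
            ip = ipam.get("IPv4Address")
            if ip:
                found.append((name, net_name, ip))
    return found


def inspect_running_containers():
    """Run `docker inspect` on all running containers and return the parsed JSON."""
    ids = subprocess.run(
        ["docker", "ps", "-q"], check=True, capture_output=True, text=True
    ).stdout.split()
    if not ids:
        return []
    out = subprocess.run(
        ["docker", "inspect", *ids], check=True, capture_output=True, text=True
    ).stdout
    return json.loads(out)
```

`containers_with_static_ips(inspect_running_containers())` returns one tuple per fixed-address attachment, so containers on br0 with a custom IP show up with `br0` as the network name. Keeping the parsing separate from the subprocess call makes it easy to check against a saved `docker inspect` dump.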
JorgeB, posted March 2, 2020:
Sorry for the late reply, I was away for the weekend. See here for more info:
Sentie, posted March 5, 2020:
Quoting tehtide (2/25/2020 at 6:30 AM): "Were you able to correct it?"
I haven't even had a chance to try yet. I ended up rolling back to 6.8.0 and found that I had memory problems; I've just finally gotten those figured out enough to move on to other things. I hope the RAM was also my problem, though. I did try to upgrade last night, and the Check for Updates button seems to be missing from my Upgrade OS menu, so that's the next thing I'll have to figure out.
Chess, posted March 5, 2020:
Quoting Sentie (1 hour earlier): "I haven't even gotten a chance to try yet. [...] So that will be the next thing I will have to figure out."
Sorry, I've been away with a minor illness. Memory issues will cause a lot of random crashes in Linux that you might not see in Windows. There is a way to upgrade manually by downloading the zip and extracting some of the files to the USB stick. Search the forums and see if you can find it. If not, let me know and I'll write out what I did when I had to do a manual downgrade; it should work the same way.
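Chess doesn't say which files need extracting. As a hedged sketch only, assuming the files to replace are the top-level `bz*` boot files (and any checksum files next to them) in the release zip, which is the usual approach for manual Unraid upgrades and downgrades, the copy step could look like this. The function name and the `bz` prefix are assumptions, not an official procedure; keep a backup of the flash drive before overwriting anything.

```python
from __future__ import annotations

import shutil
import zipfile
from pathlib import Path


def copy_boot_files(release_zip: Path, flash_root: Path, prefix: str = "bz") -> list[str]:
    """Copy top-level files whose names start with `prefix` from a release zip
    onto the flash drive root, overwriting existing copies.

    Assumption: only the zip's top-level bz* files are needed; files in
    subdirectories and anything else in the archive are left alone.
    """
    copied = []
    with zipfile.ZipFile(release_zip) as zf:
        for info in zf.infolist():
            name = info.filename
            # Only plain files in the zip root, e.g. "bzimage", not "syslinux/...".
            if info.is_dir() or "/" in name or not name.startswith(prefix):
                continue
            target = flash_root / name
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            copied.append(name)
    return sorted(copied)
```

The returned list of copied names makes it easy to compare against what is on the stick afterwards; the server needs a reboot before the replaced files take effect.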
tehtide, posted March 8, 2020:
OK, so since reverting to 6.8.1 I've had no more kernel panics. Is there anything that changed in 6.8.2 that could cause these issues? And would I be better off staying on 6.8.1 or jumping to 6.8.3?
Sentie, posted March 9, 2020:
I managed to have a few free moments today and was able to upgrade to 6.8.3 just fine.
tehtide, posted March 14, 2020:
Quoting Sentie (3/9/2020 at 2:57 AM): "managed to have a few free moments today and I was able to upgrade to 6.8.3 just fine"
How are things going? Any crashes?
Sentie, posted March 14, 2020 (edited):
Nope, no problems, with the upgrade at least. I just have to figure out the RAM problem now. I'm trying to work with the manufacturer on that, but their response times are bad right now.