DuzAwe Posted April 18, 2021
On and off since my jump to 6.9-RC2 I have had a freeze issue. I have replaced the motherboard and RAM in that time; the RAM is now ECC. Looking at the logs, my understanding is that it looks like a driver crash, either macvlan or Nvidia? Syslog attached, help is much appreciated. The server must be hard reset to get any access, so diags aren't possible.
Apr 18 04:13:38 thelibrary kernel: NETDEV WATCHDOG: eth1 (igb): transmit queue 2 timed out
Apr 18 04:13:38 thelibrary kernel: WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0xcf/0x12b
Apr 18 04:13:38 thelibrary kernel: Modules linked in: macvlan md_mod nvidia_uvm(PO) veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper drm backlight agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops nvidia(PO) nct6775 hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding igb i2c_algo_bit ipmi_ssif amd64_edac_mod edac_mce_amd kvm_amd kvm wmi_bmof crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd mpt3sas cryptd nvme nvme_core ccp ahci i2c_piix4 wmi raid_class glue_helper scsi_transport_sas rapl k10temp i2c_core acpi_ipmi libahci button ipmi_si acpi_cpufreq [last unloaded: md_mod]
Attachment: syslog
JorgeB Posted April 18, 2021
Macvlan call traces are usually the result of having dockers with a custom IP address; more info below.
https://forums.unraid.net/topic/70529-650-call-traces-when-assigning-ip-address-to-docker-containers/
DuzAwe Posted April 18, 2021
I had two br0 dockers, both of which were disabled at the time of the panic. I removed them completely this morning. Would disabled dockers still cause issues?
JorgeB Posted April 18, 2021
If there are still macvlan call traces, there's still a problem with that, unless they were old. The syslog is spammed with NVIDIA errors, so it's more difficult to analyze.
DuzAwe Posted April 18, 2021
OK, makes sense. The dockers are gone now. Attached is what remains. I have also changed from a bonded interface for my NIC to a bridge.
Edited April 18, 2021 by DuzAwe
John_M Posted April 18, 2021
FWIW the Nvidia messages are caused by the GPU Stats plugin calling the nvidia-smi utility every second. If you want to get rid of the messages for troubleshooting purposes you'll need to remove the plugin temporarily. See here.
DuzAwe Posted April 18, 2021
Already removed; I saw that info during this week. The only GPU stuff installed is the driver plugin, a user script to allow more than three streams, and Plex. As far as I am aware, nothing else is touching nvidia-smi or the GPU.
DuzAwe Posted April 18, 2021
Looks like I have had another kernel panic/macvlan crash, but no lock-up this time. I am able to export diags as a result; hopefully they show something that can stop this altogether. As I said earlier, I don't have any Dockers any more with static custom set IPs in my Docker setup.
Apr 18 14:54:38 thelibrary kernel: ------------[ cut here ]------------
Apr 18 14:54:38 thelibrary kernel: WARNING: CPU: 6 PID: 13151 at net/netfilter/nf_conntrack_core.c:1120 __nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]
Apr 18 14:54:38 thelibrary kernel: Modules linked in: macvlan nvidia_uvm(PO) veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper drm backlight agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops nvidia(PO) nct6775 hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding igb i2c_algo_bit ipmi_ssif amd64_edac_mod edac_mce_amd kvm_amd wmi_bmof kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel mpt3sas crypto_simd i2c_piix4 cryptd i2c_core nvme raid_class glue_helper ccp nvme_core scsi_transport_sas rapl ahci wmi acpi_ipmi k10temp libahci button ipmi_si acpi_cpufreq [last unloaded: i2c_algo_bit]
Apr 18 14:54:38 thelibrary kernel: CPU: 6 PID: 13151 Comm: kworker/6:0 Tainted: P O 5.10.28-Unraid #1
Apr 18 14:54:38 thelibrary kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470D4U, BIOS P3.50 11/02/2020
Apr 18 14:54:38 thelibrary kernel: Workqueue: events macvlan_process_broadcast [macvlan]
Apr 18 14:54:38 thelibrary kernel: RIP: 0010:__nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]
Apr 18 14:54:38 thelibrary kernel: Code: e8 dc f8 ff ff 44 89 fa 89 c6 41 89 c4 48 c1 eb 20 89 df 41 89 de e8 36 f6 ff ff 84 c0 75 bb 48 8b 85 80 00 00 00 a8 08 74 18 <0f> 0b 89 df 44 89 e6 31 db e8 6d f3 ff ff e8 35 f5 ff ff e9 22 01
Apr 18 14:54:38 thelibrary kernel: RSP: 0018:ffffc9000031cd38 EFLAGS: 00010202
Apr 18 14:54:38 thelibrary kernel: RAX: 0000000000000188 RBX: 0000000000006b19 RCX: 00000000455a00ea
Apr 18 14:54:38 thelibrary kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa01e8e64
Apr 18 14:54:38 thelibrary kernel: RBP: ffff8881c977ea80 R08: 000000001f21a935 R09: ffff8881965b4c20
Apr 18 14:54:38 thelibrary kernel: R10: 0000000000000158 R11: ffff888195fdbf00 R12: 0000000000008cdd
Apr 18 14:54:38 thelibrary kernel: R13: ffffffff8210b440 R14: 0000000000006b19 R15: 0000000000000000
Apr 18 14:54:38 thelibrary kernel: FS: 0000000000000000(0000) GS:ffff8887fe980000(0000) knlGS:0000000000000000
Apr 18 14:54:38 thelibrary kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 18 14:54:38 thelibrary kernel: CR2: 000015146e169000 CR3: 0000000246280000 CR4: 0000000000350ee0
Apr 18 14:54:38 thelibrary kernel: Call Trace:
Apr 18 14:54:38 thelibrary kernel: <IRQ>
Apr 18 14:54:38 thelibrary kernel: nf_conntrack_confirm+0x2f/0x36 [nf_conntrack]
Apr 18 14:54:38 thelibrary kernel: nf_hook_slow+0x39/0x8e
Apr 18 14:54:38 thelibrary kernel: nf_hook.constprop.0+0xb1/0xd8
Apr 18 14:54:38 thelibrary kernel: ? ip_protocol_deliver_rcu+0xfe/0xfe
Apr 18 14:54:38 thelibrary kernel: ip_local_deliver+0x49/0x75
Apr 18 14:54:38 thelibrary kernel: ip_sabotage_in+0x43/0x4d [br_netfilter]
Apr 18 14:54:38 thelibrary kernel: nf_hook_slow+0x39/0x8e
Apr 18 14:54:38 thelibrary kernel: nf_hook.constprop.0+0xb1/0xd8
Apr 18 14:54:38 thelibrary kernel: ? l3mdev_l3_rcv.constprop.0+0x50/0x50
Apr 18 14:54:38 thelibrary kernel: ip_rcv+0x41/0x61
Apr 18 14:54:38 thelibrary kernel: __netif_receive_skb_one_core+0x74/0x95
Apr 18 14:54:38 thelibrary kernel: process_backlog+0xa3/0x13b
Apr 18 14:54:38 thelibrary kernel: net_rx_action+0xf4/0x29d
Apr 18 14:54:38 thelibrary kernel: __do_softirq+0xc4/0x1c2
Apr 18 14:54:38 thelibrary kernel: asm_call_irq_on_stack+0x12/0x20
Apr 18 14:54:38 thelibrary kernel: </IRQ>
Apr 18 14:54:38 thelibrary kernel: do_softirq_own_stack+0x2c/0x39
Apr 18 14:54:38 thelibrary kernel: do_softirq+0x3a/0x44
Apr 18 14:54:38 thelibrary kernel: netif_rx_ni+0x1c/0x22
Apr 18 14:54:38 thelibrary kernel: macvlan_broadcast+0x10e/0x13c [macvlan]
Apr 18 14:54:38 thelibrary kernel: macvlan_process_broadcast+0xf8/0x143 [macvlan]
Apr 18 14:54:38 thelibrary kernel: process_one_work+0x13c/0x1d5
Apr 18 14:54:38 thelibrary kernel: worker_thread+0x18b/0x22f
Apr 18 14:54:38 thelibrary kernel: ? process_scheduled_works+0x27/0x27
Apr 18 14:54:38 thelibrary kernel: kthread+0xe5/0xea
Apr 18 14:54:38 thelibrary kernel: ? __kthread_bind_mask+0x57/0x57
Apr 18 14:54:38 thelibrary kernel: ret_from_fork+0x22/0x30
Apr 18 14:54:38 thelibrary kernel: ---[ end trace d16416a764eaff38 ]---
Attachment: thelibrary-diagnostics-20210418-2211.zip
Edited April 18, 2021 by DuzAwe
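For anyone triaging a similar syslog: since it matters whether macvlan call traces are new or old, here is a minimal Python sketch that pulls out each kernel trace block (from "cut here" to "end trace"), records its timestamp, and flags whether any frame belongs to the macvlan module. The names `extract_traces` and `Trace` are made up for illustration and are not part of Unraid; the line prefix pattern is an assumption based only on the log excerpts posted in this thread.

```python
import re
from typing import Iterable, List, NamedTuple


class Trace(NamedTuple):
    timestamp: str          # timestamp of the "cut here" line, e.g. "Apr 18 14:54:38"
    lines: List[str]        # kernel messages between "cut here" and "end trace"
    mentions_macvlan: bool  # True if any message is tagged with [macvlan]


# Assumed prefix shape, taken from the excerpts above:
# "Apr 18 14:54:38 thelibrary kernel: <message>"
PREFIX = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s?(.*)$")


def extract_traces(log_lines: Iterable[str]) -> List[Trace]:
    """Collect every complete 'cut here' ... 'end trace' block from syslog lines."""
    traces: List[Trace] = []
    current = None  # messages of the block being collected, or None when outside one
    start = ""
    for raw in log_lines:
        match = PREFIX.match(raw.rstrip("\n"))
        if not match:
            continue  # not a kernel line in the assumed format
        stamp, msg = match.groups()
        if "[ cut here ]" in msg:
            current, start = [], stamp
        elif current is not None and "[ end trace" in msg:
            # Frames and the Workqueue line carry a "[macvlan]" tag; the
            # "Modules linked in:" line lists macvlan without brackets, so a
            # merely loaded module does not count as a hit.
            hit = any("[macvlan]" in line for line in current)
            traces.append(Trace(start, current, hit))
            current = None
        elif current is not None:
            current.append(msg)
    return traces


if __name__ == "__main__":
    import sys

    # Usage (path is whatever syslog copy you have): python traces.py syslog
    with open(sys.argv[1], errors="replace") as handle:
        for trace in extract_traces(handle):
            kind = "macvlan" if trace.mentions_macvlan else "other"
            print(trace.timestamp, kind, len(trace.lines), "lines")
```

Comparing the printed timestamps against the time the custom-IP containers were removed shows whether a macvlan trace predates the change. A block that is cut off before its "end trace" line (for example by a hard reset) is not reported by this sketch.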
JorgeB Posted April 19, 2021
12 hours ago, DuzAwe said: I don't have any Dockers any more with static custom set IPs in my Docker setup.
That's very strange; I don't remember ever seeing macvlan issues without that.
DuzAwe Posted April 19, 2021
So, a bug? My crashes/lock-ups started around rc2 and have been pretty random since. After I swapped to ECC memory I had a pretty smooth run of about two weeks, and then they started up again.
JorgeB Posted April 19, 2021
One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.