sirkuz Posted October 14, 2020

Before I revert to the standard build to see if anything changes, I thought I would post first. I have had the Nvidia custom build running for months without any issues, and then a month or so ago it started crashing. As far as I can tell it happens randomly, anywhere from a few hours to 3-5 days apart at most. Attached is the console output as well as the pertinent part of the logs. Perhaps someone more familiar with them could let me know whether it looks hardware related (failing memory/CPU) or software related. Thank you in advance!

Oct 9 03:25:51 Tower root: mover: finished
Oct 9 03:29:53 Tower kernel: mdcmd (336): spindown 2
Oct 9 03:30:52 Tower kernel: mdcmd (337): spindown 10
Oct 9 03:31:07 Tower kernel: mdcmd (338): spindown 7
Oct 9 03:31:09 Tower kernel: mdcmd (339): spindown 9
Oct 9 03:31:10 Tower kernel: mdcmd (340): spindown 11
Oct 9 03:32:22 Tower kernel: mdcmd (341): spindown 8
Oct 9 03:32:51 Tower kernel: mdcmd (342): spindown 4
Oct 9 03:37:14 Tower kernel: mdcmd (343): spindown 3
Oct 9 03:40:17 Tower kernel: mdcmd (344): spindown 13
Oct 9 03:47:39 Tower kernel: mdcmd (345): spindown 15
Oct 9 03:49:22 Tower kernel: mdcmd (346): spindown 16
Oct 9 03:57:26 Tower kernel: mdcmd (347): spindown 1
Oct 9 04:10:39 Tower kernel: mdcmd (348): spindown 0
Oct 9 04:10:42 Tower kernel: mdcmd (349): spindown 6
Oct 9 04:10:43 Tower kernel: mdcmd (350): spindown 29
Oct 9 09:57:03 Tower nginx: 2020/10/09 09:57:03 [error] 13476#13476: *1269818 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.199.101, server: , request: "POST /webGui/include/DeviceList.php HTTP/2.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "8a3d2bfb48855a17ee15048f7164ef99e10efe5f.unraid.net:4443", referrer: "https://8a3d2bfb48855a17ee15048f7164ef99e10efe5f.unraid.net:4443/Main"
Oct 9 09:57:03 Tower php-fpm[13435]: [WARNING] [pool www] child 6660 exited on signal 7 (SIGBUS) after 124.006379 seconds from start
Oct 9 10:15:25 Tower kernel: mdcmd (351): spindown 1
Oct 9 10:15:35 Tower kernel: mdcmd (352): spindown 7
Oct 9 10:32:22 Tower kernel: mdcmd (353): spindown 11
Oct 9 11:14:57 Tower kernel: mdcmd (354): spindown 5
Oct 9 11:51:28 Tower kernel: mdcmd (355): spindown 4
Oct 9 11:51:43 Tower kernel: mdcmd (356): spindown 14
Oct 9 11:53:14 Tower kernel: mdcmd (357): spindown 12
Oct 9 11:53:15 Tower kernel: mdcmd (358): spindown 13
Oct 9 11:54:17 Tower kernel: mdcmd (359): spindown 9
Oct 9 11:54:17 Tower kernel: mdcmd (360): spindown 10
Oct 9 11:54:18 Tower kernel: mdcmd (361): spindown 11
Oct 9 11:55:18 Tower kernel: mdcmd (362): spindown 16
Oct 9 11:55:56 Tower kernel: mdcmd (363): spindown 3
Oct 9 11:56:15 Tower kernel: mdcmd (364): spindown 2
Oct 9 13:29:26 Tower kernel: WARNING: CPU: 16 PID: 27141 at net/netfilter/nf_conntrack_core.c:945 __nf_conntrack_confirm+0xa0/0x69e
Oct 9 13:29:26 Tower kernel: Modules linked in: vhost_net tun vhost tap kvm_intel kvm nvidia_uvm(O) xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables veth macvlan xt_nat ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs nfsd lockd grace sunrpc md_mod mlx4_en mlx4_core igb(O) nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) crc32_pclmul intel_rapl_perf intel_uncore pcbc aesni_intel aes_x86_64 glue_helper crypto_simd ghash_clmulni_intel cryptd intel_cstate coretemp crct10dif_pclmul intel_powerclamp drm_kms_helper crc32c_intel sb_edac drm syscopyarea mpt3sas isci x86_pkg_temp_thermal sysfillrect rsnvme(PO) sysimgblt fb_sys_fops ahci raid_class libsas nvme i2c_i801 nvme_core libahci agpgart wmi scsi_transport_sas ipmi_ssif pcc_cpufreq button
Oct 9 13:29:26 Tower kernel: i2c_core ipmi_si [last unloaded: tun]
Oct 9 13:29:26 Tower kernel: CPU: 16 PID: 27141 Comm: kworker/16:1 Tainted: P O 4.19.107-Unraid #1
Oct 9 13:29:26 Tower kernel: Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.3 05/23/2018
Oct 9 13:29:26 Tower kernel: Workqueue: events macvlan_process_broadcast [macvlan]
Oct 9 13:29:26 Tower kernel: RIP: 0010:__nf_conntrack_confirm+0xa0/0x69e
Oct 9 13:29:26 Tower kernel: Code: 04 e8 56 fb ff ff 44 89 f2 44 89 ff 89 c6 41 89 c4 e8 7f f9 ff ff 48 8b 4c 24 08 84 c0 75 af 48 8b 85 80 00 00 00 a8 08 74 26 <0f> 0b 44 89 e6 44 89 ff 45 31 f6 e8 95 f1 ff ff be 00 02 00 00 48
Oct 9 13:29:26 Tower kernel: RSP: 0018:ffff889fffa03d58 EFLAGS: 00010202
Oct 9 13:29:26 Tower kernel: RAX: 0000000000000188 RBX: ffff888139ac1300 RCX: ffff88a0dc239098
Oct 9 13:29:26 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff81e09150
Oct 9 13:29:26 Tower kernel: RBP: ffff88a0dc239040 R08: 0000000029e50e39 R09: ffffffff81c8aa80
Oct 9 13:29:26 Tower kernel: R10: 0000000000000158 R11: ffffffff81e91080 R12: 000000000000aad4
Oct 9 13:29:26 Tower kernel: R13: ffffffff81e91080 R14: 0000000000000000 R15: 000000000000f92c
Oct 9 13:29:26 Tower kernel: FS: 0000000000000000(0000) GS:ffff889fffa00000(0000) knlGS:0000000000000000
Oct 9 13:29:26 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 9 13:29:26 Tower kernel: CR2: 0000148269c74000 CR3: 0000000001e0a001 CR4: 00000000001606e0
Oct 9 13:29:26 Tower kernel: Call Trace:
Oct 9 13:29:26 Tower kernel: <IRQ>
Oct 9 13:29:26 Tower kernel: ipv4_confirm+0xaf/0xb9
Oct 9 13:29:26 Tower kernel: nf_hook_slow+0x3a/0x90
Oct 9 13:29:26 Tower kernel: ip_local_deliver+0xad/0xdc
Oct 9 13:29:26 Tower kernel: ? ip_sublist_rcv_finish+0x54/0x54
Oct 9 13:29:26 Tower kernel: ip_sabotage_in+0x38/0x3e
Oct 9 13:29:26 Tower kernel: nf_hook_slow+0x3a/0x90
Oct 9 13:29:26 Tower kernel: ip_rcv+0x8e/0xbe
Oct 9 13:29:26 Tower kernel: ? ip_rcv_finish_core.isra.0+0x2e1/0x2e1
Oct 9 13:29:26 Tower kernel: __netif_receive_skb_one_core+0x53/0x6f
Oct 9 13:29:26 Tower kernel: process_backlog+0x77/0x10e
Oct 9 13:29:26 Tower kernel: net_rx_action+0x107/0x26c
Oct 9 13:29:26 Tower kernel: __do_softirq+0xc9/0x1d7
Oct 9 13:29:26 Tower kernel: do_softirq_own_stack+0x2a/0x40
Oct 9 13:29:26 Tower kernel: </IRQ>
Oct 9 13:29:26 Tower kernel: do_softirq+0x4d/0x5a
Oct 9 13:29:26 Tower kernel: netif_rx_ni+0x1c/0x22
Oct 9 13:29:26 Tower kernel: macvlan_broadcast+0x111/0x156 [macvlan]
Oct 9 13:29:26 Tower kernel: ? __switch_to_asm+0x41/0x70
Oct 9 13:29:26 Tower kernel: macvlan_process_broadcast+0xea/0x128 [macvlan]
Oct 9 13:29:26 Tower kernel: process_one_work+0x16e/0x24f
Oct 9 13:29:26 Tower kernel: worker_thread+0x1e2/0x2b8
Oct 9 13:29:26 Tower kernel: ? rescuer_thread+0x2a7/0x2a7
Oct 9 13:29:26 Tower kernel: kthread+0x10c/0x114
Oct 9 13:29:26 Tower kernel: ? kthread_park+0x89/0x89
Oct 9 13:29:26 Tower kernel: ret_from_fork+0x35/0x40
Oct 9 13:29:26 Tower kernel: ---[ end trace de4fa2592551a7a5 ]---
Oct 9 13:34:02 Tower kernel: mdcmd (365): spindown 6
Oct 9 14:30:16 Tower kernel: mdcmd (366): spindown 4
Oct 9 14:30:33 Tower kernel: mdcmd (367): spindown 2
Oct 9 14:31:28 Tower kernel: mdcmd (368): spindown 8
Oct 9 14:31:29 Tower kernel: mdcmd (369): spindown 9
Oct 9 14:31:34 Tower kernel: mdcmd (370): spindown 1
Oct 9 14:32:16 Tower kernel: mdcmd (371): spindown 10
Oct 9 14:33:12 Tower kernel: mdcmd (372): spindown 12
Oct 9 16:04:08 Tower kernel: mdcmd (373): spindown 1
Oct 9 19:50:17 Tower kernel: mdcmd (374): spindown 10
Oct 9 19:50:27 Tower kernel: mdcmd (375): spindown 11
Oct 9 19:50:28 Tower kernel: mdcmd (376): spindown 13
Oct 9 19:50:30 Tower kernel: mdcmd (377): spindown 8
Oct 9 19:50:32 Tower kernel: mdcmd (378): spindown 2
Oct 9 19:50:33 Tower kernel: mdcmd (379): spindown 4
Oct 9 19:50:35 Tower kernel: mdcmd (380): spindown 9
Oct 9 19:50:37 Tower kernel: mdcmd (381): spindown 5
Oct 9 19:50:39 Tower kernel: mdcmd (382): spindown 14
Oct 9 19:50:41 Tower kernel: mdcmd (383): spindown 15
Oct 9 19:50:48 Tower kernel: mdcmd (384): spindown 16
Oct 9 19:50:50 Tower kernel: mdcmd (385): spindown 17
Oct 9 19:50:52 Tower kernel: mdcmd (386): spindown 1
Oct 9 19:50:54 Tower kernel: mdcmd (387): spindown 3
Oct 9 19:50:56 Tower kernel: mdcmd (388): spindown 6
Oct 9 20:12:25 Tower kernel: mdcmd (389): spindown 12
Oct 9 20:59:14 Tower kernel: mdcmd (390): spindown 4
Oct 9 20:59:25 Tower kernel: mdcmd (391): spindown 2
Oct 9 20:59:38 Tower kernel: mdcmd (392): spindown 9
Oct 9 21:00:00 Tower kernel: mdcmd (393): spindown 10
Oct 9 23:15:03 Tower kernel: mdcmd (394): spindown 17
JorgeB Posted October 14, 2020

Macvlan call traces are usually caused by running Docker containers with a custom IP address; more info here:
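For anyone who wants to check their own syslog for the same pattern, here is a minimal sketch, not an official Unraid tool. It assumes the log lines look like the ones posted above (a `kernel: WARNING: CPU: ... at <file>:<line>` line opening each block and `---[ end trace ... ]---` closing it), and the function name `find_macvlan_traces` is made up for illustration. It reports, for each kernel warning block, whether any line in it is tagged with the `[macvlan]` module.

```python
import re

# Assumed markers, based on the log format shown in this thread.
WARN_START = re.compile(r"kernel: WARNING: CPU: \d+ PID: \d+ at (\S+)")
TRACE_END = re.compile(r"kernel: ---\[ end trace [0-9a-f]+ \]---")


def find_macvlan_traces(lines):
    """Return (source_location, involves_macvlan) for each kernel WARNING block.

    Hypothetical helper: a block counts as macvlan-related if any line inside
    it carries the "[macvlan]" module tag. The plain word "macvlan" in the
    "Modules linked in:" list is deliberately not counted.
    """
    results = []
    location = None       # non-None while we are inside a WARNING block
    saw_macvlan = False
    for line in lines:
        start = WARN_START.search(line)
        if start:
            location = start.group(1)
            saw_macvlan = False
            continue
        if location is None:
            continue
        if TRACE_END.search(line):
            results.append((location, saw_macvlan))
            location = None
        elif "[macvlan]" in line:
            saw_macvlan = True
    return results
```

To use it, pass any iterable of log lines, for example an open file handle on the syslog (the path depends on your system). A block that is cut off before its `end trace` line is not reported.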
sirkuz Posted October 14, 2020

Thank you kindly, Jorge! I will look that over and adjust as needed.