mc_866

Members
  • Posts

    50

Everything posted by mc_866

  1. OK, I turned on my Unifi-controller docker and within an hour I got one of the CPU tainted errors with a call trace. This was the only docker running. This docker is running with a defined static IP. Is that an issue? I thought this controller app would be very low risk, but it appears it may be causing my issues. Should I not run this with a static IP?

     Aug 2 16:30:08 Unraid kernel: WARNING: CPU: 2 PID: 0 at net/netfilter/nf_conntrack_core.c:945 __nf_conntrack_confirm+0xa0/0x69e
     Aug 2 16:30:08 Unraid kernel: Modules linked in: xt_nat macvlan xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod bonding ixgbe(O) igb(O) sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper mxm_wmi intel_cstate intel_uncore intel_rapl_perf mpt3sas i2c_i801 i2c_core nvme raid_class ahci libahci scsi_transport_sas nvme_core pcc_cpufreq wmi ipmi_si acpi_pad button [last unloaded: ixgbe]
     Aug 2 16:30:08 Unraid kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: G O 4.19.107-Unraid #1
     Aug 2 16:30:08 Unraid kernel: Hardware name: Supermicro Super Server/X10SDV-2C-TP4F, BIOS 2.1 11/08/2019
     Aug 2 16:30:08 Unraid kernel: RIP: 0010:__nf_conntrack_confirm+0xa0/0x69e
     Aug 2 16:30:08 Unraid kernel: Code: 04 e8 56 fb ff ff 44 89 f2 44 89 ff 89 c6 41 89 c4 e8 7f f9 ff ff 48 8b 4c 24 08 84 c0 75 af 48 8b 85 80 00 00 00 a8 08 74 26 <0f> 0b 44 89 e6 44 89 ff 45 31 f6 e8 95 f1 ff ff be 00 02 00 00 48
     Aug 2 16:30:08 Unraid kernel: RSP: 0018:ffff88885fb038d0 EFLAGS: 00010202
     Aug 2 16:30:08 Unraid kernel: RAX: 0000000000000188 RBX: ffff888219cd2600 RCX: ffff8888064a31d8
     Aug 2 16:30:08 Unraid kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff81e090bc
     Aug 2 16:30:08 Unraid kernel: RBP: ffff8888064a3180 R08: 00000000b609fd86 R09: ffffffff81c8aa80
     Aug 2 16:30:08 Unraid kernel: R10: 0000000000000158 R11: ffffffff81e91080 R12: 0000000000000cdc
     Aug 2 16:30:08 Unraid kernel: R13: ffffffff81e91080 R14: 0000000000000000 R15: 000000000000e2af
     Aug 2 16:30:08 Unraid kernel: FS: 0000000000000000(0000) GS:ffff88885fb00000(0000) knlGS:0000000000000000
     Aug 2 16:30:08 Unraid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Aug 2 16:30:08 Unraid kernel: CR2: 000000000053ce00 CR3: 0000000001e0a001 CR4: 00000000003606e0
     Aug 2 16:30:08 Unraid kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     Aug 2 16:30:08 Unraid kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
     Aug 2 16:30:08 Unraid kernel: Call Trace:
     Aug 2 16:30:08 Unraid kernel: <IRQ>
     Aug 2 16:30:08 Unraid kernel: ipv4_confirm+0xaf/0xb9
     Aug 2 16:30:08 Unraid kernel: nf_hook_slow+0x3a/0x90
     Aug 2 16:30:08 Unraid kernel: ip_local_deliver+0xad/0xdc
     Aug 2 16:30:08 Unraid kernel: ? ip_sublist_rcv_finish+0x54/0x54
     Aug 2 16:30:08 Unraid kernel: ip_sabotage_in+0x38/0x3e
     Aug 2 16:30:08 Unraid kernel: nf_hook_slow+0x3a/0x90
     Aug 2 16:30:08 Unraid kernel: ip_rcv+0x8e/0xbe
     Aug 2 16:30:08 Unraid kernel: ? ip_rcv_finish_core.isra.0+0x2e1/0x2e1
     Aug 2 16:30:08 Unraid kernel: __netif_receive_skb_one_core+0x53/0x6f
     Aug 2 16:30:08 Unraid kernel: netif_receive_skb_internal+0x79/0x94
     Aug 2 16:30:08 Unraid kernel: br_pass_frame_up+0x128/0x14a
     Aug 2 16:30:08 Unraid kernel: ? br_port_flags_change+0x29/0x29
     Aug 2 16:30:08 Unraid kernel: br_handle_frame_finish+0x342/0x383
     Aug 2 16:30:08 Unraid kernel: ? br_pass_frame_up+0x14a/0x14a
     Aug 2 16:30:08 Unraid kernel: br_nf_hook_thresh+0xa3/0xc3
     Aug 2 16:30:08 Unraid kernel: ? br_pass_frame_up+0x14a/0x14a
     Aug 2 16:30:08 Unraid kernel: br_nf_pre_routing_finish+0x24a/0x271
     Aug 2 16:30:08 Unraid kernel: ? br_pass_frame_up+0x14a/0x14a
     Aug 2 16:30:08 Unraid kernel: ? br_handle_local_finish+0xe/0xe
     Aug 2 16:30:08 Unraid kernel: ? nf_nat_ipv4_in+0x1e/0x62 [nf_nat_ipv4]
     Aug 2 16:30:08 Unraid kernel: ? br_handle_local_finish+0xe/0xe
     Aug 2 16:30:08 Unraid kernel: br_nf_pre_routing+0x31c/0x343
     Aug 2 16:30:08 Unraid kernel: ? br_nf_forward_ip+0x362/0x362
     Aug 2 16:30:08 Unraid kernel: nf_hook_slow+0x3a/0x90
     Aug 2 16:30:08 Unraid kernel: br_handle_frame+0x27e/0x2bd
     Aug 2 16:30:08 Unraid kernel: ? br_pass_frame_up+0x14a/0x14a
     Aug 2 16:30:08 Unraid kernel: __netif_receive_skb_core+0x4a7/0x7b1
     Aug 2 16:30:08 Unraid kernel: ? udp_gro_receive+0x4b/0x136
     Aug 2 16:30:08 Unraid kernel: __netif_receive_skb_one_core+0x35/0x6f
     Aug 2 16:30:08 Unraid kernel: netif_receive_skb_internal+0x79/0x94
     Aug 2 16:30:08 Unraid kernel: napi_gro_receive+0x44/0x7b
     Aug 2 16:30:08 Unraid kernel: ixgbe_poll+0xb97/0xce4 [ixgbe]
     Aug 2 16:30:08 Unraid kernel: net_rx_action+0x107/0x26c
     Aug 2 16:30:08 Unraid kernel: __do_softirq+0xc9/0x1d7
     Aug 2 16:30:08 Unraid kernel: irq_exit+0x5e/0x9d
     Aug 2 16:30:08 Unraid kernel: do_IRQ+0xb2/0xd0
     Aug 2 16:30:08 Unraid kernel: common_interrupt+0xf/0xf
     Aug 2 16:30:08 Unraid kernel: </IRQ>
     Aug 2 16:30:08 Unraid kernel: RIP: 0010:cpuidle_enter_state+0xe8/0x141
     Aug 2 16:30:08 Unraid kernel: Code: ff 45 84 f6 74 1d 9c 58 0f 1f 44 00 00 0f ba e0 09 73 09 0f 0b fa 66 0f 1f 44 00 00 31 ff e8 7a 8d bb ff fb 66 0f 1f 44 00 00 <48> 2b 2c 24 b8 ff ff ff 7f 48 b9 ff ff ff ff f3 01 00 00 48 39 cd
     Aug 2 16:30:08 Unraid kernel: RSP: 0018:ffffc900031d3e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
     Aug 2 16:30:08 Unraid kernel: RAX: ffff88885fb1fac0 RBX: ffff88885fb2a200 RCX: 000000000000001f
     Aug 2 16:30:08 Unraid kernel: RDX: 0000000000000000 RSI: 000000003a2e90d6 RDI: 0000000000000000
     Aug 2 16:30:08 Unraid kernel: RBP: 00014d89d0db269e R08: 00014d89d0db269e R09: 0000000000000354
     Aug 2 16:30:08 Unraid kernel: R10: 0000000000376ec0 R11: 071c71c71c71c71c R12: 0000000000000003
     Aug 2 16:30:08 Unraid kernel: R13: ffffffff81e5b120 R14: 0000000000000000 R15: ffffffff81e5b258
     Aug 2 16:30:08 Unraid kernel: ? cpuidle_enter_state+0xbf/0x141
     Aug 2 16:30:08 Unraid kernel: do_idle+0x17e/0x1fc
     Aug 2 16:30:08 Unraid kernel: cpu_startup_entry+0x6a/0x6c
     Aug 2 16:30:08 Unraid kernel: start_secondary+0x197/0x1b2
     Aug 2 16:30:08 Unraid kernel: secondary_startup_64+0xa4/0xb0
     Aug 2 16:30:08 Unraid kernel: ---[ end trace 35a5cd2fd10ccce9 ]---
     Aug 2 17:01:01 Unraid kernel: device br0 left promiscuous mode
  2. Thanks for sharing! It's like chasing a ghost. Stability has been good with no dockers. Also wondering if the PSU would be a factor. It's connected to a UPS. Would that be an issue?
  3. I'm allowing the rebuild to go right now, all dockers stopped. Presently 1 day 3 hr runtime, 6 hours left for rebuild. I don't want to jinx it but it seems like this is the window when it has been locking up. If it runs for a couple more days with stability without the dockers, does it point to a docker issue?
  4. I see you fixed this, but I recently replaced some hardware too. If you can log in as root at the console, the command that helped me was dhcpcd. Entering it refreshed my IP and got me an address so I could log in to the web GUI.
  5. I don't disagree that it looks like hardware. Just not sure what else to test to prove out hardware vs software issue. Also I just visited the KB/M console and it appears that the machine is still running and I'm attempting to pull diagnostics. Still can't access GUI or shares.
  6. No worries, my build has been all over the place lately as I replace things that I think are causing the issue. Right now I'm at the spot where the only things I haven't changed are the controller card and the USB drive. That's why I'm thinking something here has to be a software issue because the trouble persists with new hardware.
  7. Thanks, I did set that up as mirror to flash. The trouble has been that when I copy from flash using MC in the terminal, the zip is read-only. I think I was able to grab a couple of logs and will post them as soon as I can get the shares back online. Right now, with a directly connected monitor and keyboard, I typed in the diagnostics command. Hoping that will complete, but I can't presently access the web GUI or shares.
  8. Yes I did, thank you. I moved off the Ryzen build entirely to an Intel-based Pentium D, which is basically an embedded Xeon on a Supermicro board. So no longer running AMD. But yes, I had read through that thread, and up until a few weeks ago the system had been fairly stable for some months.
  9. After adding back my containers I also added my second new drive. I was adding one to parity and then the old parity to the array. This was scheduled to take 2+ days because it was a 12TB drive. I was roughly 6 hours from being done last night and the system froze again. Unfortunately I don't think I caught it in the logs. Here is the latest diagnostic from after my hard restart last night. I've already changed the mobo, proc and memory thinking it was all tied to the AMD platform, but the issues persist. Could it be my USB drive? Any other items to consider? Actually, as I type this and go to add the diagnostics zips, I realized I can't reach the web UI and the shares seem to be down, so it crashed AGAIN!
  10. I did a reformat on the cache nvme and added some active cooling for the CPU. The new proc was passively cooled. I then restored my dockers and everything is back up and running. So far so good now for the past ~12 hours.
  11. Thanks! I did have one VM back on the Ryzen system but deleted it when I first started troubleshooting issues on that system. I believe it was my VM that was overrunning my cache drive. I thought I had set it to only use a single disk and not cache, but that wasn't the case. So no VMs presently.
  12. I'm not sure what to focus on next so I'm looking for some help. I started having issues over a month ago on a Ryzen 1600 system with a B450 board. My cache drive kept filling up and, though I'm not certain it was related, the server would hard lock randomly. I attempted at least 3 parity checks after those lockups, but it would seem to crash before the 2-day check could complete on my 10TB parity.

      So I decided to change up some hardware, thinking that may be the issue. My previous cache drive was a 1TB nvme; I changed that out for a 2TB nvme drive. After making that change I continued to have the hard locks and unclean shutdowns with the Ryzen setup. I did a memtest per a recommendation in another post I made, and that came back clean after 4 passes, so I thought it may have something to do with the Ryzen platform. I then decided to do a motherboard swap and get back to something in the Intel camp and something a bit more "enterprise", so I grabbed a Supermicro Pentium-D board with integrated 10GbE and 32GB of ECC. I also grabbed a couple of new 12TB drives to expand my storage.

      I installed the new board and memory without much issue. I had some trouble with the static IP, but I think I'm past that. I do want to revisit the bonding settings because I now have 2 10GbE ports and 2 1G ports. I added one of the new drives as parity and that add/rebuild went great. Because my old cache drive was filling up oddly with my old setup, the next step I took was to delete my docker image and reinstall the dockers I was using. I was able to complete that yesterday. Once that was done I used my old parity drive to replace a 3TB drive in my array. That rebuild is happening now. During that rebuild I started to see a good number of errors in the log and also noticed some of my dockers weren't working correctly. I also noticed that the cache drive seems to show write only.

      Lots to unpack here, but the errors that I've been seeing in the log look like they point to the cache drive and maybe disk 2:

      Unraid kernel: BTRFS error (device loop2): bad tree block start, want 6996787200 have 0

      Also:

      Unraid kernel: XFS (md2): Metadata corruption detected at xfs_dinode_verify+0xa5/0x52e [xfs], inode 0x10d8ed11d dinode

      I've attached the full diagnostics!
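
For anyone who wants to check a syslog for the same two messages quoted in the last post, here is a minimal sketch in Python. The patterns are copied from the BTRFS and XFS lines above; the names `PATTERNS` and `count_fs_errors` are made up for this example, and the script only counts matching lines per device. It does not diagnose anything.

```python
import re
from collections import Counter

# Patterns are based only on the two kernel messages quoted in the post.
# PATTERNS and count_fs_errors are hypothetical names for this sketch.
PATTERNS = {
    "btrfs_bad_tree_block": re.compile(
        r"BTRFS error \(device (?P<dev>[^)]+)\): bad tree block start"
    ),
    "xfs_metadata_corruption": re.compile(
        r"XFS \((?P<dev>[^)]+)\): Metadata corruption detected"
    ),
}


def count_fs_errors(lines):
    """Count matching kernel messages, keyed by (kind, device)."""
    counts = Counter()
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(kind, match.group("dev"))] += 1
    return counts


# The two lines from the post, used here as sample input.
sample = [
    "Unraid kernel: BTRFS error (device loop2): bad tree block start, "
    "want 6996787200 have 0",
    "Unraid kernel: XFS (md2): Metadata corruption detected at "
    "xfs_dinode_verify+0xa5/0x52e [xfs], inode 0x10d8ed11d dinode",
]

for (kind, dev), n in count_fs_errors(sample).items():
    print(f"{kind} on {dev}: {n}")
```

To run it against a real log, pass an open file instead of `sample`, for example `count_fs_errors(open(path))`, where `path` is wherever your syslog lives (an assumption you would need to fill in for your own system).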