Diesel

Everything posted by Diesel

  1. Thanks for this. This issue/solution also applies to the tdarr_node container.
  2. Just as a further follow-up, everything has been working smoothly on the kernel panic front since taking the containers off the br0 interface. I haven't yet had a chance to retest running a container on another VLAN with "host access to custom networks" turned off (see the note after this post list on what that setting does), but I may still get to that at some point. For now, I'm managing without that particular container. I've also upgraded to 6.9.2, as @bonienl mentioned. I'll retest that as well as soon as I'm able.
  3. Mainly posting for posterity now, in case this ends up helping someone with the same problem in the future. I've replaced some of the containers on the VLAN (e.g., the torrent client) with different apps that don't use the same ports, so there's no overlap with the others and they can live on the bridge network. The remaining containers that were on the VLAN have been turned off to see if that stops the crashes. I saw another post that mentioned, in addition to putting br0 containers on their own VLAN, also turning off "host access to custom networks", which I had needed to enable to allow my apps on the bridge network to talk to those with custom IPs. So, that's on the troubleshooting checklist as well (see the notes after this post list for what that setting and the VLAN network look like under the hood). For now, I'll just leave the VLAN hosts turned off. If I can at least get this box to stay up without crashes, I can live with a workaround, as frustrating as that may be. Then I can tweak things with additional suggested fixes if needed. But I'd at least like to make it to the end of the trial without another crash, if possible.
  4. Looks like it crashed again. Hard locked this time. I couldn't even get the local console or IPMI KVM to wake up to see what happened. Checked the syslog, and it looks like there are still some macvlan issues going on. I had it down to 2 containers on the VLAN, and still...
     Apr 5 15:39:11 unraid kernel: ------------[ cut here ]------------
     Apr 5 15:39:11 unraid kernel: WARNING: CPU: 0 PID: 2716 at net/netfilter/nf_nat_core.c:614 nf_nat_setup_info+0x6c/0x652 [nf_nat]
     Apr 5 15:39:11 unraid kernel: Modules linked in: nvidia_uvm(PO) nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper drm backlight agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops nvidia(PO) iptable_mangle iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha veth xt_nat macvlan xt_MASQUERADE iptable_nat nf_nat xfs nfsd lockd grace sunrpc md_mod ip6table_filter ip6_tables iptable_filter ip_tables bonding igb i2c_algo_bit intel_powerclamp coretemp kvm_intel kvm mpt3sas mptsas mptscsih mptbase raid_class i2c_i801 crc32c_intel ahci scsi_transport_sas i2c_smbus input_leds intel_cstate i2c_core led_class i5500_temp intel_uncore ipmi_si libahci i7core_edac button acpi_cpufreq [last unloaded: i2c_algo_bit]
     Apr 5 15:39:11 unraid kernel: CPU: 0 PID: 2716 Comm: kworker/0:3 Tainted: P W O 5.10.21-Unraid #1
     Apr 5 15:39:11 unraid kernel: Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.2 07/09/2018
     Apr 5 15:39:11 unraid kernel: Workqueue: events macvlan_process_broadcast [macvlan]
     Apr 5 15:39:11 unraid kernel: RIP: 0010:nf_nat_setup_info+0x6c/0x652 [nf_nat]
     Apr 5 15:39:11 unraid kernel: Code: 89 fb 49 89 f6 41 89 d4 76 02 0f 0b 48 8b 93 80 00 00 00 89 d0 25 00 01 00 00 45 85 e4 75 07 89 d0 25 80 00 00 00 85 c0 74 07 <0f> 0b e9 1f 05 00 00 48 8b 83 90 00 00 00 4c 8d 6c 24 20 48 8d 73
     Apr 5 15:39:11 unraid kernel: RSP: 0000:ffffc90000003c38 EFLAGS: 00010202
     Apr 5 15:39:11 unraid kernel: RAX: 0000000000000080 RBX: ffff888707aeba40 RCX: ffff888646fcc540
     Apr 5 15:39:11 unraid kernel: RDX: 0000000000000180 RSI: ffffc90000003d14 RDI: ffff888707aeba40
     Apr 5 15:39:11 unraid kernel: RBP: ffffc90000003d00 R08: 0000000000000000 R09: ffff88865e1802a0
     Apr 5 15:39:11 unraid kernel: R10: 0000000000000158 R11: ffff88874ac92000 R12: 0000000000000000
     Apr 5 15:39:11 unraid kernel: R13: 0000000000000000 R14: ffffc90000003d14 R15: 0000000000000001
     Apr 5 15:39:11 unraid kernel: FS: 0000000000000000(0000) GS:ffff888627a00000(0000) knlGS:0000000000000000
     Apr 5 15:39:11 unraid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Apr 5 15:39:11 unraid kernel: CR2: 00001488a04d6000 CR3: 00000006fee4c002 CR4: 00000000000206f0
     Apr 5 15:39:11 unraid kernel: Call Trace:
     Apr 5 15:39:11 unraid kernel: <IRQ>
     Apr 5 15:39:11 unraid kernel: ? ip_route_input_slow+0x5e9/0x754
     Apr 5 15:39:11 unraid kernel: ? ipt_do_table+0x49b/0x5c0 [ip_tables]
     Apr 5 15:39:11 unraid kernel: nf_nat_alloc_null_binding+0x71/0x88 [nf_nat]
     Apr 5 15:39:11 unraid kernel: nf_nat_inet_fn+0x91/0x182 [nf_nat]
     Apr 5 15:39:11 unraid kernel: nf_hook_slow+0x39/0x8e
     Apr 5 15:39:11 unraid kernel: nf_hook.constprop.0+0xb1/0xd8
     Apr 5 15:39:11 unraid kernel: ? ip_protocol_deliver_rcu+0xfe/0xfe
     Apr 5 15:39:11 unraid kernel: ip_local_deliver+0x49/0x75
     Apr 5 15:39:11 unraid kernel: ip_sabotage_in+0x43/0x4d
     Apr 5 15:39:11 unraid kernel: nf_hook_slow+0x39/0x8e
     Apr 5 15:39:11 unraid kernel: nf_hook.constprop.0+0xb1/0xd8
     Apr 5 15:39:11 unraid kernel: ? l3mdev_l3_rcv.constprop.0+0x50/0x50
     Apr 5 15:39:11 unraid kernel: ip_rcv+0x41/0x61
     Apr 5 15:39:11 unraid kernel: __netif_receive_skb_one_core+0x74/0x95
     Apr 5 15:39:11 unraid kernel: process_backlog+0xa3/0x13b
     Apr 5 15:39:11 unraid kernel: net_rx_action+0xf4/0x29d
     Apr 5 15:39:11 unraid kernel: __do_softirq+0xc4/0x1c2
     Apr 5 15:39:11 unraid kernel: asm_call_irq_on_stack+0x12/0x20
     Apr 5 15:39:11 unraid kernel: </IRQ>
     Apr 5 15:39:11 unraid kernel: do_softirq_own_stack+0x2c/0x39
     Apr 5 15:39:11 unraid kernel: do_softirq+0x3a/0x44
     Apr 5 15:39:11 unraid kernel: netif_rx_ni+0x1c/0x22
     Apr 5 15:39:11 unraid kernel: macvlan_broadcast+0x10e/0x13c [macvlan]
     Apr 5 15:39:11 unraid kernel: macvlan_process_broadcast+0xf8/0x143 [macvlan]
     Apr 5 15:39:11 unraid kernel: process_one_work+0x13c/0x1d5
     Apr 5 15:39:11 unraid kernel: worker_thread+0x18b/0x22f
     Apr 5 15:39:11 unraid kernel: ? process_scheduled_works+0x27/0x27
     Apr 5 15:39:11 unraid kernel: kthread+0xe5/0xea
     Apr 5 15:39:11 unraid kernel: ? __kthread_bind_mask+0x57/0x57
     Apr 5 15:39:11 unraid kernel: ret_from_fork+0x22/0x30
     Apr 5 15:39:11 unraid kernel: ---[ end trace 958c8b9071653523 ]---
     Apr 5 16:08:03 unraid kernel: eth0: renamed from veth90b5415
     Apr 5 16:10:31 unraid kernel: ------------[ cut here ]------------
  5. To clarify, I have UniFi switches and APs, but my router is pfSense on its own hardware. I did set up the container VLAN to be open to my LAN, but I also set the UniFi controller container to bridge instead of br0 or the VLAN. Like I said, I wanted it to have its own IP, but for now I'm managing that with a host override in the pfSense DNS Resolver (a sketch of the equivalent unbound entry is after this post list). However, this isn't my ideal setup. Assuming I've gotten the kernel panics under control, I too will be eagerly awaiting 6.9.2. But I'm still in wait-and-see mode. My current uptime is barely 25 hours, and I've only got 8 days left on my trial. Yes, I'm aware I can extend the trial, but it honestly shouldn't take me 30 days and numerous fixes just to achieve basic functionality of network storage and Docker containers. If I can get more than 4 days of uptime with this fix, I'll consider it a success, finish migrating the rest of my drives into the array, and purchase a license. If not, I fear it's back to the drawing board.
  6. Follow-up: I set up a VLAN for the containers this morning (roughly as sketched after this post list), set the containers to use the new interface, and I'm currently hammering the box via Tdarr, which previously seemed to exacerbate whatever was causing the kernel panics. So, fingers crossed that resolved it. Wait and see... Follow-up question: one of the containers I was using br0 for was just to assign my unifi-controller its own IP on my LAN subnet. Obviously, putting it on the VLAN puts it in a different subnet, but my preference would be to have the controller on the same subnet as the devices it's controlling. I realize there's no necessity for this if the networking is set up properly, but I'm also experienced enough to know that should problems with my UniFi setup arise in the future, having fewer variables in the mix makes for easier troubleshooting. Given that I seem to be one of the ones randomly afflicted by this issue, am I correct in assuming at this point that there's no way for me to recover this functionality without risking reintroducing the call traces and kernel panics?
  7. I have 3 containers using br0 due to port overlap with other containers (remapping host ports on the bridge network is the alternative; see the sketch after this post list). Looking at the thread that @Hoopster mentioned, it looks like setting up a VLAN for the containers might be needed to resolve this. Curious though... is this functionality considered broken? I thought the whole point of using br0 was for this use case. Not to my knowledge. I'm not familiar with blake2. The only plugins I've installed (so far) were to try to resolve issues I've been having:
     Community Applications - self-explanatory.
     Tips & Tweaks - to kill ssh before stopping the array, since I was having issues with the process hanging when I tried to stop the array.
     Open Files - to see what files were holding up the array from stopping.
     NerdPack - to use CLI locate to track down where a container was putting config files, only to find out that container hadn't been marked as deprecated but was so out of date it should have been. Probably no longer needed, but none of the tools in there should cause problems because they're only used on demand.
     CA.Backup - to back up my appdata to the array once I installed a cache drive.
     Nvidia Driver - now removed for troubleshooting, but used to allow Tdarr_node to do hardware transcoding.
     I haven't installed any other plugins beyond those.
  8. Still crashing, even after removing the Nvidia drivers. syslog
  9. I can try disabling them for troubleshooting, but my hope was to run Tdarr on this server to transcode my downloaded files to maintain consistency, and I got a Quadro P400 specifically for that task (the container-side GPU settings are sketched after this post list). Edit: Driver removed... let's see.
  10. Another one last night. I attached the syslog. syslog
  11. Oh geez, I thought I enabled that previously. It's enabled now, and I'll repost after the next crash (see the note on persistent syslog after this post list). Thanks.
  12. I'm having a persistent issue with kernel panics on Unraid, and I'm trying to troubleshoot my way out of this, but it's not working, so I'm turning to you guys. At best, I get about 3 days of uptime before it hits. I've tried a ton of things suggested on this forum and Reddit. I'm not ruling out a hardware problem, since the server (Supermicro X8DT3-LN4F, 2x Xeon X5650, 48GB RAM) was bought used on eBay, but the video card and SAS controller are new. Memtest comes up clean. The hard drives are a mix of new ones and drives I used in my Synology for a bit, but all test clean. I had some UDMA CRC SMART errors when I first set up Unraid, but figured out it was the onboard SATA ports being flaky, which is why I switched to the SAS controller (a quick way to check those counters is sketched after this post list). I'm about halfway through my trial, and I really want to get this working because I love what Unraid offers, but it's getting to be sink-or-swim time to decide if I stick with the platform or cut my losses and move to something else. I can't pinpoint what's causing it, so any suggestions would be appreciated. unraid-diagnostics-20210401-0911.zip
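
Sketches referenced in the posts above

On the "host access to custom networks" toggle mentioned in posts 2 and 3: with macvlan networking, the host cannot reach containers through the parent interface, and the usual workaround is a macvlan "shim" interface on the host. The commands below are only a minimal sketch of that general technique, not Unraid's exact implementation; the interface name shim-br0 and all addresses are made up for illustration.

    # create a macvlan sub-interface on the host, attached to the same parent as the containers
    ip link add shim-br0 link br0 type macvlan mode bridge
    # give it a spare LAN address and bring it up
    ip addr add 192.168.1.250/32 dev shim-br0
    ip link set shim-br0 up
    # route traffic destined for a container's custom IP through the shim
    ip route add 192.168.1.210/32 dev shim-br0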
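
On putting the br0 containers on their own VLAN (posts 3 and 6): outside of the Unraid GUI, the equivalent is a Docker macvlan network whose parent is a VLAN sub-interface. A minimal sketch, assuming VLAN 20 on eth0 and a 192.168.20.0/24 subnet; the interface, network, and container names and all addresses are placeholders.

    # VLAN sub-interface for the container network (the switch port must carry VLAN 20 tagged)
    ip link add link eth0 name eth0.20 type vlan id 20
    ip link set eth0.20 up

    # macvlan Docker network on that sub-interface
    docker network create -d macvlan \
      --subnet=192.168.20.0/24 --gateway=192.168.20.1 \
      -o parent=eth0.20 containers_vlan20

    # run a container with a fixed IP on the new VLAN instead of br0
    docker run -d --name unifi-controller --network containers_vlan20 --ip 192.168.20.10 linuxserver/unifi-controller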
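
On the pfSense host override from post 5: pfSense's DNS Resolver is unbound, and a host override boils down to a local-data entry. A sketch of what that looks like in raw unbound configuration, with a hypothetical hostname, domain, and address:

    # resolve the controller's friendly name to the address published for the bridged container
    server:
      local-data: "unifi.home.arpa. IN A 192.168.1.5"
      local-data-ptr: "192.168.1.5 unifi.home.arpa"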
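
On the port-overlap reason for using br0 (posts 3 and 7): containers on the default bridge network don't need unique internal ports, because each one can map its internal port to a different host port. A sketch with made-up names, images, and ports:

    # two apps that both listen on 8080 internally, exposed on different host ports
    docker run -d --name app-one -p 8080:8080 example/app-one
    docker run -d --name app-two -p 8081:8080 example/app-two
    # app-one is reachable at http://SERVER_IP:8080, app-two at http://SERVER_IP:8081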
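
On the Quadro P400 transcoding setup from post 9: with the Nvidia Driver plugin providing the host driver, a container is typically handed the GPU via the Nvidia container runtime and a couple of environment variables. A sketch only, assuming the haveagitgat/tdarr_node image; the GPU UUID is a placeholder (list it with nvidia-smi first).

    # list GPUs and their UUIDs
    nvidia-smi -L

    # run the node with the GPU exposed for NVENC/NVDEC transcoding
    docker run -d --name tdarr_node \
      --runtime=nvidia \
      -e NVIDIA_VISIBLE_DEVICES=GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
      -e NVIDIA_DRIVER_CAPABILITIES=all \
      haveagitgat/tdarr_node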
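
On enabling syslog capture from post 11: the point is to land the log somewhere that survives a hard lock, which Unraid's syslog settings (remote syslog server or mirroring to the flash drive) are meant to do. The line below is just the generic rsyslog equivalent for shipping the log to another machine, with a placeholder collector address.

    # /etc/rsyslog.conf on the crashing box: forward everything to a remote collector over UDP 514
    *.* @192.168.1.20:514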
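
On the UDMA CRC errors from post 12: SMART attribute 199 (UDMA_CRC_Error_Count) counts transfer errors between the drive and the controller, which is why flaky SATA ports or cabling raise it. A quick way to check it from the console, with a placeholder device name:

    # dump SMART attributes and pull out the CRC error counter (attribute 199)
    smartctl -A /dev/sdb | grep -i "crc"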