gdeyoung

Members
  • Content Count

    30

Community Reputation

2 Neutral

About gdeyoung

  • Rank
    Newbie

Converted

  • Gender
    Undisclosed


  1. So a quick update to close this out as fixed. - Ran all three servers with issues in safe mode; they had no kernel panics and were stable. - Began troubleshooting to narrow down which docker container or plugin was causing the instability. - Found out it was the Technitium DNS docker container in Community Apps. With one wrinkle: on a 1G connection it was stable, but on a 10G connection it generated kernel panics and spin lock errors. The stock install was on br0, so I wonder if it was in the docker's networking config. - Once I removed the Technitium DNS server docker from t
  2. The 3rd server is in safe mode and still going. I switched the 2nd server back to 10G in normal mode with no file copies and it panic'd in 20 minutes. Diags attached: mediaserver-diagnostics-20210122-1520.zip
  3. Ok, I put one of the servers in safe mode on 10G and am doing some file copies.
  4. They have the default 1500 MTU on the 10G NIC. Should I be using 9000 for jumbo frames?
  5. Yes, the Intel NICs seem to be more stable. I was also having issues with transfers on some of my Windows PCs with Aquantia 10G, so I switched to Intel 10G across the board. Yes, I rolled back to 6.8.3 and had the same issues with both NICs. My other observation is that the panics are happening on the ingest servers, which I copy files to more often.
  6. @JorgeB Thanks for continuing to engage, I really appreciate it. I have completed troubleshooting to try to localize the issues. I have completely rebuilt two of the four servers with new components; the only remaining original parts are the drives, and I still get the panic issues. I have swapped out all of the network gear: three different 10G switches, new cables. I have removed all external items or replaced them several times with new ones and still get the panics. In the last couple of days I switched two of the servers back to 1G and they are rock solid with no issue
  7. So 2 days ago I switched my 2nd server from 10G to 1G. 1 day ago I switched my 3rd server from 10G to 1G. These are all different hardware machines, Intel & Ryzen, running a combo of 6.8.3 and 6.9rc2. All of my servers on 10G (all on their swapped-out/2nd 10G NIC) kernel panic under heavy/sustained file copy within 24hrs. Without heavy file load they will panic within 72hrs. I have reworked the network and simplified the network configs. I have up-to-date BIOS on the mobos. It all comes down to sustained load on the 10G Intel and Aquantia NICs. I have even tried three different 10G switches, n
  8. Server 3 just panic'd again. Again, this is a 10G server, also on its second 10G Intel NIC. It appears the panics happen more under large file copy loads on the 10G connection. Will move it back to 1G to see if it makes a difference. Jan 19 16:27:05 Homeserver kernel: Call Trace: Jan 19 16:27:05 Homeserver kernel: <IRQ> Jan 19 16:27:05 Homeserver kernel: dump_stack+0x67/0x83 Jan 19 16:27:05 Homeserver kernel: nmi_cpu_backtrace+0x71/0x83 Jan 19 16:27:05 Homeserver kernel: ? lapic_can_unplug_cpu+0x97/0x97 Jan 19 16:27:05 Homeserver kernel: nmi_trigger_cpumask_back
  9. Ok, to update this thread. I tried going back to 6.8.3 on the 2nd and 3rd of my 4 servers that are kernel panicking and they are still having panics and crashes daily. My only server that is not experiencing any issues is my 4th one, which is the 1G-connected one. All of my 10G servers are panicking, and I have replaced the NICs with Intel server-class 10G NICs. I finally took my 2nd server back to a 1G connection to see if that stays stable. I have more log snippets from the 10G servers. It looks like they are also having a native_queued_spin_lock_slowpath error in the panic. Call Trac
  10. So my second server just crashed with a kernel panic. All three are having panics and they are all different hardware. Any idea from this trace? Jan 15 22:36:42 Mediaserver kernel: rcu: INFO: rcu_sched self-detected stall on CPU Jan 15 22:36:42 Mediaserver kernel: rcu: #0110-....: (59999 ticks this GP) idle=e7a/1/0x4000000000000000 softirq=11770626/11770626 fqs=14993 Jan 15 22:36:42 Mediaserver kernel: #011(t=60000 jiffies g=13660245 q=3404623) Jan 15 22:36:42 Mediaserver kernel: NMI backtrace for cpu 0 Jan 15 22:36:42 Mediaserver kernel: CPU: 0 PID: 28592 Comm: kworker/u2
  11. Yes, I have the GPU stat plugin installed. Any insight on the kernel panic trace above?
  12. Do the above traces connect with the Nvidia driver at all? After a reboot this morning I'm seeing this repeated a lot in the log. Jan 15 10:12:30 Homeserver kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window] Jan 15 10:12:30 Homeserver kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs Jan 15 10:12:32 Homeserver kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window] J
  13. Happened again 24hrs later. Page faulted: Jan 11 08:30:00 Homeserver kernel: BUG: unable to handle page fault for address: 00000000000053d8 Whole trace: Jan 11 04:07:30 Homeserver kernel: br0: port 1(bond0) entered forwarding state Jan 11 04:08:25 Homeserver flash_backup: adding task: php /usr/local/emhttp/plugins/dynamix.unraid.net/include/UpdateFlashBackup.php update Jan 11 04:15:05 Homeserver kernel: br0: received packet on bond0 with own address as source address (addr:30:9c:23:af:51:e0, vlan:0) Jan 11 04:15:05 Homeserver kernel: br0: received pac
  14. Can anyone take a look and let me know what is causing this kernel panic? Here is the trace from the syslog. I'm actually getting these somewhat regularly on three different 6.9-rc2 Unraid servers. Jan 12 07:15:08 Homeserver kernel: ------------[ cut here ]------------ Jan 12 07:15:08 Homeserver kernel: WARNING: CPU: 0 PID: 0 at net/netfilter/nf_conntrack_core.c:1120 __nf_conntrack_confirm+0x99/0x1e1 Jan 12 07:15:08 Homeserver kernel: Modules linked in: xt_CHECKSUM ipt_REJECT macvlan ip6table_mangle ip6table_nat iptable_mangle ip6table_filter ip6_tables vhost
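
For anyone comparing syslogs across several servers like this, a rough way to tally the kernel messages quoted in the posts above might look like the sketch below. It is only an illustration: the `SIGNATURES` table and the `count_signatures` helper are hypothetical names made up here, the patterns are just the strings that appear in the excerpts above (not a complete list of panic indicators), and the sample lines are copied from those excerpts.

```python
import re
from collections import Counter

# Hypothetical signature table: the patterns are strings taken from the
# log excerpts quoted in the posts above, not an exhaustive list.
SIGNATURES = {
    "rcu_stall": re.compile(r"rcu_sched self-detected stall"),
    "spin_lock": re.compile(r"native_queued_spin_lock_slowpath"),
    "page_fault": re.compile(r"BUG: unable to handle page fault"),
    "conntrack_warning": re.compile(r"__nf_conntrack_confirm"),
    "nvidia_bar": re.compile(r"mapping multiple BARs"),
}


def count_signatures(lines):
    """Count how many lines match each known signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Sample lines copied from the excerpts above.
sample = [
    "Jan 15 22:36:42 Mediaserver kernel: rcu: INFO: rcu_sched self-detected stall on CPU",
    "Jan 11 08:30:00 Homeserver kernel: BUG: unable to handle page fault for address: 00000000000053d8",
    "Jan 11 04:07:30 Homeserver kernel: br0: port 1(bond0) entered forwarding state",
]

result = count_signatures(sample)
print(dict(result))
```

To run it against a real log, the lines could be read from a file, for example `count_signatures(open(path))`, where `path` points at a syslog extracted from a diagnostics zip. A per-server tally like this only shows which messages recur; it does not identify the cause.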