elbweb
Posted April 10, 2020

Hardware:
- x399, Threadripper 2920X
- 64 GB RAM
- SAS 9211-8I
- 12x 8 TB drives
- 1 TB, 512 GB and 512 GB NVMe for btrfs cache

Background: I bought a new server case (Supermicro 846). Tried the new server backplane / swappable bays and everything worked fine. While moving over to the new case I (I'm assuming) broke the motherboard: it would no longer POST. I bought a new motherboard, replaced it, and it started booting, but with lots of strange issues. So far I've tried a few things:

- Reinstalled 6.8.3 (instead of the nvidia driver version)
- Every combination of disabled C-states and PSU idle power states (some kernel dumps were referencing CPUIDLE in the error)
- Disabled all Docker autostarts (it seemed like I got some Docker-related errors?)
- Removed references to the old pass-through GPU from my Plex container (thinking GUIDs might be different?)
- Latest BIOS on the motherboard

Currently there are no autostart VMs or Dockers. I autostart the array and it crashes after a few minutes. Occasionally it will not boot at all. There is apparently a tower diagnostics zip in the logs folder from about 90 minutes prior to the log below; I'm not sure what I did to trigger it, though. I've attached it.

It had once been on long enough that I had enabled mirroring the syslog to flash, got this chunk before it died:

Apr 9 21:23:04 Tower kernel: traps: notify[7202] general protection ip:68d370 sp:7ffeee22bfb8 error:0 in php[433000+2b4000]
Apr 9 21:23:05 Tower rsyslogd: [origin software="rsyslogd" swVersion="8.1908.0" x-pid="7171" x-info="https://www.rsyslog.com"] start
Apr 9 21:24:07 Tower kernel: notify[7605]: segfault at 502 ip 000000000065caae sp 00007ffebc8df200 error 4 in php[433000+2b4000]
Apr 9 21:24:07 Tower kernel: Code: 15 81 8c 24 b4 00 00 00 00 00 00 01 83 ea 01 89 94 24 d0 00 00 00 48 8b 40 08 48 83 f8 01 76 28 48 3d ff 01 00 00 76 15 31 ed <80> 38 3f 40 0f 94 c5 48 01 c5 4d 85 e4 0f 84 7f 05 00 00 81 8c 24
Apr 9 21:24:07 Tower kernel: mdcmd (49): nocheck Pause
Apr 9 21:24:50 Tower init: Switching to runlevel: 0
Apr 9 21:24:50 Tower init: Trying to re-exec init
Apr 9 21:25:34 Tower ntpd[2465]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Apr 9 21:25:48 Tower kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Apr 9 21:25:48 Tower kernel: rcu: 21-...0: (67 ticks this GP) idle=836/1/0x4000000000000000 softirq=3068/3068 fqs=58815
Apr 9 21:25:48 Tower kernel: rcu: (detected by 18, t=240007 jiffies, g=14481, q=73080)
Apr 9 21:25:48 Tower kernel: Sending NMI from CPU 18 to CPUs 21:
Apr 9 21:25:48 Tower kernel: NMI backtrace for cpu 21
Apr 9 21:25:48 Tower kernel: CPU: 21 PID: 5299 Comm: unraidd0 Tainted: G D O 4.19.107-Unraid #1
Apr 9 21:25:48 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P3.90 12/04/2019
Apr 9 21:25:48 Tower kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x6b/0x171
Apr 9 21:25:48 Tower kernel: Code: 42 f0 8b 07 30 e4 09 c6 f7 c6 00 ff ff ff 74 0e 81 e6 00 ff 00 00 75 1a c6 47 01 00 eb 14 85 f6 74 0a 8b 07 84 c0 74 04 f3 90 <eb> f6 66 c7 07 01 00 c3 48 c7 c2 40 07 02 00 65 48 03 15 80 6a f8
Apr 9 21:25:48 Tower kernel: RSP: 0018:ffffc9000730bd80 EFLAGS: 00000002
Apr 9 21:25:48 Tower kernel: RAX: 0000000000000101 RBX: ffff888ff8830b08 RCX: 0000000000000000
Apr 9 21:25:48 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff889031c3fd70
Apr 9 21:25:48 Tower kernel: RBP: ffff889031c3fd70 R08: 0000000000000000 R09: ffffc9000730bd48
Apr 9 21:25:48 Tower kernel: R10: 0000000000000fe0 R11: ffff888ff8830b88 R12: ffff888ff8830af8
Apr 9 21:25:48 Tower kernel: R13: ffff889031c3f800 R14: ffff888ff8831540 R15: ffff888ffc16e800
Apr 9 21:25:48 Tower kernel: FS: 0000000000000000(0000) GS:ffff88903d340000(0000) knlGS:0000000000000000
Apr 9 21:25:48 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 9 21:25:48 Tower kernel: CR2: 0000000000514e54 CR3: 0000000fec62c000 CR4: 00000000003406e0
Apr 9 21:25:48 Tower kernel: Call Trace:
Apr 9 21:25:48 Tower kernel: _raw_spin_lock_irq+0x1d/0x20
Apr 9 21:25:48 Tower kernel: release_stripe+0x1b/0x3d [md_mod]
Apr 9 21:25:48 Tower kernel: unraidd+0x12d7/0x136e [md_mod]
Apr 9 21:25:48 Tower kernel: ? __switch_to_asm+0x35/0x70
Apr 9 21:25:48 Tower kernel: ? __schedule+0x4f7/0x548
Apr 9 21:25:48 Tower kernel: ? md_thread+0xee/0x115 [md_mod]
Apr 9 21:25:48 Tower kernel: md_thread+0xee/0x115 [md_mod]
Apr 9 21:25:48 Tower kernel: ? wait_woken+0x6a/0x6a
Apr 9 21:25:48 Tower kernel: ? md_open+0x2c/0x2c [md_mod]
Apr 9 21:25:48 Tower kernel: kthread+0x10c/0x114
Apr 9 21:25:48 Tower kernel: ? kthread_park+0x89/0x89
Apr 9 21:25:48 Tower kernel: ret_from_fork+0x22/0x40
Apr 9 21:26:33 Tower root: Status of all loop devices
Apr 9 21:26:33 Tower root: /dev/loop1: [2049]:4 (/boot/bzfirmware)
Apr 9 21:26:33 Tower root: /dev/loop2: [0037]:260 (/mnt/cache/system/docker/docker.img)
Apr 9 21:26:33 Tower root: /dev/loop0: [2049]:3 (/boot/bzmodules)
Apr 9 21:26:33 Tower root: Active pids left on /mnt/*
Apr 9 21:26:33 Tower root: USER PID ACCESS COMMAND
Apr 9 21:26:33 Tower root: /mnt/cache: root kernel mount /mnt/cache
Apr 9 21:26:33 Tower root: /mnt/disk1: root kernel mount /mnt/disk1
Apr 9 21:26:33 Tower root: /mnt/disk10: root kernel mount /mnt/disk10
Apr 9 21:26:33 Tower root: /mnt/disk2: root kernel mount /mnt/disk2
Apr 9 21:26:33 Tower root: /mnt/disk3: root kernel mount /mnt/disk3
Apr 9 21:26:33 Tower root: /mnt/disk4: root kernel mount /mnt/disk4
Apr 9 21:26:33 Tower root: /mnt/disk5: root kernel mount /mnt/disk5
Apr 9 21:26:33 Tower root: /mnt/disk6: root kernel mount /mnt/disk6
Apr 9 21:26:33 Tower root: /mnt/disk7: root kernel mount /mnt/disk7
Apr 9 21:26:33 Tower root: /mnt/disk8: root kernel mount /mnt/disk8
Apr 9 21:26:33 Tower root: /mnt/disk9: root kernel mount /mnt/disk9
Apr 9 21:26:33 Tower root: /mnt/user: root kernel mount /mnt/user
Apr 9 21:26:33 Tower root: /mnt/user0: root kernel mount /mnt/user0
Apr 9 21:26:33 Tower root: Active pids left on /dev/md*
Apr 9 21:26:33 Tower root: USER PID ACCESS COMMAND
Apr 9 21:26:33 Tower root: /dev/md1: root kernel mount /mnt/disk1
Apr 9 21:26:33 Tower root: /dev/md10: root kernel mount /mnt/disk10
Apr 9 21:26:33 Tower root: /dev/md2: root kernel mount /mnt/disk2
Apr 9 21:26:33 Tower root: /dev/md3: root kernel mount /mnt/disk3
Apr 9 21:26:33 Tower root: /dev/md4: root kernel mount /mnt/disk4
Apr 9 21:26:33 Tower root: /dev/md5: root kernel mount /mnt/disk5
Apr 9 21:26:33 Tower root: /dev/md6: root kernel mount /mnt/disk6
Apr 9 21:26:33 Tower root: /dev/md7: root kernel mount /mnt/disk7
Apr 9 21:26:33 Tower root: /dev/md8: root kernel mount /mnt/disk8
Apr 9 21:26:33 Tower root: /dev/md9: root kernel mount /mnt/disk9
Apr 9 21:26:33 Tower root: Generating diagnostics...
Apr 9 21:26:39 Tower kernel: BUG: unable to handle kernel paging request at 00000000000096d1
Apr 9 21:26:39 Tower kernel: PGD fdf792067 P4D fdf792067 PUD fdf178067 PMD 0
Apr 9 21:26:39 Tower kernel: Oops: 0000 [#2] SMP NOPTI
Apr 9 21:26:39 Tower kernel: CPU: 5 PID: 207 Comm: kworker/u256:6 Tainted: G D O 4.19.107-Unraid #1
Apr 9 21:26:39 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P3.90 12/04/2019
Apr 9 21:26:39 Tower kernel: Workqueue: events_power_efficient gc_worker
Apr 9 21:26:39 Tower kernel: RIP: 0010:gc_worker+0x8c/0x270
Apr 9 21:26:39 Tower kernel: Code: 93 00 48 8b 15 e4 9a 93 00 3b 05 c2 9a 93 00 75 dd 39 cd 72 02 31 ed 89 e8 48 8d 04 c2 4c 8b 30 41 f6 c6 01 0f 85 4a 01 00 00 <41> 0f b6 46 37 49 c7 c0 f0 ff ff ff 41 ff c5 48 6b c0 38 49 29 c0
Apr 9 21:26:39 Tower kernel: RSP: 0018:ffffc90006ecbe60 EFLAGS: 00010246
Apr 9 21:26:39 Tower kernel: RAX: ffff889031125bb0 RBX: 0000000000000000 RCX: 0000000000010000
Apr 9 21:26:39 Tower kernel: RDX: ffff889031100000 RSI: 0000000000000175 RDI: ffffffff822aa760
Apr 9 21:26:39 Tower kernel: RBP: 0000000000004b76 R08: ffffffffffffffb8 R09: 0000746e65696369
Apr 9 21:26:39 Tower kernel: R10: 8080808080808080 R11: fefefefefefefeff R12: ffffffff822aa760
Apr 9 21:26:39 Tower kernel: R13: 0000000000000001 R14: 000000000000969a R15: ffff888fe4180000
Apr 9 21:26:39 Tower kernel: FS: 0000000000000000(0000) GS:ffff88903cf40000(0000) knlGS:0000000000000000
Apr 9 21:26:39 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 9 21:26:39 Tower kernel: CR2: 00000000000096d1 CR3: 0000001035c16000 CR4: 00000000003406e0
Apr 9 21:26:39 Tower kernel: Call Trace:
Apr 9 21:26:39 Tower kernel: process_one_work+0x16e/0x24f
Apr 9 21:26:39 Tower kernel: worker_thread+0x1e2/0x2b8
Apr 9 21:26:39 Tower kernel: ? rescuer_thread+0x2a7/0x2a7
Apr 9 21:26:39 Tower kernel: kthread+0x10c/0x114
Apr 9 21:26:39 Tower kernel: ? kthread_park+0x89/0x89
Apr 9 21:26:39 Tower kernel: ret_from_fork+0x22/0x40
Apr 9 21:26:39 Tower kernel: Modules linked in: ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod bonding igb(O) edac_mce_amd kvm_amd kvm btusb btrtl btbcm btintel bluetooth crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 i2c_piix4 crypto_simd wmi_bmof mxm_wmi i2c_core k10temp mpt3sas ecdh_generic cryptd glue_helper raid_class ccp scsi_transport_sas nvme ahci nvme_core libahci wmi pcc_cpufreq button acpi_cpufreq [last unloaded: igb]
Apr 9 21:26:39 Tower kernel: CR2: 00000000000096d1
Apr 9 21:26:39 Tower kernel: ---[ end trace 0d847ac0fcfecec6 ]---

tower-diagnostics-20200409-1959.zip
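
In case it helps anyone reading a mirrored syslog like the one above, here is a minimal Python sketch that picks out just the kernel fault lines (the php general protection fault and segfault, the RCU stall, the paging-request BUG and the Oops) with their timestamps. The pattern list and the function name `summarize_faults` are my own assumptions for illustration, not anything Unraid ships, and the list is not exhaustive.

```python
import re

# Fault patterns taken from the kinds of lines in the log above.
# This table is an illustrative assumption, not an exhaustive catalogue.
FAULT_PATTERNS = {
    "general protection": re.compile(r"general protection"),
    "segfault": re.compile(r"segfault at"),
    "rcu stall": re.compile(r"rcu_sched detected stalls"),
    "paging request": re.compile(r"unable to handle kernel paging request"),
    "oops": re.compile(r"Oops: "),
}


def summarize_faults(lines):
    """Return (timestamp, kind) for every kernel line matching a fault pattern."""
    hits = []
    for line in lines:
        # Only look at messages logged by the kernel itself.
        if " kernel: " not in line:
            continue
        # Syslog lines start with "Mon D HH:MM:SS"; keep those three fields.
        stamp = " ".join(line.split()[:3])
        for kind, pattern in FAULT_PATTERNS.items():
            if pattern.search(line):
                hits.append((stamp, kind))
                break  # one label per line is enough
    return hits


if __name__ == "__main__":
    sample = [
        "Apr 9 21:23:04 Tower kernel: traps: notify[7202] general protection ip:68d370 sp:7ffeee22bfb8 error:0 in php[433000+2b4000]",
        "Apr 9 21:24:07 Tower kernel: mdcmd (49): nocheck Pause",
        "Apr 9 21:25:48 Tower kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:",
        "Apr 9 21:26:39 Tower kernel: Oops: 0000 [#2] SMP NOPTI",
    ]
    for stamp, kind in summarize_faults(sample):
        print(stamp, kind)
```

To run it against a real file, read the mirrored syslog with `open(path).read().splitlines()` and pass the result to `summarize_faults`; the path depends on where the mirror was configured to write.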