February 8, 2025
New Unraid build, migrated from an older system. I have had an intermittent issue where the server disappears and becomes unresponsive to everything (web UI, shares, Docker containers; nothing is accessible). This happened late last night, and I was hoping someone could look at the syslog and weigh in. Below is a snippet: I believe the server went quiet at 21:42:38, and the next line is from when I rebooted it.

Feb 7 19:37:38 Tower kernel: Console: switching to colour dummy device 80x25
Feb 7 19:37:38 Tower acpid: input device has been disconnected, fd 11
Feb 7 19:37:39 Tower kernel: vfio-pci 0000:00:02.0: vgaarb: deactivate vga console
Feb 7 19:37:39 Tower kernel: vfio-pci 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=io+mem
Feb 7 19:37:39 Tower kernel: br0: port 2(vnet2) entered blocking state
Feb 7 19:37:39 Tower kernel: br0: port 2(vnet2) entered disabled state
Feb 7 19:37:39 Tower kernel: vnet2: entered allmulticast mode
Feb 7 19:37:39 Tower kernel: vnet2: entered promiscuous mode
Feb 7 19:37:39 Tower kernel: br0: port 2(vnet2) entered blocking state
Feb 7 19:37:39 Tower kernel: br0: port 2(vnet2) entered forwarding state
Feb 7 19:37:41 Tower kernel: vfio-pci 0000:00:02.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x8d92
Feb 7 19:56:56 Tower monitor_nchan: Stop running nchan processes
Feb 7 20:01:42 Tower kernel: x86/split lock detection: #AC: CPU 0/KVM/2844839 took a split_lock trap at address: 0xfffff8063ae0e1dd
Feb 7 21:42:38 Tower emhttpd: spinning down /dev/sdj
Feb 8 10:37:30 Tower rc.rsyslogd: Syslog server daemon... Started.

Attached: syslog-192.168.1.3.log
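To pin down where a log goes quiet, one approach is to scan it for the longest gap between consecutive timestamps. Below is a minimal Python sketch, assuming the classic `Mon DD HH:MM:SS host ...` syslog prefix seen in these logs. Because that prefix omits the year, the year (2025 by default) is an assumption supplied by the caller, and gaps that span New Year are not handled. The function names are hypothetical and not part of Unraid.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def parse_ts(line: str, year: int) -> Optional[datetime]:
    """Parse the 'Feb 7 19:37:38' prefix of a syslog line; None if absent.

    Splitting on whitespace copes with both 'Feb 7' and the padded 'Feb  7'.
    The year is not in the line, so it is passed in (an assumption).
    """
    parts = line.split(None, 3)
    if len(parts) < 3:
        return None
    try:
        return datetime.strptime(
            f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
        )
    except ValueError:
        return None


def largest_gap(
    lines: Iterable[str], year: int = 2025
) -> Optional[Tuple[timedelta, str, str]]:
    """Return (gap, last line before it, first line after it) for the longest silence."""
    best = None
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line, year)
        if ts is None:
            continue  # skip continuation lines or anything without a timestamp
        if prev_ts is not None:
            gap = ts - prev_ts
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best
```

Run against the snippet above, it reports a 12:54:52 silence between the `spinning down /dev/sdj` line and the `rc.rsyslogd` restart. On the full attachment it can be called as `largest_gap(open("syslog-192.168.1.3.log", errors="replace"))`.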
February 9, 2025 (Community Expert)
Quote:
Feb 7 19:37:41 Tower kernel: vfio-pci 0000:00:02.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x8d92

While the system is up, temporarily stop and disable the VFIO binds and confirm you're getting the correct device, as it appears you have an issue with your VFIO setup.
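For context on the quoted message: a PCI option ROM image is expected to begin with the bytes 0x55 0xAA (0xAA55 when read as a little-endian 16-bit word), and when that check fails the kernel prints the value it actually read. Here that means the first two bytes read from the device's ROM were 0x92 0x8D. The Python sketch below reproduces that check, plus the `PCIR` signature check at the pointer stored at offset 0x18, on a ROM image you have already dumped to a file. The function and constant names are hypothetical.

```python
import struct

ROM_SIGNATURE = 0xAA55       # bytes 0x55, 0xAA read as a little-endian 16-bit word
PCIR_POINTER_OFFSET = 0x18   # offset of the 16-bit pointer to the PCI data structure


def check_rom_header(rom: bytes) -> str:
    """Return 'ok' or a description of the first header check that fails."""
    if len(rom) < PCIR_POINTER_OFFSET + 2:
        return "image too short to contain a ROM header"
    (sig,) = struct.unpack_from("<H", rom, 0)
    if sig != ROM_SIGNATURE:
        # Same wording and formatting as the kernel message in the log above.
        return f"invalid ROM header signature: expecting 0xaa55, got {sig:#06x}"
    (pcir,) = struct.unpack_from("<H", rom, PCIR_POINTER_OFFSET)
    if rom[pcir:pcir + 4] != b"PCIR":
        return "invalid PCI data structure signature: expecting 'PCIR'"
    return "ok"
```

On Linux, a device's ROM can usually be dumped as root by writing `1` to `/sys/bus/pci/devices/0000:00:02.0/rom`, copying that file, then writing `0` back. Some devices, integrated GPUs in particular, may not expose a usable ROM this way.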
February 11, 2025 (Author)
I think you may be right. It just locked up again, and I've disabled VMs for now. Attached is the log from the last lockup, which looks even odder to me: lots of php-fpm messages about "pool www" or something. I noticed this in there, but the server seems to keep running for a bit after it:

Feb 10 20:07:14 Tower php-fpm[9697]: [WARNING] [pool www] child 3374477 exited on signal 6 (SIGABRT) after 11.017071 seconds from start
Feb 10 20:07:15 Tower php-fpm[9697]: [WARNING] [pool www] child 3374550 exited on signal 6 (SIGABRT) after 11.016988 seconds from start
Feb 10 20:07:17 Tower kernel: BUG: unable to handle page fault for address: ffff880066690f18
Feb 10 20:07:17 Tower kernel: #PF: supervisor read access in kernel mode
Feb 10 20:07:17 Tower kernel: #PF: error_code(0x0000) - not-present page
Feb 10 20:07:17 Tower kernel: PGD 0 P4D 0
Feb 10 20:07:17 Tower kernel: Oops: 0000 [#2] PREEMPT SMP NOPTI
Feb 10 20:07:17 Tower kernel: CPU: 14 PID: 3375745 Comm: lsof Tainted: P D W O 6.6.68-Unraid #1
Feb 10 20:07:17 Tower kernel: Hardware name: Gigabyte Technology Co., Ltd. Z790 UD AC/Z790 UD AC, BIOS F9 12/14/2023
Feb 10 20:07:17 Tower kernel: RIP: 0010:__d_lookup+0x34/0x9d
Feb 10 20:07:17 Tower kernel: Code: 55 41 54 49 89 f4 55 48 89 fd 53 44 8b 36 44 89 f7 e8 ac cf ff ff 48 89 c3 e8 b2 e3 e2 ff 48 8b 1b 48 83 e3 fe 48 85 db 74 52 <44> 39 73 18 75 47 4c 8d 6b 50 4c 89 ef e8 8d 7e 77 00 48 39 6b 10
Feb 10 20:07:17 Tower kernel: RSP: 0018:ffffc90021a0fd70 EFLAGS: 00010286
Feb 10 20:07:17 Tower kernel: RAX: 0000000000000001 RBX: ffff880066690f00 RCX: 0000000000000009
Feb 10 20:07:17 Tower kernel: RDX: ffff8881cf1b9000 RSI: ffffc90021a0fdd8 RDI: 000000000073d3a9
Feb 10 20:07:17 Tower kernel: RBP: ffff888af8018a80 R08: ffffffff8131d41c R09: ffff8881b26f1000
Feb 10 20:07:17 Tower kernel: R10: 00000000ffff0a00 R11: 0000000000000000 R12: ffffc90021a0fdd8
Feb 10 20:07:17 Tower kernel: R13: ffffffff8131d41c R14: 00000000e7a752cd R15: ffff888af8018a80
Feb 10 20:07:17 Tower kernel: FS: 0000149663067f00(0000) GS:ffff88907f980000(0000) knlGS:0000000000000000
Feb 10 20:07:17 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 10 20:07:17 Tower kernel: CR2: ffff880066690f18 CR3: 0000000afbe18000 CR4: 0000000000750ee0
Feb 10 20:07:17 Tower kernel: PKRU: 55555554
Feb 10 20:07:17 Tower kernel: Call Trace:
Feb 10 20:07:17 Tower kernel: <TASK>
Feb 10 20:07:17 Tower kernel: ? __die_body+0x1a/0x5c
Feb 10 20:07:17 Tower kernel: ? page_fault_oops+0x329/0x376
Feb 10 20:07:17 Tower kernel: ? search_bpf_extables+0x5d/0x68
Feb 10 20:07:17 Tower kernel: ? exc_page_fault+0xf2/0x116
Feb 10 20:07:17 Tower kernel: ? asm_exc_page_fault+0x22/0x30
Feb 10 20:07:17 Tower kernel: ? __pfx_proc_fd_instantiate+0x10/0x10
Feb 10 20:07:17 Tower kernel: ? __pfx_proc_fd_instantiate+0x10/0x10
Feb 10 20:07:17 Tower kernel: ? __d_lookup+0x34/0x9d
Feb 10 20:07:17 Tower kernel: ? __pfx_proc_fd_instantiate+0x10/0x10
Feb 10 20:07:17 Tower kernel: d_lookup+0x21/0x39
Feb 10 20:07:17 Tower kernel: proc_fill_cache+0x5e/0x157
Feb 10 20:07:17 Tower kernel: ? __pfx_filldir64+0x10/0x10
Feb 10 20:07:17 Tower kernel: proc_readfd_common+0x178/0x1c9
Feb 10 20:07:17 Tower kernel: ? __pfx_proc_fd_instantiate+0x10/0x10
Feb 10 20:07:17 Tower kernel: iterate_dir+0x75/0x12f
Feb 10 20:07:17 Tower kernel: __do_sys_getdents64+0x6b/0xd8
Feb 10 20:07:17 Tower kernel: ? __pfx_filldir64+0x10/0x10
Feb 10 20:07:17 Tower kernel: do_syscall_64+0x57/0x7b
Feb 10 20:07:17 Tower kernel: entry_SYSCALL_64_after_hwframe+0x78/0xe2
Feb 10 20:07:17 Tower kernel: RIP: 0033:0x1496632d1053
Feb 10 20:07:17 Tower kernel: Code: 00 16 00 00 00 31 c0 eb c6 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 81 6d 11 00 f7 d8
Feb 10 20:07:17 Tower kernel: RSP: 002b:00007ffc43ce9848 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
Feb 10 20:07:17 Tower kernel: RAX: ffffffffffffffda RBX: 00000000004c6bd0 RCX: 00001496632d1053
Feb 10 20:07:17 Tower kernel: RDX: 0000000000008000 RSI: 00000000004c6c00 RDI: 0000000000000006
Feb 10 20:07:17 Tower kernel: RBP: 00000000004c6bd4 R08: 0000000000000000 R09: 0000000000000003
Feb 10 20:07:17 Tower kernel: R10: 00001496633e9200 R11: 0000000000000293 R12: 00000000004c6c00
Feb 10 20:07:17 Tower kernel: R13: ffffffffffffff88 R14: 0000000000000002 R15: 0000149663470000
Feb 10 20:07:17 Tower kernel: </TASK>
Feb 10 20:07:17 Tower kernel: Modules linked in: xt_connmark xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_mark xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle iptable_mangle vhost_net tun vhost vhost_iotlb tap xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo ip6table_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod zfs(PO) spl(O) ntfs3 tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs af_packet 8021q garp mrp bridge stp llc bonding tls mlx4_en mlx4_core r8169 realtek i915 intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iosf_mbi drm_buddy kvm ttm i2c_algo_bit btusb drm_display_helper btrtl btbcm crct10dif_pclmul btintel crc32_pclmul crc32c_intel ghash_clmulni_intel drm_kms_helper sha512_ssse3 sha256_ssse3
Feb 10 20:07:17 Tower kernel: bluetooth sha1_ssse3 drm aesni_intel crypto_simd cryptd mei_hdcp mei_pxp intel_gtt rapl ecdh_generic input_leds led_class ecc intel_cstate gigabyte_wmi wmi_bmof mpt3sas nvme i2c_i801 mei_me agpgart i2c_smbus intel_uncore nvme_core ahci raid_class mei scsi_transport_sas i2c_core libahci tpm_crb thermal fan tpm_tis video tpm_tis_core tpm wmi backlight acpi_tad acpi_pad button [last unloaded: mlx4_core]
Feb 10 20:07:17 Tower kernel: CR2: ffff880066690f18
Feb 10 20:07:17 Tower kernel: ---[ end trace 0000000000000000 ]---
Feb 10 20:07:17 Tower kernel: RIP: 0010:buffer_check_dirty_writeback+0x2b/0x4f
Feb 10 20:07:17 Tower kernel: Code: 1f 44 00 00 c6 06 00 c6 02 00 48 8b 07 a8 01 75 02 0f 0b 48 8b 4f 28 48 85 c9 74 2d 48 8b 07 48 d1 e8 88 02 48 89 c8 80 22 01 <48> 8b 38 83 e7 04 74 03 c6 02 01 48 8b 38 83 e7 02 74 03 c6 06 01
Feb 10 20:07:17 Tower kernel: RSP: 0000:ffffc900049c7a80 EFLAGS: 00010217
Feb 10 20:07:17 Tower kernel: RAX: 733d5cd6b20c98b5 RBX: ffffc900049c7c28 RCX: ffff888860c237f8
Feb 10 20:07:17 Tower kernel: RDX: ffffc900049c7aac RSI: ffffc900049c7aab RDI: 0000000000000002
Feb 10 20:07:17 Tower kernel: RBP: ffffc900049c7db0 R08: 0000000000000238 R09: 000000008027001e
Feb 10 20:07:17 Tower kernel: R10: 0000000000000246 R11: 0000000000000000 R12: ffffc900049c7ba0
Feb 10 20:07:17 Tower kernel: R13: 0000000000000001 R14: 0000000000000001 R15: ffffea00229719c8
Feb 10 20:07:17 Tower kernel: FS: 0000149663067f00(0000) GS:ffff88907f980000(0000) knlGS:0000000000000000
Feb 10 20:07:17 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 10 20:07:17 Tower kernel: CR2: ffff880066690f18 CR3: 0000000afbe18000 CR4: 0000000000750ee0
Feb 10 20:07:17 Tower kernel: PKRU: 55555554
Feb 10 20:07:17 Tower kernel: note: lsof[3375745] exited with irqs disabled
Feb 10 20:07:17 Tower php-fpm[9697]: [WARNING] [pool www] child 3374644 exited on signal 6 (SIGABRT) after 11.017078 seconds from start
Feb 10 20:07:18 Tower php-fpm[9697]: [WARNING] [pool www] child 3374745 exited on signal 6 (SIGABRT) after 11.017161 seconds from start
Feb 10 20:07:19 Tower php-fpm[9697]: [WARNING] [pool www] child 3375163 exited on signal 6 (SIGABRT) after 11.017372 seconds from start

Attached: syslog-192.168.1.3.log
Edited February 11, 2025 by onyx00
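Two details in the oops header are worth decoding. The letters after `Tainted:` are kernel taint flags (here P, D, W and O), and the `[#2]` after `Oops: 0000` is the oops counter since boot. Together with the `D` ("kernel died recently") flag, that suggests this was not the first oops since boot, so an earlier trace in the full syslog may point closer to the original fault. The Python sketch below decodes both from a log line. Its flag table is a partial list written from the kernel's taint documentation, and the helper names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Partial table of kernel taint flags (per the kernel's "Tainted kernels" docs).
TAINT_FLAGS = {
    "G": "only GPL-compatible modules loaded",
    "P": "proprietary module loaded",
    "F": "module force-loaded",
    "R": "module force-unloaded",
    "M": "machine check exception reported",
    "B": "bad page referenced or unexpected page flags",
    "U": "taint requested by userspace",
    "D": "kernel died recently (OOPS or BUG)",
    "W": "kernel issued a warning",
    "C": "staging driver loaded",
    "O": "externally built (out-of-tree) module loaded",
    "E": "unsigned module loaded",
    "L": "soft lockup occurred",
}


def decode_taint(line: str) -> List[Tuple[str, str]]:
    """Decode the flags after 'Tainted:'; they end where the kernel version starts."""
    m = re.search(r"Tainted: ([A-Z ]+?)\s+\d", line)
    if m is None:
        return []
    return [
        (flag, TAINT_FLAGS.get(flag, "flag not in this table"))
        for flag in m.group(1)
        if flag != " "  # unset flags are printed as spaces in raw kernel output
    ]


def oops_number(line: str) -> Optional[int]:
    """Return the per-boot oops counter from an 'Oops: ... [#N]' line, if present."""
    m = re.search(r"Oops: [0-9a-f]+ \[#(\d+)\]", line)
    return int(m.group(1)) if m else None
```

Searching the full syslog for the line where `oops_number` returns 1 is one way to find that first trace.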