Hi guys,
I've been on 6.11.5 for almost 2 weeks now and have had a couple of freezes/crashes on one of my servers. I skipped the first few 6.11.x releases and upgraded directly from 6.10.3 to 6.11.5. At first everything was fine. After 2-3 days I found my main server inaccessible in the morning. The main VM (with GPU passthrough), idle overnight, showed no display output, the webUI wasn't reachable, and SSH access wasn't possible either.
I force-restarted the server and everything worked as usual for another couple of days until it crashed again, again with no access. I then started sending the syslog to my other server and let it run. Now, 5 days later, I found my main VM inaccessible again. This time the webUI was reachable, and almost half of the cores isolated for the VM were maxed out at 100%.
Restarting the VM didn't work, so I had to force-stop it. Starting it up again brought up the following error:
It looks like the GPU wasn't reset correctly, so I had to restart the whole server again. Navigating the webUI also seemed a bit slow and unreliable. I didn't pull the diagnostics at this point. It looks like something crashed yesterday evening; the syslog showed the following:
Dec 8 19:15:36 UNRAID emhttpd: spinning down /dev/sdd
Dec 8 19:15:36 UNRAID emhttpd: spinning down /dev/sdf
Dec 8 19:15:36 UNRAID emhttpd: spinning down /dev/sdc
Dec 8 19:27:12 UNRAID emhttpd: spinning down /dev/sdb
Dec 8 19:37:11 UNRAID emhttpd: read SMART /dev/sdb
Dec 8 19:52:51 UNRAID kernel: kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Dec 8 19:52:51 UNRAID kernel: BUG: unable to handle page fault for address: ffff8885fb440000
Dec 8 19:52:51 UNRAID kernel: #PF: supervisor instruction fetch in kernel mode
Dec 8 19:52:51 UNRAID kernel: #PF: error_code(0x0011) - permissions violation
Dec 8 19:52:51 UNRAID kernel: PGD 2a01067 P4D 2a01067 PUD 63c04a063 PMD 80000005fb4001e3
Dec 8 19:52:51 UNRAID kernel: Oops: 0011 [#1] PREEMPT SMP NOPTI
Dec 8 19:52:51 UNRAID kernel: CPU: 17 PID: 60526 Comm: CPU 10/KVM Not tainted 5.19.17-Unraid #2
Dec 8 19:52:51 UNRAID kernel: Hardware name: Gigabyte Technology Co., Ltd. TRX40 AORUS XTREME/TRX40 AORUS XTREME, BIOS F4d 03/05/2020
Dec 8 19:52:51 UNRAID kernel: RIP: 0010:0xffff8885fb440000
Dec 8 19:52:51 UNRAID kernel: Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 10 31 04 00 c9 ff ff 00 00 00 00 00 00 00 00 c0 02 a8 07 81 88
Dec 8 19:52:51 UNRAID kernel: RSP: 0018:ffffc90004397d20 EFLAGS: 00010246
Dec 8 19:52:51 UNRAID kernel: RAX: 0000000000000000 RBX: ffff8885fb440000 RCX: 0000000000000001
Dec 8 19:52:51 UNRAID kernel: RDX: 0000000000000800 RSI: 0000000000000000 RDI: ffff8888e940a300
Dec 8 19:52:51 UNRAID kernel: RBP: ffff888107a80000 R08: ffff88817aa16488 R09: 0000000000000000
Dec 8 19:52:51 UNRAID kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
Dec 8 19:52:51 UNRAID kernel: R13: ffff8885fb440070 R14: 00017bcf3ea92684 R15: 0000000000000000
Dec 8 19:52:51 UNRAID kernel: FS: 0000148f763ff6c0(0000) GS:ffff88902d440000(0000) knlGS:0000000000000000
Dec 8 19:52:51 UNRAID kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 8 19:52:51 UNRAID kernel: CR2: ffff8885fb440000 CR3: 0000000967ea0000 CR4: 0000000000350ee0
Dec 8 19:52:51 UNRAID kernel: Call Trace:
Dec 8 19:52:51 UNRAID kernel: <TASK>
Dec 8 19:52:51 UNRAID kernel: ? kvm_arch_vcpu_runnable+0xce/0x149 [kvm]
Dec 8 19:52:51 UNRAID kernel: ? kvm_vcpu_check_block+0x26/0x8b [kvm]
Dec 8 19:52:51 UNRAID kernel: ? kvm_vcpu_block+0x72/0xcb [kvm]
Dec 8 19:52:51 UNRAID kernel: ? kvm_vcpu_halt+0x95/0x23d [kvm]
Dec 8 19:52:51 UNRAID kernel: ? kvm_arch_vcpu_ioctl_run+0x12f3/0x1506 [kvm]
Dec 8 19:52:51 UNRAID kernel: ? pollwake+0x61/0x7f
Dec 8 19:52:51 UNRAID kernel: ? wake_up_q+0x44/0x44
Dec 8 19:52:51 UNRAID kernel: ? __wake_up_common+0xae/0x11c
Dec 8 19:52:51 UNRAID kernel: ? kvm_vcpu_ioctl+0x192/0x5a4 [kvm]
Dec 8 19:52:51 UNRAID kernel: ? wake_up_q+0x44/0x44
Dec 8 19:52:51 UNRAID kernel: ? __seccomp_filter+0x89/0x313
Dec 8 19:52:51 UNRAID kernel: ? vfs_ioctl+0x1e/0x2f
Dec 8 19:52:51 UNRAID kernel: ? __do_sys_ioctl+0x52/0x78
Dec 8 19:52:51 UNRAID kernel: ? do_syscall_64+0x6b/0x81
Dec 8 19:52:51 UNRAID kernel: ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
Dec 8 19:52:51 UNRAID kernel: </TASK>
Dec 8 19:52:51 UNRAID kernel: Modules linked in: nfsv3 nfs cmac cifs asn1_decoder cifs_arc4 cifs_md4 dns_resolver dm_mod dax xt_CHECKSUM ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod it87 hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc ixgbe xfrm_algo mdio btusb btrtl btbcm gigabyte_wmi wmi_bmof mxm_wmi btintel bluetooth edac_mce_amd edac_core kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl ecdh_generic ecc corsair_psu ahci libahci ccp nvme i2c_piix4 input_leds led_class nvme_core joydev i2c_core k10temp thermal wmi button acpi_cpufreq unix [last unloaded: xfrm_algo]
Dec 8 19:52:51 UNRAID kernel: CR2: ffff8885fb440000
Dec 8 19:52:51 UNRAID kernel: ---[ end trace 0000000000000000 ]---
Dec 8 19:52:51 UNRAID kernel: RIP: 0010:0xffff8885fb440000
Dec 8 19:52:51 UNRAID kernel: Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 10 31 04 00 c9 ff ff 00 00 00 00 00 00 00 00 c0 02 a8 07 81 88
Dec 8 19:52:51 UNRAID kernel: RSP: 0018:ffffc90004397d20 EFLAGS: 00010246
Dec 8 19:52:51 UNRAID kernel: RAX: 0000000000000000 RBX: ffff8885fb440000 RCX: 0000000000000001
Dec 8 19:52:51 UNRAID kernel: RDX: 0000000000000800 RSI: 0000000000000000 RDI: ffff8888e940a300
Dec 8 19:52:51 UNRAID kernel: RBP: ffff888107a80000 R08: ffff88817aa16488 R09: 0000000000000000
Dec 8 19:52:51 UNRAID kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
Dec 8 19:52:51 UNRAID kernel: R13: ffff8885fb440070 R14: 00017bcf3ea92684 R15: 0000000000000000
Dec 8 19:52:51 UNRAID kernel: FS: 0000148f763ff6c0(0000) GS:ffff88902d440000(0000) knlGS:0000000000000000
Dec 8 19:52:51 UNRAID kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 8 19:52:51 UNRAID kernel: CR2: ffff8885fb440000 CR3: 0000000967ea0000 CR4: 0000000000350ee0
Dec 8 20:07:12 UNRAID emhttpd: spinning down /dev/sdb
Dec 9 01:40:11 UNRAID crond[2357]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Dec 9 02:40:22 UNRAID emhttpd: read SMART /dev/sdd
Dec 9 02:40:22 UNRAID emhttpd: read SMART /dev/sdf
Dec 9 02:40:22 UNRAID emhttpd: read SMART /dev/sdc
Again I restarted the server and the VM started fine. This time, for the first time, the server crashed while I was actually using the VM, just searching the web for this issue. During the earlier crashes I wasn't using the VM. And again there was no access at all. This time the syslog showed the following:
Dec 9 11:05:57 UNRAID emhttpd: spinning down /dev/sde
Dec 9 11:07:08 UNRAID emhttpd: spinning down /dev/nvme0n1
Dec 9 11:07:08 UNRAID emhttpd: sdspin /dev/nvme0n1 down: 25
Dec 9 11:09:11 UNRAID emhttpd: spinning down /dev/sdb
Dec 9 11:10:50 UNRAID webGUI: Successful login user root from 10.0.0.7
Dec 9 11:11:43 UNRAID emhttpd: read SMART /dev/sdb
Dec 9 11:17:52 UNRAID kernel: general protection fault, probably for non-canonical address 0x65894c085589e79d: 0000 [#1] PREEMPT SMP NOPTI
Dec 9 11:17:52 UNRAID kernel: CPU: 42 PID: 32551 Comm: CPU 13/KVM Not tainted 5.19.17-Unraid #2
Dec 9 11:17:52 UNRAID kernel: Hardware name: Gigabyte Technology Co., Ltd. TRX40 AORUS XTREME/TRX40 AORUS XTREME, BIOS F4d 03/05/2020
Dec 9 11:17:52 UNRAID kernel: RIP: 0010:se_update_runnable+0xc/0x1b
Dec 9 11:17:52 UNRAID kernel: Code: 14 fd e0 6a 16 82 8b 04 02 e9 95 80 b6 00 66 90 31 c0 e9 8c 80 b6 00 b0 01 e9 85 80 b6 00 48 8b 87 80 00 00 00 48 85 c0 74 0a <8b> 40 14 48 89 87 88 00 00 00 e9 6a 80 b6 00 48 63 ff 48 c7 c0 00
Dec 9 11:17:52 UNRAID kernel: RSP: 0018:ffffc900014efc58 EFLAGS: 00010002
Dec 9 11:17:52 UNRAID kernel: RAX: 65894c085589e789 RBX: ffff8881ac251c00 RCX: 00000000000000a1
Dec 9 11:17:52 UNRAID kernel: RDX: 0000000000000000 RSI: 000000000000000c RDI: ffffffff8109fc2b
Dec 9 11:17:52 UNRAID kernel: RBP: ffffffff8109fc2b R08: ffff88810ab53f80 R09: 0000000000000095
Dec 9 11:17:52 UNRAID kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810ab53f00
Dec 9 11:17:52 UNRAID kernel: R13: 0000000000000009 R14: 0000000000000009 R15: 0000000000000001
Dec 9 11:17:52 UNRAID kernel: FS: 00001543ecdff6c0(0000) GS:ffff88902da80000(0000) knlGS:0000000000000000
Dec 9 11:17:52 UNRAID kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 9 11:17:52 UNRAID kernel: CR2: 00000253018bf000 CR3: 00000001ade8a000 CR4: 0000000000350ee0
Dec 9 11:17:52 UNRAID kernel: Call Trace:
Dec 9 11:17:52 UNRAID kernel: <TASK>
Dec 9 11:17:52 UNRAID kernel: dequeue_entity+0x35/0x215
Dec 9 11:17:52 UNRAID kernel: dequeue_task_fair+0x91/0x282
Dec 9 11:17:52 UNRAID kernel: __schedule+0x15a/0x5f6
Dec 9 11:17:52 UNRAID kernel: ? kvm_apic_has_interrupt+0x37/0x7a [kvm]
Dec 9 11:17:52 UNRAID kernel: schedule+0x8e/0xc3
Dec 9 11:17:52 UNRAID kernel: kvm_vcpu_block+0x7b/0xcb [kvm]
Dec 9 11:17:52 UNRAID kernel: kvm_vcpu_halt+0x95/0x23d [kvm]
Dec 9 11:17:52 UNRAID kernel: kvm_arch_vcpu_ioctl_run+0x12f3/0x1506 [kvm]
Dec 9 11:17:52 UNRAID kernel: ? pollwake+0x61/0x7f
Dec 9 11:17:52 UNRAID kernel: ? wake_up_q+0x44/0x44
Dec 9 11:17:52 UNRAID kernel: ? __wake_up_common+0xae/0x11c
Dec 9 11:17:52 UNRAID kernel: kvm_vcpu_ioctl+0x192/0x5a4 [kvm]
Dec 9 11:17:52 UNRAID kernel: ? wake_up_q+0x44/0x44
Dec 9 11:17:52 UNRAID kernel: ? __seccomp_filter+0x89/0x313
Dec 9 11:17:52 UNRAID kernel: vfs_ioctl+0x1e/0x2f
Dec 9 11:17:52 UNRAID kernel: __do_sys_ioctl+0x52/0x78
Dec 9 11:17:52 UNRAID kernel: do_syscall_64+0x6b/0x81
Dec 9 11:17:52 UNRAID kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
Dec 9 11:17:52 UNRAID kernel: RIP: 0033:0x1547f9ed8d38
Dec 9 11:17:52 UNRAID kernel: Code: 00 00 48 8d 44 24 08 48 89 54 24 e0 48 89 44 24 c0 48 8d 44 24 d0 48 89 44 24 c8 b8 10 00 00 00 c7 44 24 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 07 89 d0 c3 0f 1f 40 00 48 8b 15 91 d0 0d
Dec 9 11:17:52 UNRAID kernel: RSP: 002b:00001543ecdfdc48 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Dec 9 11:17:52 UNRAID kernel: RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00001547f9ed8d38
Dec 9 11:17:52 UNRAID kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000029
Dec 9 11:17:52 UNRAID kernel: RBP: 00001543f7a29f00 R08: 000055faab99b9a0 R09: 0000000000000000
Dec 9 11:17:52 UNRAID kernel: R10: 00007ffdf47e9080 R11: 0000000000000246 R12: 0000000000000000
Dec 9 11:17:52 UNRAID kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Dec 9 11:17:52 UNRAID kernel: </TASK>
Dec 9 11:17:52 UNRAID kernel: Modules linked in: dm_mod dax nfsv3 nfs ip6t_REJECT nf_reject_ipv6 xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat xt_nat xt_tcpudp iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod it87 hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc ixgbe xfrm_algo mdio btusb btrtl btbcm btintel gigabyte_wmi wmi_bmof mxm_wmi bluetooth edac_mce_amd edac_core kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl ecdh_generic ecc corsair_psu ahci libahci ccp nvme i2c_piix4 input_leds led_class joydev nvme_core i2c_core k10temp thermal wmi button acpi_cpufreq unix [last unloaded: xfrm_algo]
Dec 9 11:17:52 UNRAID kernel: ---[ end trace 0000000000000000 ]---
Dec 9 11:17:52 UNRAID kernel: RIP: 0010:se_update_runnable+0xc/0x1b
Dec 9 11:17:52 UNRAID kernel: Code: 14 fd e0 6a 16 82 8b 04 02 e9 95 80 b6 00 66 90 31 c0 e9 8c 80 b6 00 b0 01 e9 85 80 b6 00 48 8b 87 80 00 00 00 48 85 c0 74 0a <8b> 40 14 48 89 87 88 00 00 00 e9 6a 80 b6 00 48 63 ff 48 c7 c0 00
Dec 9 11:17:52 UNRAID kernel: RSP: 0018:ffffc900014efc58 EFLAGS: 00010002
Dec 9 11:17:52 UNRAID kernel: RAX: 65894c085589e789 RBX: ffff8881ac251c00 RCX: 00000000000000a1
Dec 9 11:17:52 UNRAID kernel: RDX: 0000000000000000 RSI: 000000000000000c RDI: ffffffff8109fc2b
Dec 9 11:17:52 UNRAID kernel: RBP: ffffffff8109fc2b R08: ffff88810ab53f80 R09: 0000000000000095
Dec 9 11:17:52 UNRAID kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810ab53f00
Dec 9 11:17:52 UNRAID kernel: R13: 0000000000000009 R14: 0000000000000009 R15: 0000000000000001
Dec 9 11:17:52 UNRAID kernel: FS: 00001543ecdff6c0(0000) GS:ffff88902da80000(0000) knlGS:0000000000000000
Dec 9 11:17:52 UNRAID kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 9 11:17:52 UNRAID kernel: CR2: 00000253018bf000 CR3: 00000001ade8a000 CR4: 0000000000350ee0
Dec 9 11:17:52 UNRAID kernel: note: CPU 13/KVM[32551] exited with preempt_count 2
Dec 9 11:18:26 UNRAID webGUI: Successful login user root from 10.0.10.107
Dec 9 11:23:58 UNRAID unassigned.devices: Successfully mounted 'sde1' on '/mnt/disks/VMs_backup_hdd'.
Dec 9 11:23:58 UNRAID rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2102.0 try https://www.rsyslog.com/e/2359 ]
Dec 9 11:23:58 UNRAID unassigned.devices: Adding SMB share 'VMs_backup_hdd'.
Dec 9 11:23:58 UNRAID webGUI: Successful login user root from 10.0.10.107
Dec 9 11:23:59 UNRAID emhttpd: /usr/local/emhttp/plugins/user.scripts/backgroundScript.sh "/tmp/user.scripts/tmpScripts/icon sync/script" >/dev/null 2>&1/usr/local/emhttp/plugins/user.scripts/backgroundScript.sh "/tmp/user.scripts/tmpScripts/ping Diskstation on boot/script" >/dev/null 2>&1
Dec 9 11:23:59 UNRAID emhttpd: Starting services...
Dec 9 11:23:59 UNRAID emhttpd: shcmd (50): /etc/rc.d/rc.samba restart
Dec 9 11:23:59 UNRAID wsdd2[8800]: 'Terminated' signal received.
Dec 9 11:23:59 UNRAID wsdd2[8800]: terminating.
Dec 9 11:24:01 UNRAID root: Starting Samba: /usr/sbin/smbd -D
Dec 9 11:24:01 UNRAID root: /usr/sbin/nmbd -D
Dec 9 11:24:01 UNRAID root: /usr/sbin/wsdd2 -d
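To compare the two crashes side by side, it can help to pull just the crash signature (time, faulting task, RIP) out of a syslog dump, even one flattened onto a single line like the ones pasted here. Below is a minimal Python sketch, assuming the `Mon D HH:MM:SS host program: message` layout seen above. The marker list, the helper names (`split_entries`, `find_crashes`), and the "same timestamp = same oops" grouping are my own hypothetical choices, not anything provided by Unraid.

```python
import re
from dataclasses import dataclass

# Zero-width split point before each entry, e.g. "Dec 8 19:52:51 " or "Dec  8 19:52:51 ".
ENTRY_RE = re.compile(r"(?=\b[A-Z][a-z]{2} {1,2}\d{1,2} \d{2}:\d{2}:\d{2} )")
# timestamp, host, program (text up to the first colon), message
LINE_RE = re.compile(r"([A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) ([^:]+): (.*)")
# Kernel messages assumed to open an oops report (hypothetical list; extend as needed).
OOPS_MARKERS = (
    "BUG: ",
    "general protection fault",
    "kernel tried to execute NX-protected page",
)


@dataclass
class Crash:
    timestamp: str
    reason: str
    task: str = ""
    rip: str = ""


def split_entries(text):
    """Split a syslog dump (line-based or flattened) into individual entries."""
    return [e.strip() for e in ENTRY_RE.split(text) if e.strip()]


def find_crashes(text):
    """Return one Crash per oops, grouping kernel lines that share a timestamp."""
    crashes = []
    current = None
    for entry in split_entries(text):
        m = LINE_RE.match(entry)
        if not m or m.group(3) != "kernel":
            continue
        ts, msg = m.group(1), m.group(4)
        if msg.startswith(OOPS_MARKERS):
            # The NX warning and the following "BUG:" line belong to the same
            # oops, so only start a new Crash when the timestamp changes.
            if current is None or current.timestamp != ts:
                current = Crash(ts, msg)
                crashes.append(current)
        elif current is not None and current.timestamp == ts:
            if msg.startswith("CPU: ") and not current.task:
                tm = re.search(r"Comm: (.+?) (?:Not tainted|Tainted)", msg)
                if tm:
                    current.task = tm.group(1)
            elif msg.startswith("RIP: ") and not current.rip:
                # Keep only the first (kernel-mode) RIP of each report.
                current.rip = msg[len("RIP: "):]
    return crashes
```

For example, `for c in find_crashes(open("syslog.txt").read()): print(c)` on the two excerpts in this post would list the faulting tasks as `CPU 10/KVM` and `CPU 13/KVM`, both KVM vCPU threads. Grouping by timestamp is a heuristic and could merge two oopses logged within the same second.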
I found a couple of reports of similar freezes from users running torrent containers, which I can mostly rule out in my case. The only containers I run at startup are duplicati and binhex-urbackup, and both are essentially idle when the server crashes. Scheduled backup jobs don't coincide with the crashes, at least not with the two most recent ones. The server never had crashes or freezes like this before, and nothing changed on the software or hardware side. It all started with the upgrade from 6.10.3 to 6.11.5.
Maybe someone could take a look at the syslog and the diagnostics and point me in the right direction to fix this.
Thanks