System frozen / not responding after running for several days on [6.12.2]



The following screen was captured before the manual reset.

The hard reset was performed at Jul 10 20:51:14; please check the log before this timestamp.

The most recent logs before the system stopped responding are here: syslog.txt

The diagnostics file is here: aio-diagnostics-20230710-2109.zip
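To pull out only the entries logged before the hard reset, a short filter over syslog.txt can help. This is a minimal Python sketch, assuming the standard `Mon DD HH:MM:SS host ...` syslog prefix, an English/C locale for month names, and the year 2023 (syslog lines carry no year, so it is taken from the diagnostics file name). `parse_stamp` and `lines_before` are hypothetical helper names.

```python
from datetime import datetime
from typing import Iterable, Iterator, Optional

RESET_TIME = "Jul 10 20:51:14"  # time of the manual hard reset, from the post
YEAR = 2023  # assumption: syslog lines carry no year, so take it from the diagnostics file name
STAMP_FORMAT = "%Y %b %d %H:%M:%S"  # %b assumes English month abbreviations


def parse_stamp(line: str, year: int = YEAR) -> Optional[datetime]:
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp, or return None."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", STAMP_FORMAT)
    except ValueError:
        return None  # line does not start with a syslog timestamp


def lines_before(lines: Iterable[str], cutoff: str = RESET_TIME, year: int = YEAR) -> Iterator[str]:
    """Yield the lines whose timestamp is strictly earlier than the cutoff."""
    limit = datetime.strptime(f"{year} {cutoff}", STAMP_FORMAT)
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is not None and stamp < limit:
            yield line
```

Usage: `with open("syslog.txt") as f: print("".join(lines_before(f)), end="")`. Lines without a parseable timestamp are skipped rather than guessed at.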

Many thanks for any help!


[Screenshot: console output captured before the manual reset]


Kernel messages like this one keep repeating:

Jul 10 20:47:49 AIO kernel: RSP: 0018:ffffc9000c6bfce0 EFLAGS: 00000202
Jul 10 20:47:49 AIO kernel: RAX: 0000000000700101 RBX: ffff8891d57404d4 RCX: 0000000000000a20
Jul 10 20:47:49 AIO kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8891d57404d4
Jul 10 20:47:49 AIO kernel: RBP: 000000003c4a0000 R08: 00000000ffffffff R09: 0000000000000000
Jul 10 20:47:49 AIO kernel: R10: 0000000000000000 R11: ffff888877d7eff0 R12: 0000000000000004
Jul 10 20:47:49 AIO kernel: R13: 0000000000000000 R14: 000000003c59ffff R15: ffffc9000c6bfd80
Jul 10 20:47:49 AIO kernel: FS:  0000000000000000(0000) GS:ffff88903fa40000(0000) knlGS:0000000000000000
Jul 10 20:47:49 AIO kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 10 20:47:49 AIO kernel: CR2: 000014f4013a49dc CR3: 000000000620a004 CR4: 00000000001706e0
Jul 10 20:47:49 AIO kernel: Call Trace:
Jul 10 20:47:49 AIO kernel: <NMI>
Jul 10 20:47:49 AIO kernel: ? nmi_cpu_backtrace+0xd3/0x104
Jul 10 20:47:49 AIO kernel: ? nmi_cpu_backtrace_handler+0xd/0x15
Jul 10 20:47:49 AIO kernel: ? nmi_handle+0x57/0x131
Jul 10 20:47:49 AIO kernel: ? native_queued_spin_lock_slowpath+0x86/0x1cf
Jul 10 20:47:49 AIO kernel: ? default_do_nmi+0x66/0x15b
Jul 10 20:47:49 AIO kernel: ? exc_nmi+0xbf/0x130
Jul 10 20:47:49 AIO kernel: ? end_repeat_nmi+0x16/0x67
Jul 10 20:47:49 AIO kernel: ? native_queued_spin_lock_slowpath+0x86/0x1cf
Jul 10 20:47:49 AIO kernel: ? native_queued_spin_lock_slowpath+0x86/0x1cf
Jul 10 20:47:49 AIO kernel: ? native_queued_spin_lock_slowpath+0x86/0x1cf
Jul 10 20:47:49 AIO kernel: </NMI>
Jul 10 20:47:49 AIO kernel: <TASK>
Jul 10 20:47:49 AIO kernel: do_raw_spin_lock+0x14/0x1a
Jul 10 20:47:49 AIO kernel: __clear_extent_bit+0xe6/0x329
Jul 10 20:47:49 AIO kernel: ? preempt_latency_start+0x2b/0x46
Jul 10 20:47:49 AIO kernel: endio_readpage_release_extent+0x81/0xb7
Jul 10 20:47:49 AIO kernel: end_bio_extent_readpage+0x4a5/0x4fe
Jul 10 20:47:49 AIO kernel: ? update_load_avg+0x46/0x398
Jul 10 20:47:49 AIO kernel: ? process_one_work+0x1ab/0x295
Jul 10 20:47:49 AIO kernel: process_one_work+0x1ab/0x295
Jul 10 20:47:49 AIO kernel: worker_thread+0x18b/0x244
Jul 10 20:47:49 AIO kernel: ? rescuer_thread+0x281/0x281
Jul 10 20:47:49 AIO kernel: kthread+0xe7/0xef
Jul 10 20:47:49 AIO kernel: ? kthread_complete_and_exit+0x1b/0x1b
Jul 10 20:47:49 AIO kernel: ret_from_fork+0x22/0x30
Jul 10 20:47:49 AIO kernel: </TASK>
Jul 10 20:48:52 AIO shutdown[7551]: shutting down for system reboot
Jul 10 20:48:54 AIO kernel: rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-.... } 8278300 jiffies s: 879449 root: 0x1/.
Jul 10 20:48:54 AIO kernel: rcu: blocking rcu_node structures (internal RCU debug): l=1:0-15:0x2/.
Jul 10 20:48:54 AIO kernel: Sending NMI from CPU 16 to CPUs 1:
Jul 10 20:48:54 AIO kernel: NMI backtrace for cpu 1
Jul 10 20:48:54 AIO kernel: CPU: 1 PID: 9828 Comm: kworker/u65:7 Tainted: P      D W  O       6.1.36-Unraid #1
Jul 10 20:48:54 AIO kernel: Hardware name: Sugon I620-G10/X9DR3-F, BIOS 3.4 06/30/2020
Jul 10 20:48:54 AIO kernel: Workqueue: btrfs-endio btrfs_end_bio_work
Jul 10 20:48:54 AIO kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x80/0x1cf
Jul 10 20:48:54 AIO kernel: Code: 2b 08 8b 03 0f 92 c2 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 76 0c 0f ba e0 08 72 1e c6 43 01 00 eb 18 85 c0 74 0a 8b 03 <84> c0 74 04 f3 90 eb f6 66 c7 03 01 00 e9 32 01 00 00 e8 4a 3f ff
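The `Tainted:` field in the 20:48:54 trace (`P      D W  O`) encodes kernel taint flags. Below is a minimal Python sketch for decoding it; the flag table is deliberately partial, covering only the letters that appear in this thread, with descriptions paraphrased from the kernel's tainted-kernels documentation. The `D` flag indicates that the kernel had already hit an Oops or BUG before this trace, which suggests the stall is a consequence of an earlier crash rather than the first fault. `decode_taint` is a hypothetical helper.

```python
import re

# Partial table of kernel taint flags: only the letters seen in this thread.
TAINT_FLAGS = {
    "P": "proprietary module loaded",
    "D": "kernel died recently (an OOPS or BUG occurred)",
    "W": "kernel issued a warning",
    "O": "out-of-tree (externally built) module loaded",
}


def decode_taint(line: str) -> list[str]:
    """Return descriptions for the taint letters in a 'Tainted:' field."""
    # Capture the letters/spaces after 'Tainted:' up to the kernel version number.
    match = re.search(r"Tainted: ([A-Z ]+?)\s+\d", line)
    if not match:
        return []
    return [TAINT_FLAGS.get(ch, f"unknown flag {ch}") for ch in match.group(1) if ch != " "]
```

Running it on the 20:48:54 line yields four entries including the `D` flag, while the 18:30:56 `shfs` Oops line (`P        W  O`) yields only three, which is consistent with that Oops being the earlier event.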

10 hours ago, JorgeB said:

The server is constantly crashing in the syslog; it looks more like a hardware issue.

The errors start here. It looks like a kernel/driver bug rather than a hardware defect, especially since the system had been running stably for a while before this.


Jul 10 18:30:56 AIO kernel: BUG: unable to handle page fault for address: 000000003c740000
Jul 10 18:30:56 AIO kernel: #PF: supervisor write access in kernel mode
Jul 10 18:30:56 AIO kernel: #PF: error_code(0x0002) - not-present page
Jul 10 18:30:56 AIO kernel: PGD 0 P4D 0 
Jul 10 18:30:56 AIO kernel: Oops: 0002 [#1] PREEMPT SMP PTI
Jul 10 18:30:56 AIO kernel: CPU: 19 PID: 19342 Comm: shfs Tainted: P        W  O       6.1.36-Unraid #1
Jul 10 18:30:56 AIO kernel: Hardware name: Sugon I620-G10/X9DR3-F, BIOS 3.4 06/30/2020
Jul 10 18:30:56 AIO kernel: RIP: 0010:rb_insert_color+0x105/0x11a
Jul 10 18:30:56 AIO kernel: Code: 74 0a 48 89 f1 48 83 c9 01 48 89 0a 48 89 06 48 8b 50 10 49 89 c0 48 85 d2 48 89 57 08 49 89 78 10 74 0a 48 89 f8 48 83 c8 01 <48> 89 02 31 c9 4c 89 ca 4c 89 c6 e9 07 fc ff ff c3 cc cc cc cc 41
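Finding where the chain of errors begins, as was done above for the 18:30:56 page fault, can be automated by scanning the syslog for the first line that opens a kernel crash report. This is a minimal Python sketch; the marker strings in `CRASH_MARKER` are an assumption (common prefixes of kernel crash reports), not an exhaustive list, and `first_crash` is a hypothetical helper.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed markers: common strings that open a kernel crash report in syslog.
CRASH_MARKER = re.compile(r"kernel: (BUG:|Oops:|WARNING:|general protection fault)")


def first_crash(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (line_number, line) for the first kernel crash marker, else None."""
    for number, line in enumerate(lines, start=1):
        if CRASH_MARKER.search(line):
            return number, line.rstrip("\n")
    return None
```

Usage: `with open("syslog.txt") as f: print(first_crash(f))`. RCU stall reports such as the 20:48:54 `rcu: INFO:` line are not matched by these markers, so they would need an extra pattern if wanted.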



Another bug happened when I tried to restart the system:

Jul 11 09:29:39 AIO kernel: BUG: unable to handle page fault for address: ffffffffa0671260
Jul 11 09:29:39 AIO kernel: #PF: supervisor read access in kernel mode
Jul 11 09:29:39 AIO kernel: #PF: error_code(0x0000) - not-present page
Jul 11 09:29:39 AIO kernel: PGD 620e067 P4D 620e067 PUD 620f063 PMD 106af4067 PTE 0
Jul 11 09:29:39 AIO kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Jul 11 09:29:39 AIO kernel: CPU: 25 PID: 5739 Comm: btrfs Tainted: P        W  O       6.1.36-Unraid #1
Jul 11 09:29:39 AIO kernel: Hardware name: Sugon I620-G10/X9DR3-F, BIOS 3.4 06/30/2020
Jul 11 09:29:39 AIO kernel: RIP: 0010:blkdev_get_by_dev+0xbb/0x264
Jul 11 09:29:39 AIO kernel: Code: b8 d8 00 00 00 00 0f 84 75 01 00 00 eb 17 4c 89 f6 48 89 c7 e8 09 f6 ff ff 85 c0 41 89 c5 74 b9 e9 83 01 00 00 49 8b 44 24 48 <48> 8b b8 80 00 00 00 e8 5b cd cf ff 84 c0 0f 84 43 01 00 00 80 bb
Jul 11 09:29:39 AIO kernel: RSP: 0018:ffffc9000823fc80 EFLAGS: 00010282
Jul 11 09:29:39 AIO kernel: RAX: ffffffffa06711e0 RBX: ffff8890d76a0000 RCX: 0000000000000008
Jul 11 09:29:39 AIO kernel: RDX: ffff88811dd11f80 RSI: ffff8890d76a0418 RDI: ffff889085ae79f8
Jul 11 09:29:39 AIO kernel: RBP: 000000004800005d R08: 0000000000000002 R09: 0000000000000009
Jul 11 09:29:39 AIO kernel: R10: 0000000000000001 R11: 0000000000000fe0 R12: ffff889085ae7800
Jul 11 09:29:39 AIO kernel: R13: ffff8890b676e710 R14: ffff8890b676e700 R15: 0000000000000000
Jul 11 09:29:39 AIO kernel: FS:  000014e9aa66ad80(0000) GS:ffff88a03fc40000(0000) knlGS:0000000000000000
Jul 11 09:29:39 AIO kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 11 09:29:39 AIO kernel: CR2: ffffffffa0671260 CR3: 00000010cf556006 CR4: 00000000001706e0
Jul 11 09:29:39 AIO kernel: Call Trace:
Jul 11 09:29:39 AIO kernel: <TASK>
Jul 11 09:29:39 AIO kernel: ? __die_body+0x1a/0x5c
Jul 11 09:29:39 AIO kernel: ? page_fault_oops+0x329/0x376
Jul 11 09:29:39 AIO kernel: ? fixup_exception+0x22/0x24b
Jul 11 09:29:39 AIO kernel: ? exc_page_fault+0xf4/0x11d
Jul 11 09:29:39 AIO kernel: ? asm_exc_page_fault+0x22/0x30
Jul 11 09:29:39 AIO kernel: ? blkdev_get_by_dev+0xbb/0x264
Jul 11 09:29:39 AIO kernel: ? blkdev_close+0x1d/0x1d
Jul 11 09:29:39 AIO kernel: blkdev_open+0x58/0x90
Jul 11 09:29:39 AIO kernel: do_dentry_open+0x195/0x304
Jul 11 09:29:39 AIO kernel: path_openat+0x8f4/0xa4d
Jul 11 09:29:39 AIO kernel: do_filp_open+0x55/0xb8
Jul 11 09:29:39 AIO kernel: ? getname_flags+0x29/0x152
Jul 11 09:29:39 AIO kernel: ? kmem_cache_alloc+0x122/0x14d
Jul 11 09:29:39 AIO kernel: ? _raw_spin_unlock+0x14/0x29
Jul 11 09:29:39 AIO kernel: do_sys_openat2+0x6c/0xd9
Jul 11 09:29:39 AIO kernel: do_sys_open+0x3a/0x5a
Jul 11 09:29:39 AIO kernel: do_syscall_64+0x6b/0x81
Jul 11 09:29:39 AIO kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
Jul 11 09:29:39 AIO kernel: RIP: 0033:0x14e9aa7748a1
Jul 11 09:29:39 AIO kernel: Code: 75 37 89 f0 25 00 00 41 00 3d 00 00 41 00 74 29 80 3d 4a cd 0e 00 00 74 4d 89 da 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 77 48 83 c4 68 5b 5d c3 48 8d 84 24 80 00 00
Jul 11 09:29:39 AIO kernel: RSP: 002b:00007ffd32858f70 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
Jul 11 09:29:39 AIO kernel: RAX: ffffffffffffffda RBX: 0000000000080800 RCX: 000014e9aa7748a1
Jul 11 09:29:39 AIO kernel: RDX: 0000000000080800 RSI: 00000000004fc050 RDI: 00000000ffffff9c
Jul 11 09:29:39 AIO kernel: RBP: 00000000004fc050 R08: 0000000000000007 R09: 00000000004fbf80
Jul 11 09:29:39 AIO kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 0000000064acb07f
Jul 11 09:29:39 AIO kernel: R13: 0000000064acb066 R14: 000014e9aa9e4b44 R15: 0000000000000019
Jul 11 09:29:39 AIO kernel: </TASK>
Jul 11 09:29:39 AIO kernel: Modules linked in: ipmi_devintf xt_comment macvlan veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag nfsd auth_rpcgss oid_registry lockd grace sunrpc ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs af_packet 8021q garp mrp bridge stp llc bonding tls ixgbe xfrm_algo mdio igb x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mgag200 sha512_ssse3 drm_shmem_helper aesni_intel drm_kms_helper crypto_simd ipmi_ssif cryptd i2c_i801 mxm_wmi drm rapl mpt3sas isci nvme intel_cstate i2c_smbus mei_me backlight i2c_algo_bit libsas acpi_ipmi ahci syscopyarea input_leds raid_class sysfillrect sysimgblt intel_uncore libahci joydev fb_sys_fops led_class mei nvme_core
Jul 11 09:29:39 AIO kernel: i2c_core scsi_transport_sas ipmi_si wmi button unix [last unloaded: md_mod]
Jul 11 09:29:39 AIO kernel: CR2: ffffffffa0671260
Jul 11 09:29:39 AIO kernel: ---[ end trace 0000000000000000 ]---
Jul 11 09:29:39 AIO kernel: RIP: 0010:blkdev_get_by_dev+0xbb/0x264
Jul 11 09:29:39 AIO kernel: Code: b8 d8 00 00 00 00 0f 84 75 01 00 00 eb 17 4c 89 f6 48 89 c7 e8 09 f6 ff ff 85 c0 41 89 c5 74 b9 e9 83 01 00 00 49 8b 44 24 48 <48> 8b b8 80 00 00 00 e8 5b cd cf ff 84 c0 0f 84 43 01 00 00 80 bb
Jul 11 09:29:39 AIO kernel: RSP: 0018:ffffc9000823fc80 EFLAGS: 00010282
Jul 11 09:29:39 AIO kernel: RAX: ffffffffa06711e0 RBX: ffff8890d76a0000 RCX: 0000000000000008
Jul 11 09:29:39 AIO kernel: RDX: ffff88811dd11f80 RSI: ffff8890d76a0418 RDI: ffff889085ae79f8
Jul 11 09:29:39 AIO kernel: RBP: 000000004800005d R08: 0000000000000002 R09: 0000000000000009
Jul 11 09:29:39 AIO kernel: R10: 0000000000000001 R11: 0000000000000fe0 R12: ffff889085ae7800
Jul 11 09:29:39 AIO kernel: R13: ffff8890b676e710 R14: ffff8890b676e700 R15: 0000000000000000
Jul 11 09:29:39 AIO kernel: FS:  000014e9aa66ad80(0000) GS:ffff88a03fc40000(0000) knlGS:0000000000000000
Jul 11 09:29:39 AIO kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 11 09:29:39 AIO kernel: CR2: ffffffffa0671260 CR3: 00000010cf556006 CR4: 00000000001706e0
Jul 11 09:29:39 AIO kernel: note: btrfs[5739] exited with irqs disabled
Jul 11 09:31:04 AIO root: ACPI action up is not defined
Jul 11 09:38:57 AIO root: ACPI action up is not defined
Jul 11 09:39:08 AIO shutdown[6862]: shutting down for system reboot
Jul 11 09:42:02 AIO kernel: Linux version 6.1.36-Unraid (root@Develop) (gcc (GCC) 12.2.0, GNU ld version 2.40-slack151) #1 SMP PREEMPT_DYNAMIC Wed Jun 28 07:51:54 PDT 2023

