meep

November 8, 2023

Hi, sorry for the late reply. I’m not sure why I didn’t receive a thread update notification.

Yes, I still have the card, and managed to get it working reliably in my system by shuffling around some cards / slots.

However, it really is a nice to have as I’ve lots of other USB controllers I’m not using. Would consider sale.

September 26, 2023

So after a full weekend, and re-enabling all the various hardware and apps, the server has been stable. It's reasonable to say that disabling c-states was the fix for the issues I was encountering.

These have been enabled since I built the server in August 2019, and the system has worked perfectly right through to 6.11.5. Only when I upgraded to 6.12 did regular crashes start, and these seemed to escalate in frequency with every point release I installed.

At least now, thanks to the exhaustive testing, I've identified and removed a few CPU intensive plug ins I didn't really need, and identified a faulty SSD, so there's that at least.

September 22, 2023

2 minutes ago, JorgeB said:

If it works you can try this instead, if there's a BIOS option for it, so c-states can still be enabled and it will use less power.

I believe power supply idle control was already set to normal, but I'll double check when I next reboot .

I want to have the server running for 24 hours without issue before restarting and adding back some if my expansion cards.

September 22, 2023

So we might have a winner here.....

It's an oldie, but a goldie. Working through any and all possible troubleshooting, I disabled c-states in my bios and have had my first error free night in several weeks

Not fully confirmed yet, and super confusing why this would impact my system now and not several years ago when this was actually a thing.

Will report back and hopefully close out the thread if this turns out to be the actual solution.

September 21, 2023

36 minutes ago, JorgeB said:

I would suggest first rolling back to v6.11.5 to see if the issue persists, if not it could be some kernel issue that manifests itself with your hardware.

That's the plan. Though I see quite a few similar reports here but also on Reddit. I think there's something fundamentally problematic in 6.12.x

September 21, 2023

I asked CHAT GPT for an analysis. Here's what I got back.....

It seems like you're experiencing a recurring issue on your UNRAID server, and I'll walk you through the analysis of the provided logs.

The logs indicate that there is a problem with your system's kernel, and it's encountering a "general protection fault." This is a type of error that occurs when the kernel detects that a process is attempting to access a part of memory it's not allowed to, often due to a software bug.

Let's break down the information in the log:

Timestamp and Process Information:

The log entry starts with a timestamp indicating when the issue occurred.

It mentions the process name (disk_load) and the PID (Process ID) of the process encountering the issue.

Hardware Information:

It provides information about the hardware, including the motherboard model and BIOS version. This is important because hardware-related issues can sometimes lead to kernel faults.

Stack Trace:

The RIP line indicates the Instruction Pointer at which the issue occurred. In this case, it points to the list_del function in the kernel code.

The Code section shows the assembly code at the RIP location.

The RSP register contains the Stack Pointer at the time of the fault.

Registers like RAX, RBX, RCX, etc., show the values stored in those registers when the fault happened.

Call Trace:

This section provides a call stack trace, showing the sequence of function calls that led to the issue. It starts from the top-level function (list_del) and goes deeper into the kernel functions.

Modules Linked In:

This lists the kernel modules that are currently loaded. Kernel modules are pieces of code that can be loaded into the kernel to add functionality or drivers.

Additional Information:

The logs also indicate that this issue is affecting CPU 21.

There are mentions of "rcu_preempt detected stalls." RCU (Read-Copy-Update) is a synchronization mechanism in the Linux kernel. These stalls might be related to the issue you're experiencing.

Based on this information, it's clear that the issue is occurring in the kernel code, specifically in the list_del function, which is part of a linked list manipulation. The most likely cause of this issue is a software bug in the kernel or a kernel module. Kernel bugs can be very complex to diagnose and often require the expertise of kernel developers.

Here are some steps you can take to further diagnose and potentially resolve the issue:

Check for Updates: Ensure that your UNRAID server is running the latest kernel and kernel modules. Sometimes, updating the kernel can resolve known issues.
Check for Specific Software: Determine if the issue started happening after installing or updating specific software or kernel modules. Try disabling or reverting those changes if possible.
Kernel Debugging: Consider enabling kernel debugging and collecting more detailed logs. Kernel developers can use these logs to identify the exact cause of the issue.
Community Forums: Check UNRAID or Linux kernel community forums for similar issues. Others may have encountered and solved similar problems.
Contact Support: If the issue persists, consider contacting UNRAID support or the support channels for your specific hardware components. They may have specific insights or solutions.
Hardware Testing: In some cases, hardware issues can manifest as kernel faults. Test your hardware components (e.g., RAM, CPU, motherboard) for any problems.

Keep in mind that diagnosing and resolving kernel-level issues can be challenging, and it's essential to take precautions to avoid data loss during troubleshooting. If you're not comfortable with these steps, seeking assistance from experienced professionals or support channels is advisable.

September 21, 2023

@Dimtar I'm on IPVlan (not macvlan)

@SirLupus I haven't tried that, but like you, I need SMB so not really an option

I've now spent DAYS peeling back the onion, removing all cards etc. and adding them back in one at a time. I thought I'd solved it in that I identified a bad SSD (that I wasn't even using) and removed that and seemed to gain some stability. However, overnight last night, it all came tumbling down again (during a parity check).

I'll paste a bit of the log captured below, but it looks like something is tripping up the kernel and then continues to generate a kernel exception every 3 mins exactly. In a previous version of this, I could see these were reporting issues in smartctl, which led me down the path to find the bad drive, but here I have my 3 minute exceptions back, and no smartctl references.

Really stumped. Next thing is move back to 6.11.x, but I cant be staying on that forever (assuming it works). What's up here @unraid ???

Here's the start of the issue overnight. The 3rd one just keeps repeating every 3 mins exactly until the whole system locks up, or at the very least, the GUI freezes out and becomes unusable.

Quote

Sep 21 03:40:17 UNRAID kernel: general protection fault, probably for non-canonical address 0xfff8ea005ad8da10: 0000 [#1] PREEMPT SMP NOPTI Sep 21 03:40:17 UNRAID kernel: CPU: 21 PID: 2685 Comm: disk_load Tainted: P O 6.1.49-Unraid #1

Sep 21 03:40:17 UNRAID kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P3.90 12/04/2019

Sep 21 03:40:17 UNRAID kernel: RIP: 0010:list_del+0x7/0x28

Sep 21 03:40:17 UNRAID kernel: Code: 00 75 15 48 c1 eb 09 48 8b 42 08 83 e3 3f 48 0f a3 18 0f 92 c0 0f b6 c0 5b e9 71 6d a0 00 e9 6c 6d a0 00 48 8b 47 08 48 8b 17 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 07 48 83

Sep 21 03:40:17 UNRAID kernel: RSP: 0000:ffffc9002622fae8 EFLAGS: 00010087

Sep 21 03:40:17 UNRAID kernel: RAX: ffff88982d572068 RBX: 0000000000000000 RCX: 0000000000000981

Sep 21 03:40:17 UNRAID kernel: RDX: fff8ea005ad8da08 RSI: 0000000000000000 RDI: ffffea005aaf6cc8

Sep 21 03:40:17 UNRAID kernel: RBP: ffffea005aaf6cc0 R08: ffff88982d572040 R09: ffff88982d572068

Sep 21 03:40:17 UNRAID kernel: R10: ffff88841c371008 R11: ffff88841c37100c R12: 0000000000009fc3

Sep 21 03:40:17 UNRAID kernel: R13: ffff88982d572068 R14: ffff88982d572040 R15: ffff88982d572040

Sep 21 03:40:17 UNRAID kernel: FS: 00001504d0393740(0000) GS:ffff88982d540000(0000) knlGS:0000000000000000

Sep 21 03:40:17 UNRAID kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

Sep 21 03:40:17 UNRAID kernel: CR2: 0000000000549e90 CR3: 00000004dc85e000 CR4: 00000000003506e0

Sep 21 03:40:17 UNRAID kernel: Call Trace:

Sep 21 03:40:17 UNRAID kernel: <TASK>

Sep 21 03:40:17 UNRAID kernel: ? __die_body+0x1a/0x5c

Sep 21 03:40:17 UNRAID kernel: ? die_addr+0x38/0x51

Sep 21 03:40:17 UNRAID kernel: ? exc_general_protection+0x30f/0x345

Sep 21 03:40:17 UNRAID kernel: ? asm_exc_general_protection+0x22/0x30

Sep 21 03:40:17 UNRAID kernel: ? list_del+0x7/0x28

Sep 21 03:40:17 UNRAID kernel: __rmqueue_pcplist+0x41/0x468

Sep 21 03:40:17 UNRAID kernel: ? post_alloc_hook+0x13/0x5f

Sep 21 03:40:17 UNRAID kernel: get_page_from_freelist+0x2cf/0x89f

Sep 21 03:40:17 UNRAID kernel: ? leave_mm+0x34/0x34

Sep 21 03:40:17 UNRAID kernel: __alloc_pages+0xfa/0x1e8

Sep 21 03:40:17 UNRAID kernel: __folio_alloc+0x15/0x36

Sep 21 03:40:17 UNRAID kernel: vma_alloc_folio+0x1d6/0x20c

Sep 21 03:40:17 UNRAID kernel: wp_page_copy+0xbb/0x4a3

Sep 21 03:40:17 UNRAID kernel: ? pte_pfn+0x11/0x31 Sep 21 03:40:17 UNRAID kernel: ? vm_normal_page+0x1b/0x9b

Sep 21 03:40:17 UNRAID kernel: __handle_mm_fault+0x71c/0xcf9

Sep 21 03:40:17 UNRAID kernel: ? resched_curr+0x4a/0x65

Sep 21 03:40:17 UNRAID kernel: handle_mm_fault+0x13d/0x20f

Sep 21 03:40:17 UNRAID kernel: do_user_addr_fault+0x2c3/0x48d

Sep 21 03:40:17 UNRAID kernel: exc_page_fault+0xfb/0x11d

Sep 21 03:40:17 UNRAID kernel: asm_exc_page_fault+0x22/0x30

Sep 21 03:40:17 UNRAID kernel: RIP: 0033:0x459dbf

Sep 21 03:40:17 UNRAID kernel: Code: 2b 53 00 41 83 7f 14 04 49 89 07 48 8b 44 24 10 49 c7 47 28 00 00 00 00 48 8b 12 49 c7 47 30 00 00 00 00 49 89 47 20 49 63 c5 <4c> 89 3c c2 41 8b 47 18 0f 84 c3 02 00 00 8b 74 24 0c 44 01 63 10

Sep 21 03:40:17 UNRAID kernel: RSP: 002b:00007ffed17c7430 EFLAGS: 00010293

Sep 21 03:40:17 UNRAID kernel: RAX: 0000000000000000 RBX: 0000000000528e40 RCX: 00000000fffa0000

Sep 21 03:40:17 UNRAID kernel: RDX: 0000000000549e90 RSI: 000000000053c290 RDI: 0000000000562850

Sep 21 03:40:17 UNRAID kernel: RBP: 00007ffed17c74e0 R08: 000000004d2dfb5a R09: 0000000000001000

Sep 21 03:40:17 UNRAID kernel: R10: 000000000053bef0 R11: 000000000053bf20 R12: 0000000000000001

Sep 21 03:40:17 UNRAID kernel: R13: 0000000000000000 R14: 00000000004eea9a R15: 0000000000563c10

Sep 21 03:40:17 UNRAID kernel: </TASK>

Sep 21 03:40:17 UNRAID kernel: Modules linked in: tun xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables macvtap macvlan tap bridge stp llc bonding tls atlantic igb amdgpu gpu_sched drm_buddy radeon edac_mce_amd edac_core intel_rapl_msr intel_rapl_common iosf_mbi video drm_ttm_helper ttm kvm_amd drm_display_helper drm_kms_helper kvm drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel wmi_bmof mxm_wmi sha512_ssse3 backlight aesni_intel mpt3sas crypto_simd i2c_piix4 agpgart nvme cryptd syscopyarea i2c_algo_bit ahci raid_class sysfillrect input_leds sysimgblt rapl led_class nvme_core scsi_transport_sas

Sep 21 03:40:17 UNRAID kernel: fb_sys_fops k10temp ccp i2c_core libahci wmi button acpi_cpufreq unix [last unloaded: atlantic]

Sep 21 03:40:17 UNRAID kernel: ---[ end trace 0000000000000000 ]---

Sep 21 03:40:17 UNRAID kernel: RIP: 0010:list_del+0x7/0x28

Sep 21 03:40:17 UNRAID kernel: Code: 00 75 15 48 c1 eb 09 48 8b 42 08 83 e3 3f 48 0f a3 18 0f 92 c0 0f b6 c0 5b e9 71 6d a0 00 e9 6c 6d a0 00 48 8b 47 08 48 8b 17 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 07 48 83

Sep 21 03:40:17 UNRAID kernel: RSP: 0000:ffffc9002622fae8 EFLAGS: 00010087

Sep 21 03:40:17 UNRAID kernel: RAX: ffff88982d572068 RBX: 0000000000000000 RCX: 0000000000000981

Sep 21 03:40:17 UNRAID kernel: RDX: fff8ea005ad8da08 RSI: 0000000000000000 RDI: ffffea005aaf6cc8

Sep 21 03:40:17 UNRAID kernel: RBP: ffffea005aaf6cc0 R08: ffff88982d572040 R09: ffff88982d572068

Sep 21 03:40:17 UNRAID kernel: R10: ffff88841c371008 R11: ffff88841c37100c R12: 0000000000009fc3

Sep 21 03:40:17 UNRAID kernel: R13: ffff88982d572068 R14: ffff88982d572040 R15: ffff88982d572040

Sep 21 03:40:17 UNRAID kernel: FS: 00001504d0393740(0000) GS:ffff88982d540000(0000) knlGS:0000000000000000

Sep 21 03:40:17 UNRAID kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

Sep 21 03:40:17 UNRAID kernel: CR2: 0000000000549e90 CR3: 00000004dc85e000 CR4: 00000000003506e0

Sep 21 03:40:17 UNRAID kernel: note: disk_load[2685] exited with irqs disabled

Sep 21 03:40:17 UNRAID kernel: note: disk_load[2685] exited with preempt_count 2

Sep 21 03:41:17 UNRAID kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:

Sep 21 03:41:17 UNRAID kernel: rcu: 21-...0: (1 GPs behind) idle=ee5c/1/0x4000000000000000 softirq=4387298/4387300 fqs=13358

Sep 21 03:41:17 UNRAID kernel: (detected by 13, t=60002 jiffies, g=14420985, q=248753 ncpus=32)

Sep 21 03:41:17 UNRAID kernel: Sending NMI from CPU 13 to CPUs 21:

Sep 21 03:41:17 UNRAID kernel: NMI backtrace for cpu 21

Sep 21 03:41:17 UNRAID kernel: CPU: 21 PID: 21928 Comm: TaskCon~ller #6 Tainted: P D O 6.1.49-Unraid #1

Sep 21 03:41:17 UNRAID kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P3.90 12/04/2019

Sep 21 03:41:17 UNRAID kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x80/0x1cf

Sep 21 03:41:17 UNRAID kernel: Code: 2b 08 8b 03 0f 92 c2 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 76 0c 0f ba e0 08 72 1e c6 43 01 00 eb 18 85 c0 74 0a 8b 03 <84> c0 74 04 f3 90 eb f6 66 c7 03 01 00 e9 32 01 00 00 e8 b1 40 ff

Sep 21 03:41:17 UNRAID kernel: RSP: 0018:ffffc9001da8fc28 EFLAGS: 00000002

Sep 21 03:41:17 UNRAID kernel: RAX: 0000000000000101 RBX: ffff88982d572040 RCX: 0000000000000024

Sep 21 03:41:17 UNRAID kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88982d572040

Sep 21 03:41:17 UNRAID kernel: RBP: ffff88988f2fad80 R08: 0000000000000002 R09: 000000000006fe8c

Sep 21 03:41:17 UNRAID kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9001da8fce0

Sep 21 03:41:17 UNRAID kernel: R13: ffffea005ad7e5c0 R14: 0000000000000282 R15: 0000000000000000

Sep 21 03:41:17 UNRAID kernel: FS: 000014c8f58046c0(0000) GS:ffff88982d540000(0000) knlGS:0000000000000000

Sep 21 03:41:17 UNRAID kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

Sep 21 03:41:17 UNRAID kernel: CR2: 000014c8f3671000 CR3: 000000014fd64000 CR4: 00000000003506e0

Sep 21 03:41:17 UNRAID kernel: Call Trace:

Sep 21 03:41:17 UNRAID kernel: <NMI>

Sep 21 03:41:17 UNRAID kernel: ? nmi_cpu_backtrace+0xd3/0x104

Sep 21 03:41:17 UNRAID kernel: ? nmi_cpu_backtrace_handler+0xd/0x15

Sep 21 03:41:17 UNRAID kernel: ? nmi_handle+0x57/0x131

Sep 21 03:41:17 UNRAID kernel: ? native_queued_spin_lock_slowpath+0x80/0x1cf

Sep 21 03:41:17 UNRAID kernel: ? default_do_nmi+0x66/0x15b

Sep 21 03:41:17 UNRAID kernel: ? exc_nmi+0xbf/0x130

Sep 21 03:41:17 UNRAID kernel: ? end_repeat_nmi+0x16/0x67

Sep 21 03:41:17 UNRAID kernel: ? native_queued_spin_lock_slowpath+0x80/0x1cf

Sep 21 03:41:17 UNRAID kernel: ? native_queued_spin_lock_slowpath+0x80/0x1cf

Sep 21 03:41:17 UNRAID kernel: ? native_queued_spin_lock_slowpath+0x80/0x1cf

Sep 21 03:41:17 UNRAID kernel: </NMI> Sep 21 03:41:17 UNRAID kernel: <TASK>

Sep 21 03:41:17 UNRAID kernel: do_raw_spin_lock+0x14/0x1a

Sep 21 03:41:17 UNRAID kernel: _raw_spin_lock_irqsave+0x2c/0x37

Sep 21 03:41:17 UNRAID kernel: free_unref_page_list+0x122/0x23b

Sep 21 03:41:17 UNRAID kernel: release_pages+0x2e6/0x30e

Sep 21 03:41:17 UNRAID kernel: tlb_flush_mmu+0x6b/0x99

Sep 21 03:41:17 UNRAID kernel: tlb_finish_mmu+0x2c/0x5b

Sep 21 03:41:17 UNRAID kernel: zap_page_range_single+0xbb/0xe5

Sep 21 03:41:17 UNRAID kernel: ? futex_wait_queue+0x4e/0x82

Sep 21 03:41:17 UNRAID kernel: do_madvise+0x6cf/0xa10

Sep 21 03:41:17 UNRAID kernel: ? __seccomp_filter+0x89/0x2ff

Sep 21 03:41:17 UNRAID kernel: __x64_sys_madvise+0x28/0x2f

Sep 21 03:41:17 UNRAID kernel: do_syscall_64+0x6b/0x81

Sep 21 03:41:17 UNRAID kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce

Sep 21 03:41:17 UNRAID kernel: RIP: 0033:0x14c90837d0f7

Sep 21 03:41:17 UNRAID kernel: Code: ff ff ff ff c3 66 0f 1f 44 00 00 48 8b 15 19 ad 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 b8 1c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f1 ac 0d 00 f7 d8 64 89 01 48

Sep 21 03:41:17 UNRAID kernel: RSP: 002b:000014c8f5803b48 EFLAGS: 00000246 ORIG_RAX: 000000000000001c

Sep 21 03:41:17 UNRAID kernel: RAX: ffffffffffffffda RBX: 000014c8fa50e0e8 RCX: 000014c90837d0f7

Sep 21 03:41:17 UNRAID kernel: RDX: 0000000000000004 RSI: 00000000000c0000 RDI: 00000bbbc2140000

Sep 21 03:41:17 UNRAID kernel: RBP: 000014c8f6f74a08 R08: 0000000000000001 R09: 000014c8edc87001

Sep 21 03:41:17 UNRAID kernel: R10: 00007ffda17a7080 R11: 0000000000000246 R12: 00000000000c0000

Sep 21 03:41:17 UNRAID kernel: R13: 0000000000000000 R14: 000014c905dd3170 R15: 00000bbbc2140000

Sep 21 03:41:17 UNRAID kernel: </TASK>

Sep 21 03:44:17 UNRAID kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:

Sep 21 03:44:17 UNRAID kernel: rcu: 21-...0: (1 GPs behind) idle=ee5c/1/0x4000000000000000 softirq=4387298/4387300 fqs=86384

Sep 21 03:44:17 UNRAID kernel: (detected by 6, t=240008 jiffies, g=14420985, q=1022977 ncpus=32)

Sep 21 03:44:17 UNRAID kernel: Sending NMI from CPU 6 to CPUs 21:

Sep 21 03:44:17 UNRAID kernel: NMI backtrace for cpu 21

Sep 21 03:44:17 UNRAID kernel: CPU: 21 PID: 21928 Comm: TaskCon~ller #6 Tainted: P D O 6.1.49-Unraid #1

Sep 21 03:44:17 UNRAID kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P3.90 12/04/2019

Sep 21 03:44:17 UNRAID kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x86/0x1cf

Sep 21 03:44:17 UNRAID kernel: Code: c2 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 76 0c 0f ba e0 08 72 1e c6 43 01 00 eb 18 85 c0 74 0a 8b 03 84 c0 74 04 f3 90 <eb> f6 66 c7 03 01 00 e9 32 01 00 00 e8 b1 40 ff ff 49 c7 c4 80 e1

Sep 21 03:44:17 UNRAID kernel: RSP: 0018:ffffc9001da8fc28 EFLAGS: 00000002

Sep 21 03:44:17 UNRAID kernel: RAX: 0000000000000101 RBX: ffff88982d572040 RCX: 0000000000000024

Sep 21 03:44:17 UNRAID kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88982d572040

Sep 21 03:44:17 UNRAID kernel: RBP: ffff88988f2fad80 R08: 0000000000000002 R09: 000000000006fe8c

Sep 21 03:44:17 UNRAID kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9001da8fce0

Sep 21 03:44:17 UNRAID kernel: R13: ffffea005ad7e5c0 R14: 0000000000000282 R15: 0000000000000000

Sep 21 03:44:17 UNRAID kernel: FS: 000014c8f58046c0(0000) GS:ffff88982d540000(0000) knlGS:0000000000000000

Sep 21 03:44:17 UNRAID kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

Sep 21 03:44:17 UNRAID kernel: CR2: 000014c8f3671000 CR3: 000000014fd64000 CR4: 00000000003506e0

Sep 21 03:44:17 UNRAID kernel: Call Trace:

Sep 21 03:44:17 UNRAID kernel: <NMI>

Sep 21 03:44:17 UNRAID kernel: ? nmi_cpu_backtrace+0xd3/0x104

Sep 21 03:44:17 UNRAID kernel: ? nmi_cpu_backtrace_handler+0xd/0x15

Sep 21 03:44:17 UNRAID kernel: ? nmi_handle+0x57/0x131

Sep 21 03:44:17 UNRAID kernel: ? native_queued_spin_lock_slowpath+0x86/0x1cf

Sep 21 03:44:17 UNRAID kernel: ? default_do_nmi+0x66/0x15b

Sep 21 03:44:17 UNRAID kernel: ? exc_nmi+0xbf/0x130

Sep 21 03:44:17 UNRAID kernel: ? end_repeat_nmi+0x16/0x67

Sep 21 03:44:17 UNRAID kernel: ? native_queued_spin_lock_slowpath+0x86/0x1cf

Sep 21 03:44:17 UNRAID kernel: ? native_queued_spin_lock_slowpath+0x86/0x1cf

Sep 21 03:44:17 UNRAID kernel: ? native_queued_spin_lock_slowpath+0x86/0x1cf

Sep 21 03:44:17 UNRAID kernel: </NMI>

Sep 21 03:44:17 UNRAID kernel: <TASK>

Sep 21 03:44:17 UNRAID kernel: do_raw_spin_lock+0x14/0x1a

Sep 21 03:44:17 UNRAID kernel: _raw_spin_lock_irqsave+0x2c/0x37

Sep 21 03:44:17 UNRAID kernel: free_unref_page_list+0x122/0x23b

Sep 21 03:44:17 UNRAID kernel: release_pages+0x2e6/0x30e

Sep 21 03:44:17 UNRAID kernel: tlb_flush_mmu+0x6b/0x99

Sep 21 03:44:17 UNRAID kernel: tlb_finish_mmu+0x2c/0x5b

Sep 21 03:44:17 UNRAID kernel: zap_page_range_single+0xbb/0xe5

Sep 21 03:44:17 UNRAID kernel: ? futex_wait_queue+0x4e/0x82

Sep 21 03:44:17 UNRAID kernel: do_madvise+0x6cf/0xa10

Sep 21 03:44:17 UNRAID kernel: ? __seccomp_filter+0x89/0x2ff

Sep 21 03:44:17 UNRAID kernel: __x64_sys_madvise+0x28/0x2f

Sep 21 03:44:17 UNRAID kernel: do_syscall_64+0x6b/0x81

Sep 21 03:44:17 UNRAID kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce

Sep 21 03:44:17 UNRAID kernel: RIP: 0033:0x14c90837d0f7

Sep 21 03:44:17 UNRAID kernel: Code: ff ff ff ff c3 66 0f 1f 44 00 00 48 8b 15 19 ad 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 b8 1c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f1 ac 0d 00 f7 d8 64 89 01 48

Sep 21 03:44:17 UNRAID kernel: RSP: 002b:000014c8f5803b48 EFLAGS: 00000246 ORIG_RAX: 000000000000001c

Sep 21 03:44:17 UNRAID kernel: RAX: ffffffffffffffda RBX: 000014c8fa50e0e8 RCX: 000014c90837d0f7

Sep 21 03:44:17 UNRAID kernel: RDX: 0000000000000004 RSI: 00000000000c0000 RDI: 00000bbbc2140000

Sep 21 03:44:17 UNRAID kernel: RBP: 000014c8f6f74a08 R08: 0000000000000001 R09: 000014c8edc87001

Sep 21 03:44:17 UNRAID kernel: R10: 00007ffda17a7080 R11: 0000000000000246 R12: 00000000000c0000

Sep 21 03:44:17 UNRAID kernel: R13: 0000000000000000 R14: 000014c905dd3170 R15: 00000bbbc2140000

Sep 21 03:44:17 UNRAID kernel: </TASK>

REPEATS EVERY 3 MINS FROM HERE

.

.

.

.

September 15, 2023

20 minutes ago, JorgeB said:

Those errors look more hardware related to me, if you go back to the previous known working release is it stable?

I would need to go all the way back to the last 6.11.x release, as thats the last time I had stability.

I have the same inclination, which is why I'm currently focised on removing hardware, and will look into RAM, CPOU and Drive connections next.

September 15, 2023

So with Docker and VMs disabled, the system is still generating multiple GPFs and Kernel crashes.

I've attached todays Syslog that shows a boot up sequence around 8:40, with GPFs starting after 10:00. I did manage to do a clea shutdown, so there's that.

Next, I'm going to remove any additional non-essential hardware such as extra GPUs, USB conbtroller etc. If still problematic, I'll boot to safe mode to eliminate PlugIns and after that, it's going to be a CPU and RAM re-install.

Arrggghhhh.

syslog_Sept15

September 15, 2023

I'm on a slow boat right now switch off various bits of the system. I removed all bifurcation and it's still crashing. I removed Cache Dirs plugin which was pegging CPU, and its still crashing.

Next, I'm switching off VM and Docker support. I'll grab a log after that if iot';s still crashing.

September 14, 2023

No Joy

Still crashing frequently today. I've disassembled and rebuilt with some config changes and compromises and will see if that works. (I eliminate my bifurcated set up)

September 13, 2023

6 hours ago, JorgeB said:

All the call traces mention xfs, check filesystem on all xfs disks/pools, run it always without -n, or nothing will be done.

I ran this on all my array drives and it did it's clean up. I ran it a second time with no further issues addresses. Let's see if that made a difference. Thanks for taking a look. I appreciate it greatly.

September 13, 2023

On 9/10/2023 at 8:52 AM, JorgeB said:

This is not a common or known issue, you should start a new thread on the general support forum and post the diagnostics.

On 9/10/2023 at 12:20 PM, Zonediver said:

Maybe a hardware-problem? Cache-SSD? New? Old?

Some more infos...

On 9/11/2023 at 5:52 PM, nka said:

Does 6.12.4 still crashing? I reverted back to 6.11.5 because 6.12.2 was freezing all the time.

Shares came back after a reboot, but significant issues persist.

I've made an issue thread. 6.12.4 is crashing much more frequently, than 6.12.3 was (hourly rather than daily).

September 13, 2023

Hi folks. Ever since upgrading to 6.12.x, I've been having nothing but issues on my server.

I recently updated to 6.12.4 as 6.12.3 was hard crashing every couple of days. Now it's crashing every couple of hours. Any thoughts, help or insights appreciated.

There have been no changes to my system recently apart from an SSD swap to remove a flakey old drive, though the crash issues were occuring before this, and I did this change as part of troubleshooting.

I've attached here my diagnostics pack as well as most recent ssysog.

The behaviour I see is a standard start up, but after a few minutes or, at most, a couple of hours, I will observe a cascade of Kernel faults in logs that ultimately result in a system lock up necessitating a forced restart. The initial fault is never the same.

I have run 2 passes of memcheck all day yesterday with no faults. I have several times checked SATA and power cables to drives and have verified all expansion devices are correctly seated.

A fresh set of eyes would be appreciated as I'm at my wits end here.

syslog unraid-diagnostics-20230912-1557.zip

September 9, 2023

updated to 6.12.4 as 6.12.3 was hard crashing every few days. (6.12.2 before it lost me days tracking down some kind of plugin corruption).

Now all my shares have disappeared, meaning more hours of debugging and faffing around.

What's happened unRAID? I've used it for years and it's been rock solid. Now nothing but issues and problems and hassle????

August 8, 2023

Reporting this encountered for the first time today on 6.12.3

Manifests as empty pages on the Array, Pool and Boot device tabs.

Log just filled with this;

Aug 8 13:45:08 UNRAID nginx: 2023/08/08 13:45:08 [error] 9990#9990: nchan: Out of shared memory while allocating message of size 19386. Increase nchan_max_reserved_memory.

Aug 8 13:45:08 UNRAID nginx: 2023/08/08 13:45:08 [error] 9990#9990: *2742737 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"

Aug 8 13:45:08 UNRAID nginx: 2023/08/08 13:45:08 [error] 9990#9990: MEMSTORE:01: can't create shared message for channel /disks

Aug 8 13:45:08 UNRAID nginx: 2023/08/08 13:45:08 [crit] 9990#9990: ngx_slab_alloc() failed: no memory

Aug 8 13:45:08 UNRAID nginx: 2023/08/08 13:45:08 [error] 9990#9990: shpool alloc failed

August 6, 2023

So I wiped my USB and reinstalled UNRAID. Copied my config directory and the issue persisted. (could not get GUI in safe mode).

I then embarked on copying my config sub-folders and files one-at-a-time to see which one was the offender. A painstaking process of reboot after reboot.

And would you believe - it was the VERY LAST FILE - the VFIO config. ARRRGGGGHHHHHH!!

With a fresh UNRAID install + all my config folder except VFIO, which I rebuilt, I was finally able to create the ZFS array. It's chugging away now doing some test transfers so I can evaluate performance, but it seems somewhat on a par with my main array. I'll use it for a secondary backup for now so I don't commit any files I don't have copies of elsewhere.

Thanks for looking.

August 1, 2023

In trying to troubleshoot my Pool configuration problem, I've been trying to boot my 6.12.3 server into safe mode bujt that's not working, so I'm hoping for a bit of guidance. Here's the link to the thread that gives rise to this one;

Anyway, when I boot to SAFE GUI mode, server boots up OK, I log in on local display. but no GUI is displayed. Firefox says unable to connect.

When I try to connect remotely, I get a connection refused in my remote browser, and SSH reports that key fo [IP] had changed and that host key verification failed.

I've read up a bunch of the 'No GUI' threads, but found nothing helpful.

Furtherest I got was finding NGNIX was not running. I started it and got a bad gateway. I restarted both it and rc.php-fpm and was directed to a blank page at /Tools/registration. (no other UI URLS would load either.

I completely emptied my plugins directory, but had the same issues on reboot to SAFE mode.

SSL is OFF

So am stuck. Diagnostics attached.

Current advice on other thread is to rebuild USB. Would like to no whats's wrong though.

unraid-diagnostics-20230801-1602.zip

July 31, 2023

7 hours ago, JorgeB said:

Please try booting in safe mode and/or using a different browser.

Im trying with both the built in GUI (monitor attached), as well as a remote connected Chrome browser. Same result in either case.

i tried booting safe mode but there are problems there. In GUI safe mode, the browser won’t load the UI. In GUI or standard mode, a remote browser won’t connect, and even ssh fails to connect with password problems.

I guess ive got to fix that problem before I can progress with my pools issue.

July 31, 2023

2 hours ago, Zeroeth said:

Have to you tried to Erase and then Apply?

Reference: SpaceInvader's video How to Reformat (ZFS) and or Upgrade a Cache Pool in Unraid

I can erase, but even after that the apply button remains disabled.

July 31, 2023

I am unable to configure new pools or modify existing ones under 6.12.3.

What are the reasons for then 'Apply' button in pool configuration to be disabled?

Discussion here;

July 31, 2023

26 minutes ago, JorgeB said:

This is very strange, don't remember seeing it before, if you create a new pool and assign one of those devices does apply work?

No.

If I create a new pool and add one or more devices, (any device) I cannot apply settings.

If I access an existing working pool, make a small change, still the apply button is disabled.

Essentially, I cannot make or edit pools, unless I'm willing to accept default settings for a new pool

July 31, 2023

Folks

Really stuck here. In playing around with this, I find that I can neither set up & configure a new pool nor can I make any changes to existing pools while array is stopped as the APPLY button remains greyed out on the pool settings page.

Diagnostics attached.

unraid-diagnostics-20230731-1155.zip unraid-syslog-20230731-1054.zip

July 30, 2023

No joy. Didn't have an impact on the issue.

1940378789_Screenshot2023-07-30at12_20_17.png.5b413fd2869d0043a078fa5ad256a760.png

I also remade the pool without the underscore character in the name, and reduced the number of drives to 3. But still the Apply button on pool editing remains resolutely disabled.

July 28, 2023

I'm running 6.12.3. I have a bunch of 'OLD' (ex-array) 4TB drives I've reinstalled to my server that I wanted to use to have a play with a ZFS pool. Here they are before I start. Unformatted, no FS etc.

I created a pool (I have 3x other pools previously set up) and added the disks;

So far, so good. Now, I click the link 'Zfs_pool_a' beside the first disk. (note, I DID NOT capitalise that name, I was careful to ensure I named the pool all lower case)

I set up my file system type as ZFS and configured my settings;

However, I cannot apply these settings. Note that the 'Apply' button remains stubbornly greyed out. In fact, it remains so regardless of what file system type I select. If I click 'Done' and start the array, it goes ahead and formats the pool BTRFS.

I must be missing something obvious here, but what? A procedure? A setting? A checkbox somewhere?

Pointers appreciated.

meep

Posts

Joined

Last visited

Content Type

Profiles

Forums

Downloads

Store

Gallery

Bug Reports

Documentation

Landing

Posts posted by meep

[FS] - EU - Sonnet USB3-Pro-4PM-E 4x Controller PCIe USB Adapter

System Crash Frequency escalating to unusable server (Solved)

System Crash Frequency escalating to unusable server (Solved)

System Crash Frequency escalating to unusable server (Solved)

System Crash Frequency escalating to unusable server (Solved)

System Crash Frequency escalating to unusable server (Solved)

System Crash Frequency escalating to unusable server (Solved)

System Crash Frequency escalating to unusable server (Solved)

System Crash Frequency escalating to unusable server (Solved)

System Crash Frequency escalating to unusable server (Solved)

System Crash Frequency escalating to unusable server (Solved)

System Crash Frequency escalating to unusable server (Solved)

Unraid OS version 6.12.4 available

System Crash Frequency escalating to unusable server (Solved)

Unraid OS version 6.12.4 available

nginx running out of shared memory

Trouble Creating ZFS (Or Any) Pool

SAFE MODE: No local GUI, No remote GUI, no remote SSH?

Trouble Creating ZFS (Or Any) Pool

Trouble Creating ZFS (Or Any) Pool

Unraid OS version 6.12.3 available

Trouble Creating ZFS (Or Any) Pool

Trouble Creating ZFS (Or Any) Pool

Trouble Creating ZFS (Or Any) Pool

Trouble Creating ZFS (Or Any) Pool