Kernel BUG/Oops in bio_associate_blkg_from_css during Parity Sync/Data Rebuild - Unraid 7.2.4
Thank you for the response. After further testing, core 20 appears to be degraded: I've had 4 segfaults from different processes (Unmanic, Authentik/Celery, a generic Python process), all on core 20. I've now isolated it with isolcpus and am testing stability. I also had crashes with parity paused, which further points to the CPU rather than the md driver alone. I'll be filing an Intel RMA. Microcode 0x132 was applied at install about a year ago, so hopefully the degradation is limited. Appreciate the confirmation.
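Tallying the kernel's segfault reports per CPU is one way to spot a single bad core like this. A minimal sketch, assuming a plain-text syslog and the suffix that recent x86 kernels (including the 6.12 series here) append to each segfault line, "likely on CPU N (core C, socket S)". The function name is hypothetical. Note that isolcpus= takes logical CPU numbers, so the CPU field is the one to use there; the core field is the kernel's topology core id and need not match it on a hybrid CPU with Hyper-Threading.

```python
import re
from collections import Counter

# Assumed message format: "...: segfault at <addr> ip ... error N [in lib[...]]
# likely on CPU <cpu> (core <core>, socket <socket>)"
SEGFAULT_RE = re.compile(
    r"segfault at .*? likely on CPU (?P<cpu>\d+) "
    r"\(core (?P<core>\d+), socket (?P<socket>\d+)\)"
)


def count_segfaults_by_cpu(lines):
    """Return a Counter mapping logical CPU number -> number of segfault reports."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            counts[int(match.group("cpu"))] += 1
    return counts
```

For example, passing an open handle on the syslog (or on the file written by a remote syslog listener) and calling `.most_common()` on the result lists the CPUs with the most segfaults first; a healthy system should show them scattered rather than piling up on one CPU.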
i9-14900K, but it has been on the approved microcode since day one, about a year ago. I tried very hard to troubleshoot and rule out CPU degradation.
The system crashes repeatedly during Parity Sync/Data Rebuild operations, consistently after several hours of rebuilding. It becomes completely unresponsive and requires a hard power cycle, and the array does not auto-start after reboot (unclean shutdown detected).

I was able to capture the following kernel oops via remote syslog:

kernel: BUG: unable to handle page fault for address: 0000000000001018
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
kernel: CPU: 10 UID: 0 PID: 11559 Comm: unraidd0 Tainted: P O 6.12.54-Unraid #1
kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
kernel: Hardware name: ASRock Z790 PG Riptide/Z790 PG Riptide, BIOS 20.01 09/24/2025
kernel: RIP: 0010:bio_associate_blkg_from_css+0x148/0x190
kernel: Code: 8b 3c 24 e8 aa 6a 5a 00 eb 04 4d 8b 76 30 4d 85 f6 74 0c 4c 89 f7 e8 37 f4 ff ff 84 c0 74 eb e8 3e 57 b4 ff eb 27 48 8b 45 08 <48> 8b 40 18 48 8b b8 e0 01 00 00 48 83 c7 38 e8 74 f4 ff ff 48 8b
kernel: RSP: 0018:ffffc90003eabcf8 EFLAGS: 00010246
kernel: RAX: 0000000000001000 RBX: ffffffff83063fe0 RCX: 0000000000000000
kernel: RDX: ffff8881174be480 RSI: ffffffff83063fe0 RDI: 0000000000000000
kernel: RBP: ffff8881ecc74048 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
kernel: R13: ffff8881ecc739a8 R14: ffff888106809860 R15: ffff8881ecc740f8
kernel: FS: 0000000000000000(0000) GS:ffff88a00f280000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000001018 CR3: 0000000005618002 CR4: 0000000000772ef0
kernel: PKRU: 55555554
kernel: Call Trace:
kernel: <TASK>
kernel: ? submit_bio_noacct_nocheck+0x152/0x2e0
kernel: bio_associate_blkg+0x3b/0x50
kernel: bio_init+0x5b/0xa0
kernel: unraidd+0x1257/0x13d0 [md_mod]
kernel: ? preempt_latency_start+0x2b/0x50
kernel: ? md_thread+0xf1/0x120 [md_mod]
kernel: ? kthread_should_park+0x12/0x30
kernel: md_thread+0xf1/0x120 [md_mod]
kernel: ? __pfx_autoremove_wake_function+0x10/0x10
kernel: ? __pfx_md_thread+0x10/0x10 [md_mod]
kernel: kthread+0xec/0x100
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork+0x21/0x40
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork_asm+0x1a/0x30
kernel: </TASK>

Troubleshooting performed:
- Memtest86: multiple passes with zero errors
- SMART checks passed on all drives
- Updated Samsung 990 PRO NVMe firmware to the latest version
- Disabled Intel VMD in BIOS
- Adjusted CPU voltage settings (Vcore Compensation, LLC, voltage offset)
- Reduced CPU P-core and E-core ratios
- All temperatures well within spec (CPU 34-41°C, NVMes 38-47°C)
- XMP not active; RAM running at JEDEC DDR5-3600
- Crash captured via remote syslog (Python UDP listener on a separate machine)

Configuration:
- 1 parity drive (18TB), 13 data drives (8TB-18TB mix)
- 6 cache pools (mix of NVMe and SATA SSD)
- 60+ Docker containers running

The crash occurs consistently during Parity Sync/Data Rebuild, typically after 4-8 hours of operation. The unraidd0 thread hits a near-NULL pointer dereference (faulting address 0x1018) in bio_associate_blkg_from_css while initializing a bio for block I/O (unraidd → bio_init → bio_associate_blkg).

Additional context:
- The crash was preceded by an Unmanic segfault earlier in the same session, though that appeared unrelated.
- The system kept running after the kernel oops, but the parity rebuild thread died, and the system eventually became unresponsive.

Has anyone else encountered this crash signature during parity rebuilds on kernel 6.12.54?
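The oops can be sanity-checked by hand: the bytes starting at the `<..>` marker in the Code: line are the faulting instruction, and CR2 should equal the address that instruction tried to read. A minimal sketch that decodes only the one instruction form appearing here (a 64-bit `mov` load with an 8-bit displacement); the function names are hypothetical, and this is not a general disassembler. For full decoding, the kernel source tree ships scripts/decodecode.

```python
import re

# x86-64 register names in ModRM encoding order (valid when REX.R and REX.B are clear).
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def parse_registers(oops_text):
    """Return {'RAX': 0x1000, 'CR2': 0x1018, ...} for each 'NAME: <16 hex digits>' pair."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,3}): ([0-9a-f]{16})\b", oops_text)
    return {name: int(value, 16) for name, value in pairs}


def faulting_bytes(code_line):
    """Return the bytes from the '<..>'-marked byte onwards in an oops 'Code:' line."""
    tokens = code_line.split("Code:", 1)[1].split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_mov_load_disp8(code):
    """Decode only 'mov r64, [base + disp8]' (48 8B, ModRM mod=01, no SIB byte).

    Returns (dest_register, base_register, displacement); raises ValueError otherwise.
    """
    if len(code) < 4 or code[0] != 0x48 or code[1] != 0x8B:
        raise ValueError("not REX.W MOV r64, r/m64 with legacy registers")
    modrm, disp = code[2], code[3]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b01 or rm == 0b100:  # rm=100 would require a SIB byte
        raise ValueError("only [base + disp8] addressing is handled")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REGS64[reg], REGS64[rm], disp
```

Applied to the oops above, `<48> 8b 40 18` decodes to `mov rax, [rax+0x18]`. With RAX = 0x1000, the effective address is 0x1000 + 0x18 = 0x1018, exactly the CR2 value, so the kernel was following a pointer whose value was 0x1000 rather than a valid kernel address.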
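A remote syslog listener like the one used to capture this oops needs only the Python standard library. A minimal sketch, assuming the server is configured to send syslog over UDP to this machine on port 514 (the conventional syslog port; binding below 1024 usually needs root, so use a higher port on both ends if that is a problem). The names serve and run_listener and the output file name are hypothetical. Each datagram is timestamped on arrival and flushed immediately, so the last lines the server managed to send are already in the file when it locks up.

```python
import socket
from datetime import datetime


def serve(sock, out, max_messages=None):
    """Read syslog datagrams from a bound UDP socket and append them to `out`.

    Runs forever when max_messages is None; returns the number of datagrams handled.
    """
    received = 0
    while max_messages is None or received < max_messages:
        data, (sender_ip, _sender_port) = sock.recvfrom(65535)
        text = data.decode("utf-8", errors="replace").rstrip("\r\n")
        # Timestamp with the listener's clock, independent of the sender's clock.
        out.write(f"{datetime.now().isoformat(timespec='seconds')} {sender_ip} {text}\n")
        out.flush()  # hand each line to the OS immediately so `tail -f` sees it
        received += 1
    return received


def run_listener(host="0.0.0.0", port=514, logfile="remote-syslog.log"):
    """Bind a UDP socket and log every received datagram to `logfile` until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(logfile, "a", encoding="utf-8") as out:
        sock.bind((host, port))
        serve(sock, out)
```

Splitting the receive loop from the socket setup keeps serve() usable with any bound socket and any writable text stream; run_listener() is just the long-running wrapper to start on the capture machine and stop with Ctrl+C.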
rhodesjo