March 12

The system crashes repeatedly during Parity Sync/Data Rebuild operations. The crash occurs consistently after several hours of rebuild operation. The system becomes completely unresponsive and requires a hard power cycle. The array does not auto-start after reboot (unclean shutdown detected).

I was able to capture the following kernel oops via remote syslog:

kernel: BUG: unable to handle page fault for address: 0000000000001018
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
kernel: CPU: 10 UID: 0 PID: 11559 Comm: unraidd0 Tainted: P O 6.12.54-Unraid #1
kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
kernel: Hardware name: ASRock Z790 PG Riptide/Z790 PG Riptide, BIOS 20.01 09/24/2025
kernel: RIP: 0010:bio_associate_blkg_from_css+0x148/0x190
kernel: Code: 8b 3c 24 e8 aa 6a 5a 00 eb 04 4d 8b 76 30 4d 85 f6 74 0c 4c 89 f7 e8 37 f4 ff ff 84 c0 74 eb e8 3e 57 b4 ff eb 27 48 8b 45 08 <48> 8b 40 18 48 8b b8 e0 01 00 00 48 83 c7 38 e8 74 f4 ff ff 48 8b
kernel: RSP: 0018:ffffc90003eabcf8 EFLAGS: 00010246
kernel: RAX: 0000000000001000 RBX: ffffffff83063fe0 RCX: 0000000000000000
kernel: RDX: ffff8881174be480 RSI: ffffffff83063fe0 RDI: 0000000000000000
kernel: RBP: ffff8881ecc74048 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
kernel: R13: ffff8881ecc739a8 R14: ffff888106809860 R15: ffff8881ecc740f8
kernel: FS: 0000000000000000(0000) GS:ffff88a00f280000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000001018 CR3: 0000000005618002 CR4: 0000000000772ef0
kernel: PKRU: 55555554
kernel: Call Trace:
kernel: <TASK>
kernel: ? submit_bio_noacct_nocheck+0x152/0x2e0
kernel: bio_associate_blkg+0x3b/0x50
kernel: bio_init+0x5b/0xa0
kernel: unraidd+0x1257/0x13d0 [md_mod]
kernel: ? preempt_latency_start+0x2b/0x50
kernel: ? md_thread+0xf1/0x120 [md_mod]
kernel: ? kthread_should_park+0x12/0x30
kernel: md_thread+0xf1/0x120 [md_mod]
kernel: ? __pfx_autoremove_wake_function+0x10/0x10
kernel: ? __pfx_md_thread+0x10/0x10 [md_mod]
kernel: kthread+0xec/0x100
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork+0x21/0x40
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork_asm+0x1a/0x30
kernel: </TASK>

Troubleshooting performed:
- Memtest86 completed multiple passes with zero errors
- All SMART checks passed on all drives
- Updated the Samsung 990 PRO NVMe firmware to the latest version
- Disabled Intel VMD in BIOS
- Adjusted CPU voltage settings (Vcore Compensation, LLC, voltage offset)
- Reduced CPU P-Core and E-Core ratios
- All temperatures well within spec (CPU 34-41°C, NVMes 38-47°C)
- XMP not active, RAM running at JEDEC DDR5-3600
- Crash captured via remote syslog (Python UDP listener on a separate machine)

Configuration:
- 1 parity drive (18TB), 13 data drives (8TB-18TB mix), 6 cache pools (mix of NVMe and SATA SSD)
- 60+ Docker containers running
- Crash occurs consistently during Parity Sync/Data Rebuild, typically after 4-8 hours of operation
- The unraidd0 process hits a null pointer dereference in bio_associate_blkg_from_css during block I/O submission

Additional context:
- The crash was preceded by an Unmanic segfault earlier in the same session, though that appeared unrelated.
- The system kept running after the kernel oops, but the parity rebuild thread died, which eventually left the whole system unresponsive.

Has anyone else encountered this crash signature during parity rebuilds on kernel 6.12.54?
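For anyone reading the oops: the bytes marked <48> in the Code: line encode `mov rax, [rax+0x18]` (REX.W prefix 0x48, opcode 0x8B, ModRM 0x40, 8-bit displacement 0x18), and the four bytes just before them, 48 8b 45 08, encode `mov rax, [rbp+0x8]`. So the faulting address in CR2 is just RAX + 0x18 = 0x1000 + 0x18 = 0x1018; the pointer in RAX was 0x1000, not a kernel address. The sketch below checks that arithmetic with the values copied from the oops. It is deliberately tiny: it decodes only this one instruction form (plain 0x48 prefix, opcode 8B, mod=01, no SIB byte) and is not a general x86 disassembler; the function names are made up for illustration.

```python
# Values copied from the oops in the first post.
CODE_LINE = ("8b 3c 24 e8 aa 6a 5a 00 eb 04 4d 8b 76 30 4d 85 f6 74 0c 4c 89 f7 "
             "e8 37 f4 ff ff 84 c0 74 eb e8 3e 57 b4 ff eb 27 48 8b 45 08 <48> 8b 40 18 "
             "48 8b b8 e0 01 00 00 48 83 c7 38 e8 74 f4 ff ff 48 8b")
REGS = {"RAX": 0x1000}
CR2 = 0x1018

# 3-bit ModRM register numbers -> 64-bit register names. Only the plain 0x48
# REX prefix is accepted below, so REX.R/REX.B never extend these numbers.
REG_NAMES = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"]


def faulting_bytes(code_line):
    """Return the byte values starting at the <..>-marked (faulting) byte."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:]]


def decode_mov_load_disp8(code):
    """Decode only 'REX.W 8B /r' with mod=01 (base + disp8); return None otherwise."""
    if len(code) < 4 or code[0] != 0x48 or code[1] != 0x8B:
        return None
    modrm = code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b01 or rm == 0b100:  # rm=100 would mean a SIB byte follows
        return None
    disp = code[3] - 0x100 if code[3] >= 0x80 else code[3]  # disp8 is sign-extended
    return REG_NAMES[reg], REG_NAMES[rm], disp


dest, base, disp = decode_mov_load_disp8(faulting_bytes(CODE_LINE))
addr = REGS[base] + disp
print(f"mov {dest.lower()}, [{base.lower()}{disp:+#x}] -> {addr:#018x}; matches CR2: {addr == CR2}")
```

The same decode applied to a different oops only works if the faulting instruction happens to have this exact shape; anything else returns None rather than a guess.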
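The post mentions catching the oops with a Python UDP listener on a separate machine, which is a good way to get logs out of a box that hard-locks. The original listener isn't shown, so this is only a minimal sketch of the idea, assuming the Unraid server's remote syslog setting points at this machine over UDP; the function names and output format are illustrative.

```python
import socket
from typing import Optional, TextIO


def format_datagram(data: bytes, sender: str) -> str:
    """Decode one syslog datagram into a printable line prefixed with the sender IP."""
    text = data.decode("utf-8", errors="replace").rstrip("\r\n\x00")
    return f"{sender} {text}"


def serve(sock: socket.socket, out: TextIO, max_messages: Optional[int] = None) -> int:
    """Write each datagram received on `sock` to `out`.

    Runs forever when max_messages is None; returns the number of messages written.
    """
    count = 0
    while max_messages is None or count < max_messages:
        data, (sender, _port) = sock.recvfrom(65535)
        out.write(format_datagram(data, sender) + "\n")
        out.flush()  # push each line out immediately instead of buffering it
        count += 1
    return count
```

To run it, bind a UDP socket (514 is the conventional syslog port and usually needs root; any free port works if the server is configured to match) and call `serve(sock, open("remote-syslog.log", "a"))`. Flushing after every line means whatever arrived before the sender locked up is already in the file.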
March 12

rhodesjo said: "[md_mod]"

Unraid driver is crashing; this is almost always a hardware issue. Which CPU do you have?
March 12

JorgeB said: "Unraid driver is crashing; this is almost always a hardware issue. Which CPU do you have?"

i9-14900k, but it's been on the approved microcode since day 1, about a year ago. I tried very hard to troubleshoot and make sure it's not a CPU degradation issue.
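To double-check which microcode is actually loaded, Linux exposes the running revision per logical CPU in /proc/cpuinfo as a `microcode : 0x...` line under each `processor` entry. A minimal sketch that collects those values; the 0x132 used in the usage note is simply the revision the poster mentions later in the thread, not a statement about which revision Intel currently recommends, and the function names are hypothetical.

```python
from typing import Dict, List


def microcode_by_cpu(cpuinfo_text: str) -> Dict[int, int]:
    """Map each logical CPU ('processor' field) to its microcode revision."""
    revisions: Dict[int, int] = {}
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "microcode" and cpu is not None:
            revisions[cpu] = int(value, 16)  # e.g. '0x132' -> 306
    return revisions


def cpus_below(revisions: Dict[int, int], minimum: int) -> List[int]:
    """Logical CPUs reporting a revision lower than `minimum`."""
    return sorted(cpu for cpu, rev in revisions.items() if rev < minimum)
```

Usage, on the server itself: `revs = microcode_by_cpu(open("/proc/cpuinfo").read())`, then `{hex(r) for r in revs.values()}` should normally be a single value, and `cpus_below(revs, 0x132)` lists any CPU reporting something older.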
March 12

rhodesjo said: "i9-14900k"

I bet this is the problem. There have been dozens of confirmed cases with this and similar CPUs, especially the 13700K, 14700K, 13900K and 14900K. Some users are on their 3rd and 4th CPU.
March 12

JorgeB said: "I bet this is the problem. There have been dozens of confirmed cases with this and similar CPUs, especially the 13700K, 14700K, 13900K and 14900K. Some users are on their 3rd and 4th CPU."

Thank you for the response. After further testing, core 20 appears to be degraded: I've had 4 segfaults from different processes (Unmanic, Authentik/Celery, generic Python), all on core 20. I've now isolated it with isolcpus and am testing stability. I also had crashes with parity paused, which further points to the CPU rather than the md driver alone.

I'll be filing an Intel RMA. Microcode 0x132 was applied at install about a year ago, so hopefully the degradation is limited. Appreciate the confirmation.
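The "all on core 20" pattern can be pulled out of a syslog automatically. Recent x86 kernels, including the 6.12 series in this thread, append "likely on CPU N (core C, socket S)" to user-space segfault messages ("likely" because the task may have migrated). A minimal sketch that tallies segfaults per logical CPU number, assuming that message format; check the regex against a real line from your own log before trusting the counts.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed shape of the x86 segfault line (illustrative, not copied from this thread):
#   ... segfault at 0 ip ... sp ... error 4 in lib[...] likely on CPU 20 (core 40, socket 0)
SEGFAULT_CPU = re.compile(
    r"segfault at .*?likely on CPU (\d+) \(core (\d+), socket (\d+)\)"
)


def segfaults_per_cpu(lines: Iterable[str]) -> Counter:
    """Count segfault messages per logical CPU number."""
    counts: Counter = Counter()
    for line in lines:
        match = SEGFAULT_CPU.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Usage: `segfaults_per_cpu(open("remote-syslog.log")).most_common()`. The key is the logical CPU number, which is also what isolcpus takes; the core id printed in the same message is a different numbering and can differ from it.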
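After adding isolcpus to the kernel command line, one way to confirm it took effect is to read /sys/devices/system/cpu/isolated, which lists the isolated logical CPUs in the kernel's cpulist format (for example `2-3,20`, or an empty line if none). Note that isolcpus keeps the scheduler from placing ordinary tasks on that CPU; it does not take the CPU offline. A small sketch that parses the file; CPU 20 in the usage note is just the CPU from this thread, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Set


def parse_cpulist(text: str) -> Set[int]:
    """Parse a kernel cpulist such as '2-3,20' into {2, 3, 20}.

    Handles single CPUs and ranges only; the ':stride' syntax accepted on the
    kernel command line is not handled (sysfs does not print it).
    """
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs are isolated
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def isolated_cpus(sysfs_path: str = "/sys/devices/system/cpu/isolated") -> Set[int]:
    """Return the set of logical CPUs currently isolated via isolcpus."""
    return parse_cpulist(Path(sysfs_path).read_text())
```

Usage: `20 in isolated_cpus()` should be True after a reboot with the new boot parameter.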