[7.1.4] kernel oops: smartctl_type page fault at 0xffffffffffffffff - emhttpd crash, system survived - General Support

April 29

Hey all,

Looking for advice/thoughts on a multitude of issues I've been having on my unraid server. I'll start with the most recent issue and then detail all of the others below.

After 9 days of uptime, the web GUI became completely inaccessible (this is the third issue I've had over the past few weeks, more details below). Plex was still serving streams, SMB shares were accessible, all Docker containers were running normally - only the Unraid management layer (emhttpd/nginx) was dead. I plugged a monitor directly into the server and saw a kernel oops on screen. Was able to log in via the local console. System was otherwise fully functional.

Hardware:

ASUS PRIME B760M-A AX (BIOS 1646, March 2024)
i7-12700K
32GB Corsair Dominator DDR5 2x16GB, running at stock 4800 MT/s, XMP disabled
4x Seagate Exos X18 18TB array drives + 2x parity
2x Kingston NV2 1TB NVMe cache (BTRFS RAID1)
Sparkle Intel Arc A380 (currently removed from system, separate issue)
Unraid 7.1.4, kernel 6.12.24

Background - three separate issues over the past few weeks. Detailing all just in case they may be related and anyone has information:

Issue 1 - Hard kernel freezes (April 8-10, TESTING)

Multiple complete hard freezes requiring hard resets. No network, no ping, no console, no peripheral power. Zero log evidence locally. I set up comprehensive monitoring writing to a Synology NAS every 10 seconds to survive crashes. The monitoring logs identified a potential cause: Plex VAAPI hardware transcoding via i915 on the Intel UHD 770 iGPU. Two concurrent VAAPI sessions triggered a GPU hang, the kernel stalled waiting for a GPU response, and the system froze completely. I am currently testing with Plex hardware transcoding disabled. The system has run stably for 9 days since. RAM ruled out: 18 passes of Memtest86 across all configurations, 0 errors.
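For anyone wanting to set up similar crash-surviving monitoring, here is a minimal sketch of the idea. It is not the script actually used here: the load-average sample and the mount path in the closing comment are assumptions for illustration. The only important part is forcing each line out with fsync so it has a chance of reaching the remote share before a freeze.

```python
import os
import time
from datetime import datetime


def write_heartbeat(path: str) -> str:
    """Append one timestamped load-average line to `path` and flush it."""
    load1, load5, load15 = os.getloadavg()
    stamp = datetime.now().isoformat(timespec="seconds")
    line = f"{stamp} load={load1:.2f},{load5:.2f},{load15:.2f}\n"
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())  # ask the OS to push the line out right away
    return line


def run(path: str, interval_s: float = 10.0) -> None:
    """Write a heartbeat every `interval_s` seconds until interrupted."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)


# Example call (hypothetical mount point for the NAS share):
# run("/mnt/remotes/nas/heartbeat.log")
```

After a hard freeze, the timestamp of the last line on the NAS bounds when the system stopped responding to within one interval.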

Issue 2 - USB flash drive filesystem corruption (April 18)

Web GUI went down, cron scripts started failing with exit 126, console was spammed with errors. System was still partially functional. Root cause was filesystem corruption on the USB boot drive (sda1) - the drive had been in a USB 3.0 port for ~2 years. Kernel logs showed:

2026-04-18T00:19:37 device offline error, dev sda, sector 2437240 op 0x0:(READ)
2026-04-18T00:19:37 device offline error, dev sda, sector 3110768 op 0x0:(READ)
2026-04-18T00:19:37 I/O error, dev loop1, sector 347456 op 0x0:(READ)
2026-04-18T00:19:37 SQUASHFS error: Failed to read block 0xa9a8308: -5
2026-04-18T00:19:37 SQUASHFS error: Unable to read fragment cache entry [a9a8308]
2026-04-18T00:19:37 device offline error, dev sda, sector 3607723 op 0x1:(WRITE)
2026-04-18T00:19:37 Buffer I/O error on dev sda1, logical block 3605675, lost async page write
2026-04-18T00:19:37 FAT-fs (sda1): unable to read inode block for updating (i_pos 55271939)
2026-04-18T00:19:37 FAT-fs (sda1): FAT read failed (blocknr 1777)
2026-04-18T00:19:37 FAT-fs (sda1): FAT read failed (blocknr 1776)
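To see at a glance which device the errors point at, the excerpt can be tallied per device. This is only a sketch: the regular expression covers the two message shapes that appear in this excerpt ("dev NAME" and "FAT-fs (NAME)") and is not a general kernel log parser.

```python
import re
from collections import Counter

LOG = """\
2026-04-18T00:19:37 device offline error, dev sda, sector 2437240 op 0x0:(READ)
2026-04-18T00:19:37 device offline error, dev sda, sector 3110768 op 0x0:(READ)
2026-04-18T00:19:37 I/O error, dev loop1, sector 347456 op 0x0:(READ)
2026-04-18T00:19:37 SQUASHFS error: Failed to read block 0xa9a8308: -5
2026-04-18T00:19:37 SQUASHFS error: Unable to read fragment cache entry [a9a8308]
2026-04-18T00:19:37 device offline error, dev sda, sector 3607723 op 0x1:(WRITE)
2026-04-18T00:19:37 Buffer I/O error on dev sda1, logical block 3605675, lost async page write
2026-04-18T00:19:37 FAT-fs (sda1): unable to read inode block for updating (i_pos 55271939)
2026-04-18T00:19:37 FAT-fs (sda1): FAT read failed (blocknr 1777)
2026-04-18T00:19:37 FAT-fs (sda1): FAT read failed (blocknr 1776)
"""

# Matches "dev sda" / "dev loop1" or "FAT-fs (sda1)".
DEV_RE = re.compile(r"\bdev (\w+)|FAT-fs \((\w+)\)")


def count_by_device(log_text: str) -> Counter:
    """Count log lines per device name; lines naming no device are grouped."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue
        m = DEV_RE.search(line)
        dev = (m.group(1) or m.group(2)) if m else "(no device)"
        counts[dev] += 1
    return counts


for dev, n in count_by_device(LOG).most_common():
    print(f"{dev}: {n}")
```

On this excerpt, three lines name sda, four name sda1, one names loop1, and the two SQUASHFS lines name no device. Everything that names a device traces back to the flash drive or the loop device on top of it.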

Windows chkdsk found and repaired filesystem corruption. I moved the drive to a USB 2.0 port and the system came back up cleanly. No issues with the flash drive since.

Issue 3 - smartctl_type kernel oops (April 25, current issue)

After 9 days of clean uptime following the above fixes, the kernel oops described below occurred.

What I think triggered it: The Dynamix system monitor is configured to refresh every minute (system="*/1 * * * *" in dynamix.cfg). As part of that refresh cycle it calls smartctl_type to update drive temperatures and stats on the dashboard. With 8 drives being polled every minute this gives frequent opportunities to hit whatever kernel bug is lurking in the mmap code path.
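For a sense of scale, here is a quick back-of-the-envelope count. It assumes one smartctl call per drive per refresh, which is my reading of the behaviour and not something verified against the plugin source.

```python
def smart_polls(drives: int, polls_per_minute: int, days: int) -> int:
    """Total per-drive SMART polls over `days` of uptime."""
    return drives * polls_per_minute * 60 * 24 * days


# 8 drives, one refresh per minute, 9 days of uptime
print(smart_polls(drives=8, polls_per_minute=1, days=9))  # 103680
```

Under that assumption the oops showed up once in roughly a hundred thousand polls, which fits with it being rare and hard to reproduce on demand.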

Worth noting: both xe and i915 modules were loaded simultaneously at crash time - i915 for the iGPU, xe loaded by default on this kernel. Not sure if relevant but including it for completeness.

Crash signature (from dmesg, April 25th 13:27:54):

BUG: unable to handle page fault for address: ffffffffffffffff
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
Oops: Oops: 0010 [#1] PREEMPT SMP NOPTI
CPU: 18 UID: 0 PID: 2458609 Comm: smartctl_type
Tainted: P O 6.12.24-Unraid #1
Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
Hardware name: ASUS System Product Name/PRIME B760M-A AX, BIOS 1646 03/20/2024
RIP: 0010:0xffffffffffffffff
Code: Unable to access opcode bytes at 0xffffffffffffffd5

Call Trace:
 <TASK>
 ? mast_split_data+0x3c/0x140
 ? mas_push_data+0x1c6/0x210
 ? mas_wr_bnode+0x417/0x4e0
 ? mas_push_data+0x1e5/0x210
 ? mas_store_prealloc+0x94/0xd0
 ? vma_complete+0x7d/0x190
 ? __split_vma+0x1c9/0x220
 ? vms_gather_munmap_vmas+0x155/0x1d0
 ? __mmap_region+0x216/0x700
 ? mmap_region+0x72/0x90
 ? do_mmap+0x43b/0x4a0
 ? vm_mmap_pgoff+0xb6/0x110
 ? ksys_mmap_pgoff+0x156/0x190
 ? do_syscall_64+0x68/0xe0
 ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>
note: smartctl_type[2458609] exited with irqs disabled
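If it helps anyone compare this against other reports, the header fields and frame names can be pulled out of an oops with a few regular expressions. This is a rough sketch that only handles the line shapes seen in this particular trace; the `OOPS` string is a shortened copy of the output above.

```python
import re

# Shortened copy of the oops above; only lines the parser looks at.
OOPS = """\
BUG: unable to handle page fault for address: ffffffffffffffff
#PF: error_code(0x0010) - not-present page
CPU: 18 UID: 0 PID: 2458609 Comm: smartctl_type
RIP: 0010:0xffffffffffffffff
Call Trace:
 <TASK>
 ? mast_split_data+0x3c/0x140
 ? mas_push_data+0x1c6/0x210
 ? __mmap_region+0x216/0x700
 ? do_mmap+0x43b/0x4a0
 </TASK>
"""

# "symbol+0xoff/0xsize", optionally preceded by "? ".
FRAME_RE = re.compile(
    r"^\s*(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+", re.M
)


def parse_oops(text: str) -> dict:
    """Extract fault address, error code, PID, command and frames."""
    info = {}
    m = re.search(r"page fault for address: ([0-9a-f]+)", text)
    if m:
        info["fault_address"] = int(m.group(1), 16)
    m = re.search(r"error_code\((0x[0-9a-f]+)\)", text)
    if m:
        info["error_code"] = int(m.group(1), 16)
    m = re.search(r"PID: (\d+) Comm: (\S+)", text)
    if m:
        info["pid"], info["comm"] = int(m.group(1)), m.group(2)
    # A leading "?" marks a frame the kernel unwinder could not confirm.
    info["frames"] = [
        (name, not marker) for marker, name in FRAME_RE.findall(text)
    ]
    return info


info = parse_oops(OOPS)
print(info["comm"], info["pid"], hex(info["fault_address"]))
print([name for name, reliable in info["frames"]])
```

Note that every frame in this trace carries the "?" marker, so the listed call chain is the kernel's best guess from stack contents and not a confirmed unwind.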

Key observations: The crash is entirely in kernel memory management code (mmap), not in smartctl itself. The instruction pointer jumped to 0xffffffffffffffff - an invalid address - suggesting a corrupted function pointer or use-after-free in the mmap code path. smartctl_type just happened to be the process that triggered it.
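The error code in the "#PF" lines says the same thing in bit form. A small decoder for the low five bits of the x86 page-fault error code, as a sketch (higher bits such as protection keys are ignored here):

```python
# Low five bits of the x86 page-fault error code:
# (mask, meaning when set, meaning when clear)
PF_BITS = [
    (0x01, "protection violation", "not-present page"),
    (0x02, "write access", "not a write"),
    (0x04, "user mode", "supervisor (kernel) mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code: int) -> list[str]:
    """Describe each of the low five bits of a page-fault error code."""
    out = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text:
            out.append(text)
    return out


print(decode_pf_error(0x0010))
```

For 0x0010 this gives a not-present page, not a write, supervisor mode, and an instruction fetch. In other words the CPU tried to execute code at 0xffffffffffffffff while in kernel mode, which matches the RIP line.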

This appears related to a pattern in the forums - the "6.12.8 - Segfaults and call traces" thread shows the same mmap crash signature appearing in smartctl_type, php-fpm, and python3 across different kernel versions. This suggests it's a recurring kernel mmap bug that manifests through different processes rather than being smartctl-specific.

What happened after the crash: The oops killed smartctl_type which took emhttpd down with it. All containers, SMB shares, and network remained fully operational. Was able to log in via local console. Clean reboot restored everything.

Previous forum thread for Issue 1 (hard freezes): https://forums.unraid.net/topic/198203-at-a-loss-unraid-keeps-crashing-and-no-logsevidenceartifacts-left-behind-are-helping-figure-it-out/

Diagnostics and full dmesg log attached.

dmesg.log unraid-diagnostics-20260428-2341.zip

April 29

Community Expert

In my experience, those smartctl related crashes are typically caused by a specific device, most often an NVMe.

April 30

Author

12 hours ago, JorgeB said:
In my experience, those smartctl related crashes are typically caused by a specific device, most often an NVMe.

Yeah, that's fair. I checked both NVMe drives (2x Kingston NV2 SNV2S1000G) and both return "Read Self-test Log failed: Invalid Field in Command (0x002)" when smartctl tries to read the self-test log. Everything else is clean - 0 errors, 100% spare, temps normal. Could this unsupported-command response be the trigger for the kernel bug?

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.24-Unraid] (local build)

=== START OF INFORMATION SECTION ===
Model Number: KINGSTON SNV2S1000G
Serial Number: 50026B7686B7D65D
Firmware Version: SBM02106
PCI Vendor/Subsystem ID: 0x2646
IEEE OUI Identifier: 0x0026b7
Controller ID: 1
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 0026b7 686b7d65d5
Local Time is: Wed Apr 29 16:23:07 2026 EDT
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0016): Format Frmw_DL Self_Test
Optional NVM Commands (0x009f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Verify
Log Page Attributes (0x12): Cmd_Eff_Lg Pers_Ev_Lg
Maximum Data Transfer Size: 64 Pages
Warning Comp. Temp. Threshold: 83 Celsius
Critical Comp. Temp. Threshold: 90 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.00W       -        -    0  0  0  0        0       0
 1 +     3.50W       -        -    1  1  1  1        0     200
 2 +     2.50W       -        -    2  2  2  2        0    1000
 3 -     1.50W       -        -    3  3  3  3     5000    5000
 4 -     1.50W       -        -    4  4  4  4    20000   70000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 31 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 18%
Data Units Read: 114,618,684 [58.6 TB]
Data Units Written: 207,487,939 [106 TB]
Host Read Commands: 383,008,645
Host Write Commands: 996,926,132
Controller Busy Time: 123,966
Power Cycles: 35
Power On Hours: 14,443
Unsafe Shutdowns: 19
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 2: 52 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Read Self-test Log failed: Invalid Field in Command (0x002)

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.24-Unraid] (local build)

=== START OF INFORMATION SECTION ===
Model Number: KINGSTON SNV2S1000G
Serial Number: 50026B7686B6899D
Firmware Version: SBM02106
PCI Vendor/Subsystem ID: 0x2646
IEEE OUI Identifier: 0x0026b7
Controller ID: 1
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 0026b7 686b6899d5
Local Time is: Wed Apr 29 16:23:19 2026 EDT
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0016): Format Frmw_DL Self_Test
Optional NVM Commands (0x009f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Verify
Log Page Attributes (0x12): Cmd_Eff_Lg Pers_Ev_Lg
Maximum Data Transfer Size: 64 Pages
Warning Comp. Temp. Threshold: 83 Celsius
Critical Comp. Temp. Threshold: 90 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.00W       -        -    0  0  0  0        0       0
 1 +     3.50W       -        -    1  1  1  1        0     200
 2 +     2.50W       -        -    2  2  2  2        0    1000
 3 -     1.50W       -        -    3  3  3  3     5000    5000
 4 -     1.50W       -        -    4  4  4  4    20000   70000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 33 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 19%
Data Units Read: 75,048,809 [38.4 TB]
Data Units Written: 205,472,724 [105 TB]
Host Read Commands: 239,304,662
Host Write Commands: 988,904,080
Controller Busy Time: 122,383
Power Cycles: 27
Power On Hours: 14,249
Unsafe Shutdowns: 15
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 2: 57 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Read Self-test Log failed: Invalid Field in Command (0x002)
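To compare drives without reading through full dumps, the health fields and any "failed:" lines can be extracted from saved smartctl output. A minimal sketch follows; the set of watched field names is my own pick, and `SAMPLE` is a trimmed copy of the first drive's output above.

```python
# Trimmed copy of the first drive's output above.
SAMPLE = """\
Critical Warning: 0x00
Available Spare: 100%
Percentage Used: 18%
Unsafe Shutdowns: 19
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Read Self-test Log failed: Invalid Field in Command (0x002)
"""

# Field names worth a glance; chosen for illustration only.
WATCHED = {
    "Critical Warning",
    "Available Spare",
    "Percentage Used",
    "Unsafe Shutdowns",
    "Media and Data Integrity Errors",
    "Error Information Log Entries",
}


def summarize(text: str):
    """Return (watched fields, lines reporting a failed log read)."""
    fields = {}
    failures = []
    for line in text.splitlines():
        if " failed: " in line:
            failures.append(line.strip())
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip() in WATCHED:
            fields[key.strip()] = value.strip()
    return fields, failures


fields, failures = summarize(SAMPLE)
print(fields)
print(failures)
```

Both drives come out the same in this view: clean health counters and a single failed read of the self-test log.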


April 30

Community Expert

Could be, if you can test a few hours without them, it would confirm it.

April 30

Author

17 minutes ago, JorgeB said:
Could be, if you can test a few hours without them, it would confirm it.

The issue is it rarely happens (or ever? I actually don't know if this is the first time I've seen this specific issue). The system was up and running for 6 days before this even happened. I've had other issues and it's hard to tell if they are related or if it's multiple issues at a time. Guess my point is taking them out could work, but I'd need to run the system without them for a long time to confirm, which I can't really do atm lol.
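One lower-effort option I could try instead is logging which device is being polled right before each smartctl call, so that if it happens again the last "START" line without a matching "DONE" names the drive. A rough sketch, with assumptions: the device nodes and log path in the closing comment are placeholders, and `runner` is only a parameter so the loop can be exercised without real drives.

```python
import os
import subprocess
from datetime import datetime


def smartctl_cmd(device: str) -> list[str]:
    """Full SMART report for one device."""
    return ["smartctl", "-a", device]


def log_line(path: str, message: str) -> None:
    """Append a timestamped line and flush it before continuing."""
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{stamp} {message}\n")
        fh.flush()
        os.fsync(fh.fileno())


def poll_once(devices, log_path, runner=subprocess.run) -> None:
    """Poll each device in turn, logging before and after every call."""
    for dev in devices:
        log_line(log_path, f"START {dev}")
        try:
            result = runner(
                smartctl_cmd(dev), capture_output=True, text=True, timeout=60
            )
            log_line(log_path, f"DONE {dev} exit={result.returncode}")
        except subprocess.TimeoutExpired:
            log_line(log_path, f"TIMEOUT {dev}")


# Example call (hypothetical device nodes and log location):
# poll_once(["/dev/nvme0", "/dev/nvme1"], "/mnt/remotes/nas/smart-poll.log")
```

This would not prevent the oops, but it avoids pulling the cache drives for weeks just to find out which device, if any, is involved.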

April 30

Community Expert

54 minutes ago, cjlmediasolutions said:
The issue is it rarely happens (or ever? I actually don't know if this is the first time I've seen this specific issue)

Then I wouldn't worry about it for now, that by itself should not cause any issues.
