June 26

UPDATE (Potential Resolution)

June 29, 2026

I wanted to post an update in case someone else with a Ryzen 5000 system finds this thread.

After working with JorgeB and following the Unraid FAQ below, I was able to resolve what appears to have been the cause of the crashes.

Unraid FAQ
https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/#findComment-819173

That FAQ led me to the Arch Linux Ryzen documentation regarding Ryzen 5000 processors and Machine Check Exception (MCE) events under Linux.

Arch Linux Ryzen Documentation
https://wiki.archlinux.org/title/Ryzen#Random_reboots

BIOS changes made

Updated to the latest ASUS BIOS
Disabled DOCP/XMP (memory now running at JEDEC defaults)
Set Power Supply Idle Control to Typical Current Idle
Disabled ASUS Performance Enhancement
Left SVM and IOMMU enabled
Initially tested a Positive Curve Optimizer (+4) as recommended in the Linux documentation, but after additional testing returned Curve Optimizer to Auto

I also removed my secondary GTX 980 during troubleshooting since it was no longer needed and simplified my PCIe configuration.

Current Status

Since disabling DOCP/XMP, the server has been completely stable.

I've successfully run:

Multiple parity checks
Unmanic
Sonarr imports
Normal Docker workloads
General heavy disk activity

without any freezes, unexpected reboots, or Machine Check Exceptions.

The latest diagnostics are also clean.

Conclusion

Based on my testing, the instability appears to have been related to the memory configuration (DOCP/XMP) rather than the SATA controller cards or motherboard.

Many thanks to @JorgeB who took the time to help troubleshoot this.

Original Post

Hi everyone,

I'm hoping to get some help troubleshooting intermittent freezes on my Unraid server. I've already run through quite a few hardware and software tests and I'm trying to determine whether this is a controller issue, motherboard issue, or something else.

Hardware

Unraid (latest stable)
ASUS ROG Strix B550-F Gaming
AMD Ryzen 9 5950X
64 GB DDR4 (4x16 GB Corsair CMK32GX4M2A2666C16)
NVIDIA RTX 5060
Fractal Design Ion+ 660W Platinum PSU

Storage

10-drive array
3 SSD cache pools
4 HDDs connected directly to motherboard SATA
6 HDDs connected to one PCIe SATA controller
2 HDDs connected to a second PCIe SATA controller

(The controller cards are generic PCIe SATA cards.)

Symptoms

The issue almost always occurs during a parity check or other heavy disk activity.

Examples:

Parity check running
Unmanic processing a video
Sonarr importing files
General heavy disk I/O

The server will suddenly become unresponsive.

Sometimes it appears completely locked:

Web UI unavailable
SMB unavailable
Docker applications stop responding

Other times it eventually recovers after several minutes without rebooting.

I also had parity checks abort unexpectedly.

What I had already tested

MemTest86 completed first pass with 0 errors
Disabled Global C-States
PCIe slots forced to Gen3 (GPU left at Gen4)
GPU replaced (no change)
Temperatures normal
Drives report no SMART errors
Array reports 0 disk errors
Docker and cache drives appeared healthy

Kernel Messages

During boot I consistently saw:

mce: [Hardware Error]: Machine check events logged

Modules loaded:

edac_mce_amd
edac_core

No obvious SATA timeout or reset messages appeared in dmesg after recovery.
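As a rough illustration of the check described above, the following Python sketch sorts kernel log lines into machine-check messages and SATA link problems. The pattern lists are my own assumptions and are far from exhaustive, the function name is hypothetical, and all sample lines except the first are made up for the example.

```python
import re

# Assumed patterns: a small, non-exhaustive selection for illustration only.
MCE_PATTERNS = [
    re.compile(r"mce: \[Hardware Error\]"),
    re.compile(r"Previous system reset reason"),
]
SATA_PATTERNS = [
    re.compile(r"ata\d+(\.\d+)?: .*(hard resetting link|exception Emask|failed command)"),
]


def classify_kernel_lines(lines):
    """Split kernel log lines into 'mce' and 'sata' hits; ignore the rest."""
    hits = {"mce": [], "sata": []}
    for line in lines:
        text = line.rstrip("\n")
        if any(p.search(text) for p in MCE_PATTERNS):
            hits["mce"].append(text)
        elif any(p.search(text) for p in SATA_PATTERNS):
            hits["sata"].append(text)
    return hits


sample = [
    "mce: [Hardware Error]: Machine check events logged",  # from this post
    "ata5: hard resetting link",                           # hypothetical line
    "EDAC MC: Ver: 3.0.0",                                 # hypothetical line
]
result = classify_kernel_lines(sample)
print(len(result["mce"]), "MCE line(s),", len(result["sata"]), "SATA line(s)")
```

In my case the "sata" bucket would have been empty while the "mce" bucket was not, which is what made the controller cards look less likely.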

Other observations

qBittorrent and Sonarr occasionally lost communication during the freeze but eventually reconnected.
The issue was strongly correlated with heavy storage activity.
The freezes became much easier to reproduce during parity checks.

Questions

Does this sound more like failing PCIe SATA controller cards?
Has anyone seen generic SATA controller cards cause temporary system-wide I/O hangs like this?
Would moving all drives to a proper LSI/Broadcom HBA (9300-16i IT mode) be the next logical troubleshooting step?
Is there anything specific in the diagnostics that I should be looking for?

I've attached my diagnostics zip generated immediately after one of the freezes.

Thanks in advance for any suggestions.

viper-diagnostics-20260626-0713.zip

Edited June 29 by viper81
Resolution update

June 26

Community Expert
Solution

Make sure this has been taken care of: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/#findComment-819173

June 26

Author

Thanks, I'll work on this shortly. I believe C-states are already disabled, but I will report back once I go through everything.

June 26

Author

Thanks for the recommendations. I went through my BIOS and made the following changes:

Confirmed the BIOS is on the latest version (it already was).
Set Power Supply Idle Control to Typical Current Idle.
Left Global C-State Control disabled.
Set all PCIe slots used for storage/controller cards to Gen 3 (left the GPU at Gen 4).
Disabled DOCP/XMP and am running the memory at the default JEDEC speed (2666 MT/s).

I'm going to leave everything else at stock for now, including PBO and Curve Optimizer, so I only change one variable at a time.

I'll run the server under normal use along with parity checks over the next day or so and report back with the results. If it still locks up or logs additional Machine Check Events, I'll post the updated diagnostics and logs.

June 26

Author

6 hours ago, JorgeB said:
Make sure this has been taken care of: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/#findComment-819173

I made the recommended BIOS changes:

Power Supply Idle Control = Typical Current Idle
Global C-State disabled
Memory running at JEDEC 2666 (DOCP disabled)
PCIe storage slots forced to Gen3

I ran another parity check and the server rebooted again.

The new diagnostics now show:

Previous system reset reason: an uncorrected error caused a data fabric sync flood event

followed by Machine Check Events.

I don't see any obvious SATA timeout or disk I/O errors before the reboot. Does this point more toward the Ryzen platform itself (Infinity Fabric/CPU/motherboard) than the storage controllers?

viper-diagnostics-20260626-1414.zip

Edited June 26 by viper81

June 27

Community Expert

Jun 26 13:53:05 VIPER kernel: x86/amd: Previous system reset reason [0x08000800]: an uncorrected error caused a data fabric sync flood event

Jun 26 13:53:05 VIPER kernel: mce: [Hardware Error]: Machine check events logged

Jun 26 13:53:05 VIPER kernel: mce: [Hardware Error]: CPU 17: Machine Check: 0 Bank 5: bea0000000000108

Jun 26 13:53:05 VIPER kernel: mce: [Hardware Error]: TSC 0 ADDR 146ad2338960 MISC d01a000000000000 SYND 4d000000 IPID 500b000000000

Jun 26 13:53:05 VIPER kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1782499925 SOCKET 0 APIC 3 microcode a201030

This is a protective mechanism: when an uncorrected, fatal error occurs within the internal data fabric or a core, the system immediately halts and resets to prevent data corruption from spreading. The most common causes are insufficient CPU voltage, overclocked RAM, or a bad CPU. Since you already disabled XMP, I would increase the CPU voltage a little, especially since that is a known issue with those CPUs and Linux, as mentioned in the FAQ.
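For anyone who wants to read the raw values in the log lines above, here is a minimal Python sketch that decodes the upper flag bits of the machine-check status word and converts the TIME field to a UTC timestamp. It assumes the standard x86 MCA_STATUS bit layout for bits 63 to 57 and that TIME is Unix epoch seconds; the function names are hypothetical, and the vendor-specific bits and the error-code meaning are deliberately left undecoded.

```python
from datetime import datetime, timezone

# Architectural MCA_STATUS flag bits (assumed layout, bits 63..57 only).
STATUS_FLAGS = {
    63: "VAL (register holds a valid error)",
    62: "OVER (an earlier error was overwritten)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def decode_mca_status(hex_status):
    """Return the set flag names and the low 16-bit error code."""
    value = int(hex_status, 16)
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (value >> bit) & 1
    ]
    return {"flags": flags, "error_code": value & 0xFFFF}


def mce_time_utc(epoch_seconds):
    """Convert the TIME field (assumed Unix epoch seconds) to UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Values copied from the Bank 5 log lines above.
decoded = decode_mca_status("bea0000000000108")
when = mce_time_utc(1782499925)

for flag in decoded["flags"]:
    print(flag)
print("error code:", hex(decoded["error_code"]))
print("logged at:", when.isoformat())
```

With the UC and PCC bits set, the status word agrees with the "uncorrected error" wording in the reset reason line, so the two messages appear to describe the same event.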

June 27

Author

@JorgeB Thanks again for all the help and for pointing me in the right direction.

Based on the Ryzen/Linux stability information in the FAQ and Arch Linux documentation, I've made the following changes:

Disabled DOCP/XMP and am running the memory at the default JEDEC speed (2666 MT/s).
Disabled ASUS Performance Enhancement.
Set Power Supply Idle Control to Typical Current Idle.
Left Global C-State Control disabled.
Left all PBO settings on Auto.
(Latest Change) Set Curve Optimizer to All Cores, Positive, Magnitude 4, as recommended in the Linux documentation for Ryzen 5000 stability.

I'm going to let the server run overnight with a parity check and normal Docker activity and see if it remains stable. If it crashes again, I'll post the new diagnostics and report back with the results.

Thanks again for taking the time to help troubleshoot this.

Edited June 27 by viper81

June 29

Author

@JorgeB viper-diagnostics-20260629-0734.zip

Thanks again for all the help and guidance. I really appreciate you taking the time to look through my diagnostics and point me in the right direction.

I've updated my original post with everything I changed and the resources I used so hopefully it can help someone else with a similar issue.

The server has now been stable for about 32 hours with zero freezes, reboots, or Machine Check Exceptions. It has completed parity checks and handled normal workloads without any issues since I disabled DOCP/XMP and applied the other settings discussed in the linked FAQ.

I'll continue to let it run and will report back if anything changes, but so far it's looking very promising.

Thanks again!

System hangs under heavy disk I/O - MCE logged - Diagnostics attached

Solved by JorgeB
