Skip to content
View in the app

A better way to browse. Learn more.

Unraid

A full-screen app on your home screen with push notifications, badges and more.

To install this app on iOS and iPadOS
  1. Tap the Share icon in Safari
  2. Scroll the menu and tap Add to Home Screen.
  3. Tap Add in the top-right corner.
To install this app on Android
  1. Tap the 3-dot menu (⋮) in the top-right corner of the browser.
  2. Tap Add to Home screen or Install app.
  3. Confirm by tapping Install.

System hangs under heavy disk I/O - MCE logged - Diagnostics attached

Featured Replies

UPDATE (Potential Resolution)

June 29, 2026

I wanted to post an update in case someone else with a Ryzen 5000 system finds this thread.

After working with JorgeB and following the Unraid FAQ below, I was able to resolve what appears to have been the cause of the crashes.

Unraid FAQ
https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/#findComment-819173

That FAQ led me to the Arch Linux Ryzen documentation regarding Ryzen 5000 processors and Machine Check Exception (MCE) events under Linux.

Arch Linux Ryzen Documentation
https://wiki.archlinux.org/title/Ryzen#Random_reboots

BIOS changes made

  • Updated to the latest ASUS BIOS

  • Disabled DOCP/XMP (memory now running at JEDEC defaults)

  • Set Power Supply Idle Control to Typical Current Idle

  • Disabled ASUS Performance Enhancement

  • Left SVM and IOMMU enabled

  • Initially tested a Positive Curve Optimizer (+4) as recommended in the Linux documentation, but after additional testing returned Curve Optimizer to Auto

I also removed my secondary GTX 980 during troubleshooting since it was no longer needed and simplified my PCIe configuration.

Current Status

Since disabling DOCP/XMP, the server has been completely stable.

I've successfully run:

  • Multiple parity checks

  • Unmanic

  • Sonarr imports

  • Normal Docker workloads

  • General heavy disk activity

without any freezes, unexpected reboots, or Machine Check Exceptions.

The latest diagnostics are also clean.

Conclusion

Based on my testing, the instability appears to have been related to the memory configuration (DOCP/XMP) rather than the SATA controller cards or motherboard.

Many thanks to @JorgeB who took the time to help troubleshoot this.


Original Post

Hi everyone,

I'm hoping to get some help troubleshooting intermittent freezes on my Unraid server. I've already run through quite a few hardware and software tests and I'm trying to determine whether this is a controller issue, motherboard issue, or something else.

Hardware

  • Unraid (latest stable)

  • ASUS ROG Strix B550-F Gaming

  • AMD Ryzen 9 5950X

  • 64 GB DDR4 (4x16 GB Corsair CMK32GX4M2A2666C16)

  • NVIDIA RTX 5060

  • Fractal Design Ion+ 660W Platinum PSU

Storage

  • 10-drive array

  • 3 SSD cache pools

  • 4 HDDs connected directly to motherboard SATA

  • 6 HDDs connected to one PCIe SATA controller

  • 2 HDDs connected to a second PCIe SATA controller

(The controller cards are generic PCIe SATA cards.)

Symptoms

The issue almost always occurs during a parity check or other heavy disk activity.

Examples:

  • Parity check running

  • Unmanic processing a video

  • Sonarr importing files

  • General heavy disk I/O

The server will suddenly become unresponsive.

Sometimes it appears completely locked:

  • Web UI unavailable

  • SMB unavailable

  • Docker applications stop responding

Other times it eventually recovers after several minutes without rebooting.

I also had parity checks abort unexpectedly.

What I had already tested

  • MemTest86 completed first pass with 0 errors

  • Disabled Global C-States

  • PCIe slots forced to Gen3 (GPU left at Gen4)

  • GPU replaced (no change)

  • Temperatures normal

  • Drives report no SMART errors

  • Array reports 0 disk errors

  • Docker and cache drives appeared healthy

Kernel Messages

During boot I consistently saw:

mce: [Hardware Error]: Machine check events logged

Modules loaded:

edac_mce_amd
edac_core

No obvious SATA timeout or reset messages appeared in dmesg after recovery.

Other observations

  • qBittorrent and Sonarr occasionally lost communication during the freeze but eventually reconnected.

  • The issue was strongly correlated with heavy storage activity.

  • The freezes became much easier to reproduce during parity checks.

Questions

  1. Does this sound more like failing PCIe SATA controller cards?

  2. Has anyone seen generic SATA controller cards cause temporary system-wide I/O hangs like this?

  3. Would moving all drives to a proper LSI/Broadcom HBA (9300-16i IT mode) be the next logical troubleshooting step?

  4. Is there anything specific in the diagnostics that I should be looking for?

I've attached my diagnostics zip generated immediately after one of the freezes.

Thanks in advance for any suggestions.

viper-diagnostics-20260626-0713.zip

Edited by viper81
Resolution update

Solved by JorgeB

  • Author

Thanks work on this shortly. I believe C-state is already disabled, but I will report back once I go through everything.

  • Author

Thanks for the recommendations. I went through my BIOS and made the following changes:

  • Updated the BIOS to the latest version (already had)

  • Set Power Supply Idle Control to Typical Current Idle.

  • Left Global C-State Control disabled.

  • Set all PCIe slots used for storage/controller cards to Gen 3 (left the GPU at Gen 4).

  • Disabled DOCP/XMP and am running the memory at the default JEDEC speed (2666 MT/s).

I'm going to leave everything else at stock for now, including PBO and Curve Optimizer, so I only change one variable at a time.

I'll run the server under normal use along with parity checks over the next day or so and report back with the results. If it still locks up or logs additional Machine Check Events, I'll post the updated diagnostics and logs.

  • Author
6 hours ago, JorgeB said:

I made the recommended BIOS changes:

  • Power Supply Idle Control = Typical Current Idle

  • Global C-State disabled

  • Memory running at JEDEC 2666 (DOCP disabled)

  • PCIe storage slots forced to Gen3

I ran another parity check and the server rebooted again.

The new diagnostics now show:

Previous system reset reason: an uncorrected error caused a data fabric sync flood event

followed by Machine Check Events.

I don't see any obvious SATA timeout or disk I/O errors before the reboot. Does this point more toward the Ryzen platform itself (Infinity Fabric/CPU/motherboard) than the storage controllers?

viper-diagnostics-20260626-1414.zip

Edited by viper81

  • Community Expert

Jun 26 13:53:05 VIPER kernel: x86/amd: Previous system reset reason [0x08000800]: an uncorrected error caused a data fabric sync flood event

Jun 26 13:53:05 VIPER kernel: mce: [Hardware Error]: Machine check events logged

Jun 26 13:53:05 VIPER kernel: mce: [Hardware Error]: CPU 17: Machine Check: 0 Bank 5: bea0000000000108

Jun 26 13:53:05 VIPER kernel: mce: [Hardware Error]: TSC 0 ADDR 146ad2338960 MISC d01a000000000000 SYND 4d000000 IPID 500b000000000

Jun 26 13:53:05 VIPER kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1782499925 SOCKET 0 APIC 3 microcode a201030

This is a protective mechanism: when an uncorrected, fatal error occurs within the internal data fabric or a core, the system immediately halts and resets to prevent data corruption from spreading across. Most common reasons would not be enough voltage to the CPU, overclocked RAM, bad CPU. Since you already disabled XMP, I would increase the CPU voltage a little, especially since that is a known issue with those CPUs and Linux, as mentioned in the FAQ

  • Author

@JorgeB Thanks again for all the help and for pointing me in the right direction.

Based on the Ryzen/Linux stability information in the FAQ and Arch Linux documentation, I've made the following changes:

  • Disabled DOCP/XMP and am running the memory at the default 2666 MHz.

  • Disabled ASUS Performance Enhancement.

  • Set Power Supply Idle Control to Typical Current Idle.

  • Left Global C-State Control disabled.

  • Left all PBO settings on Auto.

  • (Latest Change) Set Curve Optimizer to All Cores, Positive, Magnitude 4, as recommended in the Linux documentation for Ryzen 5000 stability.

I'm going to let the server run overnight with a parity check and normal Docker activity and see if it remains stable. If it crashes again, I'll post the new diagnostics and report back with the results.

Thanks again for taking the time to help troubleshoot this.

Edited by viper81

  • Author

@JorgeB viper-diagnostics-20260629-0734.zipThanks again for all the help and guidance. I really appreciate you taking the time to look through my diagnostics and point me in the right direction.

I've updated my original post with everything I changed and the resources I used so hopefully it can help someone else with a similar issue.

So far the server has now been stable for about 32 hours with zero freezes, reboots, or Machine Check Errors. It has completed parity checks and handled normal workloads without any issues since disabling DOCP/XMP and other settings discussed in the link.

I'll continue to let it run and will report back if anything changes, but so far it's looking very promising.

Thanks again!

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

Account

Navigation

Search

Search

Configure browser push notifications

Chrome (Android)
  1. Tap the lock icon next to the address bar.
  2. Tap Permissions → Notifications.
  3. Adjust your preference.
Chrome (Desktop)
  1. Click the padlock icon in the address bar.
  2. Select Site settings.
  3. Find Notifications and adjust your preference.