June 26

UPDATE (Potential Resolution)
June 29, 2026

I wanted to post an update in case someone else with a Ryzen 5000 system finds this thread.

After working with JorgeB and following the Unraid FAQ below, I was able to resolve what appears to have been the cause of the crashes.

Unraid FAQ
https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/#findComment-819173

That FAQ led me to the Arch Linux Ryzen documentation regarding Ryzen 5000 processors and Machine Check Exception (MCE) events under Linux.

Arch Linux Ryzen Documentation
https://wiki.archlinux.org/title/Ryzen#Random_reboots

BIOS changes made
- Updated to the latest ASUS BIOS
- Disabled DOCP/XMP (memory now running at JEDEC defaults)
- Set Power Supply Idle Control to Typical Current Idle
- Disabled ASUS Performance Enhancement
- Left SVM and IOMMU enabled
- Initially tested a Positive Curve Optimizer (+4) as recommended in the Linux documentation, but after additional testing returned Curve Optimizer to Auto

I also removed my secondary GTX 980 during troubleshooting since it was no longer needed and simplified my PCIe configuration.

Current Status
Since disabling DOCP/XMP, the server has been completely stable.

I've successfully run:
- Multiple parity checks
- Unmanic
- Sonarr imports
- Normal Docker workloads
- General heavy disk activity
without any freezes, unexpected reboots, or Machine Check Exceptions.

The latest diagnostics are also clean.

Conclusion
Based on my testing, the instability appears to have been related to the memory configuration (DOCP/XMP) rather than the SATA controller cards or motherboard.

Many thanks to @JorgeB who took the time to help troubleshoot this.

Original Post
Hi everyone,

I'm hoping to get some help troubleshooting intermittent freezes on my Unraid server. I've already run through quite a few hardware and software tests and I'm trying to determine whether this is a controller issue, motherboard issue, or something else.

Hardware
- Unraid (latest stable)
- ASUS ROG Strix B550-F Gaming
- AMD Ryzen 9 5950X
- 64 GB DDR4 (4x16 GB Corsair CMK32GX4M2A2666C16)
- NVIDIA RTX 5060
- Fractal Design Ion+ 660W Platinum PSU

Storage
- 10-drive array
- 3 SSD cache pools
- 4 HDDs connected directly to motherboard SATA
- 6 HDDs connected to one PCIe SATA controller
- 2 HDDs connected to a second PCIe SATA controller
(The controller cards are generic PCIe SATA cards.)

Symptoms
The issue almost always occurs during a parity check or other heavy disk activity.

Examples:
- Parity check running
- Unmanic processing a video
- Sonarr importing files
- General heavy disk I/O

The server will suddenly become unresponsive.

Sometimes it appears completely locked:
- Web UI unavailable
- SMB unavailable
- Docker applications stop responding

Other times it eventually recovers after several minutes without rebooting.

I also had parity checks abort unexpectedly.

What I had already tested
- MemTest86 completed first pass with 0 errors
- Disabled Global C-States
- PCIe slots forced to Gen3 (GPU left at Gen4)
- GPU replaced (no change)
- Temperatures normal
- Drives report no SMART errors
- Array reports 0 disk errors
- Docker and cache drives appeared healthy

Kernel Messages
During boot I consistently saw:
mce: [Hardware Error]: Machine check events logged

Modules loaded:
edac_mce_amd edac_core

No obvious SATA timeout or reset messages appeared in dmesg after recovery.

Other observations
- qBittorrent and Sonarr occasionally lost communication during the freeze but eventually reconnected.
- The issue was strongly correlated with heavy storage activity.
- The freezes became much easier to reproduce during parity checks.

Questions
- Does this sound more like failing PCIe SATA controller cards?
- Has anyone seen generic SATA controller cards cause temporary system-wide I/O hangs like this?
- Would moving all drives to a proper LSI/Broadcom HBA (9300-16i IT mode) be the next logical troubleshooting step?
- Is there anything specific in the diagnostics that I should be looking for?

I've attached my diagnostics zip generated immediately after one of the freezes.

Thanks in advance for any suggestions.

viper-diagnostics-20260626-0713.zip

Edited June 29 by viper81: Resolution update

June 26 Community Expert Solution

Make sure this has been taken care of: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/#findComment-819173

June 26 Author

Thanks, I'll work on this shortly. I believe C-state is already disabled, but I will report back once I go through everything.

June 26 Author

Thanks for the recommendations. I went through my BIOS and made the following changes:
- Updated the BIOS to the latest version (it was already installed).
- Set Power Supply Idle Control to Typical Current Idle.
- Left Global C-State Control disabled.
- Set all PCIe slots used for storage/controller cards to Gen 3 (left the GPU at Gen 4).
- Disabled DOCP/XMP and am running the memory at the default JEDEC speed (2666 MT/s).

I'm going to leave everything else at stock for now, including PBO and Curve Optimizer, so I only change one variable at a time.

I'll run the server under normal use along with parity checks over the next day or so and report back with the results. If it still locks up or logs additional Machine Check Events, I'll post the updated diagnostics and logs.

June 26 Author

6 hours ago, JorgeB said:
Make sure this has been taken care of: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/#findComment-819173

I made the recommended BIOS changes:
- Power Supply Idle Control = Typical Current Idle
- Global C-State disabled
- Memory running at JEDEC 2666 (DOCP disabled)
- PCIe storage slots forced to Gen3

I ran another parity check and the server rebooted again.

The new diagnostics now show:
Previous system reset reason: an uncorrected error caused a data fabric sync flood event
followed by Machine Check Events.

I don't see any obvious SATA timeout or disk I/O errors before the reboot. Does this point more toward the Ryzen platform itself (Infinity Fabric/CPU/motherboard) than the storage controllers?

viper-diagnostics-20260626-1414.zip

Edited June 26 by viper81
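
For anyone who wants to check their own syslog for the same two messages, here is a minimal sketch that filters a log for them. The function name and the sample text are made up for illustration, and the match strings are simply copied from the kernel messages quoted in this thread; other kernels or CPUs may word these lines differently.

```python
import re

# Substrings copied from the kernel messages quoted in this thread.
PATTERNS = (
    re.compile(r"mce: \[Hardware Error\]"),
    re.compile(r"Previous system reset reason"),
)


def find_hardware_events(log_text):
    """Return the log lines that mention an MCE or a reset reason."""
    return [
        line
        for line in log_text.splitlines()
        if any(p.search(line) for p in PATTERNS)
    ]


# Two lines from the thread plus one invented, unrelated line.
SAMPLE_LOG = """\
Jun 26 13:53:05 VIPER kernel: x86/amd: Previous system reset reason [0x08000800]: an uncorrected error caused a data fabric sync flood event
Jun 26 13:53:05 VIPER kernel: mce: [Hardware Error]: Machine check events logged
Jun 26 13:53:06 VIPER kernel: unrelated line (invented for this sketch)
"""

for line in find_hardware_events(SAMPLE_LOG):
    print(line)
```

To use it on a real system, read the syslog file from the diagnostics zip into a string and pass it to `find_hardware_events` in place of `SAMPLE_LOG`.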

June 27 Community Expert

Jun 26 13:53:05 VIPER kernel: x86/amd: Previous system reset reason [0x08000800]: an uncorrected error caused a data fabric sync flood event
Jun 26 13:53:05 VIPER kernel: mce: [Hardware Error]: Machine check events logged
Jun 26 13:53:05 VIPER kernel: mce: [Hardware Error]: CPU 17: Machine Check: 0 Bank 5: bea0000000000108
Jun 26 13:53:05 VIPER kernel: mce: [Hardware Error]: TSC 0 ADDR 146ad2338960 MISC d01a000000000000 SYND 4d000000 IPID 500b000000000
Jun 26 13:53:05 VIPER kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1782499925 SOCKET 0 APIC 3 microcode a201030

This is a protective mechanism: when an uncorrected, fatal error occurs within the internal data fabric or a core, the system immediately halts and resets to prevent data corruption from spreading. The most common causes are insufficient CPU voltage, overclocked RAM, or a bad CPU. Since you have already disabled XMP, I would increase the CPU voltage a little, especially since that is a known issue with those CPUs under Linux, as mentioned in the FAQ.
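
The status value on the "Bank 5" line can be unpacked a little further. The sketch below decodes only the architectural x86 machine-check status flags (bits 57 to 63) and the low 16-bit error code. It does not attempt the AMD-specific fields or say which hardware unit bank 5 is, since that needs AMD's processor documentation. The function names are made up for illustration, and treating the TIME field as a Unix timestamp is an assumption based on its value.

```python
import re
from datetime import datetime, timezone

# Architectural x86 MCi_STATUS flag bits (bit position -> name).
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context corrupt
}

# Matches lines shaped like the "CPU 17: Machine Check: 0 Bank 5: ..." line above.
LINE_RE = re.compile(r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-f]{16})")


def decode_status(status):
    """Split a 64-bit status value into set flags and the low 16-bit error code."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {"flags": flags, "error_code": status & 0xFFFF}


def parse_mce_line(line):
    """Extract CPU, bank and decoded status from one log line, or None if no match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    cpu, bank, status_hex = match.groups()
    result = decode_status(int(status_hex, 16))
    result.update(cpu=int(cpu), bank=int(bank))
    return result


def mce_time_utc(seconds):
    """Convert the TIME field to UTC, assuming it is a Unix timestamp."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


line = ("Jun 26 13:53:05 VIPER kernel: mce: [Hardware Error]: "
        "CPU 17: Machine Check: 0 Bank 5: bea0000000000108")
print(parse_mce_line(line))
print(mce_time_utc(1782499925).isoformat())
```

For the value in this thread, the UC and PCC flags are both set, which is consistent with the uncorrected, fatal error described above.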

June 27 Author

@JorgeB Thanks again for all the help and for pointing me in the right direction.

Based on the Ryzen/Linux stability information in the FAQ and Arch Linux documentation, I've made the following changes:
- Disabled DOCP/XMP and am running the memory at the default 2666 MT/s.
- Disabled ASUS Performance Enhancement.
- Set Power Supply Idle Control to Typical Current Idle.
- Left Global C-State Control disabled.
- Left all PBO settings on Auto.
- (Latest Change) Set Curve Optimizer to All Cores, Positive, Magnitude 4, as recommended in the Linux documentation for Ryzen 5000 stability.

I'm going to let the server run overnight with a parity check and normal Docker activity and see if it remains stable. If it crashes again, I'll post the new diagnostics and report back with the results.

Thanks again for taking the time to help troubleshoot this.

Edited June 27 by viper81

June 29 Author

@JorgeB

viper-diagnostics-20260629-0734.zip

Thanks again for all the help and guidance. I really appreciate you taking the time to look through my diagnostics and point me in the right direction.

I've updated my original post with everything I changed and the resources I used so hopefully it can help someone else with a similar issue.

So far the server has been stable for about 32 hours with zero freezes, reboots, or Machine Check Errors. It has completed parity checks and handled normal workloads without any issues since disabling DOCP/XMP and changing the other settings discussed in the link.

I'll continue to let it run and will report back if anything changes, but so far it's looking very promising.

Thanks again!