October 29, 2024

Hey there, I've been reading various posts on this forum (and Reddit) for the past couple of weeks. I set up a media server using a combination of hand-me-down parts from my gaming rig and some new "home server" type components, and decided to use Unraid for the backend. The overall setup has been relatively painless, and I have a VM running on top to host the usual media server things.

My issue (as you probably already read) has been rather elusive, though. Even during the very first setup, as I was moving files onto my new array and allowing parity to sync, the system would randomly hard reset. At first I assumed it was due to one of my two parity drives showing SMART errors (nothing too bad, just some sectors that wouldn't write). I replaced the drive and tried to build parity again, only to hit the same issue: at some point during the parity sync, the system just resets. Given that the power remains on and the drives remain spinning, I've at least concluded it's not the power supply. I suspected RAM, but memtest returned no issues. I turned off DOCP and disabled some power-saving features on the CPU, and still have the issue.

I continued to troubleshoot why the system resets only during parity calculations. I used the New Config option several times to try different combinations of my 2 parity drives, and ran the sync in maintenance mode so that it was the only thing being done on the system. I even upgraded to the version 7 beta to see if it was any better, and I still have the same issue. I've had the system running for a few days at a time without any parity drives, hosting files, doing transcodes and downloads with the VM, really hammering the CPU, and have confirmed the system is stable when not calculating parity (it even remains stable if the parity sync is paused for multiple days).

I have to imagine that this isn't normal, and it could likely still be attributed to some sort of issue with the CPU or motherboard, but since the system hard resets, it never gets a chance to write anything to the logs (even with mirror syslog enabled). I've attached the customary diagnostic files for what they're worth, but my main hope is to get some advice on other things to try or check before I consider this motherboard and CPU a lost cause and purchase replacements. I'm going to try the old 'tail syslog in an ssh terminal' trick, so when I reproduce the issue again I'll add a reply with that particular log.

Thanks in advance for your time and advice, kind denizens of the Unraid forums.

anton-diagnostics-20241029-1231.zip

Edited October 29, 2024 by _hollish: Added note about next step for data gathering.
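For anyone who wants to capture the log the same way, here is a rough Python sketch of the 'tail syslog in an ssh terminal' idea that also keeps a copy on the machine you are watching from, so the lines survive the hard reset. The log path `/var/log/syslog`, the `root@anton` login in the usage note, and the helper names are all assumptions for illustration, not something taken from the diagnostics.

```python
import subprocess
from typing import IO, Iterable, List


def build_tail_command(host: str, log_path: str = "/var/log/syslog") -> List[str]:
    """Return the ssh command that follows the remote log as it grows.

    The default log path is an assumption; adjust it for your server.
    """
    # tail -F keeps following the file even if it is rotated or recreated.
    return ["ssh", host, "tail", "-F", log_path]


def copy_lines(lines: Iterable[str], sink: IO[str]) -> int:
    """Write each line to sink and flush right away; return the line count."""
    count = 0
    for line in lines:
        sink.write(line if line.endswith("\n") else line + "\n")
        # Flush per line so nothing is lost if the remote side dies mid-stream.
        sink.flush()
        count += 1
    return count


def follow_remote_syslog(host: str, out_path: str) -> int:
    """Stream the remote syslog into a local file until the connection drops."""
    with subprocess.Popen(
        build_tail_command(host), stdout=subprocess.PIPE, text=True
    ) as proc, open(out_path, "a") as sink:
        return copy_lines(proc.stdout, sink)
```

Something like `follow_remote_syslog("root@anton", "anton-syslog.txt")` would then run until the server resets and the ssh session drops, leaving the last lines received in the local file.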
October 29, 2024 Community Expert
34 minutes ago, _hollish said: "the system randomly would hard reset."
This is almost always a hardware problem. Memtest is only definitive if it finds errors, so if you have multiple sticks, try running the server with just one; if the problem persists, try a different one. That will basically rule out bad RAM.
October 29, 2024 Author
Thanks for the tip, I'll give that a try. Do you know if there's a way to get more verbose logging from the Parity Sync process? I'd love to see if an actual error occurs before or during the hard reset, or if it maybe triggers on a specific part of the process. It would just simplify the hardware troubleshooting if I could get a better understanding of what is happening when the system fails.
October 29, 2024 Community Expert
This type of hardware issue doesn't usually leave anything logged.
October 29, 2024 Author
Fair enough. Thanks for the swift response, Jorge. I guess I'm fiddling with RAM sticks after work today.
October 29, 2024 Author
So, I did get something interesting with running an ssh'd tail on the log file. Roughly 2 minutes before the system came back online from a hard reset, I'm seeing the following get added to the log (which does not appear in syslog_previous after the reboot). I have no clue how to interpret this, but my first instinct is to suspect the SATA controller on the motherboard. Perhaps someone who actually understands this output would be willing to explain it to me? I still intend to try RAM stick roulette after work, but I figured this was worth adding to the thread.

Oct 29 13:29:03 anton sSMTP[90956]: <E-mail telling me the Parity Sync has started>
Oct 29 15:49:51 anton kernel: ata4.00: exception Emask 0x0 SAct 0xfffffc3f SErr 0x0 action 0x6 frozen
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/98:00:78:00:22/01:00:e8:00:00/40 tag 0 ncq dma 208896 in
Oct 29 15:49:51 anton kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/c8:08:10:02:22/00:00:e8:00:00/40 tag 1 ncq dma 102400 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/b0:10:d8:02:22/01:00:e8:00:00/40 tag 2 ncq dma 221184 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/c8:18:88:04:22/01:00:e8:00:00/40 tag 3 ncq dma 233472 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/40:20:50:06:22/01:00:e8:00:00/40 tag 4 ncq dma 163840 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/88:28:90:07:22/00:00:e8:00:00/40 tag 5 ncq dma 69632 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/68:50:18:e8:21/01:00:e8:00:00/40 tag 10 ncq dma 184320 in
Oct 29 15:49:51 anton kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/18:58:c0:f0:21/01:00:e8:00:00/40 tag 11 ncq dma 143360 in
Oct 29 15:49:51 anton kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/28:60:d8:f1:21/01:00:e8:00:00/40 tag 12 ncq dma 151552 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/70:68:00:f3:21/01:00:e8:00:00/40 tag 13 ncq dma 188416 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/b0:70:70:f4:21/00:00:e8:00:00/40 tag 14 ncq dma 90112 in
Oct 29 15:49:51 anton kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/48:78:20:f5:21/01:00:e8:00:00/40 tag 15 ncq dma 167936 in
Oct 29 15:49:51 anton kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/d8:80:68:f6:21/00:00:e8:00:00/40 tag 16 ncq dma 110592 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/10:88:40:f7:21/01:00:e8:00:00/40 tag 17 ncq dma 139264 in
Oct 29 15:49:51 anton kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/48:90:50:f8:21/01:00:e8:00:00/40 tag 18 ncq dma 167936 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/40:98:98:f9:21/00:00:e8:00:00/40 tag 19 ncq dma 32768 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/68:a0:d8:f9:21/01:00:e8:00:00/40 tag 20 ncq dma 184320 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/50:a8:40:fb:21/01:00:e8:00:00/40 tag 21 ncq dma 172032 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/f0:b0:90:fc:21/00:00:e8:00:00/40 tag 22 ncq dma 122880 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/98:b8:80:fd:21/01:00:e8:00:00/40 tag 23 ncq dma 208896 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/c0:c0:80:e9:21/00:00:e8:00:00/40 tag 24 ncq dma 98304 in
Oct 29 15:49:51 anton kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/b8:c8:40:ea:21/01:00:e8:00:00/40 tag 25 ncq dma 225280 in
Oct 29 15:49:51 anton kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/b8:d0:f8:eb:21/00:00:e8:00:00/40 tag 26 ncq dma 94208 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/10:d8:b0:ec:21/01:00:e8:00:00/40 tag 27 ncq dma 139264 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/18:e0:c0:ed:21/01:00:e8:00:00/40 tag 28 ncq dma 143360 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/e8:e8:d8:ee:21/00:00:e8:00:00/40 tag 29 ncq dma 118784 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/00:f0:c0:ef:21/01:00:e8:00:00/40 tag 30 ncq dma 131072 in
Oct 29 15:49:51 anton kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4.00: failed command: READ FPDMA QUEUED
Oct 29 15:49:51 anton kernel: ata4.00: cmd 60/60:f8:18:ff:21/01:00:e8:00:00/40 tag 31 ncq dma 180224 in
Oct 29 15:49:51 anton kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 29 15:49:51 anton kernel: ata4.00: status: { DRDY }
Oct 29 15:49:51 anton kernel: ata4: hard resetting link
Oct 29 15:49:52 anton kernel: ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct 29 15:49:52 anton kernel: ata4.00: configured for UDMA/133
Oct 29 15:49:52 anton kernel: ata4: EH complete

Edited October 29, 2024 by _hollish: Fixing some wording.
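Since the post asks how to read this output, here is a small Python sketch that decodes two pieces of it. It assumes the usual libata layout for the `cmd` line (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and 512-byte logical sectors; the function names are made up for illustration. Treat it as a reading aid, not a diagnosis: it shows which queued reads timed out, not why.

```python
import re
from typing import Dict, List


def sact_tags(sact: int) -> List[int]:
    """List the NCQ tags (0-31) whose bits are set in an SAct value."""
    return [tag for tag in range(32) if (sact >> tag) & 1]


# Twelve hex bytes separated by '/' or ':' as printed after "cmd ".
CMD_RE = re.compile(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def decode_ncq_cmd(line: str, sector_size: int = 512) -> Dict[str, int]:
    """Decode one 'cmd ...' line for an NCQ command such as READ FPDMA QUEUED.

    Assumes the libata field order described above and the given sector size.
    """
    match = CMD_RE.search(line)
    if match is None:
        raise ValueError("no cmd taskfile found in line")
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah,
     _device) = (int(b, 16) for b in re.split(r"[/:]", match.group(1)))
    # For NCQ commands the sector count travels in the feature registers.
    sectors = (hob_feature << 8) | feature
    # The 48-bit LBA is split across the low and "hob" (high order) registers.
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    return {
        "command": command,
        "tag": nsect >> 3,  # NCQ tag sits in bits 7:3 of the count register
        "sectors": sectors,
        "bytes": sectors * sector_size,
        "lba": lba,
    }


if __name__ == "__main__":
    print(sact_tags(0xFFFFFC3F))
    print(decode_ncq_cmd("cmd 60/98:00:78:00:22/01:00:e8:00:00/40 tag 0 ncq dma 208896 in"))
```

Run against the log above, `SAct 0xfffffc3f` decodes to tags 0-5 and 10-31, which are exactly the 28 failed commands listed, and the first `cmd` line decodes to 408 sectors (208896 bytes, matching the `dma` figure). In other words, every read that was in flight on ata4 timed out at once, which is at least consistent with suspecting the port, cable, or controller rather than a single bad sector.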
October 30, 2024 Author
Well, I've tried all 4 of my RAM sticks in 3 different slots over the 2 available channels. All 4 reproduce the issue, so I guess it's got to be motherboard or processor. Considering the processor is new and the motherboard was handed down from my gaming rig, I guess I'll try a new motherboard next.
November 1, 2024 Author Solution
Just in case anyone finds this thread while dealing with similar issues, a new motherboard fixed the issue for me. It took a while, but Unraid was finally able to finish the parity calculations. I suspect the SATA controller on the old motherboard was at least part of the problem, so it's possible that a different controller (like an LSI card) would resolve the issue at lower cost. I wound up having to order one anyways, since my new motherboard doesn't allow use of all 6 SATA ports if you're running 2 NVMe drives as well. I probably won't try the old motherboard again though, since I've already put enough time and money into troubleshooting the issue.