December 3, 2025

I've been having an on-and-off issue with my main UnRAID server where I randomly get a few (1-2) parity errors. They are never in the same sector, and sometimes the issue doesn't recur for months at a time. As you can see [parity check history screenshot not preserved], I had some issues at the beginning of the year. They then disappeared for months, resurfaced for a single month, disappeared for five more months, and have now resurfaced again.

Back in March, when these errors first started appearing, I tested the RAM and it passed an overnight memtest. I understand memtest is never definitive unless it actually finds errors, so it may still be the RAM. I plan on retesting these sticks in a different machine and borrowing a kit from someone to test in my server (since DRAM prices are insane right now).

While I'm working the RAM angle, what other things should I be checking?

- No unclean shutdowns, and the server is on a UPS of sufficient size.
- I don't see any reported sector issues on any disks. A single disk has some historical UDMA CRC errors, but I don't recall seeing any recently. Nothing I've seen suggests I need to swap out SATA/power cables or that one of the disks is failing.
- The power supply should be of sufficient size; IIRC it's either 850 W or 1000 W.
- I'm using Dell H310 HBAs with SAS-to-SATA breakout cables.
- I don't use ZFS or the File Integrity plugin, so I currently have no way to know where the troublesome data is stored.

Attached: node-diagnostics-20251203-0817.zip

Edited December 3, 2025 by weirdcrap
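On the last point: checksum-based tools locate corruption by recording a hash of every file and re-hashing later. A file whose hash changed even though nobody edited it is the one holding bad data. The following is a minimal sketch of that idea using Python's `hashlib`. It is an illustration only, not how the File Integrity plugin or ZFS actually store checksums, and the function names are hypothetical.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large media files don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def changed_files(old, new):
    """Return files present in both manifests whose contents no longer match."""
    return sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
```

Usage, assuming a hypothetical mount point such as `/mnt/disk1`: save `build_manifest("/mnt/disk1")` somewhere (e.g. as JSON) after a clean parity check, build a fresh one after a check reports errors, and pass both to `changed_files`. Files you edited on purpose will also show up, so this narrows the search rather than proving corruption.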
December 3, 2025 (marked as solution)

RAM would be the main suspect. If you have multiple DIMMs, try using just one, then try with a different one. Note that the first check can still find errors even if the issue has been resolved, so only pay attention to the second and subsequent checks.
December 3, 2025 (author)

Quoting JorgeB: "RAM would be the main suspect. If you have multiple DIMMs, try using just one, then try with a different one. Note that the first check can still find errors even if the issue has been resolved, so take notice of the 2nd and subsequent ones only."

So with the new kit installed, I need to run two checks to be sure the error is resolved? I'm also going to test the RAM in another machine to rule out other contributing factors in my server.
December 3, 2025

Quoting weirdcrap: "So with the new kit in I need to run two checks to be sure the error is resolved?"

Yes. The first check can still find real errors left over from before the RAM was swapped, so only the following checks tell you whether the problem is gone.
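To see why the first check after a RAM swap can still report errors, here is a toy model of single parity, which in Unraid is a bitwise XOR across the data disks. This is a simplified sketch under stated assumptions, not Unraid's implementation: the function names are hypothetical, the "disks" are four-byte strings, and the RAM fault is simulated by flipping one bit of parity by hand. A correcting check (one that writes corrections to parity) reports the stale mismatch once and rewrites parity, so the second check is the first one that reflects the new hardware.

```python
from functools import reduce


def xor_parity(disks):
    """Byte-wise XOR across equal-length data blocks (single 'P' parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def parity_check(disks, parity, correct=True):
    """Count mismatched parity bytes; if correct=True, rewrite parity to match."""
    expected = xor_parity(disks)
    errors = sum(1 for e, p in zip(expected, parity) if e != p)
    if correct:
        parity[:] = expected  # a correcting check overwrites the bad parity
    return errors


disks = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])]
parity = bytearray(xor_parity(disks))
parity[2] ^= 0x04  # simulated single-bit flip from faulty RAM during a parity write

first = parity_check(disks, parity)   # finds and corrects the leftover mismatch
second = parity_check(disks, parity)  # clean, since nothing new was corrupted
print(first, second)  # 1 0
```

In this model nothing corrupts data after the swap, so the second check is always clean. On a real server with RAM that is still faulty, new mismatches would keep appearing on later checks, which is why the second and subsequent results are the ones that matter.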
December 7, 2025 (author)

Yeah, so despite my RAM continuing to pass multiple memtest runs, all of my parity errors went away with different RAM installed and two back-to-back parity checks. Now I've got to figure out how to convince Corsair to replace these sticks despite there being "nothing wrong with them".

EDIT: I was expecting some pushback from Corsair on the RMA, but they were surprisingly easygoing about it, even though I had no concrete proof the RAM was defective, just circumstantial evidence that the problem goes away with different RAM. I should have a new kit on the way in a few weeks while I limp along with a borrowed 16GB kit.

Edited December 10, 2025 by weirdcrap