grapefruitevening Posted April 16, 2023 (edited)

I'm quite new to unRAID, so any help you can offer would be greatly appreciated. Logs are attached.

My first quarterly parity check detected three errors. Things had been running smoothly until that point, with mostly new hardware. A faulty UPS caused some unclean shutdowns, so I thought that might have been the cause. I ran a correcting parity check, which picked up five errors. Next I ran another non-correcting check, which picked up four errors. The sectors seem to be the same?

The hard drives are connected directly to the motherboard. So far I have:
- run extended SMART tests, and all disks passed
- started memtest86+, which is clear so far after a few hours
- switched the SATA cables to new ones

Assuming the memtest is clear overnight, what should I do next?

Edited April 28, 2023 by grapefruitevening
JorgeB Posted April 16, 2023 (Solution)

Apr 12 06:10:23 tower kernel: md: recovery thread: P incorrect, sector=11515649200
Apr 12 06:24:24 tower kernel: md: recovery thread: P incorrect, sector=11797234176
Apr 13 07:55:05 tower kernel: md: recovery thread: P incorrect, sector=19001710760
Apr 13 21:12:23 tower kernel: md: recovery thread: P corrected, sector=9239094240
Apr 13 23:04:06 tower kernel: md: recovery thread: P corrected, sector=11515649200
Apr 14 00:26:27 tower kernel: md: recovery thread: P corrected, sector=13144112592
Apr 14 01:31:14 tower kernel: md: recovery thread: P corrected, sector=14377903704
Apr 14 01:38:38 tower kernel: md: recovery thread: P corrected, sector=14514929456
Apr 14 22:04:40 tower kernel: md: recovery thread: P incorrect, sector=9239094240
Apr 15 01:18:45 tower kernel: md: recovery thread: P incorrect, sector=13144112592
Apr 15 02:58:28 tower kernel: md: recovery thread: P incorrect, sector=14377903704
Apr 15 04:09:49 tower kernel: md: recovery thread: P incorrect, sector=14514929456

Not all sectors are the same. Some might have been wrongly corrected, which is why they were detected again. This suggests a hardware issue, most commonly RAM related.
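For anyone who wants to compare the sectors without eyeballing the log, here is a minimal Python sketch. It assumes a new check starts whenever the sector number drops (each check appears to scan in ascending order, which matches the timestamps above); that splitting rule and the names `split_into_runs`, `not_reproduced`, `reappeared` and `stayed_fixed` are my own and not anything unRAID provides.

```python
import re

# The twelve syslog lines quoted in the post above, verbatim.
LOG = """\
Apr 12 06:10:23 tower kernel: md: recovery thread: P incorrect, sector=11515649200
Apr 12 06:24:24 tower kernel: md: recovery thread: P incorrect, sector=11797234176
Apr 13 07:55:05 tower kernel: md: recovery thread: P incorrect, sector=19001710760
Apr 13 21:12:23 tower kernel: md: recovery thread: P corrected, sector=9239094240
Apr 13 23:04:06 tower kernel: md: recovery thread: P corrected, sector=11515649200
Apr 14 00:26:27 tower kernel: md: recovery thread: P corrected, sector=13144112592
Apr 14 01:31:14 tower kernel: md: recovery thread: P corrected, sector=14377903704
Apr 14 01:38:38 tower kernel: md: recovery thread: P corrected, sector=14514929456
Apr 14 22:04:40 tower kernel: md: recovery thread: P incorrect, sector=9239094240
Apr 15 01:18:45 tower kernel: md: recovery thread: P incorrect, sector=13144112592
Apr 15 02:58:28 tower kernel: md: recovery thread: P incorrect, sector=14377903704
Apr 15 04:09:49 tower kernel: md: recovery thread: P incorrect, sector=14514929456
"""

PATTERN = re.compile(r"P (incorrect|corrected), sector=(\d+)")


def split_into_runs(log_text):
    """Group parity errors into checks.

    Assumption: a check reports sectors in ascending order, so a drop in
    the sector number marks the start of the next check.
    """
    runs = []
    last_sector = None
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if not match:
            continue
        sector = int(match.group(2))
        if last_sector is None or sector < last_sector:
            runs.append({"mode": match.group(1), "sectors": set()})
        runs[-1]["sectors"].add(sector)
        last_sector = sector
    return runs


runs = split_into_runs(LOG)
first, correcting, followup = (run["sectors"] for run in runs)

# Reported by the first check but not by the correcting check that followed.
not_reproduced = first - correcting
# "Corrected" by the second check, yet reported again by the third.
reappeared = correcting & followup
# Corrected by the second check and not reported afterwards.
stayed_fixed = correcting - followup

print("not reproduced:", sorted(not_reproduced))
print("reappeared after correction:", sorted(reappeared))
print("stayed fixed:", sorted(stayed_fixed))
```

Sectors that show up in one check and vanish in the next without a correction in between, or that come back right after being corrected, are the inconsistent ones that point towards hardware.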
grapefruitevening Posted April 16, 2023 (Author)

9 hours ago, JorgeB said:
Not all sectors are the same. Some might have been wrongly corrected, which is why they were detected again. This suggests a hardware issue, most commonly RAM related.

Is this the sort of error memtest should pick up? So far it hasn't picked up anything, but I'll leave it running for 24 hours. Is there something else I can do to nail down the cause?

Is the best approach from here to replace the RAM, run a correcting check, and then run a non-correcting check to make sure the problem is solved?
JorgeB Posted April 17, 2023

9 hours ago, grapefruitevening said:
Is this the sort of error memtest should pick up?

Usually yes, but it's not a guarantee. If it doesn't, remove one of the RAM sticks and run a couple of checks. If the errors are not consistent, try the other one.
grapefruitevening Posted April 17, 2023 (Author)

20 hours ago, JorgeB said:
Apr 14 22:04:40 tower kernel: md: recovery thread: P incorrect, sector=9239094240
Apr 15 01:18:45 tower kernel: md: recovery thread: P incorrect, sector=13144112592
Apr 15 02:58:28 tower kernel: md: recovery thread: P incorrect, sector=14377903704
Apr 15 04:09:49 tower kernel: md: recovery thread: P incorrect, sector=14514929456

1 hour ago, JorgeB said:
Usually yes, but it's not a guarantee. If it doesn't, remove one of the RAM sticks and run a couple of checks. If the errors are not consistent, try the other one.

I ran memtest for more than 24 hours with no errors. I've swapped out the RAM for a spare set I had and am running a new check. Are we hoping to see the same four sectors as in the last check?
JorgeB Posted April 17, 2023

The first check will likely find some errors. If it's a correcting check, the second check should find 0. If it's a non-correcting check, the second check should find the same ones.
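That rule of thumb can be written down as a small Python sketch. The function name `interpret_followup` and its return strings are hypothetical, and the sector numbers in the usage lines are simply the four from the last non-correcting check quoted earlier in the thread.

```python
def interpret_followup(first_was_correcting, first_sectors, second_sectors):
    """Compare two consecutive parity checks run on the same hardware.

    After a correcting check, the next check should report nothing.
    After a non-correcting check, the next check should report exactly
    the same sectors. Anything else means the results are not repeatable.
    """
    first, second = set(first_sectors), set(second_sectors)
    if first_was_correcting:
        return "consistent" if not second else "inconsistent"
    return "consistent" if first == second else "inconsistent"


# The four sectors from the last non-correcting check in this thread.
last_check = [9239094240, 13144112592, 14377903704, 14514929456]

# Non-correcting check followed by a check that reports the same sectors.
print(interpret_followup(False, last_check, last_check))
# Correcting check followed by a clean check.
print(interpret_followup(True, last_check, []))
# Correcting check followed by a check that still reports a sector.
print(interpret_followup(True, last_check, last_check[:1]))
```

An "inconsistent" result only says the two checks disagree. It does not say which component is at fault, so the stick-by-stick RAM swap suggested earlier is still the way to narrow it down.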
grapefruitevening Posted April 20, 2023 (Author)

On 4/17/2023 at 9:57 PM, JorgeB said:
The first check will likely find some errors. If it's a correcting check, the second check should find 0. If it's a non-correcting check, the second check should find the same ones.

1st non-correcting check detected the same 4 errors.
2nd correcting check corrected the same 4 errors.
3rd non-correcting check detected 0 errors.

So it seems like the RAM was the issue. I'll run another check in a week or two to confirm. Thank you very much for your help tracking down the problem!