dak1220 Posted November 2, 2023 (edited)
I know this has been discussed a lot, but I built my first Unraid box about a month ago and I am already getting parity check errors. I bought five new 18TB Exos hard drives. One arrived with SMART errors already, so I returned it and replaced it with a new one. Parity was rebuilt after I inserted the new drive. I then changed my settings to run a parity check at the start of each month. The first one finished last night with 528 errors. From other posts I have seen, that doesn't seem like a lot, but it still concerns me because this array holds precious data. What can I do to fix these errors? Should I run a correcting check and see if more errors appear on Dec. 1?
nas-diagnostics-20231102-1155.zip
Edited November 2, 2023 by dak1220
trurl Posted November 2, 2023
Since you have ECC RAM, go to memtest86.com, get the memtest from there, and test your RAM.
dak1220 Posted November 2, 2023
33 minutes ago, trurl said: Go to memtest86.com and get that memtest since you have ECC then test your RAM.
I have this running now. From what I have seen, with 64GB this may take some time. I will come back when I have results. Thank you for your help.
dak1220 Posted November 3, 2023
20 hours ago, trurl said: Go to memtest86.com and get that memtest since you have ECC then test your RAM.
So the memtest completed with 0 errors. Is it possible the original failing drive caused some of this? If I recall correctly, one of its issues was some unreadable sectors. Maybe bad data got written to parity or something? If there is anything else to try, please let me know. I would like to get these errors to 0.
JorgeB Posted November 3, 2023
The log shows constant memory errors being corrected. Try with just one stick of RAM; if it's the same, try the other one.
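For anyone following along, here is a minimal Python sketch for counting corrected-memory-error lines in a saved syslog (for example, one pulled from a diagnostics zip). The match patterns are assumptions: on many Linux systems the EDAC driver logs corrected errors as "EDAC MCn: ... CE ..." lines, and AMD systems may also log "[Hardware Error]: Corrected error", but the exact wording varies by kernel and platform. The helper name count_corrected_errors and the file path are hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed log patterns for corrected (CE) memory errors. Exact wording varies
# by kernel version and platform, so treat these as a starting point only.
CORRECTED_PATTERNS = [
    re.compile(r"EDAC MC\d+: \d+ CE"),                   # EDAC corrected-error line
    re.compile(r"\[Hardware Error\]: Corrected error"),  # AMD MCE decoder wording
]

def count_corrected_errors(lines):
    """Count lines that look like corrected memory errors (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        for pattern in CORRECTED_PATTERNS:
            if pattern.search(line):
                counts[pattern.pattern] += 1
                break  # count each log line at most once
    return counts

def main(argv):
    if len(argv) < 2:
        print("usage: count_ce.py SYSLOG_FILE")  # hypothetical script name
        return
    with open(argv[1], errors="replace") as f:
        result = count_corrected_errors(f)
    print(f"corrected-error lines: {sum(result.values())}")
    for pattern, n in result.items():
        print(f"  {n:6d}  {pattern}")

if __name__ == "__main__":
    main(sys.argv)
```

Running it once per RAM stick configuration gives a rough before/after comparison, which is the kind of check suggested above.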
dak1220 Posted November 3, 2023
8 minutes ago, JorgeB said: Log shows constant memory errors being corrected, try with just one stick of RAM, if the same try the other one.
Try the memtest again? Or do you mean the parity check?
JorgeB Posted November 3, 2023
Just use the server normally. You can run a parity check, and check whether those errors are still being logged.
dak1220 Posted November 6, 2023
On 11/3/2023 at 10:59 AM, JorgeB said: Just use the server normally, you can run a parity check, and check if those errors are still being logged
It looks like both sticks of memory are producing that ECC memory error in the logs. I have another set of RAM that isn't ECC. I assume I should try it? Can memory errors cause the parity check to show errors?
JorgeB Posted November 6, 2023
If the errors are corrected, they should not cause sync issues, but constant corrected errors are not normal. It's also unclear to me how well ECC RAM is supported with Ryzen on any specific board. Try the other RAM.
dak1220 Posted November 7, 2023
23 hours ago, JorgeB said: If the errors are corrected they should not cause sync issues, but that's not normal, unclear to me as well how well ECC RAM is supported with Ryzen and any specific board, try the other RAM.
So I installed the other RAM. The memory errors seem to be gone from the logs now, but I still got the same 528 errors on a parity check I ran overnight. I have attached new diagnostics.
nas-diagnostics-20231107-0825.zip
JorgeB Posted November 7, 2023 (marked as Solution)
Run a correcting check, then a non-correcting one without rebooting. If the second one finds new errors, post new diagnostics.
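To illustrate why a correcting check followed by a non-correcting check is a useful test, here is a toy Python model of single XOR parity, which is how Unraid's first parity disk is commonly described as working. It is a simplification under stated assumptions: real disks, block sizes, and the array driver are not modeled, and all names (compute_parity, parity_check) are hypothetical. A check recomputes the XOR of the data disks at each block and compares it with the stored parity; a correcting check also overwrites mismatched parity blocks (it trusts the data disks), so an immediate second non-correcting pass should report 0 unless something keeps corrupting data.

```python
from functools import reduce
from operator import xor

def compute_parity(data_disks, block):
    """XOR of every data disk's byte at one block index (toy single parity)."""
    return reduce(xor, (disk[block] for disk in data_disks), 0)

def parity_check(data_disks, parity, correct=False):
    """Return the number of sync errors; if correct=True, rewrite bad parity blocks."""
    errors = 0
    for block in range(len(parity)):
        expected = compute_parity(data_disks, block)
        if parity[block] != expected:
            errors += 1
            if correct:
                parity[block] = expected  # correcting check: trust data, fix parity
    return errors

if __name__ == "__main__":
    # Three toy "disks" of 8 one-byte blocks each (hypothetical data).
    disks = [bytearray(range(i, i + 8)) for i in (0, 10, 20)]
    parity = bytearray(compute_parity(disks, b) for b in range(8))
    parity[2] ^= 0xFF  # simulate parity drifting out of sync at two blocks
    parity[5] ^= 0x01
    print("non-correcting check: ", parity_check(disks, parity))                # 2
    print("correcting check:     ", parity_check(disks, parity, correct=True))  # 2
    print("second non-correcting:", parity_check(disks, parity))                # 0
```

In this model, a stable error count across repeated non-correcting checks (like the 528 seen twice in this thread) points to parity that is simply out of sync, whereas a count that changes between runs would suggest an ongoing hardware problem.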
dak1220 Posted November 7, 2023
17 minutes ago, JorgeB said: Run a correcting check, then a non correcting one without rebooting, if the 2nd one finds new errors post new diags.
I am running a correcting check now. I assume there is a small risk of files being corrupted? My parity checks take just over 24 hours, so two in a row will take some time. I will report back after both checks have run. Thank you for your help.
JorgeB Posted November 7, 2023
24 minutes ago, dak1220 said: I assume there might be a small risk of files being corrupted?
There is a small chance some files could already be corrupt, but the previous check finding the same errors as before is a good sign, and most likely parity is just out of sync.
dak1220 Posted November 9, 2023
On 11/7/2023 at 9:42 AM, JorgeB said: There is a small chance some files could be already corrupt, but the previous sync finding the same errors as before is a good sign, and most likely parity is just out of sync.
So the correcting check finished and corrected the 528 errors I have been having. I then ran a second, non-correcting check; it finished a few minutes ago and found 0 errors. While the change in RAM didn't seem to make a difference, I assume I should stop using the RAM that was logging errors? Are there any further steps I should take to make sure this is resolved?
JorgeB Posted November 9, 2023
24 minutes ago, dak1220 said: Are there any further steps I should take to make sure this is resolved?
I would wait for the next scheduled check, and if there are still no errors, consider it resolved.
dak1220 Posted November 9, 2023
2 minutes ago, JorgeB said: I would wait for the next scheduled check, and if still no errors consider it resolved.
My next scheduled check is on the 1st of next month. If it finds no errors, I will consider this fully resolved. For now, I will mark your answer as the solution and reopen something later if necessary. Thank you for your help.