John Detter Posted July 10, 2020 Hello, I have been having this issue for about a month now and I've tried everything I can think of to fix it. Here's my problem: I have a non-correcting parity check that runs every morning at 1 am and takes about 9 hours to complete. I get 1-2 parity sync errors once in a while, which I know is not good, but I've looked all over the forums for ways to figure out what the problem is, to no avail. However, yesterday a parity check came back with 200 sync errors, which is pretty bad, so here I am.
I also have not run any parity check with write corrections enabled. I know this is probably a bad assumption, but I've been assuming that the parity sync errors are "read errors" and the parity is actually correct, which could explain why the check returns no parity errors most days even with automatic correction turned off.
Things I've tried:
- Ran memtest for 24 hours (no errors)
- Disabled all plugins
- File system check on all disks (no issues)
- Extended SMART check on all disks (no issues)
Is there any way I can figure out which disk is having problems? I have no read/write errors on any of my disks, so I wouldn't even know which one to swap out. Is this something I shouldn't worry about? Thanks for reading; any advice/help would be very useful, as I'm pretty new to Unraid.
Here is my syslog: syslog.txt
Here is the result of my parity checks
Here is the current status of my disks (system uptime is 15 days, 10 hours):
JorgeB Posted July 10, 2020 Please run two consecutive checks without rebooting (so we can compare the errors) and post the complete diagnostics: Settings -> Diagnostics
John Detter Posted July 10, 2020 Author @johnnie.black There is a parity check running right now that will be finished soon, so I will grab the diagnostics from that and then do another parity check right after. These should be non-correcting checks, right? If these 2 consecutive checks return no sync errors, should I keep running checks until I get a run with errors? (The sync errors do not appear on every run.) Thanks for the quick reply.
JorgeB Posted July 10, 2020
19 minutes ago, John Detter said: "These should be non-correcting checks right?"
For now, yes.
20 minutes ago, John Detter said: "If these 2 consecutive tests return with no sync errors should I keep running until I get a run with errors?"
Yes, it would be good to compare whether the errors are in the same sectors or not.
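For anyone who wants to do the sector comparison without reading through two syslogs by eye, here is a minimal sketch. It assumes each sync error shows up in the log as a line containing `sector=<number>`; that pattern, the function names, and the sample lines are all made up for illustration, so check them against your own syslog before relying on it.

```python
import re

# ASSUMPTION: every parity mismatch is logged on a line containing
# "sector=<number>". Adjust this pattern to match the real syslog lines.
SECTOR_RE = re.compile(r"sector=(\d+)")


def error_sectors(log_lines):
    """Return the set of sector numbers found in the given log lines."""
    sectors = set()
    for line in log_lines:
        match = SECTOR_RE.search(line)
        if match:
            sectors.add(int(match.group(1)))
    return sectors


def compare_runs(first_run, second_run):
    """Split the reported sectors into shared and run-specific groups."""
    first = error_sectors(first_run)
    second = error_sectors(second_run)
    return {
        "both": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the attached diagnostics.
    run1 = ["check: mismatch, sector=1000", "check: mismatch, sector=2048"]
    run2 = ["check: mismatch, sector=2048", "check: mismatch, sector=4096"]
    print(compare_runs(run1, run2))
```

Sectors that appear in both runs point to a real mismatch on disk, while sectors that change from run to run suggest something intermittent such as RAM, as discussed further down the thread.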
John Detter Posted July 10, 2020 Author (posting parity check result 1/2) The day got away from me a bit, but here are the diagnostics from the run that just finished: unraid-server-diagnostics-20200710-1418.zip There were 2000+ errors in my latest run. I'm a bit surprised that there were so many sync errors; I've never had this many before. I'm running another non-correcting check now and will try to post the result later today when it finishes.
John Detter Posted July 11, 2020 Author Here is the result of the second run: unraid-server-diagnostics-20200711-0629.zip Any help would be much appreciated; I'm really out of things to try.
JorgeB Posted July 11, 2020
On 7/10/2020 at 12:27 PM, John Detter said: "Ran memtest for 24 hours (no errors)"
The first thing would be to run this again. I assume you ran it before, when the checks were detecting 1 or 2 errors. Now it appears to be worse, and the errors are on different sectors, which is consistent with a RAM issue. With more errors, it might be easier for memtest to detect any RAM problem.
John Detter Posted July 11, 2020 Author Alright, I will start that now. I will let it run for 24 hours and then post the results tomorrow.
John Detter Posted July 12, 2020 Author I ran memtest86 for over 24 hours with no issues. It's still running now and I will probably keep it running until tomorrow. The only other thing I could think of is maybe swapping out the SATA cables? Also, this is non-ECC RAM, which I know is not ideal, but this is the second time I've run this test for over 24 hours and it has never found an issue. I agree that a RAM issue would make a lot of sense, but it seems like memtest can't reproduce it. Is there anything else I should try before buying new RAM?
Spies Posted July 12, 2020 Remove half the sticks and see if the error count changes. Maybe try with just the Crucial memory.
John Detter Posted July 12, 2020 Author @Spies Good point, I will give that a try.
JorgeB Posted July 13, 2020 If it's not RAM, the board/controller would be my next suspect, but that's not so easy to test unless you have another board/CPU combo you could test with.
John Detter Posted July 13, 2020 Author I actually do have another motherboard I can plug everything into. Do I just move all of the RAM + disks + USB drive over to the new board and then run a parity check?
JorgeB Posted July 13, 2020
5 minutes ago, John Detter said: "I can plug everything into, do I just move over all of the ram + disks + USB drive to the new board and then run a parity sync?"
Yep.
John Detter Posted July 13, 2020 Author Ok, I will give that a try and post the results later today.
John Detter Posted July 13, 2020 Author Small update: I did still get errors after removing half of the RAM. I've transplanted the system onto a different mobo/CPU combo that I had, and so far it actually has no sync errors. I will update again later today/tomorrow with the results of the parity check that's running right now. However, if I do get a small number of parity sync errors, it is possible that those are legitimate errors, right? I suppose I need to do 2 consecutive parity checks and see if the errors are consistent (either no errors, or the same number of errors on the same sectors)?
JorgeB Posted July 14, 2020
9 hours ago, John Detter said: "I suppose I need to do 2 consecutive parity checks and see if the errors are consistent?"
Yes, or run a correcting check; any check after that should always find 0 errors.
John Detter Posted July 15, 2020 Author The first parity check resulted in 0 sync errors. Another parity check is in progress right now, but it looks like there are 3 errors so far, so it seems the problem hasn't gone away. I guess it has to be the RAM? The only things I moved to the new motherboard are the disks + RAM + PSU. I'm just going to run a parity check with only the Kingston sticks in to see if I can get consistent results; I believe I was still getting sync errors with the Crucial sticks.
JorgeB Posted July 15, 2020
6 hours ago, John Detter said: "I guess it has to be the ram?"
It would still be my first guess. It could also be a disk, but that's not very likely, and hopefully it's RAM since that would be easier to find/fix.
John Detter Posted July 16, 2020 Author After taking out the Crucial sticks, I keep getting 1 sync error on the same sector, so now I'm running a correcting check, and then I'll run another parity check to see if I get 0 sync errors. It does seem to be one of the Crucial sticks causing the issue. I'm about 90% sure the problem is solved now; I just need to buy some more RAM. Thank you everyone for the help! :)