July 10, 2020
Hello, I have been having this issue for about a month now and I've tried everything I can think of to fix it. Here's my problem: I have a non-correcting parity check that runs every morning at 1am and takes about 9 hours to complete. Once in a while I get 1-2 parity sync errors, which I know is not good, but I've looked all over the forums for ways to figure out what the problem is, to no avail. Yesterday, however, a parity check came back with 200 sync errors, which is pretty bad, so here I am. I also have not run any parity sync with write corrections enabled. I know this is probably a bad assumption, but I've been assuming that the parity sync errors are "read errors" and the parity is actually correct, which could explain why the check returns no parity errors most days even with automatic correction turned off.
Things I've tried:
- Ran memtest for 24 hours (no errors)
- Disabled all plugins
- File system check on all disks (no issues)
- Extended SMART check on all disks (no issues)
Is there any way I can figure out which disk is having problems? I have no read/write errors on any of my disks, so I wouldn't even know which one to swap out. Is this something I shouldn't worry about? Thanks for reading, any advice/help would be very useful as I'm pretty new to Unraid.
Here is my syslog: syslog.txt
Here is the result of my parity checks: [screenshot]
Here is the current status of my disks (system uptime is 15 days, 10 hours): [screenshot]
July 10, 2020 Community Expert
Please run two consecutive checks without rebooting (so we can compare the errors) and post the complete diagnostics: Settings -> Diagnostics
July 10, 2020 Author
@johnnie.black There is a parity check running right now that will finish soon, so I will grab the diagnostics from that and then start another parity check right after. These should be non-correcting checks, right? If these 2 consecutive checks come back with no sync errors, should I keep running checks until I get one with errors? (The sync errors do not appear on every run.) Thanks for the quick reply.
July 10, 2020 Community Expert
19 minutes ago, John Detter said: "These should be non-correcting checks right?"
For now, yes.
20 minutes ago, John Detter said: "If these 2 consecutive tests return with no sync errors should I keep running until I get a run with errors?"
Yes, it would be good to compare whether the errors are in the same sectors or not.
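One way to do the sector comparison suggested above is to pull the sector numbers out of the syslog for each run and intersect them. The following is only a minimal sketch: it assumes each sync error is logged on a line containing the word "incorrect" and a `sector=<number>` field, which may not match the exact wording in your syslog, and the function names `error_sectors` and `compare_runs` are hypothetical helpers, not part of Unraid.

```python
import re

# Assumption: every sync error appears on one log line that contains the word
# "incorrect" followed by "sector=<number>". Adjust the pattern to the lines
# actually present in your syslog.
SECTOR_RE = re.compile(r"incorrect.*?sector=(\d+)")


def error_sectors(log_text):
    """Return the set of sector numbers reported as incorrect in one run's log."""
    return {int(match.group(1)) for match in SECTOR_RE.finditer(log_text)}


def compare_runs(log_a, log_b):
    """Split the error sectors of two runs into shared and run-specific sets."""
    a = error_sectors(log_a)
    b = error_sectors(log_b)
    return {"both": a & b, "only_first": a - b, "only_second": b - a}
```

Sectors that show up in both runs suggest parity that is genuinely out of sync at those locations, while sectors that move around between runs point more toward something corrupting data in transit.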
July 10, 2020 Author
(posting parity check result 1/2) The day got away from me a bit, but here are the diagnostics from the run that just finished: unraid-server-diagnostics-20200710-1418.zip
There were 2000+ errors in my latest run: [screenshot]
I'm a bit surprised that there were so many sync errors; I've never had this many before. I have another non-correcting check running right now and will try to post the result later today when it finishes.
July 11, 2020 Author
Here is the result of the second run: unraid-server-diagnostics-20200711-0629.zip
Any help would be much appreciated, I'm really out of things to try.
July 11, 2020 Community Expert
On 7/10/2020 at 12:27 PM, John Detter said: "Ran memtest for 24 hours (no errors)"
The first thing would be to run this again. I assume you ran it before, when the checks were detecting 1 or 2 errors. Now it appears to be worse, and the errors are on different sectors, which is consistent with a RAM issue. With more errors it might be easier to detect any RAM problem.
July 11, 2020 Author
Alright, I will start that now. I will let it run for 24 hours and then post the results tomorrow.
July 12, 2020 Author
I ran memtest86 for over 24 hours with no issues. It's still running now and I will probably keep it running until tomorrow. The only thing left I can think of is maybe swapping out the SATA cables? Also, this is non-ECC RAM, which I know is not ideal, but this is the second time I've run this test for over 24 hours and I've never had an error. I agree that a RAM issue would make a lot of sense, but it seems like memtest can't reproduce it. Is there anything else I should try before buying new RAM?
July 12, 2020
Remove half the sticks and see if the error count changes. Maybe try with just the Crucial memory.
Edited July 12, 2020 by Spies
July 13, 2020 Community Expert
If it's not RAM, the board/controller would be my next suspect, but that's not so easy to test unless you have another board/CPU combo you could test with.
July 13, 2020 Author
I actually do have another motherboard I can plug everything into. Do I just move all of the RAM + disks + USB drive over to the new board and then run a parity sync?
July 13, 2020 Community Expert
5 minutes ago, John Detter said: "I can plug everything into, do I just move over all of the ram + disks + USB drive to the new board and then run a parity sync?"
Yep.
July 13, 2020 Author
Small update: I did still get errors after removing half of the RAM. I've transplanted the system onto a different mobo/CPU combo that I had, and so far it has no sync errors. I will update again later today/tomorrow with the results of the parity check that's running right now. However, if I do get a small number of parity sync errors, it is possible that those are legitimate errors, right? I suppose I need to do 2 consecutive parity checks and see if the errors are consistent (either no errors, or the same number of errors on the same sectors)?
July 14, 2020 Community Expert
9 hours ago, John Detter said: "I suppose I need to do 2 consecutive parity checks and see if the errors are consistent?"
Yes, or run a correcting check; any check after that should always find 0 errors.
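To illustrate why a check after a correcting check should find 0 errors, here is a toy model. It assumes single parity computed as the XOR of the data disks at each position and uses short lists of integers in place of real disks; `compute_parity` and `check` are hypothetical names for illustration, not Unraid's actual code.

```python
from functools import reduce


def compute_parity(disks):
    """XOR the values at each position across all data disks."""
    return [reduce(lambda x, y: x ^ y, column) for column in zip(*disks)]


def check(disks, parity, correct=False):
    """Count positions where the stored parity differs from the computed one.

    With correct=True, the stored parity is overwritten at those positions,
    which mimics a correcting check.
    """
    expected = compute_parity(disks)
    errors = 0
    for i, (stored, exp) in enumerate(zip(parity, expected)):
        if stored != exp:
            errors += 1
            if correct:
                parity[i] = exp
    return errors


if __name__ == "__main__":
    disks = [[1, 2, 3], [4, 5, 6]]
    parity = compute_parity(disks)
    parity[1] ^= 0xFF                           # simulate one stale parity value
    print(check(disks, parity))                 # non-correcting: reports the error
    print(check(disks, parity, correct=True))   # correcting: reports and fixes it
    print(check(disks, parity))                 # follow-up check
```

In this model, once the stored parity has been rewritten, a follow-up check can only report errors if the values being read change between runs, which is why errors that keep appearing after a correcting check point at hardware rather than at stale parity.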
July 15, 2020 Author
The first parity sync resulted in 0 sync errors. Another parity sync is in progress right now, but it looks like there are 3 errors so far, so it seems like the problem hasn't gone away. I guess it has to be the RAM? The only things I moved to the new motherboard are the disks + RAM + PSU. I'm going to run a parity sync with just the Kingston sticks in to see if I can get consistent results; I believe I was still getting sync errors with the Crucial sticks.
July 15, 2020 Community Expert
6 hours ago, John Detter said: "I guess it has to be the ram?"
It would still be my first guess. It could also be a disk, but that's not very likely, and hopefully it's RAM since that would be easier to find and fix.
July 16, 2020 Author
After taking out the Crucial sticks, I keep getting 1 sync error on the same sector, so now I'm running a correcting check and then I'll run another parity check to see if I get 0 sync errors. It does seem to be one of the Crucial sticks causing the issue. I'm about 90% sure the problem is solved now, I just need to buy some more RAM. Thank you everyone for the help! :)