rxauin Posted October 7, 2017
So I have a problem with constant parity errors that started recently, as seen in the screenshot. 'Write corrections to parity' is always selected. Between 10/3 and 10/5 I rebooted the server and ran an extended SMART test on all the drives; no issues were reported. How can I find out what's wrong? I haven't added or removed any files on the array since this issue started. In addition, none of my VMs or Docker containers write to this array, so it has been idle apart from manual SMART tests and parity checks. Before 10/1 I had the disks configured to spin down when unused; I turned that off on 10/3 to see if it made any difference.
unRAID Version 6.3.5 2017-05-26
Frank1940 Posted October 7, 2017
Start by posting your diagnostics file: 'Tools' >>> 'Diagnostics'.
rxauin Posted October 7, 2017 Author
Diagnostic files: mammon-diagnostics-20171007-0857.zip
JorgeB Posted October 7, 2017
Nothing really jumps out. The RAM is ECC, so maybe something is wrong with the board or the onboard controller. Do you have an HBA you could test with?
rxauin Posted October 7, 2017 Author
I don't, I'm just using the onboard connections. It shouldn't be ECC memory; the memory (https://www.amazon.com/gp/product/B019FRCQAK/) isn't ECC.
JorgeB Posted October 7, 2017
39 minutes ago, rxauin said: "It shouldn't be ECC memory"
Are you sure? According to the board specs it requires ECC RAM. If the RAM isn't ECC, then it would be the prime suspect.
rxauin Posted October 8, 2017 Author
I'm certain the memory I have is not ECC. I've matched the model number on Crucial's website (http://www.crucial.com/usa/en/ct2k16g4dfd824a) and it clearly states it's non-ECC. This system has been running for 4 months; if I had the wrong memory, wouldn't I have noticed problems earlier?
JorgeB Posted October 8, 2017
Then the manual is wrong, since it says only ECC memory is supported. That's normal for a server board, and it's also kind of a waste not to use ECC: you already have a server board and a Xeon, and there's no significant price difference between non-ECC and unbuffered ECC.
3 hours ago, rxauin said: "if I had the wrong memory, wouldn't I have noticed problems earlier?"
If it's the RAM, it's not because it's the wrong type but because it's faulty. Try with just one DIMM at a time. If you still get errors with either one, then it's probably not the RAM, since it's unlikely that both DIMMs are bad. Don't forget you need at least two parity checks with each DIMM: it's normal for the first one to still find sync errors, but there should be none on the second one if the problem is fixed.
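The two-pass advice above follows from how single parity works. Below is a minimal sketch, assuming unRAID's single parity disk holds the block-by-block XOR of the data disks, and using a hypothetical fault model in which bad RAM flips one bit in a computed parity block. A correcting check run with bad RAM "fixes" a good parity block into a wrong one, so the first check after the RAM is fixed reports that block as a sync error and rewrites it, and only the second check comes back clean. The disk contents and the names `compute_parity`, `parity_check` and `flip_block` are illustrative assumptions, not unRAID internals.

```python
from functools import reduce
from operator import xor

# Illustrative model only: each "disk" is a short list of byte-sized blocks.
# Assumption: single parity = XOR of the data disks, block by block.


def compute_parity(disks, flip_block=None):
    """Return the XOR of all data disks for each block position.

    flip_block (hypothetical fault model) flips the lowest bit of one
    computed block, standing in for a bad DIMM corrupting the result.
    """
    parity = [reduce(xor, blocks) for blocks in zip(*disks)]
    if flip_block is not None:
        parity[flip_block] ^= 0x01
    return parity


def parity_check(disks, parity, correct=True, flip_block=None):
    """Count blocks whose stored parity differs from the computed parity.

    With correct=True (like 'write corrections to parity'), each
    mismatching block is overwritten with the freshly computed value.
    """
    expected = compute_parity(disks, flip_block)
    sync_errors = 0
    for i, (stored, calc) in enumerate(zip(parity, expected)):
        if stored != calc:
            sync_errors += 1
            if correct:
                parity[i] = calc
    return sync_errors


disks = [
    [0x10, 0x22, 0x37, 0x4A],
    [0x05, 0x61, 0x0F, 0x33],
    [0x7E, 0x19, 0x2C, 0x58],
]
parity = compute_parity(disks)  # parity starts out valid

bad_ram = parity_check(disks, parity, flip_block=2)  # writes a wrong block 2
first_after_fix = parity_check(disks, parity)         # finds and repairs it
second_after_fix = parity_check(disks, parity)        # clean

print(bad_ram, first_after_fix, second_after_fix)  # 1 1 0
```

Real memory faults are random rather than pinned to one block, so actual error counts will vary; the point of the model is only that a clean result is expected on the second check after the hardware change, not the first.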
rxauin Posted October 8, 2017 Author
A scheduled parity check ran last night and reported 0 errors. I'm going to go ahead and put ECC memory in it anyway and see where that takes me.
rxauin Posted October 12, 2017 Author
To put some closure on this thread: as I was replacing the memory, I noticed one of the DIMM clips wasn't secure. The module was probably loose and got jostled while I was moving the server to the basement rack. Either way, I replaced the memory with ECC and the system seems a lot better. I used to get random segmentation faults in PHP in my logs, and that hasn't happened since. I also haven't seen any more parity errors.
Edited October 12, 2017 by rxauin