Parity Issue not correcting

rxauin · October 7, 2017

So I have this problem with constant parity errors that started up recently as seen in the screenshot.

The 'write corrections to parity' option is always selected.

Between the 10/3 and 10/5 dates, I rebooted the server and ran an extended SMART test on all the drives; no issues were reported.

How can I find out what's wrong?

I haven't added or removed any files on the array since the start of this issue. In addition, none of my VMs or Docker containers write to this array, so it has been quiet apart from manual SMART tests and parity checks. Prior to 10/1 I had the disks configured to spin down when unused; I turned that off on 10/3 to see if there was any difference.

unRAID Version 6.3.5 2017-05-26

Frank1940 · October 7, 2017

Start by posting up your diagnostics file. 'Tools' >>> 'Diagnostics'

rxauin · October 7, 2017

Diagnostic files

mammon-diagnostics-20171007-0857.zip

JorgeB · October 7, 2017

Nothing really jumps out. RAM is ECC, so maybe something is wrong with the board or the onboard controller. Do you have an HBA you could test with?

rxauin · October 7, 2017

I don't, I'm just using the onboard connections.

It shouldn't be ECC memory; the memory (https://www.amazon.com/gp/product/B019FRCQAK/) isn't ECC.

JorgeB · October 7, 2017

39 minutes ago, rxauin said:

It shouldn't be ECC memory

Are you sure? According to the board specs it requires ECC RAM. If it's not ECC, then it would be the prime suspect.

rxauin · October 8, 2017

I'm certain the memory I have is not ECC. I've matched the model number to Crucial's website (http://www.crucial.com/usa/en/ct2k16g4dfd824a) and it clearly states it's NON-ECC.

This system has been running for 4 months. If I had the wrong memory, wouldn't I have noticed problems earlier?

JorgeB · October 8, 2017

Then the manual is wrong, since it says only ECC memory is supported. That's normal for a server board, and it's also kind of a waste not to be using it: you already have a server board and a Xeon, and there's no significant price difference between non-ECC and unbuffered ECC.

3 hours ago, rxauin said:

If I had the wrong memory, wouldn't I have noticed problems earlier?

If it's the RAM, it's not because it's the wrong type but because it's faulty. Try with just 1 DIMM at a time. If you still get errors with each of them, then it's probably not the RAM, since it's unlikely that both DIMMs are bad. Don't forget you need at least two parity checks with each DIMM: it's normal for the 1st one to still find sync errors, but there should be none on the 2nd one if it's fixed.
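For anyone who wants to keep track of the results, the procedure above can be written down as a small decision helper. This is only a rough sketch in Python, not anything unRAID provides: the function name `diagnose`, the DIMM labels, and the error counts are all made up for illustration, and the sync-error counts would have to be copied by hand from the parity check history.

```python
from typing import Dict, List


def diagnose(checks: Dict[str, List[int]]) -> str:
    """Interpret parity check results from one-DIMM-at-a-time testing.

    `checks` maps a DIMM label (hypothetical names) to the sync-error
    counts of successive parity checks run with only that DIMM installed.
    """
    if len(checks) < 2:
        return "test each DIMM on its own before drawing conclusions"

    clean = {}
    for dimm, counts in checks.items():
        if len(counts) < 2:
            return f"need at least two parity checks with {dimm}"
        # The 1st check may still find sync errors left over from earlier,
        # so only the 2nd and later checks are used to judge the DIMM.
        clean[dimm] = all(count == 0 for count in counts[1:])

    bad = [dimm for dimm, ok in clean.items() if not ok]
    if not bad:
        # Assumption: the thread does not cover this case, so call it open.
        return "no repeat errors with any single DIMM: inconclusive"
    if len(bad) == len(clean):
        # Two bad DIMMs at once is unlikely, so look elsewhere.
        return "errors with every DIMM: probably not the RAM"
    return "suspect DIMM(s): " + ", ".join(bad)


if __name__ == "__main__":
    # Made-up numbers: DIMM_A comes up clean on its 2nd check, DIMM_B does not.
    print(diagnose({"DIMM_A": [12, 0], "DIMM_B": [7, 3]}))
```

With the made-up numbers in the example, only the second DIMM keeps producing sync errors on its 2nd check, so it is the one flagged as suspect.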

rxauin · October 8, 2017

A scheduled parity check kicked off last night and reported 0 errors.

I'm going to go ahead and put some ECC memory in it anyway and see where that takes me.

rxauin · October 12, 2017

To put some closure on this thread.

As I was replacing the memory, I noticed one of the clips wasn't secure. It was probably loose and got jostled while I was moving the server to the basement rack. Either way, I replaced the memory with ECC and the system seems a lot better. I used to get random segmentation faults from PHP in my logs, and that hasn't happened since. I also haven't seen any more parity errors.

Edited October 12, 2017 by rxauin
