
[SOLVED] Parity Errors not being fixed



Greetings,

I ran a parity check last night and it finished with 20 errors. I started another check, assuming that the errors would have been fixed during the first check. However, I'm halfway through the second parity check and 20 errors have popped up again. I am running unRAID 4.7 and my syslog is attached. I wondered if it might have been caused by the automatic movement of 15 files from the cache drive to the array in the middle of the check.

Any ideas as to what might be going on?

Cheers

syslog.txt

Link to comment


File movement has absolutely nothing to do with parity errors.  Parity is maintained while the moves are made.
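To illustrate why a file move in the middle of a check does not disturb parity, here is a minimal Python sketch. It assumes single-disk XOR parity, which is how unRAID's parity is usually described. The function names and byte values are made up for illustration and are not unRAID's actual code.

```python
from functools import reduce
from operator import xor


def compute_parity(blocks):
    """XOR together the byte stored at the same offset on every data disk."""
    return reduce(xor, blocks, 0)


def write_block(disks, parity, disk_index, new_value):
    """Update one data block and return the matching new parity value."""
    old_value = disks[disk_index]
    disks[disk_index] = new_value
    # XOR-ing out the old value and XOR-ing in the new one keeps parity valid.
    return parity ^ old_value ^ new_value


# One byte at the same offset on three hypothetical data disks.
disks = [0x12, 0xA5, 0x3C]
parity = compute_parity(disks)

# Simulate the mover writing new data to the second disk.
parity = write_block(disks, parity, 1, 0xFF)

# Parity still matches a full recomputation from the data disks.
parity_still_valid = parity == compute_parity(disks)
print(parity_still_valid)
```

Because every write updates the data block and the parity block together, a check running at the same time should still see consistent parity, provided the hardware returns what was written.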

 

Are you sure you are not doing NOCORRECT parity checks, where it detects but does not correct the errors? (The web-management interface ALWAYS says it corrected them, even when in NOCORRECT mode. It was never updated to say something different.)

 

Sure looks like you are doing parity checks in CORRECT mode. I'd say you have either bad memory, or memory set up with the wrong voltage, timing, or clock speed, or a bad motherboard chipset, or a bad disk drive returning random data, or possibly a bad power supply.

 

Start with SMART reports on all your disks, since bad sectors could cause this class of error, then run a memory test, preferably overnight. If those are not the issue, you can go about isolating the cause. Unfortunately, when you have this class of error, it could be almost anything.

Link to comment

Hi Joe,

Thanks for your input. The only thing that has changed in the array recently is the addition of a second SATA card, an Adaptec 1430SA, that I installed yesterday. I put it in the second PCIe x16 slot. In the first slot I have a Supermicro AOC-SASLP-MV8 that has been in the server for the past year. I'll remove the Adaptec and run another check. Could that be the cause? Prior to that I have had few issues with the server since I built it last year.

Thanks again.

Link to comment


Looking closer, since the addresses of the parity errors are the same in both checks, it is possible the first check flipped some parity bits and the second flipped them back. A third parity check might be in order, but this time do it in NOCORRECT mode.
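Here is a rough Python sketch of that "flipped, then flipped back" scenario for a single address. It assumes XOR parity and a disk (or controller) that returned one wrong byte during the first check only. The function name and the values are hypothetical and only meant to show the mechanism.

```python
def correcting_check(data_reads, parity):
    """Check one stripe in CORRECT mode; return (error_found, parity_after)."""
    expected = 0
    for value in data_reads:
        expected ^= value
    if expected != parity:
        # CORRECT mode overwrites parity with whatever was just read.
        return True, expected
    return False, parity


true_data = [0x12, 0xA5, 0x3C]
good_parity = 0x12 ^ 0xA5 ^ 0x3C
parity = good_parity

# Check 1: the second disk returns 0xA4 instead of 0xA5 (one flipped bit).
# Checks 2 and 3: every disk returns the true data.
reads_per_check = [[0x12, 0xA4, 0x3C], true_data, true_data]

errors = []
for reads in reads_per_check:
    found, parity = correcting_check(reads, parity)
    errors.append(found)

print(errors)                 # error on checks 1 and 2, none on check 3
print(parity == good_parity)  # parity ends up back at the good value
```

In this sketch the first two checks both report an error at the same address even though the data on disk never changed, which is why a third check that does not write to parity is the safer way to see what is really going on.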

 

Joe L.

Link to comment

How do I run the check in nocorrect mode? I'm using the default interface and I don't see where to set that.

Thanks

In your version of unRAID, you must do it on the command line (or from the button in unMENU's array management page).

 

Log in and type

/root/mdcmd check NOCORRECT

 

 

Link to comment

Run one more regular correcting check to make sure.

That is BAD advice.

 

If there is a problem with one of your disks or other hardware, a correcting check would write BAD information to the parity disk. (If your system were working properly, it would do no harm at all.)

The NOCORRECT parity check exists because we unRAID users requested it to assist in our tests in exactly the situation you had: random parity errors.

 

The more correct advice would be to run several more NOCORRECT parity checks. If they all complete without error, then you are fine. If they detect anything, at least parity is left as correct as it is right now, and you can still use it in the event a disk fails.
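A small Python sketch of why that matters, again assuming XOR parity and a single bad read during the check. The helper names `check` and `rebuild` and the byte values are invented for the example and are not part of unRAID.

```python
def check(data_reads, parity, correct):
    """Check one stripe; only CORRECT mode is allowed to rewrite parity."""
    expected = 0
    for value in data_reads:
        expected ^= value
    mismatch = expected != parity
    if mismatch and correct:
        parity = expected
    return mismatch, parity


def rebuild(surviving_blocks, parity):
    """Reconstruct the missing disk's block from parity and the other disks."""
    value = parity
    for block in surviving_blocks:
        value ^= block
    return value


true_data = [0x12, 0xA5, 0x3C]
good_parity = 0x12 ^ 0xA5 ^ 0x3C

# During the check, the first disk is misread as 0x13 instead of 0x12.
flaky_reads = [0x13, 0xA5, 0x3C]

found_nc, parity_nocorrect = check(flaky_reads, good_parity, correct=False)
found_c, parity_correct = check(flaky_reads, good_parity, correct=True)

# Later the second disk dies and the other two are read back correctly.
surviving = [true_data[0], true_data[2]]
rebuilt_after_nocorrect = rebuild(surviving, parity_nocorrect)
rebuilt_after_correct = rebuild(surviving, parity_correct)

print(hex(rebuilt_after_nocorrect))  # matches the lost byte
print(hex(rebuilt_after_correct))    # differs from the lost byte
```

Both modes report the same mismatch, but only the NOCORRECT run leaves parity in a state that can still rebuild the lost disk correctly in this scenario.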

Link to comment

I know my advice was wrong before, but now that you're confident in the system, wouldn't it be prudent to run a correcting check, just to come full circle? That is what revealed the problem in the first place, and to be absolutely sure that it has been resolved, a successful completion of a correcting check is required.

Link to comment


 

A correcting check will not find errors that a nocorrect check will miss. It makes no difference as long as there are no parity errors when done.

 

Peter

 

Link to comment


 

 

This is true in theory; however, a correcting parity check not fixing parity correctly is the basis for this thread.

Link to comment

Archived

This topic is now archived and is closed to further replies.
