[SOLVED] How can Parity be "valid" with 724 Errors?


I lost power last night...UPS has been bumped higher on my next buy list...

 

Anyhow, it needed to do its Parity Sync this morning when I brought the server up, and now I see this. How can it be "valid" with all those errors? Do I just need to check the "Correct any Parity-Sync" errors checkbox and click the "Check" button again?

[Attached screenshot: 3-28-2011 4-19-57 PM.jpg]

 

Syslog attached...

syslog-2011-03-28.txt

Link to comment

I'm also seeing some apparent drive error messages. I'm not sure how to pinpoint them (they all appear to be "ata7" related, but I'm not sure which drive that is).

 

Mar 28 06:37:56 Media kernel: ata7.00: exception Emask 0x10 SAct 0x1 SErr 0x780100 action 0x6 (Errors)

Mar 28 06:37:56 Media kernel: ata7: SError: { UnrecovData 10B8B Dispar BadCRC Handshk } (Errors)

Mar 28 06:37:56 Media kernel:          res 40/00:00:28:60:38/00:00:3a:00:00/40 Emask 0x10 (ATA bus error) (Errors)

Mar 28 06:37:56 Media kernel: ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x980000 action 0x6 frozen (Errors)

Mar 28 06:37:56 Media kernel: ata7: SError: { 10B8B Dispar LinkSeq } (Errors)

 

*EDIT* Appears to be my new/old Cache Drive per MyMain "Smart" view
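
For anyone else trying to narrow down messages like these, here is a rough Python sketch that tallies the SError flags per ATA port from a syslog. It only assumes the line format shown in the excerpt above; the function name is made up for illustration, and it does not tell you which physical drive sits behind a port.

```python
import re
from collections import Counter, defaultdict

# Matches kernel lines such as:
#   "... kernel: ata7: SError: { UnrecovData 10B8B Dispar BadCRC Handshk }"
SERROR_RE = re.compile(r"kernel: (ata\d+): SError: \{([^}]*)\}")


def serror_flags_by_port(lines):
    """Return {port: Counter(flag -> occurrences)} for every SError line."""
    summary = defaultdict(Counter)
    for line in lines:
        match = SERROR_RE.search(line)
        if match:
            port, flags = match.groups()
            # Flags are space-separated inside the braces.
            summary[port].update(flags.split())
    return dict(summary)
```

Feeding it the open syslog file (for example `serror_flags_by_port(open("syslog-2011-03-28.txt"))`) gives one counter per port. Flags such as BadCRC, 10B8B and Dispar are link-level errors, so they often point at a cable or connector rather than the platters, but treat that as a hint and not a diagnosis.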

Link to comment

I'm sure others can correct me if I'm wrong, but this same thing happened to me, and my mover was running at the time. The errors the server reports are ones it has already fixed. I ended up with 2016; they were just read errors from the disk writing crap when it lost power. If you run another check, those errors should go away.

Link to comment

Parity being valid is unRAID's way of saying that a parity drive has been added and that parity has been built.  It means that each write is going to update parity, and if a drive fails it will rebuild (at least to the best of its ability) the failed disk.  Obviously, if parity isn't perfect, a rebuild won't be perfect. That's why you want to fix these parity errors.

If you ran a correcting parity check, those 724 parity errors are already corrected.  If not, you need to run a correcting parity check.  I have had this same thing happen to me a couple of times.  There is some slight contention that occurs as unRAID starts up after a power failure that can cause some sync errors not to be fixed by the first correcting check.  I'd recommend running a correcting check 2 or 3 times (at the most) to get them all to clear.  If you are still getting sync errors after 3 parity checks, you have other problems.
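
To make the "imperfect parity means imperfect rebuild" point concrete, here is a minimal sketch, assuming a single parity disk computed as a byte-wise XOR across the data disks (which is how unRAID's single parity is generally described). The function names and the tiny byte strings standing in for disks are illustrative only.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the one missing block from the survivors plus parity."""
    return xor_parity(list(surviving_blocks) + [parity])


def sync_errors(blocks, parity):
    """Count positions where the stored parity disagrees with the data."""
    expected = xor_parity(blocks)
    return sum(1 for e, p in zip(expected, parity) if e != p)


d1, d2, d3 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
good_parity = xor_parity([d1, d2, d3])
# With correct parity, the "failed" disk d2 comes back intact.
assert rebuild([d1, d3], good_parity) == d2

# Flip one bit in parity to mimic a write interrupted by power loss.
stale_parity = bytes([good_parity[0] ^ 0x01, good_parity[1]])
assert sync_errors([d1, d2, d3], stale_parity) == 1
# The rebuild still runs, but the recovered data is now wrong.
assert rebuild([d1, d3], stale_parity) != d2
```

A correcting check in this model simply overwrites the stored parity with `xor_parity` of the data, which is why the sync errors need to be cleared before you ever depend on a rebuild.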

 

Link to comment

I hadn't run the "correcting" parity check yet (before posting this thread).  It's running now. I think I may have other problems with my Cache Drive per the Syslog too, which stinks since I just got SABnzbd, Sickbeard and Couch Potato all up and configured on it.

Link to comment

There were still 724 Errors again after running the "Correction"... guess I'll try it again.

 

It's actually exactly what is to be expected.  If the count were higher or lower, it would indicate a problem.

 

Run another one - you should get 0, but I would not be surprised to see it correcting a few (based on my experience).

Link to comment

It had to correct the errors. The next check should be good now that they are corrected.

 

Peter

 

Well, I clicked the "Correct...checkbox", clicked the Check Button, it ran for multiple hours overnight, and there are still hundreds of these entries that popped up in the log after the Check started...

 

Mar 28 20:43:10 Media kernel: md: parity incorrect: 820286408 (Errors)

Mar 28 20:43:10 Media kernel: md: parity incorrect: 820286416 (Errors)

Mar 28 20:43:10 Media kernel: md: parity incorrect: 820286424 (Errors)

Mar 28 20:43:10 Media kernel: md: parity incorrect: 820286432 (Errors)

Mar 28 20:43:10 Media kernel: md: parity incorrect: 820286440 (Errors)

Mar 28 20:43:10 Media kernel: md: parity incorrect: 820286448 (Errors)

Mar 28 20:43:10 Media kernel: md: parity incorrect: 820286456 (Errors)

Mar 28 20:43:10 Media kernel: md: parity incorrect: 820286464 (Errors)

Link to comment

Yes - unRAID logs the location of each parity mismatch (up to some limit, I forget the maximum, to prevent a "log explosion").

 

This is a very valuable feature if you are one of the unlucky few that have problems with ongoing sync errors and you are trying to determine if they are occurring at the same location each time, or if they are in new locations.
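
As a rough illustration of that comparison, the sketch below pulls the logged locations out of two syslog captures (one per check) and reports which ones repeat. It assumes only the "md: parity incorrect: N" line format quoted in this thread; the function names are made up for the example.

```python
import re

# Matches lines such as "... kernel: md: parity incorrect: 820286408"
PARITY_RE = re.compile(r"md: parity incorrect: (\d+)")


def mismatch_locations(lines):
    """Collect the set of locations logged as parity mismatches."""
    locations = set()
    for line in lines:
        match = PARITY_RE.search(line)
        if match:
            locations.add(int(match.group(1)))
    return locations


def compare_runs(first_run_lines, second_run_lines):
    """Split locations into those seen in both checks and those seen in one."""
    first = mismatch_locations(first_run_lines)
    second = mismatch_locations(second_run_lines)
    return {
        "repeated": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }
```

Keep in mind that a first correcting check followed by a second one is expected to show everything under "only_first". Locations that repeat, or brand new ones in the second run, are the cases worth chasing.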

Link to comment

If this is the second parity check with the "correct errors" checkbox checked, then you have a hardware issue to identify and resolve.   The only help I can offer is that it is unlikely to be the keyboard, mouse, or monitor.

Unfortunately it could be almost anything else...  Memory (or the BIOS configuration of its voltage, timing, or clock speed), a disk drive, a disk controller, a motherboard, or lastly, a poorly regulated power supply.

Start with a memory test, preferably overnight.  Memory is the most likely suspect.  The next most likely suspect is the disk drives. It only takes one bad drive to cause the issue, and we've seen drives return an occasional bad bit with no other indication of an error.  Those disks cause hair loss (you'll pull your hair out trying to figure out what is happening).

Once you eliminate memory as a potential issue, the method to locate the errors is to perform repeated checksums of each disk in turn over the address range where the parity errors are occurring.  A disk that always returns the same checksum is probably good.  One that returns different checksums is less likely to be good, unless it was written to in between.
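
A minimal sketch of the repeated-checksum idea, assuming you can read the disk as a file and already know the byte offset and length you want to test. How the sector numbers in the syslog map to byte offsets on a given device is not covered here, and the function names and parameters are hypothetical.

```python
import hashlib


def checksum_range(path, offset, length, chunk_size=1 << 20):
    """MD5 of `length` bytes starting at byte `offset` of `path`."""
    digest = hashlib.md5()
    # buffering=0 avoids Python-level buffering; the OS may still cache reads.
    with open(path, "rb", buffering=0) as dev:
        dev.seek(offset)
        remaining = length
        while remaining > 0:
            chunk = dev.read(min(chunk_size, remaining))
            if not chunk:  # reached the end of the device or file
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def is_stable(path, offset, length, passes=3):
    """True if every pass over the same range yields the same checksum."""
    digests = {checksum_range(path, offset, length) for _ in range(passes)}
    return len(digests) == 1
```

You would call `is_stable` once per disk with that disk's device path and the same range. One caveat: if the range fits in RAM, the operating system can serve the second and third passes from its cache instead of the disk, which would hide exactly the kind of flaky read this is meant to catch, so the cache needs to be flushed (or the range made larger than memory) between passes.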

 

 

Joe L.

Link to comment

This was after the first pass with the "Correct" checkbox on... I'm running the 2nd one now and will be done in 4-5 hours.

Link to comment

In case it's still not clear, the first time you run a parity check with the correct option unRAID will find any errors, report them and then correct them.

 

The second pass should have no errors since they were all just fixed. If more appear then you have a bigger issue.

 

Peter

 

Link to comment

Archived

This topic is now archived and is closed to further replies.
