Parity Check Finish 1 Errors


I built this Unraid server about 2 months ago. It's been working fine for the most part. Since the whole house is backed up by a set of Tesla Powerwalls, I do not have the server on a UPS. The other day Tesla was carrying out some repairs and shut the whole system down without giving me time to shut the server down properly.

 

When power was restored and the server came back online, it automatically started a parity check. 22 hours later it completed with a warning that 1 error was found. I always have the server set to auto-correct errors on the scheduled parity checks (once a month), but I didn't know whether that setting applied to this check, so I manually started another parity check and this time made sure that the fix parity errors setting was checked.

 

The second parity check has just finished and gave me the notification below. What should I do next?

unRAID Parity check: 12-07-2018 17:10
Notice [POSEIDON] - Parity check finished (1 errors)
Duration: 22 hours, 45 minutes, 18 seconds. Average speed: 146.5 MB/s
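
As a rough sanity check on that notification, the duration multiplied by the average speed should give the amount of data covered on each disk. The sketch below assumes the reported MB/s is decimal (10**6 bytes); under that assumption the result comes out at about 12 TB, which is the size of the parity drive mentioned later in the thread, so the check appears to have run across the full parity disk.

```python
# Figures taken from the notification above.
duration_s = 22 * 3600 + 45 * 60 + 18   # 22 h 45 min 18 s, in seconds
avg_speed = 146.5e6                     # assumption: 1 MB = 10**6 bytes

bytes_checked = duration_s * avg_speed
tb_checked = bytes_checked / 1e12       # decimal terabytes

print(f"{duration_s} s, about {tb_checked:.2f} TB covered")
```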
 
 
 
 
11 hours ago, NGMK said:

I always have the server to auto correct errors on scheduled parity checks (once a month)

 

Be careful about running correcting parity checks except when adding new disks or after a power loss. Parity has to be corrected after a power loss because you will normally have a couple of blocks with wrong parity, caused by the data disks not being properly unmounted.


But for an already running system that is expected to have correct parity, auto-repair may destroy valid parity because a data disk goofed and silently read out wrong data. When the parity computation detects a difference, unRAID doesn't know why there is a difference. It isn't possible to know which of the disks has read out a value that doesn't agree with the content of all the other drives. That's the danger with silent errors on RAID systems. When a disk fails and stops being able to read out data, it's easy for the system to figure out that the data from all the other disks can be used to recompute the data from the problem disk. But with a silent error, any disk may be at fault.

 

So a parity error for a fully working system means that you want to be able to sit down and analyze everything carefully to see if the error is repeatable or if it was a single transfer error. With an automatic parity repair, you don't get this chance because the system will then always assume all the data disks are correct and that it's the parity drive that should be rewritten.
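
The difference between a failed disk and a silent error can be sketched with plain XOR parity. This is a toy model, not unRAID's actual code: the three "disks" are hypothetical 4-byte blocks, and the helper names are made up for illustration. It shows that a known-missing disk can be rebuilt, while a mismatch caused by a silent error is equally consistent with any disk (or the parity) being the wrong one.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, one 4-byte block each.
data = [b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]
parity = xor_parity(data)

# Case 1: disk 1 fails outright. We know which disk is missing, so it can
# be rebuilt from the parity and the surviving disks.
rebuilt = xor_parity([parity, data[0], data[2]])

# Case 2: disk 1 silently returns one wrong byte. The check only sees that
# the stripe no longer XORs to zero at that position.
stripe = [data[0], b"\x01\xff\x03\x04", data[2], parity]
mismatch = xor_parity(stripe)

def consistent_after_fixing(stripe, i):
    """Assume member i is the bad one, rebuild it, and re-check the stripe."""
    others = stripe[:i] + stripe[i + 1:]
    fixed = stripe[:i] + [xor_parity(others)] + stripe[i + 1:]
    return xor_parity(fixed) == bytes(len(fixed[0]))

# Every member, including the parity, is a plausible culprit.
suspects = [i for i in range(len(stripe)) if consistent_after_fixing(stripe, i)]

print(rebuilt == data[1], mismatch.hex(), suspects)
```

A correcting check corresponds to always picking the last member (the parity) as the one to rewrite, which is only the right choice if the data disks really did return correct data.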

Thanks for all this information. Unraid should at least give a warning about the danger of running correcting parity checks, especially for the scheduled ones.

 

My array is made up as follows: the parity drive is a 12TB IronWolf Pro, and all the data disks are 8TB WD Reds (all shucked from WD Easystore drives from Best Buy). The check is currently at the 8.2TB position with no errors reported so far. Does this mean that any errors encountered from this point on are solely on the parity disk? What is the advantage of continuing the parity check beyond the initial 8TB position?

4 hours ago, NGMK said:

What is the advantage of continuing with the parity check from beyond the initial 8TB position?

 

You should have a routine where the entire surface of every disk is read end to end regularly. It's only when a disk tries to read the individual sectors that it can detect problems with the surface, or with locking on to the servo information that describes the location of the tracks and sectors.

 

Most data loss in traditional RAID systems (except user errors like file overwrites or accidental deletes) is caused by people not having scheduled testing of the drives. So as long as they don't get a read error when trying to view a film or opening a document, they don't know the state of the drives. So when they finally get a read error, they may already have multiple disks with errors - and not enough parity data to recover.

  • 3 years later...
On 7/13/2018 at 10:16 AM, pwm said:

It isn't possible to know which of the disks has read out a value that doesn't agree with the content of all the other drives. That's the danger with silent errors on RAID systems. When a disk fails and stops being able to read out data, it's easy for the system to figure out that the data from all the other disks can be used to recompute the data from the problem disk. But with a silent error, any disk may be at fault.

 

So a parity error for a fully working system means that you want to be able to sit down and analyze everything carefully to see if the error is repeatable or if it was a single transfer error. With an automatic parity repair, you don't get this chance because the system will then always assume all the data disks are correct and that it's the parity drive that should be rewritten.

If you have 2 parity disks wouldn't it calculate it from parity instead?

I thought the whole point of double parity was that you gain the ability to know what disk is at fault?

Any thoughts anyone?

Is it safe to run auto-repair with double parity? (Excluding the less likely but still possible case of 2 bad drives at the same position.)

23 minutes ago, itimpi said:

It would be nice if this was the case but apparently it is not always that easy :( 

Oh OK, any recommendations on what I can do?

I have no idea how to check anything (and checking 70TB by hand isn't going to happen).

 

I personally think that the Unraid guys should make this more clear. 

If you do not think that you have a drive playing up then the only thing you can sensibly do is run a correcting parity check.  If you have not rebooted then you can post your diagnostics covering the period in question to see if anyone can spot anything.

 

If you want to know if you have 'bit rot' on array drives then you need to either be using BTRFS as the file system or use the File Integrity plugin to maintain checksums.
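
The general idea behind maintaining checksums can be sketched as follows. This is not how the File Integrity plugin or BTRFS works internally; it is a minimal illustration with hypothetical function names, using SHA-256 from the Python standard library. Note that it cannot tell bit rot from a legitimate edit: any file whose contents changed since the manifest was built is reported.

```python
import hashlib
import tempfile
from pathlib import Path

def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {str(p.relative_to(root)): file_sha256(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def changed_files(root, manifest):
    """Return files from the manifest that are missing or hash differently now."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items()
                  if current.get(name) != digest)

# Demo on a throwaway directory standing in for a share.
with tempfile.TemporaryDirectory() as share:
    sample = Path(share) / "movie.mkv"
    sample.write_bytes(b"original contents")
    manifest = build_manifest(share)
    clean = changed_files(share, manifest)      # nothing changed yet
    sample.write_bytes(b"original c0ntents")    # simulate a corrupted byte
    flipped = changed_files(share, manifest)

print(clean, flipped)
```

With a stored manifest like this, a parity error can be followed up by re-hashing the data disks: if every file still matches its checksum, the data is probably fine and the parity is the more likely culprit.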

 


UDMA CRC error count 2867 on Parity drive 2.

UDMA CRC error count 14 on Disk 1

UDMA CRC error count 8 on Disk 3

UDMA CRC error count 8 on Disk 6

That is probably the issue

I'm thinking my controller is toast.

I had better try and get a replacement.

Thanks, bud!

Yes, I use BTRFS and it's been solid (until my cheap SATA card went wonky).

 

Now that I think about it, I think I have a good one I can pull from my Chia server :)
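
Since the UDMA CRC error count is a cumulative counter, the absolute numbers matter less than whether they keep rising after a cable or controller change. A minimal sketch of that comparison follows; the function name is made up, the first reading uses the counts quoted above, and the second reading is hypothetical, invented purely for illustration.

```python
def rising_crc_counts(before, after):
    """Return {drive: increase} for drives whose CRC count grew between readings."""
    return {drive: after[drive] - count
            for drive, count in before.items()
            if after.get(drive, count) > count}

# First reading: the counts quoted above.
before = {"Parity 2": 2867, "Disk 1": 14, "Disk 3": 8, "Disk 6": 8}

# Second reading: hypothetical values, not real data from this server.
after = {"Parity 2": 2901, "Disk 1": 14, "Disk 3": 8, "Disk 6": 9}

print(rising_crc_counts(before, after))
```

Drives that show up in the result are still logging link errors; drives that do not may just be carrying old counts from a problem that has already been fixed.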


Oh OK, I'll check that out. I'm definitely suspicious of the idea that 4 of the 8 drives on that rig have bad cables, though. I was tempted to say cable when I saw the first one, but after four I wasn't so sure. (I've got 4 drives on onboard ports and 4 on a SATA card.)

Yes, with a failing controller I guess you'd expect a read error, a drop-out or something fatal.

Either way, that's the area I'm looking at for my issue.

 

Thanks Dude!!
