A few errors after a parity check.



I've just noticed that since Jan 1st my parity checks have been reporting roughly 488 million errors. The last two monthly parity checks just didn't register with me; I must have dismissed that number as a date or something. I only realised yesterday that there were errors, and that it's been that way for the last two monthly checks.

 

With a number that size appearing out of nowhere, and not having experienced any issues with the server, could it be something other than real parity errors?  Is there anything I can check or look at to see what the real state of my server is?  I'm more inclined to think they are real errors, given the parity check duration has increased by about 4 hours.  The only real change is that the Dec 02 run is when I installed a 6TB parity disk.

 

The last few checks:

 

Date, time              Duration                 Speed         Status    Sync errors
2019-03-01, 12:14:33    12 hr, 14 min, 32 sec    136.2 MB/s    OK    488376000
2019-02-01, 12:15:10    12 hr, 15 min, 9 sec    136.1 MB/s    OK    488376000
2019-01-01, 12:15:14    12 hr, 15 min, 13 sec    136.0 MB/s    OK    488376000
2018-12-02, 15:46:17    8 hr, 34 min, 21 sec    194.5 MB/s    OK    0
2018-11-30, 04:53:20    8 hr, 36 min, 50 sec    129.0 MB/s    OK    0
2018-11-01, 08:30:29    8 hr, 30 min, 28 sec    130.6 MB/s    OK    0
2018-10-12, 06:25:34    8 hr, 33 min, 52 sec    129.8 MB/s    OK    0
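
(That history is copied straight from the GUI's parity check history. If I remember right, the same data is also kept as a plain text file on the flash drive, so it can be viewed from a terminal; the exact path is from memory and may differ by Unraid version:)

cat /boot/config/parity-checks.log   # parity check history on the flash drive (path assumed)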

 


I used the parity swap procedure found here:

https://wiki.unraid.net/The_parity_swap_procedure

 

I moved the 4TB parity disk to a data slot, then chose the new 6TB disk as parity.  I'm pretty sure I ran a full parity check after the swap, and I remember seeing the speed increase to 194.5 MB/s, which I assumed was due to a 6TB parity run knowing there'd be 2TB of nothing to check.  But I could be mistaken.

 

What would be the risks of running another parity check with error correction versus rebuilding parity from scratch?  I haven't noticed any file corruption (yet). 

17 minutes ago, dalben said:

I assumed was due to a 6TB parity run knowing there'd be 2TB of nothing to check.

No, it would check the whole parity disk, and everything after the 4TB mark would have to be zero, at least until you add a data disk larger than 4TB; otherwise that part of parity would be invalid when you did add one.
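
If you want to spot-check that yourself, something along these lines should do it from a terminal. It reads 1GiB starting a little past the 4TB point of the parity disk and counts any non-zero bytes; /dev/sdX is a placeholder for your actual parity device, and it only reads, so it won't change anything:

dd if=/dev/sdX bs=1M skip=4000000 count=1024 2>/dev/null | tr -d '\0' | wc -c   # sdX is hypothetical; skip lands a bit past the 4TB mark

If that prints 0, the sampled region is all zeros; anything else means there's leftover data past the 4TB mark.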

 

Rebuilding parity might actually be a little faster than checking parity, but the end result should be the same: valid parity. Making your parity valid one way or the other is your only choice.


Was that 

1 hour ago, dalben said:

2018-12-02, 15:46:17    8 hr, 34 min, 21 sec    194.5 MB/s    OK    0

actually the parity swap itself? That might make sense, or at least I can see how it might arrive at those results for just the data rebuild part of the swap. Then it would have calculated the 4TB rebuild based on 6TB parity. But that wouldn't include the parity copy part it does at the beginning of the swap.

24 minutes ago, trurl said:

Was that 

actually the parity swap itself? That might make sense, or at least I can see how it might arrive at those results for just the data rebuild part of the swap. Then it would have calculated the 4TB rebuild based on 6TB parity. But that wouldn't include the parity copy part it does at the beginning of the swap.

I'm assuming that was the parity rebuild, not the copy; the copy took a while as well.

 

To rebuild parity, am I right that these are the correct steps:

 

Unassign the parity drive, start the array, stop the array, then reassign the parity drive and start the array again?

18 hours ago, johnnie.black said:

Parity swap sometimes appears not to correctly zero the new disk, though I've never been able to reproduce it. It looks like you need to run a correcting check to properly sync parity; after that, all checks should result in 0 errors.

Thanks. It looks like you're right. I started a correcting check; all was fine until the 4TB mark, and now it's correcting errors at a rapid rate. As my biggest data disk is 4TB, that seems in line with your thoughts. 
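
(For anyone finding this later: I kicked the check off from the GUI, but as I understand it the same thing can be done from a terminal with Unraid's mdcmd; the exact arguments here are from memory, so treat them as an assumption rather than gospel:)

mdcmd check            # start a correcting check (writes fixes to parity) - syntax from memory
mdcmd check NOCORRECT  # start a read-only check that only counts sync errors
mdcmd nocheck          # cancel a running check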


So the correcting check ran.  

 

Last check completed on Sunday, 03-03-2019, 18:07 (today), finding 488376000 errors. 
Duration: 17 hours, 54 minutes, 33 seconds. Average speed: 93.1 MB/sec

 

The log has a fair few of these entries, and then it stops logging them:

Mar 3 08:40:58 tdm kernel: md: recovery thread: P corrected, sector=7814037848 
Mar 3 08:40:58 tdm kernel: md: recovery thread: P corrected, sector=7814037856 
Mar 3 08:40:58 tdm kernel: md: recovery thread: stopped logging

 

Then at the end we see:

Mar 3 18:07:53 tdm kernel: md: sync done. time=64472sec 
Mar 3 18:07:54 tdm kernel: md: recovery thread: completion status: 0

So now I'm trying to work out whether it actually corrected those errors.  I can't see a log entry or comment anywhere giving the number of errors it corrected.  As a 17-hour parity check is about 5 hours longer than usual, I assume it did a fair bit of extra work, but I'd like to see some confirmation.
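
The closest thing I can find is counting the correction lines in the syslog, though since it stopped logging them part way through (as in the snippet above) that will only be a lower bound:

grep -c 'corrected, sector' /var/log/syslog   # counts logged corrections only; logging stopped early, so this undercounts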
