A few errors after a parity check.



I've just noticed that since Jan 1st my parity checks have been reporting roughly 488 million errors. The last two monthly parity checks just didn't register with me; I must have dismissed that number as a date or something. I only realised yesterday that there were errors, and that it's been that way for the last two monthly checks.

 

With a number that size appearing out of nowhere, and not having experienced any issues with the server, could it be something other than real parity errors?  Is there anything I can check or look at to see what the real state of my server is?  I'm more inclined to think they are real errors, given the parity check duration has increased by about 4 hours.  The only real change is that the Dec 02 run is when I installed a 6TB parity disk.

 

The last few checks:

 

Date, time              Duration                 Speed         Status    Sync errors
2019-03-01, 12:14:33    12 hr, 14 min, 32 sec    136.2 MB/s    OK    488376000
2019-02-01, 12:15:10    12 hr, 15 min, 9 sec    136.1 MB/s    OK    488376000
2019-01-01, 12:15:14    12 hr, 15 min, 13 sec    136.0 MB/s    OK    488376000
2018-12-02, 15:46:17    8 hr, 34 min, 21 sec    194.5 MB/s    OK    0
2018-11-30, 04:53:20    8 hr, 36 min, 50 sec    129.0 MB/s    OK    0
2018-11-01, 08:30:29    8 hr, 30 min, 28 sec    130.6 MB/s    OK    0
2018-10-12, 06:25:34    8 hr, 33 min, 52 sec    129.8 MB/s    OK    0
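
(That history is copied straight from the GUI's parity check history. If I remember right, the same data is also kept as a plain text file on the flash drive, so it can be viewed from a terminal; the exact path is from memory and may differ by Unraid version:)

cat /boot/config/parity-checks.log   # parity check history on the flash drive (path assumed)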

 


I used the parity swap procedure found here:

https://wiki.unraid.net/The_parity_swap_procedure

 

I moved the 4TB parity disk to a data slot, then chose the new 6TB disk as parity.  I'm pretty sure I ran a full parity check after the swap, and I remember seeing the speed increase to 194.5 MB/s, which I assumed was due to a 6TB parity run knowing there'd be 2TB of nothing to check.  But I could be mistaken.

 

What would be the risks of running another parity check with error correction versus rebuilding parity from scratch?  I haven't noticed any file corruption (yet). 

17 minutes ago, dalben said:

I assumed was due to a 6TB parity run knowing there'd be 2TB of nothing to check.

No, it would check the whole parity disk, and everything after the 4TB mark would have to be zero, at least until you add a data disk larger than 4TB; otherwise that part of parity would be invalid when you did add one.
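
If you want to spot-check that yourself, something along these lines should do it from a terminal. It reads 1GiB starting a little past the 4TB point of the parity disk and counts any non-zero bytes; /dev/sdX is a placeholder for your actual parity device, and it only reads, so it won't change anything:

dd if=/dev/sdX bs=1M skip=4000000 count=1024 2>/dev/null | tr -d '\0' | wc -c   # sdX is hypothetical; skip lands a bit past the 4TB mark

If that prints 0, the sampled region is all zeros; anything else means there's leftover data past the 4TB mark.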

 

Rebuilding parity might actually be a little faster than checking parity, but the end result should be the same: valid parity. Making your parity valid one way or the other is your only choice.


Was that 

1 hour ago, dalben said:

2018-12-02, 15:46:17    8 hr, 34 min, 21 sec    194.5 MB/s    OK    0

actually the parity swap itself? That might make sense, or at least I can see how it might arrive at those results for just the data rebuild part of the swap. Then it would have calculated the 4TB rebuild based on 6TB parity. But that wouldn't include the parity copy part it does at the beginning of the swap.

24 minutes ago, trurl said:

Was that 

actually the parity swap itself? That might make sense, or at least I can see how it might arrive at those results for just the data rebuild part of the swap. Then it would have calculated the 4TB rebuild based on 6TB parity. But that wouldn't include the parity copy part it does at the beginning of the swap.

I'm assuming that was the parity rebuild, not the copy; the copy took a while as well.

 

To rebuild parity, am I right that these are the correct steps:

 

Unassign the parity drive, start the array, stop the array, then reassign the parity drive and start the array again?

18 hours ago, johnnie.black said:

Parity swap sometimes appears not to correctly zero the new disk, though I've never been able to reproduce it. It looks like you need to run a correcting check to properly sync parity; after that, all checks should result in 0 errors.

Thanks. It looks like you're right. I started a correcting check; all was fine until the 4TB mark, and now it's correcting errors at a rapid rate. As my biggest data disk is 4TB, that seems in line with your thoughts. 
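
(For anyone finding this later: I kicked the check off from the GUI, but as I understand it the same thing can be done from a terminal with Unraid's mdcmd; the exact arguments here are from memory, so treat them as an assumption rather than gospel:)

mdcmd check            # start a correcting check (writes fixes to parity) - syntax from memory
mdcmd check NOCORRECT  # start a read-only check that only counts sync errors
mdcmd nocheck          # cancel a running check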


So the correcting check ran.  

 

Last check completed on Sunday, 03-03-2019, 18:07 (today), finding 488376000 errors. 
Duration: 17 hours, 54 minutes, 33 seconds. Average speed: 93.1 MB/sec

 

The log has a fair few of these entries, and then it stops logging them:

Mar 3 08:40:58 tdm kernel: md: recovery thread: P corrected, sector=7814037848 
Mar 3 08:40:58 tdm kernel: md: recovery thread: P corrected, sector=7814037856 
Mar 3 08:40:58 tdm kernel: md: recovery thread: stopped logging

 

Then at the end we see:

Mar 3 18:07:53 tdm kernel: md: sync done. time=64472sec 
Mar 3 18:07:54 tdm kernel: md: recovery thread: completion status: 0

So now I'm trying to work out whether it actually corrected those errors.  I can't see a log entry or comment anywhere giving the number of errors it corrected.  As a 17-hour parity check is about 5 hours longer than usual, I assume it did a fair bit of extra work, but I'd like to see some confirmation.
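
The closest thing I can find is counting the correction lines in the syslog, though since it stopped logging them part way through (as in the snippet above) that will only be a lower bound:

grep -c 'corrected, sector' /var/log/syslog   # counts logged corrections only; logging stopped early, so this undercounts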
