kernel: md: recovery thread: P incorrect, but weird: sectors incremented by 8, maybe a parity checker software fault?



Hello people! :)

 

I've been running Unraid since 2019 with no issues, no disk failures, no nothing except great performance, knock on wood.

 

Anyway, the first issue ever arose a few days ago. It ran its standard parity check; the only difference was that I had changed the schedule in the config from every month to every 2 months.

 

And bam.. Notification - Notice [NAS] - Parity check finished (3360 errors).

 

So I did some reading, downloaded the diagnostics file (attached), and ran another parity check. Same result: the same sectors got marked, the same amount (3360 errors). So it's probably not RAM related and also shouldn't be cabling related (SATA in my case).

 

Also some other important facts:

 

  • 355 day uptime, no issues
  • runs on a very good UPS, very stable voltage, no fluctuations
  • no sudden shutdowns recently or ever... UPS always gracefully shuts it down
  • no unusual changes to settings or modifications, except changing the scheduler from 1mo to 2mo
  • it runs 32MB ECC RAM
  • parity was always clean (0 errors before) since the server's first startup

 

SMART data seems to be clean and I don't notice anything else weird.

 

I DO NOTICE ONE VERY WEIRD THING. Sorry for caps, but this is very important.

 

Pay attention to the sector numbers:

 

Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737232
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737240
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737248
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737256
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737264
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737272
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737280
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737288
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737296
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737304
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737312
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737320
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737328
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737336
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737344
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737352

 

They're incremented by 8 non-stop. The last two digits go 32, 40, 48, 56, 64, 72, 80, 88, 96, 04, 12, 20, 28.
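For anyone who wants to check the stride across all 3360 lines instead of eyeballing it, here is a minimal Python sketch. It assumes the syslog text has been saved somewhere readable and that the lines look exactly like the ones quoted above; the function names are made up for illustration and are not part of Unraid. If the sectors are 512 bytes each (an assumption, not something the log states), a stride of 8 would correspond to one 4096-byte block per line.

```python
import re

# Matches the sector number in lines shaped like the syslog excerpt above.
SECTOR_RE = re.compile(r"md: recovery thread: P incorrect, sector=(\d+)")


def parse_sectors(log_text):
    """Return the sector numbers from 'P incorrect' lines, in log order."""
    return [int(m.group(1)) for m in SECTOR_RE.finditer(log_text)]


def strides(sectors):
    """Return the set of differences between consecutive sector numbers."""
    return {b - a for a, b in zip(sectors, sectors[1:])}


# Three lines copied from the excerpt; a real run would read the full syslog.
sample = """\
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737232
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737240
Feb  5 04:40:09 NAS kernel: md: recovery thread: P incorrect, sector=1056737248
"""

sectors = parse_sectors(sample)
print(len(sectors), "errors, strides seen:", strides(sectors))
```

A single stride value across the whole log means one contiguous run of mismatches; several values would mean the errors sit in separate regions.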

 

There is something very wrong. I don't think parity is actually bad, I think there's something wack with the parity checker.

 

So what do I do now?

 

Should I reboot?

 

Thank you.

______________

 

p.s. I had two questions on my mind for some time:

1. Should I add the checksum plugin?

2. I'm on 6.7.2 atm, is it safe for me to update? Since I'm on a really old version, I'm not sure if it can update directly without issues. Thanks.

nas-diagnostics-20230209-2050-anonymized2.zip

9 hours ago, Arandomdood said:

They're incremented by 8 non stop

That's normal.

 

9 hours ago, Arandomdood said:

So what do I do now?

Run another non-correcting check to compare the errors; if they are the exact same sectors, you should then run a correcting check.
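Comparing two runs by hand is tedious with thousands of lines, so here is a rough Python sketch of the comparison. It assumes the log lines from each check have been saved as two separate pieces of text (for example by splitting the syslog at the start of the second check); `error_sectors` and `compare_runs` are hypothetical helper names.

```python
import re

# Pulls the sector number out of each "P incorrect" line.
SECTOR_RE = re.compile(r"P incorrect, sector=(\d+)")


def error_sectors(log_text):
    """Return the set of sectors reported as incorrect in one check."""
    return {int(n) for n in SECTOR_RE.findall(log_text)}


def compare_runs(first_log, second_log):
    """Summarise which sectors are shared and which appear in only one run."""
    first, second = error_sectors(first_log), error_sectors(second_log)
    return {
        "common": len(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


# Tiny made-up inputs in the same format as the real log lines.
run1 = "md: recovery thread: P incorrect, sector=1056737232\n" \
       "md: recovery thread: P incorrect, sector=1056737240\n"
run2 = "md: recovery thread: P incorrect, sector=1056737232\n" \
       "md: recovery thread: P incorrect, sector=1056737240\n"

print(compare_runs(run1, run2))
```

If `only_first` and `only_second` both come back empty, the two checks flagged the exact same sectors.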

 

You should also update Unraid, that's a very old release.


Hello JorgeB, thank you for your reply.

 

I see, so it's normal then. I wonder why this happened. It seems that 1.7MB of data got ruined on its own. I did run two checks, and the errors are exactly the same.
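The size estimate depends on what one logged error covers, so here is the arithmetic as a small Python sketch. It assumes 512-byte sectors, which the thread does not confirm. Counting one sector per error gives the 1.7 MB figure; if each logged line instead stands for the whole 8-sector block (which the stride of 8 hints at, but that is an assumption too), the affected parity region would be about 8 times larger.

```python
ERRORS = 3360           # from the "Parity check finished (3360 errors)" notice
SECTOR_BYTES = 512      # assumption: 512-byte sectors
SECTORS_PER_ERROR = 8   # stride between consecutive logged sectors

# One sector per logged error.
per_sector_bytes = ERRORS * SECTOR_BYTES

# One full 8-sector block per logged error.
per_block_bytes = ERRORS * SECTORS_PER_ERROR * SECTOR_BYTES

print(f"one sector per error: {per_sector_bytes / 1e6:.2f} MB")
print(f"one block per error:  {per_block_bytes / 1e6:.2f} MB")
```

Either way this is the size of the region where parity and data disagree, not proof that this much file data is damaged.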

 

Oh well, I'll sync it and get that file integrity plugin running, since I have backups anyway so I have no issues restoring from them.

 

By the way, what's the best way to go about upgrading the server? I agree 6.7.2 is old, and I think I would benefit from a newer kernel and everything.

4 hours ago, JorgeB said:

Don't remember if 6.7 supports updating via the GUI (Tools -> Update OS), but you can always do a manual update.

There indeed is Tools -> Update OS. 6.11.5 is the current stable.

 

Can I somehow create a full backup for reverting to 6.7.2 in case some big issue arises? Otherwise I'd do a fresh install instead.

 

Edit: Ah I see: "Before upgrading, we highly recommend making a complete backup of your USB flash device. You can do this by copying the entire contents of the "flash" share to a separate computer."

 

What would be the best approach to make this copy without shutting down the server?
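One way to do this while the server stays up is to zip the flash contents from any machine that can read them. Below is a minimal Python sketch, assuming Python is available on that machine (it may not be on the server itself) and that the flash contents are reachable at some path, such as the mounted "flash" share or `/boot` on the server. `backup_flash` and both paths in the example comment are hypothetical, not an official Unraid tool.

```python
import shutil
import time
from pathlib import Path


def backup_flash(flash_dir, dest_dir):
    """Zip everything under flash_dir into dest_dir; return the archive path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = dest / f"flash-backup-{stamp}"
    # make_archive appends ".zip" to the base name and returns the full path.
    return shutil.make_archive(str(base), "zip", root_dir=str(flash_dir))


# Example call (both paths are assumptions, adjust to your setup):
# backup_flash("/boot", "/mnt/user/backups")
```

Keeping the archive on a different machine or disk than the flash drive itself is the point of the exercise, so the destination should not be on the USB stick.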

 

Also, would you recommend running a correcting parity check before the upgrade? And should I reboot before the correcting check and run a non-correcting one first, just to see if the errors persist? (To rule out a software glitch, though it probably isn't one.)
