How do I Troubleshoot Parity Errors?


Solved by trurl


Unraid Version: 6.12.6

 

I'll be honest: I'm a bit new to using parity and btrfs. I set up my Unraid box a while ago and everything's been going well! Unfortunately, as of late I've run into quite a few parity checks that have resulted in errors. I went through some of the other forum posts on this, but in my case I don't quite know where to start with getting it fixed. I'll outline what I've done below.

 

What I have done so far:

  • Ran a manual parity check after the automatic one, just to make sure the errors were real
  • I didn't immediately find anything wrong with any data, so I ran another Parity Check with "Write corrections to parity" enabled
    • This did not seem to fix or resolve the errors as they simply showed up again in the Parity History
  • Ran Scrub (with Array in Normal mode, using BTRFS) on all disks in array
    • All disks return no errors

 

At this point I don't really know what to do from here. I can only recall one incident a while back where the UPS attached to the NAS was tripped and the NAS lost power. Otherwise, she's only ever had clean shutdowns / reboots. I've also attached my diagnostics here just to be a lil ahead of the game.

nasty-diagnostics-20240207-1835.zip

8 hours ago, Wzyss said:

This did not seem to fix or resolve the errors as they simply showed up again in the Parity History

This is normal: a correcting check still finds and reports the same errors, but this time it also corrects them. Was the number of errors exactly the same?

 

8 hours ago, Wzyss said:

Ran Scrub (with Array in Normal mode, using BTRFS) on all disks in array

  • All disks return no errors

 

That confirms all the data is fine.

7 hours ago, JorgeB said:

This is normal: a correcting check still finds and reports the same errors, but this time it also corrects them. Was the number of errors exactly the same?

Yes, the error count is consistently the same: currently 12897. A few months earlier, parity had 121 errors, but that seems to have resolved itself before this incident.

 

Here's a screenshot of my Parity Check history. The cancelled entries are usually me realizing I'd hit Check again instead of the History button.

 

 

7 hours ago, JorgeB said:

That confirms all the data is fine.

I'm glad this is the case, and I assumed as much. However, at this point I'm wondering what's gone wrong with parity. Seeing this made me a little antsy!

chrome_PWPwnDcrwJ.png


That suggests to me that the server sometimes isn't shutting down cleanly. A few sync errors are expected after an unclean shutdown, and the count should stay the same until they are corrected. Most likely the last complete parity check, the one with 121 errors, was a correcting check, so after it there were zero errors again until, possibly, the next unclean shutdown.

 

 


So in this case, what counts as an unclean shutdown? I've only had one unexpected shutdown, when the UPS itself tripped; every other reboot or shutdown was initiated from the admin interface. Also, regarding correcting parity: I did run a parity check with Write Corrections enabled, so I would have expected it to correct the errors if they were the same ones.

 

Again, I'm new to using parity this way. I've tried to read up on it and understand it, but it blows my mind every time I dig deep into it, lol.

Edited by Wzyss
Added more context
17 minutes ago, JorgeB said:

A telltale sign is a parity check starting on its own after a reboot. Do you remember if that happened?

 

I can't recall for certain, but I have gotten notifications that a parity check had started off schedule, and I cancelled it because I thought, "Why in the world are you doing this?" The system powers back on automatically when power returns after an outage, so short of checking the uptime, it's something I might not notice. That said, we use Plex quite a lot and haven't noticed it being down at any point.

Edited by Wzyss
  • Solution

Complete a correcting parity check, then immediately after run a non-correcting check. Don't reboot during any of this so all of it is recorded in the same syslog and you can post the diagnostics after.

 

If the non-correcting check still has sync errors, further investigation is needed.
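To illustrate why a correcting check followed by a non-correcting one separates old, already-fixed errors from new ones, here is a minimal Python sketch. It assumes a simplified single-parity array where parity is the XOR of the data disks at each block position, with tiny in-memory "disks"; the function names `compute_parity` and `parity_check` are hypothetical illustrations, not Unraid's actual code, and dual parity is not modelled.

```python
from functools import reduce

# Simplified model, assuming single (XOR) parity: each "disk" is a list of
# small integer blocks, and parity stores the XOR of the data disks at each
# block position (a "stripe"). Hypothetical names, not Unraid's implementation.

def compute_parity(data_disks):
    """Return the XOR of all data disks, block by block."""
    return [reduce(lambda a, b: a ^ b, stripe) for stripe in zip(*data_disks)]

def parity_check(data_disks, parity, correct):
    """Count stripes whose stored parity disagrees with the data.

    If correct is True, overwrite the stored parity for each mismatching
    stripe. Either way the count is returned and reported as "errors",
    so a correcting check reports every block it fixes.
    """
    expected = compute_parity(data_disks)
    errors = 0
    for i, value in enumerate(expected):
        if parity[i] != value:
            errors += 1
            if correct:
                parity[i] = value
    return errors

# Three data disks with four blocks each, plus an in-sync parity disk.
disks = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
parity = compute_parity(disks)

# Simulated unclean shutdown: two data blocks were written,
# but power was lost before parity was updated.
disks[0][1] = 99
disks[2][3] = 42

print(parity_check(disks, parity, correct=False))  # non-correcting: 2
print(parity_check(disks, parity, correct=False))  # still 2, nothing fixed
print(parity_check(disks, parity, correct=True))   # correcting: reports 2, fixes them
print(parity_check(disks, parity, correct=False))  # follow-up non-correcting: 0
```

In this model the data itself is never wrong (matching a clean scrub); only the stale parity is, and the count stays the same across checks until a correcting check rewrites it.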

5 hours ago, trurl said:

Complete a correcting parity check, then immediately after run a non-correcting check. Don't reboot during any of this so all of it is recorded in the same syslog and you can post the diagnostics after.

 

If the non-correcting check still has sync errors, further investigation is needed.

 

I've got this running now and will update with the results.


@trurl So I went ahead and ran a correcting parity check followed by a non-correcting check. Both returned zero errors. Would the correcting parity check still notify you if it had found errors?

 

Nonetheless, it seems everything is fine now. I'll keep my eyes on it.


Interesting... I'm not sure how I went from having parity errors to suddenly not having them. If the correcting parity check didn't find any errors, I'm a bit curious why I was getting them up until this point. I've heard this can be caused by bad files stored on a drive, and I did move and delete some old data. I'm not sure whether that's true, or whether my moving data around had anything to do with what I was seeing.

 

Nonetheless, I appreciate the assistance everyone has provided me here and I'm crossing my fingers and hoping I don't run into this same scenario later down the road.

10 hours ago, itimpi said:

@Wzyss Perhaps you did not realise that a correcting check reports as "errors" every sector that it corrects. Any check run after that will report 0 unless a new error is found.

 

It's very possible that I ran a check with write corrections enabled, still saw the errors in the history, and assumed something was still wrong. That's probably when I came to the forum to post about this. Then, once I was told to run yet another check with write corrections enabled, it reported 0 errors. I wish the history made it easier to tell whether the "Write corrections" option was enabled; that would have cleared a lot of things up.

4 minutes ago, Wzyss said:

 

It's very possible that I ran a check with write corrections enabled, still saw the errors in the history, and assumed something was still wrong. That's probably when I came to the forum to post about this. Then, once I was told to run yet another check with write corrections enabled, it reported 0 errors. I wish the history made it easier to tell whether the "Write corrections" option was enabled; that would have cleared a lot of things up.

If you install the Parity Check Tuning plugin, then even if you do not use its other features, you will find that history entries get enhanced with that type of information.

Just now, itimpi said:

If you install the Parity Check Tuning plugin then even if you do not use its other features you will find that history entries get enhanced with that type of information.

 

I'll give that a look! Thanks for the info. :)
