[SOLVED] Parity Drive Errors


I have unRAID 6.8.3 set up to do a parity check monthly.  Every month, it finds 5 errors.  It seems likely that the errors are from the parity drive itself.  Are there any other options to correct this short of replacing the parity drive?  I tried to search this forum for suggestions, but didn't spot anything that matched my situation.  If I missed an applicable post, please reply with the link and I'll go from there.  Thanks in advance!

Link to comment

Looks like you rebooted since the last parity check, so there is nothing to see in the syslog.

 

On 1/18/2021 at 6:42 PM, KeithAbbott said:

parity check monthly.  Every month, it finds 5 errors

 

If your parity checks have been noncorrecting, then they are finding the same parity errors because they haven't been corrected.

 

You need to run a correcting parity check so the parity errors get corrected. Then you need to run a noncorrecting parity check to verify that there are zero sync errors. Exactly zero is the only acceptable result, and if you don't get that you haven't finished diagnosing your problem.

Link to comment

Thanks for the quick response.  My monthly parity check has had "Write corrections to parity disk" set to "Yes" ever since I first set it up.  I will rerun the parity check tonight and repost the diagnostics ZIP file; maybe that will give a better clue as to what is happening.  My parity check takes 19 hours to run, so I probably will not have anything to post until later tomorrow.

Link to comment

New diagnostics ZIP file attached.  This one was created after a parity check completed, but before any system reboot occurred.

 

Thanks for the info about the SAS2LP, looks like I will be shopping for an LSI controller card.  Although I would still like someone to take a look at my diagnostics file and confirm whether that was the root cause or not.

 

sagetv-diagnostics-20210125-1946.zip

Link to comment
Jan 24 22:28:25 SageTV kernel: mdcmd (62): check 
Jan 24 22:28:25 SageTV kernel: md: recovery thread: check P ...
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004440
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004448
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004456
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004464
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004472

Now to see if the next one has the same sync errors.
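
For anyone wanting to look at how those five sectors relate to each other, here is a minimal Python sketch. It assumes the syslog line format shown above and assumes the md driver reports sectors in 512-byte units; the function names (`parity_sectors`, `describe`) and the `SECTOR_BYTES` constant are made up for illustration and are not part of unRAID.

```python
import re

# Assumption: sectors in the md driver's log lines are 512-byte units.
SECTOR_BYTES = 512

# Matches both "P corrected" and "P incorrect" lines and captures the sector.
SECTOR_RE = re.compile(r"recovery thread: P (corrected|incorrect), sector=(\d+)")

def parity_sectors(syslog_text):
    """Return the sector numbers from parity-check lines, in log order."""
    return [int(m.group(2)) for m in SECTOR_RE.finditer(syslog_text)]

def describe(sectors):
    """Summarize count, spacing between neighbours, and first byte offset."""
    gaps = sorted({b - a for a, b in zip(sectors, sectors[1:])})
    return {
        "count": len(sectors),
        "gaps": gaps,
        "first_byte_offset": sectors[0] * SECTOR_BYTES if sectors else None,
    }

LOG = """\
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004440
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004448
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004456
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004464
Jan 25 00:10:39 SageTV kernel: md: recovery thread: P corrected, sector=2353004472
"""

if __name__ == "__main__":
    print(describe(parity_sectors(LOG)))
```

The five sectors are evenly spaced 8 apart, which under the 512-byte assumption would be 4 KiB steps, so the errors look like one small contiguous region rather than scattered damage.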

Link to comment

Well, I ran another parity check overnight, and wouldn't you know this one came back with zero errors.  First time in probably over a year.  My next scheduled parity check is a week from today, I'll see how that one goes and post the results.

 

In the meantime, I'm looking at replacing my SAS2LP with an LSI 9207-8i.  Good choice?  Anything I should be aware of?  Anything I need to do besides unplugging the SAS2LP and plugging the 9207-8i in?

Link to comment

I've rerun the parity check after rebooting, and attached the resulting diagnostics ZIP file.  Here's the relevant snippet from the syslog:

 

Feb  2 01:30:01 SageTV kernel: mdcmd (42): check
Feb  2 01:30:01 SageTV kernel: md: recovery thread: check P ...
Feb  2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004440
Feb  2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004448
Feb  2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004456
Feb  2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004464
Feb  2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004472

 

As you can see, the corrected sectors match the last parity check exactly.  Hopefully, this gives a clue as to the root cause of the problem and suggestions on where to go from here.  If the drive(s) are good, and this is strictly an issue with the SAS2LP controller card, I can replace the card with an LSI 9207-8i and resolve the issue.  However, if I have issues with the parity drive that require replacing the drive, that will be a more costly remedy.
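
The comparison between the two runs can also be done mechanically instead of by eye. This is only a rough sketch, assuming the same syslog line format as in the snippets in this thread; `reported_sectors` and `compare_runs` are hypothetical helper names, and the sample strings are shortened copies of the January and February log lines.

```python
import re

# Captures the sector number from "P corrected" or "P incorrect" lines.
SECTOR_RE = re.compile(r"P (?:corrected|incorrect), sector=(\d+)")

def reported_sectors(syslog_text):
    """Return the set of sector numbers reported in a syslog excerpt."""
    return {int(s) for s in SECTOR_RE.findall(syslog_text)}

def compare_runs(earlier, later):
    """Show which sectors repeat and which appear in only one run."""
    a, b = reported_sectors(earlier), reported_sectors(later)
    return {
        "repeated": sorted(a & b),
        "only_earlier": sorted(a - b),
        "only_later": sorted(b - a),
    }

JAN_25 = """\
md: recovery thread: P corrected, sector=2353004440
md: recovery thread: P corrected, sector=2353004448
md: recovery thread: P corrected, sector=2353004456
md: recovery thread: P corrected, sector=2353004464
md: recovery thread: P corrected, sector=2353004472
"""

FEB_02 = """\
md: recovery thread: P corrected, sector=2353004440
md: recovery thread: P corrected, sector=2353004448
md: recovery thread: P corrected, sector=2353004456
md: recovery thread: P corrected, sector=2353004464
md: recovery thread: P corrected, sector=2353004472
"""

if __name__ == "__main__":
    result = compare_runs(JAN_25, FEB_02)
    print(len(result["repeated"]), "sectors repeated between runs")
```

If "only_earlier" and "only_later" both come back empty, as they do for these two excerpts, the same sectors are being flagged in both runs even though each run logged them as corrected.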

 

Suggestions?

sagetv-diagnostics-20210202-2040.zip

Link to comment
  • 1 month later...

I have replaced my SAS2LP with an LSI 9207-8i.  I reran the parity check, this time without the setting to write corrections to the parity disk.  It is still finding the same five errors as before.  If I rerun parity check again (except this time writing corrections to the parity disk), will it make the corrections and then behave properly (no errors) thereafter?

 

     Mar  6 23:06:21 SageTV kernel: mdcmd (42): check nocorrect
     Mar  6 23:06:21 SageTV kernel: md: recovery thread: check P ...
     Mar  7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004440
     Mar  7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004448
     Mar  7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004456
     Mar  7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004464
     Mar  7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004472
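
To keep track of which checks were correcting and which were not, a syslog containing several runs can be split on the `mdcmd (...): check` lines. This is a hedged sketch based only on the log lines posted in this thread, where a plain `check` went with "P corrected" lines and `check nocorrect` went with "P incorrect" lines; `summarize_checks` is an invented name, and other unRAID versions may log things differently.

```python
import re

# Start of a parity check; the optional group is set for "check nocorrect".
CHECK_RE = re.compile(r"mdcmd \(\d+\): check\s*(nocorrect)?")
# A reported parity mismatch, whether it was written back or not.
ERROR_RE = re.compile(r"recovery thread: P (corrected|incorrect), sector=(\d+)")

def summarize_checks(syslog_text):
    """Split a syslog into parity-check runs and collect each run's sectors."""
    runs = []
    for line in syslog_text.splitlines():
        start = CHECK_RE.search(line)
        if start:
            # No "nocorrect" suffix is taken to mean a correcting check.
            runs.append({"correcting": start.group(1) is None, "sectors": []})
            continue
        hit = ERROR_RE.search(line)
        if hit and runs:
            runs[-1]["sectors"].append(int(hit.group(2)))
    return runs

SAMPLE = """\
Feb  2 01:30:01 SageTV kernel: mdcmd (42): check
Feb  2 01:30:01 SageTV kernel: md: recovery thread: check P ...
Feb  2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004440
Feb  2 03:12:23 SageTV kernel: md: recovery thread: P corrected, sector=2353004448
Mar  6 23:06:21 SageTV kernel: mdcmd (42): check nocorrect
Mar  6 23:06:21 SageTV kernel: md: recovery thread: check P ...
Mar  7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004440
Mar  7 00:48:05 SageTV kernel: md: recovery thread: P incorrect, sector=2353004448
"""

if __name__ == "__main__":
    for run in summarize_checks(SAMPLE):
        mode = "correcting" if run["correcting"] else "non-correcting"
        print(mode, "check:", len(run["sectors"]), "sync errors")
```

A non-correcting run with an empty sector list is the result to look for after a correcting pass; anything else means the diagnosis is not finished.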

Link to comment

OK, thanks.  I've rerun the parity check once again, this time with the setting to write corrections to the parity disk.  My next scheduled parity check is in early April, so I'll know then whether the new controller has solved the problem or not.

 

With the new LSI 9207-8i controller comes a new question.  I think the firmware version is 18.xx.xx.xx, so I am thinking that I should upgrade the firmware to 20.00.07.00.  I am running it in IT mode, and from researching on this forum, it looks like I should probably remove/erase the BIOS, which would speed up the boot process a bit.

 

In summary, upgrade the firmware and remove the BIOS.  Any cautions against doing that?  Any guidance either for or against?

Link to comment

Thanks much for the reply, looks like I found my weekend project.  I do have a spare Windows 10 workstation that I will use, instead of my server.  That way, I'm not messing with my "production" box, and I can then erase the BIOS also.  Thanks for providing the link, that gave me exactly what I need.

 

Oh, one other question.  Is it best practice to run the monthly parity check with the setting to not write corrections to the parity disk?

Edited by KeithAbbott
Link to comment

As a quick note up front: I updated to Unraid 6.9.0 this morning, and everything worked as before with no issues.
I kept watching a bunch of series through Plex over the course of the day, with no issues there either.

When I wanted to copy a picture to the server and then delete it, it told me that the "medium" was read/write-protected, which was the first unusual thing.

So I went into the WebGUI, and there was a big red error message regarding the parity disk, which is now disabled.

I rebooted the system shortly after that because I thought it would help, and that was a mistake: it did not. It said something like: sector issue, cannot read something. It wouldn't surprise me if the drive is dead, but I hope not.

Fun fact: the parity disk also will not spin down when the array spins down.

So now I'm stuck with an unprotected array and a disabled parity disk. Anyway, diagnostics are attached.

 

server-diagnostics-20210309-1955.zip

Edited by deltaexray
Forgot something
Link to comment
  • JorgeB changed the title to [SOLVED] Parity Drive Errors
