Monthly parity check was aborted, rerunning it yields tons of errors. What should I do?


Solved by JorgeB

Hi to anyone who can help. I fear my server is having a really bad time at the moment.

 

My scheduled monthly parity check was aborted because of errors; it reported 177. While looking for clear instructions on the right thing to do, I tried running a new parity check. In about two minutes it reported millions of errors. I have not rebooted the server yet, as I want to make sure nothing dies in the process.

 

Trying to explore files on the server fails and it seems every disk is empty.

 

Here are the diagnostics. Any help with the next steps to take is greatly appreciated. Thank you.

pegasus-diagnostics-20231201-1834.zip

  • Solution
Dec  1 12:15:59 Pegasus kernel: ahci 0000:01:00.1: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x0000100000000080 flags=0x0010]
Dec  1 12:15:59 Pegasus kernel: ahci 0000:01:00.1: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x0000100000000000 flags=0x0010]

 

These entries point to onboard SATA controller issues. This was pretty common with older kernels on Ryzen/Threadripper boards, and I don't usually see it with current kernels, so try updating Unraid to the latest release and retest.
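
For anyone who wants to check their own syslog for the same symptom, here is a minimal Python sketch. It only counts lines shaped like the two entries quoted above, grouped by driver and PCI address. The function name `count_page_faults` and the regular expression are my own assumptions for illustration, not part of Unraid or its diagnostics tooling.

```python
import re

# Matches kernel lines shaped like the ones quoted in this thread, e.g.
#   ... kernel: ahci 0000:01:00.1: Event logged [IO_PAGE_FAULT domain=... ]
# The pattern is an assumption based on those two lines only.
FAULT_RE = re.compile(
    r"kernel: (?P<driver>\S+) "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Event logged \[IO_PAGE_FAULT\b"
)


def count_page_faults(lines):
    """Return {(driver, pci_address): count} of IO_PAGE_FAULT events."""
    counts = {}
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            key = (match.group("driver"), match.group("pci"))
            counts[key] = counts.get(key, 0) + 1
    return counts


# The two log lines from this thread, used as sample input.
SAMPLE = [
    "Dec  1 12:15:59 Pegasus kernel: ahci 0000:01:00.1: Event logged "
    "[IO_PAGE_FAULT domain=0x0000 address=0x0000100000000080 flags=0x0010]",
    "Dec  1 12:15:59 Pegasus kernel: ahci 0000:01:00.1: Event logged "
    "[IO_PAGE_FAULT domain=0x0000 address=0x0000100000000000 flags=0x0010]",
]

for (driver, pci), n in sorted(count_page_faults(SAMPLE).items()):
    print(f"{driver} {pci}: {n} IO_PAGE_FAULT event(s)")
# prints: ahci 0000:01:00.1: 2 IO_PAGE_FAULT event(s)
```

To run it against a real log, pass an open file object (for example the syslog from the diagnostics zip) instead of `SAMPLE`. A high count against an `ahci` device would be consistent with the controller problem described here, but it is only a hint, not a diagnosis.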

  • 4 weeks later...

Hey! Sorry for the delay in my answer, I didn't want to spam and also didn't really have the time with work. These last few days I have had the time to try to resolve the issue and your diagnosis was right! Here are the steps I went through, should someone else stumble upon this thread with the same issue.

 

I bought and installed a controller to avoid the problematic motherboard controller, the JMicron JMB585 to be precise.

I unplugged the appropriate drive, replaced its SATA data cable, plugged it into the new controller, and restarted my server.

From there I unassigned the hard drive associated with the disabled disk and started the array in maintenance mode.

I then stopped the array (not the server), assigned the drive to the disk once more, and started the array, this time *not* in maintenance mode.

From there it started to rebuild the disk, and after a few hours, it's now back to business and behaving as expected!
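
If you want to double-check that a drive really ended up on the new controller after recabling, one option is to look at the sysfs path of the block device. This is a minimal Python sketch that assumes the usual Linux layout, where `/sys/block/<dev>` resolves to a path containing the PCI address of each bridge and controller on the way to the disk. The function names and the sample path are hypothetical and were not taken from my diagnostics.

```python
import os
import re

# PCI addresses look like 0000:01:00.1 (domain:bus:device.function).
PCI_RE = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_pci_address(sysfs_path):
    """Return the last PCI address in a resolved sysfs path, or None.

    The last address is the device closest to the disk, which for a
    SATA drive is normally the storage controller.
    """
    matches = PCI_RE.findall(sysfs_path)
    return matches[-1] if matches else None


def controller_for_disk(dev):
    """Resolve /sys/block/<dev> (e.g. 'sdb') and return the controller address."""
    return controller_pci_address(os.path.realpath(f"/sys/block/{dev}"))


# Hypothetical resolved path for a disk on the controller from the log above.
SAMPLE_PATH = (
    "/sys/devices/pci0000:00/0000:00:01.1/0000:01:00.1/"
    "ata2/host1/target1:0:0/1:0:0:0/block/sdb"
)
print(controller_pci_address(SAMPLE_PATH))
# prints: 0000:01:00.1
```

On the server itself, calling `controller_for_disk("sdb")` (with your own device name) before and after the move should return a different address once the drive is on the add-in card, if the layout assumption holds.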

 

Thanks a lot for your help diagnosing the issue. Should anyone want to see how I chose the controller, I used the information from this thread:

When I have time to tackle a bigger beast, I'll try to update everything hahaha, thanks again and have a nice day and nice holidays! :)
