Monthly Parity check was aborted, rerunning it yields tons of errors, what should I do

AntoineR · December 1, 2023

Hi to anyone who could help me, I fear my server is having a really bad time at the moment.

I had a monthly scheduled parity check that was aborted because it found errors; it reported 177. While looking for clear instructions on the right thing to do, I tried running a new parity check, and within about two minutes it reported millions of errors. I have not rebooted the server yet, as I want to make sure nothing dies in the process.

Trying to browse files on the server fails, and every disk appears to be empty.

Here are the diagnostics. Any help with the next steps to take is greatly appreciated. Thank you.

pegasus-diagnostics-20231201-1834.zip

JorgeB · December 2, 2023

Dec  1 12:15:59 Pegasus kernel: ahci 0000:01:00.1: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x0000100000000080 flags=0x0010]
Dec  1 12:15:59 Pegasus kernel: ahci 0000:01:00.1: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x0000100000000000 flags=0x0010]

These are onboard SATA controller issues. This was pretty common with older kernels and Ryzen/Threadripper boards, but I don't usually see it with current kernels, so try updating Unraid to the latest release and retest.
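For anyone who wants to check their own syslog for the same symptom, here is a minimal Python sketch that picks out AHCI `IO_PAGE_FAULT` lines. The regular expression is inferred only from the two lines quoted above, and the function name `find_ahci_page_faults` is made up for illustration, so treat it as a starting point rather than a complete detector.

```python
import re

# Pattern inferred from the two kernel lines quoted in this thread (an assumption,
# not an official log format specification).
PATTERN = re.compile(
    r"kernel: ahci (?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r"Event logged \[IO_PAGE_FAULT .*?address=(?P<address>0x[0-9a-f]+)"
)


def find_ahci_page_faults(lines):
    """Return a (pci_device, address) pair for every AHCI IO_PAGE_FAULT line."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append((match.group("device"), match.group("address")))
    return hits


if __name__ == "__main__":
    sample = [
        "Dec  1 12:15:59 Pegasus kernel: ahci 0000:01:00.1: Event logged "
        "[IO_PAGE_FAULT domain=0x0000 address=0x0000100000000080 flags=0x0010]",
        "Dec  1 12:15:59 Pegasus kernel: ahci 0000:01:00.1: Event logged "
        "[IO_PAGE_FAULT domain=0x0000 address=0x0000100000000000 flags=0x0010]",
    ]
    for device, address in find_ahci_page_faults(sample):
        print(device, address)
```

The function accepts any iterable of text lines, so an open syslog file can be passed in directly instead of the sample list.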

AntoineR · December 24, 2023

Hey! Sorry for the delay in my answer, I didn't want to spam and also didn't really have the time with work. These last few days I have had the time to try to resolve the issue and your diagnosis was right! Here are the steps I went through, should someone else stumble upon this thread with the same issue.

I bought and installed a controller to avoid the problematic motherboard controller, the JMicron JMB585 to be precise.

I unplugged the appropriate drive, replaced its SATA data cable, plugged it into the new controller, and restarted my server.

From there I unassigned the hard drive associated with the disabled disk and started the array in maintenance mode.

I then stopped the array (not the server), assigned the drive to the disk once more, and started the array, this time *not* in maintenance mode.

From there it started to rebuild the disk, and after a few hours, it's now back to business and behaving as expected!

Thanks a lot for your help diagnosing the issue, should anyone want to see how I chose the controller, I used the information from this thread :

When I have time to tackle a bigger beast, I'll try to update everything hahaha, thanks again and have a nice day and nice holidays!
