Multiple disk errors during parity drive upgrade



Something terrible seems to have happened to my server while I was performing some upgrades 😨

 

I wanted to replace the existing 10TB parity drive with a 14TB one so that I could add larger capacity drives to the array. I ran a parity check beforehand, which reported 0 errors.

 

After replacing the drive, Unraid began a parity rebuild, but it didn't get very far before all drives in the array began spewing errors into the system log. I've tried the process a couple more times and the same thing keeps happening. After stopping the rebuild and attempting a recursive ls on each disk, some files/directories can be accessed while others fail with "Input/output error". I can't view the SMART status of any of the drives either; it just says that "a mandatory command failed".
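
For anyone who wants to repeat the recursive listing in a more systematic way, here is a minimal Python sketch that walks a mount point and records every path that fails with "Input/output error" (EIO). The function name `find_io_errors` is made up for illustration, and it only checks that each entry can be listed and stat'ed, not that file contents are readable.

```python
import errno
import os


def find_io_errors(root):
    """Walk `root` and return (path, message) pairs for entries failing with EIO."""
    failures = []

    def on_walk_error(err):
        # os.walk calls this when it cannot list a directory.
        if err.errno == errno.EIO:
            failures.append((err.filename, err.strerror))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reads the entry's metadata without following symlinks.
                os.lstat(path)
            except OSError as err:
                if err.errno == errno.EIO:
                    failures.append((path, err.strerror))
    return failures
```

Calling `find_io_errors("/mnt/disk1")` would be the assumed usage, once per array disk, taking `/mnt/disk1` as that disk's mount point; an empty list means no entry returned an I/O error during the walk.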

 

What the hell has happened?? How could all 7 disks in the array simultaneously go bad when a full parity check beforehand reported no errors?

 

I didn't touch any of the cables to the array drives during the upgrade... the new drive is connected to a totally different SATA controller/PSU power cable. In the interest of full disclosure, I also added more RAM to the system recently, but this has been in place for a week without issues, and was there during the parity check prior to adding the new drive.

 

screenshot_2020-04-02_at_19_41_16.png

holt-diagnostics-20200403-1326.zip

12 minutes ago, jam said:

I'm guessing that the motherboard is screwed then?

I wouldn't say the motherboard is bad, since this happens with multiple Ryzen models; it's more likely a kernel/compatibility issue. You can also try v6.9-beta1, which uses a much newer kernel. If it's still the same, a different model of board might help, if you're lucky.

18 hours ago, jam said:

Okay, I'm quietly confident that the 6.9 beta is working properly 🤞 The parity rebuild is at 10% now without errors; it never reached 5% before.

I'm having similar issues on an X399 TR board. Did the rebuild complete with the 6.9-beta release?

Just now, dcoulson said:

I'm having similar issues on an X399 TR board. Did the rebuild complete with the 6.9-beta release?

It’s still going but it’s reached 75% without issues. I’m using an X399 TR board too (ASRock Taichi) so I’d recommend giving the beta a try.

On 4/4/2020 at 7:50 AM, jam said:

It’s still going but it’s reached 75% without issues. I’m using an X399 TR board too (ASRock Taichi) so I’d recommend giving the beta a try.

Did it complete successfully? I tried the beta and had the same issue. Trying with IOMMU disabled in the BIOS now...
