(Solved) Disk6 in Error State after using the process of shrinking array with Parity intact


Solved by JorgeB

So I've been having issues with UDMA CRC errors. See my previous post, "UDMA CRC ERRORS FROM ONLY 1 DISK FROM LSI BROADCOM SAS 9300-8I".

 

I haven't yet been able to build my new case, which has no backplane. I suspect the backplane is the underlying issue.

In the meantime I decided to use SpaceInvaderOne's video on shrinking my array and preserving Parity so that I can RMA Disk4, which also had UDMA CRC errors. Disk6 in my other post was out of warranty unfortunately. Thanks @SpaceInvaderOne, your videos are amazing. 

I was able to move all of my data from Disk4 onto other drives, then clear the drive. I had zero issues following his video.

However, when I started the array after unassigning Disk4, Disk6 threw errors and went disabled. 

I just ordered a new hard drive because I don't have a spare. Yes I know, I should always have at least one spare. I'm an idiot.

 

My question is: I don't see any 'bad' errors, like Reallocated or Pending sectors. So is the disk actually bad, or is there another issue, like the bad backplane from my previous post or maybe a problem with my LSI card?
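For anyone reading along with the same question, here is a minimal sketch of that triage in Python. It assumes the attribute table has the usual `smartctl -A` column layout, and the `SAMPLE` text is made up for illustration (it is not taken from the attached SMART report). The function names `parse_smart_table` and `triage` are hypothetical, and treating attributes 5/197/198 as "media" and 199 as "link" is a rule of thumb, not an Unraid rule.

```python
# Standard ATA SMART attribute IDs. The media-vs-link grouping below is a
# rule of thumb for triage, not something Unraid itself enforces.
MEDIA_ATTRS = {5: "Reallocated_Sector_Ct",
               197: "Current_Pending_Sector",
               198: "Offline_Uncorrectable"}
LINK_ATTRS = {199: "UDMA_CRC_Error_Count"}


def parse_smart_table(text):
    """Return {attribute_id: raw_value} from `smartctl -A` style rows."""
    raw = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped.
        if len(parts) >= 10 and parts[0].isdigit():
            try:
                raw[int(parts[0])] = int(parts[9])
            except ValueError:
                continue  # raw value is not a plain integer, ignore it
    return raw


def triage(raw):
    """Classify a disk as 'media', 'link', or 'clean' from raw counters."""
    if sum(raw.get(i, 0) for i in MEDIA_ATTRS) > 0:
        return "media"   # sectors going bad: suspect the disk itself
    if sum(raw.get(i, 0) for i in LINK_ATTRS) > 0:
        return "link"    # CRC errors only: suspect cable, backplane, or HBA
    return "clean"


# Illustrative values only (hypothetical, not from the attached report).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       4
"""

print(triage(parse_smart_table(SAMPLE)))
```

A disk that only ever increments attribute 199 is usually pointing at the data path rather than the platters, which is why the backplane and HBA are reasonable suspects here.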

 

I am 100% going to leave my server off until I get the new drive, then Preclear it and rebuild onto it. But should I rebuild everything into the new case I got, which doesn't have a backplane, or stay in my current case with the backplane and rebuild the array with the new disk there?

I just don't want to keep getting all these errors especially when I'm trying to rebuild a disk. 

 

I'm not sure if the Diagnostics file will show all the errors or not, but I'm including it along with Disk6's SMART report.

I also started a SMART extended self-test on the disabled disk. 

Screenshot 2022-04-10 174500.jpg

Screenshot 2022-04-10 174521.jpg

threadripper19-diagnostics-20220410-1708.zip threadripper19-smart-20220410-1709.zip

5 hours ago, JorgeB said:

Assuming nothing was written to disk6 since it got disabled, I would do a new config to re-enable it, then run a correcting parity check.

I just want to verify with you, but is this the procedure you would like me to follow to re-enable the disk:

749801002_Screenshot2022-04-11153126.thumb.jpg.3d6f7e8ddb7874c91016c79c6ca96747.jpg

Rebuilding a drive onto itself

I haven't written anything to the disk that I know of. I've disabled mover (set it to run monthly, so not until the 1st of the month), disabled my dockers, and haven't written to the data disks from other computers in my house.

 

Thanks again for the help.

  • Solution
10 hours ago, FQs19 said:

I just want to verify with you, but is this the procedure you would like me to follow to re-enable the disk:

No, I mentioned doing a new config: Tools -> New Config, keep all assignments, check parity is already valid before array start, then run a correcting check.

7 hours ago, JorgeB said:

No, I mentioned doing a new config: Tools -> New Config, keep all assignments, check parity is already valid before array start, then run a correcting check.

Glad I checked with you first. 

 

I did exactly what you said, Tools>New Config>Keep All Assignments, then checked the box for 'Parity is Valid', started array, then started a Correcting Parity Check by checking the box 'Write Corrections to Parity'.

 

It'll take at least 19hrs for it to finish.

About 2 minutes after starting the array, Disk6 logged 4 more UDMA CRC errors.

I need this correcting parity check to finish so I can shut down the server and move everything over to my new case ASAP.

 

Thanks so much for the help.

19 hours ago, JorgeB said:

No, I mentioned doing a new config: Tools -> New Config, keep all assignments, check parity is already valid before array start, then run a correcting check.

Just wanted to give an update:

 

I'm in the middle of the correcting parity check and I'm at 517 sync errors. I'm not seeing any errors on the disks though. 

Should I do another correcting parity check after this finishes or just a parity check?

I understand that a parity check should always come back with 0 errors. So I assume I should run another correcting check, rather than waste time on a non-correcting check and then have to run a correcting one anyway if it finds errors.

 

Unraid Parity check1.jpg

Unraid Parity check2.jpg

threadripper19-diagnostics-20220412-2151.zip

9 hours ago, itimpi said:

If you are running a correcting check then it should be fixing the errors reported. The next check should be non-correcting and, if everything is good, will come back with 0 errors.

 

5 hours ago, JorgeB said:

Some sync errors are expected because of what happened, so just let it finish. As itimpi mentioned, you can then run a non-correcting check if you want to confirm all is fine, and it should be.

Thank you both for the help.

I'm at 91% on the parity check and the errors are still at 517.

I'll let it finish then run a non-correcting parity check to confirm 0 errors.
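The correcting-then-non-correcting sequence described above can be sketched with a toy model. This is a simplification that assumes a single XOR parity disk and tiny in-memory "disks" (lists of byte values); it does not model the second parity disk or how Unraid actually implements its checks. The names `expected_parity` and `parity_check` are hypothetical.

```python
from functools import reduce
from operator import xor


def expected_parity(data_disks, stripe):
    """XOR of the same stripe across every data disk."""
    return reduce(xor, (disk[stripe] for disk in data_disks), 0)


def parity_check(data_disks, parity, correct):
    """Count stripes whose stored parity is wrong; rewrite them if correct=True."""
    sync_errors = 0
    for stripe in range(len(parity)):
        want = expected_parity(data_disks, stripe)
        if parity[stripe] != want:
            sync_errors += 1
            if correct:
                parity[stripe] = want  # a correcting check writes the fix to parity
    return sync_errors


# Three toy data disks with three stripes each (made-up values).
data = [[0x11, 0x22, 0x33], [0xA0, 0xB0, 0xC0], [0x0F, 0xF0, 0x55]]
parity = [expected_parity(data, s) for s in range(3)]
parity[1] ^= 0xFF  # simulate one stripe of stale parity

first = parity_check(data, parity, correct=True)    # finds and fixes the mismatch
second = parity_check(data, parity, correct=False)  # read-only confirmation pass
print(first, second)
```

In this model the first pass reports the mismatch and repairs parity, so the follow-up non-correcting pass has nothing left to report, which is the outcome the replies above describe.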

I can then finally switch this server over to my new case that doesn't have a backplane. 

9200908_Screenshot2022-04-13082812.thumb.jpg.3439a938b66e00f895e12b575402d76e.jpg

  • 4 weeks later...

I finally had time to move my server into a new case. 

I moved to a Rosewill RSV-L4500U, which doesn't have a SATA backplane or hot-swap bays. It has three 5-disk bays. I ordered new SATA power cables because my power supply only came with four 1x3 ones. I wish they made 1x5 cables, but I got 1x4 ones from EVGA (the "4x SATA Cable (Single)"). I also got extensions to run from my fourth SATA power cable to the 5th disk in each bay. Cable management is difficult without purchasing custom cables (which I'm not going to do).

I wish I could use the software for my MSI MPG Series CORELIQUID K360 AIO, but I'm just using the motherboard's fan controllers.

 

I have 4 disks (including the 2 Parity disks) connected to my motherboard's TRX40 chipset controller.

I have the remaining 8 disks connected to my HBA LSI 9300 8i card.

I will have 2 disks connected to my motherboard's ASMedia controller as spares.

 

I'm currently running a Correcting Parity Check. I will do a Non-Correcting Parity Check after it finishes. 

I'm sure there will be errors since I stopped the Correcting Parity Check before it finished. 

 

I haven't seen any errors being reported since switching to the new case. Fingers crossed that it was the old case's backplane causing all my errors. 

IMG_0368.thumb.JPG.8bb9e43de3fddc034ef27952f0b50d6f.JPG

 

If you guys have any suggestions on what to do after I complete my parity checks, please let me know. 

Thanks again for the help.

  • FQs19 changed the title to (Solved) Disk6 in Error State after using the process of shrinking array with Parity intact
