Parity swap: Second data drive has disabled after starting data rebuild


Go to solution Solved by JorgeB,

Hi all,

 

I'm in the middle of the parity swap procedure: a new 12TB drive is replacing the existing 10TB parity drive, which will in turn replace an old 3TB data drive that had disabled itself.

 

The copy of parity to the 12TB drive completed successfully, and I have just started the final step: starting up the array to rebuild the data onto what was the old 10TB parity drive. About 15 minutes into this step, another data drive (not involved in the swap procedure at all) has gone and disabled itself.

 

I would appreciate advice on my best/safest course of action now.

 

Have attached a screenshot and diagnostics for reference. Thanks!

Parity swap - another drive disabled.png

themagiceye-diagnostics-20230505-1034.zip

Edited by nametaken_thisonetoo
Correcting errors
21 hours ago, JorgeB said:

Disk2 dropped offline. It looks more like a power/connection issue, but since it dropped there's no SMART. Let the rebuild finish, then check/replace the cables for disk2 and post new diags after array start.

OK, so I have now replaced all my SATA cables, including the one for disk 2. On power-up I see a message suggesting the array has turned good and now has no disks with read errors. Do I now just stop the array and add disk 2 back into the array? Diagnostics also attached. Thanks again.

 

image.thumb.png.16129413d75e97e877cffb4459ea18c1.png

 

themagiceye-diagnostics-20230506-1459.zip

21 hours ago, JorgeB said:

Disk2 dropped offline. It looks more like a power/connection issue, but since it dropped there's no SMART. Let the rebuild finish, then check/replace the cables for disk2 and post new diags after array start.

Sheesh, I have also just noticed that the 10TB data drive that just finished rebuilding is reporting as Unmountable: Wrong or no file system. It can be seen in the screenshot in my previous post.

Not sure what this is about as the rebuild appeared successful.

6 hours ago, JorgeB said:

Check filesystem on disk3.

Things seem very off. The first -nv check ran for about a second and reported no issues. I ran it again and it printed literally thousands of lines as the test ran, then provided the following results, again with no suggested repairs. Very unsure what to do from here: format disk 3 and rebuild it again?

image.thumb.png.39045e18ff63c6d886f076cc9a2c8265.png

themagiceye-diagnostics-20230507-0010.zip

4 minutes ago, itimpi said:

A rebuild will not fix an unmountable drive, and a format will wipe its contents, so you do not want to do that!
 

You might want to run an extended SMART test on disk3 to check its health.

 

Were the diagnostics taken after trying the check filesystem?   Asking as the last thing in the syslog seems to be a cancellation of a correcting parity check.

 

 

Yep, the diagnostics were taken immediately after the second test. I have no idea how/why there was a cancellation of a correcting parity check. So many strange things keep happening.

19 hours ago, itimpi said:

A rebuild will not fix an unmountable drive, and a format will wipe its contents, so you do not want to do that!
 

You might want to run an extended SMART test on disk3 to check its health.

 

Were the diagnostics taken after trying the check filesystem?   Asking as the last thing in the syslog seems to be a cancellation of a correcting parity check.

 

 

Extended SMART test has completed without error. Have attached the results as well as the latest diagnostics. @JorgeB, I will run a memtest next and see how that goes. I do have an issue with memory filling up due to a poor config somewhere in one of my containers, but I assume that's not going to contribute to this? Besides, Docker has been disabled since this all started last week.
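
For anyone following along from a terminal rather than the GUI, here is a rough sketch of how the extended SMART test mentioned above might be started and its result read back with `smartctl`. The helper names and the `/dev/sdX` device are placeholders I made up for illustration, not anything taken from the diagnostics; the script only prints the commands and does not run them.

```python
import shlex
from typing import List


def smart_long_test_cmd(device: str) -> List[str]:
    """Command that starts an extended (long) SMART self-test on a device."""
    return ["smartctl", "-t", "long", device]


def smart_selftest_log_cmd(device: str) -> List[str]:
    """Command that prints the device's SMART self-test log."""
    return ["smartctl", "-l", "selftest", device]


# Assumption: /dev/sdX is a placeholder; substitute the real device for the disk.
DEVICE = "/dev/sdX"

for cmd in (smart_long_test_cmd(DEVICE), smart_selftest_log_cmd(DEVICE)):
    # Print only; nothing is executed by this sketch.
    print(shlex.join(cmd))
```

The extended test runs in the background on the drive itself, so the log command is only meaningful once the test has had time to finish.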

 

themagiceye-smart-20230507-1811.zip themagiceye-diagnostics-20230507-1838.zip

Edited by nametaken_thisonetoo
Clarity

Apologies @JorgeB for the noob question, but I've started the memtest. It has been running for about 10 mins and all that's happened is that below the Boot Options menu there is a single line of text that says "Loading /memtest... ok". Should something else have happened/progressed, or is this normal behaviour?

Also curious how long I should run the test for - anywhere from a few hours to multiple days seems to be recommended in various forum posts. Thanks again

5 hours ago, JorgeB said:

Run a quick memtest to rule out obvious RAM issues and if OK run xfs_repair again without -n.

Alrighty, so it passed the memtest without any errors, and I ran xfs_repair again without the -n. It found some issues (see screenshot), but appears to have fixed them, as when I ran it again with the -n there were no issues this time around. Fired up the array out of maintenance mode, and Disk 3 is back!
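
In case it helps someone who lands here later, the check, repair, re-check sequence described above could be sketched roughly like this. The `/dev/md3` device path and the `xfs_repair_cmd` helper are assumptions for illustration only (the actual device for disk 3 depends on the Unraid version and setup), and the script just prints the commands rather than running them.

```python
import shlex
from typing import List


def xfs_repair_cmd(device: str, check_only: bool = True, verbose: bool = True) -> List[str]:
    """Build an xfs_repair command line.

    check_only=True adds -n (no-modify mode: report problems, change nothing).
    verbose=True adds -v.
    """
    cmd = ["xfs_repair"]
    if check_only:
        cmd.append("-n")
    if verbose:
        cmd.append("-v")
    cmd.append(device)
    return cmd


# Assumption: hypothetical device path for disk 3 with the array in maintenance mode.
DEVICE = "/dev/md3"

sequence = [
    xfs_repair_cmd(DEVICE, check_only=True),   # 1. read-only check (-nv)
    xfs_repair_cmd(DEVICE, check_only=False),  # 2. actual repair (no -n)
    xfs_repair_cmd(DEVICE, check_only=True),   # 3. re-check to confirm it is clean
]

for cmd in sequence:
    # Print only; nothing is executed by this sketch.
    print(shlex.join(cmd))
```

Keeping the read-only pass as the default means a repair only happens when `check_only=False` is passed explicitly.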

Thank you so much!

One last question: given that the SMART data for disk 2 seems fine, is the best plan just to rebuild the drive on top of itself?

 

image.thumb.png.376a3ee73047ce6bc03d79934fc38c09.png

 

themagiceye-smart-20230507-2336.zip

Edited by nametaken_thisonetoo
added SMART data