nametaken_thisonetoo, posted May 5, 2023 (edited May 5, 2023 by nametaken_thisonetoo: correcting errors)

Hi all, I'm in the middle of completing the parity swap procedure: a new 12TB drive is replacing the existing 10TB parity drive, which will then replace an old 3TB data drive that had disabled itself. The copy of parity to the 12TB drive completed successfully, and I have just started the final step, which is starting the array for the data rebuild onto what was the old 10TB parity drive. About 15 minutes into this step, another data drive (not involved in the swap procedure at all) disabled itself. I would appreciate advice on the best/safest course of action now. I have attached a screenshot and diagnostics for reference. Thanks!

Attachment: themagiceye-diagnostics-20230505-1034.zip
JorgeB, posted May 5, 2023 (marked as solution)

Disk2 dropped offline. This looks more like a power/connection issue, but since the disk dropped there's no SMART report. Let the rebuild finish, then check/replace the cables for disk2 and post new diags after array start.
nametaken_thisonetoo (author), posted May 6, 2023, replying to JorgeB's post from 21 hours earlier

OK, so I have now replaced all my SATA cables, including the one for disk 2. On power-up there is a message suggesting the array has turned good and now has no disks with read errors. Do I now just stop the array and add disk 2 back into the array? Diagnostics also attached. Thanks again.

Attachment: themagiceye-diagnostics-20230506-1459.zip
nametaken_thisonetoo (author), posted May 6, 2023, replying to JorgeB's post from 21 hours earlier

Sheesh, I have also just noticed that the 10TB data drive that just finished rebuilding is reporting as "Unmountable: Wrong or no file system". It can be seen in the screenshot in my previous post. Not sure what this is about, as the rebuild appeared successful.
nametaken_thisonetoo (author), posted May 6, 2023

6 hours ago, JorgeB said: "Check filesystem on disk3."

Things seem very off. The first -nv check ran for about a second and reported no issues. I ran it again and got literally thousands of lines of output as the test ran, followed by the results shown in the screenshot, again with no suggested repairs. I'm very unsure what to do from here. Should I format disk 3 and rebuild it again?

Attachment: themagiceye-diagnostics-20230507-0010.zip
itimpi, posted May 6, 2023

A rebuild will not fix an unmountable drive, and a format will wipe its contents, so you do not want to do that! You might want to run an extended SMART test on disk3 to check its health. Were the diagnostics taken after trying the filesystem check? I ask because the last thing in the syslog seems to be the cancellation of a correcting parity check.
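For anyone who prefers the command line to the GUI for the extended SMART test mentioned above, here is a minimal sketch that only builds and prints the relevant `smartctl` invocations (it does not run them). The device path `/dev/sdX` is a placeholder and the helper name `smart_commands` is made up for illustration; substitute the actual device for disk3.

```python
# Minimal sketch: build (but do not run) smartctl command lines for an
# extended self-test. "/dev/sdX" is a placeholder, not a real device
# from this thread, and smart_commands is a hypothetical helper name.

def smart_commands(device):
    """Return smartctl argument lists for an extended (long) self-test."""
    return {
        # Start the extended (long) self-test; it runs on the drive itself.
        "start_extended_test": ["smartctl", "-t", "long", device],
        # Show the self-test log to see progress and results.
        "self_test_log": ["smartctl", "-l", "selftest", device],
        # Print all SMART information for the device.
        "full_report": ["smartctl", "-a", device],
    }


if __name__ == "__main__":
    for name, args in smart_commands("/dev/sdX").items():
        print(f"{name}: {' '.join(args)}")
```

An extended test on a large drive can take many hours, so checking the self-test log periodically is usually more practical than waiting at the console.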
nametaken_thisonetoo (author), posted May 6, 2023, replying to itimpi's post from 4 minutes earlier

Yep, the diagnostics were taken immediately after the second test. I have no idea how or why a correcting parity check was cancelled. So many strange things keep happening.
JorgeB, posted May 7, 2023

Run a quick memtest to rule out obvious RAM issues and, if that is OK, run xfs_repair again without -n.
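The check, repair, re-check sequence discussed in this thread can be sketched as follows. This is only an illustration that prints the commands rather than running them. The device path `/dev/md3` for disk3 is an assumption (the exact device name depends on the Unraid version, and the array should be started in maintenance mode), and `xfs_repair_command` is a hypothetical helper name.

```python
# Minimal sketch: build (but do not run) the xfs_repair command lines
# for a check -> repair -> re-check sequence.
# ASSUMPTION: "/dev/md3" stands in for disk3's device; verify the real
# device name on your system before running anything.

def xfs_repair_command(device, check_only=True):
    """Return an xfs_repair argument list for the given device."""
    args = ["xfs_repair"]
    if check_only:
        args.append("-n")  # no-modify mode: report problems, change nothing
    args.append("-v")      # verbose output
    args.append(device)
    return args


if __name__ == "__main__":
    device = "/dev/md3"  # assumption, see note above
    steps = (("check", True), ("repair", False), ("re-check", True))
    for label, check_only in steps:
        print(f"{label}: {' '.join(xfs_repair_command(device, check_only))}")
```

Running with -n first keeps the initial pass read-only, so nothing is modified until the reported problems have been reviewed.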
nametaken_thisonetoo (author), posted May 7, 2023 (edited May 7, 2023 by nametaken_thisonetoo: clarity)

19 hours ago, itimpi said: "A rebuild will not fix unmountable drive, and a format will wipe its contents so you do not want to do that.! You might want to run an extended SMART test on disk3 to check its health. Were the diagnostics taken after trying the check filesystem? Asking as the last thing in the syslog seems to be a cancellation of a correcting parity check."

The extended SMART test has completed without error. I have attached the results as well as the latest diagnostics. @JorgeB, I will run a memtest next and see how that goes. I do have an issue with memory filling up due to poor config somewhere in one of my containers, but I assume that's not going to contribute to this? Besides, Docker has been disabled since this all started last week.

Attachments: themagiceye-smart-20230507-1811.zip, themagiceye-diagnostics-20230507-1838.zip
nametaken_thisonetoo (author), posted May 7, 2023

Apologies @JorgeB for the noob question, but I've started the memtest. It has been running for about 10 minutes, and all that has happened is that a single line of text saying "Loading /memtest... ok" has appeared below the Boot Options menu. Should something else have happened or progressed, or is this normal behaviour? I'm also curious how long I should run the test for. Anywhere from a few hours to multiple days seems to be recommended in various forum posts. Thanks again.
JorgeB, posted May 7, 2023

You should see the memtest screen. Note that the included memtest only works with legacy boot, not UEFI. If you can only boot UEFI, download the free Passmark memtest. In either case, run one pass.
nametaken_thisonetoo (author), posted May 7, 2023 (edited May 7, 2023 by nametaken_thisonetoo: added SMART data)

5 hours ago, JorgeB said: "Run a quick memtest to rule out obvious RAM issues and if OK run xfs_repair again without -n."

Alrighty, so the memtest passed without any errors, and I ran xfs_repair again without the -n. It found some issues (see screenshot) but appears to have fixed them, as when I ran it again with the -n there were no issues this time around. I started the array out of maintenance mode, and disk 3 is back! Thank you so much!

One last question: given that the SMART data for disk 2 seems fine, is the best plan just to rebuild the drive on top of itself?

Attachment: themagiceye-smart-20230507-2336.zip
JorgeB, posted May 8, 2023

SMART looks OK. If the emulated disk is mounting and its contents look correct, you can rebuild on top. It may be a good idea to replace/swap cables to rule that out.