danktankk Posted October 2, 2020

Here is the situation I currently have: I am pretty sure a bad set of breakout cables coming out of one of my LSI SAS 9201-16i ports is to blame, and the drive has been taken from the array. This also happened during a scheduled parity check. The only way I know to fix this is to remove the faulty drive from the array, format it, re-add it, and let it do a parity rebuild. I don't know if that will work this time, since the drive was in the middle of a parity check when it started getting I/O errors and was subsequently removed. I plan on replacing the faulty cable, as I have another one here. I was just wondering if there is a way, other than risking 4TB of data, to restore this drive to operation, since it is/was a bad cable. I would hate to do a parity rebuild on this if I didn't have to. Thank you for any help you can offer!
trurl Posted October 2, 2020

4 minutes ago, danktankk said:
The only way I know to fix this is to remove the faulty drive from the array, format, and re-add it and let it do a parity rebuild.

I get scared whenever anyone mentions format in the same post as rebuild. No need to remove, and certainly no need to format. It is especially important NOT to format with the disk in the array.

5 minutes ago, danktankk said:
dont know if that will work this time due to it having been in the middle of a parity check

Was it a correcting parity check?

Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.
danktankk Posted October 2, 2020

I thought I had mentioned that I would first remove the drive from the array, then format it, then re-add it to the array, then do a parity rebuild. It was doing a scheduled parity check, not a parity rebuild, when this happened. Here is the diagnostics file you requested as well. Thank you for the reply!

baconator-diagnostics-20201002-1826.zip
JonathanM Posted October 2, 2020

59 minutes ago, danktankk said:
I first would remove the drive from array, then format,

Can you elaborate on that a little bit? Exactly how do you plan to do that, and why?
trurl Posted October 2, 2020

55 minutes ago, danktankk said:
I thought that I had mentioned that I first would remove the drive from array, then format, then re-add back to array - then parity rebuild.

You did mention that. Removing and formatting is totally unnecessary. Formatting outside the array would also be pointless before using the disk for a rebuild, since the rebuild completely overwrites the disk regardless of how it was formatted.

1 hour ago, trurl said:
Was it a correcting parity check?

58 minutes ago, danktankk said:
a scheduled parity check

In Settings - Scheduler - Parity Check, there is an option to write corrections during the scheduled check or not. The usual recommendation is No. Check that setting and let us know, and also what time the check is scheduled to start. I'm trying to sort through the syslog, but there is a lot of nvidia-related spam in there.
danktankk Posted October 3, 2020

@trurl I know the spam you are referring to. It is related to Nvidia Unraid; I believe there is a way to get rid of it, but I forget how. Here is the parity check schedule and corrections setting. Thank you for your time on this!

@JonathanM The reason I would remove the drive is that Unraid has effectively shut it down due to the I/O errors it was experiencing, caused, my guess, by a faulty cable. The format, as was just explained to me, is unnecessary and I would not be doing that now. I hope this helps.
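As an aside, one way to cut that kind of repetitive noise out of a syslog copy before attaching it is a simple inverted grep. This is only a sketch: the sample log below and the "nvidia" match pattern are assumptions, so adjust the pattern to whatever lines are actually flooding your log, and point it at your real syslog instead of the sample file.

```shell
# Build a small sample log standing in for /var/log/syslog
# (hypothetical contents, for illustration only)
cat > sample_syslog.txt <<'EOF'
Oct  2 08:00:01 baconator kernel: md: disk3 read error, sector=123
Oct  2 08:00:02 baconator kernel: nvidia-modeset: repeated spam line
Oct  2 08:00:03 baconator kernel: nvidia-modeset: repeated spam line
Oct  2 08:00:04 baconator kernel: md: disk3 write error, sector=456
EOF

# Keep everything EXCEPT lines matching the noise pattern
# (-v inverts the match, -i makes it case-insensitive)
grep -iv 'nvidia' sample_syslog.txt > filtered_syslog.txt

cat filtered_syslog.txt
```

The filtered copy keeps the md/disk error lines that matter for diagnosis while dropping the driver chatter.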
trurl Posted October 3, 2020

There is also no reason to remove the disk unless you intend to replace it with a different one. Still trying to decide whether or not to believe parity. I filtered out a lot of that spam.

So the scheduled parity check was configured to write corrections to parity. Did you let it complete?

Apparently you rebooted after that scheduled parity check:

Oct 1 22:48:06 baconator kernel: Linux version 4.19.107-Unraid (root@38721b48cdfb) (gcc version 9.2.0 (GCC)) #1 SMP Sun Mar 8 14:34:03 CDT 2020

There is no indication of an unclean shutdown, but another parity check started soon after the reboot:

Oct 1 22:50:58 baconator kernel: mdcmd (45): check
Oct 1 22:50:58 baconator kernel: md: recovery thread: check P
...

Did you start that one? Was it also a correcting check? Or was it actually rebuilding parity? It was still ongoing 9+ hours later:

Oct 2 08:00:01 baconator root: Parity Check / rebuild in progress. Not running mover

Then came lots of read errors on disk3, and finally the write errors that disabled it. Possibly Unraid tried to write the calculated data back to the disk it couldn't read. And finally it stopped itself:

Oct 2 12:48:51 baconator kernel: md: disk3 read error, sector=10938752032
Oct 2 12:48:51 baconator kernel: md: disk3 write error, sector=15780546472
...
Oct 2 12:49:27 baconator kernel: md: recovery thread: exit status: -4

My inclination at this point is to not trust parity, but instead do New Config and rebuild parity rather than rebuilding disk3, after you have fixed your hardware, of course. The main problem with that idea would be if disk3 was corrupted somewhere along the way, but that should become apparent immediately when starting the array. Or you could New Config without parity initially just to check whether disk3 mounts, but I'm not sure there would be much point, since parity would definitely be out-of-sync that way. If there was disk3 corruption, that could be dealt with separately.
The SAFEST approach would be to rebuild disk3 to a new disk, save the original disk3, and decide later which version of disk3 looks best. Of course, that would require another disk. See if there is anything else you would like to add, and maybe wait to see if someone else has a different opinion or approach.
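Before deciding whether to trust parity, it can help to tally how many disk3 read and write errors the syslog actually recorded. Here is a minimal sketch of that, run against a small inline sample with the same line format as the excerpts above; in practice you would point the greps at your real syslog file (filename and sample contents here are assumptions).

```shell
# Sample log lines in the same format as the excerpts above
cat > md_errors.txt <<'EOF'
Oct  2 12:48:51 baconator kernel: md: disk3 read error, sector=10938752032
Oct  2 12:48:51 baconator kernel: md: disk3 write error, sector=15780546472
Oct  2 12:49:27 baconator kernel: md: recovery thread: exit status: -4
EOF

# Count each error type separately; a handful of write errors is what
# disables a disk, while the read-error count hints at how widespread
# the cable problem was
reads=$(grep -c 'disk3 read error' md_errors.txt)
writes=$(grep -c 'disk3 write error' md_errors.txt)

echo "disk3 read errors:  $reads"
echo "disk3 write errors: $writes"
```

A large read-error count spread across many sectors would point at the cable or controller rather than the platters, which fits the diagnosis in this thread.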
trurl Posted October 3, 2020

One more bit of encouraging information, though I'm not sure it definitively answers the questions: from the diagnostics, the emulated disk3 is mounted, so that would seem to indicate that rebuilding disk3 from parity should be successful.
danktankk Posted October 3, 2020

Quote
And also no reason to remove unless you intend to replace it with a different disk. Still trying to decide whether or not to believe in parity or not. I filtered out a lot of that.

I never meant physically remove, just remove from the array.

Quote
So the scheduled parity check was configured to write corrections to parity. Did you let it complete?

No. I paused it and it lost its progress.

Quote
Apparently you rebooted after that scheduled parity check

I rebooted on Sunday, yes, but the parity check was not running then that I know of. There was a parity check that started after an unclean shutdown. Some of the docker containers weren't behaving correctly and I was forced to reboot unclean. I have two 14TB shucked drives coming in tomorrow, supposedly; one-day shipping is iffy at best. I'll just wait until then to rebuild from parity.

Quote
From the diagnostics emulated disk3 is mounted, so that would seem to indicate that rebuilding disk3 from parity should be successful.

It will take almost 2 days to rebuild from parity, but I guess I should just go ahead and do it. The original question was me wondering if there is a way to reset something in Unraid so it will just "see" the drive that experienced those errors and "try again" without rebuilding. I guess that is not the case, though.

Edited October 3, 2020 by danktankk