[SOLVED] Replacing failing data drive



I have a data drive in my array that started exhibiting read errors on Sunday night and reported current pending sectors overnight on the same night.  Luckily I had a new 8TB drive I was planning on installing into my server so that I could retire my oldest 2TB drives.  It just so happens that the drive reporting these errors is one of those 2TB drives.  I installed the new drive into the server on Monday morning and ran a preclear on it, which completed this morning.

 

So I've read in the wiki and a few other posts on the forum that replacing a failing drive is as simple as removing it from the array, adding the new drive to the same slot, and telling the array to rebuild the data from the other disks and parity.  I also found a post where someone said they run a parity check before doing this to make sure they have good parity.  I run a correcting parity check on the first of every month, and the last check on 3/1 found 2 errors.  Prior to that, I rebooted my server on 2/28 without stopping the array (just hit the reboot button in the GUI), and when the server came up it performed a parity check for some reason.  It also found 2 errors.  I'm assuming the 2 errors it found in that check were not corrected since the next scheduled parity check on 3/1 found 2 errors.  Monthly parity checks prior to 2/28 found 0 errors.
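For anyone wondering how a rebuild "from the other disks and parity" works, here is a minimal sketch, assuming single XOR parity. The disk contents and the `xor_blocks` helper are made up for illustration; this is a toy model, not how the array driver is actually implemented.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three toy "data disks" of 4 bytes each (hypothetical contents).
disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# Parity is the XOR of all data disks.
parity = xor_blocks(disks)

# Pretend disk 1 is pulled; rebuild it from the surviving disks plus parity.
survivors = [d for i, d in enumerate(disks) if i != 1]
rebuilt = xor_blocks(survivors + [parity])
```

In this model `rebuilt` matches the contents of the removed disk only if parity and every surviving disk read back correctly, which is why the state of parity matters before a rebuild.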

 

So my question is - should I just go ahead and follow the wiki procedure to replace the failing drive?  Or would I be better off doing something else, like moving the data from the failing drive to the new drive using rsync or something similar?  I know 2 errors is not a lot, but it's more than 0 and I would prefer not to risk losing any data on the drive.

Edited by mlounsbury
12 minutes ago, mlounsbury said:

I also found a post where someone said they run a parity check before doing this to make sure they have good parity.

This should not be done if there's a known failing disk.

 

12 minutes ago, mlounsbury said:

I run a correcting parity check on the first of every month

Scheduled checks should be non correcting.
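A rough sketch of why this matters when a disk may be returning bad data, assuming single XOR parity. The `parity_check` and `rebuild` helpers and the byte values are hypothetical, a toy model rather than the real check logic.

```python
def parity_check(disks, parity, correct=False):
    """Count positions where parity disagrees with the XOR of the data read.

    With correct=True, the returned parity is rewritten to match whatever
    the data disks returned, right or wrong.
    """
    errors = 0
    new_parity = bytearray(parity)
    for pos in range(len(parity)):
        expected = 0
        for disk in disks:
            expected ^= disk[pos]
        if expected != parity[pos]:
            errors += 1
            if correct:
                new_parity[pos] = expected
    return errors, bytes(new_parity)

def rebuild(other_disks, parity):
    """Rebuild one missing disk from the remaining disks plus parity."""
    out = bytearray(parity)
    for disk in other_disks:
        for pos, value in enumerate(disk):
            out[pos] ^= value
    return bytes(out)

good_disk0, disk1 = b"\x01\x02", b"\x10\x20"
parity = b"\x11\x22"        # XOR of the two good disks
bad_read = b"\x01\xff"      # the failing disk returns one wrong byte

# Non-correcting check: reports the error, parity is left alone.
errors_nc, parity_nc = parity_check([bad_read, disk1], parity, correct=False)

# Correcting check: parity is rewritten to agree with the bad read.
errors_c, parity_c = parity_check([bad_read, disk1], parity, correct=True)

rebuilt_nc = rebuild([disk1], parity_nc)  # still the original good data
rebuilt_c = rebuild([disk1], parity_c)    # the bad byte is now baked in
```

In this model both checks report one error, but only the non-correcting run leaves parity able to rebuild the original data.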

14 minutes ago, mlounsbury said:

and the last check on 3/1 found 2 errors.  Prior to that, I rebooted my server on 2/28 without stopping the array (just hit the reboot button in the GUI), and when the server came up it performed a parity check for some reason.  It also found 2 errors.  I'm assuming the 2 errors it found in that check were not corrected since the next scheduled parity check on 3/1 found 2 errors.  Monthly parity checks prior to 2/28 found 0 errors. 

That's likely what happened, and after an unclean shutdown is the only time some sync errors after a check are acceptable, even expected.

 

15 minutes ago, mlounsbury said:

So my question is - should I just go ahead and follow the wiki procedure to replace the failing drive? 

Since there's a failing disk best course of action is to replace it now using the standard procedure.

 

23 hours ago, johnnie.black said:

This should not be done if there's a known failing disk.

Thanks, I thought that was weird but wanted to confirm.  It is also possible this person was saying to do this when replacing a known good drive.

 

23 hours ago, johnnie.black said:

Scheduled checks should be non correcting.

I've done some more reading on this and have gone ahead and turned off correcting errors on my monthly checks.

 

23 hours ago, johnnie.black said:

That's likely what happened, and after an unclean shutdown is the only time some sync errors after a check are acceptable, even expected.

I just find it weird that it ran the check when the text next to the button says it will initiate a clean reset.  I've always stopped the array before rebooting or shutting down, but I figured since the text now says it's a clean shutdown I'd be okay.

 

23 hours ago, johnnie.black said:

Since there's a failing disk best course of action is to replace it now using the standard procedure.

I went ahead and stopped the array and moved the new disk into the slot where the failing disk was and had it rebuild.  It completed early this morning with no errors, and everything looks okay.  I just started a non-correcting parity check to ensure everything is good.  Thanks for the help! 

