Failed drive replacement


JarDo

I woke up this morning to a failed drive  :-[

 

Right now, I'm in the process of copying all contents to another location on my array and then I'll swap the drive out for a new one.

 

Is the procedure here still accurate for v6.1 rc5:

 

https://lime-technology.com/wiki/index.php/Replacing_a_Data_Drive

 

I was able to capture diagnostics with a syslog that shows the failure event.  Can anyone tell me whether the nature of the drive failure can be determined from it?

 

unraid-diagnostics-20150818-0737.zip

Link to comment

Are you sure the drive has failed rather than a cabling/controller issue that took the drive off-line?  There was nothing obvious in the SMART reports, but the syslog showed lots of read/write/IO errors.  This can mean that the system just lost contact with the drive rather than the drive actually failing.
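
As a rough illustration of the "disk or cable?" question, here is a minimal Python sketch that reads the attribute table printed by `smartctl -A` and applies a simple rule of thumb: non-zero reallocated, pending, or uncorrectable sector counts point at the disk, while a non-zero UDMA CRC error count with clean sector counts points at the cable or port. The sample report, the function names, and the triage wording are all assumptions made up for this example; none of it comes from the attached diagnostics.

```python
# SMART attribute IDs commonly looked at for this decision.
MEDIA_ATTRS = (5, 197, 198)   # reallocated, pending, offline-uncorrectable sectors
CRC_ATTR = 199                # UDMA_CRC_Error_Count (link/cable errors)

# Hypothetical excerpt in the layout printed by `smartctl -A`.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12
"""

def parse_smart_attributes(report):
    """Return {attribute_id: raw_value} for the attributes we care about."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the 10th column is the raw value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attr_id = int(fields[0])
            if attr_id in MEDIA_ATTRS or attr_id == CRC_ATTR:
                values[attr_id] = int(fields[9])
    return values

def triage(values):
    """Very rough rule of thumb, not a diagnosis."""
    if any(values.get(attr, 0) > 0 for attr in MEDIA_ATTRS):
        return "disk suspect"
    if values.get(CRC_ATTR, 0) > 0:
        return "cable/port suspect"
    return "no SMART evidence"

print(triage(parse_smart_attributes(SAMPLE)))
```

A result of "no SMART evidence" would match the situation described above, where the SMART reports look clean and only the syslog shows I/O errors.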
Link to comment

According to your diagnostics, you have drives 1-4 OK, then no drives 5-8, then drive 9 OK and drive 10 disabled.

 

Is it true that you have skipped slots 5-8?

 

Also, there is no smart data for the disabled drive10 = WDC_WD30EZRX-00DC0B0_WD-WMC1T1583275

 

Yes.  Drives 5-8 are empty slots as a result of my drive consolidation from smaller to larger drives.  Drive10 is my failed drive.

Link to comment

I just finished rewriting that page a day or 2 ago (except the last section), to bring it up to date.  Let me know if you have any problems with it, or any suggestions for improvement.

 

Well, I followed your procedure and it is working (rebuilding right now). 

 

I was a little confused looking for a check box on the same page where I assigned the new drive to the slot.  It took me a minute to realize I had to change to another page to find the checkbox under the button to start the array.

 

But, I still think the instructions are accurate.  Maybe I'm just the kind of person who needs more pictures.

Link to comment

I just finished rewriting that page a day or 2 ago (except the last section), to bring it up to date.  Let me know if you have any problems with it, or any suggestions for improvement.

 

Well, I followed your procedure and it is working (rebuilding right now).

Great!

 

I was a little confused looking for a check box on the same page where I assigned the new drive to the slot.  It took me a minute to realize I had to change to another page to find the checkbox under the button to start the array.

I wasn't sure what you meant at first, finally figured out you must be in Tabbed mode!  I'll add a note about that.

 

But, I still think the instructions are accurate.  Maybe I'm just the kind of person who needs more pictures.

I haven't been sure how far to go with pictures.  The page, as it stood a week ago, was about as minimal as it could get, and I was only trying to bring it up to date without destroying the original author's 'style'.  I wish we had a library of pictures somewhere.  I'm not comfortable tearing my system down just to get good pictures of some of the more dangerous or radical procedures.  And the pictures that show up on the forums are usually there for a reason: something's wrong!  They aren't necessarily useful as an example for others.

Link to comment

I was a little confused looking for a check box on the same page where I assigned the new drive to the slot.  It took me a minute to realize I had to change to another page to find the checkbox under the button to start the array.

I wasn't sure what you meant at first, finally figured out you must be in Tabbed mode!  I'll add a note about that.

I prefer Tabbed mode, maybe because it seems easier to flip between tabs than to scroll to different sections, especially on my 13" notebook. But the feature that lets you choose which you want means that it's kind of tricky when trying to tell people what to do. I usually try to remember to call the tabs "sections", but I don't guess that really helps, since this user is thinking of tabs as different pages. Maybe we will have to spell it out, like "tab/section" or something.

 

***Additional thoughts***

Is section the right word for what I am talking about here?

 

I notice some tabs themselves have sections, so a section isn't necessarily a tab. Is there any way we can tell just by looking when a section will become a tab if we toggle that setting?

Link to comment

Ok.  Now, I'm a little frustrated.

 

I replaced the drive.  The rebuild completed.  I started a parity check...  And now, the new drive is disabled before parity could finish.

 

Come to think of it, the first drive "failed" during a parity check. 

 

It seems the port (cabling?) is the problem, not the drive.

 

UPDATE:

I have other slots I could use, so I stopped the array, shut down the server, and moved the drive to another slot.  I didn't unassign the drive prior to shutting down, because it was already unassigned.

 

After restarting, the server recognized the drive in the different slot and automatically re-assigned it as disk10 (its prior assignment).  The GUI even says that the "configuration is valid".

 

But, when I start the server disk10 has a status of "Device is Disabled, Contents Emulated". 

 

I'm not sure what I should do.  It's not accepting the drive and it's not rebuilding it either.

Link to comment

JarDo:  Can you provide another Diagnostics zip?

 

trurl:  I like the word 'section', decided to use it, whether they are all on one page, or on separate tabs.  I added the following simple line to the wiki -

8. Go to the Array Operation section (either scroll down to it, or go to its tab page)

 

Not too many words, but seems clear enough to me.  What do you think?

Link to comment

I think I would just say tab, instead of tab page. To me a page can contain multiple sections or tabs. Maybe something like:

Tab or scroll to Array Operations.

 

Most of the time I just use a short hand like:

Go to Main - Array Operations.

Link to comment

Of course, this latest diagnostic doesn't actually capture the event that disabled the drive again.

...  The GUI even says that the "configuration is valid".

 

But, when I start the server disk10 has a status of "Device is Disabled, Contents Emulated"...

Seems odd that it would have said valid and then decided to disable it after starting. When you say
...  It's not accepting the drive and it's not rebuilding it either.

Do you mean you tried to rebuild it again but it won't let you?
Link to comment

When the array is stopped, the server recognizes the drive as assigned to disk10 and next to the Start button is the message "Stopped. Configuration is Valid".  There is no option to start the rebuild process.  So I press the start button, the array starts, but disk10 is emulated.  I don't know what to do next.

Link to comment

You have to unassign the drive, start the array, stop the array, re-assign the drive, and start the array again.

 

This is needed to get unRAID to "forget" the drive that was in slot10. By re-assigning it, it makes unRAID think you've installed a new drive and it rebuilds the contents onto that drive.

 

I would run your old drive through a few preclears (once disk10 is rebuilt) to see if it's still good, then re-add it to the array.

 

If this is true, make sure to mark that original slot as defective so you don't use it in the future.
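
To make the reasoning behind that sequence easier to follow, here is a toy Python model of a single array slot. It is only a sketch of the behavior described above (a disabled device that is still "remembered" stays emulated, while a device that looks new to the slot triggers a rebuild). The `Slot` class, its fields, and the status strings are invented for illustration and are not unRAID's actual code or messages.

```python
class Slot:
    """Toy model of one array slot. Hypothetical; not unRAID's real logic."""

    def __init__(self, device):
        self.device = device        # device currently assigned (or None)
        self.remembered = device    # device recorded at the last array start
        self.disabled = False       # set when the slot's device was disabled
        self.log = []               # status recorded at each array start

    def assign(self, device):
        """Assign a device to the slot, or pass None to unassign."""
        self.device = device

    def start_array(self):
        if self.device is None:
            # Starting with nothing assigned makes the slot "forget" the device.
            self.remembered = None
            self.log.append("emulated (no device)")
        elif self.disabled and self.device == self.remembered:
            # Same disabled device as before: nothing prompts a rebuild.
            self.log.append("emulated (device disabled)")
        elif self.disabled:
            # The device looks new to the slot, so it is rebuilt from parity.
            self.disabled = False
            self.remembered = self.device
            self.log.append("rebuilding")
        else:
            self.remembered = self.device
            self.log.append("online")


disk10 = Slot("old-disk")     # "old-disk" is a placeholder identifier
disk10.disabled = True

disk10.start_array()          # same device, still disabled
disk10.assign(None)           # 1. unassign the drive
disk10.start_array()          # 2. start the array (slot forgets the device)
                              # 3. stop the array (no state change in this model)
disk10.assign("old-disk")     # 4. re-assign the drive
disk10.start_array()          # 5. start the array
print(disk10.log)
```

In this model the first start leaves the slot emulated, and only the unassign/start/re-assign/start cycle ends in a rebuild, which mirrors what was reported in this thread.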

Link to comment

You have to unassign the drive, start the array, stop the array, re-assign the drive, and start the array again.

 

This is needed to get unRAID to "forget" the drive that was in slot10. By re-assigning it, it makes unRAID think you've installed a new drive and it rebuilds the contents onto that drive.

Yes!  That did it.  Data is rebuilding right now.  Thank you for the quick answer.  I could get the rebuild started before leaving to work for the day, and that makes me much more comfortable.

 

This procedure makes sense, but until someone recommended it I was a bit hesitant to restart the array with the drive unassigned.  But it was no problem at all.

 

I would run your old drive through a few preclears (once disk10 is rebuilt) to see if it's still good, then re-add it to the array.

 

If this is true, make sure to mark that original slot as defective so you don't use it in the future.

 

I was thinking the same thing.  I've already marked the slot as bad until I have a chance to tear the hardware apart and check the cabling.  Once my array is in a 'protected' mode again, I'm going to pre-clear and test the old drive.

Link to comment

trurl:  I like the word 'section', decided to use it, whether they are all on one page, or on separate tabs.  I added the following simple line to the wiki -

8. Go to the Array Operation section (either scroll down to it, or go to its tab page)

 

Not too many words, but seems clear enough to me.  What do you think?

I think I would just say tab, instead of tab page. To me a page can contain multiple sections or tabs. Maybe something like:

Tab or scroll to Array Operations.

 

Most of the time I just use a short hand like:

Go to Main - Array Operations.

 

I agree, 'tab' is better than 'tab page', but just 'Main -> Array Operation' is even better, cleaner.  Thanks!

Link to comment

Archived

This topic is now archived and is closed to further replies.
