(2) Drives with Errors in my array - What steps do I take?



Hey folks -

 

First time I've encountered an error on UnRaid, running version 6.6.6. The Fix Common Problems plugin is telling me that Disks 3 & 4 have read errors, and Disks 3 & 6 are in Read-only mode.

 

  1. How do I remove the unused disk 6 from the array? I moved all the content off the drive a long time ago; I just need to finally clean up the array.
     
  2. What do I do about Disk 3, and then Disk 4, so I can get my array back into good health?

 

Thanks,

Unraid.PNG

Edited by jeradc
Link to comment
7 hours ago, jeradc said:

How do I remove the unused disk 6 from the array? I moved all the content off the drive a long time ago,

First, click on the folder icon to the right of the emulated disk6 slot and see if there is anything there you want to keep. Just because the disk is listed as not installed doesn't mean it's unused. You appear to have 300GB of content on that slot that will be lost if you remove it.
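
If you'd rather check from the console, each array slot is mounted at /mnt/diskN, so you can tally the slot's contents directly. A minimal Python sketch, assuming the standard /mnt/disk6 mount point:

```python
# Tally what is on the disk6 slot; /mnt/disk6 is the standard Unraid
# mount point for that slot (an assumption -- adjust if yours differs).
import os

total_bytes = 0
file_count = 0
for root, dirs, files in os.walk("/mnt/disk6"):
    for name in files:
        path = os.path.join(root, name)
        try:
            total_bytes += os.path.getsize(path)
            file_count += 1
        except OSError:
            pass  # files on an emulated/failing disk may not be readable

print(f"{file_count} files, {total_bytes / 1e9:.1f} GB on /mnt/disk6")
```

If that prints anything other than zero files, the slot is not as unused as it looks.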

Link to comment

Honestly, from the meager information you have provided, my best advice is to purchase a 4 or 8TB USB drive and make a copy of all the data on the array at the moment. You have less than 3TB used, so that shouldn't be too much trouble.

 

Some of the drives in your array have got to be pushing 15 years old; I'm surprised you haven't had a catastrophic failure. Keep in mind that ALL drives in the array are used to recover a single bad drive, and you are currently in that state: if you lose another drive, you will lose all data on both disk6 and whichever other drive or drives decide to die next.
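
To see why every drive matters, remember that single parity is just a bitwise XOR across all data disks, so rebuilding one lost disk requires every other disk to read back cleanly. A toy Python sketch of the arithmetic (illustrative only, not Unraid's actual md driver):

```python
# Single parity in miniature: parity = XOR of every data disk, so one
# missing disk can be rebuilt only if ALL the survivors read correctly.
from functools import reduce

disks = [b"\x10\x22", b"\x0f\x00", b"\xa5\x5a"]  # three tiny "data disks"
parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*disks))

lost = disks.pop(1)  # one disk dies
rebuilt = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*disks, parity))
assert rebuilt == lost  # holds only while parity and all survivors are intact
```

A second failure anywhere breaks that reconstruction, which is exactly the state you're flirting with.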

 

Your data is in a very risky state right now; I'd back it up to another location ASAP.

Link to comment

@jonathanm - Disk 6 "not installed" is the (SSD) drive listed as "unassigned", and I am 100% confident it has zero content.

 

Secondly, I'm aware of the age of the array, and I do have a cloud backup with CrashPlan. :)

 

Right now, I just need to know the steps to put the array back together. I have (2) 1.5 TB drives available to add to the array, or to use to replace any of these drives, if needed.

 

thanks!

Link to comment

The Disk 3 and 4 issues are likely a controller problem. I can't see SMART for either, and since the syslog cuts off I don't know if it's the controller or if they dropped offline. They are both on a JMicron controller, which is not recommended for Unraid.
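
Once the drives are visible again, you can poll their SMART health yourself with smartctl (part of smartmontools, which Unraid ships). A rough sketch; the device names are placeholders for whatever disks 3 and 4 enumerate as on your system:

```python
# Quick SMART health pass over the suspect drives via smartctl -H.
import subprocess

for dev in ("/dev/sdd", "/dev/sde"):  # placeholder device names
    result = subprocess.run(["smartctl", "-H", dev],
                            capture_output=True, text=True)
    lines = result.stdout.strip().splitlines()
    # No output at all usually means the drive dropped off the bus,
    # which is itself a strong hint of a controller/cabling problem.
    print(dev, "->", lines[-1] if lines else "no response from smartctl")
```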

 

A reboot might fix it for now. If it does, and SMART looks good for both, you need to do a new config to remove disk6, assuming any data there doesn't matter.

 

 

Link to comment

I'll do a reboot and check the smart status when I get home from work, thanks.

 

Before I left, I did see that Unraid was doing a check, and the errors kicked up to 30M and 80M respectively @ 34% complete. 0_o

Edited by jeradc
adding more details.
Link to comment

I tried to use the Unraid web GUI to reboot or power down, but there was no response from the system. I held the power button for 8 seconds, and when the system came back up it said the array had returned to a "good" state.

 

No errors and everything is green. SMART shows no errors, with the short tests returning "completed without error".
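
For anyone following along, the short self-tests can also be driven from the console; a rough sketch with placeholder device names (-t short and -l selftest are standard smartmontools options):

```python
# Queue SMART short self-tests, wait, then read the results back.
import subprocess
import time

devices = ["/dev/sdd", "/dev/sde"]  # placeholders; substitute your drives

for dev in devices:
    subprocess.run(["smartctl", "-t", "short", dev], check=False)

time.sleep(150)  # short tests usually finish within a couple of minutes

for dev in devices:
    log = subprocess.run(["smartctl", "-l", "selftest", dev],
                         capture_output=True, text=True).stdout
    print(dev, "\n", log)  # look for "Completed without error"
```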

 

But I don't trust it, lol.

 

I think I need to do the following:

  1. Remove the unused disk 6 by "doing a new config". (I'll Google this, I suppose.)
  2. Upgrade to 6.6.7
  3. Run a consistency check on the array (unless setting up the new config has that step).
Edited by jeradc
Link to comment

I've done a new config.

It started a parity check; I canceled it.

Then I started the upgrade to 6.6.7,

powered down, and removed that dead drive 6.

Rebooted into 6.6.7 with no issues, and the parity check started again.

 

I'm gonna let that run overnight, and then spend some more time removing the drives on the JMicron controller and adding in my 1.5 TB drives instead.

 

thanks. So far so good still.

Link to comment
5 hours ago, jeradc said:

Just telling you what Unraid did... and what I did.

 

Maybe a product suggestion to remove the irrelevant check.

If you had done a New Config then it was NOT an irrelevant check! It would instead have been a needed parity build, as the New Config invalidates parity. If you cancelled it then your parity may well be invalid.

 

After doing the New Config, there is a checkbox for indicating that parity is already valid before starting the array for the first time, which suppresses the parity build. But if you check it and parity is not really valid, you cannot successfully recover a failed disk. You had better be sure that parity IS valid at that point, as checking it incorrectly and suppressing the parity build means your array is effectively unprotected, without any obvious indication to that effect.

Link to comment
2 hours ago, jeradc said:

maybe I'm misunderstanding.

 

If I have no data, and remove a drive with no data... does that invalidate my parity (assuming the empty drive was parity-synced at zero)?

Removing a drive with no data is not the same as removing a drive that you have run a special zeroing operation on. If it was zeroed, then it is true the drive can be removed without invalidating parity. However, Unraid would have no knowledge that you have zeroed the disk (as opposed to simply deleting its files, which does not zero the disk). In such a special case, after going the New Config route you can explicitly tell Unraid that parity is valid before starting the array by setting the checkbox, and then no parity sync (i.e. build) will be done. It was to handle special cases like this that the checkbox was introduced in the first place.

 

Many people think that a drive that has no data can be removed without invalidating parity. However, this is not true unless, after removing the data, you explicitly wrote zeroes to every sector of the drive before doing the New Config.
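
The XOR arithmetic makes the distinction easy to see: anything XORed with zero is unchanged, so an all-zero drive contributes nothing to parity, while an "empty" filesystem still carries nonzero metadata and stale sectors. A toy Python illustration (not Unraid's actual implementation):

```python
# Why only a fully zeroed drive drops out of parity cleanly: XOR with
# zero is a no-op, but deleted files still leave nonzero bits on disk.
from functools import reduce

def parity(disks):
    # single parity: XOR the same byte position across every disk
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*disks))

d1, d2 = b"\x3c\x7e", b"\x55\xaa"
zeroed = b"\x00\x00"   # every sector explicitly overwritten with zeroes
stale  = b"\x42\x00"   # files deleted, but old bits remain on the platters

assert parity([d1, d2, zeroed]) == parity([d1, d2])  # removal is safe
assert parity([d1, d2, stale])  != parity([d1, d2])  # parity now invalid
```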

Edited by itimpi
Link to comment
6 minutes ago, ijuarez said:

I'm just piggybacking on his post, because I don't want to start a new one. Just need a second set of eyes on my diags.

Looks like disk 10 (sdk) failed. From the log, is it because it lost power, or something else?

 

 

lahomamediacenter-diagnostics-20190405-0858.zip

Looks more like a connection issue, though LSI error codes are not very helpful; suggest replacing/swapping cables/backplane slot and seeing if the issue repeats.

Link to comment
1 minute ago, johnnie.black said:

Looks more like a connection issue, though LSI error codes are not very helpful; suggest replacing/swapping cables/backplane slot and seeing if the issue repeats.

Thank you, I had that suspicion. This drive is in a 4-in-3 cage (four 3.5" drives in three 5.25" bays), and the previous drive in it died, but that one was old. This is a new IronWolf I installed just a few months back, and the same thing has happened. Going to pull it, run the Seagate tools (I'm sure it's good), and check the connections.

Link to comment
