replace failed drive help


beno

My array is half dead. Diagnostics attached.

Pretty simple config of 4TB parity, 2TB and 3TB data.  2TB drive is dying or dead, lots of read errors.  I was expecting to be able to remove the 2TB drive, put in another, and choose some sort of rebuild operation, but I'm lost.  It says wrong disk or incorrect configuration.  I added the replacement to disk 3 on a whim, not really knowing why, hoping I could find the magic rebuild command.  I'm wondering if this "mystery disk7" is keeping me from rebuilding.  I probably added that drive for capacity, and then it didn't work and I didn't cleanly remove it.  Ugh.

1. How do I remove the mystery disk 7?

2. How do I rebuild dead 2TB on new 2TB?

image.thumb.png.d90b9242809044643b951083b30a85ac.png

nas-olsen-diagnostics-20191105-0940.zip

Edited by beno
uploaded screenshot
Link to comment

A little unclear how you got here.

 

Your first mistake was not asking for help when you were having trouble with disk7. Since you already had a missing disk7, your array had no redundancy to allow you to replace any other disk.

 

Disk1 SMART isn't showing in the diagnostics. Are you sure it is really dead? Bad connections and cables are more common than bad disks.

 

The thing I am having trouble understanding though is disk3. Unraid seems to indicate it is the correct disk in that slot, but you say it is a replacement.

 

See if you can describe exactly how you got to this point, step by step, starting from the last point in time when all of the array was showing no problems.

 

 

Link to comment

There isn't really a "rebuild" operation to choose. When you replace a disk and start the array, Unraid rebuilds to the replacement. But in order for that to work, or even be allowed, ALL other disks must be available. Parity by itself cannot rebuild anything. Parity PLUS all other disks are required to rebuild a missing disk.

 

If you have dual parity then you could rebuild 2 missing disks, but that doesn't apply to you. You have 2 missing disks according to your screenshot, so rebuilding is not possible.
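For anyone following along, here is a toy sketch of why single parity can cover one missing disk but not two. It assumes single parity behaves like a plain bitwise XOR across all data disks, which is my understanding of how Unraid's first parity disk works. The disk contents and the `xor_blocks` helper are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Toy "disks": a few bytes each, all the same length (hypothetical contents).
disk1 = b"photos"
disk2 = b"movies"
disk7 = b"backup"

# Parity is computed across every data disk in the array.
parity = xor_blocks([disk1, disk2, disk7])

# One disk missing: parity plus every surviving disk recovers it.
rebuilt_disk1 = xor_blocks([parity, disk2, disk7])

# Two disks missing: the same calculation only yields disk1 XOR disk7,
# which is not the content of either disk.
mixed = xor_blocks([parity, disk2])
```

With disk1 and disk7 both absent, `mixed` is all the information that parity and disk2 can provide, and there is no way to split it back into the two original disks.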

 

If you can give us a better understanding how this happened, it might be possible to get Unraid to rebuild one of the missing disks by telling it that all the other disks are OK. But we need to know exactly how you got here in enough detail to let us at least take a guess at what disks parity is based on.

 

Do you have backups of anything important and irreplaceable? The simplest way forward is probably just to save disk2 and format the rest.

Link to comment

The fact that all except disk2 is showing unmountable makes me wonder which disks if any actually had any data. If you can tell us which disks should have data it might be possible to repair their filesystems individually and then start from whichever disks work and can be repaired to build a new array.

Link to comment

And your screenshot is showing the array as started, which also brings up the question of disk3 and how you got here, since in your first sentence you seemed to imply there were only 2 data disks.

 

Did you ever go to Tools - New Config when you were stumbling around? Was there any point in any of this where you rebuilt parity?

Link to comment

Yes, Disk 7 is a bit of a mystery to me.

I've replaced the cable and tried a different SATA port for original 2TB but still have read errors stopping the array.

Like I said, I don't know what I'm doing and I'm lost. I appreciate your help. I put disk3 in as a replacement but when I couldn't find a "rebuild" option, I thought I might have to add it to a different disk.  At the time, I didn't understand the ramifications of disk7.

 

1. A couple years ago everything was working great

2. I added mystery disk7 for some reason more than a year ago to add capacity I think. It didn't work and things were still "working" so I removed it physically but apparently I've been running around with a false sense of security ever since.

3. A month or so ago I started rebooting the server as a "fix" because the shares weren't available (I know)

4. Last night, I finally got around to looking into unraid config and hardware

5. Tried to figure out what drive was the issue. Tried a new cable and putting both drives in different SATA ports, determined that the 3TB drive was fine but 2TB was not. 2TB drive is still attached but not being recognized anymore.  It could be that Disk2 was 2TB and Disk1 was 3TB and I switched them for current config.

6. Inserted another 2TB drive and tried to select it in the old 2TB slot.  Tried to find a rebuild command or something. Noted that when selecting the 2TB drive, it said wrong drive or incorrect config.

7. Looked online for help

8. Asked you guys

 

I can transfer all the 3TB data fine and it looks like I'm SOL for the 2TB unless I can do some data recovery on it.  I've used DDrescue before with success.  Does the horrible timeline above help at all?  Would any other info/logs/pictures help? Thanks again for your help.

Link to comment
1 hour ago, trurl said:

There are 2 other "disks" listed in SMART in your Diagnostics, but they aren't reporting anything except:


/dev/sdb: Unknown USB bridge [0x05e3:0x0729 (0x405)]

Are you trying to use USB connections in your array?

I might have in the past. It could have been mystery 7.  Unraid is running from a USB stick.  When my shares were down, I tried to access a file recently using another stick but didn't try to incorporate it into the array.

Link to comment
57 minutes ago, trurl said:

And your screenshot is showing the array as started, which also brings up the question of disk3 and how you got here, since in your first sentence you seemed to imply there were only 2 data disks.

 

Did you ever go to Tools - New Config when you were stumbling around? Was there any point in any of this where you rebuilt parity?

Ugh, yes. Yes I did try new config.  No, I didn't rebuild parity. I did try to format the new 2TB drive.

Link to comment
1 hour ago, trurl said:

The fact that all except disk2 is showing unmountable makes me wonder which disks if any actually had any data. If you can tell us which disks should have data it might be possible to repair their filesystems individually and then start from whichever disks work and can be repaired to build a new array.

Pretty sure Disk1 and Disk2 were the data disks.  I can get them working together in harmony and shares are up but only for a minute before errors occur for the 2TB drive.

Link to comment
1 hour ago, trurl said:

And your screenshot is showing the array as started, which also brings up the question of disk3 and how you got here, since in your first sentence you seemed to imply there were only 2 data disks.

 

Did you ever go to Tools - New Config when you were stumbling around? Was there any point in any of this where you rebuilt parity?

I got here by stumbling unfortunately. Disk3 is the replacement 2TB drive that I added to the array out of complete desperation and lack of understanding.

Link to comment

beno, don't panic, and don't start making changes without some guidance. I suspect this isn't as bad as it looks at first glance and that trurl will be able to assist you, so just be patient. I'm pretty new to this, so please don't make changes based on what I'm about to say, but I think that once you remove disk 7 from your configuration you can rebuild disk 1 on your replacement disk 3 (which will probably need to be put in the slot where 1 was) using parity and disk 2. Just sit tight until someone with more experience chimes in.

Edited by Dissones4U
added info
Link to comment
2 hours ago, Dissones4U said:

I think that once you remove disk 7 from your configuration you can rebuild disk 1 on your replacement disk 3

No, it is not possible to remove a disk without rebuilding parity, and of course, rebuilding parity would make it impossible to rebuild any data disk. At this point, I think we can forget about rebuilding anything since it seems extremely unlikely parity is even valid.
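To illustrate the point about rebuilding parity, here is a small sketch under the same assumption that single parity is a bitwise XOR of the data disks. The byte strings and the `xor_bytes` helper are hypothetical. It shows that once parity is recomputed from whatever is in the slots now, the information about the failed disk is overwritten.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

disk1 = b"photos"   # the disk that later fails (hypothetical contents)
disk2 = b"movies"

old_parity = xor_bytes(disk1, disk2)

# disk1 fails. While the old parity is untouched, disk1 is still recoverable.
assert xor_bytes(old_parity, disk2) == disk1

# Parity is then rebuilt from what is in the slots now:
# disk2 and an empty replacement disk (all zero bytes).
replacement = bytes(len(disk1))
new_parity = xor_bytes(replacement, disk2)

# The same rebuild calculation now returns only the empty replacement.
recovered = xor_bytes(new_parity, disk2)
```

After the parity rebuild, `recovered` is just zeros, so nothing of the original disk1 can be derived from the array any more.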

 

If he ran with a missing disk for some time he hasn't even been able to do a parity check, since parity can't be checked in that state. All it would do if you tried is a "read check", which doesn't do anything except see if the disks can be read.

 

New Config is going to be the way forward. Probably disk2 is OK. Maybe we can get some data from the original disk1. It doesn't sound like any of the other disks are expected to have any data. I can only assume disk3 was added with New Config, since otherwise it wouldn't let you add a disk to a new slot when you already had a missing disk. Possibly disk3 is unmountable because it was never formatted and so has no filesystem.

 

@beno, you didn't actually answer this question:

4 hours ago, trurl said:

Do you have backups of anything important and irreplaceable?

Link to comment
1 hour ago, trurl said:

No, it is not possible to remove a disk without rebuilding parity, and of course, rebuilding parity would make it impossible to rebuild any data disk. At this point, I think we can forget about rebuilding anything since it seems extremely unlikely parity is even valid.

 

If he ran with a missing disk for some time he hasn't even been able to do a parity check, since parity can't be checked in that state. All it would do if you tried is a "read check", which doesn't do anything except see if the disks can be read.

 

New Config is going to be the way forward. Probably disk2 is OK. Maybe we can get some data from the original disk1. It doesn't sound like any of the other disks are expected to have any data. I can only assume disk3 was added with New Config, since otherwise it wouldn't let you add a disk to a new slot when you already had a missing disk. Possibly disk3 is unmountable because it was never formatted and so has no filesystem.

 

@beno, you didn't actually answer this question:

Thanks trurl for explaining that well.  Yes, the New Config probably is what allowed the disk add and the shares to become "stable" but missing half of the data.  Yes, I have backups of most of it.  Recovery would be a slow pita, but worth it for photos etc.  I'd be interested in a general overview of how to recover the data from disk1. Any advice?  I am moving to a more robust server, so I have access to another machine/unraid setup if that helps. Most of my data recovery experience is with ntfs/fat32, not xfs.  Thanks again.

Edited by beno
oops, name incorrect.
Link to comment
9 hours ago, johnnie.black said:

Try connecting disk1 to a different SATA port and using different cables. If it comes online, post a SMART report. If it doesn't, there's not much you can do other than send it to a professional data recovery service.

Thanks @johnnie.black, I won't be able to try this until Thursday evening but I will try then and report back. 🤞🤞🤞

Link to comment

After you get things square again, be sure to set up Notifications to alert you immediately by email or other agent as soon as Unraid detects a problem. Probably Unraid would have been nagging you the whole time you were letting this first problem go:

On 11/5/2019 at 6:37 PM, beno said:

I added mystery disk7 for some reason more than a year ago to add capacity I think. It didn't work and things were still "working" so I removed it physically but apparently I've been running around with a false sense of security ever since.

Might also be a good idea to install Fix Common Problems plugin. I know it would have been nagging you every single day about that.

 

And if you ever have any trouble again, please ask on the forum instead of trying random things which only make things worse.

Link to comment
On 11/8/2019 at 9:50 AM, trurl said:

After you get things square again, be sure to set up Notifications to alert you immediately by email or other agent as soon as Unraid detects a problem. Probably Unraid would have been nagging you the whole time you were letting this first problem go:

Might also be a good idea to install Fix Common Problems plugin. I know it would have been nagging you every single day about that.

 

And if you ever have any trouble again, please ask on the forum instead of trying random things which only make things worse.

Will do @trurl. My new unraid setup is much more fault (Ben) tolerant.

Good news is that ddrescue eventually saved 99.9% of the data.  I had to shut down and power up (restart wouldn't work) whenever it found a bad area, but it finished last night and I'm copying the data over to the new (protected) unraid.  Figured out how to run xfs_repair and had to use -L, but it's all pretty much back or in lost+found.
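For anyone landing here with the same problem, below is a rough sketch of what a recovery along these lines can look like. It is not the exact set of commands used above. It only builds and prints the command lines, it does not run anything. The device names `/dev/sdX` and `/dev/sdY`, the mapfile name, the `rescue_plan` helper, and the assumption that the copy ends up with a single partition are all placeholders for illustration.

```python
import shlex

def rescue_plan(failing_disk, target_disk, mapfile):
    """Return command lines for a two-pass ddrescue copy plus XFS checks."""
    target_partition = target_disk + "1"  # assumes one partition on the copy
    return [
        # Pass 1: copy everything that reads cleanly, skip scraping bad areas.
        ["ddrescue", "-f", "-n", failing_disk, target_disk, mapfile],
        # Pass 2: go back to the bad areas, retrying up to three times.
        ["ddrescue", "-f", "-r3", failing_disk, target_disk, mapfile],
        # Dry run: report filesystem problems on the copy without changing it.
        ["xfs_repair", "-n", target_partition],
        # Actual repair, run against the copy rather than the failing disk.
        ["xfs_repair", target_partition],
    ]

# Print the plan so each command can be reviewed before running it by hand.
for command in rescue_plan("/dev/sdX", "/dev/sdY", "rescue.map"):
    print(shlex.join(command))
```

Because ddrescue records its progress in the mapfile, rerunning the same command after a power cycle resumes where it stopped instead of starting over. If xfs_repair refuses to run because of a dirty log that cannot be replayed by mounting the filesystem, `-L` zeroes the log at the cost of possibly losing the most recent metadata changes, so it is a last resort.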

3 cheers for unraid forums!

Link to comment
