Remove a failed disk from array and rebuild onto free space without replacing it?



Hey once again. Unraid 6.7. I have a 20x 3TB drive array consisting of 18 data disks and 2 parity disks. I just had a drive give a heap of read errors and then go offline; after a few restarts and RAID card pokes it's not coming online at all. I don't want to get another old 3TB drive, and I have only filled around 7 or 8TB of the 50TB+ available. The disk that failed (disk3) was one of the disks with data on it, so I want to just reduce the disk count and rebuild parity.

 

I read this: https://wiki.unraid.net/Shrink_array#The_.22Remove_Drives_Then_Rebuild_Parity.22_Method and I think I need clarification on the first step. What does it matter how share data is assigned after a full failure? I can't move the data off the disk now...

 

So I was simply going to screenshot my config, go to Tools > New Config and reset the array, assign all drives to their same positions except that I would leave the disk 18 slot blank and assign the drive that was disk 18 (going by serial number) to the disk 3 slot, then start the array without "Parity is already valid" checked. Then rebuild parity if it doesn't start automatically for me.

 

Will the above work OK for me? I won't lose any data this way? Do I really have to start the array again and include/exclude disks, etc.?

 

Thanks for the help. BiGs.

15 minutes ago, BiGs said:

I can't move the data off the disk now...

Why not? That's a requirement if you want to keep the data and remove the slot assignment later. You either have to rebuild that logical slot onto a physical drive, or move the data off the logical slot and rebuild parity without it.

 

Post a full screenshot of the Main GUI page with the array started.


Hey J. Thanks for the fast reply. Screenshot attached. I question this step because the drive is gone. Even the RAID card BIOS is coming up with something connected but a blank SN. I don't have the option of including or excluding this disk 3 anymore, which made me question that step. Perhaps it can be ignored if I'm removing a failed drive, and I do the rest of the steps and rebuild the data from parity instead? Just want clarification that my steps above will work, or am I missing something? Cheers. BiGs.

mainGUI.jpg

28 minutes ago, BiGs said:

I read this: https://wiki.unraid.net/Shrink_array#The_.22Remove_Drives_Then_Rebuild_Parity.22_Method and I think I need clarification on the first step. What does it matter how share data is assigned after a full failure? I can't move the data off the disk now...

Actually you can! When a disk fails, Unraid will emulate it using the combination of the remaining disks plus parity. Your screenshot shows that Unraid is still showing a disk3 with 1.5TB used despite the fact that the physical disk is no longer present. You will find that you can access the contents of this emulated disk. If you were going to simply replace the disk, then Unraid would rebuild the emulated contents onto the replacement disk, so no other action would be required.
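
For example, from the console you can check what the emulated disk holds (this assumes the standard /mnt/diskX mount points, which are the Unraid defaults):

    # list the contents of the emulated disk3 (served from parity + the remaining disks)
    ls -lah /mnt/disk3

    # total size of the emulated data
    du -sh /mnt/disk3

If those show your files as expected, the emulation is working and the data is recoverable.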

 

However, you want to shrink the array, and the New Config that is part of that process invalidates parity and thus the ability to rebuild a disk with its contents intact. You therefore first need to get the data off the 'emulated' disk before doing anything else, or you will lose the current contents of disk3. Once you have done that you can continue with the array shrink process.


Ok, I think I understand now. So Unraid replaces the missing disk with a mounted emulated disk holding the same data, derived from parity? Goodo. Lucky I asked here, or I guess I would have lost the data on that drive if I had tried my steps above. So even though disk3 is missing from the inclusions, I check all the others on all shares and then start the mover?

inclusions.jpg

28 minutes ago, BiGs said:

start the mover?

No, the mover won't move from disk3 to another disk slot. You will need to use another method to get the data from slot 3 to another slot. The unbalance app can be used, as could mc at the console, or, with some configuration, the Krusader or Dolphin dockers.
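
For example, a minimal rsync at the console might look like this (disk4 is just a hypothetical target here; use any data disk with enough free space):

    # copy the contents of the emulated disk3 onto disk4, preserving attributes
    # (the trailing slash on the source means "contents of", not the directory itself)
    rsync -avX /mnt/disk3/ /mnt/disk4/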

 

You could also temporarily enable disk shares and move the data across the network using your favorite file manager app.

 

The mover only transfers between cache and data slots, never between different data slots.


Ah ha, gotcha. Yeah, I see the disk shares in the FTP directory tree. I'll probably wait for this mover run to finish, then see if I can just move everything off that disk back onto the share again, but this time with all disks included except disk3, so it redistributes everything while skipping the emulated disk. Makes sense. I wonder if I should turn off the cache for this... Thanks for helping a nubcake out. I'll let you know how it goes.

8 minutes ago, BiGs said:

I'll see if I can just move everything off that disk back onto the share again

If you don't take precautions, that action will result in data loss. I believe if you follow the directions about adding slot 3 to the GLOBAL exclude, you should be ok. Normally, however, you should never copy data between disk and user shares. Only disk to disk and user to user, never mix. Treat /mnt/user as special, and never mix /mnt/diskX or /mnt/cache with /mnt/user.
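
To make that concrete, with a hypothetical share called Movies:

    # SAFE: disk share to disk share
    cp -a /mnt/disk3/Movies /mnt/disk4/

    # RISKY: mixing a disk share with the user share, because /mnt/user/Movies
    # can resolve back to the very files on disk3 you are reading from
    # cp -a /mnt/disk3/Movies /mnt/user/Movies   <- don't do this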

 

That's why I said TEMPORARILY enable disk shares and copy from disk to disk over the network.

 

Also, you keep saying move the data. I recommend copy instead of move; it will be faster, and you will be removing the source disk anyway.
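
If you want to reassure yourself that the copy is complete before pulling drives, a checksum dry run is one option (again using disk4 as the example target):

    # compare source and copy by checksum; -n (dry run) means nothing is changed
    rsync -avnc /mnt/disk3/ /mnt/disk4/

Anything listed as needing transfer didn't copy cleanly.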

16 minutes ago, jonathanm said:

If you don't take precautions, that action will result in data loss.

Good idea, I will follow your advice. I will actually copy the data to another PC, then remove disk3, then copy it back to the /mnt/user/share and let Unraid deal with distribution again. Cheers.

 

P.S. I'm confirming I did set the global share settings to include all disks except disk 3, the same as on all the shares (the emulated disk3 doesn't show up in include or exclude).


Sounds good. Remember to put disk3 back to normal with includes and excludes after you rearrange disks.

 

BTW, having so many empty disks is not such a good idea anyway. I would take this opportunity to only include as many disks as you actually need, and keep the rest on the shelf waiting to be either additions or replacements as the need arises.

 

All disks, whether empty or not, participate in the parity calculation with their full capacity. That means a pair of empty disk failures would use up both parity disks and keep you from reconstructing a 3rd failed data disk. Also, keeping them connected is a waste of power. Small, because they normally will be spun down, but still measurable.


Yeah, good idea. I've only just purchased Unraid and was just getting the hang of it by adding and preclearing disks, etc. Good idea to remove a few at least and use them as hot-swap spares. I had 26x of these drives once upon a time. These older non-NAS drives just drop like flies now.

 

So I've transferred 1TB since. It is dragging parity reads from the empties too, so yeah, probably good to remove most of them. I've got read errors on disk 16 now too. So I'll pull most of the empties out after removing them and test them on my desktop.

parity_copy.jpg

1 hour ago, BiGs said:

I've got read errors on disk 16 now too.

That's not good. I'm not positive how dual parity deals with read errors during single-disk emulation; I assume the same as single parity normally does, where a read error is followed by a reconstruct and a write to put the calculated data back in place on the same drive. If that write fails, the disk will be dropped and emulated just like disk 3 is currently. If you lose 2 more disks while the array is in its current state, you will lose the ability for Unraid to reconstruct from parity, and will need to resort to standard recovery techniques on the failed drives.

 

I'd definitely remove all the empty disks after you've recovered the data you need.


Yup. It's struggling on the last couple hundred GB. It does like 15 seconds at around 40MB/s, then stops, throws some read errors, then does another 15-30 seconds' worth or so and repeats. Seems to be slowly chugging through it though. And it was disk 15 actually, I misread; disk 16 is a refurb I just bought, so I could have taken that back.

strugglingdisk15.png


All copied over. Removed disks 3 & 13-18. Parity drives rebuilding now. I'll surface test these ones later and just have some spares. I think my problem is I've only ever used hardware RAID with very rigid methods of acting on failed drives. This is the first time I've gone the software way. But it's good, it gives a heap more options and fail-safes. Thanks for the help. BiGs

parity_rebuild.jpg

