trurl Posted November 23, 2020

1 hour ago, jonathanm said: you have notifications set up

I personally think that it should nag more about this. The most critical notifications should go out by email or another agent, so the user doesn't have to remember to open the webUI to discover a serious problem.
bidmead Posted November 23, 2020

Interesting question, jonathanm, but I may not be the one to answer that. I have 40 years' experience as an IT journalist, but only a month's experience of UnRAID. For what it's worth, it seems to me that "array turned good" was misleading once the physical drive was replaced. After all, what changed between the previous state (emulating the missing drive) and the present state (still emulating the drive, but with a drive back in its slot)? The goodness quotient hasn't changed: we're still emulating the drive and therefore sacrificing our insurance on all the other drives. The goodness is only restored once the physical drive has been fully and successfully rebuilt. This should surely be a yellow alert, saying something like: Disk 2 emulated, rebuild required.

Bottom line, though: the punter should always keep an eye on the browser tab icon. If it's not a green ball, action is needed (or maybe ongoing).

-- Chris
bidmead Posted November 23, 2020

I pursued the rebuild of the Maxtor drive. But, as we might have expected, the work taxed the dear old drive beyond endurance. It failed the rebuild, and UnRAID looped around several times trying the rebuild afresh until I put it out of its misery. I'm now running the rebuild on the second IronWolf Pro, which was always my original intention, to demonstrate that expanding capacity is a lot simpler with UnRAID than with, say, TrueNAS Core.

-- Chris
trurl Posted November 23, 2020

4 minutes ago, bidmead said: demonstrate that expanding capacity

I have always preferred a small form factor, so expanding capacity for me is always rebuilding to larger disks.
bidmead Posted November 24, 2020

The catch about migrating ever upwards to these very impressively engineered huge drives is that we begin to run into one of the issues that drove us away from RAID: the rebuild time, and the pressure this exerts on the reliability of the other drives in the array. With UnRAID, I'd have thought, much of the USP is the ability to use a large number of rather smaller drives, recovering from drive failure over a tea-break rather than during a three-day vigil. And, particularly, if failing to recover, at least not losing data on the other drives of the array.

-- Chris
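The tea-break versus three-day contrast above can be put in rough numbers. A minimal sketch, assuming illustrative capacities and sustained transfer speeds (not measured figures): a rebuild touches every sector once, so best-case time is roughly capacity divided by throughput.

```python
# Rough, best-case rebuild-time estimate: a rebuild reads/writes every
# sector once, so time ~ capacity / sustained throughput. Real rebuilds
# run slower (controller contention, other array activity), so treat
# these as lower bounds with made-up but plausible speeds.
def rebuild_hours(capacity_gb, mb_per_s):
    return capacity_gb * 1000 / mb_per_s / 3600  # GB -> MB, seconds -> hours

print(round(rebuild_hours(300, 60), 1))     # small old 300 GB drive -> 1.4
print(round(rebuild_hours(16000, 150), 1))  # modern 16 TB drive -> 29.6
```

Even the optimistic figure for a big modern drive is more than a day, which squares with the real-world rebuild durations reported later in this thread.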
trurl Posted November 24, 2020

3 hours ago, bidmead said: large number of rather smaller drives

That has even greater downsides in my opinion. Each additional disk requires more hardware to attach it and more power to run it, and at some point a higher license tier unless you already have the Pro license. Smaller disks don't perform as well as larger ones, simply due to data density. Most importantly, each additional disk is an additional point of failure.
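The "additional point of failure" argument is easy to quantify. A hedged sketch with an illustrative 2% annual per-disk failure probability (substitute your own number), treating disk failures as independent:

```python
# Chance that at least one of n independent disks fails in a year,
# given per-disk annual failure probability p. The 2% used below is
# an illustrative assumption, not a measured rate.
def p_any_failure(n_disks, p_disk):
    return 1 - (1 - p_disk) ** n_disks

print(round(p_any_failure(4, 0.02), 3))   # few large disks  -> 0.078
print(round(p_any_failure(12, 0.02), 3))  # many small disks -> 0.215
```

With single parity either array survives one failure, but the second figure is the chance of entering a degraded state at all, and a bigger disk count also means more chances of a second failure during the rebuild window.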
bidmead Posted November 25, 2020

Yes, that makes a lot of sense, trurl. Today's hard drives (especially once they've run the gauntlet of PreClear), it seems to me, are considerably more reliable than the commodity drives that inspired the invention of RAID. And even more reliable, I'd argue, than the costly enterprise drives that RAID aspired to replace. Bigger but fewer makes good sense today.

But I like the idea of having multiple hot-swappable bays on an UnRAID system, if only because I can dump unassigned drives in there, test them with PreClear, and use them experimentally as shares without having to add them to the array.

-- Chris
bidmead Posted November 25, 2020

Just to wind up the Maxtor story, I've successfully replaced that (now failed) drive with the second IronWolf Pro after a rebuild lasting one day, 10 hours, 26 minutes. The Maxtor data are all preserved and the previous 300GB Maxtor share is available across the LAN. But I've hugely expanded the capacity of the array. I have my green ball back again. Job done. Many thanks to this forum for the invaluable help.

-- Chris
bidmead Posted November 26, 2020

Actually, the Maxtor story has one last chapter. I don't know if I'm right in continuing it here, because it raises a rather different issue. But let's see.

The old Maxtor failed the rebuild, almost certainly due to a bad sector or sectors. However, I understand that the PreClear app can reallocate dud sectors, so I ran the drive through another single-pass total preclear. It passed. So I've now set it up as I originally intended: it's a luks-btrfs unassigned drive exported as a disk share. It doesn't turn up under Main/Disk Share, I assume because it's not exported from the array. (Is this right?)

The luks-btrfs format was created against a pass phrase, which will be required every time the disk is mounted. You can set this drive to auto mount or mount it manually from the WebGUI. To do either of these you need (only once) to enter the pass phrase into Settings/Unassigned Devices/Set Encrypted Disk Password.

Now the share can be loaded with no formality across the LAN. But this isn't, of course, very secure. You really want the share only to be loadable against a pass phrase. But you can't share the disk until it's mounted. And you can't mount it without the pass phrase. So, by definition, once it's a share it's not password protected (except against the drive being stolen, with data access attempted outside the UnRAID device). Is this how it is, or am I missing something? (I feel I probably am.) If not, then the workaround is clearly to use something like VeraCrypt to create an encrypted disk image on an unencrypted share or drive, and require VeraCrypt on the client device for the decryption.

-- Chris
bidmead Posted November 27, 2020

It seems that unassigned drive encryption has developed since this video from SpaceInvader One. The password is no longer in plain text in a file called keyfile. It's now encrypted in /tmp/unassigned.devices/config/unassigned.devices.cfg. And despite being in /tmp, as far as I can make out the file persists through power cycles. So the data will remain encrypted if someone steals the drive, but will auto-decrypt if the whole UnRAID server is stolen. Have I got that right?

-- Chris
shEiD Posted January 21, 2021

On 11/23/2020 at 7:36 AM, bidmead said:

@bidmead Awesome looking annotations 🤩 What program are you using to do this?
bidmead Posted January 24, 2021

It's for creating comics. An Android app called PicSay. Very easy and useful. I use it quite a bit.

-- Chris
docbillnet Posted September 10

On 4/9/2018 at 5:18 AM, PeteB said:

Here's what I do when I replace a data drive:
1. Run a parity check first before doing anything else
2. Set the mover to not run by changing it to a date well into the future. This will need to be undone after the array has been recovered.
3. Take a screenshot of the state of the array so that you have a record of the disk assignments
4. Ensure that any dockers which write directly to the array are NOT set to auto start
5. Set the array to not autostart
6. Stop all dockers
7. Stop the array
8. Unassign the OLD drive (ie: the one being replaced)
9. Power down the server
10. Install the new drive
11. Power on the server
12. Assign the NEW drive into the slot where the old drive was removed
13. Put a tick in the "Yes I want to do this" box and click start. The array will then rebuild onto the new disk.

Dockers that don't write directly to the array can be restarted. When the rebuild is complete, the mover, docker and array auto start configuration can be returned to their normal settings.

NOTE: You CAN write to the array during a rebuild operation, but I elect not to do so, to ensure my parity remains untouched for the duration of the recovery. Reading from the array is fine, as the device contents are emulated whilst the drive is being rebuilt.

This is not a very well thought out procedure. Imagine you reach step 8 and accidentally remove the wrong drive. As soon as you power up, it will try to bring up the array again, and it will now detect two failed drives. Your array is now non-recoverable. Steps 1-7 are probably good. At that point, remove the old drive and install the new drive. You should see when trying to assign the drive if you pulled the wrong one, and if so, since you are still offline, you can restore the drive and try to find the correct one. I don't really see the point of power cycling the server at all. That seems redundant.

BTW, this is not quite hypothetical.
I had forgotten I moved my parity drive from slot 1 to slot 4 because I found it got better cooling. So when I went to replace the last drive in the array, I accidentally pulled the parity drive instead. Unfortunately, I was not dealing with a dead drive. I had not unassigned the drive I intended to remove. But I did not realise my mistake until I restarted the array. So I now get to rebuild my parity before I can try replacing the correct drive. I'm actually not sure which drive it is now, so I will probably have to try one drive at a time while it is offline until I find the one with the correct serial number.
JonathanM Posted September 10

1 hour ago, docbillnet said: As soon as you power up it will try to bring up the array again, and it will now detect two failed drives. Your array is now non-recoverable.

That's not necessarily true. If you know the drive assignments (which is covered in step 3) you can set a new config with the drives in all the right slots, select "parity is already valid", and start the array in maintenance mode so nothing is written. Then shut down and remove the failed drive, and you should be back in the position to do a normal replacement.

2 hours ago, docbillnet said: So I now get to rebuild my parity, before I can try replacing the correct drive. I'm actually not sure which drive it is now, so I will probably have to try one drive at a time while it is offline until I find the one with the correct serial number.

Depending on how far you got, you may still be able to recover. Please start a new thread in General Support with your current situation, how you got there, and your diagnostics zip file attached.
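On the "find the one with the correct serial number" problem: before pulling anything, it helps to match the serials the OS reports against the ones shown in the Unraid GUI. A hedged Python sketch follows; note that sysfs only exposes `device/serial` for some transports (NVMe does, many SATA drives do not, where `lsblk -o NAME,SERIAL` or the GUI is the reliable source), so the path used here is an assumption about what happens to be available.

```python
# Sketch: collect whatever serials sysfs exposes so they can be checked
# against the Unraid GUI before a drive is physically pulled.
# /sys/block/<dev>/device/serial is present for NVMe devices; many SATA
# drives do not expose it there, so "unknown" entries are expected.
from pathlib import Path

def drive_serials(sys_block=Path("/sys/block")):
    serials = {}
    for dev in (sorted(sys_block.iterdir()) if sys_block.exists() else []):
        node = dev / "device" / "serial"
        serials[dev.name] = node.read_text().strip() if node.exists() else "unknown"
    return serials
```

Run as root on the live system; any device that comes back "unknown" needs cross-checking another way before you trust a slot number.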