Best practice to replace a still-working data drive with a larger one?



Thanks, itimpi. Yes, I'd seen that page but couldn't find anything specific there about reinstating a physical drive that has been emulated. I fancy giving maintenance mode a go to see what I can do with that. Worst case, I suppose, I should be able to delete the emulated drive and rebuild parity all over again.

 

-- 

Chris

2 minutes ago, bidmead said:

Thanks, itimpi. Yes, I'd seen that page but couldn't find anything specific there about reinstating a physical drive that has been emulated. I fancy giving maintenance mode a go to see what I can do with that. Worst case, I suppose, I should be able to delete the emulated drive and rebuild parity all over again.

 

-- 

Chris

I thought that the section on rebuilding a disk onto itself covered what you wanted?     Maybe I have misunderstood?


You're right, but that seems a bit extreme. After all, the data on the physical drive is identical to the emulated data for that drive. I can see that "rebuilding" would leave us in the right place, but I would expect the system simply to be able to say, "Oh, hello, you're back again. I'll stop emulating."

 

-- 

Chris


I couldn't do much with maintenance mode. But now that I know not to pull or plug drives while the array is running, I think I've managed to re-establish the status quo. What seems to work is:

 

1. Shut down the array.

2. Pull the drive that claims to be emulated.

3. Start the array. The drive slot is now declared unassigned.

5. Use the pull-down to find the Maxtor and load that back into the slot.

6. Start the array.

7. Warning message: Drive not ready. Content being reconstructed.

 

I'm hoping that "reconstructed" is different from "emulated" and that the drive's physical presence is now being acknowledged and some parity checking is going on.

 

At some later stage I'm going to have to walk this whole sequence through again (Start with a solid array, pull a drive, check the emulation, replace the drive) to understand how it's meant to work. My grievous error (apart from my gross misreporting of which drives were where---apologies once again) was not grokking that UnRAID doesn't support drives being plugged and pulled while the array is running.

 

Yes, it's parity syncing now and the physical presence of the drive has deffo been acknowledged.

 

What I'm headed for next is replacing this small Maxtor with a much bigger drive and getting the Maxtor emulation pasted onto the new drive. But I'm going to run this pull, emulate, replace once more first.

 

Oh, and I'm marking the drives externally so I won't get them mixed up again!

 

-- 

Chris

 

 

IMG20201122205749-01.jpeg

40 minutes ago, bidmead said:

After all, the data on the physical drive is identical to the emulated data for that drive.

Possibly, but probably not. Unraid will still write to the emulated drive just like it's there, which means ANY activity to that data slot is out of sync. The safest way forward is a rebuild.

 

Unraid only disables (red ball) a drive when a write fails. That means, there was SOMETHING written to that slot. Now, it's also possible that when a read for that drive was issued, since the read failed, Unraid immediately reconstructs what SHOULD have been read from the rest of the disks, and attempts to write it back to the drive, which also failed because the drive was sitting on your desk.

 

However... since Unraid tried to write something and failed, the only way to know for sure if there are differences between the offline physical drive and the emulated data slot would be to do a binary compare. That would take an excruciatingly long time to complete, and you will be unprotected from another failure while that's in progress.

 

If you really want to blindly trust what's on the drive, you can do that, but you will still need to do a parity check to be sure the contents are in sync so a different drive failure doesn't result in corruption.
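For anyone curious what such a binary compare would involve, here is a minimal Python sketch. It assumes both sides can be opened as ordinary files or block devices; the function name `first_difference` and the 1 MiB chunk size are choices made for this illustration, not anything Unraid provides.

```python
from typing import BinaryIO, Optional

CHUNK = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def first_difference(a: BinaryIO, b: BinaryIO, chunk: int = CHUNK) -> Optional[int]:
    """Return the byte offset of the first mismatch, or None if the streams are identical."""
    offset = 0
    while True:
        block_a = a.read(chunk)
        block_b = b.read(chunk)
        if block_a != block_b:
            # Locate the exact byte inside the mismatching chunk.
            for i, (x, y) in enumerate(zip(block_a, block_b)):
                if x != y:
                    return offset + i
            # One stream ended early: the mismatch is at the shorter length.
            return offset + min(len(block_a), len(block_b))
        if not block_a:  # both streams exhausted with no difference found
            return None
        offset += len(block_a)
```

Every byte of both sides has to be read before the function can return `None`, which is why a compare like this is so slow on real drives.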

 

50 minutes ago, bidmead said:

Oh, hello, you're back again. I'll stop emulating.

Yes, but Unraid has no knowledge of the possible differences, and can't make assumptions. Like I said, if you do the procedure to reinstate the drive as is, you will need to do a parity check anyway.
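To make the out-of-sync point concrete, here is a toy Python model of single-parity emulation. It assumes parity is a plain XOR across the data disks and shrinks each disk to four bytes; the names `xor_blocks`, `disks` and `shelf_copy` are made up for the illustration, and none of this is Unraid's actual code.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three tiny "data disks" and the parity computed from them.
disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(*disks)

# Disk 1 is pulled. Keep what it held at that moment as the "shelf copy".
shelf_copy = disks[1]

# Emulation: the missing disk's content is parity XOR every remaining disk.
emulated = xor_blocks(parity, disks[0], disks[2])
assert emulated == shelf_copy  # identical only as long as nothing is written

# A write to the emulated slot can only be recorded by updating parity.
new_content = b"\x10\x20\x30\x41"  # one byte changed
parity = xor_blocks(new_content, disks[0], disks[2])

# The array now serves the new content, but the shelf copy never saw the write.
emulated = xor_blocks(parity, disks[0], disks[2])
in_sync = emulated == shelf_copy
print(in_sync)  # prints False
```

In this model a rebuild amounts to copying `emulated` back onto the physical disk, which brings the disk and parity back into agreement without anyone having to guess what changed.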


You're right, jonathanm. With the Maxtor reinserted I did create an empty folder on the emulated drive in the hope that this would wake UnRAID up to the realisation that the drive was physically present. That idea didn't work and I think I now understand why.

 

So, yes, the data on the emulated drive would now be different from the data on the physical drive. And, as you've predicted, what we're getting now isn't just a parity check, it's a Parity Sync / Data Rebuild. Currently at 40%---thank heavens this old Maxtor (circa 2009) is small.

 

For my re-run I'll make sure a) only to pull and plug drives when the array is down and b) not to change the data on the emulated drive while Maxtor has left the building. Would I be wrong in that case to expect a smooth return to the status quo (no parity activity or rebuilding) when Maxtor makes his come-back?

 

I'm truly grateful to you guys for sharing my pain here and steering me back to sanity.

 

-- 

Chris

11 hours ago, jonathanm said:

I'd recommend using the last 4 digits of the serial number for the drive, since the serial number is how Unraid keeps track of assignments.

This suggests to me that I should be able physically to stick these drives into any position in the 8-bay (remembering to spin down the array first) and UnRAID will carry on regardless. Is this the case?
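As a rough Python illustration of that reading, on the assumption (taken from jonathanm's remark) that assignments follow the serial number rather than the bay: the serial numbers, slot names and the `label_for` helper below are all invented for the example.

```python
def label_for(serial: str, digits: int = 4) -> str:
    """Return the last few characters of a serial number, for a sticker on the drive."""
    return serial.strip()[-digits:]


# Hypothetical assignments keyed by serial number, not by physical bay.
assignments = {
    "EXAMPLE-SERIAL-0001": "parity",
    "EXAMPLE-SERIAL-7342": "disk1",
    "EXAMPLE-SERIAL-9915": "disk2",
}

# The same drives seen in a different physical order after a reshuffle.
bays_after_reshuffle = [
    "EXAMPLE-SERIAL-9915",
    "EXAMPLE-SERIAL-0001",
    "EXAMPLE-SERIAL-7342",
]

for bay, serial in enumerate(bays_after_reshuffle):
    # The slot is looked up by serial, so the bay index does not affect it.
    print(f"bay {bay}: label {label_for(serial)} -> {assignments[serial]}")
```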

 

-- 

Chris


OK, I said I'd try a clean rerun of this without a) the senior moment about which drives are where and b) my misunderstanding of when you can hot-swap. This is that clean rerun.

 

1. I start with an entirely sane array. It's running but I then pull the Maxtor, simulating a sudden drive failure.

 

2. UnRAID flags the failure and straightaway goes into emulation mode. In this mode I'm able from my Hackintosh to tap into the (emulated) Maxtor share, use VeraCrypt to mount the encrypted disk image therein and play any of the multimedia files it contains. No buffering or glitches. Exactly as if the Maxtor were present.

 

3. The big mistake I made last time (I believe) was to plug the drive back in with the array still running. This appears to confuse UnRAID and it keeps using the emulated drive instead of the real one. This time I spun down the array before reinserting the drive. With the array spun down, UnRAID appears to recognise the returned drive almost immediately. I'm not clear how it does this. Is there a checksum of the whole drive somewhere on the Maxtor that corresponds to a checksum retained by the parity drive?

 

What I've learnt here is that, yes, you can pull a drive while the array is running without affecting services across the LAN to client devices. But it's a good idea (probably mandatory) to spin down the array before returning the drive. And I'm assuming that if your hardware supports hot-swapping you should be able to follow all the similar procedures suggested in the manual but WITHOUT having to power down the machine. But do spin down the array.

 

-- 

Chris

 

UnQNAP_Dashboard array turned good anotated.jpg


The drive will stay disabled until it is rebuilt.

3 minutes ago, bidmead said:

With the array spun down, UnRAID appears to recognise the returned drive almost immediately.

I believe you are misinterpreting the messages, but it's understandable why: they are misleading.

The last green message is referring to whether or not the read error column has a non-zero value, and those columns (read / write errors) are set back to zero when the array is started, regardless of whether or not there is a disabled drive slot.
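One way to picture the distinction, as a hedged Python sketch: the disabled flag and the error counters are separate pieces of state, and only the counters are cleared when the array starts. The `Slot` class and its field names are hypothetical, not Unraid's internals.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    disabled: bool = False  # set when a write to the drive fails
    read_errors: int = 0
    write_errors: int = 0

    def on_array_start(self) -> None:
        # Counters go back to zero on every start; the disabled flag is untouched.
        self.read_errors = 0
        self.write_errors = 0

    def summary(self) -> str:
        state = "disabled (emulated)" if self.disabled else "active"
        return (
            f"{self.name}: {state}, "
            f"read errors {self.read_errors}, write errors {self.write_errors}"
        )


slot = Slot("disk2", disabled=True, read_errors=3, write_errors=1)
slot.on_array_start()
print(slot.summary())
```

A clean error column after a restart therefore says nothing about whether the slot is still disabled.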


You're right, jonathanm. The drive is still marked as emulated. So what the green message is saying is "There are no read errors because I haven't actually read the disk. And if I do read the disk, I'll be reading the emulation, which by definition, won't have any read errors."

 

So the question arising from this is how do we get the array to acknowledge the physical drive and use that instead of the emulation?

 

-- 

Chris

 

UnQNAP_Main_Drivestillemulated.png


So we're saying the rebuild is mandatory. That seems to be a real shame.

 

OK, I'm going to spin down the array and leave the drives physically where they are. Then I'll restart the array in maintenance mode and unassign the drive from the Disk 2 slot. Then I'll spin down the array again, reassign the same drive to that slot and spin the array up again.

 

It should then rebuild?

 

I see "Replacement disk installed", which does suggest a rebuild is being triggered.

 

And indeed...

 

UnQNAP_Main-4DataRebuildStarted.png

 

But this rebuilding of a perfectly good drive seems to me suboptimal. Is there really no user intervention that can say, trust this drive until the next scheduled parity check*? I assume a workaround would be to leave it as an unassigned drive and share it out from there.

 

Any views on this?

 

(*I guess the problem might be that if the reinserted drive is accepted into the array on trust as I suggest, it's not just the integrity of that drive that's at stake. The parity check across all the drives is in jeopardy.)

 

-- 

Chris

Edited by bidmead
additional explanation and further thoughts
1 hour ago, bidmead said:

confuse UnRAID

Not confused at all. No point in replacing a disk with array started (or even under power) because Unraid won't use it until rebuilt, and it won't begin rebuilding until you assign a disk to the emulated slot.

 

You will have to stop, unassign the slot, start the array with nothing assigned to the slot, stop, reassign, start to begin rebuilding. 
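The sequence can be written out as a tiny Python state model, purely to show why both stop/start cycles are needed. Everything here (the `Array` class, its methods, and the rule that a running array must first see the slot empty before a reassignment counts as a replacement) is one reading of the steps above, not Unraid's implementation.

```python
from typing import Optional


class Array:
    """Toy model of one disabled slot going through the rebuild sequence."""

    def __init__(self, assigned: Optional[str]) -> None:
        self.running = True
        self.assigned = assigned   # serial of the disk in the slot, or None
        self.disabled = True       # the slot is currently being emulated
        self.seen_empty = False    # has the array been started with the slot empty?
        self.rebuilding = False

    def stop(self) -> None:
        self.running = False

    def assign(self, serial: Optional[str]) -> None:
        if self.running:
            raise RuntimeError("assignments can only change while the array is stopped")
        self.assigned = serial

    def start(self) -> None:
        self.running = True
        if self.assigned is None:
            self.seen_empty = True
        elif self.disabled and self.seen_empty:
            # A disk assigned after the slot was seen empty is treated as a replacement.
            self.rebuilding = True


array = Array(assigned="MAXTOR")
# stop, unassign, start
array.stop()
array.assign(None)
array.start()
# stop, reassign, start
array.stop()
array.assign("MAXTOR")
array.start()
print(array.rebuilding)  # prints True
```

Simply stopping and restarting with the same disk still assigned leaves `rebuilding` at `False` in this model, which matches the drive staying disabled until it is rebuilt.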

20 minutes ago, trurl said:

Not confused at all. No point in replacing a disk with array started (or even under power) because Unraid won't use it until rebuilt, and it won't begin rebuilding until you assign a disk to the emulated slot.

 

You will have to stop, unassign the slot, start the array with nothing assigned to the slot, stop, reassign, start to begin rebuilding. 

Sorry, trurl, I was late reading this. Yes, that's what I've done. Rebuilding now.

 

-- 

Chris

8 minutes ago, bidmead said:

My newbitude is showing through here. We're not just rebuilding the drive, we're rebuilding trust across the entire array.

 

-- 

Chris

Exactly. Now is a good time to hammer home the point that the failure of ANY drive in the array jeopardizes the recovery of other drives. So, you are risking the entire content of your 18TB drive by keeping that old 300GB Maxtor around. I'm not saying it is or isn't going to fail imminently, just that you must trust all drives in the array to be 100% healthy to rebuild a failed drive. Adding 300GB of capacity to 18TB of space hardly seems worth the risk.

4 minutes ago, bidmead said:

And, of course, you can happily insert a drive while the array is running---as long as it's not part of the array.

 

-- 

Chris

Yes, if your hardware fully supports hot swap you can add and remove devices to be used with the Unassigned Devices plugin, for data transfer or backup. Just don't disturb any array devices in the process.🤣

6 minutes ago, jonathanm said:

Adding 300GB of capacity to 18TB of space hardly seems worth the risk.

Excellent point, of course. But thanks to the small size the rebuild times are short, which is what I need if I'm going to get all this written up before the end of the year. I'm anticipating that the Maxtor will end its days as an unassigned device with a btrfs encrypted file system. Tested Technology is very grateful to Seagate for the donation of these very large drives, but we also want to show readers that UnRAID is good for older, smaller drives too.

 

I take your point, of course, about failed drives jeopardising the entire array. And an array built entirely from 12-year-old drives like this Maxtor would be unreliable. But the combination of UnRAID's clear test, its very explicit surfacing of the SMART data, the opportunity for regular parity checks and the option of running a pair of parity drives does, to my (newbie) mind, open up the option to get into UnRAID very economically, mustering a bunch of drives that happen to be lying around to create something non-mission-critical like a multimedia server.

 

-- 

Chris 


As long as you have notifications set up, and immediately act at the first sign of issues, then I'll agree.

 

Too many people get complacent after setting up their array, and allow errors to pile up until data loss is inevitable. We see this quite regularly, where someone comes to the forum with their first post asking for help recovering their array, and it turns out they've been running with a disabled drive for months because nothing seemed wrong to them, because all their data has been available this whole time, until suddenly it's not.

 

You experienced it yourself, where you were convinced the array was fine even though you still had a disabled drive.

 

Perhaps you could recommend a change in how this is handled, since the experience is fresh?

2 minutes ago, jonathanm said:

Perhaps you could recommend a change in how this is handled, since the experience is fresh?

Recommend to UnRAID management how the reporting of the replaced drive might be better handled? Or recommend to readers (as we certainly shall) that keeping an eye open for errors is crucial for a trustworthy NAS?

 

-- 

Chris

Just now, bidmead said:

Recommend to UnRAID management how the reporting of the replaced drive might be better handled?

This, sort of. How could the notifications be worded so that you understood that the specific drive was still disabled, when at that moment in time you believed everything to be OK?

 

Should the disabled drive notice be spammed? Too many notifications can cause people to ignore them, thinking "ok, ok, I got it" when they really don't "got it".

 

I'm trying to find a way for Unraid to help people understand the urgency of the issue without overdoing it.
