Drive replacement


FreeMan


I know I've done this before, and I'm quite certain I've asked about it, too. The only reference I could find in any of my posts (I did look through about half of them), though, was from 2014, so I want to be sure nothing's changed since then. Also, I've been around long enough to know that I know just enough to be dangerous, and I can easily shoot myself in the foot if I'm not careful.

 

I need to replace a tired, old, yet functional drive with a newer, larger one. I've run a pre-clear on the new drive as a test - I know that step is unnecessary except for the comfort of knowing the drive won't die within its first 48 hours - so the drive's installed and ready to be used.

 

The Wiki states:

 

To replace a failed disk or disks:

  1. Stop the array.
  2. Power down the unit.
  3. Replace the failed disk(s) with new one(s).
  4. Power up the unit.
  5. Assign the replacement disk(s) using the Unraid webGui.
  6. Click the checkbox that says "Yes I want to do this" and then click Start.

 

The comment made on that 2014 post I found stated:

  1. Stop the array and unassign the drive to be replaced.
  2. Start the array so it shows a missing drive.
  3. Stop the array and assign the new (replacement) drive to the slot where the drive was missing.
  4. Start the array and wait while UnRAID rebuilds the drive.
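
For what it's worth, the rebuild can also be watched from a shell once it's running. A minimal sketch, assuming Unraid's mdcmd tool (the same one that shows up in the syslog excerpts later in this thread) and its mdResync* status fields; the webGui Main page reports the same numbers:

# Snapshot of rebuild/parity-sync progress; wrap in `watch -n 60`
# to poll. The mdResync* field names are an assumption on my part.
/usr/local/sbin/mdcmd status | grep -E 'mdResync(Pos|Size)?='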

 

They are, essentially, the same set of instructions. My only hesitation is that the Wiki says it's for a failed disk. I believe that steps 1 & 2 of the 2014 instructions are there to convince unRAID that the disk has failed, and from there it continues as per the Wiki.

 

Correct?

20 minutes ago, ChatNoir said:

If you go this route, I would personally do a parity check before replacing the drive, to be sure that the data used for the rebuild is good.

Isn't a parity check pointless, as it will calculate parity using the contents of the failed drive, which is now emulated... using parity? Kind of like checking whether the answer in the back of the book is correct using a photocopy of the answer in the back of the book.
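
The circularity is easy to demonstrate with single parity, which is just the XOR of the data disks. A toy sketch with made-up byte values (nothing Unraid-specific):

# Three "data disks", one byte each -- made-up values.
d1=0xA5 d2=0x3C d3=0x5F
p=$(( d1 ^ d2 ^ d3 ))            # parity = XOR of all data disks

# A failed disk 2 is "emulated" from parity plus the survivors:
e2=$(( p ^ d1 ^ d3 ))
printf 'real d2=0x%X  emulated d2=0x%X\n' $(( d2 )) "$e2"

# A parity check that reads the emulated disk recomputes
# d1 ^ (p ^ d1 ^ d3) ^ d3, which is always p -- it cannot disagree.
printf 'parity=0x%X  recomputed=0x%X\n' "$p" $(( d1 ^ e2 ^ d3 ))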

1 hour ago, FreeMan said:

I need to replace a tired, old, yet functional drive with a newer, larger one.

 

54 minutes ago, calvinandh0bbes said:

Isn't a parity check pointless, as it will calculate parity using the contents of the failed drive, which is now emulated... using parity?

Technically it isn't a parity check when the drive has already failed; it's a read check, for the reasons you state.

 

However, in this thread, we are discussing replacing a GOOD drive, and yes, a full parity check with zero errors is crucial before pulling a good drive to be upgraded.

 

So, if the drive in question is dead, there's no point in doing a check; rebuilding the drive will do the exact same thing, hopefully getting the array protected again.

If the drive is still good, valid parity is required to successfully replace it, so a check before removing the drive is warranted.
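
If anyone wants to kick that pre-replacement check off from a shell instead of the webGui, something like the following should do it. The check/NOCORRECT arguments to mdcmd are my assumption from how the tool is commonly described, so treat this as a sketch and verify before relying on it:

# Start a read-only (non-correcting) parity check -- assumed syntax:
/usr/local/sbin/mdcmd check NOCORRECT

# When it finishes, look for a zero sync-error count before pulling
# the drive (field name assumed):
/usr/local/sbin/mdcmd status | grep -i sbSyncErrs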

37 minutes ago, jonathanm said:

If the drive is still good, valid parity is required to successfully replace it, so a check before removing the drive is warranted.

The monthly parity check completed just 2 days ago, and the drive being replaced is full enough that I know nothing's being written to it, so I'm not concerned about losing data there. I believe I've added a grand total of one file to the array since that check completed, so I'm comfortable that parity is good.

 

Otherwise, the process is correct?

  1. Stopped the array.
  2. Selected the new drive in place of the existing drive for disk 5.
  3. Got a red X saying the disk was missing, and the selection reverted to "no drive".
  4. Repeated steps 2 & 3 several times.
  5. Left it set to "no drive".
  6. Started the array (should probably have selected "maintenance mode", but didn't).
  7. Stopped the array.
  8. It hung.

[screenshot of the array-stop status message]

It's been saying this for about the last 10 minutes.

The dashboard shows "Array (Stopped)" and lists Parity, Disk 1-4, 6-8 (as expected).

The server is not responding by name, but it is accessible by IP address.

 

As always when the server's not responding properly: nas-diagnostics-20200904-2053.zip

 

Should I reboot, which seems to be about my only option, or is there something else I'm not aware of? I'd probably set auto-start to off for this boot, just to save a minute or two.

 


There is also this in syslog when you try to assign the new disk:

Sep  4 16:24:30 NAS kernel: mdcmd (6): import 5 sde 64 7814026532 0 HGST_HUS728T8TALE6L4_VDKY234M
Sep  4 16:24:30 NAS kernel: md: import disk5: lock_bdev error: -13
Sep  4 16:24:30 NAS kernel: md: import_slot: 5 missing

I don't recall seeing that lock_bdev before, but maybe I have just never had a reason to look for it.
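
For reference, -13 is -EACCES: the md driver couldn't take an exclusive lock on the block device, which usually means some other process still had it open. A couple of stock-Linux commands to see what's holding it (sde taken from the log above; the partition name is assumed):

# Any process with the raw device or its first partition open:
fuser -v /dev/sde /dev/sde1 2>&1
lsof /dev/sde /dev/sde1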

10 minutes ago, trurl said:

Can you get to the Docker page?

"Array must be started to view Docker containers", so that's a no.

 

I have shut down all SSH connections, Windows Explorer windows, etc. I had my Kodi box running, but I've rebooted the server many times in the past with it up, and that didn't present any issues.

 

The odd thing is that I stopped the array with no issues in order to remove the disk 5 assignment. It's only after I brought it back up and then tried stopping it again that it refuses to stop.

 

It's had this on the status bar of the browser window since I attempted the shutdown:

[screenshot of the browser status bar]

 

Also, the WebGUI stopped responding by name and is currently only responding via IP. I don't know if that's an important symptom, but I presume it'll go away once I get it restarted.
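
Since the stop appears wedged on unmounting, it may be worth checking what still has the array mounts busy before resorting to a reboot. These are plain Linux tools, nothing Unraid-specific; /mnt/user and /mnt/disk5 are the usual Unraid mount points:

# Processes holding anything open under the user shares or the
# emulated disk:
fuser -vm /mnt/user 2>&1
fuser -vm /mnt/disk5 2>&1

# lsof restricted to that one filesystem gives more detail:
lsof +f -- /mnt/disk5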


OK, I finally got frustrated and hit the "Reboot" button on the Array Operations tab. I figured the worst that could happen was that it wouldn't do anything. I'd tried to set the array not to auto-start before doing that, but the setting wouldn't save, so I left it to auto-start.

 

Upon clicking the reboot button, it commenced shutting down. After about 2 minutes, I got the login page. After logging in, it showed that the array was up, the Dockers had all started, and everything was hunky-dory with disk5 being emulated.

 

I stopped the array, assigned the new 8TB drive to disk5 and clicked start. It's now happily rebuilding disk5. Not sure what the malfunction was, but I'm on the right track now. Another 20 or so hours and I'll have an extra 4TB of space and a nice crispy-new drive to use.

8 hours ago, trurl said:

I don't recall seeing that lock_bdev before, but maybe I have just never had a reason to look for it.

Yes, that's not normal; there was some issue with that disk. There's also this later:

Sep  4 16:44:45 NAS kernel: sd 9:0:3:0: [sde] tag#2701 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Sep  4 16:44:45 NAS kernel: sd 9:0:3:0: [sde] tag#2701 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 20 00 00
Sep  4 16:44:45 NAS kernel: print_req_error: I/O error, dev sde, sector 0

Not sure what caused it, though. UD (Unassigned Devices) was spinning down that disk, but I would think that's unrelated. If all is good for now, ignore it.
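
Given the sector-0 read error, a quick SMART query of that drive seems worthwhile if it acts up again. smartctl ships with Unraid as far as I know; sde is taken from the log above:

# Overall health verdict plus the full attribute table:
smartctl -H -A /dev/sde

# The usual early-warning attributes:
smartctl -A /dev/sde | grep -Ei 'realloc|pending|uncorrect'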

