replace drive in zfs pool

October 11, 2023

One of the drives in my ZFS raidz2 pool started reporting an increasing number of pending sectors, so I decided to replace it.

I followed the instructions here:

Basically, I stopped the array, changed the bad drive in the dropdown to the new one, and started the array.

Unlike the instructions, I don't see the replacement happening automatically. The pool status shows the following:

status: One or more devices could not be used because the label is missing or invalid. Sufficient replicas exist for the pool to continue functioning in a degraded state.

I also noticed another issue. The failing drive was zdata5, and that slot now shows the new drive ID.

However, zdata8, a different drive which seems healthy, is showing a red cross next to it with the message "device is disabled, contents emulated".

In the zfs pool status, there is this in the 8th row

10871034331009088735 UNAVAIL 0 0 0 was /dev/sdq1

But sdq was the previous identifier for the failed drive (it shows as unassigned now), and it was in slot 5 before. The drive in slot 8 is sdo, which shows as online in the zfs status. Is this just a UI glitch?

Anyway, how should I proceed? Do I need to run the replace command, or has something else gone wrong?

Diags attached

godaam-diagnostics-20231011-2146.zip


October 11, 2023

Oct 11 21:43:51 Godaam emhttpd: shcmd (127): /sbin/wipefs -a /dev/sdo
Oct 11 21:43:51 Godaam root: wipefs: error: /dev/sdo: probing initialization failed: Device or resource busy

The replacement failed because of this. It can sometimes happen when the new drive contains partitions from somewhere else and, for some reason, wipefs fails. Assuming the new drive is /dev/sdo, type:

wipefs -af /dev/sdo

Then reboot and post new diags after array start.
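
Since a forced wipe is destructive, it may be worth confirming first that the target is not a device the pool is still using. The sketch below is only an illustration under assumptions: `is_pool_member` is a hypothetical helper name, and the sample rows are trimmed from the zpool status output posted further down in this thread. It checks whether a short device name appears as a vdev in `zpool status` text.

```shell
#!/bin/sh
# is_pool_member is a hypothetical helper, not part of Unraid or ZFS.
# It reads zpool status text on stdin and returns 0 if the short device
# name (e.g. sdo) or its first partition (sdo1) is listed as a vdev.
is_pool_member() {
    awk -v d="$1" '$1 == d || $1 == (d "1") { found = 1 } END { exit !found }'
}

# Sample rows trimmed from the zpool status output in this thread.
status='zdata    DEGRADED 0 0 0
  raidz2-0 DEGRADED 0 0 0
    sdl    ONLINE   0 0 0
    sdo    ONLINE   0 0 0
    sdg    ONLINE   0 0 0'

for dev in sdo sdv; do
    if printf '%s\n' "$status" | is_pool_member "$dev"; then
        echo "$dev: listed in the pool, do not wipe"
    else
        echo "$dev: not listed in the pool"
    fi
done
```

On a live system the input would presumably come from `zpool status zdata` instead of the sample string, and `wipefs -n -a /dev/sdX` can be used as a dry run before the real wipe.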


October 11, 2023

Author

5 minutes ago, JorgeB said:

assuming the new drive is /dev/sdo

The new drive is showing as sdv, so I am not sure why it tried to run wipefs on sdo.

sdo is the existing drive from the pool in slot 8 (which is showing a red cross). I did not touch it, yet it is somehow in play.

1786730127_Screenshot2023-10-11at10_35_40PM.png.9e82473282deb9d53e32c9676711eef7.png

Slot 5 is the new drive, and it is showing as sdv. The drive in slot 5 before the replacement was showing as sdq.

I did not touch slot 8. That was and is sdo, and it turned red after I started the array following the replacement.

zpool status is showing this

# zpool status -xv
  pool: zdata
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
	invalid.  Sufficient replicas exist for the pool to continue
	functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 02:36:08 with 0 errors on Tue Oct  3 09:36:09 2023
config:

	NAME                      STATE     READ WRITE CKSUM
	zdata                     DEGRADED     0     0     0
	  raidz2-0                DEGRADED     0     0     0
	    sdk                   ONLINE       0     0     0
	    sds                   ONLINE       0     0     0
	    sdl                   ONLINE       0     0     0
	    sdo                   ONLINE       0     0     0
	    sdg                   ONLINE       0     0     0
	    sdn                   ONLINE       0     0     0
	    sdh                   ONLINE       0     0     0
	    10871034331009088735  UNAVAIL      0     0     0  was /dev/sdq1
	    sdr                   ONLINE       0     0     0

Notice the ordering difference: the missing drive is in the same position as the red drive in the UI, but sdo is showing elsewhere, in the 4th slot, and sdv is nowhere.
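
To make the missing device easier to spot, the UNAVAIL row can be pulled out of the status text. This is a minimal sketch, assuming the column layout shown in the output above; `unavail_vdevs` is a hypothetical helper name, and the sample rows are copied from that output with the indentation shortened.

```shell
#!/bin/sh
# unavail_vdevs is a hypothetical helper, not a ZFS command.
# It reads zpool status text on stdin and prints, for every vdev whose
# STATE column is UNAVAIL, its name (often a GUID) and the "was <path>" hint.
unavail_vdevs() {
    awk '$2 == "UNAVAIL" {
        was = ""
        for (i = 3; i < NF; i++) if ($i == "was") was = $(i + 1)
        print $1, was
    }'
}

# Sample rows copied from the zpool status output above.
sample='    sdh                   ONLINE       0     0     0
    10871034331009088735  UNAVAIL      0     0     0  was /dev/sdq1
    sdr                   ONLINE       0     0     0'

printf '%s\n' "$sample" | unavail_vdevs
```

On a live system the same function could be fed with `zpool status zdata | unavail_vdevs`.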

I will run wipefs on sdv and see what happens


October 11, 2023

That's strange, but I cannot see what was done before the reboot, and at boot the current drives were already assigned as they are now. Usually you do the pool disk replacement after rebooting. As it is now, I suggest doing a manual replacement and then re-importing the pool.


October 11, 2023

Author

Same result after reboot: it is still somehow trying to run wipefs on /dev/sdo.

12 minutes ago, JorgeB said:

That's strange, but I cannot see what was done before the reboot, and at boot the current drives were already assigned as they are now. Usually you do the pool disk replacement after rebooting. As it is now, I suggest doing a manual replacement and then re-importing the pool.

I had 2 empty slots on my server (hot-swap drive bays). What I did was:

I plugged in the new drive, and it showed up under unassigned drives.
I stopped the array and changed the drive next to zdata 5 to the new drive.
I rebooted and started the array. The drive in slot 8 showed a red mark, but I found it showing online in the zpool status command.

How should I do the manual replacement? Would it be:

zpool replace zdata /dev/sdq /dev/sdv

Edited October 11, 2023 by apandey


October 11, 2023

Solution

3 minutes ago, apandey said:

rebooted and started the array.

This might have been the issue; it's something I've never tried. Why reboot first?

3 minutes ago, apandey said:
zpool replace zdata /dev/sdq /dev/sdv

That should do it. You may need to use -f if the new disk is not empty. Then reimport the pool; see the end of this FAQ entry for how to do that.
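
As a rough sketch of the manual route, the commands can be printed for review before anything is run. `plan_replace` is a hypothetical helper name, and the pool and device names are the ones from this thread. As far as I know, `zpool replace` also accepts the GUID that zpool status shows for the missing device in place of the old path, which is what the example uses; treat that as an assumption to verify against your own status output.

```shell
#!/bin/sh
# plan_replace is a hypothetical helper: it only PRINTS the commands.
# Arguments: pool, old device (path or GUID), new device, optional "force".
# -f is added only when a fourth argument is given (new disk not empty).
plan_replace() {
    pool=$1 old=$2 new=$3 force=${4:-}
    echo "zpool replace ${force:+-f }$pool $old $new"
    # Follow-up command to watch the resilver progress.
    echo "zpool status -v $pool"
}

# Values taken from the zpool status output in this thread.
plan_replace zdata 10871034331009088735 /dev/sdv
```

Passing any fourth argument, for example `plan_replace zdata 10871034331009088735 /dev/sdv force`, prints the `-f` variant instead.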


October 11, 2023

Author

11 minutes ago, apandey said:
How should I do manual replacement? would it be
zpool replace zdata /dev/sdq /dev/sdv

I created partitions on sdv using sgdisk and did the replace as above. It's resilvering now.


October 11, 2023

Author

There seems to be some odd behaviour between the slots in the Unraid UI and the disk order shown by zdb or zpool status. zdb shows the failed disk as the 8th drive, but that's a different order from the Unraid assigned slots. Then Unraid seems to be doing operations based on drive index rather than drive IDs (which is why it was trying to target sdo).

I'll see how it looks after resilver and reimport, but if the drives don't line up, this may be an issue in future too
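
To illustrate the mismatch, here is a small sketch that compares the two orderings slot by slot. `diff_order` is a hypothetical helper name. The UI list is typed in by hand and only contains the two slots mentioned in this thread (slot 5 was sdq before the swap, slot 8 is sdo); the other slots are marked `?` and skipped. The zpool list is the order from the status output, with sdq standing in for the missing device.

```shell
#!/bin/sh
# diff_order is a hypothetical helper for illustration only.
# $1: device order as assigned in the Unraid UI ("?" = unknown, skipped)
# $2: device order as listed by zpool status
# Prints every known UI slot whose device differs from the zpool position.
diff_order() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, ua, " "); m = split(b, zb, " ")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++)
            if (ua[i] != "?" && ua[i] != zb[i])
                printf "slot %d: UI=%s zpool=%s\n", i, ua[i], zb[i]
    }'
}

ui_order='? ? ? ? sdq ? ? sdo ?'
zpool_order='sdk sds sdl sdo sdg sdn sdh sdq sdr'

diff_order "$ui_order" "$zpool_order"
```

With these inputs it reports slots 5 and 8, which matches what I saw: an operation addressed by position 8 lands on sdo in the UI list but on the old sdq in the pool's own order.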


October 11, 2023

8 minutes ago, apandey said:

Then unraid seems to be doing operations based on drive index rather than drive IDs (which is why it was trying to target sdo)

I think you are onto something, but the order being incorrect would only happen if the pool was created outside Unraid and then imported in the wrong order.


October 12, 2023

Author

The pool was created pre-6.12, using the ZFS plugin. I don't remember the exact steps I used to import it when 6.12 came out.

Anyway, I replaced the drive, resilvered, and then reimported as per your instructions, this time in the same order as zdb. All good now.


Solved by JorgeB
