ZFS Pool Disk Replacement Procedure

July 12, 2023

Hi,

I have a ZFS pool of 12 disks configured as RAIDZ1 with 2 groups of 6 devices each.

Today, one of the drives is giving warnings:

offline uncorrectable is 368

current pending sector is 368

Although the disk is still showing a green light, I want to replace this hard disk, since it is a new disk purchased just 2 days ago.
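For reference, the two counters quoted above correspond to the SMART attributes `Current_Pending_Sector` and `Offline_Uncorrectable`. A minimal sketch for pulling them out of `smartctl -A` output is shown here. It assumes smartmontools is installed and that the attribute table has the usual ten columns; `smart_bad_sectors` is a hypothetical helper name and `/dev/sdX` is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A` style output on stdin and prints
# the raw values of the two attributes mentioned in the post.
# Assumes the standard 10-column table, where column 10 is RAW_VALUE.
smart_bad_sectors() {
    awk '$2 == "Current_Pending_Sector" || $2 == "Offline_Uncorrectable" {
        print $2 "=" $10
    }'
}

# Usage (placeholder device name):
#   smartctl -A /dev/sdX | smart_bad_sectors
```

Non-zero values on a disk that is only two days old are usually a reasonable basis for a warranty return.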

I'm using Unraid 6.12.2, and I used the new ZFS pool feature to create the pool.

My data on this ZFS pool is critical, so I want to know the exact steps I need to follow to remove the hard drive that's giving the warning, replace it with the new one, and then run the rebuild.

I tried searching online, but most of the ZFS articles are from before the 6.12 update and rely on a plugin or the CLI.

Can someone provide me with the process so I don't lose all my data due to my inexperience with the new ZFS on Unraid 6.12.x?

Thank you

July 12, 2023

Community Expert
Solution

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=480419

November 7, 2023

Jorge, thank you for referencing the post, but the troubleshooting there is from 2016...certainly Unraid's ZFS implementation has made improvements to the replacement strategy by now?

I recently experienced 2 HDD failures in a 60-drive array (4 vdevs @15 each, RAIDZ2). Luckily, the failures occurred in separate vdevs.

However, from the looks of the post you referenced, if the 2 failures had been in the same vdev, I'd have had no way to replace/rebuild that vdev, even though RAIDZ2 provides that redundancy? What is the point of this implementation if we cannot recover from 2 failures in a single vdev with RAIDZ2?

Also, I noticed that when I removed one failed HDD from one of my vdevs, the zpool kept functioning, albeit in a degraded state. I am waiting for replacement HDDs to arrive, and I was forced to reboot my server (the array would not stop on its own so that I could pull the failed disks). But now I cannot even start the array (neither the Unraid array nor, specifically, the zpool), because I'm missing 1 disk from a vdev that has RAIDZ2 redundancy? Shouldn't I be able to start the zpool in a degraded state? This is inconvenient, because I cannot start any array until I have replacement disks. I would think we should at least be able to start the zpool in a degraded state, and certainly the Unraid array, but I cannot seem to do either while waiting for the replacement disks.

Edited November 7, 2023 by gizmo000

November 7, 2023

Community Expert

1 hour ago, gizmo000 said:

but the troubleshooting there is from 2016

It was created for btrfs but it was edited recently to include zfs.

1 hour ago, gizmo000 said:

However, from the looks of the post you referenced ... if the 2 failures were in the same vdev, I'd have no way to replace/rebuild that vdev, given that RAIDZ2 provides that redundancy?

Not at the moment, but you should be able to do that on v6.13.

1 hour ago, gizmo000 said:

But now, I cannot even start the array (unraid array either, but specifically: the zpool), because I'm missing 1 disk from a vdev that has RAIDZ2 redundancy?

That should not happen. For now, the array can be started with a single missing device, but not more, even with raidz2.

November 7, 2023

5 hours ago, JorgeB said:

It was created for btrfs but it was edited recently to include zfs.

Not at the moment, you should be able to do that on v6.13.

That should not happen, array can be started with a single missing device, not more, even if raidz2, for now.

Okay, cool, thanks for the quick replies.

My second drive failed after the reboot (separate vdev) so unraid is preventing me from starting anything (due to 2 missing disks in zpool).

November 7, 2023

Community Expert

39 minutes ago, gizmo000 said:

so unraid is preventing me from starting anything (due to 2 missing disks in zpool).

Yes, that is normal for now.

November 7, 2023

7 hours ago, JorgeB said:

Yes, that is normal for now.

It seems even with two new disks installed, I cannot start the array to resilver.

Whenever I try to start my array now (with either 1 or both new disks assigned in the zfs pool), I get "too many wrong or missing devices." My main unraid array will not even start either. I'm DIW, and starting to stress out...

Since I have two drive failures, there is no way for me to get back to just 1 cache drive wrong or missing. If there's a way to do this via command line, I'm comfortable doing so, but I don't want to jeopardize the 300+ TiB I have on there right now.

Any recommendations on how to proceed properly to restore my zpool?

galileo-diagnostics-20231107-1454.zip

Edited November 7, 2023 by gizmo000

November 8, 2023

Community Expert

12 hours ago, gizmo000 said:

It seems even with two new disks installed, I cannot start the array to resilver.

Nope. As mentioned, for now it can only replace one disk at a time.

You can do it manually. Post the output of:

zpool import
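Run with no arguments, `zpool import` only lists the pools that are available for import; it does not import anything. As a rough sketch, the name and state of each listed pool can be pulled out of that output like this. `importable_pools` is a hypothetical helper name, and the parsing assumes the `pool:` / `state:` layout that OpenZFS normally prints.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool import` listing output on stdin and
# prints "<pool name> <state>" for every pool found.
importable_pools() {
    awk '
        $1 == "pool:"  { name = $2 }
        $1 == "state:" { print name, $2 }
    '
}

# Usage:
#   zpool import | importable_pools
```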

November 8, 2023

So I have been playing with this overnight ...

I "wrote" a new config using the `New Config` tool (pool devices only), and assigned the same cache disks to the same slots, with the exception of my two failed drives.

`zpool import -d /dev/mapper` (and `zpool import` for that matter) shows my two degraded vdevs with most disks online:

pool: mypool
     id: [redacted]
  state: DEGRADED
status: One or more devices contains corrupted data.
 action: The pool can be imported despite missing or damaged devices.  The
	fault tolerance of the pool may be compromised if imported.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
 config:

	mypool                    DEGRADED
	  raidz2-0                ONLINE
	    sdaw1                 ONLINE
	    sdax1                 ONLINE
	    sday1                 ONLINE
	    sdaz1                 ONLINE
	    sdba1                 ONLINE
	    sdbb1                 ONLINE
	    sdbc1                 ONLINE
	    sdbd1                 ONLINE
	    sdbe1                 ONLINE
	    sdbf1                 ONLINE
	    sdbg1                 ONLINE
	    sdbh1                 ONLINE
	    sdbi1                 ONLINE
	    sdbl1                 ONLINE
	    sdbm1                 ONLINE
	  raidz2-1                ONLINE
	    sde1                  ONLINE
	    sdf1                  ONLINE
	    sdg1                  ONLINE
	    sdh1                  ONLINE
	    sdi1                  ONLINE
	    sdj1                  ONLINE
	    sdk1                  ONLINE
	    sdl1                  ONLINE
	    sdm1                  ONLINE
	    sdn1                  ONLINE
	    sdo1                  ONLINE
	    sdp1                  ONLINE
	    sdq1                  ONLINE
	    sdr1                  ONLINE
	    sds1                  ONLINE
	  raidz2-2                DEGRADED
	    sdt1                  ONLINE
	    sdu1                  ONLINE
	    sdv1                  ONLINE
	    sdw1                  ONLINE
	    sdx1                  ONLINE
	    sdy1                  ONLINE
	    12281917106237315780  UNAVAIL  invalid label
	    sdaa1                 ONLINE
	    sdab1                 ONLINE
	    sdac1                 ONLINE
	    sdae1                 ONLINE
	    sdag1                 ONLINE
	    sdaf1                 ONLINE
	    sdah1                 ONLINE
	    sdai1                 ONLINE
	  raidz2-3                DEGRADED
	    sdad1                 ONLINE
	    sdaj1                 ONLINE
	    sdak1                 ONLINE
	    sdal1                 ONLINE
	    sdam1                 ONLINE
	    sdan1                 ONLINE
	    sdao1                 ONLINE
	    sdap1                 ONLINE
	    sdaq1                 ONLINE
	    sdar1                 ONLINE
	    sdas1                 ONLINE
	    11665832322838174263  FAULTED  corrupted data
	    sdat1                 ONLINE
	    sdau1                 ONLINE
	    sdav1                 ONLINE
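With 60 devices in the listing, it can help to summarise how many members of each vdev are not available. The sketch below counts them per vdev. `count_missing_per_vdev` is a hypothetical helper name, and the set of states treated as "missing" (UNAVAIL, FAULTED, REMOVED, OFFLINE) is an assumption. At the ZFS level a RAIDZ2 vdev tolerates up to two such members, so one per vdev, as here, is still importable.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool import` or `zpool status` config output
# on stdin and prints "<vdev> <number of unavailable members>" per vdev.
count_missing_per_vdev() {
    awk '
        # A line such as "raidz2-3 DEGRADED" starts a new vdev.
        $1 ~ /^(raidz[1-3]?|mirror)-[0-9]+$/ {
            vdev = $1
            missing[vdev] += 0
            order[++n] = vdev
            next
        }
        # Any member line under the current vdev with a non-working state.
        vdev != "" && ($2 == "UNAVAIL" || $2 == "FAULTED" ||
                       $2 == "REMOVED" || $2 == "OFFLINE") {
            missing[vdev]++
        }
        END { for (i = 1; i <= n; i++) print order[i], missing[order[i]] }
    '
}

# Usage:
#   zpool import -d /dev/mapper | count_missing_per_vdev
```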

The array comes online now, which is great. However, in the GUI, the "size, used and free" columns for my zpool cache drives all show `Unmountable: Unsupported or no file system`. Also, at the bottom of the Main page, Unraid is asking to format all those pool drives (since the filesystem is detected as unsupported). If I assign 'no device' to both slots, the behavior is the same: unsupported or no file system (but the main array still starts).

The `invalid label` results from a manual `cryptsetup luksFormat /dev/sdz1` that I ran during earlier experimenting, while trying to get Unraid to recognize the disk.

Edited November 8, 2023 by gizmo000

November 8, 2023

Community Expert

You can:

zpool import mypool
zpool replace -f -o ashift=12 mypool 12281917106237315780 /dev/sdX1
zpool replace -f -o ashift=12 mypool 11665832322838174263 /dev/sdY1
zpool export mypool

Replace X and Y with the new disks' identifiers. If not yet done, the new disks should be partitioned first; you can use the UD (Unassigned Devices) plugin to do that.

Then, when that's done, re-import the pool in Unraid. Please ask if you don't know how to do that.
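The four commands above can be wrapped in a small dry-run script so that the exact command lines can be reviewed before anything touches the pool. This is only a sketch: `run` and `replace_failed` are hypothetical helper names, the `RUN` switch is an assumption of this example, and the device paths remain placeholders.

```shell
#!/bin/sh
# Dry-run by default: prints each command instead of executing it.
# Set RUN=1 in the environment to actually execute.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# replace_failed <pool> <old guid> <new partition> [<old guid> <new partition> ...]
# Follows the same order as the commands above: import, replace, export.
replace_failed() {
    pool=$1
    shift
    run zpool import "$pool" || return 1
    while [ "$#" -ge 2 ]; do
        run zpool replace -f -o ashift=12 "$pool" "$1" "$2" || return 1
        shift 2
    done
    run zpool export "$pool"
}

# Usage (placeholder device paths):
#   replace_failed mypool 12281917106237315780 /dev/sdX1 \
#                         11665832322838174263 /dev/sdY1
```

Printing first and executing second is a deliberate choice here, since a mistyped device path in `zpool replace -f` is hard to undo.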

November 8, 2023

Looks like the re-import is covered here ... does the export save anything I need to keep track of, or just make sure to run the export command?

The only other thing I changed was /dev/mapper/sdX1 (Y1) because I had the drives encrypted. zpool is doing its thing and resilvering now!
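For anyone else with encrypted pool devices, a small sketch of that path choice: use the opened mapping under `/dev/mapper` when one exists for the partition, otherwise the raw partition. `replacement_path` and the `MAPPER_DIR` override are hypothetical names for this example, and it assumes the encrypted partition has already been opened under the same name as the partition, as in the `zpool import -d /dev/mapper` listing earlier.

```shell
#!/bin/sh
# Hypothetical helper: prints the path to hand to `zpool replace`.
# MAPPER_DIR can be overridden for testing; it defaults to /dev/mapper.
replacement_path() {
    dir=${MAPPER_DIR:-/dev/mapper}
    if [ -e "$dir/$1" ]; then
        echo "$dir/$1"      # opened encrypted mapping exists
    else
        echo "/dev/$1"      # plain, unencrypted partition
    fi
}

# Usage (placeholder partition name):
#   zpool replace -f -o ashift=12 mypool 12281917106237315780 "$(replacement_path sdX1)"
```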

Thanks so much @JorgeB for the assistance here ... I was about to lose my mind.

November 8, 2023

Community Expert

2 minutes ago, gizmo000 said:

Looks like the re-import is covered here ... does the export save anything I need to keep track of, or just make sure to run the export command?

Yes, that works. The important part is that the pool must be new or assigned after a new config (blue icons), and the fs should remain set to auto. The pool must be exported first, or Unraid won't be able to mount it.
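One way to confirm that the export took effect before going back to the GUI is to check that the pool no longer appears in `zpool list`. A minimal sketch, where `pool_is_imported` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool list -H -o name` output on stdin and
# succeeds only if the given pool name appears as an exact line.
pool_is_imported() {
    grep -Fqx -- "$1"
}

# Usage:
#   if zpool list -H -o name | pool_is_imported mypool; then
#       echo "mypool is still imported; run: zpool export mypool"
#   fi
```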

June 28, 2024

I don't find anything about ZFS in https://docs.unraid.net/unraid-os/manual/storage-management/#replacing-disks

Maybe the linked manual should be added there as well?

Also, the docs page is still missing some ZFS parts. They state "Note: Details will need to be added for ZFS file systems after Unraid 6.12 is release with ZFS support built in."

June 28, 2024

Community Expert

1 hour ago, KluthR said:

Also, the docs page is still missing some ZFS parts

Docs are a WIP, for now you can find some info in the FAQ:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=480419

July 2, 2024

@JorgeB I'm on Unraid 6.12.10. Every time I do a replace via the GUI following the guide, it seems to take the old disk completely offline. For example, the status looks like:

```
NAME                       STATE     READ WRITE CKSUM
primary                    DEGRADED     0     0     0
  raidz1-0                 DEGRADED     0     0     0
    sdc1                   ONLINE       0     0     0
    sdg1                   ONLINE       0     0     0
    sdj1                   ONLINE       0     0     0
    replacing-3            DEGRADED     0     0     0
      1123165010249792881  UNAVAIL      0     0     0  was /dev/sdi1
      sdd1                 ONLINE       0     0     0
```
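To check this after each replace, the two members of the `replacing-N` group can be extracted from `zpool status` output with a short sketch like the one below. `replacing_members` is a hypothetical helper name, and it assumes the group lists exactly two members (old device first, then the new one), as in the output above. In stock OpenZFS, a `zpool replace` started while the old disk is still attached normally shows both members ONLINE during the resilver; whether the Unraid GUI offers that path is not established in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool status` output on stdin and prints
# "<device> <state>" for the two lines following each replacing-N line.
replacing_members() {
    awk '
        $1 ~ /^replacing-[0-9]+$/ { remaining = 2; next }
        remaining > 0             { print $1, $2; remaining-- }
    '
}

# Usage (pool name taken from the output above):
#   zpool status primary | replacing_members
```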

Is there any way to adjust this so that the replace doesn't mark the previous disk UNAVAIL and can use it during the resilvering process?

It seems like the partition that was on /dev/sdi is just gone now that it's removed from the pool.

# fdisk -l /dev/sdi
Disk /dev/sdi: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors
Disk model: WDC WD180EDGZ-11
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Edited July 2, 2024 by foreseeable-concertina5279

Solved by JorgeB
