July 12, 2023
Hi, I have a ZFS pool of 12 disks running RAIDZ1, configured as 2 groups of 6 devices. Today, one of the drives is giving warnings: offline uncorrectable is 368 and current pending sector is 368. Although the disk is still showing a green light, I want to replace this hard disk, since it is a new disk purchased just 2 days ago. I'm using Unraid 6.12.2, and I used the new ZFS pool feature to create the pool. My data on this ZFS pool is critical, so I want to know the exact steps I need to take to remove the hard drive that's giving the warning, replace it with a new one, and then run the rebuild. I tried to search online, but most of the ZFS articles are for pre-6.12 releases, which use a plugin or the CLI. Can someone provide me the process so I don't lose all my data due to my inexperience with the new ZFS on Unraid 6.12.x? Thank you
July 12, 2023
Solution: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=480419
November 7, 2023
Jorge, thank you for referencing the post, but the troubleshooting there is from 2016...certainly Unraid's ZFS implementation has made improvements to the replacement strategy by now?

I recently experienced 2 HDD failures in a 60-drive array (4 vdevs @15 each, RAIDZ2). Luckily, the failures occurred in separate vdevs. However, from the looks of the post you referenced ... if the 2 failures were in the same vdev, I'd have no way to replace/rebuild that vdev, given that RAIDZ2 provides that redundancy? What is the point of this implementation if we cannot recover from 2 failures in a single vdev with RAIDZ2?

Also, I happened to notice that if I removed one failed HDD from one of my vdevs, the zpool kept functioning, albeit in a degraded status. I am waiting for replacement HDDs to arrive, and I was forced to reboot my server (array would not stop on its own so I could pull the failed disks). But now, I cannot even start the array (unraid array either, but specifically: the zpool), because I'm missing 1 disk from a vdev that has RAIDZ2 redundancy? Shouldn't I be able to start the zpool in a degraded status?

This is a little bit inconvenient because now I cannot start any array until I have replacement disks, but I would think we should be able to at least start the zpool in a degraded status...certainly we should be able to start the unraid array, but I cannot seem to do that either while waiting for zpool replacement disks...
Edited November 7, 2023 by gizmo000
November 7, 2023
1 hour ago, gizmo000 said: but the troubleshooting there is from 2016
It was created for btrfs but it was edited recently to include zfs.

1 hour ago, gizmo000 said: However, from the looks of the post you referenced ... if the 2 failures were in the same vdev, I'd have no way to replace/rebuild that vdev, given that RAIDZ2 provides that redundancy?
Not at the moment, you should be able to do that on v6.13.

1 hour ago, gizmo000 said: But now, I cannot even start the array (unraid array either, but specifically: the zpool), because I'm missing 1 disk from a vdev that has RAIDZ2 redundancy?
That should not happen, array can be started with a single missing device, not more, even if raidz2, for now.
November 7, 2023
5 hours ago, JorgeB said: It was created for btrfs but it was edited recently to include zfs. Not at the moment, you should be able to do that on v6.13. That should not happen, array can be started with a single missing device, not more, even if raidz2, for now.
Okay, cool, thanks for the quick replies. My second drive failed after the reboot (separate vdev) so unraid is preventing me from starting anything (due to 2 missing disks in zpool).
November 7, 2023
39 minutes ago, gizmo000 said: so unraid is preventing me from starting anything (due to 2 missing disks in zpool).
Yes, that is normal for now.
November 7, 2023
7 hours ago, JorgeB said: Yes, that is normal for now.
It seems even with two new disks installed, I cannot start the array to resilver. Whenever I try to start my array now (with either 1 or both new disks assigned in the zfs pool), I get "too many wrong or missing devices." My main unraid array will not even start either. I'm DIW, and starting to stress out...

Since I have two drive failures, there is no way for me to get back to just 1 cache drive wrong or missing. If there's a way to do this via command line, I'm comfortable doing so, but I don't want to jeopardize the 300+ TiB I have on there right now. Any recommendations on how to proceed properly to restore my zpool?

galileo-diagnostics-20231107-1454.zip
Edited November 7, 2023 by gizmo000
November 8, 2023
12 hours ago, gizmo000 said: It seems even with two new disks installed, I cannot start the array to resilver.
Nope, like mentioned, for now it can only replace one disk at a time. You can do it manually; post the output of `zpool import`
November 8, 2023
So I have been playing with this overnight ... I "wrote" a new config using the `New Config` tool (pool devices only), and assigned the same cache disks to the same slots, with the exception of my two failed drives. `zpool import -d /dev/mapper` (and `zpool import` for that matter) shows my two degraded vdevs with most disks online:

       pool: mypool
         id: [redacted]
      state: DEGRADED
     status: One or more devices contains corrupted data.
     action: The pool can be imported despite missing or damaged devices. The
             fault tolerance of the pool may be compromised if imported.
        see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
     config:

        mypool                    DEGRADED
          raidz2-0                ONLINE
            sdaw1                 ONLINE
            sdax1                 ONLINE
            sday1                 ONLINE
            sdaz1                 ONLINE
            sdba1                 ONLINE
            sdbb1                 ONLINE
            sdbc1                 ONLINE
            sdbd1                 ONLINE
            sdbe1                 ONLINE
            sdbf1                 ONLINE
            sdbg1                 ONLINE
            sdbh1                 ONLINE
            sdbi1                 ONLINE
            sdbl1                 ONLINE
            sdbm1                 ONLINE
          raidz2-1                ONLINE
            sde1                  ONLINE
            sdf1                  ONLINE
            sdg1                  ONLINE
            sdh1                  ONLINE
            sdi1                  ONLINE
            sdj1                  ONLINE
            sdk1                  ONLINE
            sdl1                  ONLINE
            sdm1                  ONLINE
            sdn1                  ONLINE
            sdo1                  ONLINE
            sdp1                  ONLINE
            sdq1                  ONLINE
            sdr1                  ONLINE
            sds1                  ONLINE
          raidz2-2                DEGRADED
            sdt1                  ONLINE
            sdu1                  ONLINE
            sdv1                  ONLINE
            sdw1                  ONLINE
            sdx1                  ONLINE
            sdy1                  ONLINE
            12281917106237315780  UNAVAIL  invalid label
            sdaa1                 ONLINE
            sdab1                 ONLINE
            sdac1                 ONLINE
            sdae1                 ONLINE
            sdag1                 ONLINE
            sdaf1                 ONLINE
            sdah1                 ONLINE
            sdai1                 ONLINE
          raidz2-3                DEGRADED
            sdad1                 ONLINE
            sdaj1                 ONLINE
            sdak1                 ONLINE
            sdal1                 ONLINE
            sdam1                 ONLINE
            sdan1                 ONLINE
            sdao1                 ONLINE
            sdap1                 ONLINE
            sdaq1                 ONLINE
            sdar1                 ONLINE
            sdas1                 ONLINE
            11665832322838174263  FAULTED  corrupted data
            sdat1                 ONLINE
            sdau1                 ONLINE
            sdav1                 ONLINE

The array comes online now, which is great ... however, in the gui the "size, used and free" columns for all my zpool cache drives all show `Unmountable: Unsupported or no file system`. Also, at the bottom of the Main page, Unraid is asking to format all those pool drives (since the filesystem is detected as unsupported).
If I assign 'no device' to both slots, the behavior is the same: unsupported or no file system (but the main array still starts). The `invalid label` results from a manual `cryptsetup luksFormat /dev/sdz1` that I was trying to get unraid to recognize in earlier experimenting.
Edited November 8, 2023 by gizmo000
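For anyone scripting this step, the missing members can be pulled out of a `zpool import` listing like the one above with a small filter. This is only a sketch: `list_missing_members` is a made-up name, not a ZFS or Unraid command, and it assumes each device line has the name in the first column and the state in the second, as in the pasted output. A vdev or pool line that is itself `UNAVAIL` or `FAULTED` would be printed too, so check the result by eye.

```shell
# Hypothetical helper (not part of ZFS or Unraid): reads `zpool import` output on
# stdin and prints the first column of every line whose state column is
# UNAVAIL or FAULTED. Header lines such as "state: DEGRADED" do not match.
list_missing_members() {
    awk '$2 == "UNAVAIL" || $2 == "FAULTED" { print $1 }'
}

# Possible use on a live system (read-only, nothing is imported):
#   zpool import | list_missing_members
```

The printed values are the numeric identifiers that a manual `zpool replace` would need in place of the old device names.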
November 8, 2023
You can:

    zpool import mypool
    zpool replace -f -o ashift=12 mypool 12281917106237315780 /dev/sdX1
    zpool replace -f -o ashift=12 mypool 11665832322838174263 /dev/sdY1
    zpool export mypool

Replace X and Y with the new disks' identifiers. If not yet done, the new disks should be partitioned first; you can use the UD (Unassigned Devices) plugin to do that. Then, when that's done, re-import the pool in Unraid; please ask if you don't know how to do that.
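With more than a couple of failed members, it may help to generate the command list above and review it before typing anything. A minimal sketch, assuming a made-up function name `print_replace_plan` and placeholder device paths; it only prints the commands and never runs `zpool` itself. The `-f -o ashift=12` options are copied from the commands above, not chosen independently.

```shell
# Sketch only: prints the manual sequence from the post above for review.
# print_replace_plan is a hypothetical name, not a ZFS or Unraid tool.
# Arguments: pool name, then one GUID=new_partition pair per failed member.
print_replace_plan() {
    pool="$1"
    shift
    echo "zpool import $pool"
    for pair in "$@"; do
        guid="${pair%%=*}"    # text before the first '='
        dev="${pair#*=}"      # text after the first '='
        echo "zpool replace -f -o ashift=12 $pool $guid $dev"
    done
    echo "zpool export $pool"
}

# Example with the identifiers from this thread; sdX1 and sdY1 are placeholders.
print_replace_plan mypool \
    12281917106237315780=/dev/sdX1 \
    11665832322838174263=/dev/sdY1
```

Printing rather than executing is deliberate here: with this much data at stake, each line can be checked against the `zpool import` output before it is run by hand.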
November 8, 2023
Looks like the re-import is covered here ... does the export save anything I need to keep track of, or just make sure to run the export command?

The only other thing I changed was using /dev/mapper/sdX1 (and Y1), because I had the drives encrypted. zpool is doing its thing and resilvering now!

Thanks so much @JorgeB for the assistance here ... I was about to lose my mind.
November 8, 2023
2 minutes ago, gizmo000 said: Looks like the re-import is covered here ... does the export save anything I need to keep track of, or just make sure to run the export command?
Yes, that works. The important part is that the pool must be new or after a new config (blue icons), and the fs should remain set to auto. The pool must be exported first or Unraid won't be able to mount it.
June 28, 2024
I don't find anything about ZFS in https://docs.unraid.net/unraid-os/manual/storage-management/#replacing-disks. Maybe the linked manual should be added there as well?

Also, the docs page is still missing some ZFS parts. They state "Note: Details will need to be added for ZFS file systems after Unraid 6.12 is release with ZFS support built in."
June 28, 2024
1 hour ago, KluthR said: Also, the docs page is still missing some ZFS parts
Docs are a WIP; for now you can find some info in the FAQ: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=480419
July 2, 2024
@JorgeB I'm on Unraid 6.12.10, and every time I do a replace following the guide via the GUI, it seems like it totally brings the old disk offline. For example, it looks like:

```
NAME                       STATE     READ WRITE CKSUM
primary                    DEGRADED     0     0     0
  raidz1-0                 DEGRADED     0     0     0
    sdc1                   ONLINE       0     0     0
    sdg1                   ONLINE       0     0     0
    sdj1                   ONLINE       0     0     0
    replacing-3            DEGRADED     0     0     0
      1123165010249792881  UNAVAIL      0     0     0  was /dev/sdi1
      sdd1                 ONLINE       0     0     0
```

Is there any way to adjust it so that the replace doesn't UNAVAIL the previous disk, and can use it for the resilvering process? It seems like the partition that was on /dev/sdi is just gone now that it's removed from the pool.

    # fdisk -l /dev/sdi
    Disk /dev/sdi: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors
    Disk model: WDC WD180EDGZ-11
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Edited July 2, 2024 by foreseeable-concertina5279