ZFS - drives going missing/fault - General Support

December 14, 2023

Current setup: one zpool with three raidz1 vdevs of four drives each. vdev1 is 4x 18TB drives, vdev2 is 4x 14TB drives, and vdev3 is 4x 12TB drives.

Everything was running smoothly until one drive failed. I went to replace it, and the next thing I knew every vdev was having issues. I suppose I'm one more failure away from losing the pool!

~# zpool status
  pool: dumpster
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Dec 13 22:04:11 2023
        72.4T scanned at 0B/s, 71.5T issued at 707M/s, 72.9T total
        5.88T resilvered, 98.19% done, 00:32:34 to go
config:

        NAME                        STATE     READ WRITE CKSUM
        dumpster                    DEGRADED     0     0     0
          raidz1-0                  DEGRADED     0     0     0
            replacing-0             DEGRADED     0     0     0
              5766548143726475423   UNAVAIL      0     0     0  was /dev/sdf1
              sdc1                  ONLINE       0     0     0  (resilvering)
            sdg                     ONLINE       0     0     0
            sdl                     ONLINE       0     0     0
            sdm                     ONLINE       0     0     0
          raidz1-1                  DEGRADED     0     0     0
            sdj                     ONLINE       0     0   470
            sdn                     ONLINE       0     0   470
            sdk                     ONLINE       0     0   470
            sdi                     REMOVED      0     0     0
          raidz1-2                  DEGRADED     0     0     0
            sdh                     ONLINE       0     0     0
            sde                     ONLINE       0     0     0
            sdd                     ONLINE       0     0     0
            11356428966707618947    FAULTED      0     0     0  was /dev/sdh1

errors: 6 data errors, use '-v' for a list

The resilver finished; I rebooted and then got this:

~# zpool status
  pool: dumpster
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Dec 14 11:42:10 2023
        55.1T scanned at 30.1G/s, 39.0T issued at 21.2G/s, 72.4T total
        154G resilvered, 53.84% done, 00:26:49 to go
config:

        NAME                        STATE     READ WRITE CKSUM
        dumpster                    DEGRADED     0     0     0
          raidz1-0                  DEGRADED     0     0     0
            sdc1                    ONLINE       0     0     0
            2364323260111293387     UNAVAIL      0     0     0  was /dev/sdg1
            sdl                     ONLINE       0     0     0
            sdm                     ONLINE       0     0     0
          raidz1-1                  DEGRADED     0     0     0
            sdj                     ONLINE       0     0     0
            replacing-1             DEGRADED     0     0     0
              11133145800720317235  FAULTED      0     0     0  was /dev/sdn1
              sdg1                  ONLINE       0     0     0  (resilvering)
            sdk                     ONLINE       0     0     0
            sdi                     ONLINE       0     0     0
          raidz1-2                  DEGRADED     0     0     0
            sdh                     ONLINE       0     0     0
            sde                     ONLINE       0     0     0
            sdd                     ONLINE       0     0     0
            11356428966707618947    FAULTED      0     0     0  was /dev/sdh1

errors: 7819 data errors, use '-v' for a list

At one point it said sdf was "fault", and the Unraid GUI showed an 'x' instead of a green dot, with the message "Device is disabled, contents emulated". After the above restart, sdf had no faults (although it also isn't showing any read or write activity), and now sdd has the 'x' but still shows reads and writes.

I don't even know how to begin figuring out wtf is going on or how to diagnose the problem. Any advice would be appreciated.

December 14, 2023

Community Expert

Please post the diagnostics.

December 14, 2023

Author

Here you go. Thank you.

kk-shang-zeus-diagnostics-20231214-1320.zip

December 14, 2023

Community Expert

Was this pool created outside Unraid and then imported? The pool must be assigned in the same order as the zpool status shows, or there will be issues if you attempt to replace a device.

Wait for the resilver to finish, then reimport the pool in the correct order (sdc, sdl, sdm, etc.). To reimport the pool you just need to stop the array, unassign all pool devices, start the array, stop the array, re-assign all pool devices in the correct order, and start the array.
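
A minimal sketch of how the device order could be read off the zpool status output before re-assigning the devices in the Unraid GUI. The helper name pool_order is hypothetical, and the parsing assumes top-level vdevs named raidz*, mirror* or draid*, as in this thread; pools with single-disk vdevs or log/cache sections would need extra handling. It only prints the order; the re-assignment itself is still done by hand.

```shell
# Hypothetical helper: print "vdev device state" for every leaf device,
# in the order zpool status lists them. Reads the status text on stdin.
pool_order() {
    awk '
        # the device table starts right after the NAME/STATE header row
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }
        # the "errors:" line ends the table
        $1 ~ /^errors:/               { in_cfg = 0 }
        # skip everything outside the table and rows without counters
        !in_cfg || NF < 5             { next }
        # top-level vdev row: remember it, do not print it
        $1 ~ /^(raidz|mirror|draid)/  { vdev = $1; next }
        # grouping rows created by a replace or a hot spare are not disks
        $1 ~ /^(replacing|spare)-/    { next }
        # anything left under a vdev is a leaf device
        vdev != ""                    { print vdev, $1, $2 }
    '
}
```

With a live pool this would be run as `zpool status dumpster | pool_order`, giving one line per device, top to bottom, grouped by vdev.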

December 14, 2023

Author

Originally, I created the pool a few years ago (way before native support) using spaceinvaderone's videos. When native support arrived (and the ZFS Master Plugin was deprecated), I followed the instructions on here (I can find the link if needed) to export and re-import the pool.

What was really weird is, the first time I did it, I had created a pool (named tank) using the following command:

zpool add dumpster raidz sdc sdd sde sdh

Then I read how that's a bad idea due to possibility of the designations changing on reboot, so I destroyed the pool and re-did it using:

zpool add dumpster raidz ata-WDC_WD181KRYZ-01AGBB0_3FHR6YST ata-WDC_WD181KRYZ-01AGBB0_3FHS6EBT ata-WDC_WD181KRYZ-01AGBB0_3GKUE1VE ata-WDC_WD181KRYZ-01AGBB0_3GKZVZBE

At this stage, I'm assuming that when I exported the pool and re-added it, I wasn't paying attention and accidentally did so using the sdX names (e.g. sdc) instead of their serial numbers?

Understood. I will follow the above instructions and revert back.

Thank you!
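
For reference, a small sketch of how an sdX name could be mapped back to its persistent name. It assumes the usual Linux layout where /dev/disk/by-id holds symlinks such as ata-MODEL_SERIAL pointing at ../../sdc. The function name by_id_names and its second argument are made up for illustration.

```shell
# Hypothetical helper: list the persistent names that point at a kernel device.
#   $1 = kernel name such as sdc
#   $2 = directory of symlinks (assumed default: /dev/disk/by-id)
by_id_names() {
    dev=$1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # only symlinks are of interest
        [ -L "$link" ] || continue
        # a target looks like ../../sdc; compare its last path component
        target=$(readlink "$link")
        [ "${target##*/}" = "$dev" ] && basename "$link"
    done
    return 0
}

# Example call (commented out so that sourcing this file has no side effects):
# by_id_names sdc
```

A disk usually has more than one link (for example an ata-* and a wwn-* entry), so several names may be printed for the same device. Partitions such as sdc1 are separate targets and are not matched by sdc.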

December 15, 2023

Author

@JorgeB

The "order" of the drives as per the last zpool status is incorrect... drives from my vdev are now in another vdev. Should I be adding the drives in order as per what the vdevs were supposed to be?

December 15, 2023

Community Expert
Solution

8 hours ago, v3life said:

Should I be adding the drives in order as per what the vdevs were supposed to be?

The pool should be imported in the exact same order as the zpool status shows.

December 19, 2023

Author

To close out this thread:

Even after resilvering/reimporting, the pool was nerfed
Every attempt to recover presented new problems

I had a full backup of the pool on a QNAP server (which runs nightly), so I took everything offline, formatted the drives, removed the pool (then started the array and stopped it again), re-created the zpool in the Unraid GUI, added the drives (swapping in the ones I had wanted to replace to begin with), started it up, and transferred the 80TB back into Unraid.

Note: before I did all of the above, I turned off VMs and Docker so they didn't all freak out. Once everything was transferred back, I re-enabled VMs and Docker.

Although there was probably a way to solve the problem, the downtime and general agitation from the whole ordeal led me to brute-force a solution I was confident would work... thankfully it did!

Thank you @JorgeB for the advice and guidance

-K

Solved by JorgeB
