ZFS pool unmountable, disk errors detected, no pools found



My Unraid server crashed today, and when I checked the command line before rebooting I saw a kernel panic and what I think was a macvlan issue. (The macvlan issue started a couple of days ago when I began assigning Docker containers to certain VLANs.)

 

Running Unraid 6.12.8.

The ZFS pool was created using SpaceInvaderOne's video guide once ZFS support was official (I forget which release that was). Probably unrelated, but my Plex server went into a database migration a few days ago and hasn't worked since. I also had another crash ~2 days ago for which I didn't get any log info and didn't investigate.

 

[screenshots]

This is where I store my appdata, system, etc., so all of my Docker containers are dead right now.

 

Reading about this, I found people with other issues, nothing quite the same, but I tried to use zpool to get more info:

zpool list  (also tried import, export, etc.)

no pools available
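
For anyone following along, the distinction between these commands as I understand it (cache_zfs_nvme is my pool name):

# zpool list only shows pools that are already imported
zpool list

# zpool import with no arguments scans attached devices for importable pools
zpool import

# if the scan finds nothing, a device directory can be given explicitly
zpool import -d /dev/disk/by-id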

 

the Fix Common Problems plugin reported this:

cache_zfs_nvme (Samsung_SSD_970_EVO_Plus_2TB_S59CNM0R601069W) has file system errors ()

Though when I just ran it again, the error didn't show up.

 

I ran

ls -l /dev/disk/by-id/ 

and here is a partial screenshot, since I have so many drives in my primary array.

[screenshot of the ls -l /dev/disk/by-id/ output]
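
Since the full by-id listing is long, filtering it down helps; a minimal sketch (the grep pattern is just an example for NVMe devices):

ls -l /dev/disk/by-id/ | grep -i nvme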

 

I am at a loss as to why zpool doesn't see the pool, and I don't know how to correct the corruption reported on the first drive. I'm not sure what is best to do from here; I thought that with this pool running ZFS I wouldn't need to worry about a single NVMe failure. Currently I can't load any Docker containers to start getting services back, since my appdata etc. were on that pool.

 

atlas-diagnostics-20240322-1354.zip


As I mentioned above, I tried zpool import while the array was running, along with export and list.

 

I spent about 5 hours, without a reboot or doing anything other than having the array running, reading dozens of support threads and trying to figure out why the Fix Common Problems plugin had reported this:

cache_zfs_nvme (Samsung_SSD_970_EVO_Plus_2TB_S59CNM0R601069W) has file system errors ()

and by then that error was gone.

 

So I decided to run:

zpool list --> nothing showed up again

zpool import

and suddenly my pool showed up:

@Atlas:/mnt# zpool import
   pool: cache_zfs_nvme
     id: 7422096033263261955
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        cache_zfs_nvme  ONLINE
          raidz1-0      ONLINE
            nvme2n1p1   ONLINE
            nvme3n1p1   ONLINE
            nvme4n1p1   ONLINE
            nvme5n1p1   ONLINE
            nvme0n1p1   ONLINE

----

@Atlas:/mnt# zpool import cache_zfs_nvme
cannot import 'cache_zfs_nvme': I/O error
        Recovery is possible, but will result in some data loss.
        Returning the pool to its state as of Fri 22 Mar 2024 09:37:58 AM CDT
        should correct the problem.  Approximately 6 seconds of data
        must be discarded, irreversibly.  Recovery can be attempted
        by executing 'zpool import -F cache_zfs_nvme'.  A scrub of the pool
        is strongly recommended after recovery.
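
For reference, this is the recovery sequence the message itself describes; per the zpool-import man page, adding -n to -F does a dry run that reports whether recovery would succeed without actually rewinding:

# preview the rewind without committing it
zpool import -Fn cache_zfs_nvme

# rewind to the last consistent state, discarding ~6 seconds of writes
zpool import -F cache_zfs_nvme

# the message strongly recommends a scrub afterwards
zpool scrub cache_zfs_nvme

# watch scrub progress and list any files with permanent errors
zpool status -v cache_zfs_nvme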

----

@Atlas:/mnt# zpool list
NAME             SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
cache_zfs_nvme  9.09T  1.53T  7.57T        -         -     0%    16%  1.00x    ONLINE  -

 

I stopped the array and then started it again, and my data was back. Now I'm showing a different error:

 

[screenshot of the new error]

 

When I hover over the red lock, I get "device locked with unknown error". After stopping the array and starting it again, the red lock was gone.

 

I don't know what to do at this point, or why the pool suddenly was able to be located and repaired. I'm concerned about this occurring again.

 

I did lose a number of files in my Plex Docker directory (and apparently only Plex). I am confused about how ZFS protected my files if corruption was possible at the disk level and not recoverable.
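
My rough understanding, for anyone who hits this later (not authoritative): raidz1 protects against losing a whole device, but the rewind import discards the last few seconds of writes, and that is where files can be lost. After a scrub, the damaged files should be listed explicitly:

# lists files with permanent (unrecoverable) errors after a scrub
zpool status -v cache_zfs_nvme

# quick health check; prints 'all pools are healthy' when everything is clean
zpool status -x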

2 minutes ago, Christobol said:

As I mentioned above I tried zpool import when the array was running

Are you sure the array was running? I don't see how that is possible unless you have an intermittent pool issue, which would be very strange.

 

3 minutes ago, Christobol said:

I stopped the array and then started it again and my data was back.  Now I'm showing a different error:

Post new diags.

  • 2 weeks later...
6 hours ago, Christobol said:

I actually started and stopped the array a few times to run the commands and realized that when it was stopped it wouldn't work. 

Do you mean that zpool import does not work with the array stopped? That's normal; as mentioned, your pool is encrypted, so it won't work until the devices are decrypted.
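
A rough sketch of why, assuming Unraid's usual LUKS-based encryption layout (device paths hypothetical): with the array stopped, the member partitions are locked LUKS containers with no visible ZFS labels; once the array starts and unlocks them, the pool can be found on the decrypted device-mapper nodes:

# unlocked device-mapper nodes only exist after the array starts
ls /dev/mapper/

# scan the decrypted devices for importable pools
zpool import -d /dev/mapper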

 

 

 

