ZFS pool unmountable, disk errors detected, no pools found



My Unraid server crashed today, and when I checked the command line before rebooting I saw a kernel panic and what I think was a macvlan issue. (The macvlan issue started a couple of days ago when I began assigning Docker containers to certain VLANs.)

 

Running Unraid 6.12.8.

The ZFS pool was created using SpaceInvaderOne's video guide once ZFS support was official (I forget which release that was). Probably unrelated, but my Plex server went into a database migration a few days ago and hasn't worked since. I also had another crash ~2 days ago for which I didn't get any log info and didn't investigate.

 

[screenshots]

This is where I store my appdata, system, etc., so all of my Docker containers are dead right now.

 

Reading about this, I found people with other issues, nothing quite the same, but I tried to use zpool to get more info:

zpool list  (also tried import, export, etc.)

no pools available
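
For anyone following along, the distinction between these commands as I understand it (cache_zfs_nvme is my pool name):

# zpool list only shows pools that are already imported
zpool list

# zpool import with no arguments scans attached devices for importable pools
zpool import

# if the scan finds nothing, a device directory can be given explicitly
zpool import -d /dev/disk/by-id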

 

the Fix Common Problems plugin reported this:

cache_zfs_nvme (Samsung_SSD_970_EVO_Plus_2TB_S59CNM0R601069W) has file system errors ()

Though when I just ran it again, the error didn't show up.

 

I ran

ls -l /dev/disk/by-id/ 

and here is a partial screenshot, since I have so many drives in my primary array.

[screenshot of the ls -l /dev/disk/by-id/ output]
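
Since the full by-id listing is long, filtering it down helps; a minimal sketch (the grep pattern is just an example for NVMe devices):

ls -l /dev/disk/by-id/ | grep -i nvme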

 

I am at a loss as to why zpool doesn't see the pool, and I don't know how to correct the corruption reported on the first drive. I'm not sure what is best to do from here; I thought that with this pool running ZFS I wouldn't need to worry about a single NVMe failure. Currently I can't load any Docker containers to start getting services back, since my appdata etc. were on that pool.

 

atlas-diagnostics-20240322-1354.zip


As I mentioned above, I tried zpool import while the array was running, along with export and list.

 

I spent about 5 hours, without a reboot or doing anything other than having the array running, reading dozens of support threads and trying to figure out why the Fix Common Problems plugin had reported this:

cache_zfs_nvme (Samsung_SSD_970_EVO_Plus_2TB_S59CNM0R601069W) has file system errors ()

and by then that error was gone.

 

So I decided to run:

zpool list --> nothing showed up again

zpool import

and suddenly my pool showed up:

@Atlas:/mnt# zpool import
   pool: cache_zfs_nvme
     id: 7422096033263261955
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        cache_zfs_nvme  ONLINE
          raidz1-0      ONLINE
            nvme2n1p1   ONLINE
            nvme3n1p1   ONLINE
            nvme4n1p1   ONLINE
            nvme5n1p1   ONLINE
            nvme0n1p1   ONLINE

----

@Atlas:/mnt# zpool import cache_zfs_nvme
cannot import 'cache_zfs_nvme': I/O error
        Recovery is possible, but will result in some data loss.
        Returning the pool to its state as of Fri 22 Mar 2024 09:37:58 AM CDT
        should correct the problem.  Approximately 6 seconds of data
        must be discarded, irreversibly.  Recovery can be attempted
        by executing 'zpool import -F cache_zfs_nvme'.  A scrub of the pool
        is strongly recommended after recovery.
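
For reference, this is the recovery sequence the message itself describes; per the zpool-import man page, adding -n to -F does a dry run that reports whether recovery would succeed without actually rewinding:

# preview the rewind without committing it
zpool import -Fn cache_zfs_nvme

# rewind to the last consistent state, discarding ~6 seconds of writes
zpool import -F cache_zfs_nvme

# the message strongly recommends a scrub afterwards
zpool scrub cache_zfs_nvme

# watch scrub progress and list any files with permanent errors
zpool status -v cache_zfs_nvme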

----

@Atlas:/mnt# zpool list
NAME             SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
cache_zfs_nvme  9.09T  1.53T  7.57T        -         -     0%    16%  1.00x    ONLINE  -

 

I stopped the array and then started it again, and my data was back. Now I'm showing a different error:

 

[screenshot of the new error]

 

When I hover over the red lock, I get "device locked with unknown error". After stopping the array and starting it again, the red lock was gone.

 

I don't know what to do at this point, or why the pool suddenly was able to be located and repaired. I'm concerned about this occurring again.

 

I did lose a number of files in my Plex Docker directory (and apparently only Plex). I am confused about how ZFS protected my files if corruption was possible at the disk level and not recoverable.
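
My rough understanding, for anyone who hits this later (not authoritative): raidz1 protects against losing a whole device, but the rewind import discards the last few seconds of writes, and that is where files can be lost. After a scrub, the damaged files should be listed explicitly:

# lists files with permanent (unrecoverable) errors after a scrub
zpool status -v cache_zfs_nvme

# quick health check; prints 'all pools are healthy' when everything is clean
zpool status -x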

2 minutes ago, Christobol said:

As I mentioned above I tried zpool import when the array was running

Are you sure the array was running? I don't see how that is possible unless you have an intermittent pool issue, which would be very strange.

 

3 minutes ago, Christobol said:

I stopped the array and then started it again and my data was back.  Now I'm showing a different error:

Post new diags.

  • 2 weeks later...
6 hours ago, Christobol said:

I actually started and stopped the array a few times to run the commands and realized that when it was stopped it wouldn't work. 

Do you mean that zpool import does not work with the array stopped? That's normal; as mentioned, your pool is encrypted, so it won't work until the devices are decrypted.
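
A rough sketch of why, assuming Unraid's usual LUKS-based encryption layout (device paths hypothetical): with the array stopped, the member partitions are locked LUKS containers with no visible ZFS labels; once the array starts and unlocks them, the pool can be found on the decrypted device-mapper nodes:

# unlocked device-mapper nodes only exist after the array starts
ls /dev/mapper/

# scan the decrypted devices for importable pools
zpool import -d /dev/mapper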

 

 

 

