Christobol Posted March 22 (edited)

My Unraid server crashed today. When I checked the command line before rebooting, I saw a kernel panic and what I think is a macvlan issue. (The macvlan issue started a couple of days ago, when I wanted to start assigning dockers to certain VLANs.)

I'm running 6.12.8 with a ZFS pool created using SpaceInvaderOne's video guide once ZFS was official (I forget which release that was). I doubt this is related, but my Plex server went into database migration a few days ago and hasn't worked since. I also had another crash about 2 days ago, for which I didn't get any log info and which I didn't investigate.

The ZFS pool is where I store my appdata, system, etc., so all of my dockers are dead right now.

Reading about this, I found people who had other issues, nothing quite the same, but I tried to use zpool to get more info and got:

    zpool list (also import, export, etc. instead of list)
    no pools available

The Fix Common Problems plugin reported this:

* **cache_zfs_nvme (Samsung_SSD_970_EVO_Plus_2TB_S59CNM0R601069W) has file system errors ()**

Though when I ran it again just now, the error didn't show up.

I ran ls -l /dev/disk/by-id/ and here is a partial screenshot, since I have so many drives in my primary array.

I am at a loss as to why zpool doesn't see the array, and I don't know how to correct the corruption found on the first drive. I'm not sure what is best to do from here. I thought that with this pool running ZFS I wouldn't need to worry about a single NVMe failure. Currently I can't load any dockers to start getting services back, since my appdata etc. were on that pool.

atlas-diagnostics-20240322-1354.zip

Edited March 22 by Christobol: Added more detail
JorgeB Posted March 23

Since the pool is encrypted, first start the array to decrypt the devices, then post the output of zpool import.
Christobol Posted March 25 (Author)

As I mentioned above, I tried zpool import when the array was running, along with export and list.

I spent about 5 hours reading dozens of support threads, without a reboot or doing anything other than having the array running, trying to figure out why the Fix Common Problems plugin reported this:

* **cache_zfs_nvme (Samsung_SSD_970_EVO_Plus_2TB_S59CNM0R601069W) has file system errors ()**

By then that error was gone. So I decided to run:

    zpool list --> nothing showed up again
    zpool import

and suddenly my pool was showing:

    @Atlas:/mnt# zpool import
       pool: cache_zfs_nvme
         id: 7422096033263261955
      state: ONLINE
     action: The pool can be imported using its name or numeric identifier.
     config:
        cache_zfs_nvme  ONLINE
          raidz1-0      ONLINE
            nvme2n1p1   ONLINE
            nvme3n1p1   ONLINE
            nvme4n1p1   ONLINE
            nvme5n1p1   ONLINE
            nvme0n1p1   ONLINE
    ----
    @Atlas:/mnt# zpool import cache_zfs_nvme
    cannot import 'cache_zfs_nvme': I/O error
    Recovery is possible, but will result in some data loss.
    Returning the pool to its state as of Fri 22 Mar 2024 09:37:58 AM CDT should correct the problem. Approximately 6 seconds of data must be discarded, irreversibly.
    Recovery can be attempted by executing 'zpool import -F cache_zfs_nvme'.
    A scrub of the pool is strongly recommended after recovery.
    ----
    @Atlas:/mnt# zpool list
    NAME             SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
    cache_zfs_nvme  9.09T  1.53T  7.57T        -         -    0%  16%  1.00x  ONLINE  -

I stopped the array and then started it again, and my data was back. Now I'm showing a different error: when I hover over the red lock, I get "device locked with unknown error". After stopping the array and starting it again, the red lock was gone.

I don't know what to do at this point, or why the pool could suddenly be located and repaired. I'm concerned about this occurring again. I did lose a number of files in my Plex docker directory (and apparently only Plex).

I am confused about how ZFS protected my files if file corruption was possible at the disk level and not recoverable.
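For readers following along, the recovery message quoted above names the commands involved. A minimal sketch of that sequence follows, assuming the pool name cache_zfs_nvme from the output and that the array is started so the encrypted devices are unlocked. The `recover_pool` function and the `POOL`/`ZPOOL` variables are hypothetical names introduced here for illustration. Whether running these by hand is the right approach on Unraid, rather than letting the array start import the pool, is not settled in this thread.

```shell
#!/bin/sh
# Assumption: pool name taken from the zpool output quoted in the post above.
POOL="${POOL:-cache_zfs_nvme}"
# Hypothetical override: set ZPOOL=echo to print the commands instead of running them.
ZPOOL="${ZPOOL:-zpool}"

recover_pool() {
  # Dry run of the rewind: checks whether recovery could work, changes nothing.
  "$ZPOOL" import -F -n "$POOL" || return 1
  # Actual rewind import: the last few seconds of writes are discarded, irreversibly.
  "$ZPOOL" import -F "$POOL" || return 1
  # Start the scrub that the message recommends after recovery.
  "$ZPOOL" scrub "$POOL" || return 1
  # Show scrub progress and any files reported as damaged.
  "$ZPOOL" status -v "$POOL"
}
```

Calling `recover_pool` runs the steps in order and stops at the first one that fails. Setting `ZPOOL=echo` beforehand only prints the commands, which is a safe way to review them first.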
JorgeB Posted March 25

2 minutes ago, Christobol said:
    As I mentioned above I tried zpool import when the array was running

Are you sure the array was running? I don't see how that is possible unless you have an intermittent pool issue, which would be very strange.

3 minutes ago, Christobol said:
    I stopped the array and then started it again and my data was back. Now I'm showing a different error:

Post new diags.
Christobol Posted April 6 (Author)

I actually started and stopped the array a few times to run the commands and realized that when it was stopped, they wouldn't work. Very strange. Since then I let an update run, so I imagine finding the problem in the diagnostics might be difficult.

atlas-diagnostics-20240405-2238.zip
JorgeB Posted April 6

6 hours ago, Christobol said:
    I actually started and stopped the array a few times to run the commands and realized that when it was stopped it wouldn't work.

Do you mean that zpool import does not work with the array stopped? That's normal. As mentioned, your pool is encrypted, so it won't work until the devices are decrypted.
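One way to see the behavior described above is to summarize what zpool import reports with the array stopped and again with it started. The sketch below is only an illustration: `list_importable` is a hypothetical helper, not part of ZFS or Unraid, and it assumes the `pool:` / `state:` output layout shown earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool import` output on stdin and
# prints one "name state" line per pool that is offered for import.
list_importable() {
  awk '$1 == "pool:"  { name = $2 }
       $1 == "state:" { print name, $2 }'
}
```

Typical use would be `zpool import | list_importable`. If the encrypted devices are still locked and no pool is found, the helper prints nothing.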