NGMK Posted March 14

Recently I moved my server to a bigger case and everything was working fine. A couple of days later I upgraded my cache pool from two 1TB SATA SSDs (btrfs) to three 2TB NVMe drives using ZFS. After moving the appdata and recreating the docker image file, everything seemed to be fine. A couple of days later the server stopped responding and needed to be restarted, and a few days after that, even though all the other dockers were working, Plex was failing to start. After some troubleshooting I realized the docker image was corrupted and had to delete and recreate it. The very next day the server crashed again, but this time, after rebooting, the array would get stuck starting and never finish. I rebooted the server several times with no luck; I even tried safe mode. Suspecting the cache pool had something to do with these issues, I decided to remove one of the drives, and I was able to start the array in safe mode. After that I stopped the array and re-added the removed drive to the cache pool, and now Unraid is recognizing it as a new drive and stating that all data will be erased from it if I start the array. All my appdata is on the cache; I do not have any VMs. Any help will be appreciated.

poseidon-diagnostics-20240313-2112.zip
JorgeB Posted March 14

The log is being spammed with rootshare-related errors; disable that and post new diags after a reboot.
NGMK Posted March 14 (Author)

This is a new log file, taken after rebooting the server. I do not know how to disable the rootshare-related errors unless I pull the log from syslog, and I am not sure if those are okay to share openly.

poseidon-diagnostics-20240314-0852.zip
trurl Posted March 14

29 minutes ago, NGMK said: "rootshare"

The recommended way to handle this now is with the Unassigned Devices plugin. It looks like you are doing it in smb-extra.conf instead:
Settings - SMB - SMB Extras
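For reference, a root share defined through SMB Extras usually looks something like the stanza below. This is only an illustration of the kind of entry being discussed; the share name, path, and user are placeholders, not taken from this server's config.

    # Hypothetical rootshare stanza in Settings - SMB - SMB Extras (smb-extra.conf)
    [rootshare]
       path = /mnt/user
       comment = root share exposing all user shares
       browseable = yes
       valid users = someuser
       write list = someuser

Removing a stanza like that from SMB Extras (and letting the Unassigned Devices plugin provide the root share instead) should stop the related log spam.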
JorgeB Posted March 14

Remove the rootshare from SMB Extras and post the output of zpool import.
NGMK Posted March 14 (Author)

1 hour ago, JorgeB said: "Remove the rootshare from SMB extras and post the output of zpool import"

   pool: cache
     id: 2664919203947636995
  state: DEGRADED
 status: One or more devices contains corrupted data.
 action: The pool can be imported despite missing or damaged devices.
         The fault tolerance of the pool may be compromised if imported.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
 config:

        cache          DEGRADED
          raidz1-0     DEGRADED
            nvme0n1    UNAVAIL  invalid label
            nvme1n1p1  ONLINE
            nvme2n1p1  ONLINE

Before opening this post I found another thread where someone was having an issue starting the array and was advised to remove the cache pool drives. I attempted that, but although I unassigned all three NVMe drives from the pool in the UI, when I started the array I realized that only the first drive had actually been unassigned. This may be the cause of these results.
JorgeB Posted March 14

See if the pool imports with the current status, so you can then try to fix it: unassign all pool devices, start the array, stop the array, reassign all pool devices in the correct order as zpool import shows them, start the array, and post new diags.
NGMK Posted March 15 (Author)

The array now starts after removing all the NVMe cache drives from the pool, starting the array without any drives in the pool, stopping the array, and adding all three drives back to the cache pool in the same order. I left the filesystem as auto; now Unraid wants me to format the drives, as right at this moment they are not mountable. What's next?

poseidon-diagnostics-20240314-2127.zip
JorgeB Posted March 15

The pool is not importing because the first device doesn't have a valid filesystem. Try this:

sfdisk /dev/nvme0n1

then type 2048 and hit enter, and finally post the output/screenshot of the results.
NGMK Posted March 15 (Author, edited March 15 by NGMK)

[screenshot of the sfdisk output]
JorgeB Posted March 15

Type N to keep the signature and press enter, then type write and press enter. After that, restart the array and post new diags.
NGMK Posted March 16 (Author)

When I run the command [ sfdisk /dev/nvme0n1 ], then type [2048], then [N] to not remove the signature, I get the following prompt asking about the other devices in the pool. I type write, close the command line, and try starting the array, but it won't start. Below are the screenshot and the new diagnostics.

P.S. I really appreciate you taking the time to help.

poseidon-diagnostics-20240315-2031.zip
JorgeB Posted March 16

That is not asking about the other devices; it's asking whether you want to create a second partition. Just type write and enter. It's not clear if you already did that or not.
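For anyone following along, the sfdisk exchange being described looks roughly like this. It is a sketch with the prompts paraphrased, assuming the first pool member is /dev/nvme0n1 as above; exact wording varies by sfdisk version.

    sfdisk /dev/nvme0n1
    # At the prompt for the first partition, enter the start sector and accept the
    # defaults for size and type (the partition is recreated starting at sector 2048):
    2048
    # If sfdisk finds an existing zfs_member signature it asks whether to remove it;
    # answer N so the signature (and the ZFS label) is kept.
    # The follow-up prompt is only offering to add a second partition; skip it and finish with:
    write
    # Then restart the array and post new diagnostics.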
NGMK Posted March 16 (Author)

Yes, I already typed [write] afterwards, and I did the whole process more than once, but the array won't start afterwards and I always end up having to reboot the server. The only valuable data in the cache pool is my appdata, and I have a two-week-old backup of it in the main array. However, I'm very concerned about how this cache pool became so corrupted. This is my very first time using ZFS, and I read in another post that raidz1 with 3 drives in a cache pool was only recommended in an experimental setting and not on a mission-critical server. What else can we try here? I would prefer to save the pool if possible. New diagnostics attached.

poseidon-diagnostics-20240316-1430.zip
JorgeB Posted March 17 (Solution)

16 hours ago, NGMK said: "raidz1 with 3 drives in a cache pool was only recommended in an experimental setting"

ZFS raidz1 is far from experimental; it has been considered stable for a long time.

16 hours ago, NGMK said: "the array won't start afterwards and I always end up having to reboot the server"

That suggests the pool is crashing the server on mount. Before starting the array, type:

zpool import -o readonly=on cache

If successful, then start the array; the GUI will show the pool as unmountable, but the data should be under /mnt/cache. Then back up and re-create the pool.
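A minimal sketch of that recovery path from the console, assuming the pool is named cache and using an array disk as the backup target (the destination path below is illustrative, not from this thread):

    # Import the pool read-only before starting the array:
    zpool import -o readonly=on cache
    # Start the array from the GUI; the pool will show as unmountable,
    # but the data should be reachable under /mnt/cache.
    # Copy the appdata off the pool to an array disk (destination is an example path):
    rsync -avh --progress /mnt/cache/appdata/ /mnt/disk1/appdata-backup/
    # Once the backup is verified, re-create (re-format) the pool from the GUI.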
NGMK Posted March 17 (Author)

Yes, I already tried zpool import -o readonly=on cache and was able to start the array with the cache in read-only status. The cache pool is visible in the GUI file explorer. I tried copying the appdata folder to one of the array disks, and all was going well until it just got stuck on a single file, transferring it forever.
JorgeB Posted March 17

Check the main page for write speeds to see if it's still going, and also check the syslog for any errors.
NGMK Posted March 17 (Author)

So I created a new share on another cache pool I have, with only one SATA SSD, and copied the appdata directory to it; judging by its size, I believe all the files are there. Should I just give up on the NVMe pool (main), reformat, and recreate it?
JorgeB Posted March 18

Once the data is backed up you will need to re-format the pool.
NGMK Posted March 19 (Author)

Consider this one solved. The array is back online along with the cache pool. I transferred the appdata recovered from the failed pool, and I hope Plex is able to recover. Thanks, JorgeB, for your assistance.