NGMK Posted March 14

Recently I moved my server to a bigger case and everything was working fine. A couple of days later I upgraded my cache pool from two 1TB SATA SSDs (btrfs) to three 2TB NVMe drives using ZFS. After moving the appdata and recreating the docker image file, everything seemed to be fine. A couple of days later the server stopped responding and needed to be restarted, and a few days after that, even though all the other dockers were working, Plex was failing to start. After some troubleshooting I realized the docker image was corrupted and had to delete and recreate it. The very next day the server crashed again, but this time, after rebooting, the array would get stuck starting and never finish. I rebooted the server several times with no luck; I even tried safe mode. Suspecting the cache pool had something to do with these issues, I decided to remove one of the drives, and I was able to start the array in safe mode. After that I stopped the array and re-added the removed drive to the cache pool, and now Unraid is recognizing it as a new drive and stating that all data will be erased from it if I start the array. All my appdata is on the cache; I do not have any VMs. Any help will be appreciated.

poseidon-diagnostics-20240313-2112.zip
JorgeB Posted March 14

The log is being spammed with rootshare-related errors; disable that and post new diags after a reboot.
NGMK Posted March 14 (Author)

This is a new log file, taken after rebooting the server. I do not know how to disable the rootshare-related errors unless I pull the log from syslog, and I am not sure if those are okay to share openly.

poseidon-diagnostics-20240314-0852.zip
trurl Posted March 14

29 minutes ago, NGMK said: "rootshare"

The recommended way to handle this now is with the Unassigned Devices plugin. It looks like you are doing it in smb-extra.conf instead:
Settings - SMB - SMB Extras
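For reference, a root share defined through SMB Extras usually looks something like the stanza below. This is only an illustration of the kind of entry being discussed; the share name, path, and user are placeholders, not taken from this server's config.

    # Hypothetical rootshare stanza in Settings - SMB - SMB Extras (smb-extra.conf)
    [rootshare]
       path = /mnt/user
       comment = root share exposing all user shares
       browseable = yes
       valid users = someuser
       write list = someuser

Removing a stanza like that from SMB Extras (and letting the Unassigned Devices plugin provide the root share instead) should stop the related log spam.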
JorgeB Posted March 14

Remove the rootshare from SMB Extras and post the output of zpool import.
NGMK Posted March 14 (Author)

1 hour ago, JorgeB said: "Remove the rootshare from SMB extras and post the output of zpool import"

   pool: cache
     id: 2664919203947636995
  state: DEGRADED
 status: One or more devices contains corrupted data.
 action: The pool can be imported despite missing or damaged devices.
         The fault tolerance of the pool may be compromised if imported.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
 config:

        cache          DEGRADED
          raidz1-0     DEGRADED
            nvme0n1    UNAVAIL  invalid label
            nvme1n1p1  ONLINE
            nvme2n1p1  ONLINE

Before opening this post I found another thread where someone was having an issue starting the array and was advised to remove the cache pool drives. I attempted that, but although I unassigned all three NVMe drives from the pool in the UI, when I started the array I realized that only the first drive had actually been unassigned. This may be the cause of these results.
JorgeB Posted March 14

See if the pool imports with the current status, so you can then try to fix it: unassign all pool devices, start the array, stop the array, reassign all pool devices in the correct order as zpool import shows them, start the array, and post new diags.
NGMK Posted March 15 (Author)

The array now starts after removing all the NVMe cache drives from the pool, starting the array without any drives in the pool, stopping the array, and adding all three drives back to the cache pool in the same order. I left the filesystem as auto; now Unraid wants me to format the drives, as right at this moment they are not mountable. What's next?

poseidon-diagnostics-20240314-2127.zip
JorgeB Posted March 15

The pool is not importing because the first device doesn't have a valid filesystem. Try this:

sfdisk /dev/nvme0n1

then type 2048 and hit enter, and finally post the output/screenshot of the results.
NGMK Posted March 15 (Author, edited March 15 by NGMK)

[screenshot of the sfdisk output]
JorgeB Posted March 15

Type N to keep the signature and press enter, then type write and press enter. After that, restart the array and post new diags.
NGMK Posted March 16 (Author)

When I run the command [ sfdisk /dev/nvme0n1 ], then type [2048], then [N] to not remove the signature, I get the following prompt asking about the other devices in the pool. I type write, close the command line, and try starting the array, but it won't start. Below are the screenshot and the new diagnostics.

P.S. I really appreciate you taking the time to help.

poseidon-diagnostics-20240315-2031.zip
JorgeB Posted March 16

That is not asking about the other devices; it's asking whether you want to create a second partition. Just type write and enter. It's not clear if you already did that or not.
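For anyone following along, the sfdisk exchange being described looks roughly like this. It is a sketch with the prompts paraphrased, assuming the first pool member is /dev/nvme0n1 as above; exact wording varies by sfdisk version.

    sfdisk /dev/nvme0n1
    # At the prompt for the first partition, enter the start sector and accept the
    # defaults for size and type (the partition is recreated starting at sector 2048):
    2048
    # If sfdisk finds an existing zfs_member signature it asks whether to remove it;
    # answer N so the signature (and the ZFS label) is kept.
    # The follow-up prompt is only offering to add a second partition; skip it and finish with:
    write
    # Then restart the array and post new diagnostics.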
NGMK Posted March 16 (Author)

Yes, I already typed [write] afterwards, and I did the whole process more than once, but the array won't start afterwards and I always end up having to reboot the server. The only valuable data in the cache pool is my appdata, and I have a two-week-old backup of it in the main array. However, I'm very concerned about how this cache pool became so corrupted. This is my very first time using ZFS, and I read in another post that raidz1 with 3 drives in a cache pool was only recommended in an experimental setting and not on a mission-critical server. What else can we try here? I would prefer to save the pool if possible. New diagnostics attached.

poseidon-diagnostics-20240316-1430.zip
JorgeB Posted March 17 (Solution)

16 hours ago, NGMK said: "raidz1 with 3 drives in a cache pool was only recommended in an experimental setting"

ZFS raidz1 is far from experimental; it has been considered stable for a long time.

16 hours ago, NGMK said: "the array won't start afterwards and I always end up having to reboot the server"

That suggests the pool is crashing the server on mount. Before starting the array, type:

zpool import -o readonly=on cache

If successful, then start the array; the GUI will show the pool as unmountable, but the data should be under /mnt/cache. Then back up and re-create the pool.
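A minimal sketch of that recovery path from the console, assuming the pool is named cache and using an array disk as the backup target (the destination path below is illustrative, not from this thread):

    # Import the pool read-only before starting the array:
    zpool import -o readonly=on cache
    # Start the array from the GUI; the pool will show as unmountable,
    # but the data should be reachable under /mnt/cache.
    # Copy the appdata off the pool to an array disk (destination is an example path):
    rsync -avh --progress /mnt/cache/appdata/ /mnt/disk1/appdata-backup/
    # Once the backup is verified, re-create (re-format) the pool from the GUI.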
NGMK Posted March 17 (Author)

Yes, I already tried zpool import -o readonly=on cache and was able to start the array with the cache in read-only status. The cache pool is visible in the GUI file explorer. I tried copying the appdata folder to one of the array disks, and all was going well until it just got stuck on a single file, transferring it forever.
JorgeB Posted March 17

Check the main page for write speeds to see if it's still going, and also check the syslog for any errors.
NGMK Posted March 17 (Author)

So I created a new share on another cache pool I have, with only one SATA SSD, and copied the appdata directory to it; judging by its size, I believe all the files are there. Should I just give up on the NVMe pool (main), reformat, and recreate it?
JorgeB Posted March 18

Once the data is backed up you will need to re-format the pool.
NGMK Posted March 19 (Author)

Consider this one solved. The array is back online along with the cache pool. I transferred the appdata recovered from the failed pool, and I hope Plex is able to recover. Thanks, JorgeB, for your assistance.