
Unraid stopped working while out of town. Boots in safe mode only. ZFS storage



I went on a trip out of town recently and we had a lot of storms here. There were no power outages longer than a minute or two, and the server is on backup power, so I don't think it shut down. During boot-up I get a panic and it doesn't go past that point. It looks like the drives are all being found and tested, but that's as far as I'm able to get. All of my storage disks are in a ZFS pool, and no ZFS pools are being found, though I'm not sure if that's related to the error. Below is what I'm getting on a normal boot, along with the trace I found in the logs.

 

It freezes at the panic.

If anyone could point me in the right direction, that would help a lot.

This is not on the new Unraid release with built-in ZFS.

Jun  2 00:38:03 Tower kernel: PANIC: zfs: removing nonexistent segment from range tree (offset=1f10c3514000 size=2000)
Jun  2 00:38:03 Tower kernel: Showing stack for process 57252
Jun  2 00:38:03 Tower kernel: CPU: 81 PID: 57252 Comm: z_wr_iss Tainted: P           O      5.19.17-Unraid #2
Jun  2 00:38:03 Tower kernel: Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.13.0 05/14/2021
Jun  2 00:38:03 Tower kernel: Call Trace:
Jun  2 00:38:03 Tower kernel: <TASK>
Jun  2 00:38:03 Tower kernel: dump_stack_lvl+0x44/0x5c
Jun  2 00:38:03 Tower kernel: vcmn_err+0x86/0xc3 [spl]
Jun  2 00:38:03 Tower kernel: ? pn_free+0x2a/0x2a [zfs]
Jun  2 00:38:03 Tower kernel: ? bt_grow_leaf+0xc3/0xd6 [zfs]
Jun  2 00:38:03 Tower kernel: ? zfs_btree_insert_leaf_impl+0x21/0x44 [zfs]
Jun  2 00:38:03 Tower kernel: ? pn_free+0x2a/0x2a [zfs]
Jun  2 00:38:03 Tower kernel: ? zfs_btree_find_in_buf+0x4b/0x97 [zfs]
Jun  2 00:38:03 Tower kernel: zfs_panic_recover+0x6d/0x88 [zfs]
Jun  2 00:38:03 Tower kernel: range_tree_remove_impl+0xd3/0x416 [zfs]
Jun  2 00:38:03 Tower kernel: space_map_load_callback+0x70/0x79 [zfs]
Jun  2 00:38:03 Tower kernel: space_map_iterate+0x2ec/0x341 [zfs]
Jun  2 00:38:03 Tower kernel: ? spa_stats_destroy+0x16c/0x16c [zfs]
Jun  2 00:38:03 Tower kernel: space_map_load_length+0x94/0xd0 [zfs]
Jun  2 00:38:03 Tower kernel: metaslab_load+0x34d/0x6f5 [zfs]
Jun  2 00:38:03 Tower kernel: ? spl_kmem_alloc_impl+0xc6/0xf7 [spl]
Jun  2 00:38:03 Tower kernel: ? __kmalloc_node+0x1b4/0x1df
Jun  2 00:38:03 Tower kernel: metaslab_activate+0x3b/0x1f4 [zfs]
Jun  2 00:38:03 Tower kernel: metaslab_alloc_dva+0x7e2/0xf39 [zfs]
Jun  2 00:38:03 Tower kernel: ? spl_kmem_cache_alloc+0x4a/0x608 [spl]
Jun  2 00:38:03 Tower kernel: metaslab_alloc+0xfd/0x1f6 [zfs]
Jun  2 00:38:03 Tower kernel: zio_dva_allocate+0xe8/0x738 [zfs]
Jun  2 00:38:03 Tower kernel: ? spl_kmem_alloc_impl+0xc6/0xf7 [spl]
Jun  2 00:38:03 Tower kernel: ? preempt_latency_start+0x2b/0x46
Jun  2 00:38:03 Tower kernel: ? _raw_spin_lock+0x13/0x1c
Jun  2 00:38:03 Tower kernel: ? _raw_spin_unlock+0x14/0x29
Jun  2 00:38:03 Tower kernel: ? tsd_hash_search+0x74/0x81 [spl]
Jun  2 00:38:03 Tower kernel: zio_execute+0xb2/0xdd [zfs]
Jun  2 00:38:03 Tower kernel: taskq_thread+0x277/0x3a5 [spl]
Jun  2 00:38:03 Tower kernel: ? wake_up_q+0x44/0x44
Jun  2 00:38:03 Tower kernel: ? zio_taskq_member.constprop.0.isra.0+0x4f/0x4f [zfs]
Jun  2 00:38:03 Tower kernel: ? taskq_dispatch_delay+0x115/0x115 [spl]
Jun  2 00:38:03 Tower kernel: kthread+0xe7/0xef
Jun  2 00:38:03 Tower kernel: ? kthread_complete_and_exit+0x1b/0x1b
Jun  2 00:38:03 Tower kernel: ret_from_fork+0x22/0x30
Jun  2 00:38:03 Tower kernel: </TASK>
Jun  2 00:41:36 Tower kernel: md: sync done. time=729sec
Jun  2 00:41:36 Tower kernel: md: recovery thread: exit status: 0
Jun  2 00:53:25 Tower kernel: mdcmd (37): nocheck cancel

 


I noticed that one of the lights on a hard drive tray isn't lighting up, but the disk is spinning up. Bad light or something more, not sure. I tried booting without that disk in to see if I'd get a different error, but it was the same thing. I've looked over all the connections and nothing seemed off.


I just did this: if I remove the ZFS plugin, it boots. I also went a step further, unmounted all my drives, and installed them one at a time to see if a single disk was causing this. What I've found is that I have a 4-disk backplane inside the server for the 4th vdev, and if I plug one of those disks into it I get the error -.-  When I run without the plugin I can check the connected disks in the console and it sees all my disks, but with the plugin installed I can't boot with one of those disks plugged in. Could this be a backplane failure, or possibly a vdev failure of those 4 disks?

 

I didn't check the diagnostics while doing all this, but I can reinstall the drives and grab them if it would help.
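For reference, the console check was just confirming that every disk on that backplane shows up; something along these lines (treat it as a sketch with placeholder device names, not the exact commands I ran):

lsblk -o NAME,SIZE,MODEL,SERIAL    # list every detected disk with its model and serial
smartctl -H /dev/sdX               # quick SMART health check on one disk (substitute the real device)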


Note that some data loss can occur with these:

 

zpool import -F

"Recovery mode for a non-importable pool. Attempt to return the pool to an importable state by discarding the last few transactions. Not all damaged pools can be recovered by using this option. If successful, the data from the discarded transactions is irretrievably lost. This option is ignored if the pool is importable or already imported."

 

zpool import -FX

"Used with the -F recovery option. Determines whether extreme measures to find a valid txg should take place. This allows the pool to be rolled back to a txg which is no longer guaranteed to be consistent. Pools imported at an inconsistent txg may contain uncorrectable checksum errors. For more details about pool recovery mode, see the -F option, above. WARNING: This option can be extremely hazardous to the health of your pool and should only be used as a last resort."


I'm wondering if the fault could be with my cache. I was reading over this post: https://forums.unraid.net/topic/129408-solved-read-only-file-system-after-crash-of-cache-pool/
This is the very first error I got, before the ZFS error. I only have a picture of it, so I'll sum it up.

 

btrfs critical device nvme unable to find logical device

page cache invalidation failure on direct io

file /var/cache/netdata/dbengine/datafile-

Then there was a second invalidation failure in the same folder, but for a different file.

The first file was datafile-1-0000002124.ndf (PID 5903).

The second was 2177 (PID 59017).

 

I had already tried the -F import before. I'm wondering if the problem could possibly be with the NVMe?
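Would something like this be the right way to check it (assuming the cache pool mounts at /mnt/cache and the drive shows up as /dev/nvme0)?

btrfs filesystem show              # is the cache pool missing a device?
btrfs device stats /mnt/cache      # per-device error counters for the pool
smartctl -a /dev/nvme0             # SMART/health data for the NVMe itself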


I'm running a hardware diagnostic on the server to see if it finds anything as well. I'm also going to boot up another OS in a few days and try to connect the ZFS pool to it. I've seen some people with these panic errors who were able to open the pool on another computer, or in TrueNAS (or FreeNAS, whatever it's called now), by changing the ZFS commands to bypass checks and start the pool. I haven't found a way to do that here. I'm pretty much grabbing at straws to see what I can do before a rebuild.

6 minutes ago, myths said:

on another computer, or in TrueNAS (or FreeNAS, whatever it's called now), by changing the ZFS commands to bypass checks and start the pool. I haven't found a way to do that here.

The commands are the same in any OS; they are ZFS-specific, like the examples I posted above.
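If the "bypass checks" those guides describe is the zfs_recover tunable, it's a kernel module parameter you can flip from the console once the module is loaded; as a last-resort sketch (placeholder pool name, ideally combined with a read-only import):

echo 1 > /sys/module/zfs/parameters/zfs_recover   # downgrade this class of panic to a warning
zpool import -o readonly=on -F poolname           # then retry the import with your actual pool name

Also note that the extreme rewind option is an uppercase X and is only valid together with -F, i.e. zpool import -F -X poolname.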


I didn't see anywhere to put them. The guides say to edit the ZFS boot files and add command lines to them. Unraid only just started supporting ZFS officially, so maybe it's hiding somewhere I've not looked yet. I did the scans with zero errors, so the last thing left is to try that. I'll try the -FX, but I think in the boot file I saw commands to bypass the fail-safe checks and other checks before loading ZFS, as in not to check for any corruption. Not sure, I'm half asleep right now. Two days of reading up on all this XD. Thanks for the help.

 

It said X was an invalid option.
