myths Posted June 2, 2023

I went on a trip out of town recently and we had a lot of storms here. No power outages for more than a minute or two, and the server is on backup power, so I don't think it shut down. During bootup I get a panic and it doesn't go past there. It looks like the drives are all being found and tested, but that's as far as I'm able to get. All my storage disks are in one ZFS pool, and no ZFS pools are being found, but I'm not sure if that's related to the error. Below is what I'm getting on a normal boot, plus the trace I found in the logs. It freezes at the panic. If anyone could point me in the right direction that would help a lot. This is not on the new Unraid with native ZFS.

Jun 2 00:38:03 Tower kernel: PANIC: zfs: removing nonexistent segment from range tree (offset=1f10c3514000 size=2000)
Jun 2 00:38:03 Tower kernel: Showing stack for process 57252
Jun 2 00:38:03 Tower kernel: CPU: 81 PID: 57252 Comm: z_wr_iss Tainted: P O 5.19.17-Unraid #2
Jun 2 00:38:03 Tower kernel: Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.13.0 05/14/2021
Jun 2 00:38:03 Tower kernel: Call Trace:
Jun 2 00:38:03 Tower kernel: <TASK>
Jun 2 00:38:03 Tower kernel: dump_stack_lvl+0x44/0x5c
Jun 2 00:38:03 Tower kernel: vcmn_err+0x86/0xc3 [spl]
Jun 2 00:38:03 Tower kernel: ? pn_free+0x2a/0x2a [zfs]
Jun 2 00:38:03 Tower kernel: ? bt_grow_leaf+0xc3/0xd6 [zfs]
Jun 2 00:38:03 Tower kernel: ? zfs_btree_insert_leaf_impl+0x21/0x44 [zfs]
Jun 2 00:38:03 Tower kernel: ? pn_free+0x2a/0x2a [zfs]
Jun 2 00:38:03 Tower kernel: ? zfs_btree_find_in_buf+0x4b/0x97 [zfs]
Jun 2 00:38:03 Tower kernel: zfs_panic_recover+0x6d/0x88 [zfs]
Jun 2 00:38:03 Tower kernel: range_tree_remove_impl+0xd3/0x416 [zfs]
Jun 2 00:38:03 Tower kernel: space_map_load_callback+0x70/0x79 [zfs]
Jun 2 00:38:03 Tower kernel: space_map_iterate+0x2ec/0x341 [zfs]
Jun 2 00:38:03 Tower kernel: ? spa_stats_destroy+0x16c/0x16c [zfs]
Jun 2 00:38:03 Tower kernel: space_map_load_length+0x94/0xd0 [zfs]
Jun 2 00:38:03 Tower kernel: metaslab_load+0x34d/0x6f5 [zfs]
Jun 2 00:38:03 Tower kernel: ? spl_kmem_alloc_impl+0xc6/0xf7 [spl]
Jun 2 00:38:03 Tower kernel: ? __kmalloc_node+0x1b4/0x1df
Jun 2 00:38:03 Tower kernel: metaslab_activate+0x3b/0x1f4 [zfs]
Jun 2 00:38:03 Tower kernel: metaslab_alloc_dva+0x7e2/0xf39 [zfs]
Jun 2 00:38:03 Tower kernel: ? spl_kmem_cache_alloc+0x4a/0x608 [spl]
Jun 2 00:38:03 Tower kernel: metaslab_alloc+0xfd/0x1f6 [zfs]
Jun 2 00:38:03 Tower kernel: zio_dva_allocate+0xe8/0x738 [zfs]
Jun 2 00:38:03 Tower kernel: ? spl_kmem_alloc_impl+0xc6/0xf7 [spl]
Jun 2 00:38:03 Tower kernel: ? preempt_latency_start+0x2b/0x46
Jun 2 00:38:03 Tower kernel: ? _raw_spin_lock+0x13/0x1c
Jun 2 00:38:03 Tower kernel: ? _raw_spin_unlock+0x14/0x29
Jun 2 00:38:03 Tower kernel: ? tsd_hash_search+0x74/0x81 [spl]
Jun 2 00:38:03 Tower kernel: zio_execute+0xb2/0xdd [zfs]
Jun 2 00:38:03 Tower kernel: taskq_thread+0x277/0x3a5 [spl]
Jun 2 00:38:03 Tower kernel: ? wake_up_q+0x44/0x44
Jun 2 00:38:03 Tower kernel: ? zio_taskq_member.constprop.0.isra.0+0x4f/0x4f [zfs]
Jun 2 00:38:03 Tower kernel: ? taskq_dispatch_delay+0x115/0x115 [spl]
Jun 2 00:38:03 Tower kernel: kthread+0xe7/0xef
Jun 2 00:38:03 Tower kernel: ? kthread_complete_and_exit+0x1b/0x1b
Jun 2 00:38:03 Tower kernel: ret_from_fork+0x22/0x30
Jun 2 00:38:03 Tower kernel: </TASK>
Jun 2 00:41:36 Tower kernel: md: sync done. time=729sec
Jun 2 00:41:36 Tower kernel: md: recovery thread: exit status: 0
Jun 2 00:53:25 Tower kernel: mdcmd (37): nocheck cancel
myths Posted June 2, 2023 (Author)

I noticed one of the lights on a hard drive tray isn't lighting up, but the disk is spinning up. Bad light or something more, not sure. I tried booting without that disk in to see if I'd get a different error, but it's the same thing. I've looked over all the connections and nothing seemed off.
JorgeB Posted June 2, 2023

Rename the ZFS plugin's .plg file so it doesn't install, and post diagnostics if it boots.
myths Posted June 2, 2023 (Author, edited)

I just did this. If I remove the ZFS plugin, it boots. I also went a step further and unmounted all my drives, then installed them one at a time to see if a single disk was causing this. What I've found is that I have a 4-disk backplane inside the server for the 4th vdev, and if I plug one of those disks into it, I get the error. When I run without the plugin and check connected disks in the console, it sees all my disks, but with the plugin installed I can't boot with one of those disks plugged in. Could this be a backplane failure, or possibly a vdev failure of those 4 disks? I didn't check diagnostics while doing all this, but I can reinstall the drives and grab them if it would help.

Edited June 2, 2023 by myths
JorgeB Posted June 2, 2023

You could try re-installing the plugin, but I believe the plugin tries to auto-import all the pools? Not sure, since I've never used it. You could also update to v6.12-rc6 and then try to import the pool read-only, to see if ZFS doesn't panic.
myths Posted June 2, 2023 (Author)

I tried to do a fresh install of the plugin, but during the install it would cause the panic and freeze. I also tried to update, but I couldn't find much info on importing and rolled back.
JorgeB Posted June 2, 2023

Update to rc6, then try importing the pool read-only using the CLI:

zpool import -o readonly=on pool_name

If it works, back up the data and recreate the pool.
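For reference, a rough sketch of what that read-only rescue would look like end to end. The pool name `tank` and the backup destination are placeholders, not names from this thread; substitute your own.

```shell
# List pools that are visible to the system but not yet imported
zpool import

# Import read-only: nothing is ever written, so a corrupt space map
# cannot be made worse by the import itself
zpool import -o readonly=on tank

# Confirm the datasets mounted and check overall pool health
zfs list -r tank
zpool status -v tank

# Copy whatever is readable to another disk or server
# (destination path is a placeholder)
rsync -avh --progress /mnt/tank/ /mnt/backup_disk/
```

A read-only import skips the allocation paths (the metaslab/space-map code in the panic trace above), which is why it can often succeed even when a normal read-write import panics.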
myths Posted June 3, 2023 (Author)

Ahh, hmm. Know of any alternatives besides a backup? Backing up 200TB of data would require quite a lot in drives, plus a new server.
JorgeB Posted June 3, 2023

There's no fsck for ZFS. If the filesystem is corrupt there aren't many options to fix it; you can try reverting some transactions as a last resort.
myths Posted June 4, 2023 (Author)

How would I go about reverting some? I've got snapshots, but in read-only I'm not able to use them.
JorgeB Posted June 4, 2023

Note that some data loss can occur with these:

zpool import -F

"Recovery mode for a non-importable pool. Attempt to return the pool to an importable state by discarding the last few transactions. Not all damaged pools can be recovered by using this option. If successful, the data from the discarded transactions is irretrievably lost. This option is ignored if the pool is importable or already imported."

zpool import -FX

"Used with the -F recovery option. Determines whether extreme measures to find a valid txg should take place. This allows the pool to be rolled back to a txg which is no longer guaranteed to be consistent. Pools imported at an inconsistent txg may contain uncorrectable checksum errors. For more details about pool recovery mode, see the -F option, above. WARNING: This option can be extremely hazardous to the health of your pool and should only be used as a last resort."
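As a rough escalation order, the attempts would look something like this. The pool name `tank` is a placeholder, and `-n` combined with `-F` does a dry run of the rollback, so you can see whether `-F` would succeed before actually discarding anything:

```shell
# 1. Dry run: report whether discarding the last few transactions
#    would make the pool importable, without performing the rollback
zpool import -Fn tank

# 2. Actually roll back the last few transactions
#    (data in the discarded transactions is irretrievably lost)
zpool import -F tank

# 3. Last resort: extreme txg rollback; the pool may come back
#    with uncorrectable checksum errors
zpool import -FX tank
```

Note the capital letters: `-X` is only valid together with `-F`, so `zpool import -fx` or `-X` on its own is rejected as an invalid option.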
myths Posted June 5, 2023 (Author, edited)

I'm wondering if the fault could be with my cache. I was reading over this post: https://forums.unraid.net/topic/129408-solved-read-only-file-system-after-crash-of-cache-pool/

This is the very first error I got before the ZFS error. I only have a picture of it, so I'll sum it up: a BTRFS critical error, device nvme unable to find logical device, then a page cache invalidation failure on direct I/O for a file under /var/cache/netdata/dbengine/, followed by a second invalidation failure in the same folder but for a different file. The first file was datafile-1-0000002124.ndf, pid 5903; the second was 2177, pid 59017.

I had already tried the -F import before. I'm wondering if it could possibly be the NVMe?

Edited June 5, 2023 by myths
JorgeB Posted June 5, 2023

If the NVMe is using btrfs it won't have anything to do with ZFS, though if there were issues with both pools at the same time it could suggest a hardware issue.
myths Posted June 5, 2023 (Author)

I'm running a hardware diagnostic on the server to see if it finds anything as well. I'm also going to boot up another OS in a few days and try to connect the ZFS pool to it. I see some people with these panic errors are able to open the pool on another computer, or in TrueNAS (or FreeNAS, whatever it's called now), by changing the ZFS commands to bypass checks and start the pool. I've not found a way to do that here. Pretty much grabbing at straws to see what I can do before a rebuild.
JorgeB Posted June 5, 2023

6 minutes ago, myths said:
"on another computer or in true/free nas whatever its called now change the zfs commands to bypass checks and start pool. not found a way to do that on there."

The commands are the same in any OS; they are ZFS-specific, like the examples I posted above.
myths Posted June 5, 2023 Author Share Posted June 5, 2023 (edited) i didnt see anywhere to put them. the guides say to edit the zfs boot files and add command lines to them. unraid just now supports zfs officially so maybe its hiding somewhere ive not looked yet. did the scans with zero errors. so last thing to do is try that. ill try the fx but i think in the boot file i saw commants to bypass fail safe checks and other checks before loading the zfs as in not to check for any curroption. not sure im half asleep right now. 2 days of reading up on all this XD. thanks for the help. said x was an invalid option Edited June 5, 2023 by myths Quote Link to comment
JorgeB Posted June 6, 2023

9 hours ago, myths said:
"i didnt see anywhere to put them"

You use the CLI (terminal).
myths Posted June 7, 2023 (Author)

It seems all the errors I'm getting are isolated to one vdev. Do you know if it's possible to just sacrifice that vdev and have the pool work with just the other 3?
JorgeB Posted June 7, 2023

2 hours ago, myths said:
"Do you know if its possible to just sacrafice that vdev and have the pool work with just the other 3?"

That's not possible; with ZFS raidz, if one vdev is dead the pool is dead.
myths Posted June 9, 2023 Author Share Posted June 9, 2023 thanks ive got some new drives ordered to backup what i can. i had thought that each vdev was independant of other vdevs. looks like they strip across all so that makes sense. thanks for all your help. Quote Link to comment