July 5, 2025

I got tired of cache disks failing (about once a year) and of restoring from appdata and VM backups, some of which always failed to work, so I made a ZFS cache pool of three mirrored disks.

I had just finished setting everything back up when some dockers stopped working today. I was unable to stop the failed dockers, so I killed the docker service and shut down the array, but because active processes on the cache pool did not respond to kill -9, I had to hard reset the server. After starting up again, the three-disk cache pool is unmountable. All three disks complete SMART short tests without error.

Is there anything I can do to save the pool?
July 5, 2025 Author

    root@Shrek:~# zpool import
       pool: cachepool
         id: 624089324922130285
      state: ONLINE
     action: The pool can be imported using its name or numeric identifier.
     config:

            cachepool   ONLINE
              mirror-0  ONLINE
                sdi1    ONLINE
                sdh1    ONLINE
                sdl1    ONLINE

Attachment: shrek-diagnostics-20250705-1203.zip
July 5, 2025 Community Expert

Post the output from:

    zpool import cachepool -o readonly=on

If that fails, and it likely will, post the output from:

    zdb -l /dev/sdi
    zdb -l /dev/sdh
    zdb -l /dev/sdl
July 5, 2025 Author

    root@Shrek:~# zpool import cachepool -o readonly=on
    cannot import 'cachepool': I/O error
            Destroy and re-create the pool from a backup source.
    root@Shrek:~# zdb -l /dev/sdi
    failed to unpack label 0
    failed to unpack label 1
    failed to unpack label 2
    failed to unpack label 3
    root@Shrek:~# zdb -l /dev/sdh
    failed to unpack label 0
    failed to unpack label 1
    failed to unpack label 2
    failed to unpack label 3
    root@Shrek:~# zdb -l /dev/sdl
    failed to unpack label 0
    failed to unpack label 1
    failed to unpack label 2
    failed to unpack label 3
    root@Shrek:~#
July 5, 2025 Community Expert

Sorry, wrong commands. They should be:

    zdb -l /dev/sdi1
    zdb -l /dev/sdh1
    zdb -l /dev/sdl1
July 5, 2025 Author

    root@Shrek:~# zdb -l /dev/sdi1
    ------------------------------------
    LABEL 0
    ------------------------------------
        version: 5000
        name: 'cachepool'
        state: 0
        txg: 613982
        pool_guid: 624089324922130285
        errata: 0
        hostname: 'Shrek'
        top_guid: 3293130853435037943
        guid: 11309557061343029751
        vdev_children: 1
        vdev_tree:
            type: 'mirror'
            id: 0
            guid: 3293130853435037943
            whole_disk: 0
            metaslab_array: 132
            metaslab_shift: 34
            ashift: 12
            asize: 8001558413312
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 11309557061343029751
                path: '/dev/sdi1'
                whole_disk: 0
                DTL: 96
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 15152360535842760272
                path: '/dev/sdh1'
                whole_disk: 0
                DTL: 88
                create_txg: 4
            children[2]:
                type: 'disk'
                id: 2
                guid: 17771728439540138945
                path: '/dev/sdl1'
                whole_disk: 0
                DTL: 97
                create_txg: 4
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data
            com.klarasystems:vdev_zaps_v2
        labels = 0 1 2 3
    root@Shrek:~# zdb -l /dev/sdh1
    ------------------------------------
    LABEL 0
    ------------------------------------
        version: 5000
        name: 'cachepool'
        state: 0
        txg: 613982
        pool_guid: 624089324922130285
        errata: 0
        hostname: 'Shrek'
        top_guid: 3293130853435037943
        guid: 15152360535842760272
        vdev_children: 1
        vdev_tree:
            type: 'mirror'
            id: 0
            guid: 3293130853435037943
            whole_disk: 0
            metaslab_array: 132
            metaslab_shift: 34
            ashift: 12
            asize: 8001558413312
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 11309557061343029751
                path: '/dev/sdi1'
                whole_disk: 0
                DTL: 96
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 15152360535842760272
                path: '/dev/sdh1'
                whole_disk: 0
                DTL: 88
                create_txg: 4
            children[2]:
                type: 'disk'
                id: 2
                guid: 17771728439540138945
                path: '/dev/sdl1'
                whole_disk: 0
                DTL: 97
                create_txg: 4
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data
            com.klarasystems:vdev_zaps_v2
        labels = 0 1 2 3
    root@Shrek:~# zdb -l /dev/sdl1
    ------------------------------------
    LABEL 0
    ------------------------------------
        version: 5000
        name: 'cachepool'
        state: 0
        txg: 613982
        pool_guid: 624089324922130285
        errata: 0
        hostname: 'Shrek'
        top_guid: 3293130853435037943
        guid: 17771728439540138945
        vdev_children: 1
        vdev_tree:
            type: 'mirror'
            id: 0
            guid: 3293130853435037943
            whole_disk: 0
            metaslab_array: 132
            metaslab_shift: 34
            ashift: 12
            asize: 8001558413312
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 11309557061343029751
                path: '/dev/sdi1'
                whole_disk: 0
                DTL: 96
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 15152360535842760272
                path: '/dev/sdh1'
                whole_disk: 0
                DTL: 88
                create_txg: 4
            children[2]:
                type: 'disk'
                id: 2
                guid: 17771728439540138945
                path: '/dev/sdl1'
                whole_disk: 0
                DTL: 97
                create_txg: 4
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data
            com.klarasystems:vdev_zaps_v2
        labels = 0 1 2 3
July 5, 2025 Community Expert

1 minute ago, MHDFreefall said:

    txg: 613982

This is the same for all 3 devices, which is good, but I'm not sure why it's failing to mount; possibly some metadata corruption. You can try the following, which will attempt to rewind the pool a few txgs, so if it works you may lose the last few seconds of data:

    zpool import -fF -o readonly=on cachepool

If it still fails, post the output from:

    tail -n 200 /proc/spl/kstat/zfs/dbgmsg
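For anyone repeating the label check above on their own pool, the txg comparison can be scripted instead of read by eye. This is only a minimal sketch: `label_txg` and `txgs_match` are hypothetical helper names, not part of ZFS or Unraid, and the parsing assumes `zdb -l` prints the label's transaction group as a line beginning with `txg:`, as it does in the output posted in this thread.

```shell
# label_txg (hypothetical helper): read `zdb -l` output on stdin and print
# the value of the first "txg:" line. Lines such as "create_txg:" are
# ignored because the first field must be exactly "txg:".
label_txg() {
    awk '$1 == "txg:" { print $2; exit }'
}

# txgs_match (hypothetical helper): succeed only if every argument
# (one txg per device) is non-empty and identical to the first.
txgs_match() {
    first=${1-}
    [ -n "$first" ] || return 1
    for t in "$@"; do
        [ "$t" = "$first" ] || return 1
    done
    return 0
}

# Example use, with the partition names from this thread:
#   a=$(zdb -l /dev/sdi1 | label_txg)
#   b=$(zdb -l /dev/sdh1 | label_txg)
#   c=$(zdb -l /dev/sdl1 | label_txg)
#   txgs_match "$a" "$b" "$c" && echo "labels agree" || echo "labels differ"
```

Matching txgs only show that the mirror members were last written in the same transaction group. As this thread shows, the import can still fail for other reasons.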
July 5, 2025 Author

    root@Shrek:~# zpool import -fF -o readonly=on cachepool
    cannot import 'cachepool': I/O error
            Destroy and re-create the pool from a backup source.

Attachment: dbgmsg.txt
July 5, 2025 Community Expert (Solution)

So the problem does appear to be corrupt metadata:

    1751711042 ffff88829ab09100 spa_misc.c:429:spa_load_note(): spa_load(cachepool, config trusted): spa_load_verify found 1 metadata errors and 1 data errors
    1751711042 ffff88829ab09100 spa_misc.c:415:spa_load_failed(): spa_load(cachepool, config trusted): FAILED: spa_load_verify failed [error=5]

You can try disabling ZFS data verification. Note that doing this may return corrupt data, but it may still allow you to back up most of it before you reformat the pool:

    echo 0 >/sys/module/zfs/parameters/spa_load_verify_data
    echo 0 >/sys/module/zfs/parameters/spa_load_verify_metadata

Then try importing the pool read-only again:

    zpool import -o readonly=on cachepool

If it works, start the array. The pool will still show as unmountable in the GUI, but the data should be under /mnt/cachepool. Back up what you can to another disk or pool, and when done, reboot to reset the ZFS settings.
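The recovery steps in the post above could be wrapped in a few small shell functions, roughly as follows. Treat this as a sketch under assumptions rather than a tested procedure: the function names and the `ZFS_PARAM_DIR` variable are made up for illustration, the copy step uses plain `cp -a` because the thread does not say which backup tool to use, and the destination path in the usage comment is hypothetical.

```shell
# Directory holding the ZFS module tunables. The default is the path used
# in the thread; the variable itself is an assumption added so the function
# can be pointed at a scratch directory when trying it out.
ZFS_PARAM_DIR=${ZFS_PARAM_DIR:-/sys/module/zfs/parameters}

# Write 0 to the two load-time verification tunables named in the thread.
# With these disabled, ZFS may hand back corrupt data without complaint.
disable_load_verify() {
    for p in spa_load_verify_data spa_load_verify_metadata; do
        echo 0 > "$ZFS_PARAM_DIR/$p" || return 1
    done
}

# Import the named pool read-only, using the same command as the thread.
import_readonly() {
    zpool import -o readonly=on "$1"
}

# Copy everything under source directory $1 into existing directory $2,
# preserving attributes. cp exits non-zero if any file could not be read.
rescue_copy() {
    cp -a "$1/." "$2/"
}

# Example sequence (run as root; destination path is hypothetical):
#   disable_load_verify
#   import_readonly cachepool
#   # start the array from the GUI, as described in the post
#   rescue_copy /mnt/cachepool /mnt/disk1/cachepool-rescue
#   # reboot afterwards so the ZFS settings reset
```

Because verification is off during the copy, it is worth spot-checking important files (VM images, databases) after the backup rather than assuming they are intact.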
July 5, 2025 Author

Great, disk mounted! Thanks a lot for the help. Even if there is an error somewhere, most of the files should be OK.

Is there any reason to not trust the disks? I will run a long SMART test to see if any errors pop up. Do you know what could have caused the corrupt data, if not disk failure?

I guess there is no advantage to having a three-disk mirror when the file system itself fails. I will make the disks into a two-disk mirrored pool and set up some kind of periodic backup to the last disk, so that I have a ready-to-go but slightly out-of-date copy if the cache pool fails again.
July 5, 2025 Author

Do I need to do anything to turn ZFS data verification back on, or will it revert after the next reboot?
July 5, 2025 Community Expert

8 minutes ago, MHDFreefall said:

    Is there any reason to not trust the disks?

I would not suspect the disks, but if it happens again, there may be an underlying hardware issue, like bad RAM or a controller/firmware issue.

7 minutes ago, MHDFreefall said:

    or will it revert after the next reboot?

This.
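To confirm after the reboot that verification really is back on, the two tunables can simply be read back. A small sketch follows; `load_verify_status` and `ZFS_PARAM_DIR` are hypothetical names, and the check assumes that the enabled state for both parameters is `1`, which to my knowledge is the OpenZFS default but is not stated in this thread.

```shell
# Directory holding the ZFS module tunables (path taken from the thread;
# the override variable is an assumption for illustration).
ZFS_PARAM_DIR=${ZFS_PARAM_DIR:-/sys/module/zfs/parameters}

# Print each load-verification tunable as name=value and return non-zero
# if any of them is not 1 (assumed here to mean "verification enabled").
load_verify_status() {
    rc=0
    for p in spa_load_verify_data spa_load_verify_metadata; do
        v=$(cat "$ZFS_PARAM_DIR/$p" 2>/dev/null) || v=unknown
        echo "$p=$v"
        [ "$v" = "1" ] || rc=1
    done
    return $rc
}

# Example use:
#   load_verify_status || echo "verification is still disabled (or unreadable)"
```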
July 5, 2025 Author

Perfect, thanks again. Procedure saved to my "unraidhowto.txt" for future reference.