preventive-dungeon1911 Posted February 27

Hi all, hope you're doing well. After a restart, when attempting to start the array, my cache drive fails to mount. I searched around, and it seems like it's most likely a hardware issue, but I'm not sure which component is causing it (I see my motherboard logged, but I'm not sure if that matters). I ran a memtest a few weeks ago, after I rebuilt my cache from btrfs to ZFS, and didn't receive any errors after around 10 hours. Below is a snippet of the system log right after I try to start the array. The diagnostics zip is also attached.

Feb 27 18:34:17 HS kernel: mdcmd (31): set md_num_stripes 1280
Feb 27 18:34:17 HS kernel: mdcmd (32): set md_queue_limit 80
Feb 27 18:34:17 HS kernel: mdcmd (33): set md_sync_limit 5
Feb 27 18:34:17 HS kernel: mdcmd (34): set md_write_method
Feb 27 18:34:17 HS kernel: mdcmd (35): start STOPPED
Feb 27 18:34:17 HS kernel: unraid: allocating 20870K for 1280 stripes (4 disks)
Feb 27 18:34:17 HS kernel: md1p1: running, size: 7814026532 blocks
Feb 27 18:34:17 HS kernel: md2p1: running, size: 7814026532 blocks
Feb 27 18:34:17 HS emhttpd: shcmd (3626): udevadm settle
Feb 27 18:34:17 HS emhttpd: Opening encrypted volumes...
Feb 27 18:34:17 HS emhttpd: shcmd (3627): touch /boot/config/forcesync
Feb 27 18:34:17 HS emhttpd: Mounting disks...
Feb 27 18:34:17 HS emhttpd: mounting /mnt/disk1
Feb 27 18:34:17 HS emhttpd: shcmd (3628): mkdir -p /mnt/disk1
Feb 27 18:34:17 HS emhttpd: shcmd (3629): mount -t xfs -o noatime,nouuid /dev/md1p1 /mnt/disk1
Feb 27 18:34:17 HS kernel: SGI XFS with ACLs, security attributes, no debug enabled
Feb 27 18:34:17 HS kernel: XFS (md1p1): Mounting V5 Filesystem
Feb 27 18:34:17 HS kernel: XFS (md1p1): Starting recovery (logdev: internal)
Feb 27 18:34:17 HS kernel: XFS (md1p1): Ending recovery (logdev: internal)
Feb 27 18:34:17 HS kernel: xfs filesystem being mounted at /mnt/disk1 supports timestamps until 2038 (0x7fffffff)
Feb 27 18:34:17 HS emhttpd: shcmd (3630): xfs_growfs /mnt/disk1
Feb 27 18:34:17 HS root: meta-data=/dev/md1p1 isize=512 agcount=8, agsize=268435455 blks
Feb 27 18:34:17 HS root: = sectsz=512 attr=2, projid32bit=1
Feb 27 18:34:17 HS root: = crc=1 finobt=1, sparse=1, rmapbt=0
Feb 27 18:34:17 HS root: = reflink=1 bigtime=0 inobtcount=0 nrext64=0
Feb 27 18:34:17 HS root: data = bsize=4096 blocks=1953506633, imaxpct=5
Feb 27 18:34:17 HS root: = sunit=0 swidth=0 blks
Feb 27 18:34:17 HS root: naming =version 2 bsize=4096 ascii-ci=0, ftype=1
Feb 27 18:34:17 HS root: log =internal log bsize=4096 blocks=521728, version=2
Feb 27 18:34:17 HS root: = sectsz=512 sunit=0 blks, lazy-count=1
Feb 27 18:34:17 HS root: realtime =none extsz=4096 blocks=0, rtextents=0
Feb 27 18:34:17 HS emhttpd: mounting /mnt/disk2
Feb 27 18:34:17 HS emhttpd: shcmd (3631): mkdir -p /mnt/disk2
Feb 27 18:34:17 HS emhttpd: shcmd (3632): mount -t xfs -o noatime,nouuid /dev/md2p1 /mnt/disk2
Feb 27 18:34:17 HS kernel: XFS (md2p1): Mounting V5 Filesystem
Feb 27 18:34:17 HS kernel: XFS (md2p1): Starting recovery (logdev: internal)
Feb 27 18:34:18 HS kernel: XFS (md2p1): Ending recovery (logdev: internal)
Feb 27 18:34:18 HS emhttpd: shcmd (3633): xfs_growfs /mnt/disk2
Feb 27 18:34:18 HS kernel: xfs filesystem being mounted at /mnt/disk2 supports timestamps until 2038 (0x7fffffff)
Feb 27 18:34:18 HS root: meta-data=/dev/md2p1 isize=512 agcount=8, agsize=268435455 blks
Feb 27 18:34:18 HS root: = sectsz=512 attr=2, projid32bit=1
Feb 27 18:34:18 HS root: = crc=1 finobt=1, sparse=1, rmapbt=0
Feb 27 18:34:18 HS root: = reflink=1 bigtime=0 inobtcount=0 nrext64=0
Feb 27 18:34:18 HS root: data = bsize=4096 blocks=1953506633, imaxpct=5
Feb 27 18:34:18 HS root: = sunit=0 swidth=0 blks
Feb 27 18:34:18 HS root: naming =version 2 bsize=4096 ascii-ci=0, ftype=1
Feb 27 18:34:18 HS root: log =internal log bsize=4096 blocks=521728, version=2
Feb 27 18:34:18 HS root: = sectsz=512 sunit=0 blks, lazy-count=1
Feb 27 18:34:18 HS root: realtime =none extsz=4096 blocks=0, rtextents=0
Feb 27 18:34:18 HS emhttpd: mounting /mnt/cache
Feb 27 18:34:18 HS emhttpd: shcmd (3634): mkdir -p /mnt/cache
Feb 27 18:34:18 HS emhttpd: /usr/sbin/zpool import -f -d /dev/sdd1 2>&1
Feb 27 18:34:18 HS emhttpd: pool: cache
Feb 27 18:34:18 HS emhttpd: id: 12260050682816622686
Feb 27 18:34:18 HS emhttpd: shcmd (3635): /usr/sbin/zpool import -f -N -o autoexpand=on -d /dev/sdd1 12260050682816622686 cache
Feb 27 18:34:21 HS kernel: VERIFY3(size <= rt->rt_space) failed (281442900013056 <= 3966509056)
Feb 27 18:34:21 HS kernel: PANIC at range_tree.c:436:range_tree_remove_impl()
Feb 27 18:34:21 HS kernel: Showing stack for process 3045
Feb 27 18:34:21 HS kernel: CPU: 0 PID: 3045 Comm: z_wr_iss Tainted: P O 6.1.74-Unraid #1
Feb 27 18:34:21 HS kernel: Hardware name: ASUS All Series/MAXIMUS VI FORMULA, BIOS 1603 08/15/2014
Feb 27 18:34:21 HS kernel: Call Trace:
Feb 27 18:34:21 HS kernel: <TASK>
Feb 27 18:34:21 HS kernel: dump_stack_lvl+0x44/0x5c
Feb 27 18:34:21 HS kernel: spl_panic+0xd0/0xe8 [spl]
Feb 27 18:34:21 HS kernel: ? preempt_latency_start+0x2b/0x46
Feb 27 18:34:21 HS kernel: ? _raw_spin_lock+0x13/0x1c
Feb 27 18:34:21 HS kernel: ? bt_grow_leaf+0xc3/0xd6 [zfs]
Feb 27 18:34:21 HS kernel: ? pn_free+0x24/0x24 [zfs]
Feb 27 18:34:21 HS kernel: ? zfs_btree_find_in_buf+0x4f/0x94 [zfs]
Feb 27 18:34:21 HS kernel: ? zfs_btree_find+0x16d/0x1b0 [zfs]
Feb 27 18:34:21 HS kernel: ? rs_get_start+0xc/0x1d [zfs]
Feb 27 18:34:21 HS kernel: range_tree_remove_impl+0x77/0x406 [zfs]
Feb 27 18:34:21 HS kernel: space_map_load_callback+0x70/0x79 [zfs]
Feb 27 18:34:21 HS kernel: space_map_iterate+0x2d6/0x324 [zfs]
Feb 27 18:34:21 HS kernel: ? spa_stats_destroy+0x16c/0x16c [zfs]
Feb 27 18:34:21 HS kernel: space_map_load_length+0x93/0xcb [zfs]
Feb 27 18:34:21 HS kernel: metaslab_load+0x33b/0x6e3 [zfs]
Feb 27 18:34:21 HS kernel: ? slab_post_alloc_hook+0x4d/0x15e
Feb 27 18:34:21 HS kernel: ? __slab_free+0x83/0x229
Feb 27 18:34:21 HS kernel: ? spl_kmem_alloc_impl+0xc1/0xf2 [spl]
Feb 27 18:34:21 HS kernel: ? __kmem_cache_alloc_node+0x118/0x147
Feb 27 18:34:21 HS kernel: metaslab_activate+0x36/0x1f1 [zfs]
Feb 27 18:34:21 HS kernel: metaslab_alloc_dva+0x8bc/0xfce [zfs]
Feb 27 18:34:21 HS kernel: ? preempt_latency_start+0x2b/0x46
Feb 27 18:34:21 HS kernel: metaslab_alloc+0x107/0x1fd [zfs]
Feb 27 18:34:21 HS kernel: zio_dva_allocate+0xee/0x73f [zfs]
Feb 27 18:34:21 HS kernel: ? spl_kmem_alloc_impl+0xc1/0xf2 [spl]
Feb 27 18:34:21 HS kernel: ? preempt_latency_start+0x2b/0x46
Feb 27 18:34:21 HS kernel: ? _raw_spin_lock+0x13/0x1c
Feb 27 18:34:21 HS kernel: ? _raw_spin_unlock+0x14/0x29
Feb 27 18:34:21 HS kernel: ? tsd_hash_search+0x70/0x7d [spl]
Feb 27 18:34:21 HS kernel: zio_execute+0xb4/0xdf [zfs]
Feb 27 18:34:21 HS kernel: taskq_thread+0x269/0x38a [spl]
Feb 27 18:34:21 HS kernel: ? wake_up_q+0x44/0x44
Feb 27 18:34:21 HS kernel: ? zio_subblock+0x22/0x22 [zfs]
Feb 27 18:34:21 HS kernel: ? taskq_dispatch_delay+0x106/0x106 [spl]
Feb 27 18:34:21 HS kernel: kthread+0xe7/0xef
Feb 27 18:34:21 HS kernel: ? kthread_complete_and_exit+0x1b/0x1b
Feb 27 18:34:21 HS kernel: ret_from_fork+0x22/0x30
Feb 27 18:34:21 HS kernel: </TASK>

Notable errors (IMO; I might be looking at the wrong logs):

Feb 27 18:34:21 HS kernel: VERIFY3(size <= rt->rt_space) failed (281442900013056 <= 3966509056)
Feb 27 18:34:21 HS kernel: PANIC at range_tree.c:436:range_tree_remove_impl()
Feb 27 18:34:21 HS kernel: Showing stack for process 3045
Feb 27 18:34:21 HS kernel: CPU: 0 PID: 3045 Comm: z_wr_iss Tainted: P O 6.1.74-Unraid #1
Feb 27 18:34:21 HS kernel: Hardware name: ASUS All Series/MAXIMUS VI FORMULA, BIOS 1603 08/15/2014

Please let me know if I'm missing any additional information. Thank you!

hs-diagnostics-20240227-1857.zip
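For anyone skimming the panic above: the VERIFY3 line is ZFS failing an internal sanity check while loading a metaslab's space map, and the two numbers it prints make the problem concrete. A quick back-of-the-envelope check (the values are copied straight from the log; my reading of what each field means is an interpretation, not something the log states):

```python
# Values printed by the failed assertion:
#   VERIFY3(size <= rt->rt_space) failed (281442900013056 <= 3966509056)
size = 281_442_900_013_056   # bytes ZFS was told to remove from the range tree
rt_space = 3_966_509_056     # total bytes the range tree actually tracks

TIB = 1024 ** 4
GIB = 1024 ** 3

print(size <= rt_space)          # the invariant ZFS checks -> False
print(round(size / TIB))         # -> 256 (TiB), far larger than any one drive
print(round(rt_space / GIB, 1))  # -> 3.7 (GiB) actually tracked
```

A ~256 TiB extent on a single-SSD pool is nonsense, which is why this points at corrupt on-disk metadata rather than, say, a genuinely full pool.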
JorgeB Posted February 28

There's filesystem corruption on the pool. See if it mounts read-only:

zpool import -o readonly=on cache

If it does, start the array. The GUI will still show the pool as unmountable, but the data will be available under /mnt/cache; back it up, then re-format.
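For the next reader, a minimal sketch of that recovery sequence as a script. The backup destination /mnt/disk1/cache-backup is my placeholder (point rsync at any location with enough free space), and the pool name is assumed to be cache as in this thread:

```shell
#!/bin/bash
# Import the damaged pool read-only so ZFS never tries to rewrite the
# corrupt space maps (allocating/writing is what triggers the panic).
zpool import -o readonly=on cache

# Sanity-check that the pool imported and the data is visible.
zpool status cache
ls /mnt/cache

# Copy everything off before touching the pool further.
# /mnt/disk1/cache-backup is a placeholder destination.
rsync -avh --progress /mnt/cache/ /mnt/disk1/cache-backup/

# Once the backup is verified, export the pool and re-format the
# device from the Unraid GUI.
zpool export cache
```

Read-only import matters here: a normal import replays/flushes changes and walks the allocator down the same corrupted metaslab path that panicked the kernel.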
preventive-dungeon1911 Posted February 28 Author

Thank you for the reply, I'll try this ASAP. Any idea what could cause filesystem corruption? I read it might be caused by bad RAM. Curious whether I should swap the RAM and then rebuild the cache, if I can mount the drive.
JorgeB Posted February 28

5 minutes ago, preventive-dungeon1911 said:
> Any idea what could cause filesystem corruption? I read it might be caused by bad RAM.

That's a good candidate.
preventive-dungeon1911 Posted March 1 Author

Looks like it might have been the RAM. I formatted my cache and installed new RAM; it's been going well so far.