
Add cache disk to pool & array will no longer start



I tried to add a second disk to my cache pool. The original disk was already formatted as BTRFS, so it should be compatible with a pool. After I added the disk, the array was stuck at mounting for 5-6 hours and the server was eventually rebooted. Since the reboot it gets past mounting but gets stuck at Starting Services.

 

It seems to start fine in maintenance mode but will not start normally. Safe mode doesn't help either. I can't gather diagnostics because the download seems to hang. This is all I can see in the syslog:

 

May 24 20:29:36 Tower kernel: CPU: 3 PID: 58 Comm: kworker/u16:2 Tainted: G        W         5.10.28-Unraid #1
May 24 20:29:36 Tower kernel: Hardware name: BASE_BOARD_MANUFACTURER MODEL_NAME/132-SE-E775, BIOS 4.6.5 04/09/2018
May 24 20:29:36 Tower kernel: Workqueue: events_unbound btrfs_async_reclaim_data_space
May 24 20:29:36 Tower kernel: Call Trace:
May 24 20:29:36 Tower kernel: <IRQ>
May 24 20:29:36 Tower kernel: dump_stack+0x6b/0x83
May 24 20:29:36 Tower kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
May 24 20:29:36 Tower kernel: nmi_cpu_backtrace+0x7d/0x8f
May 24 20:29:36 Tower kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
May 24 20:29:36 Tower kernel: rcu_dump_cpu_stacks+0x9f/0xc6
May 24 20:29:36 Tower kernel: rcu_sched_clock_irq+0x1ec/0x543
May 24 20:29:36 Tower kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
May 24 20:29:36 Tower kernel: update_process_times+0x50/0x6e
May 24 20:29:36 Tower kernel: tick_sched_timer+0x36/0x64
May 24 20:29:36 Tower kernel: __hrtimer_run_queues+0xb7/0x10b
May 24 20:29:36 Tower kernel: ? tick_sched_do_timer+0x39/0x39
May 24 20:29:36 Tower kernel: hrtimer_interrupt+0x8d/0x15b
May 24 20:29:36 Tower kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
May 24 20:29:36 Tower kernel: asm_call_irq_on_stack+0x12/0x20
May 24 20:29:36 Tower kernel: </IRQ>
May 24 20:29:36 Tower kernel: sysvec_apic_timer_interrupt+0x71/0x95
May 24 20:29:36 Tower kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
May 24 20:29:36 Tower kernel: RIP: 0010:btrfs_async_reclaim_data_space+0x40/0xf4
May 24 20:29:36 Tower kernel: Code: 4c 8d a5 b8 00 00 00 48 89 ef e8 2d 9a 42 00 48 8b 85 b8 00 00 00 49 39 c4 74 62 48 89 ef 4c 8b ad d0 00 00 00 e8 56 f3 ff ff <f6> 45 40 01 75 1e 4c 89 f7 b9 08 00 00 00 48 83 ca ff 48 89 ee e8
May 24 20:29:36 Tower kernel: RSP: 0018:ffffc90000233e78 EFLAGS: 00000287
May 24 20:29:36 Tower kernel: RAX: ffffc90000a43d58 RBX: ffff8881003ca000 RCX: 0000000000000000
May 24 20:29:36 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff888104baa000
May 24 20:29:36 Tower kernel: RBP: ffff888104baa000 R08: 0000000000000001 R09: 0000646e756f626e
May 24 20:29:36 Tower kernel: R10: 8080808080808080 R11: fefefefefefefeff R12: ffff888104baa0b8
May 24 20:29:36 Tower kernel: R13: 0000000000000000 R14: ffff88814205d000 R15: 0000000000000000
May 24 20:29:36 Tower kernel: ? btrfs_async_reclaim_data_space+0x40/0xf4
May 24 20:29:36 Tower kernel: process_one_work+0x13c/0x1d5
May 24 20:29:36 Tower kernel: worker_thread+0x18b/0x22f
May 24 20:29:36 Tower kernel: ? process_scheduled_works+0x27/0x27
May 24 20:29:36 Tower kernel: kthread+0xe5/0xea
May 24 20:29:36 Tower kernel: ? __kthread_bind_mask+0x57/0x57
May 24 20:29:36 Tower kernel: ret_from_fork+0x22/0x30

 

Any thoughts?

 

I've attached the full syslog:

tower-syslog-20210525-0135.zip

Edited by Mike S
Link to comment

According to the log there's a new cache device but there's also a device missing:

 

May 24 20:25:31 Tower emhttpd: /mnt/cache TotDevices: 2
May 24 20:25:31 Tower emhttpd: /mnt/cache NumDevices: 2
May 24 20:25:31 Tower emhttpd: /mnt/cache NumFound: 1
May 24 20:25:31 Tower emhttpd: /mnt/cache NumMissing: 1
May 24 20:25:31 Tower emhttpd: /mnt/cache NumMisplaced: 0
May 24 20:25:31 Tower emhttpd: /mnt/cache NumExtra: 1
May 24 20:25:31 Tower emhttpd: /mnt/cache LuksState: 0
May 24 20:25:31 Tower emhttpd: shcmd (332): mount -t btrfs -o noatime,space_cache=v2,discard=async,degraded -U 8af1fbf7-4e95-4aa6-aa41-91cf4fcabeec /mnt/cache
May 24 20:25:31 Tower kernel: BTRFS info (device sdf1): turning on async discard
May 24 20:25:31 Tower kernel: BTRFS info (device sdf1): allowing degraded mounts
May 24 20:25:31 Tower kernel: BTRFS info (device sdf1): using free space tree
May 24 20:25:31 Tower kernel: BTRFS info (device sdf1): has skinny extents
May 24 20:25:31 Tower kernel: BTRFS warning (device sdf1): devid 2 uuid 34efdbf3-fad8-42bb-acdc-bba1ed3bcaf4 is missing
May 24 20:25:31 Tower kernel: BTRFS info (device sdf1): enabling ssd optimizations
May 24 20:25:31 Tower kernel: BTRFS error (device sdf1): balance: invalid convert data profile raid1
May 24 20:25:31 Tower kernel: BTRFS warning (device sdf1): Skipping commit of aborted transaction.

 

Do you still have the missing device?

Link to comment

The only change was adding the new device, but it crashed on the original mount attempt, so maybe it's recognizing that as another device that is missing? It looks like it's expecting 2 in total but found 1 and 1 extra?

 

I also see this in that log:

May 24 20:25:31 Tower kernel: BTRFS error (device sdf1): balance: invalid convert data profile raid1

Does that have anything to do with it?

Edited by Mike S
Link to comment

TotDevices means the pool has 2 devices, and NumDevices means 2 devices are assigned, but the most important part is this one:

 

May 24 20:25:31 Tower emhttpd: /mnt/cache NumFound: 1
May 24 20:25:31 Tower emhttpd: /mnt/cache NumMissing: 1
May 24 20:25:31 Tower emhttpd: /mnt/cache NumMisplaced: 0
May 24 20:25:31 Tower emhttpd: /mnt/cache NumExtra: 1

 

This means only 1 pool device was found, there's 1 missing and 1 extra (new device).
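
To pull these counters out of a saved syslog quickly, a small shell helper along these lines could be used. This is only a sketch: the function name `pool_counters` is made up for illustration, and it assumes the emhttpd lines have exactly the layout shown in the excerpt above (timestamp, host, `emhttpd:`, mount point, counter name, value).

```shell
# Hypothetical helper: print the emhttpd device counters for one pool.
# Assumes log lines shaped like:
#   May 24 20:25:31 Tower emhttpd: /mnt/cache NumFound: 1
# Reads the syslog on stdin; the pool mount point is the first argument.
pool_counters() {
    awk -v pool="$1" '
        # Field 5 is the logging process, 6 the mount point, 7 the counter name.
        $5 == "emhttpd:" && $6 == pool &&
        $7 ~ /^(TotDevices|NumDevices|NumFound|NumMissing|NumMisplaced|NumExtra):$/ {
            name = $7
            sub(/:$/, "", name)   # drop the trailing colon
            print name "=" $8
        }'
}
```

Run it as `pool_counters /mnt/cache < syslog`, where `syslog` is whatever file the log was saved to. A non-zero NumMissing is the value to look for.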

 

1 hour ago, Mike S said:

Does that have anything to do with it?

Yes, it can't convert to raid1 because of the missing device.

 

 

 

 

Link to comment

Still stuck at "Starting Services" with only the original device assigned. I see this in the syslog now instead:

May 25 09:56:08 Tower emhttpd: shcmd (356): mkdir -p /mnt/cache
May 25 09:56:08 Tower emhttpd: /mnt/cache uuid: 8af1fbf7-4e95-4aa6-aa41-91cf4fcabeec
May 25 09:56:08 Tower emhttpd: /mnt/cache TotDevices: 2
May 25 09:56:08 Tower emhttpd: /mnt/cache NumDevices: 1
May 25 09:56:08 Tower emhttpd: /mnt/cache NumFound: 1
May 25 09:56:08 Tower emhttpd: /mnt/cache NumMissing: 1
May 25 09:56:08 Tower emhttpd: /mnt/cache NumMisplaced: 0
May 25 09:56:08 Tower emhttpd: /mnt/cache NumExtra: 0
May 25 09:56:08 Tower emhttpd: /mnt/cache LuksState: 0
May 25 09:56:08 Tower emhttpd: shcmd (357): mount -t btrfs -o noatime,space_cache=v2,discard=async,degraded -U 8af1fbf7-4e95-4aa6-aa41-91cf4fcabeec /mnt/cache

 

Link to comment

You'll need to recreate the pool, but if there's still important data there you can try to recover with this:

 

First create a temp dir:

mkdir /x

then try to mount with skip balance:

mount -o degraded,skip_balance /dev/sdf1 /x

If that doesn't work try read-only:

mount -o degraded,ro /dev/sdf1 /x

If either works you can browse /x and copy any important data to the array.
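
The same fallback order can be wrapped in a small function so the second attempt only runs if the first one fails. This is a rough sketch, not a tested recovery tool: the function name `try_recovery_mount` is made up, and `/dev/sdf1` and `/x` are simply the device and directory from the commands above.

```shell
# Hypothetical wrapper around the two mount attempts described above.
# Usage (as root): try_recovery_mount /dev/sdf1 /x
try_recovery_mount() {
    local dev="$1" dir="$2" opts
    mkdir -p "$dir" || return 1
    # Try skipping the interrupted balance first, then fall back to read-only.
    for opts in degraded,skip_balance degraded,ro; do
        if mount -o "$opts" "$dev" "$dir"; then
            echo "mounted $dev on $dir with -o $opts"
            return 0
        fi
    done
    echo "could not mount $dev" >&2
    return 1
}
```

If it reports a successful mount, the data can then be copied off, for example with `cp -a /x/. /mnt/disk1/cache_rescue/` (the destination path is only an assumption, use any array disk with enough free space).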

Link to comment
