Unraid hangs after array startup


Go to solution Solved by JorgeB,

Recommended Posts

I have a very strange issue where Unraid becomes unresponsive after array startup. I was able to get Diagnostics directly after boot before I started the Array. I was also able to update all Plugins and Unraid itself before I started up the Array. After that, I am not able to interact with Unraid via the webUI nor the webTerminal nor an SSH session. If I am already logged in via SSH when I start the Array, I can issue simple commands like ``htop`` but the ``diagnostics`` command or the ``reboot`` command don't work. Everything I can do is to pull the plug and crash the System. When I start the System again, the problem reoccurs.

 

To answer the question: What started this behavior?

I had an issue with a Windows VM and changed some VM configurations multiple times. That's when the System came unresponsive the first time.

 

Note: Docker and VM Engines are not startet with the array, I start them manually afterwards.

 

I am fairly confident that it's not a hardware problem:

 

- I ran memtest

- I successfully bootet from a Windows drive and ran

- Cinebench

- Furmark

 

I am close to a point where I would recreate the Unraid USB Stick... Is there hope for another solution?

unraid-server-diagnostics-20221203-1933.zip

Link to comment
Quote

Unrelated but disk 1 shows some corruption detected, once this is fixed you should run a scrub.

Yeah I noticed that as well. I also need to run a parity check due to all those unclean shutdowns.

 

Quote

Nothing obvious that I can see, try again after booting in safe mode.

I just tried that and had success in doing so. Now I need to find out which plugin caused the issue...

 

If you have any suspicion let me know. Here's a list of the plugins I use:

  • CA Auto Update Applications, ver 2021.09.24

  • Community Applications, ver 2022.12.03 

  • Dynamix System Information, ver 2020.06.21

  • Dynamix System Temperature, ver 2022.09.16b

  • Fix Common Problems, ver 2022.11.18

  • My Servers, ver 2022.11.29.0742

  • NerdTools, ver 2022.11.01

  • rclone, ver 2022.09.02 

  • Tips and Tweaks, ver 2022.11.18

  • Unassigned Devices, ver 2022.11.12a

  • Unassigned Devices Preclear, ver 2022.09.02

  • User Scripts, ver 2022.08.01

  • Virtual Machine Wake On Lan, ver 2020.04.29

of which these are critical to me:

  • Community Applications
  • My Servers

  • NerdTools

  • rclone

  • Unassigned Devices

  • User Scripts

  • Virtual Machine Wake On Lan

Link to comment

So I slowly uninstalled and reinstalled the plugins and finally found a probable cause of my problems. Just after I reinstalled NerdTools with the packages

  • borgbackup-1.1.15-x86_64-1
  • llfuse-1.4.2-x86_64-1_SBo
  • python-setuptools-57.5.0-x86_64-2
  • python3-3.9.10-x86_64-1

and started the array, the server became unresponsive again. I then tried to stop the array and found these messages in the Syslog (full syslog attached):

Dec  4 17:42:17 unraid-server kernel: rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 4-... } 21079 jiffies s: 6361 root: 0x1/.
Dec  4 17:42:17 unraid-server kernel: rcu: blocking rcu_node structures (internal RCU debug): l=1:0-15:0x10/.
Dec  4 17:42:17 unraid-server kernel: Task dump for CPU 4:
Dec  4 17:42:17 unraid-server kernel: task:kworker/u64:2   state:R  running task     stack:    0 pid:27283 ppid:     2 flags:0x00004008
Dec  4 17:42:17 unraid-server kernel: Workqueue: btrfs-cache btrfs_work_helper
Dec  4 17:42:17 unraid-server kernel: Call Trace:
Dec  4 17:42:17 unraid-server kernel: <TASK>
Dec  4 17:42:17 unraid-server kernel: ? __schedule+0x59e/0x5f6
Dec  4 17:42:17 unraid-server kernel: ? chksum_update+0x13/0x1e
Dec  4 17:42:17 unraid-server kernel: ? crc32c+0x2f/0x62
Dec  4 17:42:17 unraid-server kernel: ? folio_wait_bit_common+0x1ce/0x241
Dec  4 17:42:17 unraid-server kernel: ? filemap_alloc_folio+0x82/0x82
Dec  4 17:42:17 unraid-server kernel: ? folio_mkclean+0x5c/0xa1
Dec  4 17:42:17 unraid-server kernel: ? page_vma_mkclean_one.constprop.0+0x138/0x138
Dec  4 17:42:17 unraid-server kernel: ? virt_to_slab+0x5/0x19
Dec  4 17:42:17 unraid-server kernel: ? memcg_slab_free_hook+0x4b/0xf9
Dec  4 17:42:17 unraid-server kernel: ? sysvec_apic_timer_interrupt+0x92/0xa6
Dec  4 17:42:17 unraid-server kernel: ? asm_sysvec_apic_timer_interrupt+0x16/0x20
Dec  4 17:42:17 unraid-server kernel: ? native_queued_spin_lock_slowpath+0x7f/0x1d0
Dec  4 17:42:17 unraid-server kernel: ? do_raw_spin_lock+0x14/0x1a
Dec  4 17:42:17 unraid-server kernel: ? __btrfs_remove_free_space_cache+0xe/0x2d
Dec  4 17:42:17 unraid-server kernel: ? load_free_space_cache+0x230/0x2dd
Dec  4 17:42:17 unraid-server kernel: ? kmem_cache_free+0xdb/0x176
Dec  4 17:42:17 unraid-server kernel: ? caching_thread+0x7e/0x43d
Dec  4 17:42:17 unraid-server kernel: ? move_linked_works+0x2f/0x6a
Dec  4 17:42:17 unraid-server kernel: ? pwq_adjust_max_active+0x88/0xa7
Dec  4 17:42:17 unraid-server kernel: ? btrfs_work_helper+0x114/0x2a5
Dec  4 17:42:17 unraid-server kernel: ? process_one_work+0x1ab/0x295
Dec  4 17:42:17 unraid-server kernel: ? worker_thread+0x18b/0x244
Dec  4 17:42:17 unraid-server kernel: ? rescuer_thread+0x281/0x281
Dec  4 17:42:17 unraid-server kernel: ? kthread+0xe7/0xef
Dec  4 17:42:17 unraid-server kernel: ? kthread_complete_and_exit+0x1b/0x1b
Dec  4 17:42:17 unraid-server kernel: ? ret_from_fork+0x22/0x30
Dec  4 17:42:17 unraid-server kernel: </TASK>
Dec  4 17:42:51 unraid-server kernel: rcu: INFO: rcu_preempt self-detected stall on CPU
Dec  4 17:42:51 unraid-server kernel: rcu:      4-....: (59999 ticks this GP) idle=07b/1/0x4000000000000000 softirq=84652/84652 fqs=14996
Dec  4 17:42:51 unraid-server kernel:   (t=60000 jiffies g=186673 q=396746 ncpus=12)
Dec  4 17:42:51 unraid-server kernel: NMI backtrace for cpu 4
Dec  4 17:42:51 unraid-server kernel: CPU: 4 PID: 27283 Comm: kworker/u64:2 Not tainted 5.19.17-Unraid #2
Dec  4 17:42:51 unraid-server kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450 Gaming-ITX/ac, BIOS P4.80 03/01/2022
Dec  4 17:42:51 unraid-server kernel: Workqueue: btrfs-cache btrfs_work_helper
Dec  4 17:42:51 unraid-server kernel: Call Trace:
Dec  4 17:42:51 unraid-server kernel: <IRQ>
Dec  4 17:42:51 unraid-server kernel: dump_stack_lvl+0x44/0x5c
Dec  4 17:42:51 unraid-server kernel: nmi_cpu_backtrace+0xda/0x104
Dec  4 17:42:51 unraid-server kernel: ? lapic_can_unplug_cpu+0x95/0x95
Dec  4 17:42:51 unraid-server kernel: nmi_trigger_cpumask_backtrace+0x72/0x107
Dec  4 17:42:51 unraid-server kernel: trigger_single_cpu_backtrace+0x2a/0x31
Dec  4 17:42:51 unraid-server kernel: rcu_dump_cpu_stacks+0xa0/0xd6
Dec  4 17:42:51 unraid-server kernel: rcu_sched_clock_irq+0x44b/0xaf7
Dec  4 17:42:51 unraid-server kernel: ? do_set_msr+0x12/0x12 [kvm]
Dec  4 17:42:51 unraid-server kernel: ? notifier_call_chain+0x38/0x5a
Dec  4 17:42:51 unraid-server kernel: ? timekeeping_update+0xe8/0x117
Dec  4 17:42:51 unraid-server kernel: ? tick_nohz_restart_sched_tick+0x83/0x83
Dec  4 17:42:51 unraid-server kernel: update_process_times+0x62/0x81
Dec  4 17:42:51 unraid-server kernel: tick_sched_timer+0x43/0x71
Dec  4 17:42:51 unraid-server kernel: __hrtimer_run_queues+0x110/0x1b7
Dec  4 17:42:51 unraid-server kernel: hrtimer_interrupt+0x9c/0x16e
Dec  4 17:42:51 unraid-server kernel: __sysvec_apic_timer_interrupt+0xc5/0x12f
Dec  4 17:42:51 unraid-server kernel: sysvec_apic_timer_interrupt+0x80/0xa6
Dec  4 17:42:51 unraid-server kernel: </IRQ>
Dec  4 17:42:51 unraid-server kernel: <TASK>
Dec  4 17:42:51 unraid-server kernel: asm_sysvec_apic_timer_interrupt+0x16/0x20
Dec  4 17:42:51 unraid-server kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x7f/0x1d0
Dec  4 17:42:51 unraid-server kernel: Code: 0f ba 2b 08 8b 03 0f 92 c2 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 76 0c 0f ba e0 08 72 1e c6 43 01 00 eb 18 85 c0 74 0a <8b> 03 84 c0 74 04 f3 90 eb f6 66 c7 03 01 00 e9 32 01 00 00 e8 0f
Dec  4 17:42:51 unraid-server kernel: RSP: 0018:ffffc9001284fc60 EFLAGS: 00000202
Dec  4 17:42:51 unraid-server kernel: RAX: 0000000000000101 RBX: ffffc9001284fcc8 RCX: 000000000f5cb504
Dec  4 17:42:51 unraid-server kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffc9001284fcc8
Dec  4 17:42:51 unraid-server kernel: RBP: ffff88823ceb0d20 R08: ffffc9001284fcd0 R09: ffffc9001284fcd8
Dec  4 17:42:51 unraid-server kernel: R10: ffff88823ceb0d20 R11: 0000000000000324 R12: 0000000000000001
Dec  4 17:42:51 unraid-server kernel: R13: ffff88834d83abd8 R14: ffff888106fbe000 R15: 000000003f9df000
Dec  4 17:42:51 unraid-server kernel: do_raw_spin_lock+0x14/0x1a
Dec  4 17:42:51 unraid-server kernel: __btrfs_remove_free_space_cache+0xe/0x2d
Dec  4 17:42:51 unraid-server kernel: load_free_space_cache+0x230/0x2dd
Dec  4 17:42:51 unraid-server kernel: ? kmem_cache_free+0xdb/0x176
Dec  4 17:42:51 unraid-server kernel: caching_thread+0x7e/0x43d
Dec  4 17:42:51 unraid-server kernel: ? move_linked_works+0x2f/0x6a
Dec  4 17:42:51 unraid-server kernel: ? pwq_adjust_max_active+0x88/0xa7
Dec  4 17:42:51 unraid-server kernel: btrfs_work_helper+0x114/0x2a5
Dec  4 17:42:51 unraid-server kernel: process_one_work+0x1ab/0x295
Dec  4 17:42:51 unraid-server kernel: worker_thread+0x18b/0x244
Dec  4 17:42:51 unraid-server kernel: ? rescuer_thread+0x281/0x281
Dec  4 17:42:51 unraid-server kernel: kthread+0xe7/0xef
Dec  4 17:42:51 unraid-server kernel: ? kthread_complete_and_exit+0x1b/0x1b
Dec  4 17:42:51 unraid-server kernel: ret_from_fork+0x22/0x30
Dec  4 17:42:51 unraid-server kernel: </TASK>
Dec  4 17:43:04 unraid-server kernel: mdcmd (58): nocheck cancel
Dec  4 17:43:05 unraid-server  emhttpd: Spinning up all drives...
Dec  4 17:43:05 unraid-server  emhttpd: read SMART /dev/sdd
Dec  4 17:43:05 unraid-server  emhttpd: read SMART /dev/sde
Dec  4 17:43:05 unraid-server  emhttpd: read SMART /dev/sdb
Dec  4 17:43:05 unraid-server  emhttpd: read SMART /dev/sdc
Dec  4 17:43:05 unraid-server  emhttpd: read SMART /dev/nvme0n1
Dec  4 17:43:05 unraid-server  emhttpd: read SMART /dev/sda
Dec  4 17:43:05 unraid-server  emhttpd: Stopping services...
Dec  4 17:43:05 unraid-server  emhttpd: shcmd (826): /etc/rc.d/rc.libvirt stop
Dec  4 17:43:05 unraid-server root: Stopping libvirtd...
Dec  4 17:43:21 unraid-server kernel: rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 4-... } 85591 jiffies s: 6361 root: 0x1/.
Dec  4 17:43:21 unraid-server kernel: rcu: blocking rcu_node structures (internal RCU debug): l=1:0-15:0x10/.
Dec  4 17:43:21 unraid-server kernel: Task dump for CPU 4:
Dec  4 17:43:21 unraid-server kernel: task:kworker/u64:2   state:R  running task     stack:    0 pid:27283 ppid:     2 flags:0x00004008
Dec  4 17:43:21 unraid-server kernel: Workqueue: btrfs-cache btrfs_work_helper
Dec  4 17:43:21 unraid-server kernel: Call Trace:
Dec  4 17:43:21 unraid-server kernel: <TASK>
Dec  4 17:43:21 unraid-server kernel: ? __schedule+0x59e/0x5f6
Dec  4 17:43:21 unraid-server kernel: ? chksum_update+0x13/0x1e
Dec  4 17:43:21 unraid-server kernel: ? crc32c+0x2f/0x62
Dec  4 17:43:21 unraid-server kernel: ? folio_wait_bit_common+0x1ce/0x241
Dec  4 17:43:21 unraid-server kernel: ? filemap_alloc_folio+0x82/0x82
Dec  4 17:43:21 unraid-server kernel: ? folio_mkclean+0x5c/0xa1
Dec  4 17:43:21 unraid-server kernel: ? page_vma_mkclean_one.constprop.0+0x138/0x138
Dec  4 17:43:21 unraid-server kernel: ? virt_to_slab+0x5/0x19
Dec  4 17:43:21 unraid-server kernel: ? memcg_slab_free_hook+0x4b/0xf9
Dec  4 17:43:21 unraid-server kernel: ? sysvec_apic_timer_interrupt+0x92/0xa6
Dec  4 17:43:21 unraid-server kernel: ? asm_sysvec_apic_timer_interrupt+0x16/0x20
Dec  4 17:43:21 unraid-server kernel: ? native_queued_spin_lock_slowpath+0x7f/0x1d0
Dec  4 17:43:21 unraid-server kernel: ? do_raw_spin_lock+0x14/0x1a
Dec  4 17:43:21 unraid-server kernel: ? __btrfs_remove_free_space_cache+0xe/0x2d
Dec  4 17:43:21 unraid-server kernel: ? load_free_space_cache+0x230/0x2dd
Dec  4 17:43:21 unraid-server kernel: ? kmem_cache_free+0xdb/0x176
Dec  4 17:43:21 unraid-server kernel: ? caching_thread+0x7e/0x43d
Dec  4 17:43:21 unraid-server kernel: ? move_linked_works+0x2f/0x6a
Dec  4 17:43:21 unraid-server kernel: ? pwq_adjust_max_active+0x88/0xa7
Dec  4 17:43:21 unraid-server kernel: ? btrfs_work_helper+0x114/0x2a5
Dec  4 17:43:21 unraid-server kernel: ? process_one_work+0x1ab/0x295
Dec  4 17:43:21 unraid-server kernel: ? worker_thread+0x18b/0x244
Dec  4 17:43:21 unraid-server kernel: ? rescuer_thread+0x281/0x281
Dec  4 17:43:21 unraid-server kernel: ? kthread+0xe7/0xef
Dec  4 17:43:21 unraid-server kernel: ? kthread_complete_and_exit+0x1b/0x1b
Dec  4 17:43:21 unraid-server kernel: ? ret_from_fork+0x22/0x30
Dec  4 17:43:21 unraid-server kernel: </TASK>

 

syslog.txt

Link to comment

So NerdTools was a false flag. I must have overseen something. After all, I can say that the issue is either caused by the Unassigned Devices Plugin or the 2.5in USB drive I have connected to the machine. The Server works perfectly fine either if I have the Plugin not installed or the drive is disconnected. The drive is on automount so it is mounted directly after boot. Curiously, the error occurs just after I startup the array, long time after the USB Drive is mounted. I don't really know what to do. I rely on the drive for borgbackups...

Link to comment
46 minutes ago, dlandon said:

Did you install UD and not mount the drive as a test?

So thats interesting. I did install UD and disabled automount for USB drives. Manually mounting the drive after array startup (GUI button in main tab) did result in the same error as before: 1 CPU thread at 100% and no way to stop unraid except hard crash.

Link to comment
  • Solution

Recently there have been a couple of issues reported with space cache v1, I would expect if that was the problem here it also wouldn't mount manually, but it won't hurt to try, run:

 

btrfs check --clear-space-cache v1 /dev/sdX1

 

Replace X with correct letter, then try mounting with UD again.

 

 

  • Like 1
Link to comment
1 hour ago, JorgeB said:

Recently there have been a couple of issues reported with space cache v1, I would expect if that was the problem here it also wouldn't mount manually, but it won't hurt to try, run:

 

btrfs check --clear-space-cache v1 /dev/sdX1

 

Replace X with correct letter, then try mounting with UD again.

 

 

So this seems to have solved the issue. The array is running now for over an hour without issue. Thank you both @JorgeB and @dlandon. Can I buy you a coffee somewhere?

  • Like 1
Link to comment

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.