Tuetenk0pp Posted December 3, 2022 Share Posted December 3, 2022 I have a very strange issue where Unraid becomes unresponsive after array startup. I was able to get Diagnostics directly after boot before I started the Array. I was also able to update all Plugins and Unraid itself before I started up the Array. After that, I am not able to interact with Unraid via the webUI nor the webTerminal nor an SSH session. If I am already logged in via SSH when I start the Array, I can issue simple commands like ``htop`` but the ``diagnostics`` command or the ``reboot`` command don't work. Everything I can do is to pull the plug and crash the System. When I start the System again, the problem reoccurs. To answer the question: What started this behavior? I had an issue with a Windows VM and changed some VM configurations multiple times. That's when the System came unresponsive the first time. Note: Docker and VM Engines are not startet with the array, I start them manually afterwards. I am fairly confident that it's not a hardware problem: - I ran memtest - I successfully bootet from a Windows drive and ran - Cinebench - Furmark I am close to a point where I would recreate the Unraid USB Stick... Is there hope for another solution? unraid-server-diagnostics-20221203-1933.zip Quote Link to comment
JorgeB Posted December 4, 2022 Share Posted December 4, 2022 Using the console get the syslog after array start: cp /var/log/syslog /boot/syslog.txt Then attach it here. Quote Link to comment
Tuetenk0pp Posted December 4, 2022 Author Share Posted December 4, 2022 Hi @JorgeB, thanks for your reply. Here is the requested syslog. I hope it helps. I also attached a screenshot of htop: high cpu usage on one thread. syslog.txt Quote Link to comment
JorgeB Posted December 4, 2022 Share Posted December 4, 2022 Nothing obvious that I can see, try again after booting in safe mode. Unrelated but disk 1 shows some corruption detected, once this is fixed you should run a scrub. Quote Link to comment
Tuetenk0pp Posted December 4, 2022 Author Share Posted December 4, 2022 Quote Unrelated but disk 1 shows some corruption detected, once this is fixed you should run a scrub. Yeah I noticed that as well. I also need to run a parity check due to all those unclean shutdowns. Quote Nothing obvious that I can see, try again after booting in safe mode. I just tried that and had success in doing so. Now I need to find out which plugin caused the issue... If you have any suspicion let me know. Here's a list of the plugins I use: CA Auto Update Applications, ver 2021.09.24 Community Applications, ver 2022.12.03 Dynamix System Information, ver 2020.06.21 Dynamix System Temperature, ver 2022.09.16b Fix Common Problems, ver 2022.11.18 My Servers, ver 2022.11.29.0742 NerdTools, ver 2022.11.01 rclone, ver 2022.09.02 Tips and Tweaks, ver 2022.11.18 Unassigned Devices, ver 2022.11.12a Unassigned Devices Preclear, ver 2022.09.02 User Scripts, ver 2022.08.01 Virtual Machine Wake On Lan, ver 2020.04.29 of which these are critical to me: Community Applications My Servers NerdTools rclone Unassigned Devices User Scripts Virtual Machine Wake On Lan Quote Link to comment
Tuetenk0pp Posted December 4, 2022 Author Share Posted December 4, 2022 So I slowly uninstalled and reinstalled the plugins and finally found a probable cause of my problems. Just after I reinstalled NerdTools with the packages borgbackup-1.1.15-x86_64-1 llfuse-1.4.2-x86_64-1_SBo python-setuptools-57.5.0-x86_64-2 python3-3.9.10-x86_64-1 and started the array, the server became unresponsive again. I then tried to stop the array and found these messages in the Syslog (full syslog attached): Dec 4 17:42:17 unraid-server kernel: rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 4-... } 21079 jiffies s: 6361 root: 0x1/. Dec 4 17:42:17 unraid-server kernel: rcu: blocking rcu_node structures (internal RCU debug): l=1:0-15:0x10/. Dec 4 17:42:17 unraid-server kernel: Task dump for CPU 4: Dec 4 17:42:17 unraid-server kernel: task:kworker/u64:2 state:R running task stack: 0 pid:27283 ppid: 2 flags:0x00004008 Dec 4 17:42:17 unraid-server kernel: Workqueue: btrfs-cache btrfs_work_helper Dec 4 17:42:17 unraid-server kernel: Call Trace: Dec 4 17:42:17 unraid-server kernel: <TASK> Dec 4 17:42:17 unraid-server kernel: ? __schedule+0x59e/0x5f6 Dec 4 17:42:17 unraid-server kernel: ? chksum_update+0x13/0x1e Dec 4 17:42:17 unraid-server kernel: ? crc32c+0x2f/0x62 Dec 4 17:42:17 unraid-server kernel: ? folio_wait_bit_common+0x1ce/0x241 Dec 4 17:42:17 unraid-server kernel: ? filemap_alloc_folio+0x82/0x82 Dec 4 17:42:17 unraid-server kernel: ? folio_mkclean+0x5c/0xa1 Dec 4 17:42:17 unraid-server kernel: ? page_vma_mkclean_one.constprop.0+0x138/0x138 Dec 4 17:42:17 unraid-server kernel: ? virt_to_slab+0x5/0x19 Dec 4 17:42:17 unraid-server kernel: ? memcg_slab_free_hook+0x4b/0xf9 Dec 4 17:42:17 unraid-server kernel: ? sysvec_apic_timer_interrupt+0x92/0xa6 Dec 4 17:42:17 unraid-server kernel: ? asm_sysvec_apic_timer_interrupt+0x16/0x20 Dec 4 17:42:17 unraid-server kernel: ? native_queued_spin_lock_slowpath+0x7f/0x1d0 Dec 4 17:42:17 unraid-server kernel: ? do_raw_spin_lock+0x14/0x1a Dec 4 17:42:17 unraid-server kernel: ? __btrfs_remove_free_space_cache+0xe/0x2d Dec 4 17:42:17 unraid-server kernel: ? load_free_space_cache+0x230/0x2dd Dec 4 17:42:17 unraid-server kernel: ? kmem_cache_free+0xdb/0x176 Dec 4 17:42:17 unraid-server kernel: ? caching_thread+0x7e/0x43d Dec 4 17:42:17 unraid-server kernel: ? move_linked_works+0x2f/0x6a Dec 4 17:42:17 unraid-server kernel: ? pwq_adjust_max_active+0x88/0xa7 Dec 4 17:42:17 unraid-server kernel: ? btrfs_work_helper+0x114/0x2a5 Dec 4 17:42:17 unraid-server kernel: ? process_one_work+0x1ab/0x295 Dec 4 17:42:17 unraid-server kernel: ? worker_thread+0x18b/0x244 Dec 4 17:42:17 unraid-server kernel: ? rescuer_thread+0x281/0x281 Dec 4 17:42:17 unraid-server kernel: ? kthread+0xe7/0xef Dec 4 17:42:17 unraid-server kernel: ? kthread_complete_and_exit+0x1b/0x1b Dec 4 17:42:17 unraid-server kernel: ? ret_from_fork+0x22/0x30 Dec 4 17:42:17 unraid-server kernel: </TASK> Dec 4 17:42:51 unraid-server kernel: rcu: INFO: rcu_preempt self-detected stall on CPU Dec 4 17:42:51 unraid-server kernel: rcu: 4-....: (59999 ticks this GP) idle=07b/1/0x4000000000000000 softirq=84652/84652 fqs=14996 Dec 4 17:42:51 unraid-server kernel: (t=60000 jiffies g=186673 q=396746 ncpus=12) Dec 4 17:42:51 unraid-server kernel: NMI backtrace for cpu 4 Dec 4 17:42:51 unraid-server kernel: CPU: 4 PID: 27283 Comm: kworker/u64:2 Not tainted 5.19.17-Unraid #2 Dec 4 17:42:51 unraid-server kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450 Gaming-ITX/ac, BIOS P4.80 03/01/2022 Dec 4 17:42:51 unraid-server kernel: Workqueue: btrfs-cache btrfs_work_helper Dec 4 17:42:51 unraid-server kernel: Call Trace: Dec 4 17:42:51 unraid-server kernel: <IRQ> Dec 4 17:42:51 unraid-server kernel: dump_stack_lvl+0x44/0x5c Dec 4 17:42:51 unraid-server kernel: nmi_cpu_backtrace+0xda/0x104 Dec 4 17:42:51 unraid-server kernel: ? lapic_can_unplug_cpu+0x95/0x95 Dec 4 17:42:51 unraid-server kernel: nmi_trigger_cpumask_backtrace+0x72/0x107 Dec 4 17:42:51 unraid-server kernel: trigger_single_cpu_backtrace+0x2a/0x31 Dec 4 17:42:51 unraid-server kernel: rcu_dump_cpu_stacks+0xa0/0xd6 Dec 4 17:42:51 unraid-server kernel: rcu_sched_clock_irq+0x44b/0xaf7 Dec 4 17:42:51 unraid-server kernel: ? do_set_msr+0x12/0x12 [kvm] Dec 4 17:42:51 unraid-server kernel: ? notifier_call_chain+0x38/0x5a Dec 4 17:42:51 unraid-server kernel: ? timekeeping_update+0xe8/0x117 Dec 4 17:42:51 unraid-server kernel: ? tick_nohz_restart_sched_tick+0x83/0x83 Dec 4 17:42:51 unraid-server kernel: update_process_times+0x62/0x81 Dec 4 17:42:51 unraid-server kernel: tick_sched_timer+0x43/0x71 Dec 4 17:42:51 unraid-server kernel: __hrtimer_run_queues+0x110/0x1b7 Dec 4 17:42:51 unraid-server kernel: hrtimer_interrupt+0x9c/0x16e Dec 4 17:42:51 unraid-server kernel: __sysvec_apic_timer_interrupt+0xc5/0x12f Dec 4 17:42:51 unraid-server kernel: sysvec_apic_timer_interrupt+0x80/0xa6 Dec 4 17:42:51 unraid-server kernel: </IRQ> Dec 4 17:42:51 unraid-server kernel: <TASK> Dec 4 17:42:51 unraid-server kernel: asm_sysvec_apic_timer_interrupt+0x16/0x20 Dec 4 17:42:51 unraid-server kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x7f/0x1d0 Dec 4 17:42:51 unraid-server kernel: Code: 0f ba 2b 08 8b 03 0f 92 c2 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 76 0c 0f ba e0 08 72 1e c6 43 01 00 eb 18 85 c0 74 0a <8b> 03 84 c0 74 04 f3 90 eb f6 66 c7 03 01 00 e9 32 01 00 00 e8 0f Dec 4 17:42:51 unraid-server kernel: RSP: 0018:ffffc9001284fc60 EFLAGS: 00000202 Dec 4 17:42:51 unraid-server kernel: RAX: 0000000000000101 RBX: ffffc9001284fcc8 RCX: 000000000f5cb504 Dec 4 17:42:51 unraid-server kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffc9001284fcc8 Dec 4 17:42:51 unraid-server kernel: RBP: ffff88823ceb0d20 R08: ffffc9001284fcd0 R09: ffffc9001284fcd8 Dec 4 17:42:51 unraid-server kernel: R10: ffff88823ceb0d20 R11: 0000000000000324 R12: 0000000000000001 Dec 4 17:42:51 unraid-server kernel: R13: ffff88834d83abd8 R14: ffff888106fbe000 R15: 000000003f9df000 Dec 4 17:42:51 unraid-server kernel: do_raw_spin_lock+0x14/0x1a Dec 4 17:42:51 unraid-server kernel: __btrfs_remove_free_space_cache+0xe/0x2d Dec 4 17:42:51 unraid-server kernel: load_free_space_cache+0x230/0x2dd Dec 4 17:42:51 unraid-server kernel: ? kmem_cache_free+0xdb/0x176 Dec 4 17:42:51 unraid-server kernel: caching_thread+0x7e/0x43d Dec 4 17:42:51 unraid-server kernel: ? move_linked_works+0x2f/0x6a Dec 4 17:42:51 unraid-server kernel: ? pwq_adjust_max_active+0x88/0xa7 Dec 4 17:42:51 unraid-server kernel: btrfs_work_helper+0x114/0x2a5 Dec 4 17:42:51 unraid-server kernel: process_one_work+0x1ab/0x295 Dec 4 17:42:51 unraid-server kernel: worker_thread+0x18b/0x244 Dec 4 17:42:51 unraid-server kernel: ? rescuer_thread+0x281/0x281 Dec 4 17:42:51 unraid-server kernel: kthread+0xe7/0xef Dec 4 17:42:51 unraid-server kernel: ? kthread_complete_and_exit+0x1b/0x1b Dec 4 17:42:51 unraid-server kernel: ret_from_fork+0x22/0x30 Dec 4 17:42:51 unraid-server kernel: </TASK> Dec 4 17:43:04 unraid-server kernel: mdcmd (58): nocheck cancel Dec 4 17:43:05 unraid-server emhttpd: Spinning up all drives... Dec 4 17:43:05 unraid-server emhttpd: read SMART /dev/sdd Dec 4 17:43:05 unraid-server emhttpd: read SMART /dev/sde Dec 4 17:43:05 unraid-server emhttpd: read SMART /dev/sdb Dec 4 17:43:05 unraid-server emhttpd: read SMART /dev/sdc Dec 4 17:43:05 unraid-server emhttpd: read SMART /dev/nvme0n1 Dec 4 17:43:05 unraid-server emhttpd: read SMART /dev/sda Dec 4 17:43:05 unraid-server emhttpd: Stopping services... Dec 4 17:43:05 unraid-server emhttpd: shcmd (826): /etc/rc.d/rc.libvirt stop Dec 4 17:43:05 unraid-server root: Stopping libvirtd... Dec 4 17:43:21 unraid-server kernel: rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 4-... } 85591 jiffies s: 6361 root: 0x1/. Dec 4 17:43:21 unraid-server kernel: rcu: blocking rcu_node structures (internal RCU debug): l=1:0-15:0x10/. Dec 4 17:43:21 unraid-server kernel: Task dump for CPU 4: Dec 4 17:43:21 unraid-server kernel: task:kworker/u64:2 state:R running task stack: 0 pid:27283 ppid: 2 flags:0x00004008 Dec 4 17:43:21 unraid-server kernel: Workqueue: btrfs-cache btrfs_work_helper Dec 4 17:43:21 unraid-server kernel: Call Trace: Dec 4 17:43:21 unraid-server kernel: <TASK> Dec 4 17:43:21 unraid-server kernel: ? __schedule+0x59e/0x5f6 Dec 4 17:43:21 unraid-server kernel: ? chksum_update+0x13/0x1e Dec 4 17:43:21 unraid-server kernel: ? crc32c+0x2f/0x62 Dec 4 17:43:21 unraid-server kernel: ? folio_wait_bit_common+0x1ce/0x241 Dec 4 17:43:21 unraid-server kernel: ? filemap_alloc_folio+0x82/0x82 Dec 4 17:43:21 unraid-server kernel: ? folio_mkclean+0x5c/0xa1 Dec 4 17:43:21 unraid-server kernel: ? page_vma_mkclean_one.constprop.0+0x138/0x138 Dec 4 17:43:21 unraid-server kernel: ? virt_to_slab+0x5/0x19 Dec 4 17:43:21 unraid-server kernel: ? memcg_slab_free_hook+0x4b/0xf9 Dec 4 17:43:21 unraid-server kernel: ? sysvec_apic_timer_interrupt+0x92/0xa6 Dec 4 17:43:21 unraid-server kernel: ? asm_sysvec_apic_timer_interrupt+0x16/0x20 Dec 4 17:43:21 unraid-server kernel: ? native_queued_spin_lock_slowpath+0x7f/0x1d0 Dec 4 17:43:21 unraid-server kernel: ? do_raw_spin_lock+0x14/0x1a Dec 4 17:43:21 unraid-server kernel: ? __btrfs_remove_free_space_cache+0xe/0x2d Dec 4 17:43:21 unraid-server kernel: ? load_free_space_cache+0x230/0x2dd Dec 4 17:43:21 unraid-server kernel: ? kmem_cache_free+0xdb/0x176 Dec 4 17:43:21 unraid-server kernel: ? caching_thread+0x7e/0x43d Dec 4 17:43:21 unraid-server kernel: ? move_linked_works+0x2f/0x6a Dec 4 17:43:21 unraid-server kernel: ? pwq_adjust_max_active+0x88/0xa7 Dec 4 17:43:21 unraid-server kernel: ? btrfs_work_helper+0x114/0x2a5 Dec 4 17:43:21 unraid-server kernel: ? process_one_work+0x1ab/0x295 Dec 4 17:43:21 unraid-server kernel: ? worker_thread+0x18b/0x244 Dec 4 17:43:21 unraid-server kernel: ? rescuer_thread+0x281/0x281 Dec 4 17:43:21 unraid-server kernel: ? kthread+0xe7/0xef Dec 4 17:43:21 unraid-server kernel: ? kthread_complete_and_exit+0x1b/0x1b Dec 4 17:43:21 unraid-server kernel: ? ret_from_fork+0x22/0x30 Dec 4 17:43:21 unraid-server kernel: </TASK> syslog.txt Quote Link to comment
JorgeB Posted December 5, 2022 Share Posted December 5, 2022 Test those packages one at a time and if you find the culprit report it in the Nerdtools support thread. Quote Link to comment
Tuetenk0pp Posted December 5, 2022 Author Share Posted December 5, 2022 So NerdTools was a false flag. I must have overseen something. After all, I can say that the issue is either caused by the Unassigned Devices Plugin or the 2.5in USB drive I have connected to the machine. The Server works perfectly fine either if I have the Plugin not installed or the drive is disconnected. The drive is on automount so it is mounted directly after boot. Curiously, the error occurs just after I startup the array, long time after the USB Drive is mounted. I don't really know what to do. I rely on the drive for borgbackups... Quote Link to comment
Tuetenk0pp Posted December 5, 2022 Author Share Posted December 5, 2022 I just tested with this and had no problems mount -t btrfs /dev/sda1 /mnt/usb/test Quote Link to comment
dlandon Posted December 5, 2022 Share Posted December 5, 2022 Did you install UD and not mount the drive as a test? UD queries the drive after it is mounted to get stats and that might be the issue. Manually mounting the drive is fine, but UD wouldn't be running to query the disk. Quote Link to comment
Tuetenk0pp Posted December 5, 2022 Author Share Posted December 5, 2022 46 minutes ago, dlandon said: Did you install UD and not mount the drive as a test? So thats interesting. I did install UD and disabled automount for USB drives. Manually mounting the drive after array startup (GUI button in main tab) did result in the same error as before: 1 CPU thread at 100% and no way to stop unraid except hard crash. Quote Link to comment
dlandon Posted December 5, 2022 Share Posted December 5, 2022 Set up the syslog server to save the syslog, reboot the server with the drive set to not auto mount, then go to UD and mount the drive. Post the syslog. I think your drive is hanging UD with something and the syslog should show what is hanging. An alternative is to try another drive. Quote Link to comment
Tuetenk0pp Posted December 6, 2022 Author Share Posted December 6, 2022 I accidently uploaded the wrong syslog. Here's the right one. I also ran a short SMART test against the drive since the syslog says: ``has skinny extents`` WD_Elements_10B8_SMART_report.txt syslog Quote Link to comment
Solution JorgeB Posted December 6, 2022 Solution Share Posted December 6, 2022 Recently there have been a couple of issues reported with space cache v1, I would expect if that was the problem here it also wouldn't mount manually, but it won't hurt to try, run: btrfs check --clear-space-cache v1 /dev/sdX1 Replace X with correct letter, then try mounting with UD again. 1 Quote Link to comment
Tuetenk0pp Posted December 6, 2022 Author Share Posted December 6, 2022 1 hour ago, JorgeB said: Recently there have been a couple of issues reported with space cache v1, I would expect if that was the problem here it also wouldn't mount manually, but it won't hurt to try, run: btrfs check --clear-space-cache v1 /dev/sdX1 Replace X with correct letter, then try mounting with UD again. So this seems to have solved the issue. The array is running now for over an hour without issue. Thank you both @JorgeB and @dlandon. Can I buy you a coffee somewhere? 1 Quote Link to comment
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.