sv3ndev Posted May 16, 2020
Hello,
I have been using UnRaid for over 4 years now and have been very satisfied with it. However, in the past 2 months, I have encountered a problem with my server that makes it almost unusable. This problem also existed with Unraid 6.8rc4, I updated to 6.9.0-beta1 in the hopes that this would fix the problem.
The problem is that after booting the server, after 8-72 hours, the server will become unresponsive. Ping attempts are unsuccessful. The machine itself is still running, but I cannot interact with UnRaid in any way. This means that once a day, I have to force restart the server, which is not a viable option for the future.
I've looked through the syslog and it appears that whenever this happens, the following output is logged every 3 minutes:

May 16 00:54:29 NAS kernel: rcu: INFO: rcu_sched self-detected stall on CPU
May 16 00:54:29 NAS kernel: rcu: 6-....: (599938 ticks this GP) idle=6ee/1/0x4000000000000002 softirq=92762467/92762467 fqs=149960
May 16 00:54:29 NAS kernel: (t=600010 jiffies g=105284805 q=28641)
May 16 00:54:29 NAS kernel: NMI backtrace for cpu 6
May 16 00:54:29 NAS kernel: CPU: 6 PID: 3531 Comm: du Tainted: G D 5.5.8-Unraid #1
May 16 00:54:29 NAS kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C2750D4I, BIOS P2.90 01/26/2016
May 16 00:54:29 NAS kernel: Call Trace:
May 16 00:54:29 NAS kernel: <IRQ>
May 16 00:54:29 NAS kernel: dump_stack+0x64/0x7c
May 16 00:54:29 NAS kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
May 16 00:54:29 NAS kernel: nmi_cpu_backtrace+0x73/0x85
May 16 00:54:29 NAS kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
May 16 00:54:29 NAS kernel: rcu_dump_cpu_stacks+0x89/0xb0
May 16 00:54:29 NAS kernel: rcu_sched_clock_irq+0x1e4/0x513
May 16 00:54:29 NAS kernel: update_process_times+0x1f/0x3d
May 16 00:54:29 NAS kernel: tick_sched_timer+0x33/0x62
May 16 00:54:29 NAS kernel: __hrtimer_run_queues+0xb7/0x10b
May 16 00:54:29 NAS kernel: ? tick_sched_do_timer+0x39/0x39
May 16 00:54:29 NAS kernel: hrtimer_interrupt+0x8d/0x160
May 16 00:54:29 NAS kernel: smp_apic_timer_interrupt+0x6a/0x7a
May 16 00:54:29 NAS kernel: apic_timer_interrupt+0xf/0x20
May 16 00:54:29 NAS kernel: </IRQ>
May 16 00:54:29 NAS kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x9b/0x1f2
May 16 00:54:29 NAS kernel: Code: b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 89 44 24 04 74 0c 0f ba e0 08 72 1e c6 47 01 00 eb 18 85 c0 74 0a 8b 07 <84> c0 74 04 f3 90 eb f6 66 c7 07 01 00 e9 2b 01 00 00 48 c7 c0 00
May 16 00:54:29 NAS kernel: RSP: 0018:ffffc9000db2fe20 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
May 16 00:54:29 NAS kernel: RAX: 0000000000180101 RBX: ffff8880ad4abcc0 RCX: 000000000000001d
May 16 00:54:29 NAS kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881e6db5c00
May 16 00:54:29 NAS kernel: RBP: ffff8880ad4ab900 R08: 0000000000000000 R09: 0000000000000000
May 16 00:54:29 NAS kernel: R10: ffff8880955dd500 R11: ffff888050864b10 R12: ffff8880955dd7e8
May 16 00:54:29 NAS kernel: R13: 000000000000001d R14: 0000000000038800 R15: ffff8881e6db5c00
May 16 00:54:29 NAS kernel: queued_spin_lock_slowpath+0x7/0xa
May 16 00:54:29 NAS kernel: do_raw_spin_lock+0x38/0x52
May 16 00:54:29 NAS kernel: fuse_prepare_release+0x63/0xd2
May 16 00:54:29 NAS kernel: fuse_release_common+0x32/0x83
May 16 00:54:29 NAS kernel: fuse_dir_release+0xd/0x10
May 16 00:54:29 NAS kernel: __fput+0x108/0x1d1
May 16 00:54:29 NAS kernel: task_work_run+0x77/0x88
May 16 00:54:29 NAS kernel: prepare_exit_to_usermode+0xa6/0x126
May 16 00:54:29 NAS kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 16 00:54:29 NAS kernel: RIP: 0033:0x15493e186fb3
May 16 00:54:29 NAS kernel: Code: e9 47 ff ff ff b8 ff ff ff ff e9 3d ff ff ff 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8
May 16 00:54:29 NAS kernel: RSP: 002b:00007ffc8c6bf508 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
May 16 00:54:29 NAS kernel: RAX: 0000000000000000 RBX: 0000000000433f20 RCX: 000015493e186fb3
May 16 00:54:29 NAS kernel: RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000008
May 16 00:54:29 NAS kernel: RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000006
May 16 00:54:29 NAS kernel: R10: 000000001ea355fb R11: 0000000000000246 R12: 0000000000000005
May 16 00:54:29 NAS kernel: R13: 0000000000444830 R14: 0000000000433f20 R15: 0000000000000005

I've also attached the full syslog in case anyone wants to read through it. I would very much appreciate any help you could give me! I am using this server for my company and am desperate for a fix. Thank you!
syslog
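For anyone triaging a similar syslog, here is a minimal Python sketch that counts the RCU stall reports and tallies which command (the `Comm:` field of the backtrace) was running when each one was logged. It assumes the syslog uses the same line format as the excerpt in the post above; the function name `summarize_stalls` and the sample lines are illustrative only and are not part of Unraid.

```python
import re
from collections import Counter

# Patterns taken from the line format of the excerpt in the post above.
STALL = re.compile(r"rcu: INFO: rcu_sched self-detected stall on CPU")
TASK = re.compile(r"kernel: CPU: (\d+) PID: (\d+) Comm: (\S+)")


def summarize_stalls(lines):
    """Return (number of stall reports, Counter of commands named in them)."""
    stalls = 0
    commands = Counter()
    awaiting_task = False  # True between a stall header and its "CPU: ... Comm:" line
    for line in lines:
        if STALL.search(line):
            stalls += 1
            awaiting_task = True
            continue
        match = TASK.search(line)
        if match and awaiting_task:
            commands[match.group(3)] += 1
            awaiting_task = False
    return stalls, commands


# Hypothetical sample: three lines copied from the excerpt above.
sample = [
    "May 16 00:54:29 NAS kernel: rcu: INFO: rcu_sched self-detected stall on CPU",
    "May 16 00:54:29 NAS kernel: NMI backtrace for cpu 6",
    "May 16 00:54:29 NAS kernel: CPU: 6 PID: 3531 Comm: du Tainted: G D 5.5.8-Unraid #1",
]

if __name__ == "__main__":
    count, by_command = summarize_stalls(sample)
    print(count, dict(by_command))
```

To run it against a saved log, pass an open file instead of `sample`, for example `summarize_stalls(open("syslog"))`, where `syslog` is whatever path the log was saved to. If one command dominates the tally, that hints at which process or plugin to look at first, though it does not prove the cause.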
trurl Posted May 16, 2020
Go to Tools-diagnostics and attach the complete Diagnostics ZIP file to your NEXT post.
sv3ndev Posted May 16, 2020 Author
Thanks for your response. I looked at that but the problem is that it seems that the logs in the diagnostics zip are stored in RAM. So, when I force restart it, everything gets wiped and the logs are completely empty. Do you know how I can prevent them from getting erased and store them somehow? The only way I got my syslog was by enabling logging to flash. Here's an old diagnostics zip I downloaded immediately after the first time I encountered the problem. Hopefully it'll be of use.
nas-diagnostics-20200429-1020.zip
trurl Posted May 16, 2020
12 minutes ago, sv3ndev said: Do you know how I can prevent them from getting erased and store them somehow?
sv3ndev Posted May 16, 2020 Author
Thanks, but I'm a bit confused. That article is about the syslog server, and I did enable that - but all it gives me is the syslog, which I posted. I've looked through the diagnostics.zip and it appears as if most of them are basic info about the config, so they wouldn't really change, and I attached the full syslog in my earlier post. I'm really sorry, but I'm new to UnRaid debugging because it's always worked so well - so I apologize for these questions! But I think that I uploaded everything relevant now, right?
I'm going to reboot into safe mode just to eliminate any possibility of plugin problems, as some of them are deprecated. Unfortunately, because of the nature of the problem, I won't really know if it works until next week. Nevertheless, thank you so much for your help!
trurl Posted May 16, 2020
4 minutes ago, sv3ndev said: I've looked through the diagnostics.zip and it appears as if most of them are basic info about the config, so they wouldn't really change
Not a question of whether they changed. Just trying to get a more complete picture. You didn't tell us anything really about your hardware, and rather than pulling teeth to get that information, diagnostics makes it easier for everyone. And other things a user can do to their configuration can lead to problems. Many things can get diagnosed by looking at things other than syslog. If you had been running Ryzen, for example, I would have had other suggestions for that since there are some tweaks that can help with those CPUs and crashes.
3 minutes ago, sv3ndev said: I uploaded everything relevant now, right?
yes
sv3ndev Posted May 16, 2020 Author
23 minutes ago, trurl said: Not a question of whether they changed. Just trying to get a more complete picture. You didn't tell us anything really about your hardware, and rather than pulling teeth to get that information, diagnostics makes it easier for everyone. And other things a user can do to their configuration can lead to problems. Many things can get diagnosed by looking at things other than syslog
Ah okay, thanks! The thing is that this system has been running perfectly nonstop for 4 years, and the last thing I changed was a VM I added 2 months ago, long before any problems started. So, I'd be surprised if it's a configuration problem because I wouldn't expect those to just suddenly occur. That's why I'm really confused - I mainly work in IT Security so dealing with things like this isn't my specialty, and because I didn't change anything I don't really have a lead on where to begin debugging.
sv3ndev Posted May 18, 2020 Author
As an update for anyone with a similar problem, I've been running the server in Safe Mode with GUI enabled for 2 days now, and so far without a problem. It seems as if the problem was most likely a deprecated or otherwise malfunctioning plugin.
Squid Posted May 18, 2020
34 minutes ago, sv3ndev said: It seems as if the problem was most likely a deprecated or otherwise malfunctioning plugin.
The diagnostics didn't cover enough time for FCP to have listed things, but it would have been telling you
Resilio.plg - 2016.09.17.1 --- Unknown and shouldn't be installed (In truth though this is a PhAzE plugin which has been removed from CA because it's not supported, not updated, and very likely will cause problems)
newransomware.bait.plg - 2018.07.02 - Deprecated
ca.backup.plg - 2017.10.28 - Deprecated - Known to cause issues under certain circumstances and replaced by a v2 version
dynamix.cache.dirs.plg - 2018.12.04 - Way way way out of date
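As a rough illustration of the kind of check described in the post above, the following Python sketch pulls the `name.plg - YYYY.MM.DD` entries out of a pasted plugin list and flags those whose version date is older than a chosen cutoff. It is not how FCP works; the function name `stale_plugins`, the one-year default, and the text format are assumptions made for the example, and an old version date alone does not mean a plugin is deprecated.

```python
import re
from datetime import date

# Matches entries such as "ca.backup.plg - 2017.10.28"; any trailing
# revision suffix (as in "2016.09.17.1") is ignored.
ENTRY = re.compile(r"(\S+\.plg) - (\d{4})\.(\d{2})\.(\d{2})")


def stale_plugins(report, today, max_age_days=365):
    """Return [(plugin name, age in days)] for entries older than max_age_days."""
    stale = []
    for name, year, month, day in ENTRY.findall(report):
        age = (today - date(int(year), int(month), int(day))).days
        if age > max_age_days:
            stale.append((name, age))
    return stale


# Plugin names and version dates as listed in the post above.
report = """
Resilio.plg - 2016.09.17.1
newransomware.bait.plg - 2018.07.02
ca.backup.plg - 2017.10.28
dynamix.cache.dirs.plg - 2018.12.04
"""

if __name__ == "__main__":
    for name, age in stale_plugins(report, today=date(2020, 5, 18)):
        print(f"{name}: last version is {age} days old")
```

Passing `today` explicitly keeps the result reproducible; in day-to-day use `date.today()` would be the natural argument.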
sv3ndev Posted May 18, 2020 Author
13 minutes ago, Squid said: The diagnostics didn't cover enough time for FCP to have listed things, but it would have been telling you
Resilio.plg - 2016.09.17.1 --- Unknown and shouldn't be installed (In truth though this is a PhAzE plugin which has been removed from CA because it's not supported, not updated, and very likely will cause problems)
newransomware.bait.plg - 2018.07.02 - Deprecated
ca.backup.plg - 2017.10.28 - Deprecated - Known to cause issues under certain circumstances and replaced by a v2 version
dynamix.cache.dirs.plg - 2018.12.04 - Way way way out of date
Thanks! I have to admit, I've been using UnRaid mainly as a NAS in a "set it and forget it" way - and I definitely have some maintenance work to do. I'll leave it running in safe mode a while longer to make sure it's really stable now, and then I'll begin reactivating the plugins that are still being maintained. I'll definitely remove those plugins though, they're definitely more trouble than they're worth.
trurl Posted May 18, 2020
1 hour ago, sv3ndev said: "set it and forget it"
Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?
sv3ndev Posted May 18, 2020 Author
30 minutes ago, trurl said: Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?
Yes, I have email notifications configured - I used to get them all the time because my drives were getting a bit warm, but since fixing that I barely have any problems. It never warned me of any impending problem; the CPU stall warnings would just suddenly appear in the syslog, nothing else.
Edited May 18, 2020 by sv3ndev