omartian Posted June 8, 2021
Hi everyone, I woke up today to find that my Unraid server was unresponsive. I took a picture of what the server was sending to my monitor and attached the diagnostics. The last time this happened, 4 days ago, it said that my 2x btrfs cache drives were unmountable and had to be reformatted. I was able to recover my cache data (Plex metadata and binhex Krusader Docker data) and re-integrate it into my server. I ran a parity check that finished yesterday and didn't find any new sync errors. When it occurred this morning, I couldn't initiate a clean shutdown because the keyboard was unresponsive and I couldn't access the server remotely. When the server came back up, it started a parity check, which I cancelled. This time, the cache drives are still mounted and part of the pool. In the past week, I've run memtest for 10 passes (>24 hrs) and reseated the RAM and SATA cables. I haven't the slightest idea why this keeps happening. A little guidance would be helpful. Thank you.
Attachment: Server Down.zip
JorgeB Posted June 8, 2021
Diags after rebooting are not much help for this. You can try enabling syslog mirror to flash, then post that log after a crash.
omartian Posted June 8, 2021
7 minutes ago, JorgeB said: Diags after rebooting are not much help for this. You can try enabling syslog mirror to flash, then post that log after a crash.
OK. Does this option write to my USB boot drive? I added an additional USB thumb drive to my server. Local syslog is disabled, remote syslog is blank, and mirror syslog to flash is set to yes. Does that look OK?
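Once mirroring is enabled, the part of the log that matters is whatever was written just before the server locked up. As a minimal sketch, the helper below returns the last lines of a log file. The function name `last_lines` and the example path in the comment are assumptions for illustration, not something confirmed in this thread, so check where the mirrored syslog actually lands on your flash drive.

```python
from collections import deque


def last_lines(path, count=50):
    """Return the final `count` lines of a text log without loading the whole file."""
    # errors="replace" keeps the read going if a crash left partial bytes in the file.
    with open(path, "r", errors="replace") as handle:
        # A bounded deque keeps only the newest `count` lines as the file streams past.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


# Hypothetical usage; the path is an assumption, adjust it to your system:
# for line in last_lines("/boot/logs/syslog", 100):
#     print(line)
```

Copying those last lines off the server before rebooting again keeps them easy to find when posting.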
omartian Posted June 8, 2021
Could this be software related? I updated to 6.9.2 last week. It looks like some weird kernel error on my monitor.
omartian Posted June 9, 2021
13 hours ago, JorgeB said: Diags after rebooting are not much help for this. You can try enabling syslog mirror to flash, then post that log after a crash.
It happened again tonight. I had to power-cycle the server. This is what I see in the syslog before the kernel panic:
Jun 8 19:12:30 Nasgard emhttpd: spinning down /dev/sdm
Jun 8 19:57:31 Nasgard emhttpd: spinning down /dev/sdm
Jun 8 20:42:32 Nasgard emhttpd: spinning down /dev/sdm
Jun 8 20:43:27 Nasgard kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Jun 8 20:43:27 Nasgard kernel: rcu: 3-....: (59999 ticks this GP) idle=0fe/1/0x4000000000000000 softirq=1334521/1334530 fqs=14397
Jun 8 20:43:27 Nasgard kernel: (t=60001 jiffies g=3697701 q=125290)
Jun 8 20:43:27 Nasgard kernel: NMI backtrace for cpu 3
Jun 8 20:43:27 Nasgard kernel: CPU: 3 PID: 24985 Comm: 7 Not tainted 5.10.28-Unraid #1
Jun 8 20:43:27 Nasgard kernel: Hardware name: Gigabyte Technology Co., Ltd. B450 AORUS PRO WIFI/B450 AORUS PRO WIFI-CF, BIOS F60e 12/09/2020
Jun 8 20:43:27 Nasgard kernel: Call Trace:
Jun 8 20:43:27 Nasgard kernel: <IRQ>
Jun 8 20:43:27 Nasgard kernel: dump_stack+0x6b/0x83
Jun 8 20:43:27 Nasgard kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Jun 8 20:43:27 Nasgard kernel: nmi_cpu_backtrace+0x7d/0x8f
Jun 8 20:43:27 Nasgard kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
Jun 8 20:43:27 Nasgard kernel: rcu_dump_cpu_stacks+0x9f/0xc6
Jun 8 20:43:27 Nasgard kernel: rcu_sched_clock_irq+0x1ec/0x543
Jun 8 20:43:27 Nasgard kernel: ? trigger_load_balance+0x5a/0x1ca
Jun 8 20:43:27 Nasgard kernel: update_process_times+0x50/0x6e
Jun 8 20:43:27 Nasgard kernel: tick_sched_timer+0x36/0x64
Jun 8 20:43:27 Nasgard kernel: __hrtimer_run_queues+0xb7/0x10b
Jun 8 20:43:27 Nasgard kernel: ? tick_sched_do_timer+0x39/0x39
Jun 8 20:43:27 Nasgard kernel: hrtimer_interrupt+0x8d/0x15b
Jun 8 20:43:27 Nasgard kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Jun 8 20:43:27 Nasgard kernel: asm_call_irq_on_stack+0x12/0x20
Jun 8 20:43:27 Nasgard kernel: </IRQ>
Jun 8 20:43:27 Nasgard kernel: sysvec_apic_timer_interrupt+0x71/0x95
Jun 8 20:43:27 Nasgard kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Jun 8 20:43:27 Nasgard kernel: RIP: 0010:xas_descend+0x45/0x49
Jun 8 20:43:27 Nasgard kernel: Code: 44 c6 08 48 89 77 18 4c 89 c7 e8 77 ff ff ff 84 c0 74 13 49 c1 e8 02 44 89 c1 45 89 c0 49 83 c0 04 4e 8b 44 c6 08 41 88 49 12 <4c> 89 c0 c3 4c 8b 4f 18 48 89 fe 41 f6 c1 03 75 6b 4d 85 c9 4c 8b
Jun 8 20:43:27 Nasgard kernel: RSP: 0018:ffffc90001467ba8 EFLAGS: 00000246
Jun 8 20:43:27 Nasgard kernel: RAX: 0000000000000000 RBX: ffff888106325370 RCX: 0000000000000007
Jun 8 20:43:27 Nasgard kernel: RDX: 0000000000000000 RSI: ffff888100686480 RDI: ffffea000464dd80
Jun 8 20:43:27 Nasgard kernel: RBP: 0000000000000b47 R08: ffffea000464dd80 R09: ffffc90001467bb8
Jun 8 20:43:27 Nasgard kernel: R10: ffffc90001467bb8 R11: ffff88842e8e2400 R12: ffffea000464dd80
Jun 8 20:43:27 Nasgard kernel: R13: ffff88810005c480 R14: 0000000000000b47 R15: 00000000ffffffe5
Jun 8 20:43:27 Nasgard kernel: ? xas_descend+0x2a/0x49
Jun 8 20:43:27 Nasgard kernel: xas_load+0x2d/0x39
Jun 8 20:43:27 Nasgard kernel: find_get_entry+0x57/0xba
Jun 8 20:43:27 Nasgard kernel: find_lock_entry+0x15/0x4a
Jun 8 20:43:27 Nasgard kernel: shmem_getpage_gfp.isra.0+0xf6/0x543
Jun 8 20:43:27 Nasgard kernel: shmem_getpage+0x12/0x14
Jun 8 20:43:27 Nasgard kernel: shmem_file_read_iter+0xaa/0x22f
Jun 8 20:43:27 Nasgard kernel: generic_file_splice_read+0xf0/0x15e
Jun 8 20:43:27 Nasgard kernel: splice_direct_to_actor+0xe4/0x1cd
Jun 8 20:43:27 Nasgard kernel: ? generic_file_splice_read+0x15e/0x15e
Jun 8 20:43:27 Nasgard kernel: do_splice_direct+0x94/0xbd
Jun 8 20:43:27 Nasgard kernel: do_sendfile+0x185/0x24f
Jun 8 20:43:27 Nasgard kernel: __do_sys_sendfile64+0x81/0xa7
Jun 8 20:43:27 Nasgard kernel: do_syscall_64+0x5d/0x6a
Jun 8 20:43:27 Nasgard kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 8 20:43:27 Nasgard kernel: RIP: 0033:0x951dca
Full syslog attached.
Attachment: syslog
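A trace like the one above is easier to compare against other reports once it is reduced to a list of function names. The following is a rough sketch, assuming syslog lines shaped like the ones in this post (timestamp, hostname, `kernel:`, message). The names `extract_frames`, `KERNEL_LINE`, and `FRAME` are made up for this example. Frames the kernel prefixes with `?` are ones it is not sure belong to the real call chain, so they are flagged as unreliable.

```python
import re

# Matches a syslog line of the form "Jun 8 20:43:27 Nasgard kernel: <message>".
KERNEL_LINE = re.compile(r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+kernel:\s?(.*)$")
# Matches a stack frame such as "xas_load+0x2d/0x39" or "? xas_descend+0x2a/0x49".
FRAME = re.compile(r"^(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+$")


def extract_frames(lines):
    """Return (function_name, is_reliable) for each frame after 'Call Trace:'."""
    frames = []
    in_trace = False
    for line in lines:
        match = KERNEL_LINE.match(line.strip())
        if not match:
            continue  # not a kernel message (e.g. emhttpd lines)
        message = match.group(1).strip()
        if message == "Call Trace:":
            in_trace = True
            continue
        if in_trace:
            frame = FRAME.match(message)
            if frame:
                # A leading "?" means the unwinder could not confirm this frame.
                frames.append((frame.group(2), frame.group(1) is None))
    return frames
```

Register dumps, the `RIP:` line, and the `<IRQ>` markers do not match the frame pattern, so they are skipped rather than parsed.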
SpeedyRS2 Posted June 9, 2021
I have had the same problems since I upgraded from 6.8.3 to 6.9.2: constant crashes (every day). The syslog was not helpful. Yesterday I downgraded back to 6.8.3. I need Unraid and Docker for my smart home, so stability is very important here.
omartian Posted June 9, 2021
18 minutes ago, SpeedyRS2 said: I have had the same problems since I upgraded from 6.8.3 to 6.9.2: constant crashes (every day). The syslog was not helpful. Yesterday I downgraded back to 6.8.3. I need Unraid and Docker for my smart home, so stability is very important here.
I just downgraded to 6.8.3. Oddly enough, it unassigned my cache drives, but they're back up now. After a bit of snooping, I'm seeing similar kernel errors reported on 6.9.2. I'm running a parity check now and will see how it goes. Let me know if you find an update that works for you.
SpeedyRS2 Posted June 9, 2021
18 minutes ago, omartian said: I just downgraded to 6.8.3. Oddly enough, it unassigned my cache drives, but they're back up now. After a bit of snooping, I'm seeing similar kernel errors reported on 6.9.2. I'm running a parity check now and will see how it goes. Let me know if you find an update that works for you.
My cache drive was also not mounted. All my Docker containers were gone. At that moment I almost had a heart attack. But then I realized that the cache drive was unassigned. After assigning it, everything was fine. In any case, I will not make any more attempts to update the system at the moment. Stability is too important for me.
JorgeB Posted June 9, 2021
The logged call traces look more hardware related, not like the macvlan/nf_nat crashes seen with 6.9.x.
omartian Posted June 9, 2021
A CPU issue? How do I determine which hardware it is?
JorgeB Posted June 9, 2021
1 hour ago, omartian said: A CPU issue? How do I determine which hardware it is?
Difficult to say without starting to swap things around. I would start with the board/CPU.
Staite Posted June 10, 2021
I had similar crashes on my new server, one after six days, then another after nine days of running. I disabled all my Docker containers that used the br0 network, and it has now been running for almost 17 days without a crash.
JorgeB Posted June 10, 2021
1 hour ago, Staite said: I had similar crashes
Those are not similar. They mention nf_conntrack, and those are typical of the macvlan-related crashes with v6.9.2.
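The distinction JorgeB draws here comes down to which symbols show up in the call trace. As a minimal sketch of that check, the function below labels a trace by whether it mentions any of the markers named in this thread (macvlan, nf_nat, nf_conntrack). The names `classify_trace` and `MACVLAN_MARKERS` are invented for this example, and a plain substring match is only a first-pass filter, not a diagnosis.

```python
# Markers taken from the replies in this thread; the list is not exhaustive.
MACVLAN_MARKERS = ("macvlan", "nf_nat", "nf_conntrack")


def classify_trace(trace_text):
    """Return ("macvlan-like", hits) if any marker appears, else ("other", [])."""
    lowered = trace_text.lower()
    hits = [marker for marker in MACVLAN_MARKERS if marker in lowered]
    if hits:
        return ("macvlan-like", hits)
    return ("other", [])


# Frames copied from the trace posted earlier in this thread:
label, hits = classify_trace("xas_load+0x2d/0x39\nfind_get_entry+0x57/0xba")
print(label)  # no marker appears in these two frames, so this prints "other"
```

A result of "other" only says the known networking markers are absent. It does not by itself prove a hardware fault.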
Staite Posted June 10, 2021
Thanks for clarifying that. I'm pretty new to crashes that don't show a blue screen.
SpeedyRS2 Posted June 11, 2021
16 hours ago, Staite said: I had similar crashes on my new server, one after six days, then another after nine days of running. I disabled all my Docker containers that used the br0 network, and it has now been running for almost 17 days without a crash.
I have had similar experiences. I had the macvlan crashes with the "CALL TRACE" in the syslog, crashing every day since I was on 6.9.2 (coming directly from 6.8.3). I removed all fixed IP assignments from the Docker containers on the br0 network. Then there was no crash for over a week. But suddenly the crashes were back, every day. I had no idea what the reason was. There was no more CALL TRACE in the log. That was when I downgraded back to 6.8.3. Since that downgrade, Unraid has been stable again. So I don't think that these crashes are hardware related.
omartian Posted June 11, 2021
57 minutes ago, SpeedyRS2 said: I have had similar experiences. I had the macvlan crashes with the "CALL TRACE" in the syslog, crashing every day since I was on 6.9.2 (coming directly from 6.8.3). I removed all fixed IP assignments from the Docker containers on the br0 network. Then there was no crash for over a week. But suddenly the crashes were back, every day. I had no idea what the reason was. There was no more CALL TRACE in the log. That was when I downgraded back to 6.8.3. Since that downgrade, Unraid has been stable again. So I don't think that these crashes are hardware related.
I'm hoping that's the case for me. I'm currently running a parity check with zero errors so far, and the system has not crashed in the last 36 hours. Fingers crossed.
omartian Posted June 15, 2021
On 6/11/2021 at 1:47 AM, SpeedyRS2 said: I have had similar experiences. I had the macvlan crashes with the "CALL TRACE" in the syslog, crashing every day since I was on 6.9.2 (coming directly from 6.8.3). I removed all fixed IP assignments from the Docker containers on the br0 network. Then there was no crash for over a week. But suddenly the crashes were back, every day. I had no idea what the reason was. There was no more CALL TRACE in the log. That was when I downgraded back to 6.8.3. Since that downgrade, Unraid has been stable again. So I don't think that these crashes are hardware related.
The server has been up for over 4 days without crashing. I'm hoping the cause was the latest update. Let me know if you have success on a future update.