Unraid Server Crashing 2x this week

omartian · June 8, 2021

Hi Everyone-

Woke up today to find that my Unraid server is unresponsive. I took a pic of what the server was sending to my monitor and attached the diagnostics.

The last time this happened, 4 days ago, it said that my 2x btrfs cache drives were unmountable and had to be reformatted. I was able to recover my cache data (Plex metadata and binhex Krusader docker data) and re-integrate it into my server. I ran a parity check that finished yesterday and didn't have any new sync errors.

When it occurred this morning, I couldn't initiate a clean shutdown because the keyboard was unresponsive and I couldn't access the server remotely. When the server came back up, it initiated a parity check, which I cancelled. This time, the cache drives are still mounted and part of the pool.

In the past week, I've run memtest for 10 passes (>24 hrs) and reseated the RAM and SATA cables. I haven't the slightest idea why this keeps happening. A little guidance would be helpful. Thank you.

Server Down.zip

JorgeB · June 8, 2021

Diags taken after rebooting are not much help for this. You can try enabling syslog mirror to flash, then post that log after a crash.

omartian · June 8, 2021

OK. Does this option write to my USB boot drive?

I added an additional USB thumb drive to my server. Local syslog is disabled, remote syslog is blank, and mirror syslog to flash is set to yes. Does that look OK?

omartian · June 8, 2021

Could this be software related? I updated to 6.9.2 last week. It looks like some weird kernel error on my monitor.

omartian · June 9, 2021

Happened again tonight. Had to power cycle the server.

This is what I see in syslog before the kernel panic:

Jun  8 19:12:30 Nasgard emhttpd: spinning down /dev/sdm
Jun  8 19:57:31 Nasgard emhttpd: spinning down /dev/sdm
Jun  8 20:42:32 Nasgard emhttpd: spinning down /dev/sdm
Jun  8 20:43:27 Nasgard kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Jun  8 20:43:27 Nasgard kernel: rcu:     3-....: (59999 ticks this GP) idle=0fe/1/0x4000000000000000 softirq=1334521/1334530 fqs=14397 
Jun  8 20:43:27 Nasgard kernel:     (t=60001 jiffies g=3697701 q=125290)
Jun  8 20:43:27 Nasgard kernel: NMI backtrace for cpu 3
Jun  8 20:43:27 Nasgard kernel: CPU: 3 PID: 24985 Comm: 7 Not tainted 5.10.28-Unraid #1
Jun  8 20:43:27 Nasgard kernel: Hardware name: Gigabyte Technology Co., Ltd. B450 AORUS PRO WIFI/B450 AORUS PRO WIFI-CF, BIOS F60e 12/09/2020
Jun  8 20:43:27 Nasgard kernel: Call Trace:
Jun  8 20:43:27 Nasgard kernel: <IRQ>
Jun  8 20:43:27 Nasgard kernel: dump_stack+0x6b/0x83
Jun  8 20:43:27 Nasgard kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Jun  8 20:43:27 Nasgard kernel: nmi_cpu_backtrace+0x7d/0x8f
Jun  8 20:43:27 Nasgard kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
Jun  8 20:43:27 Nasgard kernel: rcu_dump_cpu_stacks+0x9f/0xc6
Jun  8 20:43:27 Nasgard kernel: rcu_sched_clock_irq+0x1ec/0x543
Jun  8 20:43:27 Nasgard kernel: ? trigger_load_balance+0x5a/0x1ca
Jun  8 20:43:27 Nasgard kernel: update_process_times+0x50/0x6e
Jun  8 20:43:27 Nasgard kernel: tick_sched_timer+0x36/0x64
Jun  8 20:43:27 Nasgard kernel: __hrtimer_run_queues+0xb7/0x10b
Jun  8 20:43:27 Nasgard kernel: ? tick_sched_do_timer+0x39/0x39
Jun  8 20:43:27 Nasgard kernel: hrtimer_interrupt+0x8d/0x15b
Jun  8 20:43:27 Nasgard kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Jun  8 20:43:27 Nasgard kernel: asm_call_irq_on_stack+0x12/0x20
Jun  8 20:43:27 Nasgard kernel: </IRQ>
Jun  8 20:43:27 Nasgard kernel: sysvec_apic_timer_interrupt+0x71/0x95
Jun  8 20:43:27 Nasgard kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Jun  8 20:43:27 Nasgard kernel: RIP: 0010:xas_descend+0x45/0x49
Jun  8 20:43:27 Nasgard kernel: Code: 44 c6 08 48 89 77 18 4c 89 c7 e8 77 ff ff ff 84 c0 74 13 49 c1 e8 02 44 89 c1 45 89 c0 49 83 c0 04 4e 8b 44 c6 08 41 88 49 12 <4c> 89 c0 c3 4c 8b 4f 18 48 89 fe 41 f6 c1 03 75 6b 4d 85 c9 4c 8b
Jun  8 20:43:27 Nasgard kernel: RSP: 0018:ffffc90001467ba8 EFLAGS: 00000246
Jun  8 20:43:27 Nasgard kernel: RAX: 0000000000000000 RBX: ffff888106325370 RCX: 0000000000000007
Jun  8 20:43:27 Nasgard kernel: RDX: 0000000000000000 RSI: ffff888100686480 RDI: ffffea000464dd80
Jun  8 20:43:27 Nasgard kernel: RBP: 0000000000000b47 R08: ffffea000464dd80 R09: ffffc90001467bb8
Jun  8 20:43:27 Nasgard kernel: R10: ffffc90001467bb8 R11: ffff88842e8e2400 R12: ffffea000464dd80
Jun  8 20:43:27 Nasgard kernel: R13: ffff88810005c480 R14: 0000000000000b47 R15: 00000000ffffffe5
Jun  8 20:43:27 Nasgard kernel: ? xas_descend+0x2a/0x49
Jun  8 20:43:27 Nasgard kernel: xas_load+0x2d/0x39
Jun  8 20:43:27 Nasgard kernel: find_get_entry+0x57/0xba
Jun  8 20:43:27 Nasgard kernel: find_lock_entry+0x15/0x4a
Jun  8 20:43:27 Nasgard kernel: shmem_getpage_gfp.isra.0+0xf6/0x543
Jun  8 20:43:27 Nasgard kernel: shmem_getpage+0x12/0x14
Jun  8 20:43:27 Nasgard kernel: shmem_file_read_iter+0xaa/0x22f
Jun  8 20:43:27 Nasgard kernel: generic_file_splice_read+0xf0/0x15e
Jun  8 20:43:27 Nasgard kernel: splice_direct_to_actor+0xe4/0x1cd
Jun  8 20:43:27 Nasgard kernel: ? generic_file_splice_read+0x15e/0x15e
Jun  8 20:43:27 Nasgard kernel: do_splice_direct+0x94/0xbd
Jun  8 20:43:27 Nasgard kernel: do_sendfile+0x185/0x24f
Jun  8 20:43:27 Nasgard kernel: __do_sys_sendfile64+0x81/0xa7
Jun  8 20:43:27 Nasgard kernel: do_syscall_64+0x5d/0x6a
Jun  8 20:43:27 Nasgard kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun  8 20:43:27 Nasgard kernel: RIP: 0033:0x951dca

Full syslog attached.

syslog
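
For anyone working through a mirrored syslog like the one above, here is a rough Python sketch (not an Unraid tool) that finds the first RCU stall line and lists the functions in the call trace that follows. The name `summarize_stall`, the regular expression, and the choice to skip frames marked with `?` (which the kernel prints for stack entries it could not verify) are my own assumptions, and the sample lines are copied from the log in this post.

```python
import re

# Matches verified call-trace frames such as "kernel: xas_load+0x2d/0x39".
# Frames prefixed with "? " and register/RIP lines deliberately do not match.
FRAME_RE = re.compile(r"kernel:\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")
STALL_MARKER = "self-detected stall on CPU"


def summarize_stall(lines):
    """Return (first stall line or None, function names logged after it)."""
    stall_line = None
    frames = []
    for line in lines:
        if stall_line is None:
            # Ignore everything until the first stall announcement.
            if STALL_MARKER in line:
                stall_line = line.strip()
            continue
        match = FRAME_RE.search(line)
        if match:
            frames.append(match.group(1))
    return stall_line, frames


# A few lines copied from the syslog excerpt in this thread.
SAMPLE = [
    "Jun  8 20:42:32 Nasgard emhttpd: spinning down /dev/sdm",
    "Jun  8 20:43:27 Nasgard kernel: rcu: INFO: rcu_sched self-detected stall on CPU",
    "Jun  8 20:43:27 Nasgard kernel: dump_stack+0x6b/0x83",
    "Jun  8 20:43:27 Nasgard kernel: ? lapic_can_unplug_cpu+0x8e/0x8e",
    "Jun  8 20:43:27 Nasgard kernel: nmi_cpu_backtrace+0x7d/0x8f",
    "Jun  8 20:43:27 Nasgard kernel: RIP: 0010:xas_descend+0x45/0x49",
    "Jun  8 20:43:27 Nasgard kernel: xas_load+0x2d/0x39",
    "Jun  8 20:43:27 Nasgard kernel: shmem_getpage_gfp.isra.0+0xf6/0x543",
]

stall, functions = summarize_stall(SAMPLE)
print(stall)
print(functions)
```

To run it against a real file, pass the result of `open(path).readlines()`, where `path` is wherever your mirrored syslog was saved. Note that it collects every frame after the first stall line, so later traces in the same file end up in the same list.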

SpeedyRS2 · June 9, 2021

I have had the same problems since I upgraded from 6.8.3 to 6.9.2. Constant crashes (every day). The syslog was not helpful.
Yesterday I downgraded back to 6.8.3. I need Unraid and Docker for my smart home. Stability is very important here.

omartian · June 9, 2021

Just downgraded to 6.8.3. It unassigned my cache drives, oddly enough, but they're back up now.

After a bit of snooping, I'm seeing similar kernel errors reported on 6.9.2.

I'm running a parity check now and will see how it goes. Let me know if you find an update that works for you.

SpeedyRS2 · June 9, 2021

My cache drive was also not mounted. All my dockers were gone. At that moment I almost had a heart attack. But then I realized that the cache drive was unassigned.
After assigning it, everything was fine. In any case, I will not make any more attempts to update the system at the moment. Stability is too important for me.

JorgeB · June 9, 2021

The logged call traces look more hardware related, not like the macvlan/nf_nat crashes seen with 6.9.x.

omartian · June 9, 2021

CPU issue? How do I determine which hardware?

JorgeB · June 9, 2021

Difficult to say without starting to swap things around. I would start with the board/CPU.

Staite · June 10, 2021

I had similar crashes on my new server, one after six days and then another after nine days of running.

I disabled all my dockers that used the br0 network, and it has now been running for almost 17 days without a crash.

JorgeB · June 10, 2021

1 hour ago, Staite said:

I had similar crashes

Those are not similar. They mention nf_conntrack, and those are typical of the macvlan-related crashes with v6.9.2.
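
The distinction above can be checked mechanically, at least as a first pass. This is a minimal Python sketch, assuming the only signal is whether a trace mentions `macvlan`, `nf_nat`, or `nf_conntrack` (the names used in this thread). The helper `network_markers_in` is hypothetical, and the second sample line is made up for illustration, not taken from a real log. Finding no markers does not prove a hardware fault. It only means the trace does not look like the macvlan-related crashes.

```python
# Marker strings named in this thread as typical of the macvlan-related crashes.
NETWORK_MARKERS = ("macvlan", "nf_nat", "nf_conntrack")


def network_markers_in(trace_lines):
    """Return the sorted list of marker strings found anywhere in the trace."""
    found = {
        marker
        for line in trace_lines
        for marker in NETWORK_MARKERS
        if marker in line
    }
    return sorted(found)


# Two frames copied from the call trace posted earlier in this thread.
THREAD_TRACE = [
    "Jun  8 20:43:27 Nasgard kernel: xas_load+0x2d/0x39",
    "Jun  8 20:43:27 Nasgard kernel: shmem_getpage_gfp.isra.0+0xf6/0x543",
]

# Made-up line for illustration only; offsets are not from a real log.
MADE_UP_TRACE = ["kernel: nf_conntrack_in+0x10/0x20 [nf_conntrack]"]

print(network_markers_in(THREAD_TRACE))
print(network_markers_in(MADE_UP_TRACE))
```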

Staite · June 10, 2021

Thanks for clarifying that.

I'm pretty new to crashes that don't show a blue screen.

SpeedyRS2 · June 11, 2021

I have had similar experiences. I had the macvlan crashes with the "CALL TRACE" in the syslog, with crashes every day once I was on 6.9.2 (coming directly from 6.8.3). I removed all fixed IP assignments from the Docker containers in the br0 network.
Then there was no crash for over a week. But suddenly the crashes were back, every day. I had no idea what the reason was. There was no more CALL TRACE in the log. That was when I downgraded back to 6.8.3. Since that downgrade, Unraid is stable again.

So I don't think that these crashes are hardware related.

omartian · June 11, 2021

I'm hoping that's the case for me. I'm currently running a parity check with zero errors so far, and the system has not crashed in the last 36 hours. Fingers crossed.

omartian · June 15, 2021

Server has been up for over 4 days without crashing. Hoping the latest update was the cause. Let me know if you have success on a future update.
