boxer74 Posted August 7, 2018

Last night my server crashed. Docker containers non-responsive, no local terminal, no web UI. I did a hard reset, and during the subsequent parity sync I'm getting call traces. While these occur, my Windows 10 VM is unresponsive. The rest of the server seems to work fine. Any thoughts as to why this would be happening all of a sudden?

Quote

Aug 7 11:56:46 unRAID kernel: INFO: rcu_sched self-detected stall on CPU
Aug 7 11:56:46 unRAID kernel: 29-...: (60000 ticks this GP) idle=17e/140000000000001/0 softirq=1555787/1555787 fqs=14876
Aug 7 11:56:46 unRAID kernel: (t=60001 jiffies g=1556548 c=1556547 q=27686)
Aug 7 11:56:46 unRAID kernel: NMI backtrace for cpu 29
Aug 7 11:56:46 unRAID kernel: CPU: 29 PID: 5876 Comm: unraidd Not tainted 4.14.49-unRAID #1
Aug 7 11:56:46 unRAID kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EP2C602, BIOS P1.80 12/09/2013
Aug 7 11:56:46 unRAID kernel: Call Trace:
Aug 7 11:56:46 unRAID kernel: <IRQ>
Aug 7 11:56:46 unRAID kernel: dump_stack+0x5d/0x79
Aug 7 11:56:46 unRAID kernel: nmi_cpu_backtrace+0x9b/0xba
Aug 7 11:56:46 unRAID kernel: ? irq_force_complete_move+0xf3/0xf3
Aug 7 11:56:46 unRAID kernel: nmi_trigger_cpumask_backtrace+0x56/0xd4
Aug 7 11:56:46 unRAID kernel: rcu_dump_cpu_stacks+0x8e/0xb8
Aug 7 11:56:46 unRAID kernel: rcu_check_callbacks+0x212/0x5f0
Aug 7 11:56:46 unRAID kernel: update_process_times+0x23/0x45
Aug 7 11:56:46 unRAID kernel: tick_sched_timer+0x33/0x61
Aug 7 11:56:46 unRAID kernel: __hrtimer_run_queues+0x78/0xc1
Aug 7 11:56:46 unRAID kernel: hrtimer_interrupt+0x87/0x157
Aug 7 11:56:46 unRAID kernel: smp_apic_timer_interrupt+0x75/0x85
Aug 7 11:56:46 unRAID kernel: apic_timer_interrupt+0x7d/0x90
Aug 7 11:56:46 unRAID kernel: </IRQ>
Aug 7 11:56:46 unRAID kernel: RIP: 0010:memcmp+0x7/0x1d
Aug 7 11:56:46 unRAID kernel: RSP: 0018:ffffc9000718bcd0 EFLAGS: 00000283 ORIG_RAX: ffffffffffffff10
Aug 7 11:56:46 unRAID kernel: RAX: 0000000000000000 RBX: ffff8808482542f0 RCX: 0000000000000283
Aug 7 11:56:46 unRAID kernel: RDX: 0000000000001000 RSI: ffff88085b504000 RDI: ffff880859e0b000
Aug 7 11:56:46 unRAID kernel: RBP: 00000000ffffffff R08: 000000000000006d R09: ffff880848254358
Aug 7 11:56:46 unRAID kernel: R10: 0000000000000fd0 R11: 0000000000000ff0 R12: ffff880847940800
Aug 7 11:56:46 unRAID kernel: R13: 0000000000000001 R14: ffff880848254330 R15: 0000000000000008
Aug 7 11:56:46 unRAID kernel: check_parity+0x206/0x30b [md_mod]
Aug 7 11:56:46 unRAID kernel: ? autoremove_wake_function+0x9/0x2a
Aug 7 11:56:46 unRAID kernel: ? __wake_up_common+0xb2/0x126
Aug 7 11:56:46 unRAID kernel: handle_stripe+0xefc/0x1293 [md_mod]
Aug 7 11:56:46 unRAID kernel: unraidd+0xb8/0x111 [md_mod]
Aug 7 11:56:46 unRAID kernel: ? md_open+0x2c/0x2c [md_mod]
Aug 7 11:56:46 unRAID kernel: ? md_thread+0xbc/0xcc [md_mod]
Aug 7 11:56:46 unRAID kernel: ? handle_stripe+0x1293/0x1293 [md_mod]
Aug 7 11:56:46 unRAID kernel: md_thread+0xbc/0xcc [md_mod]
Aug 7 11:56:46 unRAID kernel: ? wait_woken+0x68/0x68
Aug 7 11:56:46 unRAID kernel: kthread+0x111/0x119
Aug 7 11:56:46 unRAID kernel: ? kthread_create_on_node+0x3a/0x3a
Aug 7 11:56:46 unRAID kernel: ? SyS_exit_group+0xb/0xb
Aug 7 11:56:46 unRAID kernel: ret_from_fork+0x35/0x40
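[Editor's note: a quick way to check whether a box is still logging these stalls is to grep the syslog for the RCU marker. A minimal sketch, with an embedded sample line standing in for the real log; on the server you would grep /var/log/syslog instead:]

```shell
# Sample stall line from the trace above; on the server, grep /var/log/syslog directly.
sample='Aug 7 11:56:46 unRAID kernel: INFO: rcu_sched self-detected stall on CPU'
# Count matching stall warnings (here it finds the one sample line).
hits=$(printf '%s\n' "$sample" | grep -c 'rcu_sched self-detected stall')
echo "$hits"   # prints 1
```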
Frank1940 Posted August 7, 2018

Post up the entire diagnostics file in your next post. Tools >>> Diagnostics
boxer74 Posted August 7, 2018

Attached. unraid-diagnostics-20180807-1653.zip
Squid Posted August 7, 2018

FYI, there are no call traces in the diagnostics. Maybe start a parity check, see if a trace shows up, and then repost the diagnostics.
boxer74 Posted August 7, 2018

Here's a proper file. unraid-diagnostics-20180807-1808.zip
Squid Posted August 7, 2018

Just for giggles, under Settings > Disk Settings, try setting Tunable (nr_requests) to 8 instead of the default of 128. IIRC, back when self-detected stalls were a big issue on unRAID, the original setting was 8 and the fix was 128. Try the reverse and see if anything improves.
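[Editor's note: for reference, the GUI setting above corresponds to the block layer's per-device queue depth in sysfs. A hedged sketch; "sda" is a placeholder device name, on a live unRAID box you would check each array member and normally change the value through the web UI rather than by hand:]

```shell
# Placeholder device name; substitute a real array disk (sdb, sdc, ...).
f=/sys/block/sda/queue/nr_requests
if [ -r "$f" ]; then
  val=$(cat "$f")   # current queue depth for this disk
else
  val="(no such device on this machine)"
fi
printf 'nr_requests: %s\n' "$val"
```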
boxer74 Posted August 7, 2018

Seems like that made it worse.
JorgeB Posted August 8, 2018

There was a user with similar issues who got rid of most of them by reducing the tunable values (not nr_requests, the md tunables). Even if that helps, though, I think it's just a workaround and something is wrong with the hardware.
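[Editor's note: the md tunables referred to above live under Settings > Disk Settings. The names below (md_num_stripes, md_sync_window, md_sync_thresh) are an assumption about the relevant knobs for unRAID of that era, and the numbers are purely illustrative, not a recommendation; verify both against your own Disk Settings page. These commands only exist on an unRAID box:]

```shell
# Current values (unRAID's custom md driver reports its tunables in mdstat):
grep -E 'md_num_stripes|md_sync_window|md_sync_thresh' /proc/mdstat

# Lower the values as a test (illustrative numbers, assumed tunable names):
mdcmd set md_num_stripes 1280
mdcmd set md_sync_window 384
```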
boxer74 Posted August 8, 2018

I set the tunables back to defaults and it seems to have stopped the call traces. How could I tell what hardware issues there may be?
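[Editor's note: the usual first passes on the hardware question are a RAM test (memtest86+ from the unRAID boot menu) and the SMART report for every disk (smartctl -a /dev/sdX; the reports are also inside the diagnostics zip). As an illustration of what to look for, this sketch pulls the raw count out of a made-up reallocated-sector line; the line itself is hypothetical:]

```shell
# Hypothetical SMART attribute line in smartctl -a output format.
line='  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0'
# The last field is the raw value; anything above 0 deserves attention.
printf '%s\n' "$line" | awk '{print $NF}'   # prints 0
```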
Archived
This topic is now archived and is closed to further replies.