September 11, 2023
Hi,
I set up my first server last spring, mainly for torrenting and running Plex. For the past 2-3 months it has been crashing randomly. I've tried to pinpoint the issue, with no success. It crashes in safe mode too. Recently, I noticed it crashes after I enable Docker. I'll need to troubleshoot to see whether it also crashes with Docker disabled.
**Edit** It crashes in Safe Mode too (after 3 minutes).
The hard drives and SSD are new as of last spring (maybe one is older, not sure). (I attached today's syslog and diagnostics.)
Things I know:
- Memtest ran flawlessly for 20 hours
- CPU stress tests with corefreq also run perfectly
- SMART tests return no errors on any drive
- CPU and RAM are not OC'd; everything runs at factory settings and frequencies
Some specs:
- OS version: 6.12.4
- MB: EVGA X58 SLI LE
- CPU: Intel Core i7 920 2.67GHz
- RAM: 6x OCZ 1333MHz 2GB DDR3
Any help would be greatly appreciated, thanks!
Attachments: syslog, tower-diagnostics-20230910-1353.zip
Edited September 11, 2023 by hathi_ndg
September 11, 2023 Community Expert
That syslog has entries from April, so I assume it is the result of running the syslog server. But then there are no further entries until after the latest reboot, so the syslog server must not have been running when it crashed recently. Get us another syslog from the syslog server after the next crash.
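Since the useful evidence is whatever the syslog server captured just before each hard reset, it can help to cut a long mirrored syslog down to those final lines. Below is a minimal Python sketch of that idea. It assumes every boot starts with a kernel line containing `kernel: Linux version` (the usual first kernel message on Linux); `read_lines` and `tails_before_boots` are hypothetical helper names, not part of Unraid.

```python
from typing import List

# Assumption: every boot begins with a kernel line containing this text
# (e.g. "Tower kernel: Linux version ...").
BOOT_MARKER = "kernel: Linux version"


def read_lines(path: str) -> List[str]:
    """Read a syslog file; undecodable bytes are replaced instead of raising."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read().splitlines()


def tails_before_boots(lines: List[str], n: int = 20) -> List[List[str]]:
    """Return, for each boot marker, up to n lines logged just before it.

    Lines are taken only from the segment that started at the previous
    boot marker (or the start of the file), so two boots never mix.
    """
    tails: List[List[str]] = []
    segment_start = 0
    for i, line in enumerate(lines):
        if BOOT_MARKER in line:
            if i > segment_start:  # something was logged before this boot
                tails.append(lines[max(segment_start, i - n):i])
            segment_start = i
    return tails
```

For example, `tails_before_boots(read_lines("syslog"))` returns one list per reboot found in the file; the last element of each list is the final message written before that reboot. If that message is ordinary activity rather than an error, the crash most likely happened too abruptly for anything to be logged.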
September 11, 2023 Author
OK, it crashed again about 5 minutes ago. Here are the new diagnostics, as well as the syslog that's written continuously to the flash drive.
Attachments: tower-diagnostics-20230910-2235.zip, syslog
September 11, 2023 Author
And it just crashed again, apparently while mounting remote shares.
Attachment: syslog
September 11, 2023 Author
--> Replaced the USB flash drive with a brand new one, transferred the key, etc.
--> Removed corefreq and the swapfile
Result: it still crashed after 3-4 minutes.
--> Now testing the RAM sticks one by one. Will update here.
** Stick 1/6: ran smoothly for 17 hours 45 minutes ✅
** Stick 2/6: 2 hours 20 minutes of smooth sailing so far ⏲️
Meanwhile, if anyone has hints from the diagnostics or syslog, lmk, thanks!!
Edited September 12, 2023 by hathi_ndg (updated RAM stick tests)
September 12, 2023
The syslog has multiple call traces that look like this:
Sep 10 22:38:04 Tower kernel: Call Trace:
Sep 10 22:38:04 Tower kernel: <IRQ>
Sep 10 22:38:04 Tower kernel: dump_stack_lvl+0x44/0x5c
Sep 10 22:38:04 Tower kernel: __report_bad_irq+0x35/0xaa
Sep 10 22:38:04 Tower kernel: note_interrupt+0x1f6/0x24d
Sep 10 22:38:04 Tower kernel: handle_irq_event_percpu+0x2c/0x35
Sep 10 22:38:04 Tower kernel: handle_irq_event+0x37/0x56
Sep 10 22:38:04 Tower kernel: handle_fasteoi_irq+0x99/0x113
Sep 10 22:38:04 Tower kernel: __common_interrupt+0x9e/0xaa
Sep 10 22:38:04 Tower kernel: common_interrupt+0x96/0xc1
Sep 10 22:38:04 Tower kernel: </IRQ>
Sep 10 22:38:04 Tower kernel: <TASK>
Sep 10 22:38:04 Tower kernel: asm_common_interrupt+0x22/0x40
Sep 10 22:38:04 Tower kernel: RIP: 0010:cpuidle_enter_state+0x11d/0x202
Sep 10 22:38:04 Tower kernel: Code: 20 22 a0 ff 45 84 ff 74 1b 9c 58 0f 1f 40 00 0f ba e0 09 73 08 0f 0b fa 0f 1f 44 00 00 31 ff e8 4c e3 a4 ff fb 0f 1f 44 00 00 <45> 85 e4 0f 88 ba 00 00 00 48 8b 04 24 49 63 cc 48 6b d1 68 49 29
Sep 10 22:38:04 Tower kernel: RSP: 0000:ffffc900000dbe98 EFLAGS: 00000282
Sep 10 22:38:04 Tower kernel: RAX: 0000000000000000 RBX: ffff888353bf6300 RCX: 0000000000000020
Sep 10 22:38:04 Tower kernel: RDX: 00000000820ed4af RSI: 000000007fffffff RDI: 00000000ffffffff
Sep 10 22:38:04 Tower kernel: RBP: 0000000000000004 R08: 0000000000000002 R09: 0000000000000002
Sep 10 22:38:04 Tower kernel: R10: 0000000000000020 R11: 0000000000000004 R12: 0000000000000004
Sep 10 22:38:04 Tower kernel: R13: ffffffff823205c0 R14: 000000010624b113 R15: 0000000000000001
Sep 10 22:38:04 Tower kernel: ? cpuidle_enter_state+0x117/0x202
Sep 10 22:38:04 Tower kernel: cpuidle_enter+0x2a/0x38
Sep 10 22:38:04 Tower kernel: do_idle+0x18d/0x1fb
Sep 10 22:38:04 Tower kernel: cpu_startup_entry+0x1d/0x1f
Sep 10 22:38:04 Tower kernel: start_secondary+0x101/0x101
Sep 10 22:38:04 Tower kernel: secondary_startup_64_no_verify+0xce/0xdb
Sep 10 22:38:04 Tower kernel: </TASK>
Sep 10 22:38:04 Tower kernel: handlers:
Sep 10 22:38:04 Tower kernel: [<(____ptrval____)>] usb_hcd_irq
Sep 10 22:38:04 Tower kernel: [<(____ptrval____)>] usb_hcd_irq
Those do not look like normal "call traces related to macvlan". But it wouldn't hurt to switch from macvlan to ipvlan anyway; see https://docs.unraid.net/unraid-os/release-notes/6.12.4/#fix-for-macvlan-call-traces
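The `__report_bad_irq` frame and the `handlers:` list suggest this is the kernel's spurious-interrupt report: when an interrupt line keeps firing without any handler claiming it, the kernel typically logs `irq N: nobody cared`, lists the handlers registered on that line (here two `usb_hcd_irq`, i.e. USB host controllers), and then disables the IRQ. To see how often this happens and which IRQ number is involved across a whole syslog, a small Python sketch like the following can tally them. The `irq N: nobody cared` message format is an assumption based on the upstream kernel and does not appear in the excerpt above; `summarize_bad_irqs` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumption: the kernel logs "irq N: nobody cared ..." before the call trace.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
# Handler lines as they appear in this syslog:
#   "Tower kernel: [<(____ptrval____)>] usb_hcd_irq"
HANDLER = re.compile(r"kernel: \[<[^>]*>\] (\S+)")


def summarize_bad_irqs(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count IRQ numbers reported as 'nobody cared' and the handlers listed."""
    irqs: Counter = Counter()
    handlers: Counter = Counter()
    in_handlers = False
    for line in lines:
        m = NOBODY_CARED.search(line)
        if m:
            irqs[int(m.group(1))] += 1
        if line.rstrip().endswith("kernel: handlers:"):
            in_handlers = True  # following lines name the registered handlers
            continue
        if in_handlers:
            h = HANDLER.search(line)
            if h:
                handlers[h.group(1)] += 1
            else:
                in_handlers = False  # first non-handler line ends the list
    return irqs, handlers
```

Usage, for instance: `with open("syslog", errors="replace") as f: print(summarize_bad_irqs(f))`. If the same IRQ number shows up after every boot, with only USB handlers attached, that points at the USB controllers sharing that line rather than at RAM or Docker, though it does not prove that this is what causes the crash.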
September 12, 2023 Author
(Quoting ljm42's post above: the call trace and the suggestion to switch from macvlan to ipvlan.)
Thanks, I'll look into that!
**Edit: done!** Let's see how this goes, along with testing my RAM sticks one by one!
Edited September 12, 2023 by hathi_ndg