Server locking up every ~30 hours with no syslog entries


Solved by Vr2Io


I have been having various issues with Unraid for a little while now and have been trying to hunt down the cause. I have tried to rule out hardware by replacing the motherboard and swapping the GPUs, the CPU, and the RAM.

 

I thought my 10GbE network card might be causing the issues, but removing it made no difference. I can't figure out what is causing this.

 

A while ago I tried to set up virtual machines with my Nvidia GPU, but I gave up and reverted everything to the way it was (I think). I'm not sure whether this could be contributing to the issue.

 

 

Despite all of this, I am still having issues. The server will either lock up completely or kernel panic and reboot, usually a few hours after a parity check completes.


The issue still occurs when the server is in safe mode with all plugins disabled and a minimal set of Docker containers running. I previously tried uninstalling potentially problematic plugins (Nvidia driver, Tips and Tweaks, etc.), but that made no difference.

 

I only very rarely get anything in the syslog when the server crashes. The last kernel panic that was logged was on Feb 5 2022; I had another crash today at 10:50am, but nothing was saved in the log. The most recent logged crash is as follows:

 

Feb  5 05:49:32 Unraid kernel: <IRQ>
Feb  5 05:49:32 Unraid kernel: queued_spin_lock_slowpath+0x7/0xa
Feb  5 05:49:32 Unraid kernel: nr_blockdev_pages+0x1d/0x6d
Feb  5 05:49:32 Unraid kernel: si_meminfo+0x3f/0x5c
Feb  5 05:49:32 Unraid kernel: Sys_MemInfo+0x25/0xa0 [corefreqk]
Feb  5 05:49:32 Unraid kernel: ? tick_sched_do_timer+0x3e/0x3e
Feb  5 05:49:32 Unraid kernel: ? update_cfs_rq_load_avg+0x117/0x125
Feb  5 05:49:32 Unraid kernel: ? Sys_DumpTask+0xed/0xf5 [corefreqk]
Feb  5 05:49:32 Unraid kernel: Cycle_AMD_Family_17h+0x321/0x58b [corefreqk]
Feb  5 05:49:32 Unraid kernel: ? SoC_RAPL+0x63/0x63 [corefreqk]
Feb  5 05:49:32 Unraid kernel: ? SoC_RAPL+0x63/0x63 [corefreqk]
Feb  5 05:49:32 Unraid kernel: Entry_AMD_F17h+0xb1/0xdd [corefreqk]
Feb  5 05:49:32 Unraid kernel: ? Cycle_AMD_F17h+0x18/0x18 [corefreqk]
Feb  5 05:49:32 Unraid kernel: __hrtimer_run_queues+0xfa/0x18a
Feb  5 05:49:32 Unraid kernel: hrtimer_interrupt+0x92/0x160
Feb  5 05:49:32 Unraid kernel: __sysvec_apic_timer_interrupt+0x99/0xdb
Feb  5 05:49:32 Unraid kernel: sysvec_apic_timer_interrupt+0x61/0x7d
Feb  5 05:49:32 Unraid kernel: </IRQ>
Feb  5 05:49:32 Unraid kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Feb  5 05:49:32 Unraid kernel: RIP: 0010:nr_blockdev_pages+0x48/0x6d
Feb  5 05:49:32 Unraid kernel: Code: 8b 3d 7a 98 fa 00 48 8b 87 48 05 00 00 48 8d 97 48 05 00 00 48 2d 10 01 00 00 48 8d 88 10 01 00 00 48 39 d1 74 17 48 8b 48 30 <48> 8b 80 10 01 00 00 4c 03 61 58 48 2d 10 01 00 00 eb dd 48 81 c7
Feb  5 05:49:32 Unraid kernel: RSP: 0018:ffffc900032abe68 EFLAGS: 00000206
Feb  5 05:49:32 Unraid kernel: RAX: ffff888133a30398 RBX: ffffc900032abeb8 RCX: ffff888133a30508
Feb  5 05:49:32 Unraid kernel: RDX: ffff88810007a548 RSI: 000000000004b580 RDI: ffff88810007a000
Feb  5 05:49:32 Unraid kernel: RBP: ffffc900032abea8 R08: 00000000003d78dd R09: 0000000000000000
Feb  5 05:49:32 Unraid kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000008af
Feb  5 05:49:32 Unraid kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Feb  5 05:49:32 Unraid kernel: ? nr_blockdev_pages+0x1d/0x6d
Feb  5 05:49:32 Unraid kernel: si_meminfo+0x3f/0x5c
Feb  5 05:49:32 Unraid kernel: do_sysinfo.isra.0+0x9a/0x131
Feb  5 05:49:32 Unraid kernel: __do_sys_sysinfo+0x20/0x55
Feb  5 05:49:32 Unraid kernel: do_syscall_64+0x83/0xa5
Feb  5 05:49:32 Unraid kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Feb  5 05:49:32 Unraid kernel: RIP: 0033:0x147546b45367
Feb  5 05:49:32 Unraid kernel: Code: f0 ff ff 73 01 c3 48 8b 0d fe 8a 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 63 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d1 8a 0c 00 f7 d8 64 89 01 48
Feb  5 05:49:32 Unraid kernel: RSP: 002b:00007ffc1724d148 EFLAGS: 00000206 ORIG_RAX: 0000000000000063
Feb  5 05:49:32 Unraid kernel: RAX: ffffffffffffffda RBX: 0000000000490f40 RCX: 0000147546b45367
Feb  5 05:49:32 Unraid kernel: RDX: 0000147546bd0024 RSI: 000000000000004c RDI: 00007ffc1724d150
Feb  5 05:49:32 Unraid kernel: RBP: 00007ffc1724d270 R08: 0000000000000000 R09: fffffffffffff800
Feb  5 05:49:32 Unraid kernel: R10: 0000147546a487f8 R11: 0000000000000206 R12: 0000000000000000
Feb  5 05:49:32 Unraid kernel: R13: 00000000004e350c R14: 0000000000000030 R15: 00000000000004f0
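Several frames in the trace above carry a module tag in square brackets (here `[corefreqk]`), which marks code running from a loadable kernel module rather than the core kernel. As a rough aid for reading traces like this, the sketch below counts how many call-trace frames belong to each module. It is only an illustration: the function name `modules_in_trace` and the regular expression are my own assumptions about the syslog layout shown above, and a module appearing in a trace does not by itself prove that it caused the crash.

```python
import re
from collections import Counter

# Matches a call-trace frame as it appears in the syslog excerpt, e.g.
#   "... kernel: Sys_MemInfo+0x25/0xa0 [corefreqk]"
#   "... kernel: ? tick_sched_do_timer+0x3e/0x3e"
# A leading "? " marks a frame the unwinder was not sure about.
FRAME_RE = re.compile(
    r"kernel: (?:\? )?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?\s*$"
)


def modules_in_trace(lines):
    """Count call-trace frames per kernel module.

    Frames without a [module] tag are counted under "(core kernel)".
    Register dumps, RIP lines and <IRQ> markers do not match and are skipped.
    """
    counts = Counter()
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            counts[match.group("module") or "(core kernel)"] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the trace above.
    sample = [
        "Feb  5 05:49:32 Unraid kernel: <IRQ>",
        "Feb  5 05:49:32 Unraid kernel: si_meminfo+0x3f/0x5c",
        "Feb  5 05:49:32 Unraid kernel: Sys_MemInfo+0x25/0xa0 [corefreqk]",
        "Feb  5 05:49:32 Unraid kernel: Cycle_AMD_Family_17h+0x321/0x58b [corefreqk]",
        "Feb  5 05:49:32 Unraid kernel: RIP: 0010:nr_blockdev_pages+0x48/0x6d",
    ]
    for module, frames in modules_in_trace(sample).most_common():
        print(f"{module}: {frames} frame(s)")
```

To run it against a saved log, pass the lines of the file instead of `sample`, for example `modules_in_trace(open(path))` where `path` is wherever the syslog copy lives.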

 

 

This panic happened a few times before the server locked up completely. I believe I have the kernel set to reboot automatically when it encounters an error (since I am usually away from the server during the day), but it very rarely works.
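For anyone wanting to check the reboot-on-error settings mentioned above, here is a minimal sketch that reads the relevant Linux sysctls from `/proc/sys/kernel`. The helper name `read_panic_settings` is made up for illustration. `kernel.panic` is the number of seconds the kernel waits before rebooting after a panic (0 means it does not reboot), and the other entries control whether an oops, a soft or hard lockup, or a hung task is escalated to a panic. Some of these files may not exist, depending on how the kernel was built, so missing entries are reported as unavailable rather than treated as errors.

```python
from pathlib import Path

# Sysctls that influence whether the kernel panics and then reboots.
# Not every kernel exposes all of them.
SETTINGS = (
    "panic",             # seconds to wait before rebooting after a panic; 0 = no reboot
    "panic_on_oops",     # 1 = turn a kernel oops into a panic
    "softlockup_panic",  # 1 = panic when a soft lockup is detected
    "hardlockup_panic",  # 1 = panic when a hard lockup is detected
    "hung_task_panic",   # 1 = panic when a hung task is detected
)


def read_panic_settings(root="/proc/sys/kernel"):
    """Return {setting name: integer value, or None if it cannot be read}."""
    result = {}
    for name in SETTINGS:
        try:
            result[name] = int((Path(root) / name).read_text().strip())
        except (OSError, ValueError):
            result[name] = None
    return result


if __name__ == "__main__":
    for name, value in read_panic_settings().items():
        shown = "unavailable" if value is None else value
        print(f"kernel.{name} = {shown}")
```

If `kernel.panic` is 0, or the lockup-related entries are 0, a hang that never becomes a panic would not trigger an automatic reboot, which may be one reason the reboot only works occasionally.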

 

Please let me know if you need any more information! Any help is very much appreciated!

unraid-diagnostics-20220207-1850.zip
