
Help please - server keeps powering itself off.



Twice in the last 3 or 4 days I've gone to access my Unraid box (once to open a network share, the other time to load Plex content) and found it powered off without my intervention.

We haven't had any issues with other devices, so I don't believe it's due to power outages. I've had another PC on the same electrical circuit running all day unaffected.

 

I looked through the diagnostics export but can't see any sign of errors or a crash.

I've attached the file in case anyone can spot something I haven't. Cheers in advance.

 

Can anyone think of any reason why it would be turning itself off inexplicably?

olympus-diagnostics-20230527-1911.zip

Edited by EwanL
removed duplicate attachments
3 weeks later...

I'm a bit confused about what the syslog should look like.

 

I think I've configured it to save persistent logs to one of my shares, but again I can't see anything obvious as to why the server is shutting down.
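One way to check a persisted syslog for trouble is to count kernel messages of the kind that often show up around a hang, such as "Call Trace:" blocks and RCU stall warnings. Below is a minimal Python sketch, assuming the syslog is a plain-text file with one entry per line. The script, the function name scan_syslog and the pattern list are hypothetical illustrations rather than an Unraid tool, and the patterns only cover the message formats quoted in this thread.

```python
import re
import sys
from collections import Counter

# Hypothetical pattern list, based only on the kernel messages quoted in this
# thread. It is not exhaustive; add patterns for other messages you care about.
PATTERNS = {
    "call_trace": re.compile(r"kernel: Call Trace:"),
    "rcu_stall": re.compile(r"rcu: INFO: rcu_\w+ detected stalls"),
    "kthread_starved": re.compile(r"rcu: rcu_\w+ kthread starved"),
}

# A few lines copied from the log excerpt in this thread, used when no file is given.
SAMPLE = [
    "Jun 11 01:20:52 Olympus kernel: Call Trace:",
    "Jun 11 01:25:02 Olympus kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:",
    "Jun 11 01:25:02 Olympus kernel: rcu: rcu_preempt kthread starved for 3370620 jiffies! "
    "g2244009 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=14",
]


def scan_syslog(lines):
    """Count pattern matches overall and per day.

    The day is taken from the first two whitespace-separated fields of a
    classic syslog line (e.g. "Jun 11"); other timestamp formats are not handled.
    """
    totals = Counter()
    per_day = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                day = " ".join(line.split()[:2])
                totals[name] += 1
                per_day[(day, name)] += 1
    return totals, per_day


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # errors="replace" keeps the scan going if the log contains odd bytes.
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            totals, per_day = scan_syslog(fh)
    else:
        totals, per_day = scan_syslog(SAMPLE)
    for name, count in sorted(totals.items()):
        print(f"{name}: {count}")
    for (day, name), count in sorted(per_day.items()):
        print(f"{day}  {name}: {count}")
```

Saved as, say, scan_syslog.py (a hypothetical file name), it would be run as "python scan_syslog.py /path/to/syslog"; a sudden jump in the per-day counts can help narrow down when the trouble started.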

 

I have replaced the PSU with a brand new one. The server is no longer powering off - but it does randomly become unresponsive on the network. 

 

 

syslog-1686494658 syslog syslog-1686532690

3 minutes ago, EwanL said:

but it does randomly become unresponsive on the network. 

You have a lot of call traces in your syslog.  These usually indicate some sort of hardware issue.  When enough call traces pile up, they eventually result in the server becoming unresponsive.

 

I'm not expert enough to tell you what is causing these call traces (perhaps someone else can point you in the right direction), but there are a lot of them.

Jun 11 01:20:52 Olympus kernel: Call Trace:
Jun 11 01:20:52 Olympus kernel: <TASK>
Jun 11 01:20:52 Olympus kernel: __schedule+0x596/0x5f6
Jun 11 01:20:52 Olympus kernel: ? _raw_spin_unlock_irqrestore+0x24/0x3a
Jun 11 01:20:52 Olympus kernel: ? __mod_timer+0x207/0x232
Jun 11 01:20:52 Olympus kernel: ? rcu_gp_init+0x460/0x460
Jun 11 01:20:52 Olympus kernel: schedule+0x8e/0xc3
Jun 11 01:20:52 Olympus kernel: schedule_timeout+0x9d/0xd7
Jun 11 01:20:52 Olympus kernel: ? __bpf_trace_tick_stop+0x9/0x9
Jun 11 01:20:52 Olympus kernel: rcu_gp_fqs_loop+0xed/0x351
Jun 11 01:20:52 Olympus kernel: rcu_gp_kthread+0x131/0x14d
Jun 11 01:20:52 Olympus kernel: kthread+0xe7/0xef
Jun 11 01:20:52 Olympus kernel: ? kthread_complete_and_exit+0x1b/0x1b
Jun 11 01:20:52 Olympus kernel: ret_from_fork+0x22/0x30
Jun 11 01:20:52 Olympus kernel: </TASK>
Jun 11 01:20:52 Olympus kernel: rcu: Stack dump where RCU GP kthread last ran:
Jun 11 01:20:52 Olympus kernel: Sending NMI from CPU 10 to CPUs 14:
Jun 11 01:25:02 Olympus kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Jun 11 01:25:02 Olympus kernel: rcu: 	4-...!: (0 ticks this GP) idle=a66/0/0x0 softirq=859477/859477 fqs=0  (false positive?)
Jun 11 01:25:02 Olympus kernel: rcu: 	6-...!: (1 GPs behind) idle=49c/0/0x0 softirq=834876/834877 fqs=0  (false positive?)
Jun 11 01:25:02 Olympus kernel: rcu: 	7-...!: (6 GPs behind) idle=604/0/0x0 softirq=869619/869619 fqs=0  (false positive?)
Jun 11 01:25:02 Olympus kernel: rcu: 	12-...!: (24 GPs behind) idle=616/0/0x0 softirq=795406/795407 fqs=0  (false positive?)
Jun 11 01:25:02 Olympus kernel: rcu: 	14-...!: (0 ticks this GP) idle=09d/1/0x4000000000000000 softirq=800399/800400 fqs=0 
Jun 11 01:25:02 Olympus kernel: rcu: 	15-...!: (1 GPs behind) idle=c40/0/0x0 softirq=837297/837299 fqs=0  (false positive?)
Jun 11 01:25:02 Olympus kernel: 	(detected by 3, t=3310590 jiffies, g=2244009, q=685778 ncpus=16)
Jun 11 01:25:02 Olympus kernel: Sending NMI from CPU 3 to CPUs 4:
Jun 11 01:25:02 Olympus kernel: Sending NMI from CPU 3 to CPUs 6:
Jun 11 01:25:02 Olympus kernel: Sending NMI from CPU 3 to CPUs 7:
Jun 11 01:25:02 Olympus kernel: Sending NMI from CPU 3 to CPUs 12:
Jun 11 01:25:02 Olympus kernel: Sending NMI from CPU 3 to CPUs 14:
Jun 11 01:25:02 Olympus kernel: Sending NMI from CPU 3 to CPUs 15:
Jun 11 01:25:02 Olympus kernel: rcu: rcu_preempt kthread timer wakeup didn't happen for 3370617 jiffies! g2244009 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200
Jun 11 01:25:02 Olympus kernel: rcu: 	Possible timer handling issue on cpu=14 timer-softirq=55127
Jun 11 01:25:02 Olympus kernel: rcu: rcu_preempt kthread starved for 3370620 jiffies! g2244009 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=14
Jun 11 01:25:02 Olympus kernel: rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
Jun 11 01:25:02 Olympus kernel: rcu: RCU grace-period kthread stack dump:
Jun 11 01:25:02 Olympus kernel: task:rcu_preempt     state:R stack:    0 pid:   15 ppid:     2 flags:0x00004000
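The stall report above lists each stalled CPU on its own line, in forms like "4-...!: (0 ticks this GP)" or "6-...!: (1 GPs behind)". If the persisted logs contain many such reports, a small script can tally which CPUs keep appearing and how far behind they fell. This is a minimal Python sketch, assuming plain-text syslog lines in the format quoted here; the regular expression and the function name stalled_cpus are hypothetical, and the kernel's exact wording may differ between versions.

```python
import re
import sys
from collections import Counter

# Matches the per-CPU lines of an RCU stall report as they appear in this
# thread, e.g. "rcu: \t6-...!: (1 GPs behind) idle=49c/0/0x0 ...".
# Group 1 is the CPU number; group 2 is the "GPs behind" count, if present.
# The flag characters after the CPU number vary, so they are matched loosely.
STALL_CPU = re.compile(r"rcu:\s+(\d+)-\S*:\s+\((?:(\d+) GPs behind|[^)]*)\)")

# The per-CPU lines copied from the stall report in this thread.
SAMPLE = [
    "Jun 11 01:25:02 Olympus kernel: rcu: \t4-...!: (0 ticks this GP) idle=a66/0/0x0 softirq=859477/859477 fqs=0  (false positive?)",
    "Jun 11 01:25:02 Olympus kernel: rcu: \t6-...!: (1 GPs behind) idle=49c/0/0x0 softirq=834876/834877 fqs=0  (false positive?)",
    "Jun 11 01:25:02 Olympus kernel: rcu: \t7-...!: (6 GPs behind) idle=604/0/0x0 softirq=869619/869619 fqs=0  (false positive?)",
    "Jun 11 01:25:02 Olympus kernel: rcu: \t12-...!: (24 GPs behind) idle=616/0/0x0 softirq=795406/795407 fqs=0  (false positive?)",
    "Jun 11 01:25:02 Olympus kernel: rcu: \t14-...!: (0 ticks this GP) idle=09d/1/0x4000000000000000 softirq=800399/800400 fqs=0 ",
    "Jun 11 01:25:02 Olympus kernel: rcu: \t15-...!: (1 GPs behind) idle=c40/0/0x0 softirq=837297/837299 fqs=0  (false positive?)",
]


def stalled_cpus(lines):
    """Return (times each CPU appears in a stall line, max 'GPs behind' per CPU)."""
    seen = Counter()
    max_behind = {}
    for line in lines:
        m = STALL_CPU.search(line)
        if not m:
            continue
        cpu = int(m.group(1))
        # Lines without a "GPs behind" field (e.g. "0 ticks this GP") count as 0.
        behind = int(m.group(2)) if m.group(2) else 0
        seen[cpu] += 1
        max_behind[cpu] = max(max_behind.get(cpu, 0), behind)
    return seen, max_behind


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            seen, max_behind = stalled_cpus(fh)
    else:
        seen, max_behind = stalled_cpus(SAMPLE)
    for cpu in sorted(seen):
        print(f"CPU {cpu}: in {seen[cpu]} stall line(s), max {max_behind[cpu]} GPs behind")
```

On the six lines copied from this report it lists CPUs 4, 6, 7, 12, 14 and 15, with CPU 12 the furthest behind (24 grace periods). Run over a whole persisted syslog, it shows whether the same CPUs stall every time or whether the stalls move around.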

 

 
