
Random Crashes



Posted

Hello, sorry to bug you guys. I have had two instances in the last couple of weeks where unRAID crashed at random. I cannot wake the local screen, and I can't access any of my Docker containers, as they have crashed too. I know this is a vague issue, but I was hoping someone here might notice something in the diagnostics. Thanks for any help.

voltznet-1-diagnostics-20240528-0741.zip

  • 4 weeks later...
Posted
On 5/28/2024 at 9:13 AM, JorgeB said:

You can enable the syslog server and post that after a crash, in case there's something logged there.

I'm continuing to have random crashes. These logs are all from this evening. The system went completely unresponsive, although it was still powered on. I tried pinging it and got no response. It just happened again after the system had been running for roughly an hour.

voltznet-1-diagnostics-20240621-2022.zip syslog syslog-previous

Posted

There are multiple call traces and segfaults, and lots of these:

Jun 21 21:00:35 VoltzNet-1 kernel: mce: [Hardware Error]: Machine check events logged
Jun 21 21:02:09 VoltzNet-1 kernel: mce: [Hardware Error]: Machine check events logged
Jun 21 21:04:47 VoltzNet-1 kernel: mce: [Hardware Error]: Machine check events logged
Jun 21 21:05:50 VoltzNet-1 kernel: mce: [Hardware Error]: Machine check events logged
Jun 21 21:08:27 VoltzNet-1 kernel: mce: [Hardware Error]: Machine check events logged

So it looks like a hardware issue; start by running memtest.
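
To get a rough idea of how often these machine check lines show up in a saved syslog, something like the following Python sketch could help. It is only an illustration: the function names are made up for this example, the pattern is based solely on the five lines quoted above, and the year is an assumption because classic syslog timestamps do not include one.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines shaped like the ones quoted above, for example:
# Jun 21 21:00:35 VoltzNet-1 kernel: mce: [Hardware Error]: Machine check events logged
MCE_LINE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"kernel: mce: \[Hardware Error\]"
)


def find_mce_times(lines, year=2024):
    """Return a datetime for every mce hardware-error line.

    The year is NOT in the log line; it is an assumption passed by the caller.
    Month names are parsed with %b, which assumes an English/C locale.
    """
    times = []
    for line in lines:
        match = MCE_LINE.match(line)
        if match:
            # Collapse the padding syslog uses for single-digit days ("Jun  1").
            stamp = " ".join(match.group("stamp").split())
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def events_per_hour(times):
    """Count events per clock hour, keyed like 'Jun 21 21:00'."""
    return Counter(t.strftime("%b %d %H:00") for t in times)


def gaps_in_seconds(times):
    """Seconds between consecutive events, in log order."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# The five lines quoted in this post, used as sample input.
SAMPLE = [
    "Jun 21 21:00:35 VoltzNet-1 kernel: mce: [Hardware Error]: Machine check events logged",
    "Jun 21 21:02:09 VoltzNet-1 kernel: mce: [Hardware Error]: Machine check events logged",
    "Jun 21 21:04:47 VoltzNet-1 kernel: mce: [Hardware Error]: Machine check events logged",
    "Jun 21 21:05:50 VoltzNet-1 kernel: mce: [Hardware Error]: Machine check events logged",
    "Jun 21 21:08:27 VoltzNet-1 kernel: mce: [Hardware Error]: Machine check events logged",
]

if __name__ == "__main__":
    sample_times = find_mce_times(SAMPLE)
    print(events_per_hour(sample_times))
    print(gaps_in_seconds(sample_times))
```

For a real log, the list of lines could be replaced with an open file handle for the saved syslog. The count only shows how frequent the events are; it does not say which component is at fault, so memtest is still the first step.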

Posted
18 hours ago, JorgeB said:

So it looks like a hardware issue; start by running memtest.

Memtest failed on the 2nd pass (2 bits) at 92 GB out of 128 GB. I reseated the DIMMs and noticed the temps were a bit high on the VRM and chipset, so I also added a couple of fans. I reran memtest for 4 passes and it passed. I'm thinking some temperature issues may be at play. I will report back if the crashes continue.

