
Server keeps crashing




I initially installed an M.2 Google Coral and the Coral Accelerator Module Driver plugin, but my server started crashing. Assuming that this was the issue, I removed the Google Coral and uninstalled the plugin. My server is still crashing and I have no idea why. I'm hoping someone could look at the diagnostics and let me know.

Fix Common Problems states that there is a hardware issue:

Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the Unraid forums (which is what I did).

 

 

12 hours ago, JorgeB said:

This usually suggests just that: a hardware problem. Start by running memtest.

 

Memtest has completed four passes with no issues.

 

The weird thing is that it isn't an instant crash. The server is fine for 3ish hours and then the UI just stops working. I cannot access it from the webpage and I have to manually restart the computer.


Just crashed again during a parity check, after being online for about 3 hours and 45 minutes.  I needed to manually restart my server to get it to be responsive again.

I have a notification popup that says Parity check finished (0 errors) with a duration of over 19 hours, even though the server was online for less than four hours.

I'd like to note that Fix Common Problems (and the syslog) no longer indicate that this is a hardware issue. I've attached the syslog.
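For anyone who wants to double-check a syslog for hardware-related entries rather than eyeballing it, here is a minimal Python sketch. The list of search patterns is my own assumption and is not exhaustive (machine check / `mce:` entries, "Hardware Error", segfaults, and kernel call traces); the function name `scan_lines` is just illustrative.

```python
import re

# Assumed, non-exhaustive markers worth a second look in a syslog.
# Adjust these to whatever your kernel actually logs.
PATTERNS = {
    "machine_check": re.compile(r"machine check|\bmce:", re.IGNORECASE),
    "hardware_error": re.compile(r"hardware error", re.IGNORECASE),
    "segfault": re.compile(r"segfault", re.IGNORECASE),
    "call_trace": re.compile(r"call trace", re.IGNORECASE),
}


def scan_lines(lines):
    """Return {category: [(line_number, text), ...]} for lines matching a pattern."""
    hits = {name: [] for name in PATTERNS}
    for number, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append((number, line.rstrip("\n")))
    return hits
```

To use it, open the syslog file from the diagnostics and pass the file object to `scan_lines`, then look at which categories are non-empty. An empty result only means none of these particular strings appear; it doesn't prove the hardware is fine.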

 

 


Is the RAM ECC?... Clutching at straws

 

13 hours ago, clowncracker said:

The server is fine for 3ish hours and then the UI just stops working. 

 

Again, clutching at straws... Does the server keep working while the UI stops working?

 

Hope it helps.

 

MGrey.

 

 

8 minutes ago, MrGrey said:

Is the RAM ECC?... Clutching at straws

 

 

Again, clutching at straws... Does the server keep working while the UI stops working?

 

Hope it helps.

 

MGrey.

 

 

All of the VMs and Dockers stop working. I think it just crashes, but the computer doesn't turn off.

 

Not ECC RAM. Considering it's been working for about 8 hours at this point in safe mode with no Dockers/VMs running, I'm fairly certain the hardware error was a false positive.

 

This all started when I installed the M.2 Google Coral and the driver plugin, so I think the driver plugin messed something up. Even after I uninstalled the plugin and removed the Google Coral, the issue persisted.


Seems you've got a sorta stable working mode now. This means you can experiment to see what exactly causes the error. It's gonna be a lot of effort, as you have to wait many hours each time, but you can at least start re-enabling stuff bit by bit and see how the server reacts.

 

Otherwise: maybe try to limit every Docker and VM to just one CPU core via pinning and check again. Maybe it's just one Docker going berserk and taking up 100% CPU on all cores, causing nothing else to work anymore?
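If you want to check the "one Docker going berserk" idea before pinning everything, here is a rough Python sketch that reads per-container CPU usage from `docker stats` and lists the heavy ones. The helper names and the 90% threshold are my own assumptions. Note that Docker generally reports CPU % relative to a single core, so a container using several cores can show well over 100%.

```python
import subprocess


def parse_cpu_stats(output):
    """Parse 'name<TAB>12.34%' lines into a {name: percent} dict."""
    usage = {}
    for line in output.splitlines():
        name, sep, cpu = line.partition("\t")
        if not sep:
            continue  # skip blank or unexpected lines
        try:
            usage[name.strip()] = float(cpu.strip().rstrip("%"))
        except ValueError:
            continue  # skip values that are not a percentage
    return usage


def busy_containers(usage, threshold=90.0):
    """Return (name, percent) pairs at or above threshold, busiest first."""
    heavy = [(name, pct) for name, pct in usage.items() if pct >= threshold]
    return sorted(heavy, key=lambda item: item[1], reverse=True)


def read_docker_stats():
    """Take a one-off snapshot of CPU usage for all running containers."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}}\t{{.CPUPerc}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Something like `busy_containers(parse_cpu_stats(read_docker_stats()))` run every so often would show whether a single container is eating the CPU in the lead-up to a hang.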

 

 

5 hours ago, JorgeB said:

There are call traces and segfaults logged, but those by themselves don't point to a culprit; they just suggest a hardware problem. RAM and/or board would be my main suspects.

 

I believe the server crashes when the CPU gets near 100% utilization. If memtest didn't give me any errors, do you think that means it's the motherboard?
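One way to test the "crashes near 100% CPU" hunch is to log CPU utilization to a file that survives a reboot and look at the last entries after the next hang. Below is a minimal Python sketch that computes utilization from two readings of the aggregate `cpu` line in `/proc/stat`. The function names, the 10-second interval, and the log path argument are assumptions for illustration; point `log_path` at storage that is not wiped on restart.

```python
import os
import time


def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    total = sum(values[:8])  # user..steal; guest time is already counted in user/nice
    return idle, total


def utilization(previous, current):
    """Percent of non-idle time between two (idle, total) samples."""
    idle_delta = current[0] - previous[0]
    total_delta = current[1] - previous[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1.0 - idle_delta / total_delta)


def read_sample(path="/proc/stat"):
    with open(path) as handle:
        return parse_cpu_line(handle.readline())


def log_forever(log_path, interval=10.0):
    """Append a timestamped CPU reading every `interval` seconds until stopped."""
    previous = read_sample()
    while True:
        time.sleep(interval)
        current = read_sample()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(log_path, "a") as log:
            log.write(f"{stamp} cpu={utilization(previous, current):.1f}%\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before a possible hang
        previous = current
```

If the last lines before a hang show the CPU pegged, that supports the load theory; if they show it mostly idle, the trigger is probably something else. Either way it only narrows things down and won't by itself say whether the board is at fault.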

