Random server reboots/crashes



Let me start by saying that I have already gone through https://forums.unraid.net/topic/37579-need-help-read-me-first/

Because my server is crashing, from the looks of it I can't retrieve a diagnostics file. Fix Common Problems is reporting this:

Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged

However, I cannot run mcelog. When I try, I get:

mcelog: ERROR: AMD Processor family 23: mcelog does not support this processor. Please use the edac_mce_amd module instead. CPU is unsupported

Searching Google for how to run edac_mce_amd instead turns up nothing but dead ends.
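For anyone else hitting the mcelog error: on AMD family 23 (0x17, the Zen-based Ryzen/EPYC parts) the kernel's edac_mce_amd module decodes machine-check errors straight into the kernel log, so no userspace tool like mcelog is needed once it is active. The Python below is only a rough sketch under a few assumptions: that the module can be loaded with `modprobe edac_mce_amd` (it may already be loaded or built into the kernel), that decoded errors show up as log lines containing the kernel's `[Hardware Error]` prefix, and that Unraid keeps its live log at `/var/log/syslog`. The function names are hypothetical and not part of Unraid or Fix Common Problems.

```python
from pathlib import Path

# Prefix the kernel puts on decoded machine-check (hardware error) messages.
HW_ERR_PREFIX = "[Hardware Error]"


def read_text(path: str) -> str:
    """Return the file's contents, or an empty string if it can't be read."""
    try:
        return Path(path).read_text(errors="replace")
    except OSError:
        return ""


def cpu_family(cpuinfo_text: str):
    """Return the first 'cpu family' value from /proc/cpuinfo text, or None."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "cpu family":
            return int(value.strip())
    return None


def module_loaded(name: str, modules_text: str) -> bool:
    """True if `name` is the first field of any /proc/modules line."""
    return any(
        line.split()[0] == name for line in modules_text.splitlines() if line.split()
    )


def hardware_errors(log_lines):
    """Keep only log lines that carry the kernel's hardware-error prefix."""
    return [line.rstrip("\n") for line in log_lines if HW_ERR_PREFIX in line]


def main() -> None:
    family = cpu_family(read_text("/proc/cpuinfo"))
    print(f"CPU family: {family}")  # 23 == 0x17, the family mcelog rejects

    if module_loaded("edac_mce_amd", read_text("/proc/modules")):
        print("edac_mce_amd is loaded")
    else:
        # It may simply be built into the kernel; otherwise try `modprobe edac_mce_amd`.
        print("edac_mce_amd is not listed in /proc/modules")

    # Assumption: Unraid's live log lives at /var/log/syslog.
    errors = hardware_errors(read_text("/var/log/syslog").splitlines())
    print(f"{len(errors)} hardware-error line(s) found")
    for line in errors:
        print(line)


if __name__ == "__main__":
    main()
```

If the log only lives in RAM, anything it finds is from the current boot only, so a crash that causes a reboot will still take the relevant lines with it.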

The post mentions a Troubleshooting Mode in Fix Common Problems, but I don't see it, and there's no trace of config/logs/syslog.txt on the flash drive.


I've been getting these crashes quite often (about every other week). All of the hardware in my server is basically brand new, apart from some older drives in the array.


Anyone have any suggestions for how I could troubleshoot this?


Same here: one of my Unraid servers randomly reboots on its own, and I lose access to the Docker containers. I also find that Docker containers turn themselves off on their own on two of my Unraid servers.

But there are no system logs. I looked at the "start here" post, but that didn't explain anything, since there is no parity drive on the Unraid server that's rebooting.

I first noticed it starting in the past week.

1 month later...
On 7/11/2020 at 3:26 AM, johnnie.black said:

Start here.

Thanks! I'm running dual channel with 2 of 4 slots populated at 3200 (though I plan on switching to 4 kits of 2666 ECC RAM in the future), so that shouldn't be the problem...


I have disabled Global C-state Control; we'll see if that fixes it.

2 weeks later...

HynesJeff,


Check your RAM: run MEMTEST64.


Explanation:

I had random hardware crashes and reboots for about 6 months. I looked all over for the cause but could not figure it out. I ended up running MEMTEST64 and found some bad RAM; it was easy to find out which DIMM it was by testing the sticks in isolation. Once I pulled the bad stick out, the server was solid for 3 months. I RMA'd the RAM and all is good again.
