Need help solving random crashing.



I started using Unraid 2 weeks ago and am currently on version 6.5.1. The entire time I have had random crashing, sometimes multiple times a day. The longest uptime was just over 3 days. I'm only running Docker containers for Plex, Sonarr, Radarr, sabnzbd, and nzbHydra2.

I'm running it on an i7 970 with an EVGA X58 FTW3 motherboard with the latest BIOS. The temperatures were somewhat high, so yesterday I replaced the stock heat sink with something better. This greatly improved the CPU temps, but the crashing appeared to increase in frequency. Possibly just a coincidence.

 

Today I went home during lunch and saw that it had crashed again. There is never any error message on the screen. This time, however, when I went to reboot it, it hangs on loading bzroot. I left it running memtest to see if it detects any errors.

I have installed the Fix Common Problems plugin and have attached the latest logs generated from troubleshooting mode.

 

If anyone can provide any insight it would be greatly appreciated. I'm at a total loss currently.

 

FCPsyslog_tail.txt

tower-diagnostics-20180504-1120.zip

Edited by Magma
Added more details.

Unfortunately I'm still at a loss. Memtest overnight provided 0 errors.

It lasted all day but crashed somewhere between 2:23 and 2:53 in the morning. I have attached the syslog and diagnostics again.
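A crash window like this can be narrowed from the log itself: the last timestamp written to the syslog tail is the latest moment the server was known to be alive. Below is a minimal Python sketch of that idea, assuming each line starts with the usual syslog `Mon DD HH:MM:SS` prefix and English month abbreviations. The function name `last_log_time` and the sample lines are made up for illustration; they are not taken from the attached files.

```python
from datetime import datetime


def last_log_time(lines, year):
    """Return the timestamp of the last line carrying a syslog-style
    'Mon DD HH:MM:SS' prefix, or None if no line matches.

    Syslog prefixes omit the year, so the caller has to supply it.
    """
    last = None
    for line in lines:
        parts = line.split(None, 3)  # month, day, time, rest of message
        if len(parts) < 3:
            continue
        try:
            stamp = datetime.strptime(
                f"{year} {parts[0]} {parts[1]} {parts[2]}",
                "%Y %b %d %H:%M:%S",
            )
        except ValueError:
            continue  # not a timestamped line (e.g. a wrapped message)
        last = stamp
    return last


# Hypothetical sample lines, for illustration only.
sample = [
    "May  7 02:10:44 Tower example: placeholder message",
    "May  7 02:23:01 Tower example: placeholder message",
    "   continuation line without a timestamp",
]
print(last_log_time(sample, 2018))
```

The crash then happened some time after the returned timestamp and before the server was next found unresponsive.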

I'm honestly not sure what else to look for. One of my drives displayed read errors during a parity check. Nothing is currently stored on that drive, however, and it is typically spun down. Could that be causing the crashes?

tower-diagnostics-20180507-0223.zip

FCPsyslog_tail.txt


Another update. I have removed the dying hard drive and replaced the RAM, which has passed overnight memtests.

The crashing continues. It appears to be tied to high CPU usage. It happened when sabnzbd was unpacking a 30 GB download, and it happens nightly when Plex does its scheduled maintenance.

I rebooted today and started troubleshooting mode. I then started the scheduled maintenance, and after about 30 minutes it all locked up. Temperatures were only at 38 °C.

 

I have attached the latest diagnostics and syslog tail from this crash.

I have a new 600 W PSU with a single 50 A +12 V rail on the way.

Any other suggestions on what could possibly be the issue or what I could try in the meantime?

 

 

FCPsyslog_tail.txt

tower-diagnostics-20180509-1148.zip

15 minutes ago, Magma said:

What exactly will this determine though? The server would essentially be idling with all of the drives spun down.

 

You're looking for stability. If it's stable in this basic operating mode you can start re-enabling things (just one at a time, ideally) to see if any of them cause it to fall over. It's much easier to find the culprit this way.
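The one-at-a-time approach can be written down as a simple plan. This is only a rough Python sketch: the helper names `reenable_plan` and `first_suspect` are hypothetical, and the service list is just the set of Dockers named earlier in the thread. It assumes each stage is left running long enough to be meaningful, which for intermittent crashes means longer than the longest crash-free stretch seen so far.

```python
def reenable_plan(services):
    """Return one stage per service as
    (service newly enabled at this stage, everything running at this stage)."""
    plan = []
    running = []
    for name in services:
        running = running + [name]
        plan.append((name, list(running)))
    return plan


def first_suspect(plan, stable_flags):
    """Given one True/False stability result per completed stage, return the
    service that was added at the first unstable stage, or None."""
    for (added, _running), stable in zip(plan, stable_flags):
        if not stable:
            return added
    return None


plan = reenable_plan(["Plex", "Sonarr", "Radarr", "sabnzbd", "nzbHydra2"])
for added, running in plan:
    print(f"enable {added}; now running: {', '.join(running)}")
```

If the server still falls over with nothing enabled at all, the containers are not the cause, and hardware or the base OS is the next place to look.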

  • 3 weeks later...

I am at a complete loss so here's an update.

 

I disabled all of my dockers and enabled them one at a time. Crashes were still happening though not as often.

 

The event that most closely coincided with the crashes was Plex's overnight maintenance, although it wasn't uncommon for the maintenance to run without a crash. I have also had it crash in the middle of the day with no real active usage.

 

My longest uptime was 2.5 days. I have replaced the power supply and RAM. It has passed overnight memtest and Prime95 tests.

 

I have tried using only 1 stick of RAM at a time and swapping them out.

 

My only guess is that I need to replace the motherboard, which I would prefer to avoid because at that point I should probably just replace the RAM and CPU too.

 

I have attached the most recent syslog and diagnostics from the last crash.

FCPsyslog_tail.txt

tower-diagnostics-20180528-1112.zip


Two suggestions for you. The first is the most likely cause, and I had nasty problems with it, but I've had problems with the second before as well.

  1. Turn off the Marvell 88SE9123 controller in the BIOS if it is on the MB, or remove it if it is a card. I had my Marvell 9230 controller passed through to a VM and got dropped drives, and had to reboot the server to get them back. It didn't cause unRAID crashes for me, but if you are using array drives on yours this is likely your problem. I turned my MB 9230 off in the BIOS so I wouldn't be tempted to use it and haven't had any more problems with that server.
  2. Turn off the NEC USB 3.0 controller in the BIOS if it is on the MB, or remove it if it is a card. Since I stopped using my Fresco USB 3.0 card in another server, it has been up for 15 days, and I believe it would have been longer but I had to reboot for an unRAID upgrade. Before that, with the card installed, I got random crashes that I couldn't figure out.

I really think the 1st one is the most likely cause, but doing either or both of the above is where I would start to troubleshoot, since you have already tried some other hardware changes.

 
