Hard crash every 3-6 weeks

Hi,

 

What I mean by "hard crash" is that the server just turns off completely.

IPMI is still available, but I cannot power the server back on via IPMI.

I can only turn it back on if I unplug the power and wait a couple of seconds (discharging capacitors).

 

I have set up the syslog server, but there is nothing useful in the logs at all.

The crash always happens while the server is idling at about 45 W.

 

The server is connected to a PDU that is connected to a UPS.

No other device connected to the UPS turns off (modem, router, etc.).

 

Hardware:

  • Motherboard - Supermicro X11SCH-F
  • CPU - Xeon E-2278G
  • RAM - 64 GB DDR4 ECC @ 2666 MHz
  • PSU - Seasonic Focus PX 750W (80 Plus Platinum)
  • 3 x Toshiba 16TB HDDs
  • 2 x Silicon Power 1TB M.2 NVMe drives

 

The server ran smoothly for 1.5 years.

The crashes started ~3 months ago (no hardware changes at that time).

 

 

What I have done so far:

  • MemTest86+, 1 pass
  • Changed the Docker custom network type to "ipvlan"
  • 1 hour of Prime95 small FFTs
    • 150 W max power draw
    • 68°C max CPU temp
  • Checked and reseated all power connections on the motherboard
  • Reseated the 2 RAM modules
  • After the last crash, I connected the server directly to the UPS (without the PDU)
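
For context on the power numbers above, here is a quick sketch of how far the measured draw sits from the PSU's rating. It assumes the 45 W and 150 W figures can be treated as the load on the 750 W unit; if they were read at the wall or the UPS, the real DC load is a little lower. The helper name load_percent is made up for this example.

```python
def load_percent(draw_w, rated_w):
    """Return the draw as a percentage of the PSU's rated output."""
    return 100.0 * draw_w / rated_w

RATED_W = 750  # Seasonic Focus PX 750W from the hardware list

# Figures reported in this post: idle draw and Prime95 peak draw.
for label, draw in (("idle", 45), ("Prime95 peak", 150)):
    print(f"{label}: {draw} W = {load_percent(draw, RATED_W):.0f}% of {RATED_W} W")
# prints:
# idle: 45 W = 6% of 750 W
# Prime95 peak: 150 W = 20% of 750 W
```

So the Prime95 run only took the PSU to about a fifth of its rating, and the crashes happen at a much lower load than that. The stress test passing therefore says little about how the PSU behaves in the idle range.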

 

The relevant part of the syslog is attached.

I have an appdata backup script running at 4:30 that stops and starts my Docker containers.

I don't know the exact time, but I think the server crashed between 11:00 and 13:00.
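
One way to narrow down the crash time when the log contains no error is to look for the longest silence between consecutive syslog entries: the last line before the gap is the last thing written before the power went off. The sketch below assumes classic syslog lines that start with a "Mon DD HH:MM:SS" timestamp and takes the year as a parameter, since that format does not include one. The function names and the sample lines in the comment are made up for illustration.

```python
from datetime import datetime

def parse_ts(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; return None if absent."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None

def largest_gaps(lines, year, top=3):
    """Return the `top` longest silences as (gap, last_seen, next_seen)."""
    stamps = [ts for ts in (parse_ts(l, year) for l in lines) if ts]
    gaps = [(b - a, a, b) for a, b in zip(stamps, stamps[1:])]
    return sorted(gaps, reverse=True)[:top]

# Example with invented lines (not taken from the attached syslog):
#   lines = ["Nov  3 04:31:00 Tower a", "Nov  3 12:05:00 Tower b"]
#   largest_gaps(lines, 2022, top=1)
# To run it on a saved log, read the file first:
#   with open("Syslog.txt") as f:
#       print(largest_gaps(f.readlines(), 2022))
```

This does not handle logs that span a year change, and it only gives an upper bound: a quiet idle server may simply not log anything for a while before it goes down.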

 

The Unraid version is 6.11.1; crashes also happened with 6.10.3.

 

Syslog.txt

Edited by Adeon

Solved by Adeon

  • Community Expert
13 hours ago, Adeon said:

What I mean by "hard crash" is that the server just turns off completely.

That suggests a hardware problem; the PSU or board would be the main suspects.

  • Author

What I remembered now is that I disabled "Restart after AC loss" in the BIOS, so that is why I have to restart the server manually.

There is also nothing useful in the IPMI logs (the only entries are "ACPowerOn(OEM)".)

After the parity check is done, I'll update the BIOS and BMC and run MemTest86+ again, this time with a downloaded version, because I learned that the version bundled with Unraid cannot report ECC errors.

 

 

If the server crashes again, I'll try another PSU and reseat the CPU.

 

What is weird to me is that the server never crashed under load, only when idling, though it is idling 90% of the time.

Because of that, I changed the "Normal CPU Scaling Governor:" back to "Performance" (was "Power Save") in the Tips & Tricks plugin, just in case there is an issue with C-states or P-states.
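
To confirm that a governor change like this actually took effect, the active governor of each core can be read from the standard Linux cpufreq sysfs files. This is a minimal sketch; the function name is made up, and I am assuming the plugin's "Performance" and "Power Save" options show up there as "performance" and "powersave".

```python
from collections import Counter
from pathlib import Path

def governor_summary(cpu_root="/sys/devices/system/cpu"):
    """Count how many cores use each cpufreq scaling governor."""
    counts = Counter()
    # One scaling_governor file per core: cpu0, cpu1, ...
    for f in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor"):
        counts[f.read_text().strip()] += 1
    return dict(counts)

# On the server itself:
#   print(governor_summary())
```

An empty result would mean no cpufreq driver is exposing a governor at that path, in which case the setting has to be checked some other way.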

 

I'll keep updating this thread and hopefully find a solution.

 

Edit:

Ran MemTest86+ for ~14 hours (4 passes) with 0 errors.

 

Edited by Adeon

  • 4 weeks later...
  • Author

A little update:

 

The server crashed again today after running smoothly for ~4 weeks.

Checked logs every day and there was nothing suspicious at all. 

The server was pretty much idling all the time.

Again, there is absolutely nothing in the syslogs and nothing in the IPMI logs.

 

I just can't imagine that this is caused by a broken PSU.

 

I'm going to disable the last couple of Docker containers that I installed 3-4 months ago.

If the server crashes again, I'll change the PSU.

Edited by Adeon

1 hour ago, Adeon said:

The server crashed again today after running smoothly for ~4 weeks. [...]

Are you using the iGPU in the 2278G for transcoding via the i915 drivers?  If so, how are you loading the drivers?

 

A thread about issues others have had with the i915 drivers can be found here. I am not saying that is your problem. This is my reply in that thread about what I did to solve the server crashes that seemed to be related to i915.

 

I have a 2288G in my server and it started crashing with the 6.10.x releases of unRAID. Nothing at all useful in the syslog. I had the GPU Statistics and Intel-GPU-Top plugins installed. As a test, I removed them and the crashes (about once a week) stopped. As a further test, I re-installed these plugins with the 6.11.1 release and my unRAID server locked up five days later. I uninstalled the plugins and it has been smooth sailing ever since.

 

I let the i915 drivers load via the touch method.

 

I also uninstalled the CoreFreq plugin.

Edited by Hoopster

  • Author

Thank you for the tip.

 

Yes, the iGPU is for transcoding, but I pretty much never use it.

I have both Intel-GPU-Top and GPU Statistics installed.

The i915.conf file already exists at /boot/config/modprobe.d/i915.conf.

 

I removed both plugins for now. Let's see if that helps.

3 hours ago, Adeon said:

Yes, the iGPU is for transcoding, but I pretty much never use it.

It does not have to be in use to cause server lockups.  In fact, for most people, including me, it always happened when the server was fairly idle.

 

Let's see if removing those plugins works for you as it did for me. 

Edited by Hoopster

Mysterious crashes like this for me have always been the PSU. I had one that would very occasionally trip the breaker before failing more often. Took ages to figure that out because the system was on a UPS that kept it on.

  • 3 weeks later...
  • Author

Well, the server crashed again today.

Again, nothing in the logs at all.

Tomorrow I'm going to change the PSU; I hope this finally fixes this really annoying issue.

 

  • 4 weeks later...

@Adeon, I have had the same issue for 3 months.

 

My setup is completely different apart from the PSU. I have a Seasonic FOCUS PX as well (the 550 W model).

Did you already change the PSU and did that help?

 

My motherboard is an ASRock J5005-ITX with an embedded Celeron.

Edited by UnKwicks

  • Author

Yes, I did change the PSU to a Seasonic Focus GX 550W, and Unraid has been running fine for 25 days now.

But that was also the case with the old PSU, so it is too early to tell if the issue is gone.

OK, let's see. I ordered a new PSU as well. It should arrive today. Hopefully this helps.

  • 3 weeks later...
  • Author
  • Solution

Update: Since the PSU replacement, the server has been running without a crash for 44 days.

Update-2: The server has now been running smoothly for 60+ days, so I consider this fixed.

Edited by Adeon
