[SOLVED] 6.9.2 Random shutdowns could use an assist

I recently replaced my motherboard and CPU, going from an i5 to a Ryzen 5700G on a ROG Strix X570-E, and it ran great for over a week.  In the past few days it has not stayed up longer than 24 hours, and sometimes as little as 30 minutes.

 

What I've checked:

  • Ran a memtest, and it completed 4 passes with no errors
  • Enabled syslog to disk, but I can't spot any indication of why it is shutting down.
  • Updated my BIOS, which reset all my settings, so I'm not 100% sure I've got everything the way it should be.
    • Disabled C-states
    • Enabled virtualization
  • Plugged in the additional 4-pin CPU power.  Running a 750W power supply.
  • Temperature seems okay, but I don't know for sure since the sensors don't load.  It finds a Nuvoton NCT6798D, but modprobe doesn't load the driver
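When the Super I/O driver will not load, it can still be worth checking what the kernel has actually registered under hwmon, since the CPU's on-die sensor is exposed by a separate driver (k10temp on AMD). A minimal sketch, assuming the standard Linux sysfs layout under /sys/class/hwmon. The function name and the HWMON_ROOT override are illustrative and not part of Unraid, and whether the kernel shipped with 6.9.2 recognizes the 5700G's sensor is not confirmed in this thread:

```shell
# List every temperature input the kernel has registered under hwmon.
# HWMON_ROOT is a hypothetical override so the function can be tried
# against a fake directory tree; by default it reads the real sysfs path.
list_hwmon_temps() {
    local root="${HWMON_ROOT:-/sys/class/hwmon}"
    local dir name input raw
    for dir in "$root"/hwmon*; do
        [ -r "$dir/name" ] || continue
        name=$(cat "$dir/name")
        for input in "$dir"/temp*_input; do
            [ -r "$input" ] || continue
            # Some sensors return an error on read; skip those.
            raw=$(cat "$input" 2>/dev/null) || continue
            # sysfs reports temperatures in millidegrees Celsius.
            echo "$name $(basename "$input") $((raw / 1000))C"
        done
    done
}
```

Calling `list_hwmon_temps` with no output at all would mean that no driver has registered a temperature sensor, which matches the symptom described above.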

 

Any help would be greatly appreciated.  Prior to upgrading to Ryzen, the server was running great for over a year.

lucifer-diagnostics-20210916-1327.zip

 

Solved:

 

TL;DR: Heat and power.  Replaced the case with one that has better airflow and replaced the 750W PSU with a 1000W one.

Update: 16 days up, no issues.  Definitely heat and power.


  • Author

I think I have it down to two issues: heat and a bad SAS breakout cable.  It ran for over a week with the case open and a small desk fan blowing into the case.  I have a new case on order, a Phanteks Enthoo 719, to get more room and better airflow.

 

I suspect the SAS cable because I've had the same drive location marked bad by Unraid twice now, once with the original drive and again with a brand new drive.  It can't actually be the drive; it has to be the cable.

 

The biggest issue I have is that no OS-level temperature sensors work, so I can't confirm that things are overheating.

  • Author

Of course I say that and it rebooted again today.  I'm hoping it is the flaky SAS cable at this point, and that the replacement arriving today will fix the stability.

Heat, or heat caused by CPU load, could well be my issue too. Mine actually ran fine for a week, but none of my docker containers were running (different issue, don't ask!). I recreated them all on Friday

  • Author

I'm going to set the scaling_governor to conservative to throttle the CPU.  The server only stayed up 7 hours last night while it was rebuilding a drive, which is CPU intensive.  That was after replacing the CPU cooler and putting it all in a much larger case with better airflow.

 

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
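The command above only reads the current governor for cpu0. Changing it means writing the governor name to the same file for every core. A rough sketch, assuming the usual cpufreq sysfs layout; the function names and the CPU_ROOT override are made up for illustration, and whether "conservative" is offered depends on the running kernel (check scaling_available_governors in the same directory):

```shell
# Write a cpufreq governor to every core that exposes one.
# CPU_ROOT is a hypothetical override for trying this on a fake tree;
# by default it points at the real sysfs location.
set_governor() {
    local governor="$1"
    local root="${CPU_ROOT:-/sys/devices/system/cpu}"
    local file
    for file in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -w "$file" ] || continue
        echo "$governor" > "$file"
    done
}

# Print the governor currently active on each core, one line per core.
show_governors() {
    local root="${CPU_ROOT:-/sys/devices/system/cpu}"
    local file cpu
    for file in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$file" ] || continue
        # The core name is two directories up from the governor file.
        cpu=$(basename "$(dirname "$(dirname "$file")")")
        echo "$cpu: $(cat "$file")"
    done
}
```

Run as root, `set_governor conservative` followed by `show_governors` should confirm the change. The setting does not survive a reboot, so it would have to be reapplied at boot if it turns out to help.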

 

  • Author

4 hours online, and then a reboot.  So no dice.  This is really making me regret this purchase: not Unraid, but the move to AMD.

  • Author

I'm going to replace the power supply and see if that finally clears it.  Heat seems fine now with the new case; there is a Noctua NH-D9L on the CPU and it is cool to the touch.

 

Power is the next most likely issue.

25 minutes ago, Goobaroo said:

there is a Noctua NH-D9L on the CPU and it is cool to the touch.

You can't judge a heatsink's performance by the temp of the heatsink, because it's possible the CPU itself is boiling hot but not transferring the heat. You've got to measure the CPU directly; the on-die sensors are a good indicator.
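One way to act on that, sketched under assumptions: if the kernel's k10temp driver recognizes the CPU (not confirmed for the 5700G on 6.9.2 in this thread), the on-die reading can be pulled by chip name and logged at intervals, so the last value before an unexpected shutdown is still on disk afterwards. The function name, the HWMON_ROOT override and the log path are all hypothetical:

```shell
# Print the first temperature (whole degrees C) of the hwmon chip named $1.
# Returns 1 if no chip with that name is registered.
# HWMON_ROOT is a hypothetical override for testing against a fake tree.
chip_temp_c() {
    local want="$1"
    local root="${HWMON_ROOT:-/sys/class/hwmon}"
    local dir raw
    for dir in "$root"/hwmon*; do
        [ -r "$dir/name" ] || continue
        [ "$(cat "$dir/name")" = "$want" ] || continue
        raw=$(cat "$dir/temp1_input" 2>/dev/null) || continue
        # sysfs reports millidegrees Celsius.
        echo $((raw / 1000))
        return 0
    done
    return 1
}

# Example loop, left commented out so sourcing this file has no side effects.
# It appends a timestamped reading every 30 seconds; the log path is an
# assumption and should point somewhere that survives a power loss.
#
# while sleep 30; do
#     echo "$(date '+%F %T') $(chip_temp_c k10temp)C" >> /boot/logs/cputemp.log
# done
```

If `chip_temp_c k10temp` returns nothing, the driver has not bound to the CPU and the BIOS reading remains the only direct measurement.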

  • Author

Thanks @JonathanM, I did hit the base of the CPU with a laser thermometer, but you're right, that's not a very accurate measurement.  Usually on reboot the BIOS shows the CPU in the 40-50 C range.

 

It would help if Unraid could detect the thermal sensors like it did on my old setup, but loading the nct6775 driver doesn't return anything, despite that being the recommendation from sensors-detect.

 

But the uptimes are all over the place: a week, a day, hours.  I'm willing to swap the power supply to see if it fixes it.

 

 

  • Author

@JonathanM, your opinion here would be appreciated.  Right before the latest shutdown I got the error "Array has 7 disks with read errors", which is all of my disks.  I was rebuilding a drive at the time, but what I've found searching seems to point to a power issue.

 

image.png.4323c0220abe7867279f7854456c61fb.png

  • Author

No, the LSI 9201-8i card I have is passively cooled.  But it was in use with the previous motherboard and CPU (i5-6500) since November with no issues.

Passive cooling is completely dependent on the airflow patterns in the case. Since you changed motherboards, the airflow patterns have changed. Try temporarily forcing air over the HBA with an extra case fan, or run with the side off with a household fan directed inside. Note, running with the side off WITHOUT an extra fan blowing directly in is asking for trouble, as any case fans will likely just freewheel uselessly as their airflow bypasses everything it was meant to cool.

  • Author

So, it seems the end answer was a combination of heat and power.

 

I replaced the case with a Phanteks Enthoo 719.  Way better airflow, way more fans.

 

Replaced the 750W PSU with a 1000W one, and the server was able to complete its rebuild of one of the drives.  There are 8 WD Red HDDs in there.

 

So the combination of all the drives spinning and the more powerful CPU drawing more power was probably tripping the PSU.

 

I'm going to mark this as solved.  Thanks for talking through it with me.
