My unRAID server keeps dying



Hello, unRAIDers

 

Not sure if I am posting in the right place. I could use your help with what I should do next.  Diagnostics are attached.  Here's what I did so far.

 

SPEC:

unRAID version 6.11.5 server.  

Hardware specs:

Model: AVS-10/4-X-6
M/B: Supermicro - X10SL7-F
CPU: Intel® Xeon® CPU E3-1230 v3 @ 3.30GHz

 

The server powered off on its own on Friday. After several tries I got it to reboot. Every CPU core sat in the red almost the entire time I was logged in.
firefox_RvgUsITzwN.jpg.6da3a37d75182a7c7389a3a1be2cee96.jpg
It stayed up for maybe an hour, then suddenly shut down again and refused to come back up. I saw that the LE6 light on the motherboard was red, which indicated that the power supply needed to be replaced (according to the motherboard manual https://www.supermicro.com/en/products/motherboard/X10SL7-F).
PXL_20230327_003203727.thumb.jpg.2e067d92904d6238f0d6af8aecd78dca.jpg

 

So on Sunday I got a Seasonic 750 PSU to replace my Seasonic 650 (why not upgrade?). I replaced it and the server started, but I noticed the crazy utilization again. Not sure whether this was contributing to the issue, I began trying suggestions from the unRAID forum. I installed "Glance" from the app store and saw CPU utilization in the red at 99% and higher. shfs, find, OpenVPN and a few Dockers were popping up, but "shfs" was the worst and most consistent offender.

I followed a good idea from a thread by @mgutt and checked my Docker paths, updating anything I had missed from "/mnt/user/appdata" to "/mnt/cache/appdata". I also saw other suggestions and revisited the settings in the Tips & Tweaks plugin. I also updated some settings in Dynamix Cache Directories and turned off Plex's post credit scanning.

After that, the CPUs were no longer all red the majority of the time. There were still spikes, and times they would go orange, but not the consistent red 90+ to 100% utilization across every core. "find" now spikes in Glance more than "shfs", but not as often or for as long as before. I shut the system down gracefully a few times and it came back up easily (unlike with the 650 PSU).
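The appdata path check described above can be sketched in a few lines of Python. This is only an illustration: the function name `suggest_cache_path` and the sample mappings are hypothetical, not something unRAID provides, and the swap only makes sense if the appdata share really lives entirely on the cache pool. The idea is that paths under /mnt/user go through shfs, while /mnt/cache addresses the pool directly.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumption: appdata is stored only on the cache pool, so both
# prefixes point at the same files.
USER_APPDATA = PurePosixPath("/mnt/user/appdata")
CACHE_APPDATA = PurePosixPath("/mnt/cache/appdata")


def suggest_cache_path(host_path: str) -> Optional[str]:
    """Return the /mnt/cache equivalent of an appdata host path.

    Returns None when the path is not under /mnt/user/appdata,
    meaning there is nothing to change for that mapping.
    """
    try:
        relative = PurePosixPath(host_path).relative_to(USER_APPDATA)
    except ValueError:
        return None
    return str(CACHE_APPDATA / relative)


# Hypothetical container mappings, typed in by hand for illustration.
mappings = {
    "plex config": "/mnt/user/appdata/plex",
    "plex media": "/mnt/user/media",
    "openvpn config": "/mnt/cache/appdata/openvpn",
}

for name, host_path in mappings.items():
    suggestion = suggest_cache_path(host_path)
    if suggestion is not None:
        print(f"{name}: {host_path} -> {suggestion}")
```

Only the first mapping is flagged here. The media share is left alone because it is not appdata, and the third mapping already uses the direct cache path.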

firefox_LSodVR4cn8.thumb.jpg.1738430c7561ffd763ca635872fcbe02.jpg


I thought I was good... but then the system powered off again. I am at a loss. It did come back up again. Things were red on the initial boot but eventually settled down. It was up for over 2 hours, but I still didn't trust it. I wanted to eliminate other things before I considered replacing the motherboard.

I checked for swollen capacitors (?). I double-checked the cables. unRAID's tools didn't indicate any hardware issues. The OS logs seem reasonable to me. My Dockers/plugins were up to date. *Pause* Then it shut down again. 😵 I also noticed that if I bring it down gracefully, the red light is still there. When I turn the server on, the red light (LE6) goes green. I'm a bit discouraged because I thought I had figured everything out.

PXL_20230327_014740873.thumb.jpg.ebebbff3758ff7df51fcf4f9b052766b.jpg

Attached are my diagnostics (again, just to eliminate an OS component to this). 

 

Thank you!  🤞

......

 

tower1-diagnostics-20230326-2114.zip

Edited by storagehound
Cleaning and adding another screenshot.
3 hours ago, storagehound said:

I thought I was good...but then the system powered off again. 

Since you already replaced the power supply: did you replace the cables, too? If so, I would say your motherboard is dead.

 

Or do you maybe have a problem with temps?

 

3 hours ago, storagehound said:

Tips & Tweaks plugins.  I also updated some settings in Dynamix Cache Directories

Simply reboot in safe mode, so Plugins are all disabled. And press "c" while top is running to see the full command.

6 hours ago, mgutt said:

Since you already replaced the power supply: did you replace the cables, too? If so, I would say your motherboard is dead.

 

Or do you maybe have a problem with temps?

 

Simply reboot in safe mode, so Plugins are all disabled. And press "c" while top is running to see the full command.

 

Thank you for the reply, mgutt. I have not replaced the cables yet. The temperatures look good. I put the server in safe mode before bed, and it has been up for over 8 hours now. I plan to leave it in safe mode for a couple of days to see if it stays up. Is "top" the same as "htop"?


UPDATE 04/02:
I ran the server in safe mode for about 3 days, then decided to start the array. My Dockers have now been running for over 2 days without the system shutting down. Plugins are not running at all. So far I have not seen all CPUs stay red for seconds to minutes at a time. In Glance I still see utilization go from the high 90s to well over 100%, but it tends to settle down quickly. I noticed some errors when running a short SMART test on my parity drive and one other drive. From what I'm reading they are minor, but I am still going to look into replacing the drives. Later this week I will try turning plugins back on to see whether that triggers the unclean shutdown.
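A note on the readings above 100%: this is a guess, assuming Glance reports per-process CPU the way top does by default, where 100% means one fully busy logical CPU. The E3-1230 v3 has 4 cores and 8 threads, so a single process could show up to 800% on this box. A minimal sketch of the conversion, with a hypothetical helper name:

```python
def share_of_total_cpu(process_percent: float, logical_cpus: int) -> float:
    """Convert a per-core CPU percentage into a share of the whole machine.

    Assumes process_percent uses the convention where 100% equals one
    fully busy logical CPU.
    """
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return process_percent / logical_cpus


# Xeon E3-1230 v3: 4 cores with Hyper-Threading, so 8 logical CPUs.
LOGICAL_CPUS = 8

# Illustrative readings, not values taken from the diagnostics.
for reading in (99.0, 160.0):
    share = share_of_total_cpu(reading, LOGICAL_CPUS)
    print(f"{reading:.0f}% per-core is {share:.1f}% of the whole CPU")
```

Read that way, a process briefly showing 160% is using about a fifth of the machine, which fits with the dashboard no longer showing every core in the red.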

