My unRAID server keeps dying

storagehound · March 27, 2023

Hello, unRAIDers

Not sure if I am posting in the right place. I could use your help with what I should do next. Diagnostics are attached. Here's what I did so far.

SPEC:

unRAID version 6.11.5 server.

Hardware specs:

Model: AVS-10/4-X-6
M/B: Supermicro - X10SL7-F
CPU: Intel® Xeon® CPU E3-1230 v3 @ 3.30GHz

The server just powered off on its own Friday. After several tries I got it to reboot. I saw every CPU core sit in the red almost the entire time I was logged in.
firefox_RvgUsITzwN.jpg.6da3a37d75182a7c7389a3a1be2cee96.jpg
It stayed up for maybe an hour, then shut down again and refused to come back up. I saw the LE6 light was red on the motherboard... which indicated the power supply needed to be replaced (according to the motherboard manual https://www.supermicro.com/en/products/motherboard/X10SL7-F).

So Sunday I got a Seasonic 750 PSU to replace my Seasonic 650 (why not upgrade?). I replaced it and the server started. I noticed the crazy utilization again. Not sure whether this was contributing to the issue, I began trying some suggestions from the unRAID forum. I installed "Glance" from the app store and saw CPU utilization in the red, at 99% and higher. shfs, find, OpenVPN and a few Docker containers were popping up, but "shfs" was the worst and most consistent offender.

I followed a good idea from a thread by @mgutt and checked my Docker paths. I updated anything I had missed from "/mnt/user/appdata" to "/mnt/cache/appdata". I also saw other suggestions and revisited the settings in the Tips & Tweaks plugins. I also updated some settings in Dynamix Cache Directories. I turned off Plex's post-credit scanning. Now the CPUs were no longer red the majority of the time. There were still spikes... and times they would go orange... but not the consistent red 90+ to 100% utilization across every core. "find" will spike in Glance more than "shfs", but not as often or as long as before. I shut the system down gracefully a few times and it came back up easily (unlike with the 650 PSU).
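For anyone following along, the path check described above can be sketched roughly as follows. This is only an illustration, not the method from @mgutt's thread: it assumes appdata lives entirely on the cache pool (if any of it sits on the array, pointing a container at /mnt/cache would hide those files), and the container names and mappings in the example are hypothetical. The usual reasoning, as far as I understand it, is that paths under /mnt/user go through the shfs user-share layer, while /mnt/cache is accessed directly.

```python
from typing import Dict, List

# Assumption: appdata is stored only on the cache pool.
USER_SHARE = "/mnt/user/appdata"
CACHE_PATH = "/mnt/cache/appdata"


def suggest_cache_paths(mappings: Dict[str, List[str]]) -> Dict[str, Dict[str, str]]:
    """Return, per container, the host paths that still point at the
    user share together with the suggested direct cache path."""
    suggestions: Dict[str, Dict[str, str]] = {}
    for container, paths in mappings.items():
        for path in paths:
            # Match the share itself or anything below it, but not
            # look-alikes such as /mnt/user/appdata2.
            if path == USER_SHARE or path.startswith(USER_SHARE + "/"):
                new_path = CACHE_PATH + path[len(USER_SHARE):]
                suggestions.setdefault(container, {})[path] = new_path
    return suggestions


if __name__ == "__main__":
    # Hypothetical container names and host paths, for illustration only.
    example = {
        "plex": ["/mnt/user/appdata/plex", "/mnt/user/media"],
        "openvpn": ["/mnt/cache/appdata/openvpn"],
    }
    for name, changes in suggest_cache_paths(example).items():
        for old, new in changes.items():
            print(f"{name}: {old} -> {new}")
```

The host paths themselves would have to be copied by hand from each container's settings page; the sketch only flags which ones still need changing.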

I thought I was good...but then the system powered off again. I am at a loss. It did come back up again. Things were red with the initial boot but eventually settled down. It was up for over 2 hours. I still didn't trust it. I wanted to eliminate other things before I considered replacing the motherboard.
I checked for swollen capacitors (?). I double-checked the cables. unRAID's tools didn't indicate any hardware issues. OS logs seem reasonable to me. My Docker containers and plugins were up to date. *Pause* Then it shut down again. 😵 I also noticed that if I bring it down gracefully, the red light is still there. When I turn the server on, the red light (LE6) goes green. I'm a bit discouraged because I thought I had figured everything out.

Attached are my diagnostics (again, just to eliminate an OS component to this).

Thank you! 🤞

......

tower1-diagnostics-20230326-2114.zip

Edited March 27, 2023 by storagehound
Cleaning and adding another screenshot.

mgutt · March 27, 2023

3 hours ago, storagehound said:

I thought I was good...but then the system powered off again.

As you already replaced the power supply: Did you replace the cables, too? If so, I would say your motherboard is dead.

Or do you maybe have a problem with temps?

3 hours ago, storagehound said:

Tips & Tweaks plugins. I also updated some settings in Dynamix Cache Directories

Simply reboot in safe mode, so Plugins are all disabled. And press "c" while top is running to see the full command.
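If a non-interactive snapshot is easier to paste into a thread than a screenshot of top, something like the following sketch could work. It is an assumption-laden alternative, not a replacement for top: it shells out to `ps -eo pcpu,args`, and the %CPU value from ps is averaged over each process's lifetime rather than the live figure top shows, so short spikes may not stand out. The function names are made up for this example.

```python
import subprocess
from typing import List, Tuple


def parse_ps_output(text: str, limit: int = 5) -> List[Tuple[float, str]]:
    """Parse `ps -eo pcpu,args` output into (cpu_percent, full_command)
    tuples, highest CPU first."""
    rows: List[Tuple[float, str]] = []
    for line in text.splitlines()[1:]:  # skip the "%CPU COMMAND" header
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue
        try:
            cpu = float(parts[0])
        except ValueError:
            continue  # ignore lines that do not start with a number
        rows.append((cpu, parts[1]))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


def top_processes(limit: int = 5) -> List[Tuple[float, str]]:
    """Run ps once and return the busiest processes with full command lines."""
    out = subprocess.run(
        ["ps", "-eo", "pcpu,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps_output(out, limit)


if __name__ == "__main__":
    for cpu, command in top_processes():
        print(f"{cpu:6.1f}%  {command}")
```

Seeing the full command line is the point here, just as with pressing "c" in top: it shows which path a busy "find" or "shfs" process is actually working on.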

storagehound · March 27, 2023

6 hours ago, mgutt said:

As you already replaced the power supply: Did you replace the cables, too? If so, I would say your motherboard is dead.

Or do you maybe have a problem with temps?

Simply reboot in safe mode, so Plugins are all disabled. And press "c" while top is running to see the full command.

Thank you for the reply, mgutt. I did not replace the cables yet. The temperatures look good. I put the server in safe mode before bed. It's been up for over 8 hours now. I plan to leave it in Safe Mode a couple of days to see if it stays up. Is "top" "htop?"

mgutt · March 27, 2023

13 minutes ago, storagehound said:

Is "top" "htop?"

No. Different commands, but similar output.

storagehound · April 2, 2023

UPDATE 04/02:
I ran the server in safe mode for about 3 days. Then I decided to start the array. My Docker containers have now been running for over 2 days without the system shutting down. The plugin service is not running at all. So far I have not seen all CPUs go red for seconds to minutes at a time. Using Glance... I still see some utilization that goes from the high 90s to well over 100%... but it tends to settle down quickly. I noticed some errors when running a short SMART test on my parity drive and one other drive. From what I'm reading they are minor... but I am still going to look into replacing the drives. Later this week I will try turning plugins back on to see if that triggers the unclean shutdown.
