
[Resolved] UnRaid Server randomly power cycling


Solved by nexusjosh


3 hours ago, JorgeB said:

You can add

setterm -blank 0

to /boot/config/go

Thanks, added.
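Lines added to /boot/config/go run at every boot. `setterm -blank 0` disables console blanking, so whatever the console printed last should still be on screen (or in the IPMI console) after a crash. If you'd rather script the change than edit the file by hand, here is a minimal sketch. `add_line_once` is a hypothetical helper for illustration, not an Unraid command.

```shell
#!/bin/bash
# Minimal sketch: append a line to a file only if that exact line is not
# already present, so re-running it never duplicates the entry.
add_line_once() {
  local file="$1" line="$2"
  # -q: quiet, -x: match the whole line, -F: treat the text literally
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    # If the file is non-empty and lacks a trailing newline, add one first
    # so the new line is not glued onto the last existing line.
    if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
      printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
  fi
}
```

Usage: `add_line_once /boot/config/go 'setterm -blank 0'`. The change takes effect on the next boot.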

Capture.JPG

 

Update. I disabled bifurcation, and it still crashed. I removed the ASUS Hyper and replaced it with the original single M.2 card. Perhaps the Hyper is causing some sort of issue? We shall see. Sigh. If this DOES fix the issue, I'm going to email SuperO and see if they have any ideas why it doesn't work properly, when one would think it should.


@JorgeB I just power cycled my server with the option you gave me, and it's hanging during boot with the attached USB error. I'm going to power cycle the server again, but... could this entire problem be the bootable USB failing? @.@ It's a supposedly good Samsung Fit 32GB small-form-factor flash drive. I'm going to reboot the server again and see if it comes up, though I might as well run memtest for giggles.

Cap.jpg

 

Cap2.jpg

Edited by nexusjosh
10 minutes ago, JorgeB said:

Unlikely that USB related issues would power off the server.

Well, I hope that it is! I've had my server for about 5 years and haven't had any major issues until now. I've now restored the hardware to the state it was in when it was last stable. I'm unsure what else I can do, other than hope it WAS the USB. If it isn't... maybe the motherboard is going bad, or the CPU. Sigh.

 

It's been an hour and it hasn't crashed. Lately it's been crashing just about every hour, if not sooner. I'm screen recording the IPMI again, so we'll see what happens.


So I'm currently running Memtest, but it's going to take a couple of days given my large amount of RAM. I'm at a loss as to what it could be at this point, unless it's, as I said in a prior post, a CPU or motherboard issue, which would be a big shame. I'd have to build a new server, which wouldn't be the end of the world, but meh. Sigh.

 

Does anyone know if there are any stress-test apps on Community Apps, perhaps ones that stress individual components, to MAYBE determine the issue? When I was a Service Manager at Fry's Electronics I had access to Ultra-X, and their kit was unique. I've looked on and off and haven't found an alternative to it, which is a big shame.

Memtest.jpg

4 minutes ago, JorgeB said:

First thing I would try would be a different PSU.

A fair idea, though I JUST replaced the PSU about 6 months ago with the overkill EVGA Supernova 1000 T2, 80+ Titanium. I'll plug it into my PSU tester once the memtest is complete... Perhaps the power cables need re-seating. Ugh. I reseated the PCIe cards about 50 times, but didn't think to re-seat the power cables.

Edited by nexusjosh
4 minutes ago, nexusjosh said:

RamTest

Were you running the Memtest from Unraid's boot menu? If so, it will not find any ECC issues (and most servers with ECC will completely halt the system if they run into an uncorrectable ECC error). Set up a new boot stick and download from https://www.memtest86.com/

 

On 3/4/2022 at 3:11 AM, nexusjosh said:

USB

These problems won't affect the "stability" of the server. However, based on your screenshots, your flash is dropping offline and reconnecting. Try a different port. If it continues, replace the flash drive and transfer your registration.
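One rough way to confirm that a flash drive is dropping off the bus, rather than relying on screenshots alone, is to count the kernel's USB disconnect messages in the log. This is a minimal sketch, assuming the usual Linux kernel wording (`usb 1-1: USB disconnect, device number 2`); `count_usb_disconnects` is a hypothetical helper, not an Unraid tool.

```shell
#!/bin/bash
# Minimal sketch: count kernel "USB disconnect" events in log text on stdin.
# Assumes the usual kernel wording, e.g.
#   usb 1-1: USB disconnect, device number 2
count_usb_disconnects() {
  # grep -c prints the number of matching lines; it exits 1 when the count
  # is 0, so "|| true" keeps that case from looking like a failure.
  grep -c 'USB disconnect' || true
}
```

Usage: `dmesg | count_usb_disconnects`, or `count_usb_disconnects < /var/log/syslog`. A steadily rising count on the port the boot flash uses would point at the port or the drive; `grep 'USB disconnect'` on the same input lists the individual events.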

4 hours ago, Squid said:

Were you running the Memtest from Unraid's boot menu? If so, it will not find any ECC issues (and most servers with ECC will completely halt the system if they run into an uncorrectable ECC error). Set up a new boot stick and download from https://www.memtest86.com/

 

These problems won't affect the "stability" of the server. However, based on your screenshots, your flash is dropping offline and reconnecting. Try a different port. If it continues, replace the flash drive and transfer your registration.

Thanks for your feedback! I did use a Memtest USB I had, not the one bundled with Unraid. The one bundled with Unraid actually wouldn't boot for whatever reason.

 

I tested the flash drive, and it all tested out fine. I reconnected/reseated the power cables after testing the PSU, and the server has been purring along for nearly 5 hours now. Soo... maybe the power needed to be reseated?! X.x Ugh. I don't want to get my hopes up yet, but maybe that was the fix, even though the issues started after I added the new ASUS card. Maybe the ASUS card will now function fine.

Edited by nexusjosh

So, I was hoping the issue was resolved. It was not. After about 8 hours of uptime it randomly power cycled once more, and I still have no idea why. I did catch it on a recording this time. No errors or anything, just... Boop. Hard power cycle.

 

In the morning I suppose I'll re-apply thermal paste? Any other ideas on possible solutions? I might swap out the PSU, but I don't have a spare one on hand.

Just now, ChatNoir said:

Or if you rarely have any power grid issues, try without the UPS for a few days.

Sadly, I do have power issues. Though if it powers off once again on this new UPS, I'll try that and hope that a brown-out or outage doesn't murder my server. I'll just move the plug from the battery side to the surge-protection side of the UPS lol.


Maybe set up another USB key with a different distro, like Slackware (since that is what Unraid is based on), and see what it does for a few hours/days. Use a live install rather than using the installer to install onto the USB drive.

 

If it works, then I'm not sure what that means. It could be tied to one of the kernel configuration options Lime is using to compile this kernel that is incompatible with your hardware. I'm getting ahead of things with this thinking, though...

6 hours ago, klepel said:

Maybe set up another USB key with a different distro, like Slackware (since that is what Unraid is based on), and see what it does for a few hours/days. Use a live install rather than using the installer to install onto the USB drive.

 

If it works, then I'm not sure what that means. It could be tied to one of the kernel configuration options Lime is using to compile this kernel that is incompatible with your hardware. I'm getting ahead of things with this thinking, though...

So, when I was running in safe mode a few days ago, it ran fine for 2 days with no crashes, but it starts crashing when it's not in safe mode. So you may be on to something there. It's just so disheartening that there are no error messages or -anything-, just BAM, hard reset, with zero clues as to the issue whatsoever.

 

I've opened a ticket with SuperO, hoping they will be able to help.


SuperO doesn't know. But the more I think about it, the more I'm sure something is going on with Unraid, because when I booted it into safe mode, it ran fine with no issues!!! I'm going to try removing all of my plugins, unmounting my VM, and deleting the Docker container I have.

 

Update: I've removed all of the plugins; we shall see if it hard resets again in an hour or so.

Edited by nexusjosh
