
Server Unresponsive



Unfortunately there's nothing relevant logged, which usually points to a hardware issue. One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
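The one-by-one approach described above can be sketched as a small loop. This is only an illustration: the service names and the `is_stable` callback are hypothetical stand-ins for whatever you actually run and however you judge stability (for example, a few days of uptime without a crash).

```python
from typing import Callable, Iterable, List, Optional


def find_first_unstable(
    services: Iterable[str],
    is_stable: Callable[[List[str]], bool],
) -> Optional[str]:
    """Enable services one at a time and return the first one that
    makes the system unstable, or None if all of them run cleanly.

    `is_stable` is a hypothetical callback: it receives the list of
    currently enabled services and reports whether the server survived
    the observation period with that set running.
    """
    enabled: List[str] = []
    # Baseline: safe mode, nothing enabled. A crash here points at hardware.
    if not is_stable(enabled):
        raise RuntimeError("unstable with no services enabled; suspect hardware")
    for name in services:
        enabled.append(name)
        if not is_stable(enabled):
            return name
    return None
```

In practice `is_stable` is a person watching the server for a few days, not a function call, but the order of operations is the same: establish a clean baseline first, then add one thing at a time so a crash can be tied to the most recent change.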

11 hours ago, JorgeB said:

Unfortunately there's nothing relevant logged, which usually points to a hardware issue. One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

I'm starting to believe you are correct.  I initially ran this box with a Supermicro X9SCL mobo and CPU, and it ran perfectly.  It just couldn't handle transcoding, and the mobo didn't have a slot for a GPU card.  I upgraded the mobo to the Supermicro X11SSH-F and the CPU to the Xeon E3-1285 v6.  RAM went from 32 to 64 GB.  All drives, PSU, cooling, and case stayed the same.

I started having failures.  I changed the RAM and still had the same issue.  I changed the mobo and had the same issue.  I changed the PSU from a 600 W 80+ White to an 850 W 80+ Gold.  I added liquid cooling.  I added a Tesla P4.  Nothing has eliminated the problem.  The only thing left is the CPU.  I am waiting for a Xeon E3-1270 v6 to arrive in a few days.  I'll swap it in and see if that helps.  If not, then I'm at a total loss as to what could be causing it!  Could it be BIOS related?  I have the BMC connected but don't have the password, so I will need to reset it via the jumper?  Then I can review it from a remote PC.  I have link aggregation connected from the mobo to my ASUS GT-AC5300 router.  I literally have no idea what else to try!

I have the syslog going to root on the flash drive, but nothing seems to stand out to you or others...

Do you think it could just be a bad CPU?!

12 hours ago, GatorMB said:

 I have link aggregation connected from the mobo to my ASUS GT-AC5300 router.

Have you tried with just one cable? Are you connected to the correct ports? Apparently only 2 specific ports on that router support aggregation.

 

Doubt it would cause what you are seeing, just grasping at straws.

  • 2 weeks later...

Good morning!  Here is the update...

Server is 100% stable.  It has not crashed at all since the CPU was swapped out.  However, I did find another issue: the Nvidia Tesla P4 GPU gets super hot when transcoding 3 files in Tdarr.  I have disabled the Tdarr container and it runs stable.  I have ordered a fan for the GPU to help cool it.  More to follow on the success of that!  Again, huge thanks to JorgeB & JonathanM for all your help!
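For anyone wanting to confirm a similar GPU heat problem before and after adding a fan, here is a minimal sketch that reads the temperature through `nvidia-smi`. It assumes the NVIDIA driver (and therefore `nvidia-smi`) is installed on the host; the 85 °C threshold is an arbitrary placeholder, not a published limit for the Tesla P4.

```python
import subprocess
from typing import List

# Query only the temperature, one bare number per GPU per line.
QUERY = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_temps(output: str) -> List[int]:
    """Turn nvidia-smi's one-value-per-line output into integers (deg C)."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def read_gpu_temps() -> List[int]:
    """Run nvidia-smi once and return one temperature per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_temps(result.stdout)


def too_hot(temps: List[int], limit_c: int = 85) -> bool:
    """True if any GPU is at or above limit_c.

    The default of 85 is an assumption for illustration only.
    """
    return any(t >= limit_c for t in temps)
```

Calling `read_gpu_temps()` every few seconds while a transcode runs, and again once the fan is fitted, gives a simple before/after comparison.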
