
My server is crashing like every other day, not sure why at this point


Solved by JorgeB

So, I moved to another city recently.  I packed my server in my own car, drove just 1 hour to the new place, and set it up.

 

I started a parity check, but the server crashed before it finished.  I powered it up again and started another parity check, and the same thing happened.  It may have found 1 error, I think, but I chalked that up to the crash itself.  By "crash" I mean that I cannot access the web GUI, I cannot ssh into the server, and most docker applications are nonresponsive.  A few of them do still respond, though, so I know the OS isn't fully frozen; most of the processes are just failing.

 

So I started it up again and ran a parity check.  It locked up, but this time I figured that if I just let it sit, maybe it was still doing the parity check and would finish.  So I waited many hours, then power cycled, and I was right: according to the log, it finished the parity check without errors.  I have also run a BTRFS scrub on all the disks, and no errors were found.

 

But now I restart yet again, and again the server becomes unresponsive after some time.  This is really irritating.

 

Along the way, I did see an error in the log complaining of an unregistered flash drive.  I figured ok, the flash is failing.  I followed the restore/replace license key process and figured I'd be ok.  But it's not ok, it is still crashing.  So the flash drive is not the problem.

 

I'm not sure what to do next.  Do I just have some bad hardware?  All the disks seem fine.

singularity-nas-diagnostics-20240619-2218.zip


Unfortunately there's nothing relevant logged, so this could be a hardware issue. One thing you can try is to boot the server in safe mode with all docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

  • 2 weeks later...

Ok, I think that I have it figured out.....

 

It ran perfectly fine for over a week in safe mode with docker and VMs disabled.  Then I rebooted out of safe mode, but still with docker and VMs disabled.  It was still perfectly fine for days.  Then I turned on docker and it crashed within 24 hours.  Ok... so there's a problem somewhere involving docker.

 

Once again I don't see much in the logs.  But I did look through Netdata for any interesting graphs.  The one that jumps out at me is the number of processes: it increases steadily.  More interesting still, they're zombie processes.

 

Ok, so I looked for zombie processes on the terminal, and I see an ever-increasing number of defunct ffmpeg processes.  And the parent process?  A Shinobi script.
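
For anyone who wants to repeat this check, here is a minimal sketch of one way to group defunct processes by their parent.  It is not the exact command I ran; the function names `parse_zombies` and `current_zombies` are made up for illustration, and it assumes a procps-style `ps` that accepts `-eo pid,ppid,stat,comm` and marks zombies with a STAT value starting with `Z`.

```python
import subprocess
from collections import Counter


def parse_zombies(ps_output: str) -> Counter:
    """Count defunct processes per (parent PID, command name).

    Expects text in the layout printed by `ps -eo pid,ppid,stat,comm`.
    """
    counts = Counter()
    for line in ps_output.splitlines():
        parts = line.split(None, 3)  # pid, ppid, stat, rest-of-line command
        if len(parts) < 4 or not parts[0].isdigit():
            continue  # skip the header row and any blank lines
        _pid, ppid, stat, comm = parts
        if stat.startswith("Z"):  # "Z" is the zombie (defunct) state
            # procps appends "<defunct>" to the name of a zombie; drop it
            name = comm.replace("<defunct>", "").strip()
            counts[(int(ppid), name)] += 1
    return counts


def current_zombies() -> Counter:
    """Run ps on this machine and summarize its zombie processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_zombies(result.stdout)
```

Calling `current_zombies().most_common()` lists the worst offenders first as `((parent_pid, name), count)` pairs.  Running it a few minutes apart shows whether the count under one parent keeps growing, which is the pattern I saw with ffmpeg.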

 

That's what's happening, I think.  I moved into my new home weeks ago, and I set everything up... EXCEPT for my security cameras.  They're still not hooked up.  Shinobi is trying to get a camera feed and can't, and it seems to be causing an endless loop of defunct ffmpeg processes, for whatever reason.  Maybe I should report this as a Shinobi defect, as well.

 

I still need to wait and see whether the server is as stable as I expect, because I literally only just found this... but it seems very likely I've got it!

 

Really appreciate the help in steering me in the right direction.

SystemActiveProcesses.png

SystemProcessesState.png

