singularity098 Posted June 20

So, I moved to another city recently. I packed my server in my own car, drove just 1 hour to the new place, and set it up. I started running a parity check. The server crashed before the parity check was finished. I powered it up again and started another parity check. Again, the same thing happened. It may have found 1 error I think, but I chalk that up to the fact that it crashed.

And when I say crash, I mean that I cannot access the web GUI, I cannot ssh into the server, and most docker applications are nonresponsive. However, a few of them actually do respond, so I know the OS isn't fully frozen; just most of the processes are failing.

So I started up again and ran another parity check. It locked up, but this time I figured that if I just let it sit there, maybe it was still doing the parity check and would finish. So I waited many hours, then power cycled, and I was right: it finished the parity check without errors according to the log. I have also run a BTRFS scrub on all the disks and no errors were found.

But now I restart yet again, and again the server becomes unresponsive after some time. This is really irritating.

Along the way, I did see an error in the log complaining of an unregistered flash drive. I figured ok, the flash is failing. I followed the restore/replace license key process and figured I'd be ok. But it's not ok, it is still crashing. So the flash drive is not the problem.

I'm not sure what to do next. Do I just have some bad hardware? All the disks seem fine.

singularity-nas-diagnostics-20240619-2218.zip
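(For anyone following along, a scrub can be kicked off per BTRFS device from the terminal roughly like this; the mount points below are just examples of the usual /mnt/diskN and /mnt/cache layout, so adjust for your own pools:)

  # Run a scrub in the foreground (-B waits and prints the result when done)
  btrfs scrub start -B /mnt/cache
  # Or start it in the background and poll for progress/errors later
  btrfs scrub start /mnt/disk1
  btrfs scrub status /mnt/disk1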
JonathanM Posted June 20

Have you unplugged and reseated all connections? Have you checked all the fans to make sure none are obstructed by cables that moved in transit? Is the CPU heatsink still seated properly?
singularity098 Posted June 20

Ok, I've just unplugged and reseated everything inside the machine... except for the CPU, but it looked perfect and is tightly mounted, so I am sure it's ok. Everything looked fine, but I'll see if it crashes again within the next 48 hours.
JorgeB Posted June 21

Enable the syslog server and if it does crash again, post that.
singularity098 Posted June 21

It hasn't crashed again quite yet. But I did already have a syslog server running all along; I guess I forgot that it might not be included in the diagnostics. Attaching here...

syslog.zip
JorgeB Posted June 21

Nothing jumps out; post a new one if it happens again.
singularity098 Posted June 21

Just happened again. Hooray.

syslog.zip
JorgeB Posted June 21

Unfortunately there's nothing relevant logged, so this could be a hardware issue. One thing you can try is to boot the server in safe mode with all docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
singularity098 Posted July 2

Ok, I think that I have it figured out.

It ran perfectly fine for over a week in safe mode with docker and VMs disabled. Then I rebooted, not in safe mode, but still with docker and VMs disabled. Still perfectly fine for days. Then I turned on docker and it crashed within 24 hours. Ok... so there's a problem somewhere involving docker.

I don't see much in the logs once again. But I did look through Netdata for any interesting graphs. The one that jumps out at me is the number of processes: it climbs steadily, and more interestingly still, they're zombie processes. So I look for zombie processes on the terminal, and I see an ever-increasing number of defunct ffmpeg processes. And the parent process? A Shinobi script.

That's what's happening, I think. I moved into my new home weeks ago and set everything up... EXCEPT for my security cameras. They're still not hooked up. Shinobi is trying to get a camera feed and can't, and it seems to be causing an endless loop of defunct ffmpeg processes, for whatever reason. Maybe I should report this as a Shinobi defect as well.

I still need to wait and see if I get the stability I'm expecting, because I literally only just found this... but it seems very likely I've got it! Really appreciate the help in steering me in the right direction.
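(In case it's useful to anyone else, spotting the culprit was roughly a matter of the following from the terminal; 12345 below is just a placeholder for whatever parent PID turns up on your system:)

  # Count processes currently in the zombie (Z) state
  ps -eo stat | grep -c '^Z'
  # List the defunct processes along with their parent PIDs
  ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'
  # See what the parent process actually is (replace 12345 with a PPID from above)
  ps -o pid,comm,args -p 12345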
JorgeB Posted July 2 (marked as solution)

19 minutes ago, singularity098 said:
"and it seems to be causing an endless loop of defunct ffmpeg processes, for whatever reason."

See if this helps for that: https://forums.unraid.net/bug-reports/stable-releases/61210-cannot-fork-resource-temporarily-unavailable-r3020/?do=findComment&comment=28505

And please report back; LT is considering adding that setting as default.
singularity098 Posted July 2

Awesome, I will definitely give that a try... but first I am going to let the server run with Shinobi completely off, just for confirmation of sanity. Will report back... thanks!
singularity098 Posted July 7

Confirmed fixed. The system no longer dies as long as the Shinobi docker container is left stopped, or if I run it with the --pids-limit 1000 parameter. What a relief. Thanks for the great troubleshooting help!
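For anyone else who lands here: on Unraid this sort of parameter typically goes in the container template's Extra Parameters field (advanced view), which as far as I understand just gets passed straight to docker run, roughly like this (the image name is only a placeholder):

  # Hard-cap how many processes the container may spawn, so leaked
  # ffmpeg zombies can't exhaust the host's process table
  docker run -d --name shinobi --pids-limit 1000 <shinobi-image>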
JorgeB Posted July 7

Thanks for confirming, glad it's resolved.