singularity098 Posted June 20

So, I moved to another city recently. I packed my server in my own car, drove just 1 hour to the new place, and set it up. I started running a parity check. The server crashed before the parity check was finished. I powered it up again and started another parity check. Again, the same thing happened. It may have found 1 error I think, but I chalk that up to the fact that it crashed.

And when I say crash, I mean that I cannot access the web GUI, I cannot ssh into the server, and most docker applications are nonresponsive. However, a few of them actually do respond, so I know the OS isn't fully frozen; just most of the processes are failing.

So I started up again and ran another parity check. It locked up, but this time I figured that if I just let it sit there, maybe it was still doing the parity check and would finish. So I waited many hours, then power cycled, and I was right: it finished the parity check without errors according to the log. I have also run a BTRFS scrub on all the disks and no errors were found.

But now I restart yet again, and again the server becomes unresponsive after some time. This is really irritating.

Along the way, I did see an error in the log complaining of an unregistered flash drive. I figured ok, the flash is failing. I followed the restore/replace license key process and figured I'd be ok. But it's not ok, it is still crashing. So the flash drive is not the problem.

I'm not sure what to do next. Do I just have some bad hardware? All the disks seem fine.

singularity-nas-diagnostics-20240619-2218.zip
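(For anyone following along, a scrub can be kicked off per BTRFS device from the terminal roughly like this; the mount points below are just examples of the usual /mnt/diskN and /mnt/cache layout, so adjust for your own pools:)

  # Run a scrub in the foreground (-B waits and prints the result when done)
  btrfs scrub start -B /mnt/cache
  # Or start it in the background and poll for progress/errors later
  btrfs scrub start /mnt/disk1
  btrfs scrub status /mnt/disk1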
JonathanM Posted June 20

Have you unplugged and reseated all connections? Have you checked all the fans to make sure none are obstructed by cables that moved in transit? Is the CPU heatsink still seated properly?
singularity098 Posted June 20

Ok, I've just unplugged and reseated everything inside the machine... except for the CPU, but it looked perfect and is tightly mounted, so I am sure it's ok. Everything looked fine, but I'll see if it crashes again within the next 48 hours.
JorgeB Posted June 21

Enable the syslog server and if it does crash again, post that.
singularity098 Posted June 21

It hasn't crashed again quite yet. But I did already have a syslog server running all along; I guess I forgot that it might not be included in the diagnostics. Attaching here...

syslog.zip
JorgeB Posted June 21

Nothing jumps out; post a new one if it happens again.
singularity098 Posted June 21

Just happened again. Hooray.

syslog.zip
JorgeB Posted June 21

Unfortunately there's nothing relevant logged, so this could be a hardware issue. One thing you can try is to boot the server in safe mode with all docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
singularity098 Posted July 2

Ok, I think that I have it figured out.

It ran perfectly fine for over a week in safe mode with docker and VMs disabled. Then I rebooted, not in safe mode, but still with docker and VMs disabled. Still perfectly fine for days. Then I turned on docker and it crashed within 24 hours. Ok... so there's a problem somewhere involving docker.

I don't see much in the logs once again. But I did look through Netdata for any interesting graphs. The one that jumps out at me is the number of processes: it climbs steadily, and more interestingly still, they're zombie processes. So I look for zombie processes on the terminal, and I see an ever-increasing number of defunct ffmpeg processes. And the parent process? A Shinobi script.

That's what's happening, I think. I moved into my new home weeks ago and set everything up... EXCEPT for my security cameras. They're still not hooked up. Shinobi is trying to get a camera feed and can't, and it seems to be causing an endless loop of defunct ffmpeg processes, for whatever reason. Maybe I should report this as a Shinobi defect as well.

I still need to wait and see if I get the stability I'm expecting, because I literally only just found this... but it seems very likely I've got it! Really appreciate the help in steering me in the right direction.
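(In case it's useful to anyone else, spotting the culprit was roughly a matter of the following from the terminal; 12345 below is just a placeholder for whatever parent PID turns up on your system:)

  # Count processes currently in the zombie (Z) state
  ps -eo stat | grep -c '^Z'
  # List the defunct processes along with their parent PIDs
  ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'
  # See what the parent process actually is (replace 12345 with a PPID from above)
  ps -o pid,comm,args -p 12345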
JorgeB Posted July 2 (marked as solution)

19 minutes ago, singularity098 said:
"and it seems to be causing an endless loop of defunct ffmpeg processes, for whatever reason."

See if this helps for that: https://forums.unraid.net/bug-reports/stable-releases/61210-cannot-fork-resource-temporarily-unavailable-r3020/?do=findComment&comment=28505

And please report back; LT is considering adding that setting as default.
singularity098 Posted July 2

Awesome, I will definitely give that a try... but first I am going to let the server run with Shinobi completely off, just for confirmation of sanity. Will report back... thanks!
singularity098 Posted July 7

Confirmed fixed. The system no longer dies as long as the Shinobi docker container is left stopped, or if I run it with the --pids-limit 1000 parameter. What a relief. Thanks for the great troubleshooting help!
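For anyone else who lands here: on Unraid this sort of parameter typically goes in the container template's Extra Parameters field (advanced view), which as far as I understand just gets passed straight to docker run, roughly like this (the image name is only a placeholder):

  # Hard-cap how many processes the container may spawn, so leaked
  # ffmpeg zombies can't exhaust the host's process table
  docker run -d --name shinobi --pids-limit 1000 <shinobi-image>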
JorgeB Posted July 7

Thanks for confirming, glad it's resolved.