Resource temporarily unavailable - locking up system



This is now the second time in a few days that this has happened, and each time I've had to restart the whole array. I can't do much to troubleshoot because almost everything I try to run kicks out the following line:

/bin/sh: fork: retry: Resource temporarily unavailable

I also start getting a steady stream of emails from scheduled scripts, which are probably piling up as well. I honestly have no idea where to start identifying what could be happening here. It looks as though the cache is not becoming completely full, all my disks are healthy, and the docker containers were running with no issue until this started to build up and there were no resources left. This time I started to get messages about the docker image utilization. Last time I did not; I just woke up to a fairly non-responsive system. Almost any command I attempt to run results in that error being spit out.

 

I'm hoping someone can point me to logs I may have, or something else that can help me work out what's happening if/when this strikes again. Last time it happened around the time of the nightly backup and mover jobs. This time it started maybe 6 hours after those had already completed.

Link to comment

What if the GUI is not loading? This time it might have worked, but the previous time I was able to load the UI and nothing really seemed to be functioning. Is there a CLI option to run the diagnostics? It may not run from there either, so I'm just trying to understand my options.

Link to comment
7 minutes ago, 1activegeek said:

Is there a CLI option to run the diagnostics?

diagnostics

 

8 minutes ago, 1activegeek said:

It may not run from there either

It will (should) always complete.  However, some of the commands it uses may take quite a while (20+ minutes) to run if the system is completely hosed.

Link to comment
  • 4 months later...

OK, it's been a while, but I've had this strike once or twice more since posting originally. This time, though, I happened to be sitting at my computer when the first email came in, so I immediately jumped onto the box and was able to run the diagnostics and SCP the archive off the box before restarting. Should I just post the diagnostics archive here? Or is there possibly sensitive info in it that I should share privately?

Link to comment
  • 2 weeks later...

How far back did that NFS problem stretch? I've had this issue happen as far back as v6.5 and possibly 6.4. It just always happened in the middle of the night, when I wasn't around to catch it while the system was still somewhat responsive. This time I was lucky enough to get the diagnostic output.

Edited by 1activegeek
Link to comment
9 hours ago, 1activegeek said:

I'd like to do that, but I feel for troubleshooting purposes, it's best not to change what I'm currently running with. Obviously updates could bear new fixes, but it could also make it difficult to identify something particular to the setup currently that the LimeTech folks can diagnose. 

Limetech is only going to spend time diagnosing issues with the current release and newer.

Link to comment

What is considered current and new? I'm on 6.6.3. I was just highlighting that this issue has occurred as far back as 6.4 or 6.5. It most recently happened on 11/18, as posted, on 6.6.3. I see that was 10 days after the 6.6.5 update came out, but I make it a habit not to update to the newest version immediately. It seems that should qualify as a current version, as it's simply a minor release behind.

Edited by 1activegeek
Link to comment

@itimpi - makes sense. I think the problem I have, though, is that this seems to happen only after the system has been up and running for extended periods. With updates coming out almost monthly, it's hard to let the system run long enough to hit this issue.

 

I guess it comes down to this: who can tell me whether they will look into the diagnostics I've posted? Or, if I leave the system running on another version and have the same issue, will someone then dig into it? I'd like to save everyone the time/effort of responding to this thread if nobody will do anything about it. If so, I'll just move along and get pissed every time it happens. Like I mentioned, this has been happening for an extended period of time. I'd imagine any Lime folks should be able to tell whether any fixes in the latest minor release "should" resolve my issue. I'd imagine most bug fixes and patches have a known root cause, which should be discoverable in the diagnostics?

 

Forgive me if my tone seems rude - this is just frustrating as I've now FINALLY gotten a diagnostic from this happening, and now I'm seeming to hear that it's useless as nobody will look at it. For reference, note that the original post was 7/14, and that was after it had already happened 4-5 times and I decided I needed to do something, as new version updates were not the solution. It's now almost 5 months since posting, and probably at least 8-10 months since this first happened.

Link to comment

 

 

Forgive me if my tone seems rude - this is just frustrating as I've now FINALLY gotten a diagnostic from this happening, and now I'm seeming to hear that it's useless as nobody will look at it.

I did look at your diagnostics but didn't see anything out of the ordinary, which is why I suggested you upgrade. If the issue keeps happening, it might also be worth booting in safe mode to see if the problem persists. Since the problem has been happening across several releases, it's not likely an Unraid issue; it's more likely related to your hardware or to some plugin/docker you're running.

 

 

Link to comment
  • 4 weeks later...

The issue is due to running out of available processes - I think. Note that in your 'top' listing you're showing pid 7530 running very hard. When correlated to the ps listing - you can see hundreds if not thousands of 'defunct' processes (but not this one) - all based around some fairly basic commands such as awk, tail, cut, sort & uniq.

I note in other logs you're manually creating a local user - hard to say what impact this might have but could be related to the problem - looks like it's also manually configured to run some sudo commands - again as this is not standard it's hard to speculate if this is related.

I'd also check what you're seeing in /etc/rc.d/rc.diskinfo (and obviously the corresponding file in /boot/config - the usb stick.)

It's almost like a service is failing, and thus keeps forking new processes which also fail.
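To make the "correlate top with the ps listing" step above a bit more concrete, here is a minimal sketch that counts defunct (zombie) processes per parent PID, since the parent that never reaps its children is usually the thing to look at. It assumes Python 3 is available somewhere (it can be run on another machine against a saved listing) and that the listing came from `ps -eo pid,ppid,stat,comm`. The function name `zombie_parents` and the sample data are made up for illustration and are not taken from the posted diagnostics.

```python
from collections import Counter


def zombie_parents(ps_output: str) -> Counter:
    """Count defunct (zombie) processes per parent PID.

    Expects the text produced by `ps -eo pid,ppid,stat,comm`:
    one header line, then one process per line.
    """
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        _pid, ppid, stat, _comm = fields
        if stat.startswith("Z"):  # "Z" is the zombie/defunct state code
            counts[int(ppid)] += 1
    return counts


# Invented sample listing, for illustration only.
SAMPLE = """\
  PID  PPID STAT COMMAND
    1     0 Ss   init
 4242     1 R    some_script
 4301  4242 Z    awk <defunct>
 4302  4242 Z    tail <defunct>
 4303  4242 Z    sort <defunct>
 5000     1 Z    cut <defunct>
"""

if __name__ == "__main__":
    for ppid, n in zombie_parents(SAMPLE).most_common():
        print(f"parent {ppid}: {n} defunct children")
```

If one parent PID accounts for most of the defunct entries, looking that PID up in the same ps listing should show which script or service is spawning them.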

 

Link to comment

Thanks for the response and info, Delarius. I'll keep this in mind the next time it inevitably happens. I would agree it all centers on running out of available processes, as the errors being kicked out have indicated. It's good to know someone sees something outside of normal operation happening.

 

@Delarius any tips on how to spot or corner the process that is causing the issue when it happens, so I can hopefully identify what is triggering this?
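One thing I could try in the meantime is logging how close the box is to its process limits, so there's a trail leading up to the next lockup. Below is a rough sketch, assuming Python 3 is installed and that the limit being hit is one of the system-wide ones (`pid_max` or `threads-max`). That is only an assumption: a per-user limit or a container limit could produce the same fork error. The function names are my own.

```python
from pathlib import Path


def parse_loadavg_tasks(loadavg_text: str) -> int:
    """Return the total task count from the contents of /proc/loadavg.

    The fourth field looks like "2/1375": runnable tasks / total tasks.
    """
    return int(loadavg_text.split()[3].split("/")[1])


def headroom(total_tasks: int, pid_max: int, threads_max: int) -> int:
    """Rough number of tasks left before the lower of the two limits."""
    return min(pid_max, threads_max) - total_tasks


def snapshot(proc: Path = Path("/proc")) -> str:
    """Build a one-line summary from the live /proc files (Linux only)."""
    total = parse_loadavg_tasks((proc / "loadavg").read_text())
    pid_max = int((proc / "sys/kernel/pid_max").read_text())
    threads_max = int((proc / "sys/kernel/threads-max").read_text())
    left = headroom(total, pid_max, threads_max)
    return (f"tasks={total} pid_max={pid_max} "
            f"threads_max={threads_max} headroom={left}")


if __name__ == "__main__":
    try:
        print(snapshot())
    except (OSError, ValueError) as exc:
        print(f"could not read /proc: {exc}")
```

Run from a scheduled job with the output appended to a file that survives a reboot, this would show whether the task count climbs steadily before the failure or jumps all at once.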

Link to comment
