fork: retry: Resource temporarily unavailable



I am almost done setting up my first unraid server and have a problem. As the title says, I'm getting 

fork: retry: Resource temporarily unavailable

EVERYWHERE.

 

A little background:

 

High level:

i7-6700K 4GHz

32GB RAM

13 Drives totaling 42TB usable, 1TB SSD Cache

Bunch of dockers, no running VMs (yet)

 

Details:

[Attached screenshots: Selection_141.png, Selection_142.png, Selection_143.png]

 

So that's the setup. Now a little history: I am migrating from a home-built storage server that was running on hardware RAID 6. I cobbled together enough old drives on the unraid server to move my data. I set up the new server and everything was looking good. I rsync'd my data from the old server (about 19TB) without issue. Once the rsync was done, I decommissioned the old server and pulled its drives for the new server.

 

Here is where the problems started. While preclearing the new drives I started seeing 'fork: retry: Resource temporarily unavailable' when trying to tab-complete from the terminal. Then I started seeing issues with my dockers. Unifi-video especially would be stopped every night. I'd look at the log for the docker and see

[Attached screenshot of the container log: Selection_144.png]

 

I'm getting a ton of emails from cron with the subject 'cron for user root /etc/rc.d/rc.diskinfo --daemon &> /dev/null' with the same fork error.

 

I got through the preclear and added the new drives, but then I had more data to sync back. I added a 10TB drive and mounted it with unassigned drives so I could rsync the data directly. The fork error got WAY out of control, to the point that no docker containers are working. I did some googling and found that the error means I'm hitting a resource limit (obvious) and that I should look at `ulimit -a` to see what my limits are:

root@Storage:~# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127826
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 40960
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 127826
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I suspected open files was the issue and doubled (then doubled again) the number. I also increased max user processes and tried to increase pipe size, but it wouldn't let me. I thought this helped, but now I'm having the same issues, and I'm not sure whether it really helped or whether I was just looking at a time when the issue had temporarily stopped...
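Editor's note for anyone landing here with the same error: `ulimit -u` is only one of the ceilings that can make fork() fail with "Resource temporarily unavailable". The kernel-wide `kernel.pid_max` and `kernel.threads-max` settings and a cgroup pids limit can produce the same message. Below is a minimal sketch, assuming a Linux host with /proc mounted, that compares the number of tasks currently in use against the first two. The names `count_tasks`, `read_limit`, `TASKS` and `LIMIT` are made up for illustration, and nothing here confirms which limit was hit on this particular server.

```shell
#!/bin/bash
# Sketch only: sticks to bash builtins so it can still run while fork() is failing.

# Count every task (process or thread) by globbing /proc/<pid>/task/<tid>.
# The result goes into the global TASKS to avoid a forked subshell.
count_tasks() {
    local d
    TASKS=0
    for d in /proc/[0-9]*/task/[0-9]*; do
        if [ -e "$d" ]; then
            TASKS=$((TASKS + 1))
        fi
    done
}

# Read one kernel limit from /proc/sys/kernel into the global LIMIT.
# LIMIT is set to "n/a" when the file does not exist or is unreadable.
read_limit() {
    LIMIT="n/a"
    if [ -r "/proc/sys/kernel/$1" ]; then
        read -r LIMIT < "/proc/sys/kernel/$1"
    fi
}

count_tasks
echo "tasks in use:       $TASKS"
read_limit pid_max
echo "kernel.pid_max:     $LIMIT"
read_limit threads-max
echo "kernel.threads-max: $LIMIT"
printf 'ulimit -u:          '
ulimit -u
```

If the task count is nowhere near any of these numbers, the limit being hit is probably a narrower one, such as a per-container limit.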

 

The problem seems to be tied to high disk I/O. I've seen the issue now during preclear, disk to disk sync, and during a parity rebuild (I rearranged my drives after installing the new ones).

 

It COULD be that this will all go away when I'm done with the massive data moves, but I want to know why it's happening and fix it now.

 

Any help is appreciated.


Your syslog also has a lot of wrong csrf tokens. This is caused by having another browser still open to your server somewhere after a reboot.

 

Your docker image is many times larger than necessary. If you have things configured correctly it is unlikely you will ever need more than 20G. If you are filling it up then making it larger will only make it take longer to fill. You are currently using less than 20G, and if that is increasing it typically means you have something misconfigured.

 

As for your problem, I don't see anything obvious. And this complaint isn't at all common so I don't know what you are doing different than other people who don't get this. Possibly these symptoms don't have a common cause, but you have multiple causes. Maybe you could try eliminating some things to try to narrow it down.

 

Each of your dockers has its own support thread that you can easily access by clicking on its icon and selecting Support.


Thanks Constructor - I definitely have unraid open on several laptops, so I'm sure that explains the csrf errors. As for the docker size, I knew I would have a bunch of containers and didn't want to run out of space. Besides, I have a 1TB SSD, I figured I had the room.

 

As for the actual problem, I had a feeling this was not a common one, as searching turned up nothing unraid related. I think Unifi-Video is pushing me over whatever limit I am hitting. I stopped the container and haven't seen any errors for a few hours. The app is the NVR for my 8 security cameras, so there is a lot of disk I/O there. Still, this is nothing different from what I had running on my previous (Ubuntu) server, and I checked the ulimits there and they were actually much lower in a lot of cases.
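Editor's note: one way to test the suspicion that a single application is eating the task budget is to rank processes by thread count, since fork() counts threads as well as processes. This is a rough sketch, assuming a Linux /proc layout; `list_thread_counts` and `TOP_N` are hypothetical names, and the output format is arbitrary.

```shell
#!/bin/bash
# Sketch: list the processes with the most threads, read straight from /proc.

# Print "threads pid name" for every process that is readable right now.
list_thread_counts() {
    local f key val name pid threads
    for f in /proc/[0-9]*/status; do
        # A process can exit mid-scan; skip entries that have vanished.
        [ -r "$f" ] || continue
        name="" pid="" threads=""
        while read -r key val; do
            case "$key" in
                Name:)    name=$val ;;
                Pid:)     pid=$val ;;
                Threads:) threads=$val ;;
            esac
        done 2>/dev/null < "$f"
        if [ -n "$threads" ]; then
            echo "$threads $pid $name"
        fi
    done
}

# Show the largest thread counts first; TOP_N is an arbitrary choice.
TOP_N=10
list_thread_counts | sort -rn | head -n "$TOP_N"
```

Running it once while the container is up and once after stopping it should show whether the thread count drops by a meaningful amount.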

 

My only thought here is it has something to do with how the unraid fuse file system works. I know as data is written to a drive, it also has to read from all drives to calculate parity. Maybe the strain of 8 video streams plus my disk to disk rsync is just too much? I did not have any problems when I copied the bulk of my data from the old server (over gigabit ethernet).

 

My concern here is that if I schedule a regular parity check I have to knock my security system offline for the 24+ hours it takes to finish. I am still on the trial license as I wanted to give the system time to show me any potential issues, and this is a pretty big one for me. I'm hoping I'm not going to be given the runaround on this. I could easily open the unifi-video support thread, but we both know the maintainer just packages Ubiquiti's video app in a docker, so he's not going to know much about that proprietary application. Ubiquiti is a large company (that honestly is focusing on their new hardware NVR) and they are just going to blame the OS since it works on Ubuntu without hitting a resource limit, and they'd be right: they can't support every OS under the sun. Hitting a resource limit is an OS restriction placed on user space applications. Being Linux, my go-to was to look at ulimit, but that didn't seem to help.
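Editor's note: besides ulimit, Linux can cap the number of tasks per control group through the pids controller, which exposes `pids.max` and `pids.current` files, and containers normally run inside their own cgroups. Whether such a cap is actually set on this server is an assumption that the thread does not confirm. The sketch below walks a cgroup tree and prints current usage against the limit for each group that has one; `report_pids_cgroups` is a hypothetical helper, and `/sys/fs/cgroup` is only the usual mount point.

```shell
#!/bin/bash
# Sketch: show "current/limit path" for every cgroup that exposes a pids limit.

# $1 is the cgroup root to scan; it defaults to the usual mount point.
report_pids_cgroups() {
    local root=${1:-/sys/fs/cgroup}
    # find may hit unreadable directories; those errors are ignored.
    find "$root" -name pids.max 2>/dev/null | while read -r max; do
        dir=${max%/pids.max}
        # Some groups expose pids.max without a readable pids.current.
        [ -r "$dir/pids.current" ] || continue
        read -r cur < "$dir/pids.current"
        read -r lim < "$max"
        echo "$cur/$lim $dir"
    done
}

report_pids_cgroups
```

A group whose current value sits close to a numeric limit (rather than `max`) would be worth a closer look.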

 

Is there anything unraid specific I can look at to try and resolve this?

46 minutes ago, timekiller said:

My only thought here is it has something to do with how the unraid fuse file system works. I know as data is written to a drive, it also has to read from all drives to calculate parity.

There are actually 2 different methods you can choose for calculating parity. The one you mention isn't actually the default method so unless you changed the default it isn't reading all drives. See here:

https://forums.unraid.net/topic/50397-turbo-write/

 

Fuse and parity don't have anything to do with each other, since parity just treats the complete disk as a bunch of bits and has nothing at all to do with files.

 

I always recommend taking things a bit at a time, getting one thing working well before adding anything else. Sounds like what you are doing now will not be a typical load. Get the initial data load out of the way before trying to get your applications going. Then add the other stuff a little at a time.
