Thursday, 14 August 2014

A Forking Nightmare - 0403-030 The fork function failed. Too many processes already exist

So this one baffled me for several days, until I focused 100% on the problem.

This is what I saw when I ran a script to start a WAS 8.5.5.2 Deployment Manager: -

0403-030 The fork function failed. Too many processes already exist 

on two of my AIX LPARs.

I checked the "obvious", including: -

lsattr -El sys0 | grep maxuproc

maxuproc 4096 Maximum number of PROCESSES allowed per user True

ulimit -a

time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        unlimited
memory(kbytes)       unlimited
coredump(blocks)     2097151
nofiles(descriptors) unlimited
threads(per process) unlimited
processes(per user)  unlimited


When I monitored the number of processes for my user - wasadmin : -

$ ps -ef  | grep -i wasadmin

I noticed that the number quickly ramped up to the ~4,000 mark, before the exception popped up.
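For anyone wanting to watch this more precisely, here is a minimal sketch of how the count and the limit could be pulled out of the command output shown above. The function names count_user_procs and maxuproc_limit are my own, made up for illustration, and both simply filter text on stdin, so they assume the column layout seen in the ps -ef and lsattr output in this post.

```shell
# Count the lines of `ps -ef`-style input whose first column (the owning
# user) matches the given name exactly. Unlike `grep -i wasadmin`, this
# does not also count the grep process or unrelated lines that merely
# mention the user name.
count_user_procs() {
    awk -v u="$1" '$1 == u { n++ } END { print n + 0 }'
}

# Extract the numeric limit from `lsattr -El sys0`-style input, where the
# attribute name is the first column and its value is the second.
maxuproc_limit() {
    awk '$1 == "maxuproc" { print $2 }'
}

# Intended use on the LPAR (left as comments so the block has no side effects):
#   ps -ef | count_user_procs wasadmin
#   lsattr -El sys0 | maxuproc_limit
```

Running the first command every second or so while the start script is executing should show the count climbing towards the second number.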

Guess what the problem was ?

I'd created a script - startManager.sh - in the wasadmin home directory, as I usually do.

However, for some strange reason (!), I'd created a symbolic link from: -

/opt/ibm/WebSphereProfiles/Dmgr01/bin/startManager.sh

to: -

/home/wasadmin/startManager.sh

but, in doing so, I'd somehow overwritten the original script.

In other words, I had a "shortcut" script that called itself :-)

Therefore, when I ran startManager.sh, each invocation started a new shell to run another copy of the same script, and so on, until the wasadmin user hit the maxuproc limit of 4,096 processes, at which point the fork error appeared.
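One way a wrapper script could protect itself against this kind of runaway is a simple depth counter passed down through the environment. This is only a sketch: guard_depth and the WRAPPER_DEPTH variable are hypothetical names of my own, not anything provided by WebSphere or AIX, and the default limit of 5 is arbitrary.

```shell
# Hypothetical recursion guard for a wrapper script.
# WRAPPER_DEPTH is an illustrative variable name; it is exported so that
# every child shell started by the script inherits the incremented value.
guard_depth() {
    max=${1:-5}
    WRAPPER_DEPTH=$(( ${WRAPPER_DEPTH:-0} + 1 ))
    export WRAPPER_DEPTH
    if [ "$WRAPPER_DEPTH" -gt "$max" ]; then
        echo "wrapper is nested $WRAPPER_DEPTH levels deep, refusing to continue" >&2
        return 1
    fi
    return 0
}

# Intended use as the first line of the wrapper (left as a comment here):
#   guard_depth 5 || exit 1
```

With that in place, a script that accidentally calls itself would stop after a handful of levels rather than after thousands of processes.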

Simples :-)

Do I feel embarrassed ? Yes, I do, but one learns through failure :-)
