Thursday, 14 August 2014

A Forking Nightmare - 0403-030 The fork function failed. Too many processes already exist

So this one baffled me for several days, until I focused 100% on the problem.

This is what I saw when I ran a script to start a WAS 8.5.5.2 Deployment Manager: -

0403-030 The fork function failed. Too many processes already exist 

on two of my AIX LPARs.

I'd checked the "obvious" things, including: -

lsattr -El sys0 | grep maxuproc

maxuproc 4096 Maximum number of PROCESSES allowed per user True
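
To compare that limit with what a user is actually running, the value can be pulled out of the lsattr output and set against a live process count. This is a minimal sketch, assuming a POSIX shell and a ps that accepts the POSIX -u and -o options. The function names maxuproc_value and uproc_headroom are hypothetical, and the parsing assumes the "attribute value description settable" layout shown above.

```sh
# Hypothetical helper: read `lsattr -El sys0` output on stdin and print the
# value column of the maxuproc attribute (4096 in the output above).
maxuproc_value() {
    awk '$1 == "maxuproc" { print $2 }'
}

# Hypothetical helper: print how many more processes USER ($1) can start
# before reaching LIMIT ($2). `ps -u USER -o pid=` prints one PID per line
# with no header, and awk counts the lines. If you run this as USER, the
# ps/awk pipeline itself is included in the count.
uproc_headroom() {
    user=$1
    limit=$2
    count=$(ps -u "$user" -o pid= | awk 'END { print NR }')
    echo $((limit - count))
}
```

On an AIX box this might be used as: uproc_headroom wasadmin "$(lsattr -El sys0 | maxuproc_value)"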

ulimit -a

time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        unlimited
memory(kbytes)       unlimited
coredump(blocks)     2097151
nofiles(descriptors) unlimited
threads(per process) unlimited
processes(per user)  unlimited


When I monitored the number of processes running as my user, wasadmin: -

$ ps -ef | grep -i wasadmin

I noticed that the count quickly climbed to around 4,000 before the error appeared.
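
A ramp-up like that is easier to spot as a running count than as a full ps listing. This is a minimal sketch, assuming a POSIX shell and a ps that accepts the POSIX -u and -o options; watch_user_procs is a hypothetical name, not an AIX command.

```sh
# Hypothetical helper: print a timestamped count of USER's processes
# SAMPLES times, INTERVAL seconds apart, e.g. to watch a runaway script
# climb towards maxuproc.
watch_user_procs() {
    user=$1
    samples=${2:-10}
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$samples" ]; do
        # One PID per line, no header; awk prints the number of lines.
        n=$(ps -u "$user" -o pid= | awk 'END { print NR }')
        echo "$(date '+%H:%M:%S') $user $n"
        i=$((i + 1))
        # Skip the sleep after the final sample.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, watch_user_procs wasadmin 60 1 prints one line per second for a minute.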

Guess what the problem was?

I'd created a script - startManager.sh - in the wasadmin home directory, as I usually do.

However, for some strange reason (!), I'd created a symbolic link from: -

/opt/ibm/WebSphereProfiles/Dmgr01/bin/startManager.sh

to: -

/home/wasadmin/startManager.sh

but, in doing so, I'd somehow overwritten the original script.

In other words, I had a "shortcut" script that called itself :-)

So, when I ran startManager.sh, it called itself, which called itself again, and so on. Each call forked another shell until wasadmin hit the maxuproc limit of 4,096 processes, at which point the fork error appeared.
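
One way to stop this happening again is to have the home-directory wrapper refuse to run if its target resolves back to the wrapper itself. This is a minimal sketch, assuming a shell whose test builtin supports -ef (bash, ksh and dash do). The function names calls_itself and run_target are hypothetical, not part of WebSphere.

```sh
# Hypothetical guard: succeed if TARGET ($2) and WRAPPER ($1) are the same
# file once symbolic links are followed; `test -ef` compares the device
# and inode numbers of the two paths.
calls_itself() {
    [ "$1" -ef "$2" ]
}

# Hypothetical wrapper body: refuse to start TARGET if it would just re-run
# the wrapper, otherwise hand over to it with the remaining arguments.
# `exec` replaces the wrapper shell, so the wrapper doesn't hold on to an
# extra process while the target runs.
run_target() {
    wrapper=$1
    target=$2
    shift 2
    if calls_itself "$wrapper" "$target"; then
        echo "ERROR: $target resolves to $wrapper; refusing to recurse" >&2
        return 1
    fi
    exec "$target" "$@"
}
```

Inside the wrapper in the wasadmin home directory, the last line might then be run_target "$0" /opt/ibm/WebSphereProfiles/Dmgr01/bin/startManager.sh "$@", which relies on the script being invoked by a path that resolves, so that "$0" points at the wrapper itself.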

Simples :-)

Do I feel embarrassed? Yes, I do, but one learns through failure :-)
