Monday, 10 May 2010
More tales from the front-line - Portal clustering
Whilst endeavouring to add the first node into my WebSphere Portal 6.1.5 cluster, I kept hitting a problem whereby the script: -
was failing with a java.lang.NullPointerException against the action fixup-after-security-change-admin in
After much trial and even more error, I realised that, although the portal node was able to resolve the Deployment Manager node by its fully qualified hostname, the same was not true in reverse.
I proved this by TELNETing from the portal node to the DM node: -
telnet dm.uk.ibm.com 8879
which worked - at least, I could see that TELNET was responding even if I didn't, as one might expect, get a valid TELNET session.
When I did the reverse: -
telnet portal.uk.ibm.com 8878
( where 8878 is the SOAP port of the node agent )
I got "Connection refused".
As far as I can establish, the node being added into the cell passes over its hostname, which the DM node then tries and, in my case, fails to resolve CORRECTLY.
The trick, in my case, was to ensure that the correct fully qualified hostname of the portal node was available to the DM node, by way of the /etc/hosts file.
Once I was able to TELNET both ways using the FQ hostname, I was good to go.
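For anyone who would rather script the two-way check than eyeball TELNET output, here is a minimal sketch in Python. The function name can_connect and the timeout value are my own inventions for illustration; all it does is attempt a plain TCP connection, which is roughly what the TELNET test was proving.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only proves that the name resolves and the port
    accepts connections, much like the telnet test; it does not speak SOAP.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # Covers resolution failures, "Connection refused",
        # "No route to host" and timeouts alike.
        print(f"{host}:{port} -> {exc}")
        return False
```

Calling can_connect("dm.uk.ibm.com", 8879) on the portal node and can_connect("portal.uk.ibm.com", 8878) on the DM node mirrors the two TELNET tests above; both need to come back True.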
On a related note, I hit a similar problem whereby: -
was failing with: -
Exception: java.net.NoRouteToHostException: No route to host
on the second ( of two ) nodes in the cluster.
This turned out to be due to a similar problem - in that case, the second node was resolving the FQ hostname of the DM to the WRONG IP address, which was then being blocked by a Linux firewall configuration ( as seen via the command /sbin/iptables -L ).
In short, when networking becomes notworking, things go pear-shaped.
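A wrong-address problem like this is easy to miss, because the name DOES resolve, just not to the right place. The sketch below is one hedged way to check what a node actually resolves a hostname to; the helper names resolved_addresses and resolves_to are hypothetical, and I am assuming the system resolver consults /etc/hosts, as it typically does on Linux.

```python
import socket


def resolved_addresses(hostname):
    """Return the sorted, de-duplicated IP addresses this machine resolves hostname to."""
    # Uses the system resolver; proto is pinned to TCP so each address
    # is not repeated once per socket type.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def resolves_to(hostname, expected_ip):
    """Return True if expected_ip is among the addresses hostname resolves to."""
    return expected_ip in resolved_addresses(hostname)
```

Run on each node against the DM's FQ hostname and the IP address you expect it to have; a False result points at the kind of stale or incorrect entry that caught me out.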