Monday, 10 May 2010

More tales from the front-line - Portal clustering

Whilst endeavouring to add the first node into my WebSphere Portal 6.1.5 cluster, I kept hitting a problem whereby the script: -

./ cluster-node-config-post-federation

was failing with a java.lang.NullPointerException against the action fixup-after-security-change-admin in 

After much trial and even more error, I realised that, although the portal node was able to resolve the Deployment Manager node by its fully qualified hostname, the same was not true in reverse.

I proved this by TELNETing from the portal node to the DM node: -

telnet 8879

which worked - at least, I could see that the port was accepting the connection even if I didn't, as one might expect, get a valid TELNET session.

When I did the reverse: -

telnet 8878

( where 8878 is the SOAP port of the node agent )

I got Connection refused

As far as I can establish, the node being added into the cell passes over its hostname, which the DM node then tries, and in my case fails, to resolve CORRECTLY.

The trick, in my case, was to ensure that the correct fully qualified hostname of the portal node was available to the DM node, by way of the /etc/hosts file.

Once I was able to TELNET both ways using the FQ hostname, I was good to go.
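
For anyone who would rather script the two-way check than TELNET by hand, here is a minimal sketch in Python. It simply tries to open a TCP connection to a host and port and reports whether that succeeded. The function name can_connect and the hostnames in the usage note are made up for illustration; only the ports 8879 and 8878 come from my setup.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the hostname and then connects,
        # so it exercises both name resolution and reachability.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unroutable hosts and
        # hostnames that fail to resolve (socket.gaierror is an OSError).
        return False
```

On the portal node one might call can_connect("dm.example.com", 8879), and on the DM node can_connect("portal1.example.com", 8878), substituting the real fully qualified hostnames. Both need to return True before the federation step stands a chance.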

On a related note, I hit a similar problem whereby: -

./ cluster-node-config-post-federation

was failing with: -

Exception: No route to host

on the second ( of two ) nodes in the cluster.

This turned out to be due to a similar problem - in that case, the second node was resolving the FQ hostname of the DM to the WRONG IP address, which was then being blocked by a Linux firewall configuration ( as seen via the command /sbin/iptables -L ).
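
A wrong-address problem like this can be spotted before running the script by asking the node what it actually resolves the DM hostname to. The following is a rough sketch only, assuming IPv4; the function names resolved_ipv4 and resolves_to are hypothetical, and the hostname and address in the usage note are placeholders rather than values from my environment.

```python
import socket
from typing import Set


def resolved_ipv4(hostname: str) -> Set[str]:
    """Return the set of IPv4 addresses this machine resolves hostname to."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        # The name could not be resolved at all.
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is (address, port).
    return {info[4][0] for info in infos}


def resolves_to(hostname: str, expected_ip: str) -> bool:
    """Return True if expected_ip is among the addresses for hostname."""
    return expected_ip in resolved_ipv4(hostname)
```

Running resolves_to("dm.example.com", "192.0.2.10") on each node, with the real DM hostname and the address you expect, shows straight away whether /etc/hosts or DNS is handing back the wrong answer.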

In short, when networking becomes notworking, things go pear-shaped.


Tony said...

Hey, thanks for the post. Saved my day!

Dave Hay said...

Tony, no problems, glad I was able to help