During the final phase of configuring a two-node cluster for Lotus Connections ( LC ) 2.5, I was attempting to configure the Shared Message Store so that all LC features deployed as clusters across both nodes could see messages and logs.
The Information Centre covers this in depth here: -
and, in essence, I'd created a series of directories on an NFS server ( running NFS v3 on Red Hat Enterprise Linux ), as follows: -
mkdir /net/data/collaboration/messagestore
mkdir /net/data/collaboration/messagestore/Activities
mkdir /net/data/collaboration/messagestore/Blogs
mkdir /net/data/collaboration/messagestore/Communities
mkdir /net/data/collaboration/messagestore/Dogear
mkdir /net/data/collaboration/messagestore/Files
mkdir /net/data/collaboration/messagestore/Homepage
mkdir /net/data/collaboration/messagestore/Profiles
mkdir /net/data/collaboration/messagestore/Wikis
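For reference, the same tree can be created in a single pass. This is a minimal sketch, assuming bash on the NFS server ( or on a client with write access to the share ); the function name create_message_store and its optional base-directory argument are hypothetical, not part of Connections or WebSphere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create one message store directory per clustered
# Connections feature under a base directory (default: the path used above).
create_message_store() {
  local base="${1:-/net/data/collaboration/messagestore}"
  local feature
  for feature in Activities Blogs Communities Dogear Files Homepage Profiles Wikis; do
    # -p also creates the base directory, and does not fail if a directory exists
    mkdir -p "$base/$feature" || return 1
  done
}
```

Running create_message_store with no argument uses the path above; because of mkdir -p it is safe to re-run.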
and then created eight new members of the WebSphere Service Integration Bus ( SIBus ), one for each of the clustered LC features.
Each bus member has two directories, one for logs and one for messages: -
/net/data/collaboration/messagestore/<clusterName>/log
/net/data/collaboration/messagestore/<clusterName>/store
with these two subdirectories being created automatically when the cluster is first started ( which, in turn, starts the bus ).
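A quick way to see which bus members have got as far as creating their log and store subdirectories is sketched below. It assumes, as in this setup, that each directory is named after its cluster; the check_message_store function is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each feature, report whether its bus member has
# created the log and store subdirectories yet.
# Returns non-zero if any subdirectory is missing.
check_message_store() {
  local base="${1:-/net/data/collaboration/messagestore}"
  local feature sub missing=0
  for feature in Activities Blogs Communities Dogear Files Homepage Profiles Wikis; do
    for sub in log store; do
      if [ -d "$base/$feature/$sub" ]; then
        echo "OK      $base/$feature/$sub"
      else
        echo "MISSING $base/$feature/$sub"
        missing=1
      fi
    done
  done
  return "$missing"
}
```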
So far, so good.
I'd already verified that I could write to, and read from, the NFS share by creating, editing, viewing and deleting files on it from both nodes; the share is mounted automatically at boot via /etc/fstab.
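The manual read/write check can be scripted roughly as follows. This is a hypothetical sketch to run from each node against the mounted share; like the manual test, it never takes a file lock.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test: create, append to, read back and delete a file
# in a directory on the NFS mount. Returns non-zero if any step fails.
nfs_rw_test() {
  local dir="${1:-/net/data/collaboration/messagestore}"
  local file="$dir/.rwtest.${HOSTNAME:-node}.$$"
  local status=0
  echo "written by ${HOSTNAME:-node}" > "$file" &&
    echo "edited by ${HOSTNAME:-node}" >> "$file" &&
    grep -q "edited by" "$file" || status=1
  # Always clean up, whatever happened above
  rm -f "$file"
  return "$status"
}
```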
However, whilst I could start my clusters after making these changes, the SIBus members never started completely, and merely showed as "Starting".
In order to diagnose the problem further, I stopped all of the clusters, stopped the node agents, cleared down the logs, started ONE node agent, and started one cluster ( Activities ), which meant that I only had one JVM on one node to play with.
I then monitored the logs and, et voilà, I found these messages: -
[21/05/10 14:47:38:723 BST] 0000002d SibMessage E [ConnectionsBus:Activities.000-ConnectionsBus] CWSIS1592E: The file store has caught an unexpected io exception.
[21/05/10 14:47:38:724 BST] 0000002d SibMessage I [ConnectionsBus:Activities.000-ConnectionsBus] CWSIS1582I: The file store had a problem initialising its log file but will attempt to retry.
[21/05/10 14:47:43:731 BST] 0000002d SibMessage I [ConnectionsBus:Activities.000-ConnectionsBus] CWSIS1581I: The file store is attempting to initalise its log file: /net/data/collaboration/messagestore/Activities/log/Log
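To keep an eye on these messages without reading the whole log, a small grep helper does the job. This is a hypothetical sketch: it assumes the usual WebSphere log location of <profile_root>/logs/<server_name>/SystemOut.log, and matches only the three message IDs quoted above. A steadily rising CWSIS1581I count suggests the file store is stuck in its retry loop.

```shell
#!/usr/bin/env bash
# Show the file store messages quoted above from a WebSphere log file.
show_filestore_errors() {
  grep -E 'CWSIS(1592E|1582I|1581I)' "$1"
}

# Count how many times the file store has tried to (re)initialise its log.
# grep -c prints 0 but exits 1 when nothing matches, so the status is ignored.
count_filestore_retries() {
  grep -c 'CWSIS1581I' "$1" || true
}
```

For example: count_filestore_retries <profile_root>/logs/<server_name>/SystemOut.log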
When I checked the kernel message buffer, via the dmesg command, I also found: -
SELinux: initialized (dev 0:13, type nfs), uses genfs_contexts
lockd: server 192.168.113.97 not responding, still trying
Working with the networking specialists at the client site, we found that the iptables firewall on the NFS server was misconfigured, and was blocking traffic from the nodes. However, the problem was more subtle than that, as my tests had proved that NFS reads and writes were working OK; none of those tests, though, had ever taken a file lock.
As dmesg showed, the problem was with the lock daemon ( lockd ), whose traffic was being blocked.
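In hindsight, a lock test would have caught this much earlier than a read/write test. The sketch below is hypothetical and assumes util-linux flock(1) and coreutils timeout are installed on the node, and that the share is not mounted with the nolock or local_lock options. On such a mount, Linux hands flock() requests for NFS v3 files to the server's lock manager via lockd, so a blocked lockd port shows up as a hang ( cut short by the timeout ) or a failure rather than a quiet success. On a hard mount the stuck process may not die promptly even then.

```shell
#!/usr/bin/env bash
# Hypothetical lock test: take and release an exclusive lock on a file in a
# directory on the NFS mount. Plain reads and writes never exercise lockd;
# this does, assuming the mount has not disabled remote locking.
nfs_lock_test() {
  local dir="${1:-/net/data/collaboration/messagestore}"
  local secs="${2:-15}"
  local file="$dir/.locktest.${HOSTNAME:-node}.$$"
  local status=0
  # flock creates the file if needed, locks it, runs 'true', then unlocks.
  # timeout sends TERM after $secs seconds, then KILL 5 seconds later.
  timeout -k 5 "$secs" flock -x "$file" true || status=$?
  rm -f "$file"
  if [ "$status" -ne 0 ]; then
    echo "lock test failed or timed out (status $status) in $dir" >&2
  fi
  return "$status"
}
```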
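With the NFS v3 protocol ( which is supported by Connections ), lockd listens on its own ports, separate from the main NFS service, and those were the ports that needed to be opened on the NFS server. On Red Hat Enterprise Linux they are pinned by these settings in /etc/sysconfig/nfs: -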
LOCKD_TCPPORT=32803
LOCKD_UDPPORT=32769
or, in other words: -
32803/tcp
32769/udp
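The changes on the NFS server can be summarised as below. This is a hypothetical helper that only prints the commands for review rather than running them; it assumes a Red Hat Enterprise Linux server of that era, where lockd's fixed ports are set in /etc/sysconfig/nfs and iptables rules are persisted with service iptables save. Inserting the rules at the top of INPUT ( -I ) is an assumption that suits the stock RHEL ruleset, which ends in a REJECT; adjust for your own chains, and consider adding -s to limit the rules to the cluster nodes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the changes needed on the NFS server
# to pin lockd to fixed ports and let NFS clients reach it through iptables.
# Review the output, then apply it as root on the NFS server.
print_lockd_firewall_changes() {
  local tcp_port="${1:-32803}"
  local udp_port="${2:-32769}"
  printf '%s\n' \
    "# /etc/sysconfig/nfs (uncomment or add these lines)" \
    "LOCKD_TCPPORT=$tcp_port" \
    "LOCKD_UDPPORT=$udp_port" \
    "" \
    "# iptables: accept lockd traffic, then persist the rules" \
    "iptables -I INPUT -p tcp --dport $tcp_port -j ACCEPT" \
    "iptables -I INPUT -p udp --dport $udp_port -j ACCEPT" \
    "service iptables save"
}
```

After the restart, running rpcinfo -p 192.168.113.97 from a node should list nlockmgr on 32803/tcp and 32769/udp.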
Once these changes were made, and the NFS server was rebooted, the SIBus burst into life and Connections started .... connecting.
The moral of the story - get to know and love your network specialist :-)