Wednesday, 7 April 2010
Lotus Forms Server - Webform Server 3.5.1 - Some Windows-based weirdness
I haven't properly thought this through yet, but I wanted to write up the results of some work with which I was involved last week that initially led to some confusion about the way that the Lotus Forms Server Translator application actually works.
In my environment, we had three Windows-based virtual machines running the LF infrastructure, as follows: -
Shared File Cache ( shared as a Windows networked drive using the UNC e..g. W$ )
Access Control Database ( DB2 UDB )
with Nodes 2 and 3 being federated into a managed cell, and operating as a cluster, via the WebSphere Plugin deployed onto a pair of co-located IBM HTTP Servers.
Having built the cluster, updated the Plugin configuration files etc. all appeared to work, and we even proved that failover worked, via an externally deployed load balancer, based upon WebSphere Edge Components.
So all was well .....
.... until we reconfigured WAS on Nodes 2 and 3 to allow the node agents to run as Windows services, using the WASService.exe binary.
From this point on, we were getting a strange set of errors ( which I need to recapture and record here ) implying that the Translator servers ( which are executing on Nodes 2 and 3 under the control of the Deployment Manager ) were no longer able to access the Shared File Cache. Apart from the error messages in SystemOut.log and StdOut.log on each of the WAS servers, we were also seeing NO files or directories being written to the SFC itself.
My colleague had seen similar problems previously, which appeared to be related to a lack of time synchronisation between the WAS nodes and the DM. However, in this case, the clocks were all showing the same time, down to the second.
We experimented with the SFC settings in translator.properties on each of the two nodes, e.g.: -
but to no avail.
We also confirmed that the logged-in Windows user on either Node 2 or Node 3 ( actually the LOCAL Windows Administrator account ) could map to the SFC via the UNC e.g. \\vbCONF0001\w$ or \\vbCONF0001\SharedFileCache etc. and read AND write files and directories.
Having recognised that this all worked until we added the Node Agents in as Windows services, we did a quick test to manually start the Node Agent, using the startNode.bat script, rather than via the Windows Services Control Panel ( services.msc ). In this configuration, we were able to successfully start the Translator cluster from the DM, and all was well.
We then changed the Windows account that is used to start the Node Agent, from within the Services Control Panel, from the default of Local System Account to the actual user name of the local Administrator user ( .\Administrator ), along with the user's password.
Having made this final change, we were able to start/stop/restart the Node Agents from the Services Control Panel, and start the cluster from the DM without any problems at all.
The moral of the story ?
When you run a WAS Node Agent as a service, check the user under which it's running. I've not seen this problem elsewhere e.g. with a similarly clustered instance of WebSphere Portal 6.1.5, running on the same version of WAS - 18.104.22.168, so I'm assuming that it's the added wrinkle of the Shared File Cache that's being read/written via the Windows UNC.
I'd be interested to see if others have had the same problem ....