Wednesday, 7 April 2010

Lotus Forms Server - Webform Server 3.5.1 - Some Windows-based weirdness

I haven't properly thought this through yet, but I wanted to write up some work I was involved with last week, which initially caused some confusion about how the Lotus Forms Server Translator application actually works.

In my environment, we had three Windows-based virtual machines running the LF infrastructure, as follows: -

Node 1

Log Server
Shared File Cache ( shared as a Windows networked drive using the UNC e.g. W$ )
Access Control Database ( DB2 UDB )

Node 2

Translator Server

Node 3

Translator Server

with Nodes 2 and 3 being federated into a managed cell, and operating as a cluster, via the WebSphere Plugin deployed onto a pair of co-located IBM HTTP Servers.

Having built the cluster and updated the Plugin configuration files etc., all appeared to work, and we even proved that failover worked via an externally deployed load balancer based upon WebSphere Edge Components.

So all was well .....

.... until we reconfigured WAS on Nodes 2 and 3 to allow the node agents to run as Windows services, using the WASService.exe binary.

From this point on, we were getting a strange set of errors ( which I need to recapture and record here ) implying that the Translator servers ( which are executing on Nodes 2 and 3 under the control of the Deployment Manager ) were no longer able to access the Shared File Cache. Apart from the error messages in SystemOut.log and StdOut.log on each of the WAS servers, we were also seeing NO files or directories being written to the SFC itself.

My colleague had seen similar problems previously, which appeared to be related to a lack of time synchronisation between the WAS nodes and the DM. However, in this case, the clocks were all showing the same time, down to the second.

We experimented with the SFC settings in translator.properties on each of the two nodes, e.g.: -

   <entry key="fileCacheLocation">\\vbCONF0001\SharedFileCache</entry>

or: -

   <entry key="fileCacheLocation">\\vbCONF0001\W$\SharedFileCache</entry>

etc.

but to no avail.

We also confirmed that the logged-in Windows user on either Node 2 or Node 3 ( actually the LOCAL Windows Administrator account ) could map to the SFC via the UNC e.g. \\vbCONF0001\w$ or \\vbCONF0001\SharedFileCache etc. and read AND write files and directories.
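That manual read/write check can also be scripted, which makes it easier to repeat under different Windows accounts. The following is only a minimal sketch, assuming Python is installed on the node; probe_share is a hypothetical helper name of my own, not part of Lotus Forms or WAS. Note that the result reflects the account running the script, which is not necessarily the account a Windows service runs under.

```python
import os
import tempfile


def probe_share(path):
    """Try to create, read back and delete a small file under `path`.

    Returns (True, "") on success, or (False, reason) on failure.
    The outcome depends on the Windows account running this process.
    """
    try:
        # mkstemp creates a uniquely named file inside `path`
        fd, name = tempfile.mkstemp(prefix="sfc-probe-", dir=path)
        with os.fdopen(fd, "w") as handle:
            handle.write("probe")
        with open(name) as handle:
            content = handle.read()
        os.remove(name)
    except OSError as exc:
        # Covers a missing path, access denied, and network logon failures
        return False, str(exc)
    if content != "probe":
        return False, "read-back mismatch"
    return True, ""
```

For example, print(probe_share(r"\\vbCONF0001\SharedFileCache")) run from a command prompt reports whether the current user can write to the share.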

Having recognised that this all worked until we added the Node Agents in as Windows services, we did a quick test to manually start the Node Agent, using the startNode.bat script, rather than via the Windows Services Control Panel ( services.msc ). In this configuration, we were able to successfully start the Translator cluster from the DM, and all was well.

We then changed the Windows account that is used to start the Node Agent, from within the Services Control Panel, from the default of Local System Account to the actual user name of the local Administrator user ( .\Administrator ), along with the user's password.

Having made this final change, we were able to start/stop/restart the Node Agents from the Services Control Panel, and start the cluster from the DM without any problems at all.
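For anyone who would rather script that change than click through services.msc, here is a rough sketch using the standard Windows sc.exe config command with its obj= and password= options. The helper names are made up for illustration, and the service name has to be whatever is shown in the properties of the Node Agent service on your own machine. I've not run this against a Node Agent service myself, and it needs to be run from an elevated prompt.

```python
import subprocess


def build_sc_config(service_name, account, password):
    """Build the argument list for `sc.exe config` to change a service logon.

    sc.exe expects `obj=` and `password=` as tokens separate from their
    values, hence the separate list items.
    """
    return [
        "sc.exe", "config", service_name,
        "obj=", account,
        "password=", password,
    ]


def set_service_account(service_name, account, password):
    """Apply the change; raises CalledProcessError if sc.exe reports failure."""
    subprocess.run(build_sc_config(service_name, account, password), check=True)
```

Usage would look something like set_service_account("<node agent service name>", r".\Administrator", "<password>"), followed by a restart of the service so that it picks up the new account.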

The moral of the story ?

When you run a WAS Node Agent as a service, check the user under which it's running. I've not seen this problem elsewhere e.g. with a similarly clustered instance of WebSphere Portal 6.1.5, running on the same version of WAS - 6.1.0.27, so I'm assuming that it's the added wrinkle of the Shared File Cache that's being read/written via the Windows UNC.

I'd be interested to see if others have had the same problem ....

1 comment:

Dave Hay said...

Digging further, I found reference to the NullSessionShares registry modification that was available with earlier versions of Windows NT, as per this Technote: -

Service Running as System Account Fails Accessing Network

http://support.microsoft.com/kb/124184

which describes how to set: -

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters]
NullSessionPipes = "..."

or: -

NullSessionShares = "..."

depending upon whether Shares or Named Pipes are being used.

Since then, Windows 2003 has introduced a similar feature: -

"Network Access: Do not allow anonymous enumeration of SAM accounts", which disables enumeration of SAM accounts, but still allows enumeration of shares (for Simple File Sharing):

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\LSA]
RestrictAnonymousSAM = 1 (Default = 1)

http://support.microsoft.com/kb/Q328459

and: -

"Network access: Let Everyone permissions apply to anonymous users", which ensures that rights granted to the Everyone group (authenticated users) are not automatically granted to the Anonymous Logon security group. Before Windows XP, the Everyone group included both authenticated users and anonymous users:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\LSA]
EveryoneIncludesAnonymous = 0 (Default = 0)

http://support.microsoft.com/kb/Q278259

With thanks to http://smallvoid.com/article/winnt-restrict-anonymous.html for this information.
