In my environment, we had three Windows-based virtual machines running the LF infrastructure, as follows: -
Node 1
  Log Server
  Shared File Cache ( shared as a Windows networked drive using a UNC path, e.g. W$ )
  Access Control Database ( DB2 UDB )
Node 2
  Translator Server
Node 3
  Translator Server
with Nodes 2 and 3 being federated into a managed cell, and operating as a cluster, via the WebSphere Plugin deployed onto a pair of co-located IBM HTTP Servers.
Having built the cluster, updated the Plugin configuration files etc., all appeared to work, and we even proved that failover worked, via an externally deployed load balancer based upon WebSphere Edge Components.
So all was well .....
.... until we reconfigured WAS on Nodes 2 and 3 to allow the node agents to run as Windows services, using the WASService.exe binary.
From this point on, we were getting a strange set of errors ( which I need to recapture and record here ) implying that the Translator servers ( which run on Nodes 2 and 3 under the control of the Deployment Manager, or DM ) were no longer able to access the Shared File Cache ( SFC ). Apart from the error messages in SystemOut.log and StdOut.log on each of the WAS servers, we were also seeing NO files or directories being written to the SFC itself.
My colleague had seen similar problems previously, which appeared to be related to a lack of time synchronisation between the WAS nodes and the DM. However, in this case, the clocks were all showing the same time, down to the second.
We experimented with the SFC settings in translator.properties on each of the two nodes, e.g.: -
<entry key="fileCacheLocation">\\vbCONF0001\SharedFileCache</entry>
or: -
<entry key="fileCacheLocation">\\vbCONF0001\W$\SharedFileCache</entry>
etc.
but to no avail.
We also confirmed that the logged-in Windows user on either Node 2 or Node 3 ( actually the LOCAL Windows Administrator account ) could map to the SFC via the UNC e.g. \\vbCONF0001\w$ or \\vbCONF0001\SharedFileCache etc. and read AND write files and directories.
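The read/write check above can be sketched in a few lines of Python. This is an illustrative sketch rather than the tool we used: the function name `can_write` and the probe file name are made up for the example, and the directory to test is whatever UNC path is configured as fileCacheLocation.

```python
import os
import sys
import uuid


def can_write(directory):
    """Return (True, None) if a file can be created, read back and deleted
    in `directory`; otherwise return (False, the OSError that was raised)."""
    # Hypothetical probe file name; the random suffix avoids clashing with real files.
    probe = os.path.join(directory, "sfc-probe-" + uuid.uuid4().hex + ".tmp")
    try:
        with open(probe, "w") as handle:
            handle.write("probe")
        with open(probe) as handle:
            handle.read()
        os.remove(probe)
        return True, None
    except OSError as error:
        return False, error


if __name__ == "__main__" and len(sys.argv) > 1:
    # The directory to test is passed on the command line.
    ok, error = can_write(sys.argv[1])
    print("writable" if ok else "NOT writable: %s" % error)
```

Run from an interactive session as Administrator, this only repeats the check described above. To say anything about the service, it has to run under the same Windows account as the Node Agent service.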
Having recognised that this all worked until we added the Node Agents in as Windows services, we did a quick test to manually start the Node Agent, using the startNode.bat script, rather than via the Windows Services Control Panel ( services.msc ). In this configuration, we were able to successfully start the Translator cluster from the DM, and all was well.
We then changed the Windows account that is used to start the Node Agent, from within the Services Control Panel, from the default of Local System Account to the actual user name of the local Administrator user ( .\Administrator ), along with the user's password.
Having made this final change, we were able to start/stop/restart the Node Agents from the Services Control Panel, and start the cluster from the DM without any problems at all.
The moral of the story ?
When you run a WAS Node Agent as a service, check the user under which it's running. I've not seen this problem elsewhere e.g. with a similarly clustered instance of WebSphere Portal 6.1.5, running on the same version of WAS - 6.1.0.27, so I'm assuming that it's the added wrinkle of the Shared File Cache that's being read/written via the Windows UNC.
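One way to check the account without opening services.msc is to read the SERVICE_START_NAME line that `sc qc` prints for a service. The following is a minimal sketch under a few assumptions: the helper names are invented for illustration, the service name is whatever WASService.exe registered on your machine ( not recorded in this post ), and `service_account` only works on Windows.

```python
import subprocess


def parse_start_name(sc_qc_output):
    """Extract the SERVICE_START_NAME value from `sc qc` output, or None."""
    for line in sc_qc_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "SERVICE_START_NAME":
            return value.strip()
    return None


def service_account(service_name):
    """Run `sc qc <service_name>` (Windows only) and return the logon account.

    `service_name` is supplied by the caller; no real service name is assumed here.
    """
    result = subprocess.run(["sc", "qc", service_name],
                            capture_output=True, text=True, check=True)
    return parse_start_name(result.stdout)
```

A value of LocalSystem would correspond to the default Local System Account described above, while .\Administrator would correspond to the configuration that worked for us.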
I'd be interested to see if others have had the same problem ....
1 comment:
Digging further, I found a reference to the NullSessionShares registry modification that was available with earlier versions of Windows NT, as per this Technote: -
Service Running as System Account Fails Accessing Network
http://support.microsoft.com/kb/124184
which describes how to set: -
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters]
NullSessionPipes = "..."
or: -
NullSessionShares = "..."
depending upon whether Shares or Named Pipes are being used.
Since then, Windows 2003 has introduced a similar feature: -
"Network Access: Do not allow anonymous enumeration of SAM accounts", which disables enumerations of SAM accounts, but still allows enumerations of shares (For Simple File Sharing):
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\LSA]
RestrictAnonymousSAM = 1 (Default = 1)
http://support.microsoft.com/kb/Q328459
and: -
"Network access: Let Everyone permissions apply to anonymous users", which controls whether rights granted to the Everyone group are also granted to the Anonymous Logon security group. With the default of 0, they are not. Before Windows XP, the Everyone group included both authenticated users and anonymous users:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\LSA]
EveryoneIncludesAnonymous = 0 (Default = 0)
http://support.microsoft.com/kb/Q278259
With thanks to http://smallvoid.com/article/winnt-restrict-anonymous.html for this information.