Saturday, 30 May 2015

Things that make you go "Hmmm" - #432 - WebSphere Application Server Transaction and Partner Logs

Over the past few weeks, I've written about my experiences configuring IBM Business Process Manager and IBM Business Monitor to connect via a TLS-encrypted tunnel to IBM DB2: -


and am just about to create a post covering the experiences learned whilst configuring WebSphere Application Server to support the current latest Transport Layer Security (TLS) 1.2.

However, I hit a small glitch....

Whilst validating my current setup ( IBM Business Monitor 8.5.6 on WAS ND 8.5.5.5 connecting via TLS 1.0 to DB2 10.5.0.5 ), I noted the following exception in one of my cluster member logs ( specifically the AppTarget ): -

[30/05/15 06:54:23:906 BST] 00000065 RecoveryManag I   WTRN0135I: Transaction service recovering no transactions.
[30/05/15 06:54:23:917 BST] 00000065 RecoveryManag A   WTRN0134I: Recovering 1 XA resource manager(s) from the transaction partner logs
[30/05/15 06:54:23:954 BST] 00000065 XARecoveryDat A   WTRN0151I: Preparing to call xa recover on XAResource: Monitor_Database
[30/05/15 06:54:24:891 BST] 00000065 DMAdapter     I com.ibm.ws.ffdc.impl.DMAdapter getAnalysisEngine FFDC1009I: Analysis Engine using data base: /opt/IBM/WebSphere/AppServer/properties/logbr/ffdc/adv/ffdcdb.xml
[30/05/15 06:54:24:897 BST] 00000065 FfdcProvider  W com.ibm.ws.ffdc.impl.FfdcProvider logIncident FFDC1003I: FFDC Incident emitted on /opt/IBM/WebSphere/AppServer/profiles/BAMCell1AppSrv01/logs/ffdc/AppClusterMember1_c432d1b_15.05.30_06.54.24.8794642297944511548081.txt com.ibm.ws.rsadapter.spi.InternalGenericDataStoreHelper.getPooledCon 1298
[30/05/15 06:54:24:939 BST] 00000065 FfdcProvider  W com.ibm.ws.ffdc.impl.FfdcProvider logIncident FFDC1003I: FFDC Incident emitted on /opt/IBM/WebSphere/AppServer/profiles/BAMCell1AppSrv01/logs/ffdc/AppClusterMember1_c432d1b_15.05.30_06.54.24.9204315923791292855617.txt com.ibm.ejs.j2c.J2CXAResourceFactory.getXAResource 310
[30/05/15 06:54:24:942 BST] 00000065 J2CXAResource W   J2CA0061W: Error creating XA Connection and Resource com.ibm.ws.exception.WsException: DSRA8100E: Unable to get a XAConnection from the DataSource jdbc/wbm/MonitorDatabase. with SQL State : 08001 SQL Code : -4499


Caused by: com.ibm.websphere.ce.cm.StaleConnectionException: [jcc][t4][2043][11550][4.18.60] Exception java.net.ConnectException: Error opening socket to server bam856.uk.ibm.com/127.0.0.1 on port 60,006 with message: Connection refused. ERRORCODE=-4499, SQLSTATE=08001 DSRA0010E: SQL State = 08001, Error Code = -4,499


java.sql.SQLNonTransientException: [jcc][t4][2043][11550][4.18.60] Exception java.net.ConnectException: Error opening socket to server bam856.uk.ibm.com/127.0.0.1 on port 60,006 with message: Connection refused. ERRORCODE=-4499, SQLSTATE=08001 DSRA0010E: SQL State = 08001, Error Code = -4,499

Caused by: java.net.ConnectException: Connection refused

Having gone through the configuration with a fine tooth comb ( whatever one of those is ), I could NOT find ANY reference to port  60006 anywhere.

For the record, port 60006 is the non-TLS port that I'd previously used, before switching to port 60007 for a TLS-encrypted connection.

After much trial and quite a lot of error, I re-read the log, specifically these two lines: -

[30/05/15 06:54:23:917 BST] 00000065 RecoveryManag A   WTRN0134I: Recovering 1 XA resource manager(s) from the transaction partner logs
[30/05/15 06:54:23:954 BST] 00000065 XARecoveryDat A   WTRN0151I: Preparing to call xa recover on XAResource: Monitor_Database


which started me thinking about the Transaction Manager.

What, I wondered, was the possibility that the OLD pre-TLS configuration was still persisted in a transaction that had previously NOT completed before I switched the configuration across ?

I did some further digging ( using the command fgrep -R 60006 * ) inside the directory that hosts the Transaction, Recovery and Partner logs for the AppCluster: -

cd /opt/IBM/WebSphere/AppServer/profiles/BAMCell1AppSrv01/tranlog/BAMCell1/AppSrv01Node/AppClusterMember1/transaction

and found two binary files: -

log1
log2

here: -

/opt/IBM/WebSphere/AppServer/profiles/BAMCell1AppSrv01/tranlog/BAMCell1/AppSrv01Node/AppClusterMember1/transaction/partner

both of which contained references to the string 60006.

That confirmed my suspicion.

Now there's a third file in this directory, sensibly named: -

DO NOT DELETE LOG FILES

That's there for a VERY good reason - one should NEVER delete the Transaction or Partner Log files.

*** WARNING - CAVEAT EMPTOR ***

Having said NEVER, this is MY own test environment with NO important or critical data - if I break things, I simply rebuild the WAS cell, which takes ~30 minutes.

So, ignoring my own ( and IBM's ) advice, I delete the Partner Logs: -

cd /opt/IBM/WebSphere/AppServer/profiles/BAMCell1AppSrv01/tranlog/BAMCell1/AppSrv01Node/AppClusterMember1/transaction/partner
rm -Rf *

having shut down the AppCluster.

Quelle surprise, when I restarted the cluster, there were no failed transactions to recover, and WAS came up clean and green with NO JDBC exceptions.

*** WARNING - CAVEAT EMPTOR ***

Thinking about it after the event, I probably could have achieved the same thing by re-opening port 60006 on DB2, which I'd previously disabled using the db2set command as follows: -

db2set DB2COMM=SSL

thus overriding the previous configuration: -

db2set DB2COMM=SSL,TCPIP

For the record, the SSL value means that DB2 observes the SSL service configuration within the Database Manager: -

SSL service name                         (SSL_SVCENAME) = db2c_ssl

whereas TCPIP means that it observes the TCPIP service configuration: -

TCP/IP Service name                          (SVCENAME) = db2c_db2inst1

In each case, the Service Name is inferred from /etc/services which ensures that the instance is listening on the appropriate ports: -

DB2_db2inst1 60000/tcp
DB2_db2inst1_1 60001/tcp
DB2_db2inst1_2 60002/tcp
DB2_db2inst1_3 60003/tcp
DB2_db2inst1_4 60004/tcp
DB2_db2inst1_END 60005/tcp
db2c_db2inst1 60006/tcp
db2c_ssl 60007/tcp


So, had I enabled BOTH services, WAS would've been able to connect via a non-TLS connection to port 60006 and the transaction would have been recovered / completed.

Life is, as ever, a learning curve :-)

For future reference, there's plenty of good material covering the WAS Java Transaction Service, including this: -


which does cover the costs and benefits of deleting the Transaction and Partner Logs, especially in the context of WebSphere Process Server ( now IBM BPM ).

So, again, do NOT NOT NOT delete Tran/Partner Logs unless you really really really know what you're doing.

No comments:

Reminder - installing podman and skopeo on Ubuntu 22.04

This follows on from: - Lest I forget - how to install pip on Ubuntu I had reason to install podman  and skopeo  on an Ubuntu box: - lsb_rel...