I'm trying to get a handle (!) on Messaging Engine ( SIBus ) failover between primary and standby DB2 boxes.
I've configured the ME Custom Property sib.msgstore.jdbcFailoverOnDBConnectionLoss = false rather than the WAS 8.5 default of true.
I've also tuned the TCP/IP keep-alives on all three VMs to: -
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_retries2 = 2
and yet SIB DB failover does not ... fail over.
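As a rough sanity check on those keep-alive values: on Linux, an idle connection is first probed after tcp_keepalive_time seconds, then re-probed every tcp_keepalive_intvl seconds, and dropped once tcp_keepalive_probes probes go unanswered. A minimal sketch of that arithmetic follows; the function name is my own invention, purely for illustration.

```shell
# Hypothetical helper: worst-case seconds before the kernel declares an
# idle TCP connection dead, given keepalive_time, keepalive_intvl and
# keepalive_probes (in that order).
keepalive_detect_seconds() {
  echo $(( $1 + $2 * $3 ))
}

# Values from the sysctl settings above: 60 + 10 * 3
keepalive_detect_seconds 60 10 3
```

This only covers idle connections; net.ipv4.tcp_retries2 separately governs how long the kernel keeps retransmitting when there is unacknowledged data in flight.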
I start the MECluster ( which hosts the SIB ME ), and it happily connects to the primary DB ( db2one.uk.ibm.com ).
When I do the DB2 HADR takeover on db2two.uk.ibm.com, I see: -
[18/09/15 21:18:42:274 BST] 0000006a ConnectionEve A J2CA0056I: The Connection Manager received a fatal connection error from the Resource Adapter for resource jdbc/wbm/MonitorDatabase. The exception is: com.ibm.db2.jcc.am.ClientRerouteException: [jcc][t4][2027][11212][4.11.69] A connection failed but has been re-established. The host name or IP address is "db2one" and the service name or port number is 60,006.
in SystemOut.log, before it connects: -
[18/09/15 21:19:02:232 BST] 00000069 SibMessage I [MONITOR.BAMCell1.Bus:MECluster.000-MONITOR.BAMCell1.Bus] CWSIS1537I: The messaging engine, ME_UUID=FE28F02EE5B5F496, INC_UUID=7E0AFF46E21AA864, has acquired an exclusive lock on the data store.
However, when I reverse the takeover ( back to db2one.uk.ibm.com ), I see: -
[18/09/15 21:19:22:269 BST] 000000a6 WSJccConnecti W DSRA8650W: Error closing a JDBC child wrapper, com.ibm.ws.rsadapter.jdbc.WSJccPreparedStatement@c693be3a
com.ibm.db2.jcc.am.SqlException: [jcc][10120][10943][4.11.69] Invalid operation: statement is closed. ERRORCODE=-4470, SQLSTATE=null
...
[18/09/15 21:19:22:278 BST] 000000a6 ConnectionEve W J2CA0206W: A connection error occurred. To help determine the problem, enable the Diagnose Connection Usage option on the Connection Factory or Data Source. This is the multithreaded access detection option. Alternatively check that the Database or MessageProvider is available.
[18/09/15 21:19:22:279 BST] 000000a6 ConnectionEve A J2CA0056I: The Connection Manager received a fatal connection error from the Resource Adapter for resource jdbc/wbm/MonitorDatabase. The exception is: com.ibm.db2.jcc.am.DisconnectNonTransientConnectionException: [jcc][t4][2030][11211][4.11.69] A communication error occurred during operations on the connection's underlying socket, socket input stream, or socket output stream. Error location: Reply.fill() - insufficient data (-1). Message: Insufficient data. ERRORCODE=-4499, SQLSTATE=08001
[18/09/15 21:19:42:234 BST] 00000069 SibMessage I [MONITOR.BAMCell1.Bus:MECluster.000-MONITOR.BAMCell1.Bus] CWSIS1594I: The messaging engine, ME_UUID=FE28F02EE5B5F496, INC_UUID=7E0AFF46E21AA864, has lost the lock on the data store.
[18/09/15 21:19:42:240 BST] 00000069 SibMessage I [MONITOR.BAMCell1.Bus:MECluster.000-MONITOR.BAMCell1.Bus] CWSIS1538I: The messaging engine, ME_UUID=FE28F02EE5B5F496, INC_UUID=7E0AFF46E21AA864, is attempting to obtain an exclusive lock on the data store.
...
In other words, it fails from the configured primary db2one to the configured standby db2two but not back again.
This is what I have for the primary DB: -
and this is what I have for the standby: -
I then made some progress :-)
So this is what I have on db2two: -
db2 list db directory
System Database Directory
Number of entries in the directory = 2
Database 1 entry:
Database alias = COGNOS
Database name = COGNOS
Local database directory = /home/db2inst1
Database release level = 10.00
Comment = IBM Cognos Content Store
Directory entry type = Indirect
Catalog database partition number = 0
Alternate server hostname = db2one
Alternate server port number = 60006
Database 2 entry:
Database alias = MONITOR
Database name = MONITOR
Local database directory = /home/db2inst1
Database release level = 10.00
Comment =
Directory entry type = Indirect
Catalog database partition number = 0
Alternate server hostname = db2one
Alternate server port number = 60006
and this is what I have on db2one: -
db2 list db directory
System Database Directory
Number of entries in the directory = 2
Database 1 entry:
Database alias = COGNOS
Database name = COGNOS
Local database directory = /home/db2inst1
Database release level = 10.00
Comment = IBM Cognos Content Store
Directory entry type = Indirect
Catalog database partition number = 0
Alternate server hostname = db2two
Alternate server port number = 60006
Database 2 entry:
Database alias = MONITOR
Database name = MONITOR
Local database directory = /home/db2inst1
Database release level = 10.00
Comment =
Directory entry type = Indirect
Catalog database partition number = 0
Alternate server hostname =
Alternate server port number =
Spot the difference ?
Yep, the DB2 catalog on db2one has no Alternate server settings for the MONITOR DB, and guess which DB I'm having problems with ?
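Rather than eyeballing the two listings, the same check could be scripted. This is only a sketch: the function name is made up, and it assumes the directory listing has the "Database alias" and "Alternate server hostname" lines in the form shown above (a blank value, or no such line at all, is treated as "not configured").

```shell
# Hypothetical helper: read `db2 list db directory` output on stdin and
# print the alias of every database with no alternate server hostname.
missing_alternate_server() {
  awk -F'=' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Print the previous entry if it never had a non-blank hostname.
    function report() { if (alias != "" && host == "") print alias }
    /Database alias/            { report(); alias = trim($2); host = "" }
    /Alternate server hostname/ { host = trim($2) }
    END                         { report() }
  '
}

# Intended usage on each DB2 box, as the instance owner:
#   db2 list db directory | missing_alternate_server
```

Run against the db2one listing above, this should print only MONITOR, which is the entry that needed fixing.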
Added to that, I see this: -
2015-09-19-07.28.52.105153+060 I61711243E775 LEVEL: Warning
PID : 2100 TID : 140240484296448 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : MONITOR
APPHDL : 0-13494 APPID: 192.168.33.100.56862.150919073103
AUTHID : DB2USER1 HOSTNAME: db2one.uk.ibm.com
EDUID : 156 EDUNAME: db2agent (MONITOR) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrCheckDb, probe:18200
MESSAGE : SQL1776N The command cannot be issued on an HADR database. Reason
code = "1".
DATA #1 : Hex integer, 4 bytes
0x00000000
DATA #2 : sqeApplication_acbInfo, PD_TYPE_sqeApplication_acbInfo, 4 bytes
x0
DATA #3 : String, 50 bytes
Connections are not allowed on a standby database.
in ~/sqllib/db2dump/db2diag.log.
I updated the Alternate server settings on db2one as follows: -
db2 "update alternate server for database monitor using hostname db2two port 60006"
and restarted DB2: -
db2stop force
db2start
and ... guess what ?
Yep, takeover now works both ways: -
From db2two to db2one
Special registers may or may not be re-attempted (Reason code = 1). ERRORCODE=-4498, SQLSTATE=08506
[19/09/15 07:34:31:487 BST] 00000068 SibMessage I [MONITOR.BAMCell1.Bus:MECluster.000-MONITOR.BAMCell1.Bus] CWSIS1594I: The messaging engine, ME_UUID=FE28F02EE5B5F496, INC_UUID=A35AD93CE23D2EA7, has lost the lock on the data store.
[19/09/15 07:34:31:488 BST] 00000068 SibMessage I [MONITOR.BAMCell1.Bus:MECluster.000-MONITOR.BAMCell1.Bus] CWSIS1538I: The messaging engine, ME_UUID=FE28F02EE5B5F496, INC_UUID=A35AD93CE23D2EA7, is attempting to obtain an exclusive lock on the data store.
[19/09/15 07:34:31:655 BST] 00000154 SibMessage I [MONITOR.BAMCell1.Bus:MECluster.000-MONITOR.BAMCell1.Bus] CWSIS1545I: A single previous owner was found in the messaging engine's data store, ME_UUID=FE28F02EE5B5F496, INC_UUID=A35AD93CE23D2EA7
[19/09/15 07:34:31:659 BST] 00000068 SibMessage I [MONITOR.BAMCell1.Bus:MECluster.000-MONITOR.BAMCell1.Bus] CWSIS1537I: The messaging engine, ME_UUID=FE28F02EE5B5F496, INC_UUID=A35AD93CE23D2EA7, has acquired an exclusive lock on the data store.
From db2one to db2two
[19/09/15 07:37:31:801 BST] 00000154 ConnectionEve W J2CA0206W: A connection error occurred. To help determine the problem, enable the Diagnose Connection Usage option on the Connection Factory or Data Source. This is the multithreaded access detection option. Alternatively check that the Database or MessageProvider is available.
[19/09/15 07:37:31:802 BST] 00000154 ConnectionEve A J2CA0056I: The Connection Manager received a fatal connection error from the Resource Adapter for resource jdbc/wbm/MonitorDatabase. The exception is: com.ibm.db2.jcc.am.ClientRerouteException: [jcc][t4][2027][11212][4.11.69] A connection failed but has been re-established. The host name or IP address is "db2one.uk.ibm.com" and the service name or port number is 60,006.
Special registers may or may not be re-attempted (Reason code = 1). ERRORCODE=-4498, SQLSTATE=08506
[19/09/15 07:37:51:666 BST] 00000068 SibMessage I [MONITOR.BAMCell1.Bus:MECluster.000-MONITOR.BAMCell1.Bus] CWSIS1594I: The messaging engine, ME_UUID=FE28F02EE5B5F496, INC_UUID=A35AD93CE23D2EA7, has lost the lock on the data store.
[19/09/15 07:37:51:668 BST] 00000068 SibMessage I [MONITOR.BAMCell1.Bus:MECluster.000-MONITOR.BAMCell1.Bus] CWSIS1538I: The messaging engine, ME_UUID=FE28F02EE5B5F496, INC_UUID=A35AD93CE23D2EA7, is attempting to obtain an exclusive lock on the data store.
[19/09/15 07:37:51:787 BST] 0000015a SibMessage I [MONITOR.BAMCell1.Bus:MECluster.000-MONITOR.BAMCell1.Bus] CWSIS1545I: A single previous owner was found in the messaging engine's data store, ME_UUID=FE28F02EE5B5F496, INC_UUID=A35AD93CE23D2EA7
[19/09/15 07:37:51:797 BST] 00000068 SibMessage I [MONITOR.BAMCell1.Bus:MECluster.000-MONITOR.BAMCell1.Bus] CWSIS1537I: The messaging engine, ME_UUID=FE28F02EE5B5F496, INC_UUID=A35AD93CE23D2EA7, has acquired an exclusive lock on the data store.
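To follow the lock handover without wading through the full log, the CWSIS15xxI messages can be pulled out with their timestamps. A minimal sketch, assuming the SystemOut.log line layout shown above; the function name is hypothetical.

```shell
# Hypothetical helper: read SystemOut.log lines on stdin and print
# "<time> <message-id>" for each data store lock message (CWSIS15xxI).
me_lock_events() {
  sed -n 's/^\[[^ ]* \([0-9:]*\) [^]]*].*\(CWSIS15[0-9][0-9]I\).*/\1 \2/p'
}

# Intended usage:
#   me_lock_events < SystemOut.log
```

For the db2two to db2one takeover above, this would show CWSIS1594I ( lock lost ) at 07:34:31:487 followed by CWSIS1537I ( lock acquired ) at 07:34:31:659.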