A Portal to a Portal: IBM Business Process Manager

Friday, 21 November 2014

IBM Business Process Manager - Missing the Bus

I've just built a single-cell, two-node, three-cluster IBM BPM Advanced 8.5.5 environment against a remote DB2 ESE 10.1.0.3 server.

So I was a little startled when, after I started the Deployment Environment, the Service Integration Bus (SIBus) failed to start properly.

This is what I saw in one of my Cluster Member logs: -

[21/11/14 13:17:03:719 GMT] 00000073 SibMessage I [BPM.ProcessServer.Bus:MECluster.000-BPM.ProcessServer.Bus] CWSIS1593I: The messaging engine, ME_UUID=E997A9EFA09498FC, INC_UUID=6DC2A53AD19710D7, has failed to gain an initial lock on the data store.
[21/11/14 13:17:03:719 GMT] 00000073 SibMessage I [BPM.ProcessServer.Bus:MECluster.000-BPM.ProcessServer.Bus] CWSIS1538I: The messaging engine, ME_UUID=E997A9EFA09498FC, INC_UUID=6DC2A53AD19710D7, is attempting to obtain an exclusive lock on the data store.
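These lock messages are the ones worth watching for. Below is a minimal Python sketch that picks them out of SystemOut.log lines. The regular expression is an assumption based only on the layout of the lines quoted in this post, not on a documented log format. The message ID set holds the two IDs seen here plus CWSIS1519E from the technote listed under background reading, so extend it as needed.

```python
import re
from typing import NamedTuple

# Assumed layout, inferred from the log lines in this post:
# ... [<bus>:<messaging engine>] <message ID>: <text>
SIB_LINE = re.compile(
    r"\[(?P<bus>[^:\]]+):(?P<me>[^\]]+)\]\s+(?P<msgid>CWSI[A-Z]\d{4}[IWE]):\s+(?P<text>.*)"
)

# Message IDs that suggest the ME cannot lock its data store.
LOCK_MESSAGES = {"CWSIS1593I", "CWSIS1538I", "CWSIS1519E"}


class LockEvent(NamedTuple):
    me: str
    msgid: str
    me_uuid: str


def find_lock_problems(lines):
    """Return one LockEvent for each line that reports a data store lock problem."""
    events = []
    for line in lines:
        m = SIB_LINE.search(line)
        if m is None or m.group("msgid") not in LOCK_MESSAGES:
            continue
        # Pull out the ME_UUID value if the message text carries one.
        uuid = re.search(r"ME_UUID=([0-9A-F]+)", m.group("text"))
        events.append(LockEvent(m.group("me"), m.group("msgid"), uuid.group(1) if uuid else ""))
    return events
```

Because iterating over an open file yields its lines, an open SystemOut.log file object can be passed straight to `find_lock_problems`.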

This was a clean build, so the Messaging Engine database should have been OK.

The tables were definitely there: -

Table/View    Schema     Type  Creation time
SIB000        DB2USER1   T     2014-11-21-13.43.55.547439
SIB001        DB2USER1   T     2014-11-21-13.43.55.682333
SIB002        DB2USER1   T     2014-11-21-13.43.55.819494
SIBCLASSMAP   DB2USER1   T     2014-11-21-13.43.55.334938
SIBKEYS       DB2USER1   T     2014-11-21-13.43.55.947883
SIBLISTING    DB2USER1   T     2014-11-21-13.43.55.420531
SIBOWNER      DB2USER1   T     2014-11-21-13.43.55.151963
SIBOWNERO     DB2USER1   T     2014-11-21-13.43.55.081007
SIBXACTS      DB2USER1   T     2014-11-21-13.43.56.039355

and yet... they were ALL empty :-(

As this is MY own environment, I called the ball and dropped the SIB tables: -

db2 drop table db2user1.sib000
db2 drop table db2user1.sib001
db2 drop table db2user1.sib002
db2 drop table db2user1.sibclassmap
db2 drop table db2user1.sibkeys
db2 drop table db2user1.siblisting
db2 drop table db2user1.sibowner
db2 drop table db2user1.sibownero
db2 drop table db2user1.sibxacts

and restarted the MECluster, which recreated the tables.
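If the schema is different, the same nine statements can be generated rather than typed by hand. Below is a small Python sketch that assumes the table names listed above. It writes a semicolon-terminated script that can be run with `db2 -tvf` while connected to the database holding the SIB tables. The file name drop_sib.sql is hypothetical. As above, only do this on an environment you own, because dropping the tables throws away whatever the messaging engine had stored in them.

```python
# Table names exactly as listed by "db2 list tables" earlier in this post.
SIB_TABLES = [
    "SIB000", "SIB001", "SIB002", "SIBCLASSMAP", "SIBKEYS",
    "SIBLISTING", "SIBOWNER", "SIBOWNERO", "SIBXACTS",
]


def drop_script(schema, tables=SIB_TABLES):
    """Return a DB2 CLP script with one semicolon-terminated DROP TABLE per SIB table."""
    # DB2 folds unquoted identifiers to upper case anyway; upper() only keeps
    # the generated script consistent with the "db2 list tables" output.
    return "".join(f"DROP TABLE {schema.upper()}.{table};\n" for table in tables)


if __name__ == "__main__":
    # Redirect to a file (hypothetical name drop_sib.sql), then run:
    #   db2 -tvf drop_sib.sql
    print(drop_script("db2user1"), end="")
```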

This time around, the tables were nicely populated, e.g.: -

db2 "select id from db2user1.sib000"

...

252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
272

269 record(s) selected.
...

and the SIBus came up nicely, with JVM1 reporting: -

[21/11/14 13:43:58:431 GMT] 0000006a SibMessage I [BPM.ProcessServer.Bus:MECluster.000-BPM.ProcessServer.Bus] CWSID0016I: Messaging engine MECluster.000-BPM.ProcessServer.Bus is in state Started.

and JVM2 reporting: -

[21/11/14 13:47:23:859 GMT] 00000065 SibMessage I [BPM.ProcessServer.Bus:MECluster.000-BPM.ProcessServer.Bus] CWSID0016I: Messaging engine MECluster.000-BPM.ProcessServer.Bus is in state Joined.

In other words, the Bus Member on node 1 is active, with the Bus Member on node 2 standing by to take over.
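The CWSID0016I state messages are the simplest way to see which member currently owns the messaging engine. Below is a Python sketch that extracts the timestamp, ME name and state from one of these lines. It assumes the dd/mm/yy timestamp layout and the GMT suffix shown in these logs. The date order in WAS logs appears to follow the server's locale, so adjust the pattern for other locales. The time zone is ignored.

```python
import re
from datetime import datetime

# Assumed line layout, inferred only from the CWSID0016I lines in this post:
# [dd/mm/yy HH:MM:SS:mmm GMT] <thread> SibMessage I [<bus>:<ME>] CWSID0016I: ...
STATE_LINE = re.compile(
    r"^\[(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}):(?P<ms>\d{3}) GMT\].*?"
    r"CWSID0016I: Messaging engine (?P<me>\S+) is in state (?P<state>\w+)\."
)


def parse_state(line):
    """Return (timestamp, ME name, state) for a CWSID0016I line, or None."""
    m = STATE_LINE.search(line)
    if m is None:
        return None
    # Day/month/year order as in these logs; the time zone is not applied.
    ts = datetime.strptime(m.group("ts"), "%d/%m/%y %H:%M:%S")
    ts = ts.replace(microsecond=int(m.group("ms")) * 1000)
    return ts, m.group("me"), m.group("state")
```

Run it over each JVM's log and keep the last non-None result per file. That shows at a glance which member is Started and which is Joined.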

When I stopped MEClusterMember1 on node 1, I saw this on node 2: -

[21/11/14 13:51:53:684 GMT] 00000097 SibMessage I [BPM.ProcessServer.Bus:MECluster.000-BPM.ProcessServer.Bus] CWSID0016I: Messaging engine MECluster.000-BPM.ProcessServer.Bus is in state Started.

which again is as expected.

And, as a final acid test, when I restarted MEClusterMember1, I saw this: -

[21/11/14 13:55:33:043 GMT] 00000062 SibMessage I [BPM.ProcessServer.Bus:MECluster.000-BPM.ProcessServer.Bus] CWSID0016I: Messaging engine MECluster.000-BPM.ProcessServer.Bus is in state Joined.

and when I then stopped MEClusterMember2, I saw this: -

[21/11/14 13:57:33:123 GMT] 0000008f SibMessage I [BPM.ProcessServer.Bus:MECluster.000-BPM.ProcessServer.Bus] CWSID0016I: Messaging engine MECluster.000-BPM.ProcessServer.Bus is in state Started.

with both messages coming from node 1.

This shows that, once I had dropped and recreated the SIB tables, the bus came up nicely and failover worked both ways: node 1 to node 2, and node 2 to node 1.

This ties up with the IBM BPM pattern known as 1-of-n, where only one instance of the messaging engine (ME) can be active at any one time, regardless of the number of nodes in the cell or members in the cluster.

Which is nice.
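The failover sequence above can be replayed as a simple invariant check: at no point should more than one member have the ME in state Started. Below is a minimal Python sketch that assumes the events are listed in the order they happened. The "Stopped" entries stand for the author shutting a member down. They are an assumption of this sketch, not log output quoted in this post. This is a toy check of these observations, not how WebSphere's high availability manager actually enforces the policy.

```python
# The sequence observed in this post, in order. "Stopped" is a hypothetical
# marker for a member being shut down; the other states come from CWSID0016I.
OBSERVED = [
    ("MEClusterMember1", "Started"),
    ("MEClusterMember2", "Joined"),
    ("MEClusterMember1", "Stopped"),
    ("MEClusterMember2", "Started"),
    ("MEClusterMember1", "Joined"),
    ("MEClusterMember2", "Stopped"),
    ("MEClusterMember1", "Started"),
]


def active_timeline(events):
    """Replay (member, state) events; after each one, record the sorted list
    of members whose ME is in state Started."""
    current = {}
    timeline = []
    for member, state in events:
        current[member] = state
        timeline.append(sorted(m for m, s in current.items() if s == "Started"))
    return timeline


def one_of_n_holds(events):
    """True if no more than one member is ever Started at the same time."""
    return all(len(active) <= 1 for active in active_timeline(events))
```

The check is "at most one" rather than "exactly one". That allows the short gap between one member stopping and the other taking over.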

So what went wrong? I don't know, but I now know how to resolve it AND, more importantly, what to watch for.

Some background reading: -

CWSIS1519E error occur when Messaging Engine failed to obtain lock on failover in clustering environment

Service Integration Bus Messaging Engine Startup Problems and Solutions
