Thursday, 28 March 2013

Problems with IBM Business Monitor Messaging Engine ( SI Bus ) following a teardown

*CAVEAT*

This post relates to my OWN individual experiences on my OWN personal VMware environment. This is NOT NOT NOT a recipe for everyone; your mileage may vary. If in doubt, PLEASE raise a PMR with IBM Support.

*CAVEAT*

Having performed a fresh installation of IBM Business Monitor 8.0.1.1 against Oracle 11g R2 after a "teardown" - where I cleaned up the database objects created the first time around - I noticed that the Messaging Engine cluster ( that hosts the Service Integration Bus ) kept restarting.

When I checked SystemOut.log for the offending cluster member, I found: -

...
[28/03/13 09:54:01:606 GMT] 0000001b SibMessage    I   [CEI.BAMCELL.BUS:BAMSR01.Messaging.000-CEI.BAMCELL.BUS] CWSIS1538I: The messaging engine, ME_UUID=3D59E737F07528C9, INC_UUID=62A8E276B06B1903, is attempting to obtain an exclusive lock on the data store.
[28/03/13 09:54:01:766 GMT] 0000001c SibMessage    I   [CEI.BAMCELL.BUS:BAMSR01.Messaging.000-CEI.BAMCELL.BUS] CWSIS1545I: A single previous owner was found in the messaging engine's data store, ME_UUID=09BF782E0B664719, INC_UUID=78437FD9A71F6596
[28/03/13 09:54:01:768 GMT] 0000001d SibMessage    I   [MONITOR.BAMCELL.Bus:BAMSR01.Messaging.000-MONITOR.BAMCELL.Bus] CWSIS1545I: A single previous owner was found in the messaging engine's data store, ME_UUID=E2ABE650D061BE5C, INC_UUID=ADD9DFC1AA982A5A
[28/03/13 09:54:01:771 GMT] 0000001c SibMessage    E   [CEI.BAMCELL.BUS:BAMSR01.Messaging.000-CEI.BAMCELL.BUS] CWSIS1535E: The messaging engine's unique id does not match that found in the data store. ME_UUID=3D59E737F07528C9, ME_UUID(DB)=09BF782E0B664719
[28/03/13 09:54:01:784 GMT] 0000001b SibMessage    I   [CEI.BAMCELL.BUS:BAMSR01.Messaging.000-CEI.BAMCELL.BUS] CWSIS1593I: The messaging engine, ME_UUID=3D59E737F07528C9, INC_UUID=62A8E276B06B1903, has failed to gain an initial lock on the data store.

[28/03/13 09:54:01:788 GMT] 0000001a SibMessage    I   [MONITOR.BAMCELL.Bus:BAMSR01.Messaging.000-MONITOR.BAMCELL.Bus] CWSIS1537I: The messaging engine, ME_UUID=E2ABE650D061BE5C, INC_UUID=5634F9A5B06B1901, has acquired an exclusive lock on the data store.

and: -

...
[28/03/13 09:55:53:555 GMT] 0000000f SibMessage    E   [CEI.BAMCELL.BUS:BAMSR01.Messaging.000-CEI.BAMCELL.BUS] CWSID0046E: Messaging engine BAMSR01.Messaging.000-CEI.BAMCELL.BUS detected an error and cannot continue to run in this server.
[28/03/13 09:55:53:555 GMT] 0000000f HAGroupImpl   I   HMGR0130I: The local member of group IBM_hc=BAMSR01.Messaging,WSAF_SIB_BUS=CEI.BAMCELL.BUS,WSAF_SIB_MESSAGING_ENGINE=BAMSR01.Messaging.000-CEI.BAMCELL.BUS,type=WSAF_SIB has indicated that is it not alive. The JVM will be terminated.
[28/03/13 09:55:53:566 GMT] 0000000f SystemOut     O Panic:component requested panic from isAlive
[28/03/13 09:55:53:567 GMT] 0000000f SystemOut     O java.lang.RuntimeException: emergencyShutdown called:
[28/03/13 09:55:53:567 GMT] 0000000f SystemOut     O    at com.ibm.ws.runtime.component.ServerImpl.emergencyShutdown(ServerImpl.java:632)
[28/03/13 09:55:53:567 GMT] 0000000f SystemOut     O    at com.ibm.ws.hamanager.runtime.RuntimeProviderImpl.panicJVM(RuntimeProviderImpl.java:92)
[28/03/13 09:55:53:569 GMT] 0000000f SystemOut     O    at com.ibm.ws.hamanager.coordinator.impl.JVMControllerImpl.panicJVM(JVMControllerImpl.java:56)
[28/03/13 09:55:53:569 GMT] 0000000f SystemOut     O    at com.ibm.ws.hamanager.impl.HAGroupImpl.doIsAlive(HAGroupImpl.java:882)
[28/03/13 09:55:53:569 GMT] 0000000f SystemOut     O    at com.ibm.ws.hamanager.impl.HAGroupImpl$HAGroupUserCallback.doCallback(HAGroupImpl.java:1388)
[28/03/13 09:55:53:569 GMT] 0000000f SystemOut     O    at com.ibm.ws.hamanager.impl.Worker.run(Worker.java:64)
[28/03/13 09:55:53:569 GMT] 0000000f SystemOut     O    at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1690)
...
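On a busy system the relevant lines are easy to miss, so a small filter can help. The following is a minimal sketch, assuming the CWSIS1535E message has the same layout as in the excerpt above; the function name find_me_uuid_mismatch is my own invention, and the log path is whatever applies to your profile.

```shell
# Hypothetical helper (not an IBM tool): for each CWSIS1535E line in a
# SystemOut.log, print the messaging engine, the UUID it was started with
# and the UUID recorded in the data store.
find_me_uuid_mismatch() {
  # $1 = path to SystemOut.log (assumed; adjust for your profile)
  grep 'CWSIS1535E' "$1" |
    sed -n 's/.*\[\([^]]*\)\] CWSIS1535E:.*ME_UUID=\([0-9A-F]*\), ME_UUID(DB)=\([0-9A-F]*\).*/\1 config=\2 datastore=\3/p'
}
```

Calling find_me_uuid_mismatch with the path to SystemOut.log prints one line per mismatch, which makes it easier to see which bus is affected.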

The first set of messages ( CWSIS1545I and CWSIS1535E ) led me to the solution, aided by this IBM Technote: -


Resolution

I realised that, when I'd cleaned down the database objects from the previous installation of BAM, I'd neglected to remove the schemas for the Messaging Engine.

In Oracle, I used SQL*Plus: -

sqlplus / as SYSDBA

and ran: -

SQL> select username from dba_users;

USERNAME
------------------------------
COGNOS
IBMBUSSP
MONITOR
MONME00
MONCM00

SCOTT

This showed the two Messaging Engine schema users - MONME00 and MONCM00 - which I then removed: -

SQL> drop user MONCM00 cascade;

User dropped.

SQL> drop user MONME00 cascade;

User dropped.

and then restarted the ME cluster member.

This automatically recreated the objects ( this is almost certainly NOT the default behaviour - most DBAs would prefer to have more control over the creation of database objects such as schemas and users ) and the ME came up without exception.
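To confirm the outcome after a restart, the lock messages in SystemOut.log can be summarised per messaging engine. This is only a sketch, assuming the CWSIS1537I ( lock acquired ) and CWSIS1593I ( initial lock failed ) messages look like the ones quoted earlier in this post; me_lock_status is a made-up name, not a WebSphere command.

```shell
# Hypothetical helper: for each messaging engine, report whether its most
# recent lock message was CWSIS1537I (locked) or CWSIS1593I (failed).
me_lock_status() {
  # $1 = path to SystemOut.log (assumed; adjust for your profile)
  grep -E 'CWSIS1537I|CWSIS1593I' "$1" |
    sed -n 's/.*\[\([^]]*\)\] \(CWSIS15[0-9]*I\):.*/\1 \2/p' |
    awk '{ last[$1] = $2 }
         END { for (me in last) print me, (last[me] == "CWSIS1537I" ? "locked" : "failed") }' |
    sort
}
```

Every engine should show as locked once the stale schemas are gone; an engine still showing as failed suggests that its data store holds an old owner record.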

Job done :-)

2 comments:

Syed Ammar Hassan said...

I had exactly the same problem in my installation, and this did solve it. Is it possible to modify the scripts (and where are they located?) to specify new usernames instead of the default MONME00? I am using the same database for two different environments, so this could be a problem, right?

Dave Hay said...

You can simply use the sibDDLGenerator script to generate new scripts using different schema names e.g.
/opt/ibm/WebSphere/AppServer/bin/sibDDLGenerator.sh -system oracle -version 11g -platform unix -schema MONCM00 -statementend ";" >> ~/createMESchemas.sql
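For two environments sharing one database, the same command can be looped over a list of schema names. A rough sketch follows, using only the options shown above; the function name generate_me_ddl, the SIB_DDL_GEN variable and the schema names ENV1ME00 and ENV2ME00 are purely illustrative assumptions.

```shell
# Path to the generator, as used above; override SIB_DDL_GEN if yours differs.
SIB_DDL_GEN="${SIB_DDL_GEN:-/opt/ibm/WebSphere/AppServer/bin/sibDDLGenerator.sh}"

# Hypothetical wrapper: write one DDL file per schema name into a directory.
generate_me_ddl() {
  out_dir="$1"   # directory that receives create_<SCHEMA>.sql
  shift          # remaining arguments are schema names
  for schema in "$@"; do
    "$SIB_DDL_GEN" -system oracle -version 11g -platform unix \
      -schema "$schema" -statementend ";" > "$out_dir/create_${schema}.sql" || return 1
  done
}
```

For example, generate_me_ddl ~/ddl ENV1ME00 ENV2ME00 would leave two SQL files to review with a DBA before anything is run against the database.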
