Monday, 3 January 2011

WebSphere Portal Express v7 on Ubuntu Linux - sloooooooooooooooooooow

Having installed WebSphere Portal Express v7 on Linux a while back, I'd noticed an issue where it'd fail to start, instead filling the logs with exceptions such as: -

[28/12/10 19:09:31:110 GMT] 0000000f EventLogServi I   Rebuilding event log for library: Web Content
[28/12/10 19:11:31:375 GMT] 0000001d TimeoutManage I   WTRN0006W: Transaction 0000012D2E629F6D00000005000000276FACDD0C7C3820993606C63E30166C874E766DD70000012D2E629F6D00000005000000276FACDD0C7C3820993606C63E30166C874E766DD700000001 has timed out after 120 seconds.
[28/12/10 19:11:31:377 GMT] 0000001d TimeoutManage I   WTRN0124I: When the timeout occurred the thread with which the transaction is, or was most recently, associated was Thread[server.startup : 2,5,main]. The stack trace of this thread when the timeout occurred was:
java.util.ArrayList.indexOf(ArrayList.java:448)
java.util.ArrayList.remove(ArrayList.java:556)
java.util.Observable.deleteObserver(Observable.java:78)
org.apache.derby.impl.store.raw.data.BasePage.releaseExclusive(Unknown Source)
...

and: -

[28/12/10 19:38:21:873 GMT] 0000000f EventLogServi I   Rebuilding event log for library: Blog Solo Template v70
[28/12/10 19:40:22:065 GMT] 0000001d TimeoutManage I   WTRN0006W: Transaction 0000012D2E7D07F000000005000000486FACDD0C7C3820993606C63E30166C874E766DD70000012D2E7D07F000000005000000486FACDD0C7C3820993606C63E30166C874E766DD700000001 has timed out after 120 seconds.
[28/12/10 19:40:22:067 GMT] 0000001d TimeoutManage I   WTRN0124I: When the timeout occurred the thread with which the transaction is, or was most recently, associated was Thread[server.startup : 2,5,main]. The stack trace of this thread when the timeout occurred was:
org.apache.derby.impl.store.access.btree.ControlRow.getIsRoot(Unknown Source)
...

and: -

[28/12/10 20:35:31:975 GMT] 0000000f EventLogServi I   Rebuilding event log for library: Wiki Template v70
[28/12/10 20:37:32:087 GMT] 0000001b TimeoutManage I   WTRN0006W: Transaction 0000012D2EB15E7700000005000000D76FACDD0C7C3820993606C63E30166C874E766DD70000012D2EB15E7700000005000000D76FACDD0C7C3820993606C63E30166C874E766DD700000001 has timed out after 120 seconds.
[28/12/10 20:37:32:088 GMT] 0000001b TimeoutManage I   WTRN0124I: When the timeout occurred the thread with which the transaction is, or was most recently, associated was Thread[server.startup : 2,5,main]. The stack trace of this thread when the timeout occurred was:
org.apache.derby.impl.sql.execute.GenericQualifier.getOperator(Unknown Source)
...

etc.

Whilst I didn't actually resolve the issue, I simply left it running ( over the course of two days BUT with a suspend/resume somewhere in the middle ), and it sorted itself out.

Looking at the exceptions relating to org.apache.derby, I suspect that there is an issue with the Apache Derby database - I'm using the out-of-the-box version: -

[28/12/10 19:08:10:059 GMT] 00000008 InternalGener I   DSRA8203I: Database product name : Apache Derby
[28/12/10 19:08:10:068 GMT] 00000008 InternalGener I   DSRA8204I: Database product version : 10.3.3.1 - (883215)
[28/12/10 19:08:10:069 GMT] 00000008 InternalGener I   DSRA8205I: JDBC driver name  : Apache Derby Embedded JDBC Driver
[28/12/10 19:08:10:070 GMT] 00000008 InternalGener I   DSRA8206I: JDBC driver version  : 10.3.3.1 - (883215)

It may be simply that Derby cannot handle the event log rebuild process for the 10 WCM libraries that I have: -

  • Blog Solo Template v70
  • Blog Template v70
  • C_Content
  • C_Resources
  • Sample
  • TB_Catalog
  • TB_Core
  • TB_Shared
  • TB_Top
  • Web Content
I'll keep digging, and see whether I can find any relevant fixes for Derby, and report back. I'm currently looking at exceptions in derby.log in /opt/IBM/WebSphere/wp_profile/PortalServer/derby with exceptions such as: - ERROR XSDG3: Meta-data for Container org.apache.derby.impl.store.raw.data.RAFContainer4@57685768 could not be accessed at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)

*UPDATE* I note that this derby.log file is (a) marked as executable and (b) 2.3 GB ( yes, GIGABYTES ) in size. That'll make pleasant reading ...

In the meantime, if you see this, the best advice is to simply allow the rebuild to finish it's course. PS I even tried forcing a Javacore dump ( using kill -3 ) but WebSphere bravely kept running, even though I did get a Javacore logged to /opt/IBM/WebSphere/wp_profile.

1 comment:

Dave Hay said...

Looking through derby.log, there were a large number of "Too many open files" messages.

This makes sense because, by default, ulimit -n returns 1024 which is too low for Portal.

Thinking back, I had had to force the startServer.sh WebSphere_Portal command to quit as I was seeing the same error in SystemOut.log during startup.

I then added ulimit -n 40000 to /opt/IBM/WebSphere/wp_profile/bin/setupCmdLine.sh and tried to start Portal again.

However, by this time, I must have "confused" Derby to the point that it had to rebuilt the event log for the 10 WCM content libraries.

Therefore, the morale of the story is to get the ulimit right BEFORE you install and start WebSphere Portal.

Simple :-)