Monday, 4 March 2013

Fun with Linux disks

So this morning, when I arrived at work, I saw an unexpected set of symptoms from my partially built IBM Business Monitor environment.

First of all, I saw this when I attempted to list my WAS profiles: -

$ /opt/IBM/WebSphere/AppServer/profiles/BAMDMProfile/bin/manageprofiles.sh -list 

JVMSHRC226E Error opening shared class cache file 
JVMSHRC336E Port layer error code = -300 
JVMSHRC337E Platform error message: Read-only file system 
JVMJ9VM015W Initialization error for library j9shr26(11): JVMJ9VM009E J9VMDllMain failed 
Could not create the Java virtual machine.


Then I saw these errors: -

<snip>
[3/1/13 15:30:05:202 GMT] 0000001c MMRoutingConf E com.ibm.wbimonitor.lifecycle.routing.MMRoutingConfigFlowDaemon checkForStateChanges CWMLC0274E: An error occurred while trying to determine the state of an MM.  This will be retried shortly.  The condition was caused by: com.ibm.wbimonitor.persistence.metamodel.spi.MetaModelPersistenceException: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704, SQLERRMC=MONITOR.META_MODEL_UNVERSIONED_T, DRIVER=4.11.69.

[3/1/13 15:30:05:740 GMT] 0000001b LifecycleStop E com.ibm.wbimonitor.lifecycle.LifecycleStopRequestScanTask run() CWMLC0012E: Unexpected exception [com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-204, SQLSTATE=42704, SQLERRMC=MONITOR.META_MODEL_T, DRIVER=4.11.69].

</snip>

in the SystemOut.log file for the Deployment Manager, even though I knew that (a) it was fine on Friday and (b) that DB2 was up-and-running ( validated via the command db2 connect to MONITOR ).

When I went to check the file systems on the box: -

$ mount 

/dev/mapper/vglinux-rootlv on / type ext3 (rw) 
proc on /proc type proc (rw) 
sysfs on /sys type sysfs (rw) 
devpts on /dev/pts type devpts (rw,gid=5,mode=620) 
/dev/mapper/vglinux-tmplv on /tmp type ext3 (rw) 
/dev/mapper/vglinux-varlv on /var type ext3 (rw) 
/dev/dasda1 on /boot type ext3 (rw) 
tmpfs on /dev/shm type tmpfs (rw) 
/dev/mapper/vgapp-optlv on /opt type ext3 (rw) 
/dev/mapper/vgapp-homelv on /home type ext3 (rw) 
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw) 
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw) 
LBPM002L:/store/ on /store type nfs (rw,addr=10.222.36.21)

mount: warning: /etc/mtab is not writable (e.g. read-only filesystem). 
       It's possible that information reported by mount(8) is not 
       up to date. For actual information about system mount points 
       check the /proc/mounts file.
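
As that warning suggests, /proc/mounts is the place to look when /etc/mtab cannot be trusted. Below is a minimal sketch of how one might pull out just the read-only mount points; the function name list_ro_mounts is my own invention, and it assumes the usual whitespace-separated layout with the mount point in field 2 and the options in field 4.

```shell
# List mount points that are currently mounted read-only.
# Hypothetical helper: takes an optional path to a mounts-format file
# and falls back to /proc/mounts when none is given.
list_ro_mounts() {
    mounts_file="${1:-/proc/mounts}"
    # Field 2 is the mount point, field 4 the comma-separated mount options.
    # Match "ro" as a whole option, so "errors=remount-ro" is not a false hit.
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "ro") { print $2; break }
        }
    }' "$mounts_file"
}
```

Calling list_ro_mounts with no arguments reads the live /proc/mounts, so it should reflect a remount-to-read-only even when the output of mount still shows (rw).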


When I attempted to change the RWX permissions for /tmp: -

$ chmod 777 /tmp 

chmod: changing permissions of `/tmp': Read-only file system

When I attempted to update the locate database ( as root ): -

$ updatedb 

updatedb: can not open a temporary file for `/var/lib/mlocate/mlocate.db'

At this point, I figured that something serious had occurred; this was confirmed by messages such as: -

<snip>
EXT3-fs error (device dm-1): ext3_journal_start_sb: Detected aborted journal 
Remounting filesystem read-only 
EXT3-fs error (device dm-1): ext3_lookup: unlinked inode 229390 in dir #229378 
EXT3-fs error (device dm-1): ext3_lookup: unlinked inode 229384 in dir #229378 
EXT3-fs error (device dm-1): ext3_lookup: unlinked inode 229390 in dir #229378 
EXT3-fs error (device dm-1): ext3_lookup: unlinked inode 229384 in dir #229378 
attempt to access beyond end of device

</snip>

in the kernel ring buffer ( via the dmesg command ).
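
With a busy kernel log, it can help to boil those messages down to the affected devices. The following is a rough sketch, assuming the errors follow the "EXT3-fs error (device NAME)" pattern shown above; ext3_error_devices is a made-up name for illustration.

```shell
# Print the unique device names mentioned in EXT3-fs error lines.
# Hypothetical helper: reads kernel log text on stdin, so it works on
# live dmesg output or on a saved copy of the log.
ext3_error_devices() {
    # Capture whatever sits between "(device " and the closing ")".
    sed -n 's/^.*EXT3-fs error (device \([^)]*\)).*$/\1/p' | sort -u
}

# Usage: dmesg | ext3_error_devices
```

For the messages above this should print dm-1; something like dmsetup ls, which lists the device-mapper devices with their major and minor numbers, can then help tie that name back to a logical volume.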

Then I called the Unix sysadmin, who realised that there was a much bigger problem with the disks, leading to a rebuild of / ( /opt and /home appeared to be OK ).

Ah well, one lives and one learns ….




