Thursday, 31 December 2009
IBM WebSphere eXtreme Scale 6 by Anthony Chaves - my thoughts thus far
As mentioned previously, I'm working my way through Anthony Chaves' book on WebSphere eXtreme Scale. It's a very good book, but it is as much aimed at an application developer as at an infrastructure architect like me.
In my simple world, the main benefit of a data grid is that vast amounts of data can be held in memory, spread across as many processors as are necessary to handle the performance and storage requirements. This means that data-centric applications, such as those used in the banking and financial sectors, can access data far more quickly. The book has a great chart that compares data access times from the CPU registers ( ~ 1 nanosecond ) through main memory ( ~ 150 nanoseconds ) to secondary storage cache aka disk cache ( ~ 50 microseconds ) to secondary storage itself ( ~ 12 milliseconds ).
To quote from the book itself "... Accessing data on a hard drive platter is one million times slower than accessing that same data in main memory and one billion times slower than accessing that data in a register..."
The chapter goes on to compare the relative cost and capacity of CPU registers against disk storage - this makes perfect sense; I'm typing this on a MacBook Pro which has 6 MB of Level 2 cache ( access time is around 20 nanoseconds ), 4 GB of main memory ( access time is around 150 nanoseconds ) and 320 GB of hard disk ( access time is around 12 milliseconds ). I recently bought a 320 GB SATA hard disk for 30 pounds, and a 1 TB SATA hard disk for 60 pounds, but I'm guessing that it'd cost me a heck of a lot more to add more main memory, and I've got little or no chance of adding extra L2 cache, unless Apple happens to provide a quad core CPU with 16 MB as an upgrade - not likely :-)
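To put my laptop's numbers into perspective, here is a small Java sketch that works out how many times slower the hard disk is than main memory and L2 cache. The figures are the rough, order-of-magnitude access times I quoted above, not measurements, and the class and constant names are my own invention.

```java
import java.time.Duration;

public class LatencyRatios {
    // Rough access times from the paragraph above (estimates, not measurements).
    static final long L2_CACHE_NANOS = 20;
    static final long MAIN_MEMORY_NANOS = 150;
    static final long DISK_NANOS = Duration.ofMillis(12).toNanos();

    // How many times slower the first tier is than the second.
    static long ratio(long slowerNanos, long fasterNanos) {
        return slowerNanos / fasterNanos;
    }

    public static void main(String[] args) {
        System.out.println("disk vs main memory: "
                + ratio(DISK_NANOS, MAIN_MEMORY_NANOS) + "x");
        System.out.println("disk vs L2 cache:    "
                + ratio(DISK_NANOS, L2_CACHE_NANOS) + "x");
    }
}
```

With these particular estimates, the disk comes out tens of thousands of times slower than main memory, which is the gap that a data grid is trying to avoid.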
The net benefit of the datagrid approach is that a programmer can ensure that data is held as close as possible to the processor cores, rather than relying upon costly ( in performance terms ) database interactions. The Java Enterprise Edition (JEE) approach to database access is immensely powerful, in that it allows a programmer to interact with a relational database without knowing or caring on what platform it runs. This abstraction, via the Java Naming and Directory Interface (JNDI) and Java Database Connectivity (JDBC) APIs, statement and connection pools etc., is very useful, but does add latency to each and every database interaction. Caching data in memory helps, but only for read operations. Object locking carries a similar cost - each time I want to update a database record, I have to make a round trip to the database itself.
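As a rough illustration of why a simple in-memory cache only helps with reads, here is a minimal Java sketch. It is not the WebSphere eXtreme Scale API and is not taken from the book; the class name `ReadThroughCache` and its `loader` and `writer` parameters are hypothetical stand-ins for a JDBC lookup and a JDBC update.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.Function;

public class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;    // stand-in for a database read
    private final BiConsumer<K, V> writer;  // stand-in for a database update
    private final AtomicInteger loads = new AtomicInteger();
    private final AtomicInteger writes = new AtomicInteger();

    public ReadThroughCache(Function<K, V> loader, BiConsumer<K, V> writer) {
        this.loader = loader;
        this.writer = writer;
    }

    // Reads go to the database only on a cache miss.
    public V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Writes always pay for a database round trip, then refresh the cache.
    public void put(K key, V value) {
        writes.incrementAndGet();
        writer.accept(key, value);
        cache.put(key, value);
    }

    public int loadCount() { return loads.get(); }
    public int writeCount() { return writes.get(); }

    public static void main(String[] args) {
        ReadThroughCache<String, String> accounts =
                new ReadThroughCache<>(id -> "balance-for-" + id, (id, v) -> { });
        accounts.get("42");
        accounts.get("42");          // served from memory
        accounts.put("42", "100");   // still hits the "database"
        System.out.println("loads=" + accounts.loadCount()
                + " writes=" + accounts.writeCount());
    }
}
```

In this sketch the second read never touches the backing store, but every update does. As I understand it, that write path is part of what a datagrid product aims to improve upon.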
Whilst not solving every problem, datagrids can help to mitigate this, and the API appears immensely flexible in terms of read/write, locking, caching etc.
As a non-developer, I'm not going to see all of the benefits, but it's definitely worth considering, especially where performance and scalability are crucial requirements.
In terms of the book, Mr Chaves clearly knows his subject, and writes extremely well - it should be noted that he does jump into Java code quite quickly, with the first snippets of code appearing on page 17. It's in the later chapters that he goes into the architecture and the cost vs. benefit analysis of the datagrid methodology.
The book is aimed at the WebSphere eXtreme Scale product, and Mr Chaves even goes through the process of obtaining and installing the product, but the concepts are appropriate to any similar datagrid product.
I've got a lot more book to work through, but I'm impressed thus far - again, as an architect rather than a developer, with 10 years of Java experience behind me ( albeit at the basic J2EE / JEE level with servlets, portlets, JSPs, entity beans, session beans etc. ), the book is eminently readable, and I'd be happy to recommend it to anyone. The author's style is friendly without being patronising - it's not "Datagrids for Dummies" but that's probably a good thing :-)