Thursday, 29 November 2007

My kicks in life... yeah I am a nerd

I love programming!! End of story, talk to the hand. I don't care what you think. It doesn't matter whether I am working as an architect, technical director or janitor: I love programming and the creative mindset it lets you indulge in. Just sitting and juggling different concepts and translating them into running code is so inspiring. I try to pick up a new language now and then, and that has really helped me work across several different paradigms.

If I am bored I just pick up a little problem that has bothered me and fix it. Like the other day. We have this product where a mobile device interacts with a Java server using Hessian. I wrote most of the Java server and the mobile application, so I should know how it works. Something about the speed of the data synchronization just didn't feel right, so I sat down and started writing some tests for it. The mobile application is written in C# using Visual Studio. Unfortunately, as of now there are no freeware/open-source performance monitoring tools that I know of that cover that platform, so I had to resort to some of my own tricks. (I might write something myself as soon as the Compact Framework team adds dynamic proxy support to the framework.)

So after some time I discovered that during the synchronization of the data some methods were being called with arguments that are computed at the call site. Computing these arguments is quite expensive and should not happen during sync. So we basically have this type of situation:


// running on the client in the syncService..
while(until no more dataChanges or exception)
{
    serviceMethod(extractArguments(dataChange))
    ...
}

public void addToChangeLog(Date date, Operation op, ...)
{
    if(IsSyncing)
        return;

    // add to changelog...
}

...
// in the proper service method, called during sync or normal operation..
public void serviceMethod(Type arg1, Type arg2, ...)
{
    // 1) do work... save to DB and so on...
    ...
    // 2) save changes to dataChangeLog to be picked up during next sync
    //    but should not be done if we are syncing already
    syncService.addToChangeLog(currentDate, operation,
        extractNewDataAsStringVeryExpensive(sourceObject));
}

In the call to syncService.addToChangeLog we have one argument, the call to extractNewDataAsStringVeryExpensive, that is evaluated before addToChangeLog itself is called. This method is very expensive in terms of execution time, but that is not obvious when running this piece of code just a few times. Let's say the argument takes 200 ms to evaluate and serviceMethod is called 100 times during a sync; then the total execution time would be 200 ms x 100 = 20 seconds. That is 20 seconds wasted, because addToChangeLog returns immediately while we are syncing and the computed string is simply thrown away.
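To make the eager evaluation concrete, here is a minimal Java sketch of the same situation. All names (EagerArgumentDemo, extractExpensive, runSync) are made up for illustration and are not the product's real code; a counter stands in for the 200 ms of real work so the effect can be observed without waiting.

```java
import java.util.ArrayList;
import java.util.List;

class EagerArgumentDemo {
    // Counts how often the "expensive" extraction actually runs.
    static int expensiveCalls = 0;
    static boolean isSyncing = false;
    static final List<String> changeLog = new ArrayList<>();

    // Hypothetical stand-in for extractNewDataAsStringVeryExpensive.
    static String extractExpensive(String source) {
        expensiveCalls++;
        return "data:" + source;
    }

    // The guard lives inside the callee, as in the original code.
    static void addToChangeLog(String data) {
        if (isSyncing) {
            return; // the argument was already computed by the caller
        }
        changeLog.add(data);
    }

    // Simulates a sync run of n data changes and returns the number of
    // expensive extractions that were performed anyway.
    static int runSync(int n) {
        expensiveCalls = 0;
        changeLog.clear();
        isSyncing = true;
        for (int i = 0; i < n; i++) {
            // Java evaluates the argument before entering addToChangeLog.
            addToChangeLog(extractExpensive("change" + i));
        }
        isSyncing = false;
        return expensiveCalls;
    }

    public static void main(String[] args) {
        int calls = runSync(100);
        System.out.println("expensive calls during sync: " + calls);
        System.out.println("entries written to change log: " + changeLog.size());
    }
}
```

Nothing is written to the change log during the sync, yet the extraction still runs once per data change, which is exactly where the wasted seconds come from.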

So by changing the code to something like...

public void serviceMethod(Type arg1, Type arg2, ...)
{
    // 1) do work... save to DB and so on...
    ...
    // 2) save changes to dataChangeLog to be picked up during next sync
    //    but should not be done if we are syncing already
    if(!syncService.IsSyncing)
        syncService.addToChangeLog(currentDate, operation,
            extractNewDataAsStringVeryExpensive(sourceObject));
}

...we fix the problem, but we have introduced a special-case check in otherwise generic code. The check whether we are already syncing has been moved one level up, into the calling code. One would think that this problem could be handled with an around advice in an aspect, where we have full control over the invocation of the target method, but no. The arguments would still be evaluated before the target method was called. It would be nice to have an advice feature that lets you decide which arguments to resolve before the method is called, similar to the around advice.
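One general way to postpone the evaluation of an argument, sketched here in Java and not what the product actually does, is to pass the computation itself instead of its result. The sketch assumes addToChangeLog can be changed to accept a java.util.function.Supplier; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class LazyArgumentDemo {
    static int expensiveCalls = 0;
    static boolean isSyncing = false;
    static final List<String> changeLog = new ArrayList<>();

    // Hypothetical stand-in for extractNewDataAsStringVeryExpensive.
    static String extractExpensive(String source) {
        expensiveCalls++;
        return "data:" + source;
    }

    // The guard stays inside the callee; the supplier is only invoked
    // when the data is really going to be logged.
    static void addToChangeLog(Supplier<String> data) {
        if (isSyncing) {
            return; // supplier never runs, so nothing expensive happens
        }
        changeLog.add(data.get());
    }

    // Runs n service calls and returns how many expensive extractions ran.
    static int run(int n, boolean syncing) {
        expensiveCalls = 0;
        changeLog.clear();
        isSyncing = syncing;
        for (int i = 0; i < n; i++) {
            final String source = "change" + i;
            // The lambda wraps the expensive call instead of executing it.
            addToChangeLog(() -> extractExpensive(source));
        }
        isSyncing = false;
        return expensiveCalls;
    }

    public static void main(String[] args) {
        System.out.println("while syncing: " + run(100, true));
        System.out.println("normal operation: " + run(100, false));
    }
}
```

The trade-off is that the signature of addToChangeLog changes and every caller has to wrap its argument, so the special case is hidden again at the cost of a slightly less obvious API.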

Takeaway points:
  • the extractNewDataAsStringVeryExpensive method was not expensive initially. It evolved!!
  • it pays off to have a CONTINUOUS PERFORMANCE task in place that runs as part of the continuous integration build, so that code changes that increase execution time can be detected early. This is not a trivial exercise, but fun nevertheless
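A very small version of such a performance check could look like the Java sketch below. It is only an illustration under assumed names (PerformanceBudget, withinBudget) and an arbitrary budget; a real continuous performance task would need repeated runs, warm-up and stable hardware to give trustworthy numbers.

```java
class PerformanceBudget {
    // Measures the wall-clock time of a single run of the task.
    static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Returns true if the task finished within the given budget.
    static boolean withinBudget(Runnable task, long budgetMillis) {
        return elapsedMillis(task) <= budgetMillis;
    }

    // Helper that simulates slow code without checked exceptions.
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Hypothetical budget: the simulated sync step must stay under 100 ms.
        boolean ok = withinBudget(() -> sleepQuietly(10), 100);
        System.out.println(ok ? "within budget" : "budget exceeded, fail the build");
    }
}
```

Hooked into the build as an ordinary test, a check like this turns a method that slowly "evolves" into an expensive one into a visible failure instead of a vague feeling that sync got slower.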
Well anyway, it solved my problem for now and I don't think I will lose any sleep over it :)

If I come up with a better solution I will write about it here... Ciao

Tuesday, 27 November 2007

Book writing

After some initial excitement about writing a book on software development and my experiences with it, I have discovered a new book by Kent Beck (of XP fame) called Implementation Patterns, which I believe perfectly covers some of the material I was planning to write about.

I am happy that Kent (whose ideas and work I respect a lot) has published this book and I look forward to reading it. I still have a few ideas, though, that I would like to work on in written form, so more about this later. I might have to change the form and approach a bit :)

Monday, 26 November 2007

Jasypt experiences

We have been using the Jasypt framework for a bit more than a year in our product and it's pretty good. It is very easy to integrate, configure and extend to suit your own needs.

A few things, though:

- Hibernate integration: you have to think a bit about your solution if you want to encrypt PK-FK relationships; there is no out-of-the-box solution... (yet?)

- In a development environment you probably don't want to use the PBE Filter + Servlet until deployment time. If you do, you probably want to automate the initialization of the PBE system at startup from a props file via a servlet listener or something..

Thursday, 15 November 2007

Scalability tips

  • Asynchronous event-driven design: Avoid as much as possible any synchronous interaction with the data or business logic tier. Instead, use an event-driven approach and workflow
  • Partitioning/Shards: You need to design your data model so that it will fit the partitioning model
  • Parallel execution: Parallel execution should be used to get the most out of the available resources. A good place to use parallel execution is for processing user requests. In this case multiple instances of each service can take the requests from the messaging system and execute them in parallel. Another place for parallel processing is using MapReduce for performing aggregated requests on partitioned data
  • Replication (read-mostly): In read-mostly scenarios (LinkedIn seems to fall into this category well), database replication can help load-balance the read load by splitting the read requests among the replicated database nodes
  • Consistency without distributed transactions: That was one of the hot topics of the conference, which also sparked some discussion during one of the panels I participated in. An argument was made that to reach scalability you had to sacrifice consistency and handle consistency in your applications using things such as optimistic locking and asynchronous error-handling. It also assumes that you will need to handle idempotency in your code. My argument was that while this pattern addresses scalability, it creates complexity and is therefore error-prone. During another panel, Dan Pritchett argued that there are ways to avoid this level of complexity and still achieve the same goal, as I outlined in this blog post.
  • Move the database to the background - There was violent agreement that the database bottleneck can only be solved if database interactions happen in the background.
(src: http://natishalom.typepad.com/nati_shaloms_blog/2007/11/lessons-from--1.html)