This post follows on from Tim’s post about Decoupling Components, and it takes for granted some of the information written there.
During the last week, I’ve been learning and working with some technologies related to the decoupling of components we want to accomplish. Specifically, I’ve been working on the Synchronizer component of the event-driven architecture Tim described.
Right now, the synchronizer takes the responses from the resources and gets them into the occurrence store (MySQL as of today, but that is not final). There is more to it, though: the responses typically come from DiGIR, TAPIR and BioCASe providers, which render their responses as XML. So how does all this data end up in the occurrence store? Fortunately, my colleague Oliver Meyn wrote a very useful library to unmarshal all these XML chunks into nice, simple objects, so on my side I just have to worry about calling their getter methods. The synchronizer also acts as a listener on a message queue, which stores all the resource responses that need to be handled. The queue’s nuts and bolts were worked out by Tim and Federico Méndez. So yes, it has been a nice collaboration between many developers inside the Secretariat, and it’s always nice to get this kind of head start from your colleagues :)
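To give an idea of what that unmarshalling step looks like, here is a minimal sketch. The class and element names are invented for illustration and are not the API of Oliver’s library; it just shows the pattern of turning an XML chunk into a plain object with getters.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified version of the unmarshalling step: a provider
// response (DiGIR/TAPIR/BioCASe all render XML) is parsed into a plain
// object exposing getters. Element names here are assumptions.
public class UnmarshalSketch {

  // A minimal stand-in for the record objects produced by the real library.
  public static class OccurrenceRecord {
    private final String scientificName;
    private final String institutionCode;

    public OccurrenceRecord(String scientificName, String institutionCode) {
      this.scientificName = scientificName;
      this.institutionCode = institutionCode;
    }

    public String getScientificName() { return scientificName; }
    public String getInstitutionCode() { return institutionCode; }
  }

  // Parse one record out of an XML chunk (real responses carry many records).
  public static OccurrenceRecord unmarshal(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    String name = doc.getElementsByTagName("ScientificName").item(0).getTextContent();
    String inst = doc.getElementsByTagName("InstitutionCode").item(0).getTextContent();
    return new OccurrenceRecord(name, inst);
  }

  public static void main(String[] args) throws Exception {
    String xml = "<record><InstitutionCode>KU</InstitutionCode>"
        + "<ScientificName>Puma concolor</ScientificName></record>";
    OccurrenceRecord rec = unmarshal(xml);
    System.out.println(rec.getScientificName()); // prints "Puma concolor"
  }
}
```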
So, getting back to my duties, I have to take all these objects and populate the occurrence target store, taking some precautions (e.g. not inserting duplicate occurrence records, and checking that mandatory fields are not null, among other requirements).
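Those precautions can be sketched roughly as below. The choice of mandatory fields and of the identity key is my assumption for illustration, not the synchronizer’s actual rules.

```java
import java.util.*;

// Hypothetical sketch of the precautions mentioned above: skip records with
// null mandatory fields, and skip records whose identity key has already
// been seen. Field and key choices are assumptions.
public class FilterSketch {

  public static class Record {
    final String institutionCode, collectionCode, catalogNumber;

    public Record(String i, String c, String n) {
      institutionCode = i; collectionCode = c; catalogNumber = n;
    }

    // A record's identity key; the real store may use different fields.
    String key() { return institutionCode + "|" + collectionCode + "|" + catalogNumber; }
  }

  public static List<Record> filter(List<Record> in) {
    Set<String> seen = new HashSet<>();
    List<Record> out = new ArrayList<>();
    for (Record r : in) {
      // Mandatory-field check: drop incomplete records.
      if (r.institutionCode == null || r.collectionCode == null || r.catalogNumber == null) {
        continue;
      }
      // Duplicate check: only the first record with a given key is kept.
      if (seen.add(r.key())) out.add(r);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Record> batch = Arrays.asList(
        new Record("KU", "Birds", "1"),
        new Record("KU", "Birds", "1"),  // duplicate, dropped
        new Record("KU", null, "2"));    // missing mandatory field, dropped
    System.out.println(filter(batch).size()); // prints 1
  }
}
```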
For now it is still in development, but I have run some tests and extracted metrics that show the current performance, which definitely leaves room for improvement. For the tests, the message queue is first loaded with some responses that need to be attended, and then I run the synchronizer, which starts populating the occurrence store. All these tests were done on my MacBook Pro, so response times will definitely improve on a better box. Here are the metrics:
- MacBook Pro 2.4 GHz Core2 Duo (4GB Memory)
- Mac OS X 10.5.8 (Leopard)
- Message Queue & MySQL DB reside on different machines, but on the same intranet.
- Threads: the synchronizer spawns 5 threads to consume elements from the queue.
- Message queue: loaded with 552 responses (some responses are empty, to emulate a real-world scenario).
- Records: 70,326 occurrence records in total, across all responses.
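The threading model in that setup can be sketched as below. The real synchronizer listens on a remote message queue; here an in-memory `BlockingQueue` stands in for it, and “handling” a response just increments a counter, so the names and structure are assumptions for illustration only.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the threading model: a fixed pool of workers draining a queue
// of responses. The in-memory queue is a stand-in for the real message
// queue; real workers would unmarshal and insert instead of counting.
public class ConsumerSketch {

  public static int drain(BlockingQueue<String> queue, int threads) throws InterruptedException {
    AtomicInteger handled = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < threads; i++) {
      pool.submit(() -> {
        String response;
        // poll() returns null once the queue is empty, ending this worker.
        while ((response = queue.poll()) != null) {
          handled.incrementAndGet(); // real code: unmarshal + filter + insert
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    return handled.get();
  }

  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    for (int i = 0; i < 552; i++) queue.add("response-" + i);
    System.out.println(drain(queue, 5)); // prints 552
  }
}
```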
Results Test 1 (without filtering out records):
- Extracting responses from queue
- Inserting into a MySQL DB
- 202,022 milliseconds (3 min, 22 secs)
Results Test 2 (filtering out records):
- Extracting from queue
- Filtering out records (duplicates, mandatory field checks, etc.)
- Inserting into MySQL DB
- over 30 minutes... (big FAIL)
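I haven’t profiled Test 2 yet, so this is only my assumption about the slowdown, but a plausible suspect is issuing one duplicate-check query per record (~70k round trips to MySQL). A direction worth exploring is pre-loading the existing identity keys once and checking in memory; the stub below stands in for the database to show the idea.

```java
import java.util.*;

// Assumption, not a measurement: one duplicate-check query per record would
// mean ~70k DB round trips. This sketch pre-loads the existing keys once,
// so each duplicate check is an O(1) in-memory lookup instead.
public class DedupSketch {

  // Stand-in for a single "load all existing keys" query per sync run.
  static Set<String> loadExistingKeys(Collection<String> fromStore) {
    return new HashSet<>(fromStore);
  }

  // Returns only the records whose key is not already in the store.
  static List<String> newRecords(List<String> incoming, Set<String> existing) {
    List<String> fresh = new ArrayList<>();
    for (String key : incoming) {
      if (!existing.contains(key)) fresh.add(key); // no DB round trip here
    }
    return fresh;
  }

  public static void main(String[] args) {
    Set<String> existing = loadExistingKeys(Arrays.asList("KU|Birds|1", "KU|Birds|2"));
    List<String> fresh = newRecords(Arrays.asList("KU|Birds|2", "KU|Birds|3"), existing);
    System.out.println(fresh); // prints [KU|Birds|3]
  }
}
```

Whether the key set for the whole store fits comfortably in memory is another open question; delegating the check to a unique index in MySQL would be the alternative to evaluate.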
I hope to report further improvements soon. See you for now.