Wednesday, March 18, 2009

SEDA Architecture

SEDA stands for Staged Event-Driven Architecture. Its design goals include support for massive concurrency, self-tuning of resources, simplified construction of services, and introspection. The fundamental unit in SEDA is a 'Stage': an independent software entity made up of an event handler, a thread pool, and a request queue.

A complex application is divided into a number of independent components, or 'Stages', with request queues connecting them. Each Stage is free to configure and tune its own thread pool. Each Stage's event handler takes requests from its request queue, carries out the appropriate action, and finally delivers the request to the request queue of the next Stage in the application.
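To make the structure concrete, here is a minimal sketch of a Stage in Python. It is only an illustration of the idea described above, not a reference implementation: the names Stage, submit, drain, and run_pipeline are made up for this example, and the three-stage pipeline (parse, square, collect) is a toy workload.

```python
import queue
import threading


class Stage:
    """One SEDA-style stage: a request queue, a thread pool, and an event handler.

    All names here are hypothetical and chosen for illustration only.
    """

    def __init__(self, name, handler, num_threads, next_stage=None):
        self.name = name
        self.handler = handler          # event handler for this stage
        self.next_stage = next_stage    # stage whose queue receives our output
        self.requests = queue.Queue()   # request queue feeding this stage
        # Each stage owns and sizes its own thread pool.
        self.threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(num_threads)
        ]
        for thread in self.threads:
            thread.start()

    def submit(self, event):
        """Enqueue a request for this stage."""
        self.requests.put(event)

    def _run(self):
        while True:
            event = self.requests.get()
            try:
                result = self.handler(event)
                # Hand the result to the next stage's request queue, if any.
                if self.next_stage is not None:
                    self.next_stage.submit(result)
            finally:
                self.requests.task_done()

    def drain(self):
        """Block until every request submitted so far has been handled."""
        self.requests.join()


def run_pipeline(inputs):
    """Push inputs through parse -> square -> collect and return the results."""
    results = []
    lock = threading.Lock()

    def collect(value):
        with lock:
            results.append(value)
        return value

    sink = Stage("collect", collect, num_threads=1)
    work = Stage("square", lambda n: n * n, num_threads=4, next_stage=sink)
    parse = Stage("parse", int, num_threads=2, next_stage=work)

    for item in inputs:
        parse.submit(item)
    # Drain in pipeline order so each stage has forwarded all of its output.
    for stage in (parse, work, sink):
        stage.drain()
    return results
```

Calling run_pipeline(["1", "2", "3"]) returns the squares of the inputs. Because each stage runs several threads, the order of the results is not guaranteed. Note that every stage picks its own thread count, which is the per-stage tuning the design calls for.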

This model is reported to perform better than both the 'thread per request' model and the 'bounded thread pool' model. Because each Stage has its own request queue and thread pool, it also provides a very good opportunity to implement a self-tuning resource management mechanism.
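One way such self-tuning could work is for a controller to watch the length of a Stage's request queue and grow or shrink that Stage's thread pool accordingly. The function below is a rough sketch of that decision only. The name next_pool_size and all of the thresholds are assumptions made up for this example, not values taken from SEDA itself.

```python
def next_pool_size(queue_length, current_threads,
                   high_water=100, low_water=0,
                   min_threads=1, max_threads=16):
    """Suggest a new thread-pool size for one stage.

    Hypothetical policy: add a thread when the request queue is backing up,
    remove one when the queue is empty, otherwise leave the pool alone.
    """
    if queue_length > high_water and current_threads < max_threads:
        return current_threads + 1
    if queue_length <= low_water and current_threads > min_threads:
        return current_threads - 1
    return current_threads
```

A controller thread could call this periodically for each Stage and start or stop worker threads to match the suggested size.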


Thursday, March 12, 2009

Column Oriented Data Storage

Databases can store table data in different forms. One of the methods used in many commercial databases is the row store: all the columns of a record are stored one after another on disk. With this layout, when data is read from disk, the columns belonging to the same record can be fetched quickly because they sit close together on disk. This works well for most database applications.

In analytics and data warehousing applications, many analytical operations are carried out on a specific column, so the other columns belonging to the same record are of little to no significance to these operations. These kinds of applications can benefit from the column-oriented data storage technique, in which the values belonging to the same column are stored one after another. To create a single record, one entry is read from each column's store. This may sound expensive in the general case, but for column-oriented operations this type of data storage can be extremely performant.
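The difference between the two layouts can be sketched in a few lines of Python. This is an in-memory illustration only, with made-up sample data and hypothetical helper names (to_column_store, read_record). A real database lays these structures out on disk.

```python
# Row store: each record's columns are kept together.
COLUMNS = ("name", "region", "amount")
rows = [
    ("alice", "east", 120.0),
    ("bob", "west", 80.0),
    ("carol", "east", 200.0),
]


def to_column_store(rows, column_names):
    """Regroup row-oriented records so each column's values sit together."""
    return {
        name: [row[i] for row in rows]
        for i, name in enumerate(column_names)
    }


def read_record(store, column_names, index):
    """Rebuild one record by reading one entry from each column's store."""
    return tuple(store[name][index] for name in column_names)


store = to_column_store(rows, COLUMNS)

# An analytical operation touches only the column it needs.
total_amount = sum(store["amount"])
```

Summing the amounts reads a single contiguous list and never touches the name or region values, while read_record shows the extra work needed to reassemble a full record.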

But a storage-layer optimization alone cannot yield better performance. It needs to be matched by algorithms that are specifically designed to take advantage of the fact that data belonging to the same column is stored close together. If that is done, a column store can provide high performance for analytical operations such as those used in data warehouses.
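As one illustration of an algorithm that exploits this layout, values in a single column often repeat in runs, so the column can be run-length encoded and then aggregated without expanding it again. The sketch below is an assumption about what such an algorithm might look like, chosen for this example. The function names and sample data are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Run-length encode a column as a list of (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def count_by_value(encoded):
    """Count occurrences per value directly on the encoded column."""
    counts = {}
    for value, run_length in encoded:
        counts[value] = counts.get(value, 0) + run_length
    return counts


region = ["east", "east", "east", "west", "west", "north"]
encoded = rle_encode(region)
counts = count_by_value(encoded)
```

Here the count is computed from three (value, run_length) pairs instead of six individual values. The saving depends on how long the runs are, which is why this only pays off when a column's values are stored together.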

Thanks to article from