Wednesday, March 18, 2009

SEDA Architecture

SEDA stands for Staged Event Driven Architecture. Some of the design goals of SEDA are massive concurrency, self tuning of resources, simplified construction of services and also providing introspection. The fundamental unit in SEDA is a 'Stage'. Stage is an independent software entity that is supported by an event handler, thread pool and request queue.

A Complext software is divided into a number of independent components - 'Stage' with request queues connecting these stages. Each Stage is free to configure and tune its own threadpool. Each stage has an event handler that takes the requests from the request queue and carries out the appropriate action and finally delivers the request to the next request queue in the software structure.

This model is seen to perform better than 'Thread per request' model as well as 'Bounded thread pool' model. This also provides a very good opportunity to implement self tuning resource management mechanism.


Thursday, March 12, 2009

Column Oriented Data Storage

Databases can store the table data in different forms. One of the methods that is used in lot of commercial database is row store. All the columns of the record are stored one after another on the disk. In this method, when the data is read from the disk, the columns belonging to the same record can be fetched faster because of the locational proximity on the storage disk. This works fine for most of the database applications.

In case of analytics and data warehousing applications, various analytical operations are carried out for a specific column and hence the other columns belonging to the same record are of little to no significance in these operations. These kind of applications can benefit from the column oriented data storage technique. The data belonging to the same column are stored one after another. One entry from each of the column store is read for creating a single record. This may sound expensive in most general cases but for various column oriented operations, this type of data storage may prove to be extremely performant.

But a simple storage layer optimization cannot yield better performance. This needs to be matched by the algorithms that are specifically designed to take advantage of the fact that data belonging to the same column are stored close to each other. If that is done, column store can provide high performance for analytical operations like that used in data warehouses.

Thanks to article from

Tuesday, February 24, 2009

DBFS Storage Engine

The DBFS Storage Engine module is responsible for providing access to the underlying storage. The DBFS clients are completely unaware of the DBFS Storage layout and the data orientation and the DBFS Storage engine provides a consistent relational access to the underlying data.

Architecture of DBFS Storage Engine
Uniform Access Provider
The Uniform Access Provider (UAP) component is responsible for exposing the underlying data through a relational data access interface. The UAP abstracts the complexities of data layout and data orientation and provides a uniform interface for any type of supported data. The UAP provides Row Store and Column Store as the two default data orientation components. The UAP also provides various data layout implementations like CSV, Fixed Width and In Memory. The UAP supports a pluggable data layout factory that can be used to plug-in various other types of data layout implementations.

Index Manager
The Index Manager provides the index support for the data. This lets the DBFS clients to query the data in a much faster way. Index Manager implements various index data structures like B+ tree, Bitmap Index and Hashes. Index Manager also manages the various index related activities like Index creation and update.

Statistics Manager

The Statistics Manager is a passive component in the Storage Engine module. The UAP and the Index Manager components publish various data access and modification statistics to the Statistics Manager component. The Statistics Manager uses this statistical information to carry out various housekeeping activities as well as implement different data access optimization strategies.

Cache Manager

All the DBFS read/write operations are routed through the Cache Manager component. When the Cache Manager finds any match, the values are returned from the cache. This provides faster access compared to the disk access. Cache Manager provides various cache replacement algorithms like Least Recently Used (LRU) and Least Frequently used (LFU). The data is cached in a memory sensitive manner to conserve the amount of memory and other system resources used by the data cache.

I/O Manager

The I/O Manager provides access to the underlying disk storage. The I/O manager is responsible for carrying out the data read, write and update operations in an optimized manner. The I/O Manager provides asynchronous non-blocking I/O operation as well as synchronous blocking I/O operations.

DBFS Coordinator

DBFS Coordinator is the central DBFS System Management process. DBFS Co-ord is the gateway to the DBFS system. Co-ord acts as a proxy to the underlying DBFS subsystems and provides a connection which can be used to carry out various file system CRUD operations.

The architecture of the DBFS Co-ord is show below:
Architecture of DBFS Co-ordinator

Bootstrap Manager
Bootstrap Manager is responsible for initializing all the DBFS sub-systems. The inter process communication between various DBFS sub-systems requires each DBFS sub-system to run a listener on a TCP port that is registered with the DBFS Coord. The ports required by each DBFS sub-system is not fixed and therefore, each sub-system is free to choose their port during the bootstrap process and the sub-system is required to communicate the port address to the DBFS sub-system before the Bootstrap Manager finishes the sub-system initialization phase.

DBFS Health Monitor
DBFS Health Monitor is responsible for communicating with each of the DBFS sub-systems to monitor the health of the component and also manages the health of the DBFS system as a whole. DBFS Health Monitor has a heart beat monitor which listens to the DBFS sub-systems during their active lifetime and if there is any anomaly, Health Monitor initiates the failover operation and hands in the control to the FailOver Manager.

FailOver Manager
When the DBFS Health Monitor activates the FailOver Manager, the FailOver Manager calls the Bootstrap Manager to re-initialize the failed sub-system. Also, the FailOver Manager recreates the DBFS state before the component failed. FailOver Manager switches the control from the failed component to the newly activated component in a transparent manner which the DBFS clients are unaware of.

Unified Connection Manager
The Unified Connection Manager is the gateway or the entry point to the DBFS system. The Unified Connection Manager runs at a TCP port that is known to the DBFS clients and all the DBFS clients communicates with the DBFS System through this port. The UCon Manager accepts the incoming requests and creates a new session for each of the DBFS client connection and the DBFS clients can perform various file system related operations through the connection handle. The UCon Manager is responsible for providing a connection handle to the authenticated DBFS clients. It is essential that the UCon Manager runs at a very well known TCP port that the DBFS client is aware of. When the DBFS is shutdown, the UCon Manager terminates all the client connections and waits for the clients to exit if required and carries out the termination in a graceful manner.

DBFS Architectural Overview

DBFS is architected for high throughput, parallel, asynchronous data access. DBFS is a multi process, multi thread program. Each module is executed as a separate process and each request is handled in a separate thread which is obtained from the module specific thread pool. The Inter process communication is carried out through TCP sockets communication for reliable message delivery.

Architectural Overview

The design goals of DBFS are scalability, reliability, high throughput and high concurrency. In order to maximize the CPU utilization, the different DBFS sub-systems are run as separate process in an independent JVM instance.

DBFS provides a very high degree of scalability and concurrency by supporting asynchronous, non-blocking I/O operations. The number of I/O requests served is very high as there is no synchronous blocking I/O calls to the underlying storage system.

DBFS provides reliability by providing a foolproof mechanism for storage integrity and consistency. The data delivered by the DBFS is consistent across multiple request threads and are reliably persisted in the underlying storage. The health of the overall DBFS system is monitored and a mechanism for failover is provided for transparent switching of the DBFS subsystems.

Components of DBFS

DBFS consists of three main sub-systems – DBFS Storage Engine, Transaction Management System and Recovery Management System. Each of these sub-systems has its own address space which is shared by all the components of a sub-system. Therefore, each of these sub-systems is run as a separate process and the components in the sub-system runs as a Java Thread inside the corresponding VM. The Inter process communication between these sub-systems is carried out through TCP/IP socket communication.

Visit DBFS Home for more information on DBFS.