Tuesday, February 24, 2009

DBFS Storage Engine

The DBFS Storage Engine module is responsible for providing access to the underlying storage. The DBFS clients are completely unaware of the DBFS Storage layout and the data orientation and the DBFS Storage engine provides a consistent relational access to the underlying data.

Architecture of DBFS Storage Engine
Uniform Access Provider
The Uniform Access Provider (UAP) component is responsible for exposing the underlying data through a relational data access interface. The UAP abstracts the complexities of data layout and data orientation and provides a uniform interface for any type of supported data. The UAP provides Row Store and Column Store as the two default data orientation components. The UAP also provides various data layout implementations like CSV, Fixed Width and In Memory. The UAP supports a pluggable data layout factory that can be used to plug-in various other types of data layout implementations.

Index Manager
The Index Manager provides the index support for the data. This lets the DBFS clients to query the data in a much faster way. Index Manager implements various index data structures like B+ tree, Bitmap Index and Hashes. Index Manager also manages the various index related activities like Index creation and update.

Statistics Manager

The Statistics Manager is a passive component in the Storage Engine module. The UAP and the Index Manager components publish various data access and modification statistics to the Statistics Manager component. The Statistics Manager uses this statistical information to carry out various housekeeping activities as well as implement different data access optimization strategies.

Cache Manager

All the DBFS read/write operations are routed through the Cache Manager component. When the Cache Manager finds any match, the values are returned from the cache. This provides faster access compared to the disk access. Cache Manager provides various cache replacement algorithms like Least Recently Used (LRU) and Least Frequently used (LFU). The data is cached in a memory sensitive manner to conserve the amount of memory and other system resources used by the data cache.

I/O Manager

The I/O Manager provides access to the underlying disk storage. The I/O manager is responsible for carrying out the data read, write and update operations in an optimized manner. The I/O Manager provides asynchronous non-blocking I/O operation as well as synchronous blocking I/O operations.

DBFS Coordinator

DBFS Coordinator is the central DBFS System Management process. DBFS Co-ord is the gateway to the DBFS system. Co-ord acts as a proxy to the underlying DBFS subsystems and provides a connection which can be used to carry out various file system CRUD operations.

The architecture of the DBFS Co-ord is show below:
Architecture of DBFS Co-ordinator

Bootstrap Manager
Bootstrap Manager is responsible for initializing all the DBFS sub-systems. The inter process communication between various DBFS sub-systems requires each DBFS sub-system to run a listener on a TCP port that is registered with the DBFS Coord. The ports required by each DBFS sub-system is not fixed and therefore, each sub-system is free to choose their port during the bootstrap process and the sub-system is required to communicate the port address to the DBFS sub-system before the Bootstrap Manager finishes the sub-system initialization phase.

DBFS Health Monitor
DBFS Health Monitor is responsible for communicating with each of the DBFS sub-systems to monitor the health of the component and also manages the health of the DBFS system as a whole. DBFS Health Monitor has a heart beat monitor which listens to the DBFS sub-systems during their active lifetime and if there is any anomaly, Health Monitor initiates the failover operation and hands in the control to the FailOver Manager.

FailOver Manager
When the DBFS Health Monitor activates the FailOver Manager, the FailOver Manager calls the Bootstrap Manager to re-initialize the failed sub-system. Also, the FailOver Manager recreates the DBFS state before the component failed. FailOver Manager switches the control from the failed component to the newly activated component in a transparent manner which the DBFS clients are unaware of.

Unified Connection Manager
The Unified Connection Manager is the gateway or the entry point to the DBFS system. The Unified Connection Manager runs at a TCP port that is known to the DBFS clients and all the DBFS clients communicates with the DBFS System through this port. The UCon Manager accepts the incoming requests and creates a new session for each of the DBFS client connection and the DBFS clients can perform various file system related operations through the connection handle. The UCon Manager is responsible for providing a connection handle to the authenticated DBFS clients. It is essential that the UCon Manager runs at a very well known TCP port that the DBFS client is aware of. When the DBFS is shutdown, the UCon Manager terminates all the client connections and waits for the clients to exit if required and carries out the termination in a graceful manner.

DBFS Architectural Overview

DBFS is architected for high throughput, parallel, asynchronous data access. DBFS is a multi process, multi thread program. Each module is executed as a separate process and each request is handled in a separate thread which is obtained from the module specific thread pool. The Inter process communication is carried out through TCP sockets communication for reliable message delivery.

Architectural Overview

The design goals of DBFS are scalability, reliability, high throughput and high concurrency. In order to maximize the CPU utilization, the different DBFS sub-systems are run as separate process in an independent JVM instance.


DBFS provides a very high degree of scalability and concurrency by supporting asynchronous, non-blocking I/O operations. The number of I/O requests served is very high as there is no synchronous blocking I/O calls to the underlying storage system.


DBFS provides reliability by providing a foolproof mechanism for storage integrity and consistency. The data delivered by the DBFS is consistent across multiple request threads and are reliably persisted in the underlying storage. The health of the overall DBFS system is monitored and a mechanism for failover is provided for transparent switching of the DBFS subsystems.

Components of DBFS

DBFS consists of three main sub-systems – DBFS Storage Engine, Transaction Management System and Recovery Management System. Each of these sub-systems has its own address space which is shared by all the components of a sub-system. Therefore, each of these sub-systems is run as a separate process and the components in the sub-system runs as a Java Thread inside the corresponding VM. The Inter process communication between these sub-systems is carried out through TCP/IP socket communication.


Visit DBFS Home for more information on DBFS.