Column-Oriented Data Storage

Databases can store table data in different layouts. One method used in many commercial databases is the row store, in which all the column values of a record are stored one after another on disk. Because the values belonging to the same record sit next to each other on the storage disk, the whole record can be fetched quickly when it is read. This works well for most database applications, which typically read or write whole records at a time.
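To make the row layout concrete, here is a minimal sketch that models the disk as one flat Python list. The three-column table (id, name, price) and the helper names `read_record` and `sum_column` are hypothetical and chosen only for illustration. Reading a record is a single contiguous slice, while the values of one column end up spread out between the fields of other columns.

```python
# Minimal sketch of a row-oriented layout, assuming a hypothetical
# three-column table (id, name, price). The flat list stands in for disk:
# record 0's fields, then record 1's fields, and so on.
COLUMNS = ["id", "name", "price"]
NCOLS = len(COLUMNS)

row_store = [
    1, "apple", 0.50,
    2, "banana", 0.25,
    3, "cherry", 3.00,
]


def read_record(store, i):
    """Fetch record i as one contiguous slice of the flat storage."""
    start = i * NCOLS
    return tuple(store[start:start + NCOLS])


def sum_column(store, col):
    """Sum one column by striding through the flat storage.

    The values of a single column are NCOLS slots apart; on a real disk,
    reading them would also pull in the interleaved fields of other columns.
    """
    offset = COLUMNS.index(col)
    return sum(store[offset::NCOLS])


print(read_record(row_store, 1))        # (2, 'banana', 0.25)
print(sum_column(row_store, "price"))   # 3.75
```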

In analytics and data warehousing applications, however, many operations are carried out on one specific column (for example, summing or averaging it), so the other columns of the same record are of little or no significance to them. These kinds of applications can benefit from column-oriented data storage, in which the values belonging to the same column are stored one after another. To reconstruct a single record, one entry is read from each column store. This may sound expensive in the general case, but for operations that touch only a few columns, this layout can be much faster, because only the columns the operation actually needs have to be read from disk.
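The same hypothetical table can be sketched in a column-oriented layout, again as an illustration rather than how any particular database stores it. Each column is kept as its own contiguous list, so an aggregate over `price` reads only that list, while rebuilding a record takes the entry at the same position from every column.

```python
# Minimal sketch of a column-oriented layout for the same hypothetical
# table (id, name, price): each column's values are stored contiguously.
COLUMNS = ["id", "name", "price"]

column_store = {
    "id": [1, 2, 3],
    "name": ["apple", "banana", "cherry"],
    "price": [0.50, 0.25, 3.00],
}


def sum_column(store, col):
    """An analytical scan reads only the one column it needs."""
    return sum(store[col])


def read_record(store, i):
    """Rebuild record i by taking the i-th entry from every column."""
    return tuple(store[col][i] for col in COLUMNS)


print(sum_column(column_store, "price"))  # 3.75
print(read_record(column_store, 1))       # (2, 'banana', 0.25)
```

Note the trade-off the text describes: `sum_column` touches one list, whereas `read_record` has to visit every column to assemble a single row.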

However, a storage-layer optimization alone cannot yield better performance. It needs to be matched by algorithms specifically designed to take advantage of the fact that values belonging to the same column are stored close to each other. If that is done, a column store can deliver high performance for analytical operations such as those used in data warehouses.
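As one illustration of such an algorithm (our choice of example; the paragraph above does not name a specific one), a column whose equal values sit next to each other, such as a sorted column, can be run-length encoded and then aggregated directly on the runs without decompressing it. The sketch below assumes a hypothetical sorted integer column; the function names are made up for this example.

```python
from itertools import groupby


def rle_encode(values):
    """Compress a column into (value, run_length) pairs.

    Works best when equal values are adjacent, e.g. in a sorted column.
    """
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def count_by_value(runs):
    """COUNT(*) grouped by value, computed from the runs alone.

    Runs of the same value are added together, so this also works
    if a value appears in more than one run.
    """
    counts = {}
    for value, length in runs:
        counts[value] = counts.get(value, 0) + length
    return counts


def sum_encoded(runs):
    """SUM of the column as value * run length per run, no decompression."""
    return sum(value * length for value, length in runs)


column = [1, 1, 1, 2, 2, 3, 3, 3, 3]  # hypothetical sorted column
runs = rle_encode(column)
print(runs)                  # [(1, 3), (2, 2), (3, 4)]
print(count_by_value(runs))  # {1: 3, 2: 2, 3: 4}
print(sum_encoded(runs))     # 19
```

The work done by `count_by_value` and `sum_encoded` grows with the number of runs rather than the number of rows, which is only possible because the column's values are stored together.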

Acknowledgement:
Thanks to the article at http://www.databasecolumn.com

Thursday, March 12, 2009