Title :
An Evaluation of Cassandra for Hadoop
Author :
Dede, E. ; Sendir, B. ; Kuzlu, P. ; Hartog, J. ; Govindaraju, M.
Author_Institution :
Grid & Cloud Comput. Res. Lab., SUNY Binghamton, Binghamton, NY, USA
fDate :
June 28, 2013 to July 3, 2013
Abstract :
In the last decade, the increased use and growth of social media, unconventional web technologies, and mobile applications have all encouraged the development of a new breed of database models. NoSQL data stores target unstructured data, which is dynamic by nature and a key focus area for "Big Data" research. New-generation data can prove costly and impractical to administer with SQL databases because of its lack of structure and its high scalability and elasticity needs. NoSQL data stores such as MongoDB and Cassandra provide a desirable platform for fast and efficient data queries, which has increased their importance in areas such as cloud applications, e-commerce, social media, bioinformatics, and materials science. In an effort to combine the querying capabilities of conventional database systems with the processing power of the MapReduce model, this paper presents a thorough evaluation of the Cassandra NoSQL database when used in conjunction with the Hadoop MapReduce engine. We characterize the performance for a wide range of representative use cases, and then compare, contrast, and evaluate the results so that application developers can make informed decisions based on data size, cluster size, replication factor, and partitioning strategy to meet their performance needs.
Keywords :
SQL; distributed databases; pattern clustering; public domain software; relational databases; Big Data research; Cassandra evaluation; Hadoop MapReduce engine; MapReduce model; MongoDB; NoSQL database; Web technologies; cluster size; data querying; data size; database models; mobile applications; partitioning strategy; performance needs; replication factor; representative use cases; social media; Benchmark testing; Data models; Distributed databases; Peer-to-peer computing; Servers; Writing; Cassandra; Distributed Computing; Hadoop; MapReduce; NoSQL;
Conference_Title :
Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on
Conference_Location :
Santa Clara, CA
Print_ISBN :
978-0-7695-5028-2
DOI :
10.1109/CLOUD.2013.31