Title :
The HiBench benchmark suite: Characterization of the MapReduce-based data analysis
Author :
Huang, Shengsheng ; Huang, Jie ; Dai, Jinquan ; Xie, Tao ; Huang, Bo
Author_Institution :
Intel China Software Center, Shanghai, China
Abstract :
The MapReduce model is becoming prominent for large-scale data analysis in the cloud. In this paper, we present the benchmarking, evaluation, and characterization of Hadoop, an open-source implementation of MapReduce. We first introduce HiBench, a new benchmark suite for Hadoop. It consists of a set of Hadoop programs, including both synthetic micro-benchmarks and real-world Hadoop applications. We then evaluate and characterize the Hadoop framework using HiBench in terms of speed (i.e., job running time), throughput (i.e., the number of tasks completed per minute), HDFS bandwidth, system resource utilization (e.g., CPU, memory, and I/O), and data access patterns.
Keywords :
Java; data analysis; public domain software; HDFS bandwidth; Hadoop characterization; HiBench benchmark suite; MapReduce based data analysis; data access patterns; open source implementation; Bandwidth; Cloud computing; Data analysis; Explosives; Fault tolerance; Large-scale systems; Open source software; Personal communication networks; Resource management; Throughput
Conference_Title :
Data Engineering Workshops (ICDEW), 2010 IEEE 26th International Conference on
Conference_Location :
Long Beach, CA
Print_ISBN :
978-1-4244-6522-4
Electronic_ISBN :
978-1-4244-6521-7
DOI :
10.1109/ICDEW.2010.5452747