DocumentCode :
659558
Title :
A characterization of big data benchmarks
Author :
Wen Xiong ; Zhibin Yu ; Zhendong Bei ; Juanjuan Zhao ; Fan Zhang ; Yubin Zou ; Xue Bai ; Ye Li ; Chengzhong Xu
Author_Institution :
Center for Cloud Comput., Inst. of Adv. Technol., Shenzhen, China
fYear :
2013
fDate :
6-9 Oct. 2013
Firstpage :
118
Lastpage :
125
Abstract :
Recently, big data has evolved into a buzzword in both academia and industry worldwide. Benchmarks are important tools for evaluating an IT system. However, benchmarking big data systems is much more challenging than benchmarking earlier systems. First, big data systems are still in their infancy and consequently are not well understood. Second, big data systems are more complicated than previous systems such as a single-node computing platform. While some researchers have started to design benchmarks for big data systems, they do not consider the redundancy among their benchmarks. Moreover, they use artificial input data sets rather than real-world data. It is therefore unclear whether these benchmarks can be used to precisely evaluate the performance of big data systems. In this paper, we first analyze the redundancy among benchmarks from ICTBench, HiBench, and typical workloads from a real-world application: spatio-temporal data analysis for the Shenzhen transportation system. Subsequently, we present an initial idea of a big data benchmark suite for spatio-temporal data. There are three findings in this work: (1) Redundancy exists in these pioneering benchmark suites, and some of the benchmarks can be removed safely. (2) The workload behavior of trajectory data analysis applications is dramatically affected by their input data sets. (3) The benchmarks created for academic research cannot represent the cases of real-world applications.
Keywords :
Big Data; benchmark testing; data analysis; redundancy; transportation; HiBench; ICTBench; Shenzhen transportation system; big data benchmark suite; big data systems; spatio-temporal data analysis; trajectory data analysis applications workload behavior; Bandwidth; Data handling; Data storage systems; Information management; Measurement; Principal component analysis; mapreduce; micro-architecture metrics; similarity; trajectory data; workloads
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
Type :
conf
DOI :
10.1109/BigData.2013.6691707
Filename :
6691707