• DocumentCode
    659558
  • Title
    A characterization of big data benchmarks
  • Author
    Wen Xiong ; Zhibin Yu ; Zhendong Bei ; Juanjuan Zhao ; Fan Zhang ; Yubin Zou ; Xue Bai ; Ye Li ; Chengzhong Xu
  • Author_Institution
    Center for Cloud Comput., Inst. of Adv. Technol., Shenzhen, China
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    118
  • Lastpage
    125
  • Abstract
    Recently, big data has evolved into a buzzword in both academia and industry worldwide. Benchmarks are important tools for evaluating an IT system. However, benchmarking big data systems is much more challenging than benchmarking earlier systems. First, big data systems are still in their infancy and consequently are not well understood. Second, big data systems are more complicated than previous systems such as single-node computing platforms. While some researchers have started to design benchmarks for big data systems, they do not consider the redundancy among their benchmarks. Moreover, they use artificial input data sets rather than real-world data for their benchmarks. It is therefore unclear whether these benchmarks can be used to precisely evaluate the performance of big data systems. In this paper, we first analyze the redundancy among benchmarks from ICTBench, HiBench, and typical workloads from a real-world application: spatio-temporal data analysis for the Shenzhen transportation system. Subsequently, we present an initial idea of a big data benchmark suite for spatio-temporal data. There are three findings in this work: (1) Redundancy exists in these pioneering benchmark suites, and some of the benchmarks can be removed safely. (2) The workload behavior of trajectory data analysis applications is dramatically affected by their input data sets. (3) The benchmarks created for academic research cannot represent the cases of real-world applications.
  • Keywords
    Big Data; benchmark testing; data analysis; redundancy; transportation; HiBench; ICTBench; Shenzhen transportation system; big data benchmark suite; big data systems; spatio-temporal data analysis; trajectory data analysis applications workload behavior; Bandwidth; Data handling; Data storage systems; Information management; Measurement; Principal component analysis; mapreduce; micro-architecture metrics; similarity; trajectory data; workloads
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691707
  • Filename
    6691707