• DocumentCode
    1667464
  • Title

    The Pig Mix Benchmark on Pig, MapReduce, and HPCC Systems

  • Author

    Ouaknine, Keren ; Carey, Michael ; Kirkpatrick, Scott

  • fYear
    2015
  • Firstpage
    643
  • Lastpage
    648
  • Abstract
    Soon after Google published MapReduce, their paradigm for processing large amounts of data, the open-source world followed with the Hadoop ecosystem. Later on, Lexis Nexis, the company behind the world's largest database of legal documents, open-sourced its Big Data processing platform, called the High-Performance Computing Cluster (HPCC). This paper makes three contributions. First, we describe our additions and improvements to the Pig Mix benchmark, the set of queries originally written for Apache Pig, and the porting of Pig Mix to HPCC. Second, we compare the performance of queries written in Pig, Java MapReduce, and ECL. Last, we draw conclusions and issue recommendations for future system benchmarks and large-scale data-processing platforms.
  • Keywords
    Big Data; Java; parallel processing; pattern clustering; public domain software; Apache Pig; Big Data processing platform; ECL; Google; HPCC system; Hadoop ecosystem; Java MapReduce; Lexis Nexis; PigMix benchmark; high-performance computing cluster; large-scale data-processing platform; legal document; open-source world; Benchmark testing; Big data; Java; Optimization; Programming; Servers; XML; Benchmark; Big Data; HPCC Systems; MapReduce; Performance; PigMix;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2015 IEEE International Congress on
  • Conference_Location
    New York, NY
  • Print_ISBN
    978-1-4673-7277-0
  • Type

    conf

  • DOI
    10.1109/BigDataCongress.2015.99
  • Filename
    7207283