Title :
The Pig Mix Benchmark on Pig, MapReduce, and HPCC Systems
Author :
Ouaknine, Keren ; Carey, Michael ; Kirkpatrick, Scott
Abstract :
Soon after Google published MapReduce, their paradigm for processing large amounts of data, the open-source world followed with the Hadoop ecosystem. Later on, Lexis Nexis, the company behind the world's largest database of legal documents, open-sourced its Big Data processing platform, called the High-Performance Computing Cluster (HPCC). This paper makes three contributions. First, we describe our additions and improvements to the Pig Mix benchmark, the set of queries originally written for Apache Pig, and the porting of Pig Mix to HPCC. Second, we compare the performance of queries written in Pig, Java MapReduce, and ECL. Last, we draw conclusions and issue recommendations for future system benchmarks and large-scale data-processing platforms.
Keywords :
Big Data; Java; parallel processing; pattern clustering; public domain software; Apache Pig; Big Data processing platform; ECL; Google; HPCC system; Hadoop ecosystem; Java MapReduce; Lexis Nexis; PigMix benchmark; high-performance computing cluster; large-scale data-processing platform; legal document; open-source world; Benchmark testing; Big data; Java; Optimization; Programming; Servers; XML; Benchmark; Big Data; HPCC Systems; MapReduce; Performance; PigMix;
Conference_Title :
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location :
New York, NY
Print_ISBN :
978-1-4673-7277-0
DOI :
10.1109/BigDataCongress.2015.99