DocumentCode
1667464
Title
The Pig Mix Benchmark on Pig, MapReduce, and HPCC Systems
Author
Ouaknine, Keren ; Carey, Michael ; Kirkpatrick, Scott
fYear
2015
Firstpage
643
Lastpage
648
Abstract
Soon after Google published MapReduce, their paradigm for processing large amounts of data, the open-source world followed with the Hadoop ecosystem. Later on, Lexis Nexis, the company behind the world's largest database of legal documents, open-sourced its Big Data processing platform, called the High-Performance Computing Cluster (HPCC). This paper makes three contributions. First, we describe our additions and improvements to the Pig Mix benchmark, the set of queries originally written for Apache Pig, and the porting of Pig Mix to HPCC. Second, we compare the performance of queries written in Pig, Java MapReduce, and ECL. Last, we draw conclusions and issue recommendations for future system benchmarks and large-scale data-processing platforms.
Keywords
Big Data; Java; parallel processing; pattern clustering; public domain software; Apache Pig; Big Data processing platform; ECL; Google; HPCC system; Hadoop ecosystem; Java MapReduce; Lexis Nexis; PigMix benchmark; high-performance computing cluster; large-scale data-processing platform; legal document; open-source world; Benchmark testing; Big data; Java; Optimization; Programming; Servers; XML; Benchmark; Big Data; HPCC Systems; MapReduce; Performance; PigMix;
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location
New York, NY
Print_ISBN
978-1-4673-7277-0
Type
conf
DOI
10.1109/BigDataCongress.2015.99
Filename
7207283