Title :
Reducing MapReduce Abstraction Costs for Text-centric Applications
Author :
Hsiao, Chun-Hung ; Cafarella, Michael ; Narayanasamy, Satish
Author_Institution :
Univ. of Michigan, Ann Arbor, MI, USA
Abstract :
The MapReduce framework has become widely popular for programming large clusters, even though MapReduce jobs may use underlying resources relatively inefficiently. There has been substantial research in improving MapReduce performance for applications that were inspired by relational database queries, but almost none for text-centric applications, including inverted index construction, processing large log files, and so on. We identify two simple optimizations to improve MapReduce performance on text-centric tasks: frequency-buffering and spill-matcher. The former approach improves buffer efficiency for intermediate map outputs by identifying frequent keys, effectively shrinking the amount of work that the shuffle phase must perform. Spill-matcher is a runtime controller that improves parallelization of MapReduce framework background tasks. Together, our two optimizations improve the performance of text-centric applications by up to 39.1%. We demonstrate gains on both a small local cluster and Amazon's EC2 cloud service. Unlike other MapReduce optimizations, these techniques require no user code changes, and only small changes to the MapReduce system.
Keywords :
cloud computing; optimisation; parallel programming; relational databases; text analysis; Amazon's EC2 cloud service; MapReduce abstraction cost reduction; MapReduce framework background task parallelization; MapReduce performance improvement; buffer efficiency; frequency-buffering; frequent keys; runtime controller; shuffle phase; spill-matcher; text-centric applications; text-centric tasks; Indexes; Instruction sets; Optimization; Parallel processing; Runtime; Sorting; Standards
Conference_Title :
Parallel Processing (ICPP), 2014 43rd International Conference on
Conference_Location :
Minneapolis, MN
DOI :
10.1109/ICPP.2014.13