Title :
HadoopCL: MapReduce on Distributed Heterogeneous Platforms through Seamless Integration of Hadoop and OpenCL
Author :
Grossman, Max ; Breternitz, Mauricio ; Sarkar, Vivek
Abstract :
As the scale of high performance computing systems grows, three main challenges arise: the programmability, reliability, and energy efficiency of those systems. Accomplishing all three without sacrificing performance requires a rethinking of legacy distributed programming models and homogeneous clusters. In this work, we integrate Hadoop MapReduce with OpenCL to enable the use of heterogeneous processors in a distributed system. We do this by exploiting the implicit data parallelism of mappers and reducers in a MapReduce system. Combining Hadoop and OpenCL provides 1) an easy-to-learn and flexible application programming interface in a high level and popular programming language, 2) the reliability guarantees and distributed file system of Hadoop, and 3) the low power consumption and performance acceleration of heterogeneous processors. This paper presents HadoopCL: an extension to Hadoop which supports execution of user-written Java kernels on heterogeneous devices, optimizes communication through asynchronous transfers and dedicated I/O threads, automatically generates OpenCL kernels from Java byte code using the open source tool APARAPI, and achieves nearly 3x overall speedup and better than 55x speedup of the computational sections for example MapReduce applications, relative to Hadoop.
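The "implicit data parallelism of mappers and reducers" mentioned in the abstract can be illustrated with a minimal sketch in plain Java. This is not HadoopCL's API and it uses neither Hadoop nor APARAPI. The class name DataParallelMapSketch, the methods mapKernel, mapSequential, mapParallel, and reduceSum, and the squaring workload are all hypothetical, chosen only for illustration. The sketch assumes that each map call depends solely on its own input record, which is the property that would let a runtime dispatch the calls to many cores or to a GPU.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class DataParallelMapSketch {

    // Hypothetical per-record "map" kernel: its result depends only on its
    // own input value, so calls on different records are independent.
    static int mapKernel(int value) {
        return value * value;
    }

    // Sequential baseline: apply the kernel to one record after another.
    static int[] mapSequential(int[] input) {
        int[] out = new int[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = mapKernel(input[i]);
        }
        return out;
    }

    // Data-parallel version: every index is processed independently and
    // writes only to its own output slot, so no synchronization is needed.
    // A parallel stream stands in here for a heterogeneous device.
    static int[] mapParallel(int[] input) {
        int[] out = new int[input.length];
        IntStream.range(0, input.length)
                 .parallel()
                 .forEach(i -> out[i] = mapKernel(input[i]));
        return out;
    }

    // Hypothetical "reduce" step: combine all mapped values into one sum.
    static long reduceSum(int[] mapped) {
        return Arrays.stream(mapped).asLongStream().sum();
    }

    public static void main(String[] args) {
        int[] input = {1, 2, 3, 4};
        int[] sequential = mapSequential(input);
        int[] parallel = mapParallel(input);
        System.out.println(Arrays.equals(sequential, parallel)); // prints "true"
        System.out.println(reduceSum(parallel));                 // prints "30"
    }
}
```

Because the kernel has no shared mutable state, the sequential and parallel versions produce the same result. That independence is the property the abstract relies on when it describes running user-written Java kernels on heterogeneous devices.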
Keywords :
application program interfaces; programming languages; Hadoop MapReduce system; HadoopCL; Java byte code; MapReduce applications; OpenCL kernels; computational sections; data parallelism; distributed file system; distributed heterogeneous platforms; distributed system; energy efficiency; flexible application programming interface; heterogeneous devices; heterogeneous processors; high performance computing systems; homogeneous clusters; legacy distributed programming models; open source tool APARAPI; programmability; programming language; reliability; seamless integration; user written Java kernels; Complexity theory; Graphics processing units; Instruction sets; Java; Kernel; Programming; Reliability; GPGPU; Hadoop; OpenCL; heterogeneous; multicore;
Conference_Title :
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International
Conference_Location :
Cambridge, MA
Print_ISBN :
978-0-7695-4979-8
DOI :
10.1109/IPDPSW.2013.246