Title :
JVM-Bypass for Efficient Hadoop Shuffling
Author :
Yandong Wang ; Cong Xu ; Xiaobing Li ; Weikuan Yu
Author_Institution :
Dept. of Comput. Sci., Auburn Univ., Auburn, AL, USA
Abstract :
Hadoop employs Java-based network transport stack on top of the Java Virtual Machine (JVM) for its data shuffling and merging purposes. Our examination reveals that JVM introduces a significant amount of overhead to data processing capability of the native interface. Furthermore, JVM constrains the use of high-performance networking mechanisms such as RDMA (Remote Direct Memory Access) which has established itself as an effective data movement technology in many networking environments because of its low-latency, high bandwidth, low CPU utilization, and energy efficiency. In this paper, we introduce a plug-in library called JVM-Bypass Shuffling (JBS) for Hadoop data shuffling. JBS helps Hadoop data shuffling by avoiding Javabased transport protocols, removing the overhead and limitations of the JVM. In addition, we design JBS as a portable library that can leverage both TCP/IP and RDMA on different network systems such as InfiniBand and 1/10 Gigabit Ethernet. We have designed and implemented JBS as part of Hadoop acceleration. It has been transferred to Mellanox as the software product UDA (Unstructured Data Accelerator) and used to enable our studies on a variety of merging algorithms. Our performance evaluation demonstrates that JBS can effectively reduce the execution time of Hadoop jobs by up to 66.3% and lower the CPU utilization by 48.1%.
Keywords :
Java; parallel processing; virtual machines; CPU utilization; Ethernet; Hadoop acceleration; Hadoop data shuffling; InfiniBand; JVM-Bypass shuffling; Java virtual machine; Java-based network transport stack; Java-based transport protocol; Mellanox; RDMA; TCP/IP; data movement technology; energy efficiency; high-performance networking mechanism; merging algorithm; plug-in library; remote direct memory access; software product UDA; unstructured data accelerator; Bandwidth; IP networks; Java; Libraries; Merging; Prefetching; Protocols; High Performance Interconnect; JVM; MapReduce;
Conference_Titel :
Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4673-6066-1
DOI :
10.1109/IPDPS.2013.13