• DocumentCode
    625618
  • Title

    JVM-Bypass for Efficient Hadoop Shuffling

  • Author

    Yandong Wang ; Cong Xu ; Xiaobing Li ; Weikuan Yu

  • Author_Institution
    Dept. of Comput. Sci., Auburn Univ., Auburn, AL, USA
  • fYear
    2013
  • fDate
    20-24 May 2013
  • Firstpage
    569
  • Lastpage
    578
  • Abstract
    Hadoop employs Java-based network transport stack on top of the Java Virtual Machine (JVM) for its data shuffling and merging purposes. Our examination reveals that JVM introduces a significant amount of overhead to data processing capability of the native interface. Furthermore, JVM constrains the use of high-performance networking mechanisms such as RDMA (Remote Direct Memory Access) which has established itself as an effective data movement technology in many networking environments because of its low-latency, high bandwidth, low CPU utilization, and energy efficiency. In this paper, we introduce a plug-in library called JVM-Bypass Shuffling (JBS) for Hadoop data shuffling. JBS helps Hadoop data shuffling by avoiding Javabased transport protocols, removing the overhead and limitations of the JVM. In addition, we design JBS as a portable library that can leverage both TCP/IP and RDMA on different network systems such as InfiniBand and 1/10 Gigabit Ethernet. We have designed and implemented JBS as part of Hadoop acceleration. It has been transferred to Mellanox as the software product UDA (Unstructured Data Accelerator) and used to enable our studies on a variety of merging algorithms. Our performance evaluation demonstrates that JBS can effectively reduce the execution time of Hadoop jobs by up to 66.3% and lower the CPU utilization by 48.1%.
  • Keywords
    Java; parallel processing; virtual machines; CPU utilization; Ethernet; Hadoop acceleration; Hadoop data shuffling; InfiniBand; JVM-Bypass shuffling; Java virtual machine; Java-based network transport stack; Java-based transport protocol; Mellanox; RDMA; TCP/IP; data movement technology; energy efficiency; high-performance networking mechanism; merging algorithm; plug-in library; remote direct memory access; software product UDA; unstructured data accelerator; Bandwidth; IP networks; Java; Libraries; Merging; Prefetching; Protocols; High Performance Interconnect; JVM; MapReduce;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on
  • Conference_Location
    Boston, MA
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4673-6066-1
  • Type

    conf

  • DOI
    10.1109/IPDPS.2013.13
  • Filename
    6569843