  • DocumentCode
    560176
  • Title
    Hadoop acceleration through network levitated merge
  • Author
    Wang, Yandong ; Que, Xinyu ; Yu, Weikuan ; Goldenberg, Dror ; Sehgal, Dhiraj
  • fYear
    2011
  • fDate
    12-18 Nov. 2011
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    Hadoop is a popular open-source implementation of the MapReduce programming model for cloud computing. However, it faces a number of issues that keep it from achieving the best performance from the underlying system. These include a serialization barrier that delays the reduce phase, repetitive merges and disk accesses, and an inability to leverage the latest high-speed interconnects. We describe Hadoop-A, an acceleration framework that optimizes Hadoop with plugin components implemented in C++ for fast data movement, overcoming these limitations. A novel network-levitated merge algorithm is introduced to merge data without repetition and without disk access. In addition, a full pipeline is designed to overlap the shuffle, merge, and reduce phases. Our experimental results show that Hadoop-A doubles the data processing throughput of Hadoop and reduces CPU utilization by more than 36%.
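    The abstract describes the network-levitated merge only at a high level. As a rough illustration of the general idea it names (merging already-sorted map-output segments in a single pass while records are pulled on demand, with no intermediate spills to disk), the following Python sketch uses a heap-based k-way merge. It is not the authors' C++ implementation: `fetch_in_chunks`, `levitated_merge`, `chunk_size`, and the in-memory lists standing in for remote segments are all hypothetical.

```python
import heapq
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (key, value) pair from a map task's sorted output


def fetch_in_chunks(segment: List[Record], chunk_size: int) -> Iterator[Record]:
    """Hypothetical stand-in for pulling a remote, already-sorted segment a few
    records at a time instead of copying the whole segment up front."""
    for start in range(0, len(segment), chunk_size):
        # In a real system this slice would be a network read of the next chunk.
        for record in segment[start:start + chunk_size]:
            yield record


def levitated_merge(segments: List[List[Record]], chunk_size: int = 2) -> Iterator[Record]:
    """Single-pass k-way merge (a sketch, not the paper's algorithm): each sorted
    segment feeds the heap on demand, every record passes through the heap once,
    and nothing is written back to disk."""
    streams = [fetch_in_chunks(seg, chunk_size) for seg in segments]
    heap = []
    for idx, stream in enumerate(streams):
        first = next(stream, None)
        if first is not None:
            # The stream index breaks key ties, so records are never compared.
            heap.append((first[0], idx, first))
    heapq.heapify(heap)
    while heap:
        _, idx, record = heapq.heappop(heap)
        yield record  # can be handed straight to a reduce function (pipelining)
        nxt = next(streams[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], idx, nxt))


segments = [
    [("apple", "1"), ("cherry", "1")],
    [("banana", "1"), ("cherry", "1"), ("date", "1")],
    [("apple", "1")],
]
print([key for key, _ in levitated_merge(segments)])
# → ['apple', 'apple', 'banana', 'cherry', 'cherry', 'date']
```

    Because `levitated_merge` is a generator, a consumer can start reducing as soon as the first record is emitted, which loosely mirrors the overlap of shuffle, merge, and reduce that the abstract describes.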
  • Keywords
    C++ language; cloud computing; public domain software; C++; Hadoop; Hadoop-A; MapReduce programming model; network-levitated merge algorithm; open-source software; plugin component; serialization barrier; Acceleration; Algorithm design and analysis; Data processing; Merging; Pipelines; Protocols; Servers
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for
  • Conference_Location
    Seattle, WA
  • Electronic_ISBN
    978-1-4503-0771-0
  • Type
    conf
  • Filename
    6114442