• DocumentCode
    1916601
  • Title
    HOG: Distributed Hadoop MapReduce on the Grid
  • Author
    He, Chen ; Weitzel, Derek ; Swanson, David ; Lu, Ying
  • fYear
    2012
  • fDate
    10-16 Nov. 2012
  • Firstpage
    1276
  • Lastpage
    1283
  • Abstract
    MapReduce is a powerful data processing platform for commercial and academic applications. In this paper, we build a novel Hadoop MapReduce framework executed on the Open Science Grid which spans multiple institutions across the United States - Hadoop On the Grid (HOG). It is different from previous MapReduce platforms that run on dedicated environments like clusters or clouds. HOG provides a free, elastic, and dynamic MapReduce environment on the opportunistic resources of the grid. In HOG, we improve Hadoop's fault tolerance for wide area data analysis by mapping data centers across the U.S. to virtual racks and creating multi-institution failure domains. Our modifications to the Hadoop framework are transparent to existing Hadoop MapReduce applications. In the evaluation, we successfully extend HOG to 1100 nodes on the grid. Additionally, we evaluate HOG with a simulated Facebook Hadoop MapReduce workload. We conclude that HOG's rapid scalability can provide comparable performance to a dedicated Hadoop cluster.
  • Keywords
    fault tolerant computing; grid computing; parallel programming; Facebook; HOG; Hadoop fault tolerance; Hadoop-on-the-grid; United States; data analysis; distributed Hadoop MapReduce framework; Grid computing; MapReduce; Middleware
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion
  • Conference_Location
    Salt Lake City, UT
  • Print_ISBN
    978-1-4673-6218-4
  • Type
    conf
  • DOI
    10.1109/SC.Companion.2012.154
  • Filename
    6495936