DocumentCode
1916601
Title
HOG: Distributed Hadoop MapReduce on the Grid
Author
Chen He ; Weitzel, Derek ; Swanson, David ; Ying Lu
fYear
2012
fDate
10-16 Nov. 2012
Firstpage
1276
Lastpage
1283
Abstract
MapReduce is a powerful data processing platform for commercial and academic applications. In this paper, we build a novel Hadoop MapReduce framework, Hadoop On the Grid (HOG), which runs on the Open Science Grid, a grid that spans multiple institutions across the United States. It differs from previous MapReduce platforms, which run on dedicated environments such as clusters or clouds. HOG provides a free, elastic, and dynamic MapReduce environment on the opportunistic resources of the grid. In HOG, we improve Hadoop's fault tolerance for wide area data analysis by mapping data centers across the U.S. to virtual racks and creating multi-institution failure domains. Our modifications to the Hadoop framework are transparent to existing Hadoop MapReduce applications. In the evaluation, we successfully extend HOG to 1100 nodes on the grid. Additionally, we evaluate HOG with a simulated Facebook Hadoop MapReduce workload. We conclude that HOG's rapid scalability can provide comparable performance to a dedicated Hadoop cluster.
Keywords
fault tolerant computing; grid computing; parallel programming; Facebook; HOG; Hadoop fault tolerance; Hadoop-on-the-grid; United States; data analysis; distributed Hadoop MapReduce framework; Grid computing; MapReduce; Middleware
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion:
Conference_Location
Salt Lake City, UT
Print_ISBN
978-1-4673-6218-4
Type
conf
DOI
10.1109/SC.Companion.2012.154
Filename
6495936