Title :
iMapReduce: A Distributed Computing Framework for Iterative Computation
Author :
Zhang, Yanfeng ; Gao, Qinxin ; Gao, Lixin ; Wang, Cuirong
Author_Institution :
Northeastern Univ., Shenyang, China
Abstract :
Relational data are pervasive in applications such as data mining and social network analysis. Such data sets are typically massive, containing at least millions, or even hundreds of millions, of relations, which creates demand for distributed computing frameworks that can process them on a large cluster. MapReduce is one such framework. However, many applications built on relational data must parse and operate on the data repeatedly, over many iterations, and MapReduce lacks built-in support for iterative processing. This paper presents iMapReduce, a framework that supports iterative processing. iMapReduce lets users specify the iterative operations with map and reduce functions and then carries out the iterations automatically, without user involvement. More importantly, iMapReduce significantly improves the performance of iterative algorithms by (1) reducing the overhead of creating new tasks in every iteration, (2) eliminating the shuffling of static data in the shuffle stage of MapReduce, and (3) allowing asynchronous execution of each iteration, i.e., an iteration can start before all tasks of the previous iteration have finished. We implement iMapReduce on top of Apache Hadoop and show that it achieves a speedup of 1.2x to 5x over MapReduce implementations of well-known iterative algorithms.
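The separation of static data from iterated state that the abstract describes can be illustrated with a small in-memory sketch. This is not the iMapReduce API and not the authors' code: the names `STATIC_GRAPH`, `map_fn`, `reduce_fn`, and `iterate` are hypothetical, the PageRank-style update is chosen only as a familiar iterative algorithm, and the loop runs synchronously in one process, so it does not model persistent tasks or asynchronous execution. It only shows that the graph structure can stay in place while just the per-node state passes through the shuffle step.

```python
from collections import defaultdict

# Static data (assumed example graph): the link structure never changes
# between iterations, so it is read in place and never shuffled.
STATIC_GRAPH = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

DAMPING = 0.85  # conventional PageRank damping factor, an assumption here


def map_fn(node, rank, out_links):
    """Emit this node's rank share to each neighbour (state data only)."""
    share = rank / len(out_links)
    for target in out_links:
        yield target, share


def reduce_fn(contributions, num_nodes):
    """Combine the shares received by one node into its new rank."""
    return (1.0 - DAMPING) / num_nodes + DAMPING * sum(contributions)


def iterate(graph, max_iters=50, tol=1e-9):
    """Run map and reduce repeatedly until the state stops changing."""
    n = len(graph)
    state = {node: 1.0 / n for node in graph}  # iterated state data
    iteration = 0
    for iteration in range(1, max_iters + 1):
        # "Shuffle": group emitted shares by target node. Only state
        # values move here; the adjacency lists stay where they are.
        shuffled = defaultdict(list)
        for node, rank in state.items():
            for target, share in map_fn(node, rank, graph[node]):
                shuffled[target].append(share)
        new_state = {node: reduce_fn(shuffled[node], n) for node in graph}
        delta = sum(abs(new_state[k] - state[k]) for k in graph)
        state = new_state
        if delta < tol:
            break
    return state, iteration


if __name__ == "__main__":
    ranks, iterations = iterate(STATIC_GRAPH)
    print(iterations, ranks)
```

In a plain MapReduce job chain, each pass of this loop would be a separate job that reloads and reshuffles the graph along with the ranks; the sketch keeps the graph fixed to make that difference visible.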
Keywords :
data analysis; data mining; formal specification; grammars; iterative methods; parallel programming; Apache Hadoop; asynchronous execution; data processing; distributed computing; iMapReduce; iterative algorithm; iterative computation; iterative operation specification; iterative processing; iterative relational data parsing; overhead reduction; reduce function; social network analysis; static data
Conference_Title :
Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on
Conference_Location :
Shanghai
Print_ISBN :
978-1-61284-425-1
Electronic_ISBN :
1530-2075
DOI :
10.1109/IPDPS.2011.260