DocumentCode :
1925429
Title :
PIC: Partitioned Iterative Convergence for Clusters
Author :
Farivar, Reza ; Raghunathan, Anand ; Chakradhar, Srimat ; Kharbanda, Harshit ; Campbell, Roy H.
Author_Institution :
Univ. of Illinois, Urbana, IL, USA
fYear :
2012
fDate :
24-28 Sept. 2012
Firstpage :
391
Lastpage :
401
Abstract :
Iterative-convergence algorithms are frequently used in a variety of domains to build models from large data sets. Cluster implementations of these algorithms are commonly realized using parallel programming models such as MapReduce. However, these implementations suffer from significant performance bottlenecks, especially due to large volumes of network traffic resulting from intermediate data and model updates during the iterations. To address these challenges, we propose partitioned iterative convergence (PIC), a new approach to programming and executing iterative-convergence algorithms on frameworks like MapReduce. In PIC, we execute the iterative-convergence computation in two phases: the best-effort phase, which quickly produces a good initial model, and the top-off phase, which further refines this model to produce the final solution. The best-effort phase iteratively performs the following steps: (a) partition the input data and the model to create several smaller, model-building sub-problems, (b) independently solve these sub-problems using iterative-convergence computations, and (c) merge solutions of the sub-problems to create the next version of the model. This partitioned, loosely coupled execution of the computation produces a model of good quality, while drastically reducing network traffic due to intermediate data and model updates. The top-off phase further refines this model by employing the original iterative-convergence computation on the entire (un-partitioned) problem until convergence. However, the number of iterations executed in the top-off phase is quite small, resulting in a significant overall improvement in performance. We have implemented a library for PIC on top of the Hadoop MapReduce framework, and evaluated it using five popular iterative-convergence algorithms (PageRank, K-Means clustering, neural network training, linear equation solver, and image smoothing). Our evaluations on clusters ranging from 6 nodes to 256 nodes demonstrate a 2.5X-4X speedup compared to conventional implementations using Hadoop.
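The following is a minimal single-machine sketch of the two-phase control flow described in the abstract, illustrated with K-Means (one of the paper's benchmarks). It is not the authors' Hadoop library: the function names, the data-only partitioning, the fixed iteration counts, and the merge rule (re-clustering the pooled per-partition centroids) are illustrative assumptions made here to show the best-effort/top-off structure.

import numpy as np

def kmeans(data, centroids, iters):
    # Plain Lloyd iterations: assign points to the nearest centroid, recompute means.
    for _ in range(iters):
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(len(centroids)):
            members = data[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids

def pic_kmeans(data, k, partitions=4, best_effort_rounds=3, sub_iters=5, top_off_iters=3):
    rng = np.random.default_rng(0)
    model = data[rng.choice(len(data), k, replace=False)].copy()

    # Best-effort phase: (a) partition, (b) solve sub-problems independently, (c) merge.
    for _ in range(best_effort_rounds):
        chunks = np.array_split(rng.permutation(data), partitions)
        sub_models = [kmeans(chunk, model.copy(), sub_iters) for chunk in chunks]
        # Merge step (assumption): cluster the pooled sub-model centroids back down to k.
        pooled = np.vstack(sub_models)
        model = kmeans(pooled, model.copy(), sub_iters)

    # Top-off phase: a few iterations of the original algorithm on the full, un-partitioned data.
    return kmeans(data, model, top_off_iters)

if __name__ == "__main__":
    points = np.random.default_rng(1).normal(size=(2000, 2))
    print(pic_kmeans(points, k=5))

In the actual PIC library, each sub-problem of the best-effort phase runs as an independent MapReduce computation on its own data partition, which is what removes most of the cross-node traffic for intermediate data and model updates; the sketch above only mirrors the sequencing of the phases.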
Keywords :
convergence; data handling; data models; iterative methods; neural nets; parallel programming; pattern clustering; smoothing methods; Hadoop MapReduce framework; K-Means clustering; PIC library; Page Rank; best-effort phase; data model; image smoothing; input data partitioning; iterative-convergence computation; linear equation solver; network traffic reduction; neural network training; parallel programming model; partitioned iterative convergence algorithm; performance bottleneck; solution merging; top-off phase; Clustering algorithms; Computational modeling; Convergence; Data models; Integrated circuit modeling; Partitioning algorithms;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Cluster Computing (CLUSTER), 2012 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2422-9
Type :
conf
DOI :
10.1109/CLUSTER.2012.84
Filename :
6337802