DocumentCode :
1925429
Title :
PIC: Partitioned Iterative Convergence for Clusters
Author :
Farivar, Reza ; Raghunathan, Anand ; Chakradhar, Srimat ; Kharbanda, Harshit ; Campbell, Roy H.
Author_Institution :
Univ. of Illinois, Urbana, IL, USA
fYear :
2012
fDate :
24-28 Sept. 2012
Firstpage :
391
Lastpage :
401
Abstract :
Iterative-convergence algorithms are frequently used in a variety of domains to build models from large data sets. Cluster implementations of these algorithms are commonly realized using parallel programming models such as MapReduce. However, these implementations suffer from significant performance bottlenecks, especially due to large volumes of network traffic resulting from intermediate data and model updates during the iterations. To address these challenges, we propose partitioned iterative convergence (PIC), a new approach to programming and executing iterative-convergence algorithms on frameworks like MapReduce. In PIC, we execute the iterative-convergence computation in two phases: the best-effort phase, which quickly produces a good initial model, and the top-off phase, which further refines this model to produce the final solution. The best-effort phase iteratively performs the following steps: (a) partition the input data and the model to create several smaller, model-building sub-problems, (b) independently solve these sub-problems using iterative-convergence computations, and (c) merge solutions of the sub-problems to create the next version of the model. This partitioned, loosely coupled execution of the computation produces a model of good quality, while drastically reducing network traffic due to intermediate data and model updates. The top-off phase further refines this model by employing the original iterative-convergence computation on the entire (un-partitioned) problem until convergence. However, the number of iterations executed in the top-off phase is quite small, resulting in a significant overall improvement in performance. We have implemented a library for PIC on top of the Hadoop MapReduce framework, and evaluated it using five popular iterative-convergence algorithms (PageRank, K-Means clustering, neural network training, linear equation solver, and image smoothing). Our evaluations on clusters ranging from 6 nodes to 256 nodes demonstrate a 2.5X-4X speedup compared to conventional implementations using Hadoop.
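The following is a minimal single-machine sketch of the two-phase control flow described in the abstract, illustrated with K-Means (one of the paper's benchmarks). It is not the authors' Hadoop library: the function names, the data-only partitioning, the fixed iteration counts, and the merge rule (re-clustering the pooled per-partition centroids) are illustrative assumptions made here to show the best-effort/top-off structure.

import numpy as np

def kmeans(data, centroids, iters):
    # Plain Lloyd iterations: assign points to the nearest centroid, recompute means.
    for _ in range(iters):
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(len(centroids)):
            members = data[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids

def pic_kmeans(data, k, partitions=4, best_effort_rounds=3, sub_iters=5, top_off_iters=3):
    rng = np.random.default_rng(0)
    model = data[rng.choice(len(data), k, replace=False)].copy()

    # Best-effort phase: (a) partition, (b) solve sub-problems independently, (c) merge.
    for _ in range(best_effort_rounds):
        chunks = np.array_split(rng.permutation(data), partitions)
        sub_models = [kmeans(chunk, model.copy(), sub_iters) for chunk in chunks]
        # Merge step (assumption): cluster the pooled sub-model centroids back down to k.
        pooled = np.vstack(sub_models)
        model = kmeans(pooled, model.copy(), sub_iters)

    # Top-off phase: a few iterations of the original algorithm on the full, un-partitioned data.
    return kmeans(data, model, top_off_iters)

if __name__ == "__main__":
    points = np.random.default_rng(1).normal(size=(2000, 2))
    print(pic_kmeans(points, k=5))

In the actual PIC library, each sub-problem of the best-effort phase runs as an independent MapReduce computation on its own data partition, which is what removes most of the cross-node traffic for intermediate data and model updates; the sketch above only mirrors the sequencing of the phases.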
Keywords :
convergence; data handling; data models; iterative methods; neural nets; parallel programming; pattern clustering; smoothing methods; Hadoop MapReduce framework; K-Means clustering; PIC library; Page Rank; best-effort phase; data model; image smoothing; input data partitioning; iterative-convergence computation; linear equation solver; network traffic reduction; neural network training; parallel programming model; partitioned iterative convergence algorithm; performance bottleneck; solution merging; top-off phase; Clustering algorithms; Computational modeling; Convergence; Data models; Integrated circuit modeling; Partitioning algorithms;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Cluster Computing (CLUSTER), 2012 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2422-9
Type :
conf
DOI :
10.1109/CLUSTER.2012.84
Filename :
6337802