DocumentCode
2548967
Title
Asynchronous Algorithms in MapReduce
Author
Kambatla, Karthik ; Rapolu, Naresh ; Jagannathan, Suresh ; Grama, Ananth
fYear
2010
fDate
20-24 Sept. 2010
Firstpage
245
Lastpage
254
Abstract
Asynchronous algorithms have been demonstrated to improve scalability of a variety of applications in parallel environments. Their distributed adaptations have received relatively less attention, particularly in the context of conventional execution environments and associated overheads. One such framework, MapReduce, has emerged as a commonly used programming framework for large-scale distributed environments. While the MapReduce programming model has proved to be effective for data-parallel applications, significant questions relating to its performance and application scope remain unresolved. The strict synchronization between map and reduce phases limits expression of asynchrony and hence, does not readily support asynchronous algorithms. This paper investigates the notion of partial synchronizations in iterative MapReduce applications to overcome global synchronization overheads. The proposed approach applies a locality-enhancing partition on the computation. Map tasks execute local computations with (relatively) frequent local synchronizations, with less frequent global synchronizations. This approach yields significant performance gains in distributed environments, even though their serial operation counts are higher. We demonstrate these performance gains on asynchronous algorithms for diverse applications, including pagerank, shortestpath, and kmeans. We make the following specific contributions in the paper(i) we motivate the need to extend MapReduce with constructs for asynchrony, (ii) we propose an API to facilitate partial synchronizations combined with eager scheduling and locality enhancing techniques, and (iii) demonstrate performance improvements from our proposed extensions through a variety of applications from different domains.
Keywords
application program interfaces; iterative methods; parallel algorithms; parallel programming; synchronisation; API; MapReduce programming model; application program interface; asynchronous algorithm; data parallel application; global synchronization; large scale distributed environment; locality enhancing partition; parallel environment; partial synchronization; Clustering algorithms; Convergence; Optimization; Partitioning algorithms; Programming; Scalability; Synchronization; Asynchronous Algorithms; Distributed Computing; MapReduce;
fLanguage
English
Publisher
ieee
Conference_Titel
Cluster Computing (CLUSTER), 2010 IEEE International Conference on
Conference_Location
Heraklion, Crete
Print_ISBN
978-1-4244-8373-0
Electronic_ISBN
978-0-7695-4220-1
Type
conf
DOI
10.1109/CLUSTER.2010.30
Filename
5600303
Link To Document