  • DocumentCode
    2548967
  • Title
    Asynchronous Algorithms in MapReduce
  • Author
    Kambatla, Karthik; Rapolu, Naresh; Jagannathan, Suresh; Grama, Ananth
  • fYear
    2010
  • fDate
    20-24 Sept. 2010
  • Firstpage
    245
  • Lastpage
    254
  • Abstract
    Asynchronous algorithms have been demonstrated to improve the scalability of a variety of applications in parallel environments. Their distributed adaptations have received relatively less attention, particularly in the context of conventional execution environments and their associated overheads. One such framework, MapReduce, has emerged as a commonly used programming framework for large-scale distributed environments. While the MapReduce programming model has proved effective for data-parallel applications, significant questions relating to its performance and application scope remain unresolved. The strict synchronization between the map and reduce phases limits the expression of asynchrony and hence does not readily support asynchronous algorithms. This paper investigates the notion of partial synchronizations in iterative MapReduce applications to overcome global synchronization overheads. The proposed approach applies a locality-enhancing partitioning to the computation. Map tasks execute local computations with relatively frequent local synchronizations and less frequent global synchronizations. This approach yields significant performance gains in distributed environments, even though the resulting algorithms have higher serial operation counts. We demonstrate these performance gains on asynchronous algorithms for diverse applications, including PageRank, shortest path, and k-means. We make the following specific contributions in the paper: (i) we motivate the need to extend MapReduce with constructs for asynchrony, (ii) we propose an API to facilitate partial synchronizations combined with eager scheduling and locality-enhancing techniques, and (iii) we demonstrate performance improvements from our proposed extensions through a variety of applications from different domains.
  • Keywords
    application program interfaces; iterative methods; parallel algorithms; parallel programming; synchronisation; API; MapReduce programming model; application program interface; asynchronous algorithm; data parallel application; global synchronization; large scale distributed environment; locality enhancing partition; parallel environment; partial synchronization; Clustering algorithms; Convergence; Optimization; Partitioning algorithms; Programming; Scalability; Synchronization; Asynchronous Algorithms; Distributed Computing; MapReduce
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Cluster Computing (CLUSTER), 2010 IEEE International Conference on
  • Conference_Location
    Heraklion, Crete
  • Print_ISBN
    978-1-4244-8373-0
  • Electronic_ISBN
    978-0-7695-4220-1
  • Type
    conf
  • DOI
    10.1109/CLUSTER.2010.30
  • Filename
    5600303