  • DocumentCode
    2840546
  • Title
    ADAPT: Availability-Aware MapReduce Data Placement for Non-dedicated Distributed Computing
  • Author
    Jin, Hui ; Yang, Xi ; Sun, Xian-He ; Raicu, Ioan
  • Author_Institution
    Dept. of Comput. Sci., Illinois Inst. of Technol., Chicago, IL, USA
  • fYear
    2012
  • fDate
    18-21 June 2012
  • Firstpage
    516
  • Lastpage
    525
  • Abstract
    The MapReduce programming paradigm has been gaining popularity due to its ease of programming, data distribution, and fault tolerance. Its low barrier to adoption makes MapReduce a promising framework for non-dedicated distributed computing environments. However, the variability of host resources and availability can substantially degrade the performance of MapReduce applications. The replication-based fault tolerance mechanism alleviates some of these problems at the cost of inefficient storage space utilization. Intelligent solutions that guarantee the performance of MapReduce applications with a low data replication degree are needed to promote running MapReduce applications in non-dedicated environments at lower cost. In this research, we propose an Availability-aware Data Placement (ADAPT) strategy that improves application performance without extra storage cost. The basic idea of ADAPT is to dispatch data based on the availability of each node, reducing network traffic, improving data locality, and optimizing application performance. We implement a prototype of ADAPT within the Hadoop framework, an open-source implementation of MapReduce, and evaluate its performance in an emulated non-dedicated distributed environment. The experimental results show that ADAPT can improve performance by more than 30%. ADAPT achieves high reliability without the need for additional data replication. ADAPT has also been evaluated for large-scale computing environments through simulations, with promising results.
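    The core idea stated above, dispatching data according to each node's availability, can be illustrated with a minimal sketch. This record does not describe ADAPT's actual availability model or placement algorithm, so the sketch assumes a simplified rule: data blocks are assigned to nodes in proportion to each node's expected availability (fraction of time it is up), using the largest-remainder method to keep integer counts. The function `place_blocks` and the node names are hypothetical and are not part of Hadoop or the authors' prototype.

```python
from math import floor
from typing import Dict


def place_blocks(num_blocks: int, availability: Dict[str, float]) -> Dict[str, int]:
    """Hypothetical availability-proportional block placement (simplified sketch).

    availability maps a hypothetical node name to the fraction of time the
    node is expected to be up (0.0 to 1.0). Blocks are split in proportion
    to availability; the largest-remainder method turns the fractional
    shares into integer counts that sum exactly to num_blocks.
    """
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    total = sum(availability.values())
    if total <= 0:
        raise ValueError("at least one node must have positive availability")

    # Ideal fractional share of blocks for each node.
    quotas = {node: num_blocks * a / total for node, a in availability.items()}
    # Start with the integer part of each share.
    counts = {node: floor(q) for node, q in quotas.items()}
    leftover = num_blocks - sum(counts.values())

    # Hand the remaining blocks to the nodes with the largest fractional
    # remainders; ties are broken by node name for deterministic output.
    by_remainder = sorted(quotas, key=lambda n: (-(quotas[n] - counts[n]), n))
    for node in by_remainder[:leftover]:
        counts[node] += 1
    return counts


if __name__ == "__main__":
    nodes = {"node-a": 0.75, "node-b": 0.5, "node-c": 0.25}
    print(place_blocks(12, nodes))
```

    Under this assumption, a node that is up 75% of the time receives three times as many blocks as one that is up 25% of the time, so fewer blocks sit on frequently interrupted hosts without adding replicas.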
  • Keywords
    fault tolerant computing; parallel programming; ADAPT strategy; Hadoop framework; MapReduce adoption; MapReduce programming paradigm; availability-aware MapReduce data placement; data distribution; data locality; data replication degree; fault tolerance; large-scale computing environment; network traffic reduction; node availability; nondedicated distributed computing; open-source implementation; replication-based fault tolerance mechanism; resource availability; resource variability; storage cost; storage space utilization; Adaptation models; Availability; Computational modeling; Data models; Distributed databases; Interrupters; MapReduce; Performance; Reliability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Distributed Computing Systems (ICDCS), 2012 IEEE 32nd International Conference on
  • Conference_Location
    Macau
  • ISSN
    1063-6927
  • Print_ISBN
    978-1-4577-0295-2
  • Type
    conf
  • DOI
    10.1109/ICDCS.2012.48
  • Filename
    6258024