• DocumentCode
    2181243
  • Title
    On Improving Fault Tolerance for Heterogeneous Hadoop MapReduce Clusters
  • Author
    Chi-Yi Lin ; Ting-Hau Chen ; Yi-No Cheng
  • Author_Institution
    Dept. Comput. Sci. & Inf. Eng., Tamkang Univ., Taipei, Taiwan
  • fYear
    2013
  • fDate
    16-19 Dec. 2013
  • Firstpage
    38
  • Lastpage
    43
  • Abstract
    The MapReduce computing paradigm has gained extreme popularity in large-scale data-intensive applications in recent years. Hadoop, an open-source implementation of MapReduce, can be set up easily and rapidly on commodity hardware to form a massive computing cluster. In such a cluster, task failures and node failures are not anomalies, and they can substantially degrade Hadoop's performance. Although Hadoop can restart failed tasks automatically and compensate for slow tasks by enabling speculative execution, many researchers have identified shortcomings in Hadoop's fault tolerance. In this research, we try to address these shortcomings by designing a simple checkpointing mechanism for Map tasks and by using a revised criterion for identifying slow tasks. Specifically, our checkpointing mechanism saves the partial output produced by the Mappers, and our criterion for identifying slow tasks considers tasks with variable progress rates. Although preliminary simulations show only marginal performance improvement compared with native Hadoop and the LATE scheduler, we believe that our approaches have the potential to offer greater performance gains on real workloads.
  • Keywords
    checkpointing; fault tolerant computing; parallel processing; public domain software; software performance evaluation; Hadoop fault tolerance; Hadoop performance; LATE scheduler; Map tasks; checkpointing mechanism; commodity hardware; computing cluster; heterogeneous Hadoop MapReduce clusters; large-scale data-intensive applications; node failures; open-source MapReduce implementation; speculative execution; task failures; Abstracts; Checkpointing; Cloud computing; Data models; Dynamic scheduling; Google; MapReduce; heterogeneous environments; intermediate data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing and Big Data (CloudCom-Asia), 2013 International Conference on
  • Conference_Location
    Fuzhou
  • Print_ISBN
    978-1-4799-2829-3
  • Type
    conf
  • DOI
    10.1109/CLOUDCOM-ASIA.2013.83
  • Filename
    6820971