On Improving Fault Tolerance for Heterogeneous Hadoop MapReduce Clusters

Author

Chi-Yi Lin ; Ting-Hau Chen ; Yi-No Cheng

Author_Institution

Dept. Comput. Sci. & Inf. Eng., Tamkang Univ., Taipei, Taiwan

fYear

2013

fDate

16-19 Dec. 2013

Firstpage

38

Lastpage

43

Abstract

The MapReduce computing paradigm has become extremely popular for large-scale data-intensive applications in recent years. Hadoop, an open-source implementation of MapReduce, can be set up easily and rapidly on commodity hardware to form a massive computing cluster. In such a cluster, task failures and node failures are common and can substantially degrade Hadoop's performance. Although Hadoop can restart failed tasks automatically and compensate for slow tasks by enabling speculative execution, many researchers have identified shortcomings in Hadoop's fault tolerance. In this research, we try to address these shortcomings by designing a simple checkpointing mechanism for Map tasks and by using a revised criterion for identifying slow tasks. Specifically, our checkpointing mechanism saves the partial output produced by the Mappers, and our criterion for identifying slow tasks takes into account tasks whose progress rates vary over time. Although preliminary simulations show only marginal performance improvement over native Hadoop and the LATE scheduler, we believe that our approaches have the potential to offer greater performance gains on real workloads.
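
The abstract does not give the revised slow-task criterion, so the following is only an illustrative sketch of why variable progress rates matter, not the authors' method. It contrasts the LATE-style estimate of remaining time, (1 - progress score) / (progress score / elapsed time), with a hypothetical variant that measures the rate over the most recent samples only. The names `TaskProgress`, `late_time_left`, `windowed_time_left`, and the `window` parameter are assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TaskProgress:
    """Progress samples for one task (hypothetical structure).

    Each sample is (elapsed_seconds, progress_score), with the score in [0, 1].
    """
    samples: List[Tuple[float, float]]


def late_time_left(task: TaskProgress) -> float:
    """LATE-style estimate: uses the average progress rate since the task started."""
    elapsed, score = task.samples[-1]
    if elapsed <= 0 or score <= 0:
        return float("inf")  # no measurable progress yet
    rate = score / elapsed
    return (1.0 - score) / rate


def windowed_time_left(task: TaskProgress, window: int = 2) -> float:
    """Assumed variant: uses the rate over the last `window` sampling intervals.

    This is NOT the criterion from the paper; it only illustrates how a task
    that has slowed down can look healthier under a whole-lifetime average.
    """
    recent = task.samples[-(window + 1):]
    (t0, p0), (t1, p1) = recent[0], recent[-1]
    if len(recent) < 2 or t1 <= t0:
        return late_time_left(task)  # not enough history; fall back
    rate = (p1 - p0) / (t1 - t0)
    if rate <= 0:
        return float("inf")  # task has stalled in the recent window
    return (1.0 - p1) / rate


# A task that progressed quickly at first and then slowed down.
slowing = TaskProgress(samples=[(10.0, 0.5), (20.0, 0.6), (30.0, 0.7)])
avg_estimate = late_time_left(slowing)
recent_estimate = windowed_time_left(slowing, window=2)
```

For the sample task, the whole-lifetime average gives a shorter remaining-time estimate than the recent-window rate, so a scheduler that ranks tasks by estimated time left could rank this task differently depending on which rate it uses.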

Keywords

checkpointing; fault tolerant computing; parallel processing; public domain software; software performance evaluation; Hadoop fault tolerance; Hadoop performance; LATE scheduler; Map tasks; checkpointing mechanism; commodity hardware; computing cluster; heterogeneous Hadoop MapReduce clusters; large-scale data-intensive applications; node failures; open-source MapReduce implementation; speculative execution; task failures; Abstracts; Cloud computing; Data models; Dynamic scheduling; Google; MapReduce; heterogeneous environments; intermediate data

fLanguage

English

Publisher

IEEE

Conference_Titel

Cloud Computing and Big Data (CloudCom-Asia), 2013 International Conference on

Conference_Location

Fuzhou

Print_ISBN

978-1-4799-2829-3

Type

conf

DOI

10.1109/CLOUDCOM-ASIA.2013.83

Filename

6820971