DocumentCode :
1791701
Title :
ReCT: Improving MapReduce performance under failures with resilient checkpointing tactics
Author :
Hao Wang ; Haopeng Chen ; Fei Hu
Author_Institution :
Sch. of Software, Shanghai Jiao Tong Univ., Shanghai, China
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
27
Lastpage :
32
Abstract :
MapReduce is a programming paradigm that makes it simple and efficient to process vast amounts of data. It targets very large clusters, where failures are no longer exceptions. Fault tolerance is vital to MapReduce; however, its fault tolerance and recovery strategies perform poorly under failures. Currently, fault tolerance is implemented at the task level, so a task failure leads to re-execution of the whole task. In this work, we present ReCT, a family of resilient checkpointing tactics that substantially improve MapReduce performance under map task failures. ReCT introduces slight changes to the current MapReduce execution flow and makes it possible to create checkpoints beneath the task level. In case of task failures, ReCT tries to make the most of the finished portions of a task and skip them in retry attempts. The checkpointing tactics bring little overhead and substantially accelerate the fault recovery process. We also observe that under some circumstances, the new execution flow in ReCT involves far fewer I/O operations than that in Hadoop. ReCT outperforms Hadoop by 6.6% on average under no failures and by 4.6% to 51.0% under different failure densities.
Keywords :
checkpointing; data handling; fault tolerance; MapReduce execution flow; ReCT; fault recovery; fault tolerance; map task failure; resilient checkpointing tactics; Checkpointing; Data structures; Delays; Educational institutions; Fault tolerance; Fault tolerant systems; Software;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004380
Filename :
7004380