Title :
Adaptive Failure Detection via Heartbeat under Hadoop
Author :
Zhu, Hao ; Chen, Haopeng
Author_Institution :
Sch. of Software, Shanghai Jiao Tong Univ., Shanghai, China
Abstract :
Hadoop has become a popular framework for processing massive data sets on large-scale clusters. However, we observe that the detection of a failed worker is delayed, which can significantly increase the completion time of jobs with different workloads. To address this, we present two mechanisms, the Adaptive Interval and the Reputation-based Detector, which enable Hadoop to detect a failed worker in the shortest time. The Adaptive Interval dynamically configures the expiration time so that it adapts to the job size. The Reputation-based Detector evaluates the reputation of each worker; once a worker's reputation drops below a threshold, that worker is considered failed. Our experiments demonstrate that both strategies greatly improve the detection of failed workers. Specifically, the Adaptive Interval performs relatively better for small jobs, while the Reputation-based Detector is more suitable for large jobs.
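Illustration (not part of the original record) :
The abstract only outlines the two mechanisms, so the following is a minimal, illustrative Java sketch of how they could work: an expiration time that scales with job size, and a per-worker reputation score that rises with on-time heartbeats and falls with missed ones, with the worker declared failed below a threshold. All class names, constants (bounds, increment, penalty, threshold) and the linear scaling rule are assumptions made for illustration, not the paper's actual parameters or implementation.

import java.util.HashMap;
import java.util.Map;

// Sketch of the two ideas described in the abstract; all numeric values are assumed.
public class AdaptiveFailureDetectionSketch {

    // Adaptive Interval: scale the heartbeat expiration time with job size.
    // Assumed linear scaling clamped between a lower and an upper bound
    // (the upper bound mirrors Hadoop's classic 10-minute default).
    static long adaptiveExpiryMs(int totalTasks) {
        long minExpiryMs = 30_000;                            // assumed floor for small jobs
        long maxExpiryMs = 600_000;                           // assumed ceiling (10 minutes)
        long scaled = minExpiryMs + (long) totalTasks * 500;  // assumed 0.5 s per task
        return Math.max(minExpiryMs, Math.min(maxExpiryMs, scaled));
    }

    // Reputation-based Detector: score each worker by its heartbeat behaviour
    // and declare it failed once the score drops below a threshold.
    static class ReputationDetector {
        private final Map<String, Double> reputation = new HashMap<>();
        private final double threshold;

        ReputationDetector(double threshold) {
            this.threshold = threshold;
        }

        void onHeartbeatReceived(String workerId) {
            // Reward an on-time heartbeat, capped at a reputation of 1.0.
            double current = reputation.getOrDefault(workerId, 1.0);
            reputation.put(workerId, Math.min(1.0, current + 0.1));
        }

        void onHeartbeatMissed(String workerId) {
            // Penalise a missed heartbeat more heavily than a reward.
            double current = reputation.getOrDefault(workerId, 1.0);
            reputation.put(workerId, current - 0.3);
        }

        boolean isFailed(String workerId) {
            return reputation.getOrDefault(workerId, 1.0) < threshold;
        }
    }

    public static void main(String[] args) {
        System.out.println("Expiry for a 20-task job:   " + adaptiveExpiryMs(20) + " ms");
        System.out.println("Expiry for a 5000-task job: " + adaptiveExpiryMs(5000) + " ms");

        ReputationDetector detector = new ReputationDetector(0.5);
        detector.onHeartbeatReceived("worker-1");
        detector.onHeartbeatMissed("worker-2");
        detector.onHeartbeatMissed("worker-2");
        System.out.println("worker-1 failed? " + detector.isFailed("worker-1")); // false
        System.out.println("worker-2 failed? " + detector.isFailed("worker-2")); // true
    }
}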
Keywords :
distributed programming; software fault tolerance; Hadoop; adaptive failure detection; adaptive interval; failed worker; job size; large scale cluster; massive data set; reputation-based detector; Detectors; Educational institutions; Fault tolerance; Fault tolerant systems; Heart beat; Heart rate variability; Runtime; Cloud computing; Hadoop; MapReduce; adaptive heartbeat; failure detection;
Conference_Title :
Services Computing Conference (APSCC), 2011 IEEE Asia-Pacific
Conference_Location :
Jeju Island
Print_ISBN :
978-1-4673-0206-7
DOI :
10.1109/APSCC.2011.46