DocumentCode :
3029062
Title :
A Two-Phase Log-Based Fault Recovery Mechanism in Master/Worker Based Computing Environment
Author :
Ting Chen ; Yongjian Wang ; Yuanqiang Huang ; Cheng Luo ; Depei Qian ; Zhongzhi Luan
Author_Institution :
Sino-German Joint Software Inst., Beihang Univ., Beijing, China
fYear :
2009
fDate :
10-12 Aug. 2009
Firstpage :
290
Lastpage :
297
Abstract :
The master/worker pattern is widely used to construct cross-domain, large-scale computing infrastructure. The applications supported by this kind of infrastructure usually feature long-running, speculative execution, among other characteristics. A fault recovery mechanism is important to them, especially in a wide area network environment, which consists of error-prone components. Inter-node cooperation is needed to make the recovery process more efficient. The traditional log-based rollback recovery mechanism, which features independent recovery, cannot fulfill the global cooperation requirement, because exchanging a large amount of logs wastes bandwidth and slows application data transfer. In this paper, we propose a two-phase log-based recovery mechanism that offers merits such as space saving and global optimization and can complement the current log-based rollback recovery approach in some specific situations. We have demonstrated the use of this mechanism in the Drug Discovery Grid environment, which is supported by China National Grid. Experimental results have demonstrated the efficiency of this mechanism.
Keywords :
fault tolerant computing; grid computing; medical computing; China National Grid; drug discovery grid environment; error prone components; fault recovery mechanism; global optimization; internode cooperation; large scale computing infrastructure; log-based rollback recovery mechanism; master-worker based computing environment; two-phase log-based fault recovery mechanism; Application software; Bandwidth; Checkpointing; Concurrent computing; Costs; Distributed computing; Distributed processing; Drugs; Large-scale systems; Wide area networks; Drug Discovery Grid; fault recovery; log-based rollback recovery; two-phase recovery;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing with Applications, 2009 IEEE International Symposium on
Conference_Location :
Chengdu
Print_ISBN :
978-0-7695-3747-4
Type :
conf
DOI :
10.1109/ISPA.2009.53
Filename :
5207921