Title :
Fault-tolerant scheduling algorithm for precedence constrained tasks in grid computing systems with communication efficiency
Author :
Ling, Yun ; Luo, Zhenshan ; Ge, Yujia
Author_Institution :
Coll. of Comput. Sci. & Inf. Eng., Zhejiang Gongshang Univ., Hangzhou, China
Abstract :
Fault tolerance, communication efficiency, and reliability are important requirements in grid computing, which often involves geographically distributed nodes cooperating to execute tasks. The study of fault tolerance plays a key role in grid computing. In this paper, we address the problem of scheduling DAGs in a grid with communication efficiency so that service failures can be avoided in the presence of processor faults. The challenge is that, because tasks in a DAG depend on each other, each task must be scheduled so that maximum communication efficiency and high reliability are guaranteed even when a processor fails. We first propose our system models, including the task model, fault model, and communication model. We then determine the times at which the primary and backup of a task can start to execute, together with their eligible processors, to guarantee that every DAG can complete despite a processor failure. We develop an optimal algorithm to schedule the primary and backup of every task, which targets maximizing communication efficiency and guaranteeing high reliability. Finally, we conduct extensive simulation experiments to quantify the performance of the proposed algorithm.
Keywords :
Computer errors; Delay; Educational institutions; Fault tolerance; Fault tolerant systems; Grid computing; Large-scale systems; Optimal scheduling; Processor scheduling; Scheduling algorithm; cloud computing; communication efficiency; fault tolerant; grid computing; reliability
Conference_Title :
Information Sciences and Interaction Sciences (ICIS), 2010 3rd International Conference on
Conference_Location :
Chengdu, China
Print_ISBN :
978-1-4244-7384-7
Electronic_ISBN :
978-1-4244-7386-1
DOI :
10.1109/ICICIS.2010.5534743