Title :
Area failures and reliable distributed applications
Author :
Nakechbandi, Moustafa ; Colin, Jean-Yves
Author_Institution :
LITIS Lab., Le Havre Univ., Le Havre, France
Abstract :
Because failures sometimes affect whole areas rather than only individual computers, we propose a new, efficient scheduling algorithm for problems in which tasks with precedence constraints and communication delays must be scheduled on a virtual heterogeneous distributed multi-area system subject to the possible failure of one complete area. Based on an extension of the critical-path method (CPM/PERT), our algorithm combines a schedule that is optimal when there are no failures with some task duplication to provide fault tolerance if one area fails. Backup copies are not created for tasks that already have more than one original copy in different areas. The algorithm runs in polynomial time and produces a schedule that is optimal when no area fails and remains a good, reliable schedule if any single area fails. Finally, we run numerical experiments in which we apply our algorithm to several semi-random DAGs and compare the optimal solutions with the reliable solutions found by the algorithm.
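For orientation only, the CPM/PERT idea the abstract builds on can be sketched as a forward pass over a task DAG with communication delays. This is a generic illustration, not the authors' algorithm: it assumes every task runs on its own processor, so each edge pays its full communication delay, and it ignores areas, duplication and backup copies entirely. The function name `earliest_starts` and the four-task example graph are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def earliest_starts(duration, comm):
    """Classic CPM forward pass with communication delays.

    duration: dict mapping task -> processing time
    comm:     dict mapping (pred, succ) -> communication delay on that edge
    Assumption (not from the paper): each task has its own processor,
    so every edge incurs its communication delay.
    """
    # Build the predecessor sets that TopologicalSorter expects.
    preds = {task: set() for task in duration}
    for (u, v) in comm:
        preds[v].add(u)

    start = {}
    for task in TopologicalSorter(preds).static_order():
        # A task may start once every predecessor has finished
        # and its result has been transferred.
        start[task] = max(
            (start[u] + duration[u] + comm[(u, task)] for u in preds[task]),
            default=0,
        )
    return start


# Hypothetical example DAG: a -> b -> d and a -> c -> d.
duration = {"a": 2, "b": 3, "c": 1, "d": 2}
comm = {("a", "b"): 1, ("a", "c"): 4, ("b", "d"): 2, ("c", "d"): 1}

starts = earliest_starts(duration, comm)
makespan = max(starts[t] + duration[t] for t in duration)
```

The schedule described in the abstract additionally places task copies in different areas so that the loss of any one area still leaves a feasible schedule; that step is not modelled here.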
Keywords :
directed graphs; distributed processing; fault tolerant computing; scheduling; CPM; PERT; area failure; critical-path method; directed acyclic graph; fault failure; scheduling algorithm; semirandom DAG; virtual heterogeneous distributed multiareas system; Application software; Computer crashes; Computer hacking; Delay; Distributed computing; Fault tolerance; Optimal scheduling; Polynomials; Processor scheduling; Scheduling algorithm; DAG; area failure; catastrophic crash; fault tolerance; heterogeneous systems; reliable applications; scheduling with communication;
Conference_Title :
Computer Engineering & Systems, 2009. ICCES 2009. International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-1-4244-5842-4
Electronic_ISBN :
978-1-4244-5843-1
DOI :
10.1109/ICCES.2009.5383307