DocumentCode :
2529934
Title :
Fault tolerance protocols for parallel programs based on tasks replication
Author :
Aguilar, Jose ; Hernández, Marisela
Author_Institution :
Dept. de Comput., CEMISID, Merida, Venezuela
fYear :
2000
fDate :
2000
Firstpage :
397
Lastpage :
404
Abstract :
In this paper we propose a fault-tolerant mechanism for parallel programs based on task replication. We use a sequential discrete-event simulator of a distributed system subject to failures to compare a semi-active version and a passive version of the protocol. In our model, each time a task of a given parallel program is allocated, a copy of it is stored in a second processor, called the buddy processor. If the original processor fails, the copies of its tasks held at the buddy processor are processed instead, providing fault tolerance. Performance measures, such as program execution times and processor utilization factors, are given for the different versions of the mechanism. Performance is studied as a function of processor degradation, program size, and system size.
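The buddy-processor idea in the abstract can be illustrated with a minimal sketch. It is not the authors' simulator or protocol. The names `Processor`, `buddy_of`, `allocate`, and `fail` are hypothetical, and the ring rule for choosing a buddy is an assumption, since the abstract does not say how buddies are assigned. The sketch handles a single failure and does not model the difference between the semi-active and passive versions.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    pid: int
    alive: bool = True
    primary: list = field(default_factory=list)  # tasks allocated to this processor
    backups: list = field(default_factory=list)  # (owner_pid, task) copies held as a buddy


def buddy_of(pid, n):
    # Assumption: the buddy is the next processor on a ring of n processors.
    return (pid + 1) % n


def allocate(procs, pid, task):
    # Allocate the task and store a copy of it at the buddy processor.
    procs[pid].primary.append(task)
    procs[buddy_of(pid, len(procs))].backups.append((pid, task))


def fail(procs, pid):
    # Mark the processor as failed; its buddy (assumed alive) takes over the copies.
    procs[pid].alive = False
    procs[pid].primary = []
    buddy = procs[buddy_of(pid, len(procs))]
    recovered = [task for owner, task in buddy.backups if owner == pid]
    buddy.backups = [(owner, task) for owner, task in buddy.backups if owner != pid]
    buddy.primary.extend(recovered)
    return recovered
```

In this sketch, a failure of processor 0 moves the copies of its tasks from the backup list of processor 1 into that processor's own work queue, so the tasks are still processed.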
Keywords :
discrete event simulation; fault tolerant computing; parallel programming; performance evaluation; protocols; buddy processor; distributed system; fault tolerance protocols; fault-tolerant mechanism; parallel programs; performance measures; processor degradation; program execution times; sequential discrete-event simulator; task replication; tasks replication; Application software; Computational modeling; Computer network reliability; Computer networks; Costs; Fault tolerance; Hardware; Protocols; Redundancy; Software performance;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2000. Proceedings. 8th International Symposium on
Conference_Location :
San Francisco, CA
ISSN :
1526-7539
Print_ISBN :
0-7695-0728-X
Type :
conf
DOI :
10.1109/MASCOT.2000.876564
Filename :
876564