DocumentCode
3444153
Title
Replicating statement execution for fault detection on distributed memory multiprocessors
Author
Gong, Chun ; Melhem, Rami ; Gupta, Rajiv
Author_Institution
Dept. of Comput. Sci., Pittsburgh Univ., PA, USA
fYear
1994
fDate
12-14 Jun 1994
Firstpage
132
Lastpage
141
Abstract
A compiler-assisted methodology is proposed for fault detection on distributed-memory systems. Selected instances of program statements are replicated in a way that ensures appropriate coverage. Replication strategies for the detection of permanent and transient faults are presented. These strategies use idle processor times for replicating statement execution whenever possible. Two approaches are also discussed for implementing the proposed strategies on single-program multiple-data parallel execution platforms. The first approach replicates program statements through source-to-source program transformations, while the second approach achieves the replication of program statements indirectly by replicating data on multiple processors.
Keywords
distributed memory systems; fault tolerant computing; compiler-assisted methodology; distributed memory multiprocessors; distributed-memory systems; fault detection; idle processor times; permanent faults; program statements; source-to-source program transformations; statement execution replication; transient faults; Computer science; Costs; Fault detection; Hardware; Multiprocessing systems; Processor scheduling; Program processors; Random access memory; Redundancy; VLIW;
fLanguage
English
Publisher
IEEE
Conference_Titel
Fault-Tolerant Parallel and Distributed Systems, 1994., Proceedings of IEEE Workshop on
Conference_Location
College Station, TX
Print_ISBN
0-8186-6807-5
Type
conf
DOI
10.1109/FTPDS.1994.494484
Filename
494484