• DocumentCode
    3444153
  • Title

    Replicating statement execution for fault detection on distributed memory multiprocessors

  • Author

    Gong, Chun ; Melhem, Rami ; Gupta, Rajiv

  • Author_Institution
    Dept. of Comput. Sci., Pittsburgh Univ., PA, USA
  • fYear
    1994
  • fDate
    12-14 Jun 1994
  • Firstpage
    132
  • Lastpage
    141
  • Abstract
    A compiler-assisted methodology is proposed for fault detection on distributed-memory systems. Selected instances of program statements are replicated in a way that ensures appropriate coverage. Replication strategies for the detection of permanent and transient faults are presented. These strategies use idle processor times for replicating statement execution whenever possible. Two approaches are also discussed for implementing the proposed strategies on single-program multiple-data parallel execution platforms. The first approach replicates program statements through source-to-source program transformations while the second approach achieves the replication of program statements indirectly by replicating data on multiple processors.
  • Keywords
    distributed memory systems; fault tolerant computing; compiler-assisted methodology; distributed memory multiprocessors; distributed-memory systems; fault detection; idle processor times; permanent faults; program statements; source-to-source program transformations; statement execution replication; transient faults; Computer science; Costs; Fault detection; Hardware; Multiprocessing systems; Processor scheduling; Program processors; Random access memory; Redundancy; VLIW;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fault-Tolerant Parallel and Distributed Systems, 1994., Proceedings of IEEE Workshop on
  • Conference_Location
    College Station, TX
  • Print_ISBN
    0-8186-6807-5
  • Type

    conf

  • DOI
    10.1109/FTPDS.1994.494484
  • Filename
    494484