DocumentCode
1962106
Title
Reaching efficient fault-tolerance for cooperative applications
Author
Sobe, Peter
Author_Institution
Inst. of Comput. Eng., Med. Univ. of Lubeck, Germany
fYear
2000
fDate
2000
Firstpage
48
Lastpage
57
Abstract
Cooperative applications are widely used, e.g. as parallel calculations or distributed information processing systems. While such applications meet users' demands and offer a performance improvement, they are also more susceptible to faults in any of the computer nodes used. Often a single fault may cause a complete application failure. On the other hand, the redundancy in distributed systems can be utilized for fast fault detection and recovery. We therefore followed an approach based on duplication of each application process to detect crashes and faulty functions of single computer nodes. We concentrate on two aspects of efficient fault tolerance: fast fault detection, and recovery without significantly delaying the application progress. The contribution of this work is, first, a new fault-detecting protocol for duplicated processes. Second, we enhance a roll-forward recovery scheme so that it is applicable to a set of cooperative processes in conformity with the protocol.
Keywords
fault tolerant computing; performance evaluation; protocols; application failure; cooperative applications; distributed information processing systems; fast fault detection; fault detection; fault-tolerance; parallel calculations; performance improvement; protocol; roll forward recovery scheme; Access protocols; Application software; Biomedical engineering; Fault detection; Fault tolerance; Fault tolerant systems; Memory architecture; Message passing; Real time systems; Routing protocols;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Performance and Dependability Symposium, 2000. IPDS 2000. Proceedings. IEEE International
Conference_Location
Chicago, IL
ISSN
1087-2191
Print_ISBN
0-7695-0553-8
Type
conf
DOI
10.1109/IPDS.2000.839463
Filename
839463
Link To Document