Title :
Implementation of a customizable fault tolerance framework
Author :
Yen, I. Ling ; Ahmed, Iftikhar ; Jagannath, Ramanujam ; Kundu, Sreeparna
Author_Institution :
Texas Univ., Dallas, TX, USA
Abstract :
While there have been significant advances in fault tolerance research, the effort has focused on the design of individual fault-tolerant systems or methodologies. Recently, some research has been initiated to develop fault tolerance paradigms that can be used to provide a spectrum of fault tolerance levels. In this paper, we present the design of a fault tolerance framework that can be used to support a wide spectrum of applications with various fault tolerance requirements, various criticality levels, and various system models. The framework is designed to be parameterizable so that the user can configure it to obtain the desired features. The framework is also designed to be an off-the-shelf component, so that application programs can be integrated within it easily to obtain the fault-tolerant version of the application system. A specialized N-modular redundancy (SNMR) scheme has been developed to serve as the primary approach for achieving efficient and cost-effective fault tolerance for the framework. In most cases, the SNMR scheme yields better performance and lower cost in providing fault tolerance than conventional NMR schemes. It also enhances the scalability and customizability of the general replication method. This paper discusses the major concepts of the SNMR framework and the main issues in the design and implementation of the framework, including an object-oriented overall system design and the interface protocol class hierarchy. The interface protocol class hierarchy provides a convenient paradigm for implementing a customizable, highly reusable, and easily extensible SNMR framework.
Keywords :
data encapsulation; distributed processing; object-oriented programming; redundancy; software fault tolerance; cost-effective fault tolerance; customizable fault tolerance framework; fault tolerance paradigms; interface protocol class hierarchy; object-oriented overall system design; off-the-shelf component; replication method; specialized N-modular redundancy scheme; Computer science; Costs; Drives; Fault tolerance; Fault tolerant systems; Nuclear magnetic resonance; Object oriented modeling; Protocols; Redundancy; Scalability;
Conference_Title :
Proceedings of the First International Symposium on Object-Oriented Real-time Distributed Computing (ISORC 98), 1998
Conference_Location :
Kyoto
Print_ISBN :
0-8186-8430-5
DOI :
10.1109/ISORC.1998.666793