DocumentCode :
1336191
Title :
The PSTR/SNS scheme for real-time fault tolerance via active object replication and network surveillance
Author :
Kim, K.H. ; Subbaraman, Chittur
Author_Institution :
California Univ., Irvine, CA, USA
Volume :
12
Issue :
2
Year :
2000
Firstpage :
145
Lastpage :
159
Abstract :
The TMO (Time-triggered Message-triggered Object) scheme was formulated as a major extension of the conventional object structuring schemes with the idealistic goal of facilitating general-form design and timeliness-guaranteed design of complex real-time application systems. Recently, as a new scheme for realizing TMO-structured distributed and parallel computer systems that are capable of both hardware and software fault tolerance, we have formulated and demonstrated the PSTR (Primary-Shadow TMO Replication) scheme. An important new extension of the PSTR scheme discussed in this paper is an integration of the PSTR scheme and a network surveillance (NS) scheme. This extension results in a significant improvement in the fault coverage and recovery time bound achieved. The NS scheme adopted is a recently developed scheme that is effective in a wide range of point-to-point networks, and it is called the SNS (Supervisor-based Network Surveillance) scheme. The integration of the PSTR scheme and the SNS scheme is called the PSTR/SNS scheme. The recovery time bound of the PSTR/SNS scheme is analyzed on the basis of an implementation model that can be easily adapted to various commercial operating system kernels.
Keywords :
computer network management; computer network reliability; distributed object management; fault tolerant computing; network operating systems; object-oriented databases; operating system kernels; real-time systems; replicated databases; surveillance; system recovery; PSTR/SNS scheme; TMO-structured distributed computer systems; TMO-structured parallel computer systems; active object replication; commercial operating system kernels; complex real-time application systems; fault coverage; general-form design; implementation model; object structuring schemes; point-to-point networks; primary-shadow TMO replication scheme; real-time fault tolerance; recovery time bound; supervisor-based network surveillance scheme; time-triggered message-triggered object scheme; timeliness-guaranteed design; Application software; Concurrent computing; Distributed computing; Fault tolerance; Fault tolerant systems; Hardware; Kernel; Operating systems; Real time systems; Surveillance;
Language :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/69.842258
Filename :
842258