• DocumentCode
    1336191
  • Title

    The PSTR/SNS scheme for real-time fault tolerance via active object replication and network surveillance

  • Author

    Kim, K.H. ; Subbaraman, Chittur

  • Author_Institution
    California Univ., Irvine, CA, USA
  • Volume
    12
  • Issue
    2
  • fYear
    2000
  • Firstpage
    145
  • Lastpage
    159
  • Abstract
    The TMO (Time-triggered Message-triggered Object) scheme was formulated as a major extension of the conventional object structuring schemes with the idealistic goal of facilitating general-form design and timeliness-guaranteed design of complex real-time application systems. Recently, as a new scheme for realizing TMO-structured distributed and parallel computer systems that are capable of both hardware and software fault tolerance, we have formulated and demonstrated the PSTR (Primary-Shadow TMO Replication) scheme. An important new extension of the PSTR scheme discussed in this paper is an integration of the PSTR scheme and a network surveillance (NS) scheme. This extension results in a significant improvement in the fault coverage and recovery time bound achieved. The NS scheme adopted is a recently developed scheme that is effective in a wide range of point-to-point networks, and it is called the SNS (Supervisor-based Network Surveillance) scheme. The integration of the PSTR scheme and the SNS scheme is called the PSTR/SNS scheme. The recovery time bound of the PSTR/SNS scheme is analyzed on the basis of an implementation model that can be easily adapted to various commercial operating system kernels.
  • Keywords
    computer network management; computer network reliability; distributed object management; fault tolerant computing; network operating systems; object-oriented databases; operating system kernels; real-time systems; replicated databases; surveillance; system recovery; PSTR/SNS scheme; TMO-structured distributed computer systems; TMO-structured parallel computer systems; active object replication; commercial operating system kernels; complex real-time application systems; fault coverage; general-form design; implementation model; object structuring schemes; point-to-point networks; primary-shadow TMO replication scheme; real-time fault tolerance; recovery time bound; supervisor-based network surveillance scheme; time-triggered message-triggered object scheme; timeliness-guaranteed design; Application software; Concurrent computing; Distributed computing; Fault tolerance; Fault tolerant systems; Hardware; Kernel; Operating systems; Real time systems; Surveillance
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/69.842258
  • Filename
    842258