• DocumentCode
    1346476
  • Title

    DEPEND: a simulation-based environment for system level dependability analysis

  • Author

    Goswami, K.K.

  • Author_Institution
    OnLive! Technol., Cupertino, CA
  • Volume
    46
  • Issue
    1
  • fYear
    1997
  • fDate
    1/1/1997 12:00:00 AM
  • Firstpage
    60
  • Lastpage
    74
  • Abstract
    The paper presents the rationale for a functional simulation tool, called DEPEND, which provides an integrated design and fault injection environment for system level dependability analysis. The paper discusses the issues and problems of developing such a tool, and describes how DEPEND tackles them. Techniques developed to simulate realistic fault scenarios, reduce simulation time explosion, and handle the large fault model and component domain associated with system level analysis are presented. Examples are used to motivate and illustrate the benefits of this tool. To further illustrate its capabilities, DEPEND is used to simulate the Unix-based Tandem prototype fault-tolerant system, which is built on triple modular redundancy (TMR), and to evaluate how well it handles near-coincident errors caused by correlated and latent faults. Issues such as memory scrubbing, re-integration policies, and workload-dependent repair times, which affect how the system handles near-coincident errors, are also evaluated. Unlike other simulation-based dependability studies, this study validates the accuracy of the simulation model by comparing the results of the simulations with measurements obtained from fault injection experiments conducted on a production Tandem machine.
  • Keywords
    digital simulation; fault tolerant computing; object-oriented programming; systems analysis; DEPEND; Unix-based Tandem triple-modular-redundancy; fault-tolerant system; functional simulation tool; integrated design and fault injection environment; latent faults; production Tandem machine; simulation model; simulation-based environment; system level analysis; system level dependability analysis; Analytical models; Error analysis; Explosions; Fault tolerant systems; Hardware; Helium; Object oriented modeling; Production; Prototypes; Virtual prototyping
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/12.559803
  • Filename
    559803